MapR 6.0 Documentation

Cluster Scalability

The CLDB does not track the location of tables (or files) directly; instead, it tracks the MapR-FS containers that hold them. Because this architecture keeps the CLDB small, it is practical to store tens of exabytes in a MapR cluster, regardless of the number of tables and files.
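
To make this indirection concrete, here is a minimal Python sketch of a simplified, hypothetical model (not MapR's actual data structures): the CLDB maps only container IDs to the nodes that hold them, while each container records its own tables and files. `SimplifiedCLDB`, `ContainerMetadata`, and `add_object` are illustrative names. Adding objects to existing containers leaves the CLDB's size unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SimplifiedCLDB:
    """Hypothetical model: maps container IDs to the nodes holding replicas.
    It has no entries for individual tables or files."""
    locations: dict[int, list[str]] = field(default_factory=dict)

    def register_container(self, container_id: int, nodes: list[str]) -> None:
        self.locations[container_id] = nodes

    def size(self) -> int:
        return len(self.locations)


@dataclass
class ContainerMetadata:
    """Hypothetical per-container metadata: the container itself records
    which tables and files it stores, so the CLDB does not have to."""
    objects: set[str] = field(default_factory=set)


def add_object(containers: dict[int, ContainerMetadata], container_id: int, name: str) -> None:
    # Only the container's own metadata grows; the CLDB is not touched.
    containers[container_id].objects.add(name)


cldb = SimplifiedCLDB()
containers: dict[int, ContainerMetadata] = {}
for cid in range(1, 4):
    cldb.register_container(cid, ["node-a", "node-b", "node-c"])
    containers[cid] = ContainerMetadata()

# Spread 10,000 files across the three existing containers.
for i in range(10_000):
    add_object(containers, 1 + i % 3, f"/data/file-{i}")

print(cldb.size())
print(sum(len(c.objects) for c in containers.values()))
```

Running it prints 3 and then 10000: ten thousand files were added, but the CLDB still holds only three entries.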

Each cluster's container location database (CLDB) tracks the locations of the containers in that cluster. The CLDB is updated only when a container is moved, when a node fails, or as a result of periodic block change reports, so its update rate stays relatively low even for very large clusters. Because container locations change so rarely, the MapR file system can cache them for very long periods and does not need to query the CLDB often.
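
The caching behavior described above can be sketched as follows. This is a hypothetical client-side cache, not the MapR client's actual implementation: `ContainerLocationCache`, `nodes_for`, and `invalidate`, as well as the policy of invalidating an entry when a cached location proves stale, are assumptions made for illustration. Because container locations change rarely, almost every lookup is served from the cache.

```python
from typing import Callable, Dict, List


class ContainerLocationCache:
    """Hypothetical client-side cache of container locations.
    It queries the (simulated) CLDB only on a cache miss or after an
    explicit invalidation."""

    def __init__(self, cldb_lookup: Callable[[int], List[str]]) -> None:
        self._lookup = cldb_lookup  # callable: container_id -> list of nodes
        self._cache: Dict[int, List[str]] = {}
        self.cldb_queries = 0

    def nodes_for(self, container_id: int) -> List[str]:
        if container_id not in self._cache:
            self.cldb_queries += 1  # cache miss: ask the CLDB
            self._cache[container_id] = self._lookup(container_id)
        return self._cache[container_id]

    def invalidate(self, container_id: int) -> None:
        # Assumed policy: drop the entry when the cached location turns out
        # to be stale (for example, the container moved or a node failed).
        self._cache.pop(container_id, None)


# Simulated CLDB contents: container 7 is replicated on three nodes.
cldb_table = {7: ["node-a", "node-b", "node-c"]}
cache = ContainerLocationCache(lambda cid: list(cldb_table[cid]))

for _ in range(1_000):
    cache.nodes_for(7)
print(cache.cldb_queries)

# Simulate a node failure: the CLDB records a new replica location.
cldb_table[7] = ["node-a", "node-b", "node-d"]
cache.invalidate(7)
print(cache.nodes_for(7))
print(cache.cldb_queries)
```

After 1,000 lookups of container 7, the CLDB has been queried once; after the simulated node failure and invalidation, a single additional query fetches the new location.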

CLDBs are also very small in comparison to Apache Hadoop namenodes. A namenode tracks metadata and block information for every file, including the location of every block in every file. Because blocks are typically 200 MB or smaller, the total number of objects that a namenode tracks is very large. A CLDB, by contrast, tracks containers, which are much larger objects, so its location information can be 100 to 1000 times smaller than that of a namenode. CLDBs also do not track information about tables and files.
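
The 100 to 1000 times difference follows from simple arithmetic. The sketch below counts the location entries needed for 1 PB of data, comparing namenode blocks of 64, 128, and 200 MB against containers of an assumed 32 GB (an illustrative size, not stated on this page). It ignores replication, which multiplies both counts by the same factor in this model, and per-file metadata, which only a namenode tracks.

```python
PB = 10**15  # decimal petabyte, in bytes


def objects_to_track(data_bytes: int, object_bytes: int) -> int:
    """Number of fixed-size objects needed to hold data_bytes (rounded up)."""
    return -(-data_bytes // object_bytes)


data = 1 * PB
container_size = 32 * 10**9  # assumed container size, illustrative only

for block_size_mb in (64, 128, 200):
    blocks = objects_to_track(data, block_size_mb * 10**6)
    containers = objects_to_track(data, container_size)
    print(f"{block_size_mb} MB blocks: {blocks:,} blocks vs {containers:,} containers "
          f"(~{blocks / containers:.0f}x fewer location entries)")
```

The three cases give ratios of about 500x, 250x, and 160x, all within the range quoted above; smaller blocks or larger containers push the ratio higher.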