MapR 6.0 Documentation

6.0 Platform
Provides a brief overview of what the MapR data platform is and how you can use it to solve enterprise business needs.
- MapR-XD
- MapR-DB
  MapR-DB is an enterprise-grade, high-performance, NoSQL database management system. You can use it for real-time, operational analytics capabilities.
  - Architecture
    MapR-DB is an enterprise-grade, high-performance, NoSQL (“Not Only SQL”) database management system. You can use it to add real-time, operational analytics capabilities to big data applications. As a multi-model NoSQL database, it supports both JSON document models and wide-column data models.
    - MapR-DB and MapR-FS
      This topic describes how MapR-DB tables are implemented directly in the MapR file system. This design lets MapR-DB leverage the same architecture as the rest of the MapR platform, so it requires minimal additional management.
    - Cluster Scalability
      The CLDB does not track information about tables (and files), or their location, directly. It tracks them through MapR-FS containers. Because this architecture keeps the CLDB size small, it becomes practical to store tens of exabytes in a MapR cluster, regardless of the number of tables and files.
    - High Availability
      Because of the way updates to table regions (also called tablets) are applied and replicated, data in table regions is instantly available. Tables and table regions are part of abstract entities called containers, which provide automatic replication of table regions (three replicas by default) across the nodes of a cluster.
    - Multi-Tenancy
      Since MapR-DB tables are created in volumes, when you restrict the volume, you also restrict the table data. If a volume is restricted to a subset of a cluster's nodes, you can isolate sensitive data or applications, and even use heterogeneous hardware in the cluster for specific workloads.
    - Snapshots
      Since MapR-DB tables are created in volumes, you can use a volume snapshot to capture the state of a volume's directories, MapR-DB tables, and files at an exact point in time.
    - Mirroring and Replication
      Since MapR-DB tables are created in volumes, mirroring of volumes lets you automatically replicate differential data in real time across clusters. You might want to mirror volumes to create disaster recovery solutions for databases or to provide read-only access to data from multiple locations.
    - OJAI Distributed Query Service
      OJAI queries either directly access MapR-DB JSON or leverage the OJAI Distributed Query Service. The OJAI Distributed Query Service provides distributed query support for MapR-DB JSON, powered by Apache Drill. The MapR client automatically determines whether OJAI queries benefit from using the OJAI Distributed Query Service, when the service is available. This section describes the architecture, including the code paths and components involved. It also discusses queries that originate from Drill SQL, which leverage the full functionality of MapR Drill.
  - Data Models
    MapR-DB can be used as both a document database and a wide-column database. As a document database, JSON documents are stored in MapR-DB JSON tables. As a wide-column database, binary files are stored in MapR-DB binary tables.
  - Secondary Indexes
    Beginning with MapR 6.0, MapR-DB JSON natively supports secondary indexes on fields in JSON tables. Indexes provide you with flexible, high-performance access to data stored in MapR-DB.
  - Change Data Capture
    The Change Data Capture (CDC) system allows you to capture changes made to data records in MapR-DB tables (JSON or binary) and propagate them to a MapR-ES topic.
  - Table Replication
    Data in one table can be replicated to another table that is in the same cluster or in a separate cluster. This type of replication is in addition to the automatic replication that occurs with table regions within a volume.
  - Gateways for Indexing MapR-DB Data in Elasticsearch
    As of MapR 6.0, the MapR-DB Elasticsearch integration capability is deprecated and no longer available in the MapR-DB product.
- MapR-ES
  MapR-ES brings integrated publish and subscribe messaging to the MapR Converged Data Platform.
- Cluster Management
- Security
- YARN
- Client Connections

MapR-DB and MapR-FS

This topic describes how MapR-DB tables are implemented directly in the MapR file system. This design lets MapR-DB leverage the same architecture as the rest of the MapR platform, so it requires minimal additional management.

MapR-DB tables are created in logical units called volumes.
MapR-DB tables are sharded into table regions (also called tablets).
Table regions are stored in abstract entities called data containers.
Data containers belong to MapR-FS volumes.
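
The containment relationships listed above can be pictured with a small model. This is only an illustrative sketch, not MapR code: the class names, the `region_id` and `container_id` fields, the table paths, and the helper methods are all hypothetical and exist only to show that regions live in containers and containers live in exactly one volume.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """One shard (tablet) of a table. Fields are hypothetical."""
    table_path: str
    region_id: int


@dataclass
class Container:
    """Abstract storage unit that can hold more than one region."""
    container_id: int
    regions: List[Region] = field(default_factory=list)


@dataclass
class Volume:
    """Management unit that owns a set of containers."""
    name: str
    containers: List[Container] = field(default_factory=list)

    def tables(self) -> List[str]:
        """Distinct table paths that have at least one region in this volume."""
        return sorted({r.table_path for c in self.containers for r in c.regions})

    def containers_for(self, table_path: str) -> List[int]:
        """IDs of the containers that hold a region of the given table."""
        return [
            c.container_id
            for c in self.containers
            if any(r.table_path == table_path for r in c.regions)
        ]


# Example data (made up): one volume, two containers, two tables.
volume = Volume(name="projects")
volume.containers.append(
    Container(2049, [Region("/projects/orders", 0), Region("/projects/users", 0)])
)
volume.containers.append(Container(2050, [Region("/projects/orders", 1)]))

print(volume.tables())
print(volume.containers_for("/projects/orders"))
```

In this sketch a table is never stored as a single object. It exists only as the set of regions that share a table path, which mirrors the way a table is spread across containers within its volume.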

Tables and Volumes

Because volumes are a management entity that logically organizes a cluster’s data, they can be used to enforce disk usage limits, set replication levels, define snapshots and mirrors, and establish ownership and accountability.

Volumes do not have a fixed size, and they do not occupy disk space until the MapR file system writes data to a container within the volume. A large volume may contain anywhere from 50 to 100 million containers.

Because tables are stored in containers and implemented in volumes, the following capabilities can be leveraged:

Multi-Tenancy
Snapshots
Mirroring and Replication

Table Regions and Containers

Each region of a table, along with its corresponding write-ahead log (WAL) files, b-trees, and other associated structures, is stored in one container. Each container (which can be from 16 to 32 GB in size) can store more than one region (which defaults to 4096 MB in size). The recommended practice is to use the default size for regions and allow them to be split automatically. Massive regions can affect synchronization of containers and load balancing across a cluster. Smaller regions spread data better across more nodes.
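
As a rough worked example of the sizes above, the following sketch estimates how many default-size regions fit in one container. It assumes 1 GB = 1024 MB and ignores the space taken by WAL files, b-trees, and other structures, so the results are upper bounds for full-size regions rather than figures from the MapR documentation. The function name and constants are hypothetical.

```python
REGION_DEFAULT_MB = 4096   # default region size stated in the text
CONTAINER_MIN_GB = 16      # lower end of the container size range
CONTAINER_MAX_GB = 32      # upper end of the container size range
MB_PER_GB = 1024           # assumption: binary gigabytes


def full_regions_per_container(container_gb: int,
                               region_mb: int = REGION_DEFAULT_MB) -> int:
    """Number of full-size regions that fit in a container of the given size."""
    return (container_gb * MB_PER_GB) // region_mb


low = full_regions_per_container(CONTAINER_MIN_GB)
high = full_regions_per_container(CONTAINER_MAX_GB)
print(f"A container holds roughly {low} to {high} default-size regions")
```

Under these assumptions a container holds about 4 to 8 full-size regions. Doubling the region size halves that count, which illustrates why larger regions give the cluster fewer units to spread across nodes.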

NOTE: Since a container always belongs to exactly one volume, that container’s replicas all belong to the same volume as well.

The following are important advantages to storing table regions in containers:

Cluster Scalability
High Data Availability

For more information about containers, see Containers and the CLDB.

(Topic last modified: 2018-07-03)