Architecture

MapR Database is an enterprise-grade, high-performance NoSQL (“Not Only SQL”) database management system. You can use it to add real-time operational analytics capabilities to big data applications. As a multi-model NoSQL database, it supports both JSON document models and key-value data models.

Why use MapR Database?

  • Integrated analytics with SQL: MapR Database's integration with Drill for MapR provides a low-latency, distributed SQL query engine for large-scale datasets, including structured, semi-structured, and nested data.
  • Operational analytics: MapR Database can run in the same cluster as Apache™ Hadoop® and Apache Spark, letting you immediately analyze or process live, interactive data. Running in the same cluster also lets you eliminate data silos, which shortens the data-to-action cycle and makes the data architecture more efficient.
  • Global distribution of applications: Application access to MapR Database tables is distributable on a global scale.
  • Flexible data model: You can use MapR Database as both a document database and a column-oriented database. As a document database, MapR Database stores JSON documents in JSON tables. As a column-oriented database, it stores binary files in binary tables.
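
The difference between the two table types in the last bullet can be pictured with plain Python data structures. This is only an illustrative sketch: the field names, the column family names, and the `read_cell` helper are hypothetical, the use of an `_id` field as the document key is an assumption, and nothing here calls a MapR API.

```python
# Illustrative only: plain Python structures, not MapR client code.

# JSON-table style: one self-describing, possibly nested document.
json_document = {
    "_id": "user0001",  # assumption: a document key field named "_id"
    "name": {"first": "Ada", "last": "Lovelace"},
    "emails": ["ada@example.com"],
}

# Binary-table style: rowkey -> column family -> column -> raw bytes.
# The family names "profile" and "contact" are hypothetical.
binary_rows = {
    b"user0001": {
        b"profile": {b"first": b"Ada", b"last": b"Lovelace"},
        b"contact": {b"email0": b"ada@example.com"},
    }
}


def read_cell(rows, rowkey, family, column):
    """Return the raw bytes stored in one cell, or None if it is absent."""
    return rows.get(rowkey, {}).get(family, {}).get(column)


# The document keeps its types and nesting; the binary row holds opaque
# bytes that the application must decode itself.
first_from_json = json_document["name"]["first"]
first_from_binary = read_cell(binary_rows, b"user0001", b"profile", b"first").decode("utf-8")
```

In the document form the application reads typed, nested values directly, while in the binary form it addresses a cell by rowkey, column family, and column, and interprets the bytes on its own.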

How is MapR Database Related to MapR XD Distributed File and Object Store?

MapR Database implements tables within the framework of the MapR filesystem. MapR Database creates tables (both binary and JSON tables) in logical units called volumes.



What are MapR Database's Architectural Advantages?

MapR Database's architecture has the following advantages:

  • It reduces process overhead because it has no extra layers to pass through when performing operations on data.

    MapR Database, like several other NoSQL databases, is a log-based database. MapR Database runs inside the MapR filesystem process, which enables it to read from and write to disks directly. In contrast, other NoSQL databases must communicate with a separate process to perform disk reads and writes. The approach taken by MapR Database eliminates extra process hops, duplicate caching, and needless abstractions, which optimizes I/O operations on your data.

  • It minimizes compaction delays because it avoids I/O storms when it merges logged operations with structures on disk.

    As a log-based database, MapR Database must write logged operations to disk. MapR Database stores table regions (also called tablets) and smaller structures within them partially as B-trees. Together with write-ahead logs (WALs), these B-trees make up log-structured merge trees. Write-ahead logs for the smaller structures within regions are periodically restructured by rolling merge operations on the B-trees. Because MapR Database performs these merges at small scales, applications running against MapR Database see no significant effect on latency while the merges take place.
    NOTE: Apache HBase also uses the term regions.
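
The interplay of a write-ahead log and small, frequent merges described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not MapR Database's implementation: the class name, the `merge_threshold` parameter, and the in-memory lists are hypothetical, and a sorted list stands in for the on-disk B-tree.

```python
import bisect
import json


class TinyLogStructuredStore:
    """Toy log-structured store: a write-ahead log, a small write buffer,
    and one sorted structure (a sorted list stands in for a B-tree)."""

    def __init__(self, merge_threshold=4):
        self.wal = []             # append-only log of operations
        self.buffer = {}          # recent writes not yet merged
        self.segment_keys = []    # sorted keys of the merged structure
        self.segment_values = {}  # key -> value for the merged structure
        self.merge_threshold = merge_threshold  # hypothetical tuning knob
        self.merges = 0

    def put(self, key, value):
        # Log first, then apply to the buffer.
        self.wal.append(json.dumps({"op": "put", "key": key, "value": value}))
        self.buffer[key] = value
        # Merging after only a few writes keeps each merge small.
        if len(self.buffer) >= self.merge_threshold:
            self._rolling_merge()

    def get(self, key):
        # The newest data lives in the buffer; fall back to the merged structure.
        if key in self.buffer:
            return self.buffer[key]
        return self.segment_values.get(key)

    def _rolling_merge(self):
        # Fold the buffered writes into the sorted structure in key order.
        for key in sorted(self.buffer):
            if key not in self.segment_values:
                bisect.insort(self.segment_keys, key)
            self.segment_values[key] = self.buffer[key]
        # The logged operations are now reflected in the sorted structure,
        # so the buffer and its log can be discarded.
        self.buffer.clear()
        self.wal.clear()
        self.merges += 1


store = TinyLogStructuredStore(merge_threshold=3)
for i in [5, 1, 9, 3, 7]:
    store.put(f"row{i}", f"value{i}")
```

With a threshold of three, the first three writes trigger one small merge and the last two stay in the buffer and the log. Keeping the threshold low trades more frequent merges for less work per merge, which is the property the paragraph above attributes to small-scale merging.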

What Design Factors are Important when Using MapR Database?

  • Rowkey Optimization: The design of a table's rowkeys affects the speed at which client applications can access data. Poorly designed rowkeys can also cause hotspotting, where a disproportionate share of requests lands on a few regions and degrades database performance. The better the design, the faster the data access. See Table Rowkey Design for more information.
  • Column Family Optimization: Column families enable you to group related sets of data and restrict queries to a defined subset, leading to better performance. When you design a column family, think about what kinds of queries you are going to use most often, and group your columns accordingly. See Column Families in JSON Tables and Column Families in Binary Tables for more information.
  • Replication Implementation: The design of table replication (in addition to the automatic replication that occurs with table regions within a volume) depends on your desired outcome and the complexity of your environment. See Table Replication for more information.
  • Security Implementation: You can implement security at various levels, including for table replication, JSON documents, and general access. Determining which levels to secure and where is part of the architectural design. See Security on JSON Tables and Security for more information.
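
To make the rowkey point in the list above concrete, the following sketch compares how sequential rowkeys and hash-prefixed rowkeys spread across regions. It assumes regions hold contiguous rowkey ranges; the split points, the four-character prefix length, and the function names are hypothetical choices for illustration, not values taken from MapR Database.

```python
import bisect
import hashlib
from collections import Counter

# Hypothetical split points that divide the key space into four regions.
SPLIT_POINTS = ["4", "8", "c"]


def region_for(rowkey):
    """Return the index of the region whose key range contains rowkey."""
    return bisect.bisect_right(SPLIT_POINTS, rowkey)


def hashed_rowkey(natural_key):
    """Prefix the key with a short hex digest so that neighbouring natural
    keys no longer sort next to each other."""
    prefix = hashlib.sha256(natural_key.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}-{natural_key}"


# Sixty timestamp-like keys written in increasing order.
sequential_keys = [f"2024-01-01T00:00:{s:02d}" for s in range(60)]

# Every sequential key sorts into the same range, so one region takes all writes.
sequential_load = Counter(region_for(k) for k in sequential_keys)

# The hashed prefix scatters the same keys across the key space.
hashed_load = Counter(region_for(hashed_rowkey(k)) for k in sequential_keys)
```

Here `sequential_load` shows all 60 writes landing in region 0, while `hashed_load` spreads them over several regions. The trade-off is that a hashed prefix gives up ordered range scans over the natural key, so the choice depends on the queries you expect to run most often.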