HPE Ezmeral Data Fabric 6.1.x is In Maintenance and transitions to "End of Maintenance" in June 2024. Please see the latest documentation.

Administering MapR Gateways

A MapR gateway mediates one-way communication between a source MapR cluster and a destination cluster. You can replicate MapR Database tables (binary and JSON) and MapR Event Store For Apache Kafka streams. MapR gateways also apply updates from JSON tables to their secondary indexes and propagate Change Data Capture (CDC) logs.

The initial task for setting up your gateways is to decide where you want to put them:

- If you are going to replicate MapR Database tables, see Gateways for Replicating MapR Database Tables.
- If you are going to replicate streams, see Gateways and Stream Replication.
- If your MapR Database JSON tables have secondary indexes, see Preparing Clusters for Querying using Secondary Indexes on JSON Tables.
- If you are using CDC, see Getting Started with CDC.

NOTE: Gateways perform negligible disk I/O and use negligible memory, but they require significant CPU.

However, the resource that gateways use the most is network bandwidth. For example, if the peak network throughput for puts is about 40 MB per second per node, then the peak network throughput for a 10-node source cluster is about 400 MB per second. The nodes running gateways therefore need an aggregate network throughput of 400 MB per second for both incoming and outgoing traffic. By the same calculation, a 50-node source cluster would require an aggregate network throughput of 2 GB per second.
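The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustrative estimate, not a MapR tool: the function name `aggregate_throughput_mb_s` is hypothetical, and the 40 MB per second per-node figure is the example value from the text, so substitute a peak measured on your own cluster.

```python
def aggregate_throughput_mb_s(source_nodes: int, peak_mb_s_per_node: float) -> float:
    """Estimate the peak aggregate throughput (MB/s) that the gateway nodes must carry.

    Hypothetical helper: multiplies the number of source-cluster nodes by the
    peak per-node put throughput, as in the worked example in the text.
    """
    if source_nodes < 0 or peak_mb_s_per_node < 0:
        raise ValueError("inputs must be non-negative")
    return source_nodes * peak_mb_s_per_node


if __name__ == "__main__":
    # 40 MB/s per node is the example figure from the text, not a measured value.
    for nodes in (10, 50):
        total = aggregate_throughput_mb_s(nodes, 40.0)
        print(f"{nodes}-node source cluster: {total:.0f} MB/s in and out of the gateways")
```

The same figure applies to both directions, because the gateways receive the replicated data and then forward it.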

As another example, in the following diagram two source clusters of three nodes each replicate to one destination cluster. With peak traffic of 40 MB per second per source cluster node, the gateways together experience a peak network load of 240 MB per second (6 nodes × 40 MB per second).

Although the load is balanced across the two gateways, so that each gateway experiences a peak network load of 120 MB per second, each gateway should be able to tolerate the full aggregate network load of 240 MB per second in case the other gateway fails unexpectedly.
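The failover reasoning above can be checked with a short Python sketch. It assumes that the load is spread evenly across whichever gateways are still running. The text states this for the two-gateway case, and the generalization to other counts is an assumption. The function name `per_gateway_load_mb_s` is hypothetical.

```python
def per_gateway_load_mb_s(
    source_nodes: int,
    peak_mb_s_per_node: float,
    gateways: int,
    failed: int = 0,
) -> float:
    """Estimate the peak load (MB/s) on each surviving gateway.

    Assumes even load balancing across surviving gateways (an assumption
    for any case other than the two-gateway example in the text).
    """
    surviving = gateways - failed
    if surviving <= 0:
        raise ValueError("at least one gateway must remain running")
    return source_nodes * peak_mb_s_per_node / surviving


if __name__ == "__main__":
    # Two source clusters of three nodes each, 40 MB/s per node, two gateways.
    nodes = 2 * 3
    print("balanced:", per_gateway_load_mb_s(nodes, 40.0, gateways=2))
    print("one gateway down:", per_gateway_load_mb_s(nodes, 40.0, gateways=2, failed=1))
```

Sizing each gateway for the `failed=1` result rather than the balanced result is what lets replication continue at peak load after a single gateway failure.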