HPE Ezmeral Data Fabric 6.1.x is In Maintenance and transitions to "End of Maintenance" in June 2024. Please see the latest documentation.

Enabling YARN Log Aggregation

To enable YARN log aggregation, add or edit the following properties in yarn-site.xml:

Set the value of yarn.log-aggregation-enable to true.

Set the yarn.log.server.url property to the URL of the YARN HistoryServer. The URL takes one of the following forms:

Secure cluster: `https://<historyserver-host>:19890/jobhistory/logs`
Non-secure cluster: `http://<historyserver-host>:19888/jobhistory/logs`

Optional: Set the yarn.nodemanager.remote-app-log-dir value to a location in the MapR Data Platform file system. By default, the location is maprfs:///tmp/logs.
Optional: Set the yarn.nodemanager.remote-app-log-dir-suffix value to the name of the folder that should contain the logs for each user. By default, the folder name is logs.
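
As a sketch, the settings above might appear in yarn-site.xml as follows for a non-secure cluster. The hostname historyserver.example.com is a hypothetical placeholder, and the two optional properties only restate the defaults given above, so they can be omitted.

```xml
<!-- Illustrative yarn-site.xml entries for a non-secure cluster. -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log.server.url</name>
  <!-- historyserver.example.com is a placeholder host name. -->
  <!-- On a secure cluster, use https and port 19890 instead. -->
  <value>http://historyserver.example.com:19888/jobhistory/logs</value>
</property>
<!-- Optional: location of the aggregated logs (shown with its default). -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>maprfs:///tmp/logs</value>
</property>
<!-- Optional: per-user folder name (shown with its default). -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
  <value>logs</value>
</property>
```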

On a non-secure cluster, you must also add the following line to /opt/mapr/hadoop/hadoop-2.x/etc/hadoop/yarn-env.sh on each Node Manager node:

export MAPR_IMPERSONATION_ENABLED=1

Afterwards, restart Node Manager services. This setting enables impersonation for Node Manager processes so that log files can be created with the correct user ownership.
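
One way to apply the yarn-env.sh change consistently across Node Manager nodes is a small helper that appends the export line only when it is missing. The following is a minimal sketch, not a MapR-provided tool; the function name enable_impersonation is hypothetical, and the caller supplies the path to yarn-env.sh.

```shell
#!/bin/sh
# Hypothetical helper: add the impersonation setting to a yarn-env.sh file
# only if it is not already there, so repeated runs do not duplicate the line.
enable_impersonation() {
    env_file="$1"   # path to yarn-env.sh, supplied by the caller
    line='export MAPR_IMPERSONATION_ENABLED=1'

    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if grep -qxF "$line" "$env_file"; then
        return 0
    fi

    # If the file does not end with a newline, add one before appending.
    if [ -n "$(tail -c 1 "$env_file")" ]; then
        printf '\n' >> "$env_file"
    fi
    printf '%s\n' "$line" >> "$env_file"
}
```

It would be called as enable_impersonation /opt/mapr/hadoop/hadoop-2.x/etc/hadoop/yarn-env.sh, with hadoop-2.x replaced by the installed Hadoop version directory. The helper only edits the file; the Node Manager services still need to be restarted separately.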

Aggregated logs are owned by the user who runs the job. For example, if user admin runs a job, the logs are stored under maprfs:///tmp/logs/admin. If user analyst runs a job, the logs are stored under maprfs:///tmp/logs/analyst. If these two users do not share the same UNIX group, they cannot see each other's logs.
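
The per-user layout described above can be sketched as a small shell helper. The function names user_log_dir and show_log_commands are hypothetical, and the application ID is whatever the caller passes in. The inspection commands (hadoop fs -ls and yarn logs -applicationId) are printed rather than executed, so the sketch runs without a cluster.

```shell
#!/bin/sh
# Build the per-user aggregated-log directory.
# $1 = user name; $2 = remote log root (defaults to the documented default).
user_log_dir() {
    printf '%s/%s\n' "${2:-maprfs:///tmp/logs}" "$1"
}

# Print the commands a user could run to inspect their own aggregated logs.
# $1 = user name; $2 = application ID supplied by the caller.
show_log_commands() {
    user="$1"
    app_id="$2"
    printf 'hadoop fs -ls %s\n' "$(user_log_dir "$user")"
    printf 'yarn logs -applicationId %s\n' "$app_id"
}

user_log_dir admin     # prints maprfs:///tmp/logs/admin
user_log_dir analyst   # prints maprfs:///tmp/logs/analyst
```

Because each directory is owned by the user who ran the job, the printed commands are expected to succeed only for that user or for users in the same UNIX group.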

NOTE: If both centralized logging and YARN log aggregation are enabled, the logs for MapReduce version 2 applications are managed by centralized logging, while the logs for non-MapReduce applications are managed by YARN log aggregation.