Select Services
In a typical cluster, most nodes are dedicated to data processing and storage, and a smaller number of nodes run services that provide cluster coordination and management. Some applications run on cluster nodes and others run on client nodes that can communicate with the cluster.
The services that you choose to run on each node will likely evolve over the life of the cluster. Services can be added and removed over time.
The following table shows some of the services that can be run on a node.
Service Category | Service | Description |
---|---|---|
Management | Warden | Warden runs on every node, coordinating the node's contribution to the cluster. |
MapReduce | TaskTracker | Hadoop TaskTracker starts and tracks MapReduce tasks on a node. The TaskTracker service receives task assignments from the JobTracker service and manages task execution. |
MapReduce | NodeManager | Hadoop YARN NodeManager service. The NodeManager manages node resources and monitors the health of the node. It works with the ResourceManager to manage YARN containers that run on the node. |
Storage | FileServer | FileServer is the MapR service that manages disk storage for MapR-FS and MapR-DB on each node. |
Storage | CLDB | Maintains the container location database (CLDB). The CLDB service coordinates data storage services among MapR-FS FileServer nodes, MapR NFS gateways, and MapR clients. |
Storage | NFS | Provides read-write MapR Direct Access NFS™ access to the cluster, with full support for concurrent read and write access. |
Storage | MapR HBase Client | Provides access to MapR-DB binary tables via HBase APIs. Required on all nodes that access table data in MapR-FS, typically all TaskTracker nodes and edge nodes. |
Management | JobTracker | Hadoop JobTracker service. The JobTracker service coordinates the execution of MapReduce jobs by assigning tasks to TaskTracker nodes and monitoring task execution. |
Management | ResourceManager | Hadoop YARN ResourceManager service. The ResourceManager manages cluster resources, and tracks resource usage and node health. |
Management | ZooKeeper | Enables high availability (HA) and fault tolerance for MapR clusters by providing coordination. |
Management | HistoryServer | Archives MapReduce job metrics and metadata. |
Management | HBase Master | The HBase Master service manages the region servers that make up HBase table storage. NOTE: This service is needed only for Apache HBase. Your cluster supports MapR-DB without this service. |
Management | Web Server | Runs the MapR Control System. |
Management | Metrics | Provides optional real-time analytics data on cluster and job performance through the Analyzing Job Metrics interface. If used, the Metrics service is required on all JobTracker and Web Server nodes. |
Application | Hue | Hue is a Hadoop user interface that interacts with Apache Hadoop and its ecosystem components, such as Hive, Pig, and Oozie. |
Application | HBase Region Server | The HBase Region Server is used with the HBase Master service and provides storage for an individual HBase region. NOTE: This service is needed only for Apache HBase. Your cluster supports MapR-DB without this service. |
Application | Pig | Pig is a high-level data-flow language and execution framework. |
Application | Hive | Hive is a data warehouse that supports SQL-like ad hoc querying and data summarization. |
Application | Flume | Flume is a service for aggregating large amounts of log data. |
Application | Oozie | Oozie is a workflow scheduler system for managing Hadoop jobs. |
Application | HCatalog | HCatalog aggregates HBase data. |
Application | Cascading | Cascading is an application framework for analyzing and managing big data. |
Application | Mahout | Mahout is a set of scalable machine-learning libraries that analyze user behavior. |
Application | Myriad | Myriad is an application framework that enables YARN applications and Mesos frameworks to run side-by-side while dynamically sharing cluster resources. |
Application | Spark | Spark is a processing engine for large datasets. |
Application | Sqoop | Sqoop is a tool for transferring bulk data between Hadoop and relational databases. |
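Two rules in the table apply per node rather than per cluster: Warden runs on every node, and the Metrics service, if used, is required on all JobTracker and Web Server nodes. The following Python sketch shows one way to check a draft layout against those two rules. The plan format (a dictionary of node names to service lists) and the function name `check_node_colocation` are illustrative assumptions, not part of any MapR tool.

```python
# Hypothetical planning helper; not a MapR API.
def check_node_colocation(node_plan):
    """Return per-node problems for a {node name: [services]} plan."""
    problems = []
    # Metrics is optional, so the rule applies only if some node runs it.
    metrics_used = any("Metrics" in services for services in node_plan.values())
    for node, services in sorted(node_plan.items()):
        services = set(services)
        # Warden runs on every node.
        if "Warden" not in services:
            problems.append(f"{node}: Warden must run on every node")
        # If used, Metrics is required on all JobTracker and Web Server nodes.
        needs_metrics = bool(services & {"JobTracker", "Web Server"})
        if metrics_used and needs_metrics and "Metrics" not in services:
            problems.append(
                f"{node}: Metrics is required alongside JobTracker or Web Server"
            )
    return problems


node_plan = {
    "node1": ["Warden", "JobTracker", "Metrics"],
    "node2": ["Warden", "Web Server"],
    "node3": ["TaskTracker"],
}
for problem in check_node_colocation(node_plan):
    print(problem)
```

In this sample plan, node2 is flagged because it runs Web Server without Metrics, and node3 is flagged because it lacks Warden.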
MapR is a complete Hadoop distribution, but not all services are required. Every Hadoop installation requires services to manage jobs and applications: JobTracker and TaskTracker manage MapReduce v1 jobs, while ResourceManager and NodeManager manage MapReduce v2 and other applications that can run on YARN. In addition, MapR requires the ZooKeeper service to coordinate the cluster, and at least one node must run the CLDB service. The Web Server service is required if you plan to use the browser-based MapR Control System.
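The cluster-wide requirements above can be expressed as a small check on a draft service layout. This is a minimal sketch under stated assumptions: the plan format, the `use_mcs` flag, and the function name `check_cluster_plan` are hypothetical and exist only for illustration.

```python
# Hypothetical planning helper; not a MapR API.
def check_cluster_plan(plan, use_mcs=True):
    """Return cluster-wide problems for a {node name: [services]} plan."""
    problems = []
    all_services = set()
    for services in plan.values():
        all_services |= set(services)

    # ZooKeeper and at least one CLDB node are always required.
    for required in ("ZooKeeper", "CLDB"):
        if required not in all_services:
            problems.append(f"no node runs {required}")

    # The browser-based MapR Control System needs the Web Server service.
    if use_mcs and "Web Server" not in all_services:
        problems.append("MapR Control System requested but no node runs Web Server")

    # Job management: the MapReduce v1 pair, the YARN pair, or both.
    has_mr1 = {"JobTracker", "TaskTracker"} <= all_services
    has_yarn = {"ResourceManager", "NodeManager"} <= all_services
    if not (has_mr1 or has_yarn):
        problems.append("need JobTracker+TaskTracker or ResourceManager+NodeManager")

    return problems


cluster_plan = {
    "node1": ["ZooKeeper", "CLDB", "FileServer", "ResourceManager", "Web Server"],
    "node2": ["FileServer", "NodeManager"],
    "node3": ["FileServer", "NodeManager"],
}
print(check_cluster_plan(cluster_plan))  # prints []
```

The sketch checks only that each required service appears somewhere in the plan. It does not model how many ZooKeeper or CLDB nodes a production cluster should have.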
MapR Hadoop includes tested versions of the services listed here. MapR provides a more robust, read-write storage system based on volumes and containers. MapR data nodes typically run FileServer, TaskTracker, and NodeManager. Do not plan to use packages from other sources in place of the MapR distribution.
When using Myriad, the ResourceManager is deployed using Marathon and the NodeManager runs as a Mesos task.