Select Services
This section describes some of the services that can be run on a node.
Every installation requires services to manage jobs and applications. ResourceManager and NodeManager manage MapReduce version 2 and other applications that can run on YARN. In addition, MapR requires the ZooKeeper service to coordinate the cluster, and at least one node must run the CLDB service. The WebServer service is required if the browser-based Control System will be used.
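The requirements in the preceding paragraph can be turned into a quick pre-installation check of a planned layout. The Python sketch below is illustrative only. The plan format (a dict that maps a hypothetical hostname to a set of service names) and the function name `check_cluster_plan` are assumptions, not part of MapR. The sketch also reads "services to manage jobs and applications" as meaning that at least one node runs ResourceManager and at least one runs NodeManager.

```python
from collections import Counter

# Hypothetical plan format (an assumption, not a MapR file format):
# hostname -> set of service names as they are spelled in this section.
Plan = dict[str, set[str]]


def check_cluster_plan(plan: Plan, use_control_system: bool = True) -> list[str]:
    """Return the cluster-wide requirements that the plan does not meet."""
    # Count how many nodes run each service.
    counts = Counter(svc for services in plan.values() for svc in services)
    problems = []
    # At least one node must run CLDB, ZooKeeper coordinates the cluster, and
    # ResourceManager/NodeManager manage YARN applications such as MapReduce v2.
    for svc in ("CLDB", "ZooKeeper", "ResourceManager", "NodeManager"):
        if counts[svc] == 0:
            problems.append(f"no node runs {svc}")
    # WebServer is needed only if the browser-based Control System will be used.
    if use_control_system and counts["WebServer"] == 0:
        problems.append("no node runs WebServer (required for the Control System)")
    return problems


example_plan: Plan = {
    "node1": {"CLDB", "ZooKeeper", "ResourceManager", "WebServer"},
    "node2": {"FileServer", "NodeManager"},
}
print(check_cluster_plan(example_plan))  # prints []
```

The function collects every unmet requirement instead of stopping at the first one, so a single run lists everything that still needs to be placed on a node.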
After you install MapR core, you can install ecosystem components that belong to an Ezmeral Ecosystem Pack (EEP). An EEP provides a set of ecosystem components that work together. When a newer version of a component or a revision to a component becomes available, the EEP version is updated to reflect the change. For details on the ecosystem components available in each EEP and the list of EEPs supported by your MapR cluster version, see Ezmeral Ecosystem Packs (EEPs).
The following table shows some of the services that can be run on a node:
Service Category | Service | Description |
---|---|---|
Management | Warden | Warden runs on every node, coordinating the node's contribution to the cluster. Warden also manages the state and resource allocations of the services on that node. |
YARN | NodeManager | Hadoop YARN NodeManager service. The NodeManager manages node resources and monitors the health of the node. It works with the ResourceManager to manage YARN containers that run on the node. |
MapR Core | FileServer | FileServer is the MapR service that manages disk storage for MapR File System and MapR Database on each node. |
MapR Core | CLDB | Runs the container location database (CLDB) service. The CLDB service coordinates data storage services among MapR File System file server nodes, MapR NFS gateways, and MapR clients. |
MapR Core | NFS | Provides read-write MapR Direct Access NFS™ access to the cluster, with full support for concurrent read and write access. |
Storage | MapR HBase Client | Provides access to MapR Database binary tables via HBase APIs. Required on all nodes that will access table data in MapR File System, typically all edge nodes. The HBase API can also be accessed through the HBase Thrift and REST gateways. |
YARN | ResourceManager | Hadoop YARN ResourceManager service. The ResourceManager manages cluster resources, and tracks resource usage and node health. |
Management | ZooKeeper | Internal service. Enables high availability (HA) and fault tolerance for MapR clusters by providing coordination. |
YARN | HistoryServer | Archives MapReduce application metrics and metadata. |
Management | WebServer | Contains the static Control System user interface pages. |
Management | Apiserver | Allows you to perform cluster administration programmatically, and supports the Control System (see Setting Up the Control System). |
OJAI Distributed Query Service | Drill | Provides the distributed query service powered by Apache Drill for MapR Database JSON. Supports the following functionality: |
Application | Hue | Hue is the Hadoop User Interface that interacts with Apache Hadoop and its ecosystem components, such as Hive, Pig, and Oozie. It also provides interactive notebook access to Spark through Livy. |
Application | Pig | Pig is a high-level data-flow language and execution framework. |
Application | Hive | Hive is a data warehouse engine that supports SQL-like ad hoc querying and data summarization. |
Application | Flume | Flume is a service for piping and aggregating large amounts of log data. |
Application | Oozie | Oozie is a workflow scheduler system for managing Hadoop jobs. |
Application | HCatalog | HCatalog provides applications with a table view of the MapR File System layer of the cluster, expanding your options beyond read/write data streams to include table operations such as get row and store row. |
Application | Cascading | Cascading is an application framework for analyzing and managing big data. |
Application | Myriad | Myriad is an application framework that enables YARN applications and Mesos frameworks to run side-by-side while dynamically sharing cluster resources. When using Myriad, the ResourceManager is deployed using Marathon, and the NodeManager runs as a Mesos task. |
Application | Spark | Spark is a processing engine for large datasets. While it can be deployed locally or standalone, the recommended deployment is on YARN. The application timeline server component provides a historical view of query details. |
Application | Sqoop | Sqoop is a library for transferring bulk data between Hadoop and relational databases. |
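Two rows of the table state per-node placement rules: Warden runs on every node, and the MapR HBase Client is required on every node that accesses table data. The Python sketch below checks a planned layout against just those two rules. The plan format, the `table_data_nodes` input, and the function `check_node_services` are hypothetical; they are not a MapR tool, and the table does not say how to decide which nodes will access table data.

```python
# Hypothetical plan format (an assumption): hostname -> set of service names.
Plan = dict[str, set[str]]


def check_node_services(plan: Plan, table_data_nodes: set[str]) -> list[str]:
    """Return per-node problems based on two rules from the services table.

    table_data_nodes is an assumed input: the hostnames that will access
    MapR Database binary tables through the HBase APIs.
    """
    problems = []
    # Sort by hostname so the report order is stable.
    for host, services in sorted(plan.items()):
        # Warden runs on every node.
        if "Warden" not in services:
            problems.append(f"{host}: Warden missing")
        # The MapR HBase Client is required on nodes that access table data.
        if host in table_data_nodes and "MapR HBase Client" not in services:
            problems.append(f"{host}: MapR HBase Client missing")
    return problems


example_plan: Plan = {
    "data1": {"Warden", "FileServer", "NodeManager"},
    "edge1": {"Warden"},
}
print(check_node_services(example_plan, {"edge1"}))
# prints ['edge1: MapR HBase Client missing']
```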