YARN

YARN is a resource-management and scheduling framework that separates resource management from job management. These duties are assigned as follows:
  • ResourceManager: manages cluster resources and tracks resource usage and node health.
  • ApplicationMaster: a framework-specific process that negotiates resources for a single application (a single job or a directed acyclic graph of jobs). It runs in the first container allocated for the application.
  • HistoryServer: archives job metrics and metadata. The status of completed applications is available through REST APIs.
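
As a rough illustration of the REST access mentioned above, the following sketch builds a request URL and summarizes a job listing. It assumes the MapReduce JobHistory Server endpoint /ws/v1/history/mapreduce/jobs and web port 19888; the host name and the sample payload are hypothetical, and a given distribution may expose a different address or response shape.

```python
import json
from urllib.request import urlopen


def history_jobs_url(host, port=19888):
    """Build the job-listing URL (host and port are assumptions for your cluster)."""
    return f"http://{host}:{port}/ws/v1/history/mapreduce/jobs"


def summarize_jobs(payload):
    """Return (id, state) pairs from a decoded job-listing response."""
    jobs = (payload.get("jobs") or {}).get("job") or []
    return [(job.get("id"), job.get("state")) for job in jobs]


def fetch_completed_jobs(host, port=19888, timeout=10):
    """Query the history server; requires network access to the cluster."""
    with urlopen(history_jobs_url(host, port), timeout=timeout) as response:
        return summarize_jobs(json.load(response))


# Hypothetical payload, shaped like the assumed response, for offline use.
sample = {"jobs": {"job": [{"id": "job_0001", "state": "SUCCEEDED"}]}}
print(summarize_jobs(sample))
```

Keeping the parsing step separate from the network call makes it possible to check the logic without a running cluster.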

The ResourceManager allocates resources among all the applications running in the cluster. The ResourceManager includes a pluggable scheduler, which allocates resources according to the resource requirements of the running applications. Current MapReduce schedulers, including the Capacity Scheduler and the Fair Scheduler, can be plugged into the YARN scheduler directly.
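
The idea of a pluggable scheduler can be sketched with a toy model. Everything below is hypothetical: the Scheduler interface, the two policies, and the ToyResourceManager class are simplified stand-ins written for illustration, and they are not the Capacity Scheduler, the Fair Scheduler, or any YARN API.

```python
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Hypothetical plug-in interface: decide how many containers each app gets."""

    @abstractmethod
    def allocate(self, requests, capacity):
        """requests maps app name -> containers wanted; returns app name -> granted."""


class FifoScheduler(Scheduler):
    """Serve applications in submission order until capacity runs out."""

    def allocate(self, requests, capacity):
        grants = {}
        for app, wanted in requests.items():  # dicts preserve insertion order
            grants[app] = min(wanted, capacity)
            capacity -= grants[app]
        return grants


class EqualShareScheduler(Scheduler):
    """Hand out one container per app per round until demand or capacity is gone."""

    def allocate(self, requests, capacity):
        grants = {app: 0 for app in requests}
        while capacity > 0:
            pending = [app for app in requests if grants[app] < requests[app]]
            if not pending:
                break
            for app in pending:
                if capacity == 0:
                    break
                grants[app] += 1
                capacity -= 1
        return grants


class ToyResourceManager:
    """Delegates every allocation decision to whichever scheduler is plugged in."""

    def __init__(self, scheduler, capacity):
        self.scheduler = scheduler
        self.capacity = capacity

    def schedule(self, requests):
        return self.scheduler.allocate(requests, self.capacity)
```

Swapping the policy means passing a different Scheduler instance to ToyResourceManager; the rest of the code stays the same, which is the point of a pluggable design.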

Label-based scheduling provides job placement control on a multi-tenant Hadoop cluster. Administrators can control exactly which nodes are chosen to run jobs submitted by different users and groups. An administrator assigns node labels in a text file, then composes queue labels or job labels based on the node labels. When users run jobs, they can place them on specified nodes on a per-job basis (using a job label) or on a per-queue basis (using a queue label).
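
The placement logic described above can be sketched as a simple filter. The node names, the labels, and the rule that a job label and a queue label are combined with a logical AND are all assumptions made for this sketch; the actual label file format and the policy for combining labels are not shown here.

```python
# Hypothetical node-to-label assignments; on a real cluster these come from
# the administrator's label file, whose format is not reproduced here.
NODE_LABELS = {
    "node-a": {"fast", "production"},
    "node-b": {"production"},
    "node-c": {"development"},
}


def eligible_nodes(node_labels, job_label=None, queue_label=None):
    """Return the sorted nodes that carry every label that was supplied."""
    required = {label for label in (job_label, queue_label) if label}
    return sorted(
        node for node, labels in node_labels.items() if required <= labels
    )


print(eligible_nodes(NODE_LABELS, job_label="production"))
```

With no label supplied, the required set is empty and every node is eligible, which mirrors a job submitted without placement constraints.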

The ResourceManager caches the node-label mapping file and checks it for updates every two minutes (the default monitoring period). If the file has been modified, the ResourceManager updates the labels for all active ApplicationMasters immediately.
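
The cache-and-poll behavior can be approximated as follows. This is a minimal sketch, not the ResourceManager's implementation: the LabelFileWatcher class is hypothetical, and detecting changes through the file's modification time is an assumption about how such a check might work.

```python
import os

DEFAULT_MONITOR_PERIOD_SECONDS = 120  # two minutes, as described above


class LabelFileWatcher:
    """Cache a file's non-empty lines and reload them only when the file changes."""

    def __init__(self, path):
        self.path = path
        self._mtime_ns = None  # modification time seen at the last reload
        self.lines = []

    def check(self):
        """Reload if the modification time changed; return True when reloaded."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns == self._mtime_ns:
            return False
        with open(self.path, encoding="utf-8") as handle:
            self.lines = [line.strip() for line in handle if line.strip()]
        self._mtime_ns = mtime_ns
        return True
```

A caller would invoke check() once per monitoring period, for example in a loop that sleeps for DEFAULT_MONITOR_PERIOD_SECONDS between calls, and push the new labels to consumers whenever it returns True.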

Each application runs an ApplicationMaster to negotiate resources from the ResourceManager. The ApplicationMaster works with the NodeManagers to execute and monitor tasks. The node-level components involved are as follows:
  • NodeManager: one instance runs on each node to manage that node's resources.
  • Container: an abstraction representing a unit of resources on a node.

The NodeManager provides containers to an application. The ResourceManager and the NodeManager provide the system for distributed management of applications and resources.
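
The relationship between a NodeManager and its containers can be modeled roughly as below. The class and field names are illustrative only and are not YARN classes; measuring a container in memory and virtual cores is an assumption for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Container:
    """A unit of resources on one node (fields are illustrative)."""
    node: str
    memory_mb: int
    vcores: int


@dataclass
class ToyNodeManager:
    """Tracks the free resources of a single node and the containers carved from it."""
    name: str
    memory_mb: int
    vcores: int
    containers: list = field(default_factory=list)

    def launch(self, memory_mb, vcores):
        """Carve a container out of free capacity, or return None if it does not fit."""
        if memory_mb > self.memory_mb or vcores > self.vcores:
            return None
        self.memory_mb -= memory_mb
        self.vcores -= vcores
        container = Container(self.name, memory_mb, vcores)
        self.containers.append(container)
        return container
```

For example, a ToyNodeManager created with 4096 MB and 4 cores can launch a 1024 MB, 1-core container and is left with 3072 MB and 3 cores; a request larger than the remaining capacity returns None.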