Using Node Labels to Schedule YARN Applications

About this task

To set up node labels so that YARN applications (including MapReduce applications) are scheduled on a specific node or group of nodes:

Procedure

  1. Create a text file and specify the labels you want to use for the nodes in your cluster. In this example, the file is named node.labels.
  2. Copy the file to a location on the MapR filesystem where it will not be modified or deleted, such as /var/mapr.
    hadoop fs -put ~/node.labels /var/mapr
  3. Edit yarn-site.xml on all ResourceManager nodes and set the node.labels.file parameter and the optional node.labels.monitor.interval parameter as shown:
    <property>
       <name>node.labels.file</name>
       <value>/var/mapr/node.labels</value>
       <description>The path to the node labels file.</description>
    </property>
    
    <property>
       <name>node.labels.monitor.interval</name>
       <value>120000</value>
       <description>Interval for checking the labels file for updates (default is 120000 ms)</description>
    </property>
  4. For these and any subsequent changes to the labels file to take effect, run either of the following commands to make the ResourceManager reload the node labels file:
    • For any YARN applications, including MapReduce applications, enter yarn rmadmin -refreshLabels
    • For MapReduce applications, enter mapred job -refreshLabels
  5. Verify that labels are implemented correctly by running either of the following commands:
    yarn rmadmin -showLabels
    mapred job -showlabels 
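
The steps above can be sketched as one shell script. This is a minimal sketch under stated assumptions, not official tooling: the hostname patterns (perfnode15*, perfnode201*) and the labels (fast, slow) are hypothetical, and the file layout shown (a node identifier, whitespace, then the label) is an assumed format that you should check against your release. The script also assumes that yarn-site.xml has already been edited as in step 3. The cluster commands are the ones given in the procedure, and they are skipped when hadoop or yarn is not on the PATH.

```shell
#!/bin/sh
# Sketch of steps 1, 2, 4, and 5. Step 3 (editing yarn-site.xml) is assumed done.
set -eu

# Local labels file and destination directory, as used in the procedure.
LABELS_FILE="${LABELS_FILE:-${HOME:-/tmp}/node.labels}"
TARGET_DIR="${TARGET_DIR:-/var/mapr}"

# Step 1: create the labels file.
# Hypothetical entries: the node patterns and labels are placeholders,
# and the "pattern<space>label" layout is an assumption.
cat > "$LABELS_FILE" <<'EOF'
perfnode15* fast
perfnode201* slow
EOF

if command -v hadoop >/dev/null 2>&1 && command -v yarn >/dev/null 2>&1; then
  # Step 2: copy the file to the filesystem location named in yarn-site.xml.
  # This fails if the file already exists at the destination.
  hadoop fs -put "$LABELS_FILE" "$TARGET_DIR"

  # Step 4: ask the ResourceManager to reload the labels file.
  yarn rmadmin -refreshLabels

  # Step 5: display the labels the ResourceManager now knows about.
  yarn rmadmin -showLabels
else
  echo "hadoop or yarn not found; wrote $LABELS_FILE only" >&2
fi
```

Set LABELS_FILE or TARGET_DIR in the environment to override the defaults. If TARGET_DIR changes, the node.labels.file value in yarn-site.xml must point to the new location.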

Results

The following flowchart summarizes these steps. In addition, the flowchart introduces the concept of queue labels for the Fair Scheduler and the Capacity Scheduler.