Installing Spark on YARN

This topic describes how to use package managers to download and install Spark on YARN from the MEP repository.

To set up the MEP repository, see Step 10: Install Ecosystem Components Manually.

Spark is distributed as three separate packages:

mapr-spark: Install this package on any node where you want to run Spark. It depends on the mapr-client, mapr-hadoop-client, mapr-hadoop-util, and mapr-librdkafka packages.

mapr-spark-historyserver: Install this optional package on Spark History Server nodes. It depends on the mapr-spark and mapr-core packages.

mapr-spark-thriftserver: Install this optional package on Spark Thrift Server nodes. This package is available starting in the MEP 4.0 release. It depends on the mapr-spark and mapr-core packages.

To install Spark on YARN (Hadoop 2), execute the following commands as root or using sudo:

  1. Verify that JDK 11 or later is installed on the node where you want to install Spark.
  2. Create the /apps/spark directory on the cluster filesystem, and set the correct permissions on the directory:
    hadoop fs -mkdir /apps/spark
    hadoop fs -chmod 777 /apps/spark
    Note: Beginning with MEP 6.2.0, the configure.sh script creates the /apps/spark directory automatically.
  3. Install the packages:
    On Ubuntu
    apt-get install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
    On CentOS 8.x / Red Hat 8.x
    dnf install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
    On SUSE
    zypper install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
    Note: The mapr-spark-historyserver and mapr-spark-thriftserver packages are optional.
  4. If you want to integrate Spark with HPE Ezmeral Data Fabric Event Store, install the Streams Client on each Spark node:
    On Ubuntu
    apt-get install mapr-kafka
    On CentOS / Red Hat
    yum install mapr-kafka
  5. If you want to use a Streaming Producer, add the spark-streaming-kafka-producer_2.12.jar from the data-fabric Maven repository to the Spark classpath (/opt/mapr/spark/spark-<version>/jars/).
    For repository-specific information, see Maven Artifacts for the HPE Ezmeral Data Fabric.
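As a sketch of that step, assuming a hypothetical Spark 3.3.2 installation (substitute the Spark version actually installed on your nodes), the jar ends up on the classpath like this:

```shell
# Hypothetical version number for illustration; use your installed Spark version.
SPARK_VERSION="3.3.2"
JAR="spark-streaming-kafka-producer_2.12.jar"

# Destination inside the Spark jars directory, i.e. on the Spark classpath:
DEST="/opt/mapr/spark/spark-${SPARK_VERSION}/jars/${JAR}"
echo "${DEST}"

# After downloading the jar from the data-fabric Maven repository,
# copy it into place on each Spark node:
# cp "${JAR}" "${DEST}"
```

Repeat the copy on every node that runs Spark so the classpath is consistent across the cluster.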
  6. After installing Spark on YARN but before running your Spark jobs, follow the steps outlined at Configuring Spark on YARN.
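After completing the configuration steps, one way to confirm the installation is to submit the SparkPi example that ships with Spark to YARN. A minimal sketch, assuming a hypothetical Spark 3.3.2 installation under /opt/mapr/spark; the command is echoed rather than executed so it can be reviewed first and then run on a configured cluster node:

```shell
# Hypothetical version; substitute the Spark version installed under /opt/mapr/spark.
SPARK_HOME="/opt/mapr/spark/spark-3.3.2"

# SparkPi smoke test submitted to YARN (echoed here for review; run it on a node
# that has been configured per "Configuring Spark on YARN"):
CMD="${SPARK_HOME}/bin/spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi ${SPARK_HOME}/examples/jars/spark-examples_2.12-3.3.2.jar 10"
echo "${CMD}"
```

A successful run prints an approximation of pi to the driver log, which confirms that Spark can negotiate containers from YARN.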