Installing Spark Standalone
This topic includes instructions for using package managers to download and install Spark Standalone from the MEP repository.
Prerequisites
About this task
Package | Description |
---|---|
mapr-spark | Install this package on Spark worker nodes. This package depends on the mapr-client package. |
mapr-spark-master | Install this package on Spark master nodes. Spark master nodes must be able to communicate with Spark worker nodes over SSH without using passwords. This package depends on the mapr-spark and mapr-core packages. |
mapr-spark-historyserver | Install this optional package on Spark History Server nodes. This package depends on the mapr-spark and mapr-core packages. |
mapr-spark-thriftserver | Install this optional package on Spark Thrift Server nodes. This package is available starting in the MEP 4.0 release. It depends on the mapr-spark and mapr-core packages. |
Run the installation commands as root or by using sudo.
Procedure
-
Create the /apps/spark directory on the MapR file system, and set the correct permissions on the directory:
hadoop fs -mkdir /apps/spark
hadoop fs -chmod 777 /apps/spark
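To confirm that the directory was created with the intended permissions, you can list its parent directory; this is a quick sanity check, not part of the official procedure:

```shell
# Verify that /apps/spark exists and is world-writable (drwxrwxrwx)
hadoop fs -ls /apps
```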
-
Use the appropriate commands for your operating system to install Spark.
- On CentOS / RedHat:
yum install mapr-spark mapr-spark-master mapr-spark-historyserver mapr-spark-thriftserver
- On Ubuntu:
apt-get install mapr-spark mapr-spark-master mapr-spark-historyserver mapr-spark-thriftserver
- On SUSE:
zypper install mapr-spark mapr-spark-master mapr-spark-historyserver mapr-spark-thriftserver
NOTE: The mapr-spark-historyserver and mapr-spark-thriftserver packages are optional.
Spark is installed into the /opt/mapr/spark directory.
-
Copy the /opt/mapr/spark/spark-<version>/conf/slaves.template file to /opt/mapr/spark/spark-<version>/conf/slaves, and add the hostnames of the Spark worker nodes, one per line. For example:
localhost
worker-node-1
worker-node-2
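The copy and edit described above can be done from the shell; a minimal sketch, where <version> stands in for your installed Spark version and the worker hostnames are examples only:

```shell
# Copy the template into place
cd /opt/mapr/spark/spark-<version>/conf
cp slaves.template slaves

# Append one worker hostname per line (replace with your own hostnames)
echo "worker-node-1" >> slaves
echo "worker-node-2" >> slaves
```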
-
Set up passwordless SSH for the mapr user so that the Spark master node has access to all of the slave nodes defined in the conf/slaves file.
-
As the mapr user, start the worker nodes by running the following command on the master node. Because the Master daemon is managed by the Warden daemon, do not use the start-all.sh or stop-all.sh commands.
/opt/mapr/spark/spark-<version>/sbin/start-slaves.sh
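For the passwordless SSH requirement in the earlier step, one common approach uses standard OpenSSH tools; a minimal sketch, run as the mapr user on the master node, with hypothetical worker hostnames:

```shell
# Generate a key pair for the mapr user if one does not already exist
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to each worker node listed in conf/slaves
# (worker-node-1 and worker-node-2 are example hostnames)
for host in worker-node-1 worker-node-2; do
  ssh-copy-id mapr@"$host"
done
```

After this, `ssh mapr@worker-node-1` from the master node should connect without prompting for a password.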
-
If you want to integrate Spark with MapR-ES, install the Streams Client on
each Spark node:
- On Ubuntu:
apt-get install mapr-kafka
- On RedHat/CentOS:
yum install mapr-kafka
-
If you want to use a Streaming Producer, add the spark-streaming-kafka-producer_2.11.jar from the MapR Maven repository to the Spark classpath (/opt/mapr/spark/spark-<version>/jars/).
-
After installing Spark Standalone and before running your Spark jobs, follow the steps outlined at Configuring Spark Standalone.