Manually Installing Custom Packages for PySpark
Use the Python package manager, pip (or pip3 for PySpark3), to manually install custom packages on each node in your MapR Data Platform cluster. You need administrative access on your cluster nodes to install the packages.
Procedure
- Install the package manager using one of the following commands, depending on your operating system:
- Install the custom package using the package manager you installed in the first step. The following example installs the matplotlib package (use pip3 for PySpark3):
    sudo pip install matplotlib
    sudo pip3 install matplotlib
You must install the package on each node in your MapR cluster where PySpark jobs will run, that is, on every node that runs a YARN NodeManager.
- To verify successful installs, run the following code snippet in your Zeppelin UI:
    %livy.pyspark
    import sys
    print(sys.version)
    import matplotlib
    print(matplotlib.__version__)
The code snippet returns output similar to one of the following, depending on the Python version:
    2.7.5 (default, Nov 6 2016, 00:28:07)
    [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)]
    2.1.0

    3.4.5 (default, May 29 2017, 15:17:55)
    [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)]
    2.1.0
The minor versions of Python and matplotlib may differ depending on the versions you install.
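The verification snippet in the procedure appears to check only the interpreter in which the Spark driver runs. Because the package must be present on every node that runs a YARN NodeManager, it can be useful to run the same import check inside Spark tasks as well. The following is a minimal sketch, not part of the product: the function names probe_package and probe_cluster are hypothetical, and it assumes that a SparkContext named sc is available in the session, as is usual for PySpark interpreters. Spark does not guarantee that tasks are scheduled on every node, so a host missing from the result is not proof that the package is absent there.

```python
import importlib
import platform
import socket


def probe_package(package_name):
    """Return (hostname, python_version, package_version) for this interpreter.

    package_version is None when the package cannot be imported, and
    "unknown" when the module has no __version__ attribute.
    """
    try:
        module = importlib.import_module(package_name)
        version = getattr(module, "__version__", "unknown")
    except ImportError:
        version = None
    return (socket.gethostname(), platform.python_version(), version)


def probe_cluster(sc, package_name, num_tasks=100):
    """Run probe_package in num_tasks Spark tasks; return the distinct results.

    sc is assumed to be an existing SparkContext. Using many small tasks
    only makes it more likely, not certain, that every node is reached.
    """
    results = (
        sc.parallelize(range(num_tasks), num_tasks)
        .map(lambda _: probe_package(package_name))
        .distinct()
        .collect()
    )
    # Sort by hostname and Python version for stable, readable output.
    return sorted(results, key=lambda r: (r[0], r[1]))
```

In a notebook paragraph, after defining the two functions, a call such as probe_cluster(sc, "matplotlib") returns one tuple per distinct host and interpreter that ran a task. A tuple whose last element is None points to a node where the package still needs to be installed.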