Integrate Spark with HBase
Integrate Spark with HBase or MapR Database when you want to run Spark jobs on HBase or MapR Database tables.
Procedure
- Configure the HBase version in the /opt/mapr/spark/spark-<version>/mapr-util/compatibility.version file:
  hbase_versions=<version>
  The HBase version depends on the current EEP and MapR version that you are running.
- If you want to create HBase tables with Spark, add the following property to hbase-site.xml:
  <property>
    <name>hbase.table.sanity.checks</name>
    <value>false</value>
  </property>
- On each Spark node, copy the hbase-site.xml file to the {SPARK_HOME}/conf/ directory.
  TIP: Starting in the EEP 7.0.0 release, you do not have to complete this step. Running configure.sh copies the hbase-site.xml file to the Spark directory automatically.
- Specify the hbase-site.xml file in the SPARK_HOME/conf/spark-defaults.conf file:
  spark.yarn.dist.files SPARK_HOME/conf/hbase-site.xml
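The hbase-site.xml and spark-defaults.conf edits described in the steps above can also be applied or checked with a script when many Spark nodes are involved. The following is a minimal Python sketch and not part of the documented procedure. The helper names ensure_hbase_property and spark_default are hypothetical, and the demo runs against throwaway files in a temporary directory rather than a real installation. It assumes that hbase-site.xml has a <configuration> root element and that spark-defaults.conf separates keys from values with whitespace.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def ensure_hbase_property(site_xml: Path, name: str, value: str) -> bool:
    """Set a <property> in an hbase-site.xml style file.

    Returns True if the file was rewritten, False if it already matched.
    Caveat: ElementTree's default parser does not preserve XML comments or
    processing instructions, so back up the file before using this for real.
    """
    tree = ET.parse(str(site_xml))
    root = tree.getroot()  # assumed to be <configuration>
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() != name:
            continue
        value_el = prop.find("value")
        if value_el is not None and (value_el.text or "").strip() == value:
            return False  # already configured as requested
        if value_el is None:
            value_el = ET.SubElement(prop, "value")
        value_el.text = value
        break
    else:
        # Property not present yet: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    tree.write(str(site_xml), encoding="utf-8", xml_declaration=True)
    return True


def spark_default(conf: Path, key: str):
    """Return the value set for key in spark-defaults.conf, or None.

    Simplification: only whitespace-separated "key value" lines are
    recognized; "key=value" lines are ignored by this sketch.
    """
    for line in conf.read_text().splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == key:
            return parts[1].strip()
    return None


if __name__ == "__main__":
    # Demonstrate on temporary stand-ins for the real configuration files.
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp) / "hbase-site.xml"
        site.write_text("<configuration></configuration>")
        print(ensure_hbase_property(site, "hbase.table.sanity.checks", "false"))

        defaults = Path(tmp) / "spark-defaults.conf"
        defaults.write_text("spark.yarn.dist.files SPARK_HOME/conf/hbase-site.xml\n")
        print(spark_default(defaults, "spark.yarn.dist.files"))
```

The second helper only reads the file. Because spark.yarn.dist.files can hold a comma-separated list, overwriting it automatically could drop entries that other integrations rely on, so the sketch leaves any change to that property to the administrator.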
- To verify the integration, complete the following steps: