Spark SQL Thrift Server

Spark SQL Thrift (Spark Thrift) was developed from Apache Hive HiveServer2 and operates like the HiveServer2 Thrift server.

Spark Thrift is supported on secure clusters. You can run the Spark Thrift server and connect to Hive versions supported by Spark 2.1.0 and later with Business Intelligence (BI) tools or the Beeline command-line tool.

Starting in the EEP 4.0 release, the Spark Thrift server is available as a separate package. To install this package, see Installing Spark Standalone or Installing Spark on YARN, depending on the type of cluster manager you are installing.

In EEP 3.0, MapR introduces additional security mechanisms for Spark with the Spark Thrift server. MapR-SASL and Kerberos are supported:

  • For JDBC connections into Spark Thrift server
  • Between Spark and Hive metastore

Starting in the EEP 4.0 release, running configure.sh -R on a secure cluster configures MapR-SASL security for the Spark Thrift server. The script modifies or creates a SPARK_HOME/conf/hive-site.xml file as follows:

  • If Hive is installed in your cluster, the script copies HIVE_HOME/conf/hive-site.xml to SPARK_HOME/conf and modifies the file.
  • If Hive is not installed and you are using MapR-SASL security, the script creates a new SPARK_HOME/conf/hive-site.xml file.
  • Each time the script runs, if there is a pre-existing SPARK_HOME/conf/hive-site.xml file, the script saves a copy of the file in SPARK_HOME/conf/hive-site.xml.old before modifying it.
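The file handling described in the list above can be sketched in Python. This is only an illustration of the copy and backup order, not the actual configure.sh logic: the function name prepare_spark_hive_site and its parameters are hypothetical, the empty configuration written when Hive is absent is a placeholder, and the security properties that the real script adds are deliberately left out.

```python
import shutil
from pathlib import Path
from typing import Optional


def prepare_spark_hive_site(spark_home: Path, hive_home: Optional[Path] = None) -> Path:
    """Mimic the hive-site.xml handling described above (hypothetical helper)."""
    target = spark_home / "conf" / "hive-site.xml"
    target.parent.mkdir(parents=True, exist_ok=True)

    # Save a copy of any pre-existing file before changing it.
    if target.exists():
        shutil.copy2(target, target.with_name("hive-site.xml.old"))

    source = None if hive_home is None else hive_home / "conf" / "hive-site.xml"
    if source is not None and source.exists():
        # Hive is installed: start from Hive's own configuration.
        shutil.copy2(source, target)
    elif not target.exists():
        # Hive is not installed: create a new file (placeholder content,
        # an assumption; the real script writes security settings here).
        target.write_text("<configuration>\n</configuration>\n")
    return target
```

Calling the function twice against the same SPARK_HOME leaves the previous file in hive-site.xml.old, which is the place to look if a manual change disappears after the script is rerun.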

You can configure security manually by following the steps outlined in sub-topics listed on this page.

To launch the Spark Thrift server, perform the procedures required to configure Apache Spark to use Hive.

IMPORTANT
  • Starting in the EEP 4.0 release, if you start and stop the Spark Thrift server using Warden, the connection port number is 2304. If you start and stop the server by running the /opt/mapr/spark/<spark-version>/sbin/{start,stop}-thriftserver.sh scripts, the port number remains 10000.
  • Starting in the EEP 5.0.4 and EEP 6.3.0 releases, if you start and stop the Spark Thrift server by running the /opt/mapr/spark/<spark-version>/sbin/{start,stop}-thriftserver.sh scripts, the port number is also 2304.
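Because the port depends on both the release and the way the server was started, it can help to check which port is actually accepting connections before pointing a client at it. The following is a minimal sketch using only the Python standard library. The function name find_listening_port is hypothetical, and a successful TCP connection shows only that something is listening on the port, not that it is the Spark Thrift server.

```python
import socket
from typing import Iterable, Optional

# Port numbers taken from the notes above.
WARDEN_PORT = 2304
SCRIPT_PORT = 10000


def find_listening_port(host: str,
                        ports: Iterable[int] = (WARDEN_PORT, SCRIPT_PORT),
                        timeout: float = 2.0) -> Optional[int]:
    """Return the first port in `ports` that accepts a TCP connection, else None."""
    for port in ports:
        try:
            # The connection is opened and closed immediately; no data is sent.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            # Refused, timed out, or unreachable: try the next candidate.
            continue
    return None
```

For example, find_listening_port("thrift-host.example.com") would try 2304 first and then 10000, where the host name is a placeholder for your own node.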

Default Behavior

The default behavior of the Spark Thrift server is as follows:

  1. After installation, the Spark Thrift server is started in local master mode.
  2. If the Spark master package is installed, the Spark Thrift server is started in standalone master mode.
  3. If the spark.master property is set in the spark-defaults.conf file, the Spark Thrift server uses the master set by this property.
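The three rules above can be read as a precedence order, which the following sketch makes explicit. It assumes that each later rule overrides the earlier ones (an interpretation of the list, not something the source states outright), and the function names and the "local" and "standalone" labels are hypothetical stand-ins for the real master values.

```python
from pathlib import Path
from typing import Optional


def read_spark_master(defaults_file: Path) -> Optional[str]:
    """Return the spark.master value from a spark-defaults.conf file, if set."""
    if not defaults_file.exists():
        return None
    for raw in defaults_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Entries are usually "key value"; "key=value" is tolerated here too.
        parts = line.replace("=", " ", 1).split(None, 1)
        if len(parts) == 2 and parts[0] == "spark.master":
            return parts[1].strip()
    return None


def resolve_master_mode(defaults_file: Path, master_package_installed: bool) -> str:
    """Apply the rules above, assuming rule 3 overrides rule 2 overrides rule 1."""
    configured = read_spark_master(defaults_file)
    if configured:
        return configured          # rule 3: explicit spark.master wins
    if master_package_installed:
        return "standalone"        # rule 2: Spark master package present
    return "local"                 # rule 1: default after installation
```

Under this reading, a spark-defaults.conf containing the line "spark.master yarn" yields "yarn" regardless of whether the Spark master package is installed.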

Known Limitations

  • MapR-SASL support is implemented for Spark 2.1.0 and later versions of Spark. For Spark version information, see Component Versions for Released EEPs.
  • The ODBC drivers do not support MapR-SASL.
  • Username and password authentication through PAM is not supported in EEP 3.0.
  • Spark Thrift server supports only features and commands in Hive 1.2.
  • Although Spark 2.1.0 can connect to Hive 2.1 Metastore, only Hive 1.2 features and commands are supported by Spark 2.1.0.