Enabling High Availability for Spark Thrift Server
With MEPs 5.0.4 or 6.3.0 and later, you can enable high availability for the Spark Thrift Server. Unlike a HiveServer2 high-availability configuration, Spark has no concept of active-active instances. However, after configuration, you can use Beeline to connect to the Spark Thrift Server on each node.
To enable high availability, use the following steps:
- Install Spark Thrift Server on all the cluster nodes where it is needed:
  - On Ubuntu:
      apt-get install mapr-spark-thriftserver
  - On Red Hat / CentOS:
      yum install mapr-spark-thriftserver
  - On SLES:
      zypper install mapr-spark-thriftserver
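When the same install step has to run on nodes with mixed distributions, the choice of package manager can be scripted. The following is a minimal sketch, not part of the product: `install_cmd_for` is a hypothetical helper, and the mapping from the `ID` field of `/etc/os-release` to the three commands above is an assumption about how the nodes identify themselves. It only prints the command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: map a distribution ID (the ID= value from
# /etc/os-release) to the matching install command from the list above.
install_cmd_for() {
    case "$1" in
        ubuntu)      echo "apt-get install mapr-spark-thriftserver" ;;
        rhel|centos) echo "yum install mapr-spark-thriftserver" ;;
        sles)        echo "zypper install mapr-spark-thriftserver" ;;
        *)
            # Any other ID is reported on stderr and signalled as failure.
            echo "unsupported distribution: $1" >&2
            return 1
            ;;
    esac
}
```

On a node, `. /etc/os-release && install_cmd_for "$ID"` would print the command to review before running it with the appropriate privileges.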
- Add the following properties to the /opt/mapr/spark/spark-<spark_version>/conf/hive-site.xml file on all the nodes where the Spark Thrift Server is installed:
    <property>
      <name>hive.zookeeper.quorum</name>
      <value><zk_host_1>,<zk_host_2>,…,<zk_host_n></value>
    </property>
    <property>
      <name>hive.zookeeper.client.port</name>
      <value><zk_port></value>
    </property>
    <property>
      <name>hive.server2.support.dynamic.service.discovery</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.server2.zookeeper.namespace</name>
      <value><zk_namespace></value>
    </property>
  For example:
    <property>
      <name>hive.zookeeper.quorum</name>
      <value>node1.cluster.com,node2.cluster.com,node3.cluster.com</value>
    </property>
    <property>
      <name>hive.zookeeper.client.port</name>
      <value>5181</value>
    </property>
    <property>
      <name>hive.server2.support.dynamic.service.discovery</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.server2.zookeeper.namespace</name>
      <value>ts2-ts2</value>
    </property>
  NOTE The values that you provide for the hive.server2.zookeeper.namespace property should be different for the hive-site.xml in the Spark and Hive directories.
- Restart the Spark Thrift Server to apply the changes, either by using the scripts in the sbin directory at /opt/mapr/spark/spark-<spark_version>/ or by running a maprcli command on all configured nodes:
    ./sbin/stop-thriftserver.sh
    ./sbin/start-thriftserver.sh
  or
    maprcli node services -nodes <host_1>,<host_2>,<host_n> -name spark-thriftserver -action restart
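For clusters with many Thrift Server nodes, the `-nodes` argument can be assembled from a host list rather than typed by hand. This is a minimal sketch under stated assumptions: `restart_cmd` is a hypothetical helper, and it follows the comma-separated host format shown in the command above. It prints the command instead of executing it, so the result can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: join the given hostnames with commas and print the
# maprcli restart command in the form shown above. Nothing is executed.
restart_cmd() {
    if [ "$#" -eq 0 ]; then
        echo "usage: restart_cmd <host> [<host> ...]" >&2
        return 1
    fi
    nodes=""
    for host in "$@"; do
        # Append a comma only when the list already has an entry.
        nodes="${nodes:+$nodes,}$host"
    done
    echo "maprcli node services -nodes $nodes -name spark-thriftserver -action restart"
}
```

For example, `restart_cmd node1.cluster.com node2.cluster.com` prints a single command that covers both nodes.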
- Launch the ZooKeeper command line interface, and check the Spark Thrift Server znode by running the following commands:
    /opt/mapr/zookeeper/zookeeper-<version>/bin/zkCli.sh -server <ip:port of zookeeper instance>
    ls /<hive.server2.zookeeper.namespace>
  For example:
    /opt/mapr/zookeeper/zookeeper-3.4.11/bin/zkCli.sh -server node1.cluster.com:5181
    ls /ts2-ts2
    [serverUri=node1.cluster.com:2304;version=;sequence=0000000000]
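To see at a glance which Thrift Server instances have registered, the `serverUri` values can be pulled out of the `ls` output. The sketch below is illustrative only: `server_uris_from_listing` is a hypothetical helper, and it assumes that multiple registered instances appear as comma-separated entries inside the same brackets, which the single-entry example above does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: extract host:port values from a znode listing such as
#   [serverUri=node1.cluster.com:2304;version=;sequence=0000000000]
# Assumption: several entries, if present, are separated by commas.
server_uris_from_listing() {
    printf '%s\n' "$1" |
        sed -e 's/^\[//' -e 's/]$//' |   # drop the surrounding brackets
        tr ',;' '\n\n' |                 # one key=value field per line
        sed -n 's/^ *serverUri=//p'      # keep only the serverUri values
}
```

Each line of output is one registered instance, so an empty result after a restart suggests that the server did not register under the expected namespace.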
- Use Beeline to connect to the Spark Thrift Server with the following connection string:
    beeline> !connect jdbc:hive2://<hostname -f>:5181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=<hive.server2.zookeeper.namespace>;
  For example:
    ./bin/beeline
    Warning: Unable to determine $DRILL_HOME
    Beeline version 1.2.0-mapr-spark-MEP-6.0.0-1912 by Apache Hive
    beeline> !connect jdbc:hive2://node1.cluster.com:5181/default;ssl=true;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=ts2-ts2;auth=maprsasl;
    Connecting to jdbc:hive2://node1.cluster.com:5181/default;ssl=true;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=ts2-ts2;auth=maprsasl;
    20/03/29 21:38:19 WARN MaprSaslClient: SASL Server qopProperty: auth-confis different from Client: auth-conf,auth-int,auth.Using Server one
    Connected to: Spark SQL (version 2.4.4.0-mapr-630)
    Driver: Hive JDBC (version 1.2.0-mapr-spark-MEP-6.0.0-1912)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    1: jdbc:hive2://node1.cluster.com:5181/defaul> show databases;
    +-----------------+
    | databaseName    |
    +-----------------+
    | default         |
    +-----------------+
    1 row selected (0.11 seconds)
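Because the connection string is long and its semicolons are easy to mistype, it can be built from its parts. The following is a rough sketch, assuming the template shown above: `beeline_url` is a hypothetical helper, and the optional fourth argument for extra session parameters (such as the `ssl=true;` and `auth=maprsasl;` settings in the example) is appended at the end, on the assumption that parameter order does not matter.

```shell
#!/bin/sh
# Hypothetical helper: build the service-discovery JDBC URL from the
# ZooKeeper host, ZooKeeper port, and hive.server2.zookeeper.namespace.
# An optional fourth argument carries extra ';'-terminated parameters.
beeline_url() {
    host=$1
    port=$2
    namespace=$3
    extra=${4:-}
    printf 'jdbc:hive2://%s:%s/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=%s;%s\n' \
        "$host" "$port" "$namespace" "$extra"
}
```

The result can be passed to Beeline in quotes so that the shell does not split it at the semicolons, for example `./bin/beeline -u "$(beeline_url node1.cluster.com 5181 ts2-ts2 'ssl=true;auth=maprsasl;')"`.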
NOTE High availability for the Spark Thrift Server can be used in conjunction with
HiveServer2 high availability. For more information about HiveServer2 high availability, see
Enabling High Availability for Hive.