Apache Spark Feature Support

MapR Data Platform supports most Apache Spark features. However, there are some exceptions.

Spark SQL and Apache Derby Support on Spark
If you use Spark SQL with the Derby database without a Hive or Hive Metastore installation, the following exception occurs:
java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

To use Spark SQL with the Derby database without a Hive or Hive Metastore installation, add the hive-service-2.3.*.jar and the log4j2 jars to the /opt/mapr/spark/spark-3.x.x/jars directory.

The log4j2 jars are located at /opt/mapr/lib/log4j2/log4j-*.jar.
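
The copy step described above can be sketched in Python. This is a minimal illustration, not a supported tool: the helper name copy_jars is hypothetical, the Spark directory keeps the spark-3.x.x placeholder from this page (substitute your installed version), and the location of the hive-service jar is an assumption because this page does not state it.

```python
import glob
import shutil
from pathlib import Path


def copy_jars(patterns, spark_jars_dir):
    """Copy every file matching the glob patterns into spark_jars_dir.

    Returns the sorted list of copied file names. copy_jars is a
    hypothetical helper written for this example.
    """
    dest = Path(spark_jars_dir)
    if not dest.is_dir():
        raise FileNotFoundError(f"Spark jars directory not found: {dest}")

    copied = []
    for pattern in patterns:
        for src in glob.glob(pattern):
            # copy2 preserves file metadata such as timestamps
            shutil.copy2(src, dest / Path(src).name)
            copied.append(Path(src).name)
    return sorted(copied)


# Example call (not executed here). The hive-service path is an assumed
# placeholder; replace spark-3.x.x with your actual Spark version directory.
#
# copy_jars(
#     ["/opt/mapr/lib/log4j2/log4j-*.jar",
#      "/path/to/hive-service-2.3.*.jar"],
#     "/opt/mapr/spark/spark-3.x.x/jars",
# )
```

The helper raises an error instead of creating the target directory, so a mistyped Spark version does not silently produce a new, unused jars folder.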

Spark 3.1.2 and Spark 3.2.0 do not support log4j1.2 logging on HPE Ezmeral Data Fabric.

Symlink Support on Spark 2.4.4
For full Symlink support on Spark 2.4.4, request the patch. See Applying a Patch.

Spark Thrift JDBC/ODBC Server Support
Running the Spark Thrift JDBC/ODBC Server on a secure cluster is supported only on Spark 2.1.0 or later.
You can run the Spark Thrift JDBC/ODBC Server to enable connections to Hive 1.2.1 using Beeline; however, you can connect only to Hive versions supported by your Spark version.
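
As a rough illustration of a Beeline connection to the Thrift server, the following Python sketch builds the command line. The helper name beeline_command is hypothetical. The host, port 10000 (the Apache default, which your cluster may override), and database are assumptions. A secure cluster typically needs additional authentication parameters in the JDBC URL, which are not shown here.

```python
import shlex


def beeline_command(host="localhost", port=10000, database="default", user=None):
    """Build a Beeline argument list for a jdbc:hive2 URL.

    Host, port, and database defaults are assumptions for illustration.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    cmd = ["beeline", "-u", url]
    if user:
        # -n passes the user name to connect as
        cmd += ["-n", user]
    return cmd


# Print the command as it would be typed in a shell.
print(shlex.join(beeline_command()))
# prints: beeline -u jdbc:hive2://localhost:10000/default
```

Returning a list rather than a single string lets the result be passed directly to subprocess.run without shell quoting problems.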

Spark SQL and Hive Support for Spark 2.1.0
Spark 2.1.0 can connect to a Hive 2.1 Metastore; however, only Hive 1.2 features are supported.

Spark SQL and Hive Support for Spark 2.0.1
Spark SQL is supported, but it is not fully compatible with Hive. For details, see the Apache Spark documentation.

The following Hive features are not supported in Spark SQL:

  • Tables with buckets
  • UNION type
  • Unique join
  • Column statistics collecting
  • Output formats: File format (for CLI), Hadoop Archive
  • Block-level bitmap indexes and virtual columns
  • Automatic determination of the number of reducers for JOIN and GROUP BY
  • Metadata-only query
  • Skew data flag
  • STREAMTABLE hint in JOIN
  • Merging of multiple small files for query results

Spark SQL and Hive Support for Spark 1.6.1
Spark SQL is supported, but it is not fully compatible with Hive. For details, see the Apache Spark documentation.

The following table shows which Hive 1.2 table formats each Spark SQL operation supports:
Spark SQL Operations   Hive 1.2 Table Format
                       AVRO   ORC   Parquet   RC    default
create                 Yes    Yes   Yes       Yes   Yes
drop                   Yes    Yes   Yes       Yes   Yes
insert into            Yes    Yes   Yes       Yes   Yes
insert overwrite       Yes    Yes   Yes       Yes   Yes
select                 Yes    Yes   Yes       Yes   Yes
load data              Yes    Yes   Yes       Yes   Yes
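
To make the operations in the table above concrete, the following Python sketch generates one HiveQL statement per operation for a chosen table format. It only builds strings and does not run Spark. The table names demo_tbl and demo_src, the column layout, the data path, and the helper statements_for are all hypothetical, chosen for illustration.

```python
# Hive STORED AS keyword for each table format listed above.
# "default" adds no STORED AS clause, so Hive applies its configured default.
STORED_AS = {
    "AVRO": "AVRO",
    "ORC": "ORC",
    "Parquet": "PARQUET",
    "RC": "RCFILE",
    "default": None,
}


def statements_for(fmt, table="demo_tbl", staging="demo_src", path="/tmp/demo_data"):
    """Return a dict mapping each operation name to a sample HiveQL statement.

    Table names, columns, and path are illustrative assumptions.
    """
    clause = STORED_AS[fmt]
    stored_as = f" STORED AS {clause}" if clause else ""
    return {
        "create": f"CREATE TABLE {table} (id INT, name STRING){stored_as}",
        "insert into": f"INSERT INTO TABLE {table} SELECT id, name FROM {staging}",
        "insert overwrite": f"INSERT OVERWRITE TABLE {table} SELECT id, name FROM {staging}",
        "select": f"SELECT id, name FROM {table}",
        "load data": f"LOAD DATA INPATH '{path}' INTO TABLE {table}",
        "drop": f"DROP TABLE {table}",
    }


for operation, sql in statements_for("Parquet").items():
    print(f"{operation}: {sql}")
```

In a Spark 1.6.1 application, each generated string could be passed to the sql method of a HiveContext. Note that LOAD DATA moves files without converting them, so the files at the given path are assumed to already be in the table's format.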