Spark 1.6.1-1607 Release Notes

The notes below relate specifically to the MapR Converged Data Platform. You may also be interested in the open-source Spark 1.6.1 Release Notes.

Spark Version: 1.6.1
Release Date: July 29, 2016
MapR Version Interoperability: See the Ecosystem Support Matrix (Pre-5.2 releases) and Spark Support Matrix.
Source on GitHub: https://github.com/mapr/spark/tree/1.6.1-mapr-1607
Package Names: The following packages are associated with this release:
  • mapr-spark-1.6.1.201607242143-1.noarch.rpm
  • mapr-spark_1.6.1.201607242143_all.deb
  • mapr-spark-historyserver-1.6.1.201607242143-1.noarch.rpm
  • mapr-spark-historyserver_1.6.1.201607242143_all.deb
  • mapr-spark-master-1.6.1.201607242143-1.noarch.rpm
  • mapr-spark-master_1.6.1.201607242143_all.deb
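
The package names above follow the usual RPM (name-version-release.arch.rpm) and Debian (name_version_arch.deb) file name layouts. The following is a minimal sketch of splitting such a name into its parts, for example in an installation script. The helper parse_package is hypothetical, and reading the final dot-separated field of the version as a build stamp is an assumption, not something these notes state.

```python
import re

# RPM file names: <name>-<version>-<release>.<arch>.rpm
RPM_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)\.rpm$"
)
# Debian file names: <name>_<version>_<arch>.deb
DEB_RE = re.compile(r"^(?P<name>[^_]+)_(?P<version>[^_]+)_(?P<arch>[^_]+)\.deb$")


def parse_package(filename):
    """Split a package file name into its parts (hypothetical helper)."""
    match = RPM_RE.match(filename) or DEB_RE.match(filename)
    if match is None:
        raise ValueError("unrecognized package name: " + filename)
    parts = match.groupdict()
    # Assumption: the last dot-separated field of the version is a build stamp.
    spark_version, _, build = parts["version"].rpartition(".")
    parts["spark_version"] = spark_version
    parts["build"] = build
    return parts


if __name__ == "__main__":
    print(parse_package("mapr-spark-1.6.1.201607242143-1.noarch.rpm"))
    print(parse_package("mapr-spark-master_1.6.1.201607242143_all.deb"))
```

Debian names carry no separate release field, so the result for a .deb file has no "release" key.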

New in This Release

This release of Apache Spark includes the following behavior change that is specific to MapR:

Poll Time for Consuming HPE Ezmeral Data Fabric Streams
When Spark consumes HPE Ezmeral Data Fabric Streams messages, the default poll time is 1000 milliseconds. Previously, the default was 100 milliseconds.
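
An application that depended on the previous 100-millisecond default may need to set the poll time explicitly after upgrading. The sketch below only illustrates the default-versus-override logic in plain Python. The property name in POLL_KEY is a placeholder invented for this example, not a documented Spark or MapR setting; consult the Spark Streaming documentation for your release for the actual property.

```python
# Defaults stated in these release notes.
DEFAULT_POLL_MS = 1000          # default as of Spark 1.6.1-1607
PREVIOUS_DEFAULT_POLL_MS = 100  # default before this release

# Placeholder property name, used for illustration only (an assumption).
POLL_KEY = "example.streams.poll.ms"


def effective_poll_ms(conf):
    """Return the poll time in milliseconds for a dict of settings."""
    raw = conf.get(POLL_KEY)
    if raw is None:
        # Nothing set explicitly, so the new release default applies.
        return DEFAULT_POLL_MS
    value = int(raw)
    if value <= 0:
        raise ValueError("poll time must be a positive number of milliseconds")
    return value


if __name__ == "__main__":
    print(effective_poll_ms({}))
    # Restore the pre-1607 behavior by setting the value explicitly.
    print(effective_poll_ms({POLL_KEY: str(PREVIOUS_DEFAULT_POLL_MS)}))
```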

Important Notes

If you want to integrate Spark 1.6.1-1607 with HPE Ezmeral Data Fabric Streams, you must install the Kafka 0.9.0-1607 package.

Fixes

This release by MapR includes the following fixes on the base Apache release. For complete details, refer to the commit log for this project in GitHub.

GitHub Commit | Date (YYYY-MM-DD) | Comment
c0bb193 | 2016-06-08 | MAPR-23559: Spark now stores PID files in the following directory: /opt/mapr/pid
42d163f | 2016-06-08 | MAPR-22541: Spark now adds the working directory to the CLASSPATH.
4d048420 | 2016-06-24 | MAPR-23612: Spark no longer hangs due to an incorrect offset configuration for HPE Ezmeral Data Fabric Streams.
941e206 | 2016-06-30 | MAPR-23122: Spark Streaming uses streams.consumer.zerooffset.on.eof to calculate the offset for HPE Ezmeral Data Fabric Streams.
25621e4 | 2016-07-04 | MAPR-22940: Spark Thrift Server is now able to start on a node where Hive is not running. However, when HiveServer2 uses Kerberos authentication, the Spark Thrift Server must run on the same node as HiveServer2. Otherwise, beeline will not be able to connect to the Spark Thrift Server.
2a3abdb | 2016-07-13 | MAPR-23854: Spark is now able to retrieve messages from HPE Ezmeral Data Fabric Streams.
6d8d5d6 | 2016-07-19 | MAPR-24011: Backported SPARK-14699 and SPARK-13352 to improve Spark performance.
4df099e | 2016-07-19 | MAPR-24005: Backported SPARK-14699 so that Spark standalone Pi jobs no longer generate "Executor lost" errors.
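
With MAPR-23559, monitoring scripts can look for Spark PID files under /opt/mapr/pid. The following is a minimal sketch of reading that directory and checking whether each recorded process is still alive. It assumes that PID files end in .pid and contain a single integer; these notes do not specify the file names or contents, and both helper functions are hypothetical.

```python
import os

PID_DIR = "/opt/mapr/pid"  # directory named in MAPR-23559


def read_pid_files(pid_dir=PID_DIR):
    """Map each *.pid file name in pid_dir to the integer PID it contains."""
    pids = {}
    for entry in sorted(os.listdir(pid_dir)):
        if not entry.endswith(".pid"):  # assumption: PID files use this suffix
            continue
        with open(os.path.join(pid_dir, entry)) as handle:
            text = handle.read().strip()
        if text.isdigit():  # skip empty or malformed files
            pids[entry] = int(text)
    return pids


def is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    for name, pid in read_pid_files().items():
        print(name, pid, "running" if is_running(pid) else "not running")
```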

Known Issues

  • MAPR-17271: On secure clusters, the MapR Control System (MCS) does not display links for Spark-Master and Spark-HistoryServer.
  • MAPR-19761: On a secure cluster, MapR does not support the Spark SQL Thrift JDBC server. When the cluster is secure, the Spark Thrift server will not start.
  • Spark versions up to and including 2.3.0 have the following security vulnerability: CVE-2018-1334 (Apache Spark local privilege escalation vulnerability).
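
To audit a set of clusters against the last known issue, a script can compare each installed Spark version with 2.3.0. The sketch below is illustrative only: it encodes the "up to and including 2.3.0" range exactly as stated above, reads only the leading major.minor.patch numbers of a version string, and uses hypothetical helper names. The CVE advisory itself remains the authoritative list of affected versions.

```python
import re

LAST_AFFECTED = (2, 3, 0)  # "up to and including 2.3.0", per the note above


def parse_version(version):
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version: " + version)
    return tuple(int(part) for part in match.groups())


def is_affected(version):
    """Return True if the version falls in the range named in the known issue."""
    return parse_version(version) <= LAST_AFFECTED


if __name__ == "__main__":
    for candidate in ("1.6.1-mapr-1607", "2.3.0", "2.3.1"):
        print(candidate, is_affected(candidate))
```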