Spark 1.6.1-1607 Release Notes

The notes below relate specifically to the MapR Converged Data Platform. You may also be interested in the open-source Spark 1.6.1 Release Notes.

Spark Version	1.6.1
Release Date	July 29, 2016
MapR Version Interoperability	See the Ecosystem Support Matrix (Pre-5.2 releases) and Spark Support Matrix.
Source on GitHub	https://github.com/mapr/spark/tree/1.6.1-mapr-1607
Package Names	The following packages are associated with this release: mapr-spark-1.6.1.201607242143-1.noarch.rpm, mapr-spark_1.6.1.201607242143_all.deb, mapr-spark-historyserver-1.6.1.201607242143-1.noarch.rpm, mapr-spark-historyserver_1.6.1.201607242143_all.deb, mapr-spark-master-1.6.1.201607242143-1.noarch.rpm, mapr-spark-master_1.6.1.201607242143_all.deb

New in This Release

This release of Apache Spark includes the following behavior change that is specific to MapR:

Poll Time for Consuming HPE Ezmeral Data Fabric Streams: When Spark consumes HPE Ezmeral Data Fabric Streams messages, the default poll time is 1000 milliseconds. Previously, the default was 100 milliseconds.
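
If the new default does not suit a workload, the poll time can presumably be set per application. The following is a minimal sketch that builds a `spark-submit` command line passing a poll time through the standard `--conf` option. The property name `spark.kafka.poll.time`, the helper function, and the application file name are assumptions for illustration; these notes do not name the property, so confirm it against the MapR Spark documentation before relying on it.

```python
# Sketch only: the property name below is an ASSUMPTION, not taken from these notes.
ASSUMED_POLL_TIME_PROPERTY = "spark.kafka.poll.time"

DEFAULT_POLL_TIME_MS = 1000   # default as of this release
PREVIOUS_POLL_TIME_MS = 100   # default in earlier releases


def build_submit_command(app, poll_time_ms=DEFAULT_POLL_TIME_MS):
    """Return a spark-submit argument list that sets the poll time explicitly."""
    if poll_time_ms <= 0:
        raise ValueError("poll time must be a positive number of milliseconds")
    return [
        "spark-submit",
        "--conf", "{}={}".format(ASSUMED_POLL_TIME_PROPERTY, poll_time_ms),
        app,
    ]


if __name__ == "__main__":
    # Request the earlier 100 ms poll time for a hypothetical application file.
    print(" ".join(build_submit_command("my_streaming_app.py", PREVIOUS_POLL_TIME_MS)))
```

Building the argument list in one place keeps the chosen value visible in deployment scripts, which helps when comparing behavior before and after an upgrade.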

Important Notes

If you want to integrate Spark 1.6.1-1607 with HPE Ezmeral Data Fabric Streams, you must install the Kafka 0.9.0-1607 package.

Fixes

This release by MapR includes the following fixes on top of the base Apache release. For complete details, refer to the commit log for this project on GitHub.

GitHub Commit	Date (YYYY-MM-DD)	Comment
c0bb193	2016-06-08	MAPR-23559: Spark now stores PID files in the following directory: `/opt/mapr/pid`
42d163f	2016-06-08	MAPR-22541: Spark now adds the working directory to the CLASSPATH.
4d048420	2016-06-24	MAPR-23612: Spark no longer hangs due to an incorrect offset configuration for HPE Ezmeral Data Fabric Streams.
941e206	2016-06-30	MAPR-23122: Spark Streaming uses `streams.consumer.zerooffset.on.eof` to calculate the offset for HPE Ezmeral Data Fabric Streams.
25621e4	2016-07-04	MAPR-22940: Spark Thrift Server is now able to start on a node where Hive is not running. However, when HiveServer2 uses Kerberos authentication, the Spark Thrift Server must run on the same node as HiveServer2. Otherwise, beeline will not be able to connect to the Spark Thrift Server.
2a3abdb	2016-07-13	MAPR-23854: Spark is now able to retrieve messages from HPE Ezmeral Data Fabric Streams.
6d8d5d6	2016-07-19	MAPR-24011: Backported SPARK-14699 and SPARK-13352 to improve Spark performance.
4df099e	2016-07-19	MAPR-24005: Backported SPARK-14699 so that Spark standalone Pi jobs no longer generate "Executor lost" errors.
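
As an illustration of the PID directory change in MAPR-23559, the sketch below lists the PID files in `/opt/mapr/pid` and reports whether each recorded process is still alive. It assumes that the files end in `.pid` and contain a single integer; these notes do not state the file naming scheme, so treat both as assumptions. The helper names are hypothetical, and the liveness check is POSIX-only.

```python
import os
from pathlib import Path

PID_DIR = Path("/opt/mapr/pid")  # directory named in MAPR-23559


def read_pid_files(pid_dir=PID_DIR):
    """Map each *.pid file name in pid_dir to the integer PID it contains.

    ASSUMPTION: files end in ".pid" and hold one integer.
    """
    pids = {}
    for path in sorted(Path(pid_dir).glob("*.pid")):
        text = path.read_text().strip()
        if text.isdigit():  # skip empty or malformed files
            pids[path.name] = int(text)
    return pids


def is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 tests for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    for name, pid in read_pid_files().items():
        print(name, pid, "running" if is_running(pid) else "stale")
```

A stale entry (a PID file whose process no longer exists) usually means the daemon exited without cleaning up.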

Known Issues

MAPR-17271: On secure clusters, the MapR Control System (MCS) does not display links for Spark-Master and Spark-HistoryServer.
MAPR-19761: On a secure cluster, MapR does not support the Spark SQL Thrift JDBC server. When the cluster is secure, the Spark Thrift server will not start.
Spark versions up to and including 2.3.0 have the following security vulnerability: CVE-2018-1334 (Apache Spark local privilege escalation vulnerability).