Spark 2.1.0-1707 Release Notes

This section provides reference information, including new features, patches, and known issues for Spark 2.1.0-1707.

The notes below relate specifically to the MapR Distribution for Apache Hadoop. You may also be interested in the open-source Spark 2.1.0 Release Notes.

Spark Version	2.1.0
Release Date	August 2017
MapR Version Interoperability	See EEP Components and OS Support.
Source on GitHub	https://github.com/mapr/spark
GitHub Release Tag	2.1.0-mapr-1707
Maven Artifacts	https://repository.mapr.com/maven/
Package Names	See Package Names for Ecosystem Packs (EEPs)

NOTE

Full support of MapR Streams is available only on MapR 5.2 and later clusters.
Spark 2.1 can connect to Hive Metastore 2.1. However, Spark does not support Hive features added after Hive 1.2.
Spark Standalone and Spark on YARN can only run on clusters in MRv2 (YARN) mode. They are not supported on clusters in MRv1 (classic) mode.

Hive Support

This version of Spark supports integration with Hive. However, note the following exceptions:

Hive-on-Spark is not supported.
Spark SQL is supported, but it is not fully compatible with Hive. For details, see the Apache Spark documentation and the MapR Spark documentation.

New in This Release

Spark 2.1.0-1707 introduces the following enhancement:

Spark on Mesos

Fixes

This MapR release includes the following new fixes since the latest MapR Spark 2.1.0 release. For details, refer to the commit log for this project in GitHub.

GitHub Commit	Date (YYYY/MM/DD)	Comment
40dca4e	2017/07/28	[MAPR-28441] - Fix Spark Streaming's handling of zero offsets from Kafka 0.9
99daf6b	2017/06/30	[SPARK-19182][DSTREAM] Optimize the lock in `StreamingJobProgressListener` to not block UI when generating streaming jobs.
8e3b9ed	2017/06/27	Revert earlier fix for MAPR-25770.
52b06b9	2017/06/23	[MAPR-27845] Fix the manner in which Spark determines Hive’s security configuration.
6917681	2017/06/14	[MAPR-27840] Fix wrong type casting when importing data from Oracle.
17f311e	2017/06/06	[SPARK-20393][WEB UI] Strengthen Spark to prevent XSS vulnerabilities.
e10e660	2017/05/30	Revert [SPARK-16736][CORE][SQL] to avoid superfluous filesystem calls.
d8f8657	2017/05/29	[SPARK-18949][SQL][BACKPORT-2.1] Add `recoverPartitions` API to `Catalog` interface.
7733a1c	2017/05/29	[SPARK-19459][SQL][BRANCH-2.1] Support nested char and varchar fields in ORC.
f31976e	2017/05/22	[MAPR-27519] Improve performance of calculating web UI counters for Kafka-streaming.
6a3b683	2017/05/22	[SPARK-19276][CORE] Expose `FetchFailure` exceptions hidden by user exceptions.
333371c	2017/05/22	[SPARK-19597][CORE] Add a test case for task deserialization errors.
377e2ea	2017/05/22	[SPARK-17931] Eliminate unnecessary task serialization.
0788b14	2017/05/22	[SPARK-18662] Move resource managers to their own sub-directories.
fb5fca1	2017/05/22	Fix Mesos build breakage for Scala 2.10.
e05a1e9	2017/05/22	[SPARK-17062][MESOS] Add conf option to Mesos dispatcher.
e48f72e	2017/05/22	[SPARK-18836][CORE] Improve performance by serializing a single copy of the task metrics in the `DAGScheduler`.
807ba4d	2017/05/22	[SPARK-18761][CORE] Introduce a "task reaper" to oversee killing of tasks in executors.
54413bd	2017/05/22	[SPARK-20359][SQL] Avoid unnecessary execution in `EliminateOuterJoin` that can lead to NPE.
c919bdb	2017/05/22	[SPARK-19893][SQL] Avoid running DataFrame set operations on map types.
bd90640	2017/05/22	[SPARK-18863][SQL] Return an error if a subquery's output contains non-aggregate expressions without GROUP BY.
b898e28	2017/05/22	[SPARK-20280][CORE] Fix `FileStatusCache Weigher` to avoid integer overflow.
7c3b1b2	2017/05/22	[SPARK-19748][SQL] Fix refresh of an `InMemoryFileIndex` with `FileStatusCache`. Correct the order of operations.
58f2250	2017/05/22	[SQL] Improve the readability of partition handling code.
b3430f7	2017/05/22	[SPARK-20059][YARN] Use the correct classloader for `HBaseCredentialProvider`.
285be99	2017/05/19	[SPARK-20043][ML] Fix the `DecisionTreeModel` so the `ImpurityCalculator` builder handles uppercase impurity type Gini.
9c60a4d	2017/05/19	[SPARK-20125][SQL] Fix conversion of an Option type to a Dataset when the Option contains a map type.
6663ca6	2017/05/19	[SPARK-18717][SQL] Fix code generation when mapping to an immutable Scala Map.
d602458	2017/05/19	[SPARK-20086][SQL] Fix `CollapseWindow` so it does not collapse dependent adjacent windows.
4755b36	2017/05/19	[SPARK-19925][SPARKR] Fix SparkR `spark.getSparkFiles` to avoid failures when called on executors.
dac1c4a	2017/05/19	[SPARK-19237][SPARKR][CORE] Fix `spark-submit` on Windows to handle the case where Java is not installed.
f48a43a	2017/05/19	[SPARK-20017][SQL] Fix the `str_to_map` and `explode` functions to avoid NPEs. Change the nullability of the `StringToMap` function from false to true.
6e5245d	2017/05/19	[SPARK-19980][SQL][BACKPORT-2.1] Fix Dataset transformations on POJOs to preserve nulls. Add NULL checks in the Bean serializer.
bbd0c4d	2017/05/19	[SPARK-19872][PYTHON] Fix `UnicodeDecodeError` in PySpark when reading from a text file with repartition. Use the correct deserializer for RDD construction for coalesce and repartition.
20579df	2017/05/19	[SPARK-19887][SQL] Fix handling of dynamic partition keys when persisting tables.
ff91608	2017/05/19	[SPARK-19611][SQL] Fix breakages for Hive tables backed by case sensitive data files. Introduce configurable table schema inference.
e9984d0	2017/05/19	[SPARK-19082][SQL] Fix config option `ignoreCorruptFiles` for Parquet files.
014e909	2017/05/19	[SPARK-19857][YARN] Correct calculation of next credential update time.
15fd019	2017/05/19	[SPARK-19765][SPARK-18549][SPARK-19093][SPARK-19736][BACKPORT-2.1][SQL] Backport cache related fixes from Spark 2.2 to Spark 2.1.
c689b5c	2017/05/19	[SPARK-18703][SPARK-18675][SQL][BACKPORT-2.1] Fix CTAS for Hive serde table so it works for all Hive versions. Drop staging directories and data files that were not dropped until JVM termination.
4cf5e41	2017/05/19	[SPARK-14772][PYTHON][ML] Fix Python ML `Params.copy` method to match Scala implementation.
f8d3846	2017/05/19	[SPARK-19691][SQL][BRANCH-2.1] Fix `ClassCastException` when calculating percentile of decimal column.
5aa4a2d	2017/05/19	[SPARK-19500][SQL] Fix failure in radix sort when attempting to spill the aggregated hash map.
20806a8	2017/05/19	[SPARK-19399][SPARKR][BACKPORT-2.1] Fix tests broken by the introduction of R `coalesce` API for DataFrame and Column.
a4bedf1	2017/05/19	[SPARK-19399][SPARKR] Add R `coalesce` API for DataFrame and Column.
fc9e7b0	2017/05/19	[SPARK-18788][SPARKR] Add `getNumPartitions` API to SparkR.
642e7bb	2017/05/19	[SPARK-18335][SPARKR] Extend `createDataFrame` to support a `numPartitions` parameter.
4ec94f5	2017/05/19	[SPARK-19342][SPARKR] Fix `collect` method for timestamp columns so it does not incorrectly convert to numeric.
d639208	2017/05/19	[SPARK-19543] Fix `from_json` when the input row is empty.
5948815	2017/05/19	[SPARK-19509][SQL] Fix Grouping Sets to handle nullable grouping columns.
f446906	2017/05/19	[SPARK-19472][SQL] Fix parser error when trying to resolve a nested CASE WHEN statement with parentheses. The statement was mistaken for a function call.
889860a	2017/05/19	[SPARK-19406][SQL] Fix function `to_json` to respect user-provided options.
2370cdf	2017/05/19	[SPARK-19396][DOC] Support case-insensitive JDBC options.
242b33c	2017/05/19	[SPARK-19324][SPARKR] Fix SparkR so it does not remove Spark JVM stdout output.
823d5e8	2017/05/19	[SPARK-19338][SQL] Include UDF names in explain output.
768a10b	2017/05/19	[SPARK-19231][SPARKR] Add error handling for download and untar of Spark releases.
d1f9ed5	2017/05/19	[SPARK-19129][SQL] Disallow ALTER TABLE drop partition with an empty partition value.
98d8d9c	2017/05/19	[SPARK-19180][SQL] Fix incorrect offset in `OffHeapColumn`.
79ff854	2017/05/19	[SPARK-19092][SQL][BACKPORT-2.1] Fix `save()` API in the `DataFrameWriter` to avoid a scan of all the saved files.
c44f274	2017/05/19	[SPARK-19130][SPARKR] Support setting columns to implicit literal values in SparkR.
148167b	2017/05/16	[MAPR-26414] Fix Spark History Server memory leak.
1a9b364	2017/05/15	Update dependencies after ECO-1703 release.
3554f31	2017/05/04	[SPARK-33] Fix streaming example.
3ae224b	2017/04/28	[SPARK-19019][PYTHON][BRANCH-2.0] Fix hijacked `collections.namedtuple`. Port cloudpickle changes needed for PySpark to work with Python 3.6.0.
4584170	2017/04/28	[SPARK-19146][CORE] Drop more elements when `stageData.taskData.size` > retainedTasks.
a259c8e	2017/04/28	[MAPR-26287] Remove unnecessary code from `hadoop-version-picker.sh`.
c0c94e5	2017/04/28	[MAPR-26414] Fix Spark History Server memory leak.
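
As an illustration of the kind of problem addressed by the integer-overflow fix listed above for SPARK-20280 (commit b898e28), the following sketch shows how a cache weigher that must return a 32-bit integer can yield a negative weight for a large entry, and how scaling and clamping the value avoids this. It is a standalone Python sketch, not the Spark source: the function names, the emulated 32-bit cast, and the scale factor of 32 are assumptions chosen for illustration.

```python
INT_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def to_int32(n: int) -> int:
    """Emulate a JVM-style narrowing cast that keeps only the low 32 bits."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


def naive_weight(size_bytes: int) -> int:
    """Overflow-prone weigher: narrows the byte count directly to 32 bits."""
    return to_int32(size_bytes)


def safe_weight(size_bytes: int, scale: int = 32) -> int:
    """Scale the size down first, then clamp it to the 32-bit maximum.

    The scale factor is a hypothetical value used only for this sketch.
    """
    return min(size_bytes // scale, INT_MAX)


if __name__ == "__main__":
    three_gib = 3 * 1024**3
    print(naive_weight(three_gib))  # negative: the cast wrapped around
    print(safe_weight(three_gib))   # small positive weight
```

The trade-off in this approach is precision: weights are tracked in units of the scale factor rather than in bytes, and entries whose scaled size still exceeds the 32-bit maximum are all reported at that maximum.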

Known Issues

MAPR-17271: On secure clusters, the MapR Control System (MCS) does not display links for Spark-Master and Spark-HistoryServer.
Spark versions up to and including 2.3.0 are affected by the security vulnerability CVE-2018-1334 (Apache Spark local privilege escalation).

Resolved Issues

None.