Spark 2.1.0-1707 Release Notes
This section provides reference information, including new features, patches, and known issues for Spark 2.1.0-1707.
The notes below relate specifically to the MapR Distribution for Apache Hadoop. You may also be interested in the open-source Spark 2.1.0 Release Notes.
Spark Version | 2.1.0 |
Release Date | August 2017 |
MapR Version Interoperability | See EEP Components and OS Support. |
Source on GitHub | https://github.com/mapr/spark |
GitHub Release Tag | 2.1.0-mapr-1707 |
Maven Artifacts | https://repository.mapr.com/maven/ |
Package Names | See Package Names for Ecosystem Packs (EEPs) |
NOTE
- Full support of MapR Streams is available only on MapR 5.2 and later clusters.
- Spark 2.1 can connect to Hive Metastore 2.1. However, Hive features added after Hive 1.2 are not supported by Spark.
- Spark Standalone and Spark on YARN can only run on clusters in MRv2 (YARN) mode. They are not supported on clusters in MRv1 (classic) mode.
Hive Support
This version of Spark supports integration with Hive. However, note the following exceptions:
- Hive-on-Spark is not supported.
- Spark-SQL is supported, but it is not fully compatible with Hive. For details, see the Apache Spark documentation and the MapR Spark documentation.
New in This Release
Spark 2.1.0-1707 introduces the following enhancement:
- Spark on Mesos
Fixes
This MapR release includes the following fixes made since the previous MapR Spark 2.1.0 release. For details, refer to the commit log for this project on GitHub.
GitHub Commit | Date (YYYY/MM/DD) | Comment |
---|---|---|
40dca4e | 2017/07/28 | [MAPR-28441] - Fix Spark Streaming's handling of zero offsets from Kafka 0.9 |
99daf6b | 2017/06/30 | [SPARK-19182][DSTREAM] Optimize the lock in StreamingJobProgressListener to not block UI when generating streaming jobs. |
8e3b9ed | 2017/06/27 | Revert earlier fix for MAPR-25770. |
52b06b9 | 2017/06/23 | [MAPR-27845] Fix the manner in which Spark determines Hive’s security configuration. |
6917681 | 2017/06/14 | [MAPR-27840] Fix wrong type casting when importing data from Oracle. |
17f311e | 2017/06/06 | [SPARK-20393][WEB UI] Strengthen Spark to prevent XSS vulnerabilities. |
e10e660 | 2017/05/30 | Revert [SPARK-16736][CORE][SQL] to avoid superfluous filesystem calls. |
d8f8657 | 2017/05/29 | [SPARK-18949][SQL][BACKPORT-2.1] Add recoverPartitions API to Catalog interface. |
7733a1c | 2017/05/29 | [SPARK-19459][SQL][BRANCH-2.1] Support nested char and varchar fields in ORC. |
f31976e | 2017/05/22 | [MAPR-27519] Improve performance of calculating web UI counters for Kafka-streaming. |
6a3b683 | 2017/05/22 | [SPARK-19276][CORE] Expose FetchFailure exceptions hidden by user exceptions. |
333371c | 2017/05/22 | [SPARK-19597][CORE] Add a test case for task deserialization errors. |
377e2ea | 2017/05/22 | [SPARK-17931] Eliminate unnecessary task serialization. |
0788b14 | 2017/05/22 | [SPARK-18662] Move resource managers to their own sub-directories. |
fb5fca1 | 2017/05/22 | Fix Mesos build breakage for Scala 2.10. |
e05a1e9 | 2017/05/22 | [SPARK-17062][MESOS] Add conf option to Mesos dispatcher. |
e48f72e | 2017/05/22 | [SPARK-18836][CORE] Improve performance by serializing a single copy of the task metrics in the DAGScheduler. |
807ba4d | 2017/05/22 | [SPARK-18761][CORE] Introduce a "task reaper" to oversee killing of tasks in executors. |
54413bd | 2017/05/22 | [SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin that can lead to NPE. |
c919bdb | 2017/05/22 | [SPARK-19893][SQL] Avoid running DataFrame set operations on map types. |
bd90640 | 2017/05/22 | [SPARK-18863][SQL] Return an error if a subquery's output contains non-aggregate expressions without GROUP BY. |
b898e28 | 2017/05/22 | [SPARK-20280][CORE] Fix FileStatusCache Weigher to avoid integer overflow. |
7c3b1b2 | 2017/05/22 | [SPARK-19748][SQL] Fix refresh of an InMemoryFileIndex with FileStatusCache. Correct the order of operations. |
58f2250 | 2017/05/22 | [SQL] Improve the readability of partition handling code. |
b3430f7 | 2017/05/22 | [SPARK-20059][YARN] Use the correct classloader for HBaseCredentialProvider. |
285be99 | 2017/05/19 | [SPARK-20043][ML] Fix the DecisionTreeModel so the ImpurityCalculator builder handles uppercase impurity type Gini. |
9c60a4d | 2017/05/19 | [SPARK-20125][SQL] Fix conversion of an Option type to a DataSet, when the Option contains a map type. |
6663ca6 | 2017/05/19 | [SPARK-18717][SQL] Fix code generation when mapping to an immutable Scala Map. |
d602458 | 2017/05/19 | [SPARK-20086][SQL] Fix CollapseWindow so it does not collapse dependent adjacent windows. |
4755b36 | 2017/05/19 | [SPARK-19925][SPARKR] Fix SparkR spark.getSparkFiles to avoid failures when called on executors. |
dac1c4a | 2017/05/19 | [SPARK-19237][SPARKR][CORE] Fix spark-submit on Windows to handle the case where Java is not installed. |
f48a43a | 2017/05/19 | [SPARK-20017][SQL] Fix the str_to_map and explode functions to avoid NPEs. Change the nullability of the StringToMap function from false to true. |
6e5245d | 2017/05/19 | [SPARK-19980][SQL][BACKPORT-2.1] Fix DataSet transformations on POJOs to preserve nulls. Add NULL checks in the Bean serializer. |
bbd0c4d | 2017/05/19 | [SPARK-19872][PYTHON] Fix UnicodeDecodeError in PySpark when reading from a text file with repartition. Use the correct deserializer for RDD construction for coalesce and repartition. |
20579df | 2017/05/19 | [SPARK-19887][SQL] Fix handling of dynamic partition keys when persisting tables. |
ff91608 | 2017/05/19 | [SPARK-19611][SQL] Fix breakages for Hive tables backed by case sensitive data files. Introduce configurable table schema inference. |
e9984d0 | 2017/05/19 | [SPARK-19082][SQL] Fix config option ignoreCorruptFiles for Parquet files. |
014e909 | 2017/05/19 | [SPARK-19857][YARN] Correct calculation of next credential update time. |
15fd019 | 2017/05/19 | [SPARK-19765][SPARK-18549][SPARK-19093][SPARK-19736][BACKPORT-2.1][SQL] Backport cache related fixes from Spark 2.2 to Spark 2.1. |
c689b5c | 2017/05/19 | [SPARK-18703][SPARK-18675][SQL][BACKPORT-2.1] Fix CTAS for Hive serde table so it works for all Hive versions. Drop staging directories and data files that were not dropped until JVM termination. |
4cf5e41 | 2017/05/19 | [SPARK-14772][PYTHON][ML] Fix Python ML Params.copy method to match Scala implementation. |
f8d3846 | 2017/05/19 | [SPARK-19691][SQL][BRANCH-2.1] Fix ClassCastException when calculating percentile of decimal column. |
5aa4a2d | 2017/05/19 | [SPARK-19500][SQL] Fix failure in radix sort when attempting to spill the aggregated hash map. |
20806a8 | 2017/05/19 | [SPARK-19399][SPARKR][BACKPORT-2.1] Fix tests broken by the introduction of R coalesce API for DataFrame and Column. |
a4bedf1 | 2017/05/19 | [SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column. |
fc9e7b0 | 2017/05/19 | [SPARK-18788][SPARKR] Add getNumPartitions API to SparkR. |
642e7bb | 2017/05/19 | [SPARK-18335][SPARKR] Extend createDataFrame to support a numPartitions parameter. |
4ec94f5 | 2017/05/19 | [SPARK-19342][SPARKR] Fix collect method for timestamp columns so it does not incorrectly convert to numeric. |
d639208 | 2017/05/19 | [SPARK-19543] Fix from_json when the input row is empty. |
5948815 | 2017/05/19 | [SPARK-19509][SQL] Fix Grouping Sets to handle nullable grouping columns. |
f446906 | 2017/05/19 | [SPARK-19472][SQL] Fix parser error when trying to resolve nested CASE WHEN statement with parenthesis. The statement was mistaken for a function call. |
889860a | 2017/05/19 | [SPARK-19406][SQL] Fix function to_json to respect user-provided options. |
2370cdf | 2017/05/19 | [SPARK-19396][DOC] Support case-insensitive JDBC options. |
242b33c | 2017/05/19 | [SPARK-19324][SPARKR] Fix SparkR so it does not remove Spark JVM stdout output. |
823d5e8 | 2017/05/19 | [SPARK-19338][SQL] Include UDF names in explain output. |
768a10b | 2017/05/19 | [SPARK-19231][SPARKR] Add error handling for download and untar of Spark releases. |
d1f9ed5 | 2017/05/19 | [SPARK-19129][SQL] Disallow ALTER TABLE drop partition with an empty partition value. |
98d8d9c | 2017/05/19 | [SPARK-19180][SQL] Fix incorrect offset in OffHeapColumn. |
79ff854 | 2017/05/19 | [SPARK-19092][SQL][BACKPORT-2.1] Fix save() API in the DataFrameWriter to avoid a scan of all the saved files. |
c44f274 | 2017/05/19 | [SPARK-19130][SPARKR] Support setting columns to implicit literal values in SparkR. |
148167b | 2017/05/16 | [MAPR-26414] Fix Spark History Server memory leak. |
1a9b364 | 2017/05/15 | Update dependencies after ECO-1703 release. |
3554f31 | 2017/05/04 | [SPARK-33] Fix streaming example. |
3ae224b | 2017/04/28 | [SPARK-19019][PYTHON][BRANCH-2.0] Fix hijacked collections.namedtuple. Port cloudpickle changes needed for PySpark to work with Python 3.6.0. |
4584170 | 2017/04/28 | [SPARK-19146][CORE] Drop more elements when stageData.taskData.size > retainedTasks. |
a259c8e | 2017/04/28 | [MAPR-26287] Remove unnecessary code from hadoop-version-picker.sh. |
c0c94e5 | 2017/04/28 | [MAPR-26414] Fix Spark History Server memory leak. |
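To look up a fix in the GitHub commit log, the short hashes in the table can be turned into commit links. The following Python sketch is illustrative only and not part of the MapR tooling. It assumes that each table row sits on a single line in the `commit | YYYY/MM/DD | comment |` layout shown above, and that GitHub resolves the abbreviated hash under the repository's `/commit/` path. The names `parse_fix_row`, `ROW_PATTERN`, and `ISSUE_PATTERN` are hypothetical helpers introduced for this example.

```python
import re

# Repository URL taken from the "Source on GitHub" entry in these notes.
REPO_URL = "https://github.com/mapr/spark"

# Assumed row layout: 7-character hash | YYYY/MM/DD | comment |
ROW_PATTERN = re.compile(
    r"^([0-9a-f]{7}) \| (\d{4})/(\d{2})/(\d{2}) \| (.*?) \|$"
)
# Matches bracketed issue keys such as [MAPR-26414] or [SPARK-19765];
# component tags such as [SQL] or [BACKPORT-2.1] are ignored.
ISSUE_PATTERN = re.compile(r"\[((?:MAPR|SPARK)-\d+)\]")


def parse_fix_row(row):
    """Split one fix-table row into a commit URL, ISO date, issue keys, and comment."""
    match = ROW_PATTERN.match(row.strip())
    if match is None:
        raise ValueError(f"not a fix row: {row!r}")
    commit, year, month, day, comment = match.groups()
    return {
        "commit_url": f"{REPO_URL}/commit/{commit}",
        "date": f"{year}-{month}-{day}",
        "issues": ISSUE_PATTERN.findall(comment),
        "comment": comment,
    }


row = "148167b | 2017/05/16 | [MAPR-26414] Fix Spark History Server memory leak. |"
fix = parse_fix_row(row)
print(fix["commit_url"])
print(fix["date"], fix["issues"])
```

Rows that carry several issue keys, such as the cache-related backport, yield all of them in `issues`, which makes it straightforward to cross-reference a commit with the corresponding Apache Spark or MapR issue trackers.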
Known Issues
- MAPR-17271: On secure clusters, the MapR Control System (MCS) does not display links for Spark-Master and Spark-HistoryServer.
- Spark versions up to and including 2.3.0 have the following security vulnerability: CVE-2018-1334, an Apache Spark local privilege escalation vulnerability.
Resolved Issues
None.