Spark 2.0.1-1703 Release Notes

The notes below relate specifically to the MapR Distribution for Apache Hadoop. You may also be interested in the open-source Spark 2.0.1 Release Notes.

Spark Version	2.0.1
Release Date	April 2017
MapR Version Interoperability	See EEP Components and OS Support.
Source on GitHub	https://github.com/mapr/spark
GitHub Release Tag	2.0.1-mapr-1703
Maven Artifacts	https://repository.mapr.com/maven/
Package Names	See Package Names for Ecosystem Packs (EEPs)
API Changes for this Version	See Spark API Changes.

NOTE For some important Spark limitations, see "Known Issues and Limitations" later in these release notes.

New in This Release

This version of Spark supports integration with Hive. However, note the following exceptions:

Hive-on-Spark is not supported.
Spark-SQL is supported, but it is not fully compatible with Hive. For details, see the Apache Spark documentation and the MapR Spark documentation.

Fixes

This MapR release includes the following fixes made since the previous MapR Spark release. In addition, Spark 2.0.1-1703 includes backports of all the fixes contained in Apache Spark 2.0.2. For details, refer to the commit log for this project on GitHub.

GitHub Commit Number	Date (YYYY/MM/DD)	MapR Fix Number and Description
b5fdf9e	2017/03/01	Merge pull request #94 from mapr/mapr-26289-spark-2.0.1.
f75cad8	2017/03/01	Set default poll timeout to 120s.
1cf7251	2017/03/01	Added include-kafka-09 profile to Assembly.
c9c6030	2017/02/24	[MAPR-26060] Fixed case when mapr-streams make gaps in offsets (#91).
36debc8	2017/02/09	Merge pull request #89 from mapr/mapr-26076-spark-2.0.1.
ed262d0	2017/02/09	[SPARK-15844][CORE] HistoryServer doesn't come up if spark.authenticate = true.
674f9bd	2017/02/08	Merge pull request #86 from mapr/spark-2.0.2-porting.
529e51b	2017/02/08	Fixed version for Kafka 0.10 SQL.
e680ec2	2017/02/06	[SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset is latest.
a68148e	2017/02/06	[SPARK-18125][SQL][BRANCH-2.0] Fix a compilation error in codegen due to splitExpression.
316f706	2017/02/06	[SPARK-17849][SQL] Fix NPE problem when using grouping sets.
01f3743	2017/02/06	[SPARK-17693][SQL][BACKPORT-2.0] Fixed Insert Failure To Data Source Tables when the Schema has the Comment Field.
a996282	2017/02/06	[SPARK-17981][SPARK-17957][SQL][BACKPORT-2.0] Fix Incorrect Nullability Setting to False in FilterExec.
6d9dee4	2017/02/06	[SPARK-18189][SQL][FOLLOWUP] Move test from ReplSuite to prevent java.lang.ClassCircularityError.
cdd189c	2017/02/06	[SPARK-17337][SPARK-16804][SQL][BRANCH-2.0] Backport subquery related PRs.
681a839	2017/02/06	[SPARK-18200][GRAPHX][FOLLOW-UP] Support zero as an initial capacity in OpenHashSet.
cb68e70	2017/02/06	[SPARK-18200][GRAPHX] Support zero as an initial capacity in OpenHashSet.
42d7574	2017/02/06	[SPARK-18111][SQL] Wrong approximate quantile answer when multiple records have the minimum value (for branch 2.0).
95aeff9	2017/02/06	[SPARK-18160][CORE][YARN] spark.files & spark.jars should not be passed to driver in yarn mode.
37fcf10	2017/02/06	[SPARK-16796][WEB UI] Mask spark.authenticate.secret on Spark environ.
b1723aa	2017/02/06	[SPARK-18133][BRANCH-2.0][EXAMPLES][ML] Python ML Pipeline Exampl.
a7be955	2017/02/06	[SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent.
724a6e3	2017/02/06	[SPARK-18114][HOTFIX] Fix line-too-long style error from backport of SPARK-18114.
2f1aaa1	2017/02/06	[SPARK-18148][SQL] Misleading Error Message for Aggregation Without Window/GroupBy.
992d65f	2017/02/06	[SPARK-18189][SQL] Fix serialization issue in KeyValueGroupedDataset.
f481615	2017/02/06	[SPARK-18114][MESOS] Fix mesos cluster scheduler generate command option error.
07d3ffe	2017/02/06	[SPARK-18030][TESTS] Fix flaky FileStreamSourceSuite by not deleting the files.
5250480	2017/02/06	[SPARK-18143][SQL] Ignore Structured Streaming event logs to avoid breaking history server (branch 2.0).
bdf4511	2017/02/06	[SPARK-16312][FOLLOW-UP][STREAMING][KAFKA][DOC] Add java code snippet for Kafka 0.10 integration doc.
ecd62ed	2017/02/06	[SPARK-18164][SQL] ForeachSink should fail the Spark job if `process` throws exception.
6cab38c	2017/02/06	[SPARK-16963][SQL] Fix test "StreamExecution metadata garbage collection".
19d27ad	2017/02/06	[SPARK-17813][SQL][KAFKA] Maximum data per trigger.
6c079b9	2017/02/06	[SPARK-18132] Fix checkstyle.
9c149f4	2017/02/06	[SPARK-18009][SQL] Fix ClassCastException while calling toLocalIterator() on dataframe produced by RunnableCommand.
597b754	2017/02/06	[SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes.
38745a9	2017/02/06	[SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL (branch 2.0).
aa8c453	2017/02/06	[SPARK-18104][DOC] Don't build KafkaSource doc.
6f62a53	2017/02/06	[SPARK-18063][SQL] Failed to infer constraints over multiple aliases.
a031493	2017/02/06	[SPARK-16304] LinkageError should not crash Spark executor.
3b01f41	2017/02/06	[SPARK-17733][SQL] InferFiltersFromConstraints rule never terminates for query.
67484f3	2017/02/06	[SPARK-18022][SQL] java.lang.NullPointerException instead of real exception when saving DF to MySQL.
0002f56	2017/02/06	[SPARK-16988][SPARK SHELL] spark history server log needs to be fixed to show https url when ssl is enabled.
b50e511	2017/02/06	[SPARK-18070][SQL] binary operator should not consider nullability when comparing input types.
be401c8	2017/02/06	[SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance.
c03b30f	2017/02/06	[SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch.
86e6db7	2017/02/06	[SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing.
62ecfdd	2017/02/06	[SPARK-18058][SQL][BRANCH-2.0] Comparing column types ignoring Nullability in Union and SetOperation.
7d291d4	2017/02/06	[SPARKR][BRANCH-2.0] R merge API doc and example fix.
38c59da	2017/02/06	[SPARK-17123][SQL][BRANCH-2.0] Use type-widened encoder for DataFrame for set operations.
453a44c	2017/02/06	[SPARK-17698][SQL] Join predicates should not contain filter clauses.
0ed97fe	2017/02/06	[SPARK-17986][ML] SQLTransformer should remove temporary tables.
1ac5708	2017/02/06	[SPARK-16606][MINOR] Tiny follow-up to , to correct more instances of the same log message typo.
8049e1d	2017/02/06	[STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches.
1b55321	2017/02/06	[SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream.
f1fc622	2017/02/06	[SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset.
a922ca4	2017/02/06	[SPARK-17926][SQL][STREAMING] Added json for statuses.
290ac5b	2017/02/06	[SPARK-17811] SparkR cannot parallelize data.frame with NA or NULL in Date columns.
a94a716	2017/02/06	[SPARK-18034] Upgrade to MiMa 0.1.11 to fix flakiness.
1db928e	2017/02/06	[SPARKR] fix warnings.
bbd260f	2017/02/06	[SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD.
c4816ab	2017/02/06	[SPARK-18003][SPARK CORE] Fix bug of RDD zipWithIndex & zipWithUniqueId index value overflowing.
9c22c9d	2017/02/06	[SPARK-17989][SQL] Check ascendingOrder type in sort_array function rather than throwing ClassCastException.
ae60c75	2017/02/06	[SPARK-18001][DOCUMENT] fix broken link to SparkDataFrame.
f2b58bf	2017/02/06	[SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error.
003b20c	2017/02/06	[SPARK-17731][SQL][STREAMING][FOLLOWUP] Refactored StreamingQueryListener APIs for branch-2.0.
9ad2ee7	2017/02/06	[SPARK-17841][STREAMING][KAFKA] drain commitQueue.
efcc529	2017/02/06	[MINOR][DOC] Add more built-in sources in sql-programming-guide.md.
edbe6a6	2017/02/06	[SPARK-17711] Compress rolled executor log.
28d9c60	2017/02/06	[SPARK-17751][SQL][BACKPORT-2.0] Remove spark.sql.eagerAnalysis and Output the Plan if Existed in AnalysisException.
b8b951a	2017/02/06	[SQL][STREAMING][TEST] Follow up to remove Option.contains for Scala 2.10 compatibility.
78e5c84	2017/02/06	[SQL][STREAMING][TEST] Fix flaky tests in StreamingQueryListenerSuite.
3fbcb1f	2017/02/06	[SPARK-17731][SQL][STREAMING] Metrics for structured streaming for branch-2.0.
1a14c88	2017/02/06	Fix example of tf_idf with minDocFreq.
1bf46c0	2017/02/06	[SPARK-17892][SQL][2.0] Do Not Optimize Query in CTAS More Than Once #15048.
ea7ccbe	2017/02/06	[MINOR][SQL] Add prettyName for current_database function.
e627ac0	2017/02/06	[SPARK-17819][SQL][BRANCH-2.0] Support default database in connection URIs for Spark Thrift Server.
e97b8cc	2017/02/06	[SPARK-17953][DOCUMENTATION] Fix typo in SparkSession scaladoc.
beeb656	2017/02/06	[SPARK-17863][SQL] should not add column into Distinct.
3d6ab95	2017/02/06	[SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer.
00239e8	2017/02/06	minor doc fix for Row.scala.
9957c50	2017/02/06	[SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once.
be58a9b	2017/02/06	[SPARK-16827][BRANCH-2.0] Avoid reporting spill metrics as shuffle metrics.
b064786	2017/02/06	[SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice.
eb73c46	2017/02/06	[SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB.
8a5a689	2017/02/06	[SPARK-17884][SQL] To resolve Null pointer exception when casting from empty string to interval type.
4fb6c0c	2017/02/06	[SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13.
dccbe82	2017/02/06	[SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad.
22078b0	2017/02/06	[SPARK-17880][DOC] The url linking to `AccumulatorV2` in the document is incorrect.
904dc7b	2017/02/06	Fix hadoop.version in building-spark.md.
7c94cc5	2017/02/06	[SPARK-17816][CORE][BRANCH-2.0] Fix ConcurrentModificationException issue in BlockStatusesAccumulator.
50d4eac	2017/02/06	[SPARK-17346][SQL][TESTS] Fix the flaky topic deletion in KafkaSourceStressSuite.
ea25634	2017/02/06	[SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite.
95a7871	2017/02/06	[SPARK-17417][CORE] Fix # of partitions for Reliable RDD checkpointing.
784dd2f	2017/02/06	[SPARK-17832][SQL] TableIdentifier.quotedString creates un-parseable names when name contains a backtick.
dcdca00	2017/02/06	[SPARK-17806][SQL] fix bug in join key rewritten in HashJoin.
f36c03b	2017/02/06	[SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules.
eb75678	2017/02/06	[SPARK-17346][SQL][TEST-MAVEN] Add Kafka source for Structured Streaming (branch 2.0).
c46948e	2017/02/06	[SPARK-17805][PYSPARK] Fix in sqlContext.read.text when pass in list of paths.
cad3e53	2017/02/06	[SPARK-17612][SQL][BRANCH-2.0] Support `DESCRIBE table PARTITION` SQL syntax.
87e573f	2017/02/06	[SPARK-17792][ML] L-BFGS solver for linear regression does not accept general numeric label column types.
e1cdf30	2017/02/06	[SPARK-17750][SQL][BACKPORT-2.0] Fix CREATE VIEW with INTERVAL arithmetic.
08a30d9	2017/02/06	[SPARK-17803][TESTS] Upgrade docker-client dependency.
4a48d45	2017/02/06	[SPARK-17780][SQL] Report Throwable to user in StreamExecution.
67ee7ad	2017/02/06	[SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming.
85d0dc1	2017/02/06	[SPARK-17643] Remove comparable requirement from Offset (backport for branch-2.0).
a255661	2017/02/06	[SPARK-17758][SQL] Last returns wrong result in case of empty partition.
07a30cb	2017/02/06	[SPARK-17778][TESTS] Mock SparkContext to reduce memory usage of BlockManagerSuite.
230b501	2017/02/06	[SPARK-17773][BRANCH-2.0][Input/Output] Add VoidObjectInspector.
8ae27fb	2017/02/06	[SPARK-17549][SQL] Only collect table size stat in driver for cached relation.
3fa5485	2017/02/06	[SPARKR][DOC] minor formatting and output cleanup for R vignettes.
13595fc	2017/02/06	[SPARK-17559][MLLIB] persist edges if their storage level is non in PeriodicGraphCheckpointer.
75d7369	2017/02/06	[SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentException in Thriftserver.
159c854	2017/02/06	[SPARK-17753][SQL] Allow a complex expression as the input a value based case statement.
ca37182	2017/02/06	[SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ should follow __getitem__ contract.
825c9e3	2017/02/06	[SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,…
258b068	2017/02/06	[MINOR][DOC] Add an up-to-date description for default serialization during shuffling.
92cd75c	2017/02/06	Updated the following PR with minor changes to allow cherry-pick to branch-2.0.
60d2ac2	2017/02/06	[SPARK-17721][MLLIB][ML] Fix for multiplying transposed SparseMatrix with SparseVector.
e6d1fbe	2017/02/06	[SPARK-17672] Spark 2.0 history server web Ui takes too long for a single application.
90df14b	2017/02/06	[SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath aggregates.
7120a46	2017/02/06	[SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown predicates correctly in non-deterministic condition.
539f476	2017/02/06	[MINOR][DOCS] Fix the doc. of spark-streaming with kinesis.
27de1d4	2017/01/05	Merge pull request #81 from mapr/mapr-25713.
8ea6501	2017/01/05	[MAPR-25713] Spark might try to load MapR Class Loader multiple times and fail.
7e9e5f4	2016/12/26	Merge pull request #80 from mapr/mapr-25638.
965975c	2016/12/26	[SPARK-18528][SQL] Fix a bug to initialise an iterator of aggregation buffer.
96b1fea	2016/12/12	Merge pull request #79 from mapr/mapr-25311.
c5f682b	2016/12/12	[MAPR-25311] Bump Spark dependencies after ECO-1611 release.
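
The fix table above mixes MapR-specific fixes (MAPR-nnnnn) with upstream backports (SPARK-nnnnn). As a rough aid for reviewing it, the sketch below splits rows into commit, date, and description and pulls out the issue keys. It assumes the rows are available as tab-separated text, as laid out above; the helper names parse_row and count_by_project are hypothetical and are not part of any MapR or Spark tooling.

```python
import re
from collections import Counter

# Matches bracketed issue keys such as [SPARK-17981] or [MAPR-26060].
ISSUE_KEY = re.compile(r"\[((?:SPARK|MAPR)-\d+)\]")


def parse_row(line):
    """Split one tab-separated table row into its three columns."""
    commit, date, description = line.rstrip("\n").split("\t", 2)
    return {
        "commit": commit,
        "date": date,
        "issues": ISSUE_KEY.findall(description),
        "description": description,
    }


def count_by_project(rows):
    """Count issue keys per project prefix (SPARK or MAPR)."""
    counts = Counter()
    for row in rows:
        for key in row["issues"]:
            counts[key.split("-", 1)[0]] += 1
    return counts


# Three rows copied from the table above.
sample = [
    "c9c6030\t2017/02/24\t[MAPR-26060] Fixed case when mapr-streams make gaps in offsets (#91).",
    "a996282\t2017/02/06\t[SPARK-17981][SPARK-17957][SQL][BACKPORT-2.0] Fix Incorrect Nullability Setting to False in FilterExec.",
    "f75cad8\t2017/03/01\tSet default poll timeout to 120s.",
]

rows = [parse_row(line) for line in sample]
print(count_by_project(rows))
```

Rows without a bracketed issue key, such as merge commits, simply yield an empty issue list and are not counted.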

Known Issues and Limitations

Spark 2.0.1 does not support Spark Structured Streaming.
Full support of HPE Ezmeral Data Fabric Streams is available only on clusters with MapR 5.2 and later.
Spark is not able to submit jobs to YARN when the cluster is in "classic" mode, even if YARN is installed and configured.
MAPR-17271: On secure clusters, the MapR Control System (MCS) does not display links for Spark-Master and Spark-HistoryServer.
MAPR-25052: Spark Thrift Server does not start on clusters secured by MapR-SASL.
MAPR-26039: Spark does not propagate mapr_sec_enabled variable to Driver.
Spark versions up to and including 2.3.0 have the following security vulnerability: CVE-2018-1334 (Apache Spark local privilege escalation vulnerability).
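
To check whether a given installation falls in the range named in the last item above, a version comparison along the following lines may help. This is only a sketch: the cutoff is taken from the statement in this note rather than from the CVE advisory itself, and parse_version and is_affected are hypothetical helper names.

```python
# Assumption: the upper bound comes from the statement in this release note
# ("up to and including 2.3.0"), not from the CVE advisory itself.
LAST_AFFECTED = (2, 3, 0)


def parse_version(version):
    """Turn '2.0.1' or '2.0.1-1703' into a comparable tuple such as (2, 0, 1)."""
    core = version.split("-", 1)[0]  # drop a MapR suffix like '-1703'
    return tuple(int(part) for part in core.split("."))


def is_affected(version):
    """Return True if the version is at or below the last affected release."""
    return parse_version(version) <= LAST_AFFECTED


print(is_affected("2.0.1-1703"))  # True
print(is_affected("2.3.1"))       # False
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly, where "2.10.0" would sort before "2.3.0".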

Resolved Issues

None.