Spark 2.0.1-1703 Release Notes
The notes below relate specifically to the MapR Distribution for Apache Hadoop. You may also be interested in the open-source Spark 2.0.1 Release Notes.
Spark Version | 2.0.1 |
Release Date | April 2017 |
MapR Version Interoperability | See EEP Components and OS Support. |
Source on GitHub | https://github.com/mapr/spark |
GitHub Release Tag | 2.0.1-mapr-1703 |
Maven Artifacts | https://repository.mapr.com/maven/ |
Package Names | See Package Names for Ecosystem Packs (EEPs) |
API Changes for this Version | See Spark API Changes. |
NOTE: For some important Spark limitations, see "Known Issues and Limitations" later in these release notes.
New in This Release
This version of Spark supports integration with Hive. However, note the following exceptions:
- Hive-on-Spark is not supported.
- Spark SQL is supported, but it is not fully compatible with Hive. For details, see the Apache Spark documentation and the MapR Spark documentation.
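As an illustration of the supported path (Spark SQL reading Hive tables, as opposed to Hive-on-Spark), the following is a minimal PySpark sketch. It assumes PySpark is installed and the cluster's Hive configuration is visible to Spark. The helper names `hive_session_options` and `build_hive_session` and the warehouse path are hypothetical and chosen for illustration only; `enableHiveSupport()` and `spark.sql.warehouse.dir` are standard Spark 2.0 names.

```python
def hive_session_options(warehouse_dir):
    """Return the session options this sketch passes to Spark.

    warehouse_dir is a hypothetical path used only for illustration.
    """
    return {"spark.sql.warehouse.dir": warehouse_dir}


def build_hive_session(app_name, options):
    """Create a SparkSession with Hive support enabled.

    Requires PySpark and the Hive classes on the classpath at call time.
    """
    # Imported lazily so hive_session_options() works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

A session built this way can then run a query such as `spark.sql("SHOW TABLES").show()`. Because Spark SQL is not fully compatible with Hive, it is worth validating each query against the compatibility notes in the Spark documentation.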
Fixes
This MapR release includes the following new fixes since the previous MapR Spark release. In addition, Spark 2.0.1-1703 includes backports of all the fixes contained in Apache Spark 2.0.2. For details, refer to the commit log for this project on GitHub.
GitHub Commit Number | Date (YYYY/MM/DD) | MapR Fix Number and Description |
---|---|---|
b5fdf9e | 2017/03/01 | Merge pull request #94 from mapr/mapr-26289-spark-2.0.1. |
f75cad8 | 2017/03/01 | Set default poll timeout to 120s. |
1cf7251 | 2017/03/01 | Added include-kafka-09 profile to Assembly. |
c9c6030 | 2017/02/24 | [MAPR-26060] Fixed case when mapr-streams make gaps in offsets (#91). |
36debc8 | 2017/02/09 | Merge pull request #89 from mapr/mapr-26076-spark-2.0.1. |
ed262d0 | 2017/02/09 | [SPARK-15844][CORE] HistoryServer doesn't come up if spark.authenticate = true. |
674f9bd | 2017/02/08 | Merge pull request #86 from mapr/spark-2.0.2-porting. |
529e51b | 2017/02/08 | Fixed version for Kafka 0.10 SQL. |
e680ec2 | 2017/02/06 | [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest. |
a68148e | 2017/02/06 | [SPARK-18125][SQL][BRANCH-2.0] Fix a compilation error in codegen due to splitExpression. |
316f706 | 2017/02/06 | [SPARK-17849][SQL] Fix NPE problem when using grouping sets. |
01f3743 | 2017/02/06 | [SPARK-17693][SQL][BACKPORT-2.0] Fixed Insert Failure To Data Source Tables when the Schema has the Comment Field. |
a996282 | 2017/02/06 | [SPARK-17981][SPARK-17957][SQL][BACKPORT-2.0] Fix Incorrect Nullability Setting to False in FilterExec. |
6d9dee4 | 2017/02/06 | [SPARK-18189][SQL][FOLLOWUP] Move test from ReplSuite to prevent java.lang.ClassCircularityError. |
cdd189c | 2017/02/06 | [SPARK-17337][SPARK-16804][SQL][BRANCH-2.0] Backport subquery related PRs. |
681a839 | 2017/02/06 | [SPARK-18200][GRAPHX][FOLLOW-UP] Support zero as an initial capacity in OpenHashSet. |
cb68e70 | 2017/02/06 | [SPARK-18200][GRAPHX] Support zero as an initial capacity in OpenHashSet. |
42d7574 | 2017/02/06 | [SPARK-18111][SQL] Wrong approximate quantile answer when multiple records have the minimum value(for branch 2.0). |
95aeff9 | 2017/02/06 | [SPARK-18160][CORE][YARN] spark.files & spark.jars should not be passed to driver in yarn mode. |
37fcf10 | 2017/02/06 | [SPARK-16796][WEB UI] Mask spark.authenticate.secret on Spark environ. |
b1723aa | 2017/02/06 | [SPARK-18133][BRANCH-2.0][EXAMPLES][ML] Python ML Pipeline Exampl. |
a7be955 | 2017/02/06 | [SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent. |
724a6e3 | 2017/02/06 | [SPARK-18114][HOTFIX] Fix line-too-long style error from backport of SPARK-18114. |
2f1aaa1 | 2017/02/06 | [SPARK-18148][SQL] Misleading Error Message for Aggregation Without Window/GroupBy. |
992d65f | 2017/02/06 | [SPARK-18189][SQL] Fix serialization issue in KeyValueGroupedDataset. |
f481615 | 2017/02/06 | [SPARK-18114][MESOS] Fix mesos cluster scheduler generage command option error. |
07d3ffe | 2017/02/06 | [SPARK-18030][TESTS] Fix flaky FileStreamSourceSuite by not deleting the files. |
5250480 | 2017/02/06 | [SPARK-18143][SQL] Ignore Structured Streaming event logs to avoid breaking history server (branch 2.0). |
bdf4511 | 2017/02/06 | [SPARK-16312][FOLLOW-UP][STREAMING][KAFKA][DOC] Add java code snippet for Kafka 0.10 integration doc. |
ecd62ed | 2017/02/06 | [SPARK-18164][SQL] ForeachSink should fail the Spark job if `process` throws exception. |
6cab38c | 2017/02/06 | [SPARK-16963][SQL] Fix test "StreamExecution metadata garbage collection". |
19d27ad | 2017/02/06 | [SPARK-17813][SQL][KAFKA] Maximum data per trigger. |
6c079b9 | 2017/02/06 | [SPARK-18132] Fix checkstyle. |
9c149f4 | 2017/02/06 | [SPARK-18009][SQL] Fix ClassCastException while calling toLocalIterator() on dataframe produced by RunnableCommand. |
597b754 | 2017/02/06 | [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes. |
38745a9 | 2017/02/06 | [SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL (branch 2.0). |
aa8c453 | 2017/02/06 | [SPARK-18104][DOC] Don't build KafkaSource doc. |
6f62a53 | 2017/02/06 | [SPARK-18063][SQL] Failed to infer constraints over multiple aliases. |
a031493 | 2017/02/06 | [SPARK-16304] LinkageError should not crash Spark executor. |
3b01f41 | 2017/02/06 | [SPARK-17733][SQL] InferFiltersFromConstraints rule never terminates for query. |
67484f3 | 2017/02/06 | [SPARK-18022][SQL] java.lang.NullPointerException instead of real exception when saving DF to MySQL. |
0002f56 | 2017/02/06 | [SPARK-16988][SPARK SHELL] spark history server log needs to be fixed to show https url when ssl is enabled. |
b50e511 | 2017/02/06 | [SPARK-18070][SQL] binary operator should not consider nullability when comparing input types. |
be401c8 | 2017/02/06 | [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance. |
c03b30f | 2017/02/06 | [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch. |
86e6db7 | 2017/02/06 | [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing. |
62ecfdd | 2017/02/06 | [SPARK-18058][SQL] [BRANCH-2.0]Comparing column types ignoring Nullability in Union and SetOperation. |
7d291d4 | 2017/02/06 | [SPARKR][BRANCH-2.0] R merge API doc and example fix. |
38c59da | 2017/02/06 | [SPARK-17123][SQL][BRANCH-2.0] Use type-widened encoder for DataFrame for set operations. |
453a44c | 2017/02/06 | [SPARK-17698][SQL] Join predicates should not contain filter clauses. |
0ed97fe | 2017/02/06 | [SPARK-17986][ML] SQLTransformer should remove temporary tables. |
1ac5708 | 2017/02/06 | [SPARK-16606][MINOR] Tiny follow-up to , to correct more instances of the same log message typo. |
8049e1d | 2017/02/06 | [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches. |
1b55321 | 2017/02/06 | [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream. |
f1fc622 | 2017/02/06 | [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset. |
a922ca4 | 2017/02/06 | [SPARK-17926][SQL][STREAMING] Added json for statuses. |
290ac5b | 2017/02/06 | [SPARK-17811] SparkR cannot parallelize data.frame with NA or NULL in Date columns. |
a94a716 | 2017/02/06 | [SPARK-18034] Upgrade to MiMa 0.1.11 to fix flakiness. |
1db928e | 2017/02/06 | [SPARKR] fix warnings. |
bbd260f | 2017/02/06 | [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD. |
c4816ab | 2017/02/06 | [SPARK-18003][SPARK CORE] Fix bug of RDD zipWithIndex & zipWithUniqueId index value overflowing. |
9c22c9d | 2017/02/06 | [SPARK-17989][SQL] Check ascendingOrder type in sort_array function rather than throwing ClassCastException. |
ae60c75 | 2017/02/06 | [SPARK-18001][DOCUMENT] fix broke link to SparkDataFrame. |
f2b58bf | 2017/02/06 | [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error. |
003b20c | 2017/02/06 | [SPARK-17731][SQL][STREAMING][FOLLOWUP] Refactored StreamingQueryListener APIs for branch-2.0. |
9ad2ee7 | 2017/02/06 | [SPARK-17841][STREAMING][KAFKA] drain commitQueue. |
efcc529 | 2017/02/06 | [MINOR][DOC] Add more built-in sources in sql-programming-guide.md. |
edbe6a6 | 2017/02/06 | [SPARK-17711] Compress rolled executor log. |
28d9c60 | 2017/02/06 | [SPARK-17751][SQL][BACKPORT-2.0] Remove spark.sql.eagerAnalysis and Output the Plan if Existed in AnalysisException. |
b8b951a | 2017/02/06 | [SQL][STREAMING][TEST] Follow up to remove Option.contains for Scala 2.10 compatibility. |
78e5c84 | 2017/02/06 | [SQL][STREAMING][TEST] Fix flaky tests in StreamingQueryListenerSuite. |
3fbcb1f | 2017/02/06 | [SPARK-17731][SQL][STREAMING] Metrics for structured streaming for branch-2.0. |
1a14c88 | 2017/02/06 | Fix example of tf_idf with minDocFreq. |
1bf46c0 | 2017/02/06 | [SPARK-17892][SQL][2.0] Do Not Optimize Query in CTAS More Than Once #15048. |
ea7ccbe | 2017/02/06 | [MINOR][SQL] Add prettyName for current_database function. |
e627ac0 | 2017/02/06 | [SPARK-17819][SQL][BRANCH-2.0] Support default database in connection URIs for Spark Thrift Server. |
e97b8cc | 2017/02/06 | [SPARK-17953][DOCUMENTATION] Fix typo in SparkSession scaladoc. |
beeb656 | 2017/02/06 | [SPARK-17863][SQL] should not add column into Distinct. |
3d6ab95 | 2017/02/06 | [SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer. |
00239e8 | 2017/02/06 | minor doc fix for Row.scala. |
9957c50 | 2017/02/06 | [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once. |
be58a9b | 2017/02/06 | [SPARK-16827][BRANCH-2.0] Avoid reporting spill metrics as shuffle metrics. |
b064786 | 2017/02/06 | [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice. |
eb73c46 | 2017/02/06 | [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB. |
8a5a689 | 2017/02/06 | [SPARK-17884][SQL] To resolve Null pointer exception when casting from empty string to interval type. |
4fb6c0c | 2017/02/06 | [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13. |
dccbe82 | 2017/02/06 | [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad. |
22078b0 | 2017/02/06 | [SPARK-17880][DOC] The url linking to `AccumulatorV2` in the document is incorrect. |
904dc7b | 2017/02/06 | Fix hadoop.version in building-spark.md. |
7c94cc5 | 2017/02/06 | [SPARK-17816][CORE][BRANCH-2.0] Fix ConcurrentModificationException issue in BlockStatusesAccumulator. |
50d4eac | 2017/02/06 | [SPARK-17346][SQL][TESTS] Fix the flaky topic deletion in KafkaSourceStressSuite. |
ea25634 | 2017/02/06 | [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite. |
95a7871 | 2017/02/06 | [SPARK-17417][CORE] Fix # of partitions for Reliable RDD checkpointing. |
784dd2f | 2017/02/06 | [SPARK-17832][SQL] TableIdentifier.quotedString creates un-parseable names when name contains a backtick. |
dcdca00 | 2017/02/06 | [SPARK-17806] [SQL] fix bug in join key rewritten in HashJoin. |
f36c03b | 2017/02/06 | [SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules. |
eb75678 | 2017/02/06 | [SPARK-17346][SQL][TEST-MAVEN] Add Kafka source for Structured Streaming (branch 2.0). |
c46948e | 2017/02/06 | [SPARK-17805][PYSPARK] Fix in sqlContext.read.text when pass in list of paths. |
cad3e53 | 2017/02/06 | [SPARK-17612][SQL][BRANCH-2.0] Support `DESCRIBE table PARTITION` SQL syntax. |
87e573f | 2017/02/06 | [SPARK-17792][ML] L-BFGS solver for linear regression does not accept general numeric label column types. |
e1cdf30 | 2017/02/06 | [SPARK-17750][SQL][BACKPORT-2.0] Fix CREATE VIEW with INTERVAL arithmetic. |
08a30d9 | 2017/02/06 | [SPARK-17803][TESTS] Upgrade docker-client dependency. |
4a48d45 | 2017/02/06 | [SPARK-17780][SQL] Report Throwable to user in StreamExecution. |
67ee7ad | 2017/02/06 | [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming. |
85d0dc1 | 2017/02/06 | [SPARK-17643] Remove comparable requirement from Offset (backport for branch-2.0). |
a255661 | 2017/02/06 | [SPARK-17758][SQL] Last returns wrong result in case of empty partition. |
07a30cb | 2017/02/06 | [SPARK-17778][TESTS] Mock SparkContext to reduce memory usage of BlockManagerSuite. |
230b501 | 2017/02/06 | [SPARK-17773][BRANCH-2.0] Input/Output] Add VoidObjectInspector. |
8ae27fb | 2017/02/06 | [SPARK-17549][SQL] Only collect table size stat in driver for cached relation. |
3fa5485 | 2017/02/06 | [SPARKR][DOC] minor formatting and output cleanup for R vignettes. |
13595fc | 2017/02/06 | [SPARK-17559][MLLIB] persist edges if their storage level is non in PeriodicGraphCheckpointer. |
75d7369 | 2017/02/06 | [SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentException in Thriftserver. |
159c854 | 2017/02/06 | [SPARK-17753][SQL] Allow a complex expression as the input a value based case statement. |
ca37182 | 2017/02/06 | [SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ should follow __getitem__ contract. |
825c9e3 | 2017/02/06 | [SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,… |
258b068 | 2017/02/06 | [MINOR][DOC] Add an up-to-date description for default serialization during shuffling. |
92cd75c | 2017/02/06 | Updated the following PR with minor changes to allow cherry-pick to branch-2.0. |
60d2ac2 | 2017/02/06 | [SPARK-17721][MLLIB][ML] Fix for multiplying transposed SparseMatrix with SparseVector. |
e6d1fbe | 2017/02/06 | [SPARK-17672] Spark 2.0 history server web Ui takes too long for a single application. |
90df14b | 2017/02/06 | [SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath aggregates. |
7120a46 | 2017/02/06 | [SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown predicates correctly in non-deterministic condition. |
539f476 | 2017/02/06 | [MINOR][DOCS] Fix th doc. of spark-streaming with kinesis. |
27de1d4 | 2017/01/05 | Merge pull request #81 from mapr/mapr-25713. |
8ea6501 | 2017/01/05 | [MAPR-25713] Spark might try to load MapR Class Loader multiple times and fail. |
7e9e5f4 | 2016/12/26 | Merge pull request #80 from mapr/mapr-25638. |
965975c | 2016/12/26 | [SPARK-18528][SQL] Fix a bug to initialise an iterator of aggregation buffer. |
96b1fea | 2016/12/12 | Merge pull request #79 from mapr/mapr-25311. |
c5f682b | 2016/12/12 | [MAPR-25311] Bump Spark dependencies after ECO-1611 release. |
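To check whether a particular issue is covered by this list, the rows of the table above can be parsed mechanically. The following Python sketch assumes each row has the `commit | date | description |` layout used in the table. The function names `parse_fix_row` and `commits_for_issue` are hypothetical helpers, not part of any MapR or Spark tooling.

```python
import re

# Matches bracketed issue IDs such as [SPARK-18283] or [MAPR-26060].
ISSUE_ID = re.compile(r"\[((?:SPARK|MAPR)-\d+)\]")


def parse_fix_row(row):
    """Split one 'commit | date | description |' row into named fields."""
    cells = [cell.strip() for cell in row.strip().rstrip("|").split("|", 2)]
    if len(cells) != 3:
        raise ValueError("expected three pipe-separated cells: %r" % row)
    commit, date, description = cells
    return {
        "commit": commit,
        "date": date,
        "description": description,
        # A row may reference several issues, or none at all.
        "issues": ISSUE_ID.findall(description),
    }


def commits_for_issue(rows, issue_id):
    """Return the commit hashes whose description references issue_id."""
    return [
        fix["commit"]
        for fix in map(parse_fix_row, rows)
        if issue_id in fix["issues"]
    ]
```

For example, passing the table rows and `"SPARK-18200"` to `commits_for_issue` returns both the original fix and its follow-up commit. Rows without a bracketed issue ID (merge commits, minor doc fixes) simply yield an empty `issues` list.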
Known Issues and Limitations
- Spark 2.0.1 does not support Spark Structured Streaming.
- Full support of HPE Ezmeral Data Fabric Streams is available only on clusters with MapR 5.2 and later.
- Spark is not able to submit jobs to YARN when the cluster is in "classic" mode, even if YARN is installed and configured.
- MAPR-17271: On secure clusters, the MapR Control System (MCS) does not display links for Spark-Master and Spark-HistoryServer.
- MAPR-25052: Spark Thrift Server does not start on clusters secured by MapR-SASL.
- MAPR-26039: Spark does not propagate mapr_sec_enabled variable to Driver.
- Spark versions up to and including 2.3.0 have the following security vulnerability: CVE-2018-1334 (Apache Spark local privilege escalation vulnerability).
Resolved Issues
None.