Spark 2.4.0.0-1904 (EEP 6.2.0) Release Notes
This section provides reference information, including new features, patches, and known issues for Spark 2.4.0.
The notes below relate specifically to the MapR Distribution for Apache Hadoop. This release of Spark includes backward-compatibility changes; see the open-source Spark 2.4.0.0 Release Notes for more information.
These release notes contain only MapR-specific information and are not necessarily cumulative in nature. For information about how to use the release notes, see Ecosystem Component Release Notes.
Spark Version | 2.4.0.0 |
Release Date | April 2019 |
MapR Version Interoperability | See EEP Components and OS Support. |
Source on GitHub | https://github.com/mapr/spark |
GitHub Release Tag | 2.4.0.0-mapr-1904 |
Maven Artifacts | https://repository.mapr.com/maven/ |
Package Names | Navigate to https://package.ezmeral.hpe.com/releases/MEP/ and select your EEP and OS to view the list of package names. |
- Starting with EEP 6.0.0, the keyStore and trustStore passwords can be removed from the spark-defaults.conf file and set in the /opt/mapr/conf/ssl-client.xml file.
- Starting with EEP 6.0.0, after an upgrade, configuration files of previous versions are saved in the /opt/mapr/spark directory.
- The MapR 6.1 and EEP 6.0.0 releases introduce "Simplified Security". If you are using these versions and enable security on your MapR cluster, MapR scripts automatically configure Spark security features.
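As a rough illustration of the first note above, the following Python sketch checks whether a Hadoop-style ssl-client.xml defines the keystore and truststore passwords. The property names ssl.client.keystore.password and ssl.client.truststore.password are assumed from the usual Hadoop ssl-client.xml layout, and the sample values are placeholders; verify both against the file on your own cluster before relying on this.

```python
import xml.etree.ElementTree as ET

# Placeholder content standing in for /opt/mapr/conf/ssl-client.xml.
# The property names are an assumption based on the standard Hadoop layout.
SAMPLE_SSL_CLIENT_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>ssl.client.keystore.password</name>
    <value>example-keystore-secret</value>
  </property>
  <property>
    <name>ssl.client.truststore.password</name>
    <value>example-truststore-secret</value>
  </property>
</configuration>
"""

REQUIRED_KEYS = (
    "ssl.client.keystore.password",
    "ssl.client.truststore.password",
)


def read_properties(xml_text):
    """Parse Hadoop-style <property><name/><value/></property> entries into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_passwords(props, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not props.get(key)]


props = read_properties(SAMPLE_SSL_CLIENT_XML)
print("Missing password properties:", missing_passwords(props))
```

To check a real cluster, read the contents of /opt/mapr/conf/ssl-client.xml and pass the text to read_properties instead of the sample string.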
Hive Support
This version of Spark supports integration with Hive. However, note the following exceptions:
- Hive-on-Spark is not supported.
- Spark-SQL is supported, but it is not fully compatible with Hive. For details, see the Apache Spark documentation and the MapR Spark documentation.
New in This Release
- For a complete list of all new features, refer to the open source documentation.
Fixes
This MapR release includes the following new fixes since the latest MapR Spark 2.3.1 release. For details, refer to the commit log for this project in GitHub.
GitHub Commit | Date (YYYY-MM-DD) | Comment |
---|---|---|
4bdca6c | 2019-02-25 | MapR [SPARK-427] Update kafka in Spark-2.4.0 to the 1.1.1-mapr |
0ccea10 | 2019-02-25 | MapR [SPARK-434] Move absent commits from 2.3.2 branch |
0af5795 | 2019-02-25 | MapR [SPARK-442] Spark build fails because of the wrong tests in spark-streaming-kafka-10 module |
d40b974 | 2019-02-25 | MapR [SPARK-446] Spark configure.sh doesn't start/stop Spark services |
d42400a | 2019-02-25 | MapR [SPARK-430] PID files should be under /opt/mapr/pid |
b7eec10 | 2019-02-25 | MapR [SPARK-221] Investigate possibility to move creating of the spark-env.sh from private-pkg to configure.sh |
399d5b8 | 2019-02-25 | MapR [SPARK-287] Move logic of creating /apps/spark folder from installer's scripts to the configure.sh |
0170f29 | 2019-02-25 | [SPARK-449] Kafka offset commit issue fixed |
2497c80 | 2019-02-25 | MapR [SPARK-417] impersonation fixes for spark executor. Impersonation is moved from HadoopRDD.compute() method to org.apache.spark.executor.Executor.run() method |
e1d14ed | 2019-02-25 | MapR [SPARK-456] Spark shell can't be started |
1cc194b | 2019-02-26 | [SPARK-466] SparkR errors fixed |
9c4cf43 | 2019-02-26 | [SPARK-379] Fix Spark version for Avro and Kubernetes integration tests |
4436a8a | 2019-02-26 | MapR [SPARK-464] Can't submit spark 2.4 jobs from mapr-client |
b14e1a6 | 2019-02-27 | MapR [SPARK-465] Error messages after update of spark 2.4 |
c9fa510 | 2019-02-28 | MapR [K8S-637][K8S] Add configure.sh configuration in spark-defaults.conf for job runtime |
11e3daf | 2019-02-28 | MapR [SPARK-481] Cannot run spark configure.sh on Client node |
4a740fb | 2019-03-01 | MapR [SPARK-486][K8S] Fix sasl encryption error on Kubernetes |
30f88de | 2019-03-07 | MapR [SPARK-416] CVE-2018-1320 vulnerability in Apache Thrift |
a3f0109 | 2019-03-08 | MapR [SPARK-496] Spark HS UI doesn't work |
f60e8a4 | 2019-03-08 | MapR [SPARK-482] Spark streaming app fails to start by UnknownTopicOrPartitionException with checkpoint |
71f5db9 | 2019-03-15 | MapR [SPARK-514] Recovery from checkpoint is broken |
ba9e107 | 2019-03-18 | MapR [SPARK-515] Move configuring spark-env.sh back to the private-pkg |
9fbdc61 | 2019-03-19 | MapR [SPARK-515][K8S] Remove configure.sh call for k8s |
cbbd78f | 2019-03-19 | MapR [SPARK-492] Spark 2.4.0.0 configure.sh has error messages |
fce6079 | 2019-03-19 | [SPARK-463] MAPR_MAVEN_REPO variable for specifying mapR repository |
100aff7 | 2019-03-22 | MapR [SPARK-494] Spark - Distribute Notice.txt across components starting with MEP 6.2 |
a4e4259 | 2019-03-25 | MapR [SPARK-460] Spark Metrics for CollectD Configuration for collecting Spark metrics |
7615273 | 2019-03-26 | MapR [SPARK-510] nonmapr "admin" users not able to view other user logs in SHS |
80edc50 | 2019-03-26 | [SPARK-508] MapR-DB OJAI Connector for Spark isNull condition returns incorrect result |
dfc0022 | 2019-03-28 | MapR [SPARK-462] Spark and SparkHistoryServer allow weak ciphers, which can allow man in the middle attack |
baf607e | 2019-03-28 | MapR [SPARK-461] Stop graph after jobs completion to prevent 'java.lang.IllegalStateException: No active subscriptions' |
d48945f | 2019-04-04 | MapR [SPARK-516] Spark jobs failure using yarn mode on kerberos fixed |
1c793f8 | 2019-04-11 | MapR [SPARK-531] Remove duplicating entries from classpath in ClasspathFilter |
6a39ff6 | 2019-04-11 | [SPARK-444] Fix of hive version for spark dev branches |
c5aeb67 | 2019-04-15 | Spark 2.4.0 backport 2.4.1 |
2ae047f | 2019-04-19 | SPARK-539 Workaround for absent MapRDBJsonSplit class |
94eb0f1 | 2019-04-20 | K8S-853: Enable spark metrics for external tenant |
c7abaf8 | 2019-04-22 | MapR [SPARK-536] PySpark streaming package for kafka-0-10 added |
ef70d34 | 2019-04-22 | MapR [SPARK-540] Include 'avro' artifacts |
f08108e | 2019-04-23 | MapR [K8S-893] Hide plain text password from logs |
28ddfe9e | 2019-05-17 | MapR [SPARK-541] Avoid duplication of the first unexpired record |
- SPARK-26709 - OptimizeMetadataOnlyQuery does not correctly handle the files with zero record
- SPARK-26080 - Unable to run worker.py on Windows
- SPARK-26873 - FileFormatWriter creates inconsistent MR job IDs
- SPARK-26745 - Non-parsing Dataset.count() optimization causes inconsistent results for JSON inputs with empty lines
- SPARK-26677 - Incorrect results of not(eqNullSafe) when data read from Parquet file
- SPARK-26708 - Incorrect result caused by inconsistency between a SQL cache's cached RDD and its physical plan
- SPARK-26267 - Kafka source may reprocess data
- SPARK-26706 - Fix Cast$mayTruncate for bytes
- SPARK-26078 - WHERE .. IN fails to filter rows when used in combination with UNION
- SPARK-26233 - Incorrect decimal value with java beans and first/last/max... functions
- SPARK-27097 - Avoid embedding platform-dependent offsets literally in whole-stage generated code
- SPARK-26188 - Spark 2.4.0 Partitioning behavior breaks backwards compatibility
- SPARK-25921 - Python worker reuse causes Barrier tasks to run without BarrierTaskContext
Known Issues
- Python OJAI connector failure caused by incorrect resolution of Python user-defined function calls by the Spark SQL parser. The same SQL expression in the SELECT clause and the GROUP BY clause resolves to different expression IDs, which results in pyspark.sql.utils.AnalysisException.
  Sample SQL query that leads to pyspark.sql.utils.AnalysisException. The stringtodate1(yelping_since) expression is used in both SELECT and GROUP BY, and stringtodate1 is a Python user-defined function:
  SELECT business_id, stringtodate1(yelping_since) AS startyear, avg(stars) AS avgstars FROM temp_table_name GROUP BY business_id, stringtodate1(yelping_since)
  Workaround: Replace the stringtodate1(yelping_since) expression in GROUP BY with the alias startyear:
  SELECT business_id, stringtodate1(yelping_since) AS startyear, avg(stars) AS avgstars FROM temp_table_name GROUP BY business_id, startyear
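The workaround above might be applied from PySpark roughly as follows. This is a minimal sketch, not the connector's own code: the body of stringtodate1, the assumed 'YYYY-MM-DD' format of yelping_since, and the helper name run_workaround are all hypothetical, and the function expects an existing SparkSession with temp_table_name already registered.

```python
from datetime import datetime

# Hypothetical UDF body: the release notes name stringtodate1 but do not show it.
# Assumes yelping_since holds strings such as '2015-03-01'.
def stringtodate1(value):
    return datetime.strptime(value, "%Y-%m-%d").year


# Workaround query from the notes: GROUP BY refers to the alias startyear
# instead of repeating the UDF call.
WORKAROUND_QUERY = (
    "SELECT business_id, stringtodate1(yelping_since) AS startyear, "
    "avg(stars) AS avgstars "
    "FROM temp_table_name "
    "GROUP BY business_id, startyear"
)


def run_workaround(spark):
    """Register the UDF on an existing SparkSession and run the workaround query."""
    # Imported lazily so the helpers above can be used without PySpark installed.
    from pyspark.sql.types import IntegerType

    spark.udf.register("stringtodate1", stringtodate1, IntegerType())
    return spark.sql(WORKAROUND_QUERY)
```

Calling run_workaround(spark).show() would then display the aggregated rows, assuming the temporary view exists in that session.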
Resolved Issues
- None.