What's New in EEP 3.0
Provides a summary of the new functionality in EEP 3.0.
EEP 3.0 provides a series of stability and security fixes for Spark and improves the speed of ETL and batch processing with a faster version of Hive.
New Features and Additions
- HPE Ezmeral Data Fabric Database OJAI Connector for Apache Spark
- The HPE Ezmeral Data Fabric Database OJAI Connector for Apache Spark is a new API that makes it easier to build
real-time or batch pipelines between your data and HPE Ezmeral Data Fabric Database and leverage Spark within the
pipeline. This feature includes:
- Two new APIs that allow you to load data from an HPE Ezmeral Data Fabric Database JSON table to a Spark RDD or save a Spark RDD to an HPE Ezmeral Data Fabric Database JSON table
- A custom partitioner that allows you to partition data for better performance
- Data locality: when the connector reads data from HPE Ezmeral Data Fabric Database, it uses the data locality feature of HPE Ezmeral Data Fabric Database to spawn the Spark executors
For more information, see Understanding the HPE Ezmeral Data Fabric Database OJAI Connector for Spark.
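As an illustration of why a custom partitioner can help, the following minimal sketch groups records by row-key range, so that each group could be handled by one Spark partition. It is a conceptual example in plain Python, not the connector's API: the function `range_partition`, the sample row keys, and the split points are all hypothetical, and it assumes that partitioning follows sorted row-key ranges.

```python
from bisect import bisect_right
from collections import defaultdict


def range_partition(records, split_points):
    """Group (key, value) records into partitions bounded by sorted split points."""
    partitions = defaultdict(list)
    for key, value in records:
        # bisect_right returns how many split points are <= key,
        # which is the index of the key range that contains the key.
        partitions[bisect_right(split_points, key)].append((key, value))
    return dict(partitions)


# Hypothetical row keys and split points, for illustration only.
records = [("u042", "a"), ("u900", "b"), ("u310", "c"), ("u041", "d")]
split_points = ["u100", "u500"]
parts = range_partition(records, split_points)
```

Here `parts[0]` holds the keys below `u100`, `parts[1]` the keys from `u100` up to but not including `u500`, and `parts[2]` the rest. Keeping each key range together means related rows can be read or written as a unit instead of being scattered across partitions.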
- HPE Ezmeral Data Fabric Database Binary Connector for Apache Spark
- The new HPE Ezmeral Data Fabric Database Binary Connector for Apache Spark allows you to write applications that
consume HBase binary tables and use them in Spark. Features include:
- Writing directly to HBase HFiles for bulk insertion into HBase
- Querying tables that are represented in HBase through Spark SQL
For more information, see HPE Ezmeral Data Fabric Database Binary Connector for Apache Spark.
- HPE Ezmeral Data Fabric Streams C Applications (librdkafka)
- As of MapR maintenance release 5.2.1, you can develop C applications for HPE Ezmeral Data Fabric Streams.
The HPE Ezmeral Data Fabric Streams C Client is a distribution of librdkafka that integrates with
HPE Ezmeral Data Fabric Streams.
For more information, see HPE Ezmeral Data Fabric Streams C Applications.
- HPE Ezmeral Data Fabric Streams Python Applications
- As of MapR 5.2.1, you can create Python applications for HPE Ezmeral Data Fabric Streams using the MapR
Streams Python client. The Streams Python client is a binding for librdkafka and
supports high-level consumers.
For more information, see HPE Ezmeral Data Fabric Streams Python Applications.
Key Upgrades
- Apache Spark 2.1.0
- Spark 2.1 in the MapR converged data platform brings improvements in enterprise-ready
stability and security, including:
- More than 1200 fixes on the Spark 2.x line
- MapR-SASL support for encrypted Thrift-server connections
- Scalable partition handling
- Stable data-type APIs
For more information, see Apache Spark Feature Support.
- Apache Hive 2.1.1
- EEP 3.0 provides a faster version of Hive to improve the speed of data-processing
tasks, to reduce latency for interactive queries, and to increase throughput for batch
queries. Key improvements include:
- 2x faster ETL through an enhanced cost-based optimizer (CBO), faster type conversions, and dynamic partition pruning
- New HiveServer UI with new diagnostics and monitoring tools
- Dynamically partitioned hash joins, which operate on unsorted inputs and so eliminate the sorting step
- Vectorized query execution that greatly reduces the CPU usage for typical query operations, such as scans, filters, aggregates, and joins
For more information, see Hive.
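To make the idea of a partitioned hash join concrete, here is a minimal sketch in plain Python. It is not Hive's implementation, which runs inside the query engine and differs in detail. The function name `partitioned_hash_join`, the partition count, and the sample rows are assumptions made for illustration. The point it shows is that rows are routed by a hash of the join key and matched through a hash table, so neither input has to be sorted.

```python
from collections import defaultdict


def partitioned_hash_join(left, right, num_partitions=4):
    """Join two lists of (key, value) pairs without sorting either input."""
    # Route rows from both inputs to a partition chosen by hashing the join key.
    left_parts = defaultdict(list)
    right_parts = defaultdict(list)
    for key, value in left:
        left_parts[hash(key) % num_partitions].append((key, value))
    for key, value in right:
        right_parts[hash(key) % num_partitions].append((key, value))

    joined = []
    for partition_id, build_rows in left_parts.items():
        # Build a hash table on one side of the partition...
        table = defaultdict(list)
        for key, value in build_rows:
            table[key].append(value)
        # ...then probe it with the rows from the other side.
        for key, right_value in right_parts.get(partition_id, []):
            for left_value in table.get(key, []):
                joined.append((key, left_value, right_value))
    return joined


# Hypothetical sample data; note that neither input is sorted by key.
customers = [(2, "Bo"), (1, "Al")]
orders = [(3, "pen"), (1, "ink"), (2, "pad"), (1, "cap")]
result = sorted(partitioned_hash_join(customers, orders))
```

The final `sorted` call only makes the output order predictable for inspection. The join itself never sorts its inputs, which is the step a sort-merge join would otherwise need.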
- Apache Drill 1.10
- Drill 1.10 continues the series of iterative releases and is another important
milestone for Apache Drill. This release adds numerous enhancements for BI tool
integration, end-to-end security, performance, and usability. Highlights of
this release include:
- Tableau native connectivity
- Support for Kerberos and MapR-SASL authentication between the client and Drillbit
- Support for the CREATE TEMPORARY TABLE AS (CTTAS) command
- Ability to query data with Hue 3.12 (experimental only)
- Improved compatibility with Hive/Spark-generated Parquet files
For more information, see the Drill Introduction.
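As a rough illustration of the CTTAS command listed above, the sketch below assembles a `CREATE TEMPORARY TABLE ... AS` statement from a table name and a query. The helper `cttas_statement`, the table name `recent_orders`, and the `dfs` path are hypothetical. The resulting string would still have to be submitted through whichever Drill client you use, and the helper does no escaping or validation of its inputs.

```python
def cttas_statement(table_name, select_query):
    """Return a CREATE TEMPORARY TABLE ... AS statement for the given query."""
    # Drop surrounding whitespace and any trailing semicolon from the query.
    query = select_query.strip().rstrip(";")
    return f"CREATE TEMPORARY TABLE {table_name} AS {query}"


# Hypothetical table name and data path, for illustration only.
stmt = cttas_statement(
    "recent_orders",
    "SELECT order_id, total FROM dfs.`/data/orders` WHERE total > 100;",
)
```

A temporary table created this way is intended for intermediate results within a session, so later queries can select from `recent_orders` without repeating the original query.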