Hive 2.1 and Tez 0.8
You can use Tez instead of MapReduce for generic data processing tasks. Tez significantly increases processing speed. When used with Hive, it provides lower latency for interactive queries and higher throughput for batch queries. Key Hive 2.1 improvements include:
- Added the aes_encrypt and aes_decrypt UDFs for encrypting and decrypting input using AES (Advanced Encryption Standard). The Oracle JRE supports AES-128 out of the box. AES-192 and AES-256 are supported only if the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed.
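- Added the BROUND UDF for banker's rounding. With banker's rounding, a value that lies exactly halfway between two candidates is rounded to the nearest even number (for example, 2.5 rounds to 2 and 3.5 rounds to 4). All other values are rounded to the nearest candidate as usual. Banker's rounding is also known as "Gaussian rounding" and, in German, "mathematische Rundung".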
- ORC file dump in JSON format.
Previously, the ORC file dump used only a custom format. Dumping ORC metadata in JSON format makes it easier to build other tools on top of it.
- Provided a way for developers and users to modify the numRows and dataSize statistics of a table or partition. Although these values are part of the table properties, in prior versions they were set to -1 when the task did not come from a statsTask.
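The AES key-size rule above can be illustrated with the small helper below. It is a hypothetical sketch, not part of Hive: the name aes_variant and the jce_unlimited flag are invented for illustration. It assumes keys must be 16, 24, or 32 bytes (128, 192, or 256 bits) and that an unusable key produces NULL, modeled here as None. Treating a 192- or 256-bit key on a JRE without the JCE policy files as "unusable" is also an assumption.

```python
from typing import Optional

# Hypothetical helper, not a Hive API: maps an AES key length (in bytes) to the
# AES variant it selects and whether the JCE Unlimited Strength Jurisdiction
# Policy Files are needed on an Oracle JRE.
AES_VARIANTS = {
    16: ("AES-128", False),  # supported out of the box
    24: ("AES-192", True),   # needs the JCE Unlimited Strength policy files
    32: ("AES-256", True),   # needs the JCE Unlimited Strength policy files
}


def aes_variant(key: bytes, jce_unlimited: bool) -> Optional[str]:
    """Return the AES variant a key selects, or None if the key cannot be used.

    None stands in for the NULL result assumed for an unsupported key.
    """
    entry = AES_VARIANTS.get(len(key))
    if entry is None:
        return None  # key length is not 128, 192, or 256 bits
    name, needs_unlimited = entry
    if needs_unlimited and not jce_unlimited:
        return None  # assumed: restricted JRE cannot use keys longer than 128 bits
    return name
```

For example, a 16-character ASCII key such as "1234567890123456" selects AES-128 on any JRE.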
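The banker's rounding rule behind BROUND can be sketched with Python's decimal module, whose ROUND_HALF_EVEN mode implements round-half-to-even. This is an illustration of the rounding rule, not Hive's implementation. The function name bround and its scale parameter (number of decimal places) are chosen here for illustration. Inputs are passed as strings so that binary floating-point representation error (for example, 8.35 not being exactly representable as a float) does not affect which way a tie goes.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def bround(value: str, scale: int = 0) -> Decimal:
    """Banker's rounding: ties go to the nearest even digit at `scale` places."""
    quantum = Decimal(1).scaleb(-scale)  # scale=1 -> 0.1, scale=0 -> 1
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


def round_half_up(value: str, scale: int = 0) -> Decimal:
    """Conventional rounding, for comparison: ties always move away from zero."""
    quantum = Decimal(1).scaleb(-scale)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


print(bround("2.5"), bround("3.5"), bround("8.25", 1), bround("8.35", 1))
print(round_half_up("2.5"), round_half_up("8.25", 1))
```

The two functions differ only on exact ties: bround("2.5") gives 2 while round_half_up("2.5") gives 3. Python's built-in round() also rounds half to even, but it operates on binary floats.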
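Because the JSON dump is meant to let other tools consume ORC metadata, a minimal consumer could look like the sketch below. It assumes the dump text has already been captured, for example from hive --orcfiledump -j -p followed by a file path. Verify those options against your Hive release. The top-level field names numberOfRows and stripes are assumptions about the JSON layout, and summarize_orc_dump is a hypothetical function, so adjust both to match the actual output.

```python
import json
from typing import Any, Dict

# Assumed top-level field names in the JSON dump; check them against real output.
ROWS_KEY = "numberOfRows"
STRIPES_KEY = "stripes"


def summarize_orc_dump(dump_text: str) -> Dict[str, Any]:
    """Hypothetical tool: extract a small summary from an ORC file dump in JSON."""
    metadata = json.loads(dump_text)
    stripes = metadata.get(STRIPES_KEY, [])
    return {
        "rows": metadata.get(ROWS_KEY),  # None if the assumed key is absent
        "stripe_count": len(stripes),
    }
```

Keeping the field names in module-level constants makes it easy to adapt the sketch if the real dump uses different keys.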
MapR Hive on Tez also includes the following:
- Dynamically partitioned hash join for Tez.
- Support for aggregate push down through joins.
- DBTokenStore support for HiveServer2 (HS2) delegation tokens.
- Column authorization for Hive views.
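- Added the substring_index(str, delim, count) UDF, which returns the substring of str before count occurrences of the delimiter delim. If count is negative, it returns everything after the count-th occurrence of delim, counting from the right.
- Added the QUARTER(date/timestamp/string) function, which returns the quarter of the year (in the range 1 to 4) for a date, timestamp, or string.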
- Support for limited integer type promotion in ORC.
- Hive parser support for multi-column IN clauses of the form (x, y, ...) IN ((...), ..., (...)).
- Support for special characters in quoted table names.
- Support for SHOW CREATE DATABASE.
- Support for escaping carriage returns and newlines in LazySimpleSerDe.
- Support for vectorized processing of TEXTFILE and other input formats, for better Map vertex performance.
- Support for NULLS FIRST/NULLS LAST.
The NULLS FIRST and NULLS LAST options can be used to determine whether nulls appear before or after non-null data values when the ORDER BY clause is used.
- Support for aggregate functions in the OVER clause.
- Command to kill (abort) an ACID transaction: ABORT TRANSACTIONS.
This command cleans up all state related to the transaction. If the client that initiated the transaction is still alive, its next attempt to heartbeat or commit fails with an error, so it becomes aware that the transaction failed.
- Hive Hybrid Procedural SQL On Hadoop (HPL/SQL).
HPL/SQL, which is available in Hive 2.1, is an open source tool that implements a procedural SQL language for Apache Hive, Spark SQL, and Impala. It also supports other SQL-on-Hadoop implementations, NoSQL stores, and RDBMSs.
HPL/SQL is a hybrid and heterogeneous language. It understands the syntax and semantics of almost any existing procedural SQL dialect, and you can use it with any database. For example, you can run existing Oracle PL/SQL code on Apache Hive and Microsoft SQL Server, or run Transact-SQL on Oracle, Cloudera Impala, or Amazon Redshift.
NOTE: Create the hplsql-site.xml file to configure the HPL/SQL feature. See http://www.hplsql.org/configuration for more information.
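The dynamically partitioned hash join listed above builds on the general partitioned hash join technique. Both inputs are hash-partitioned on the join key so that matching rows land in the same partition. Each partition is then joined independently by building an in-memory hash table from one side and probing it with the other. The single-process sketch below shows only that generic technique, not Hive's or Tez's implementation. The function name, the (key, payload) row shape, and the fixed partition count are illustrative assumptions. In a distributed engine, each partition would be handled by a separate task.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, str]  # hypothetical row shape: (join key, payload)


def partitioned_hash_join(build: List[Row], probe: List[Row],
                          num_partitions: int) -> List[Tuple[Hashable, str, str]]:
    """Inner-join two inputs on their keys using hash partitioning."""
    # 1. Shuffle: route every row of both inputs to a partition by hashing its key,
    #    so rows with equal keys always meet in the same partition.
    build_parts: Dict[int, List[Row]] = defaultdict(list)
    probe_parts: Dict[int, List[Row]] = defaultdict(list)
    for key, payload in build:
        build_parts[hash(key) % num_partitions].append((key, payload))
    for key, payload in probe:
        probe_parts[hash(key) % num_partitions].append((key, payload))

    result: List[Tuple[Hashable, str, str]] = []
    # 2. Join each partition independently: build a hash table from the build
    #    side (ideally the smaller input), then probe it with the other side.
    for p in range(num_partitions):
        table: Dict[Hashable, List[str]] = defaultdict(list)
        for key, payload in build_parts[p]:
            table[key].append(payload)
        for key, payload in probe_parts[p]:
            for build_payload in table.get(key, []):
                result.append((key, build_payload, payload))
    return result
```

Because each partition holds only the rows for its share of the keys, the hash table for any one partition can be much smaller than the full build input.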
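The substring_index semantics described above can be sketched in Python. This is an illustration, not Hive's code. Returning an empty string for a count of 0 or an empty delimiter is an assumption about edge-case behavior. When the delimiter occurs fewer times than the absolute value of count, the whole string is returned.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Sketch of substring_index(str, delim, count).

    count > 0: everything before the count-th occurrence of delim, from the left.
    count < 0: everything after the |count|-th occurrence of delim, from the right.
    """
    if count == 0 or delim == "":
        return ""  # assumed edge-case behavior
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


print(substring_index("www.apache.org", ".", 2))   # www.apache
print(substring_index("www.apache.org", ".", -2))  # apache.org
```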
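The QUARTER function listed above maps months 1-3 to quarter 1, months 4-6 to quarter 2, and so on. The formula is (month - 1) // 3 + 1. The sketch below applies it to dates, timestamps, and strings. Parsing strings as yyyy-MM-dd (optionally followed by a time) and returning None, the analogue of NULL, for unparseable strings are assumptions made for illustration.

```python
from datetime import date, datetime
from typing import Optional, Union


def quarter(value: Union[date, datetime, str]) -> Optional[int]:
    """Return the quarter of the year (1-4) for a date, timestamp, or string."""
    if isinstance(value, str):
        try:
            # Assumed string format: 'yyyy-MM-dd', optionally followed by a time.
            value = datetime.strptime(value[:10], "%Y-%m-%d").date()
        except ValueError:
            return None  # assumed: unparseable strings yield NULL
    return (value.month - 1) // 3 + 1


print(quarter("2015-04-08"))  # 2
```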
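The effect of NULLS FIRST and NULLS LAST in an ORDER BY clause (for example, ORDER BY col ASC NULLS LAST) can be modeled in Python, with None standing in for NULL. The function name order_by is hypothetical. The defaults used when no option is given (NULLS FIRST for ascending order, NULLS LAST for descending order) reflect Hive's documented behavior as understood here. Treat them as an assumption and confirm them for your release.

```python
from typing import List, Optional


def order_by(values: List[Optional[int]], descending: bool = False,
             nulls_first: Optional[bool] = None) -> List[Optional[int]]:
    """Sort values, placing None (NULL) before or after the non-null values."""
    if nulls_first is None:
        # Assumed defaults: ASC -> NULLS FIRST, DESC -> NULLS LAST.
        nulls_first = not descending
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls


print(order_by([3, None, 1, 2], nulls_first=False))  # [1, 2, 3, None]
```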
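The abort behavior described for the ACID transaction kill command can be illustrated with a toy in-memory model. Killing a transaction discards its state, and the initiating client only learns of the abort when its next heartbeat or commit fails. The TxnManager class and the TransactionNotOpen exception are hypothetical names and do not reflect Hive's metastore code or API. In HiveQL itself, the command takes one or more transaction IDs, such as those listed by SHOW TRANSACTIONS.

```python
from typing import Set


class TransactionNotOpen(Exception):
    """Raised when a client uses a transaction that is no longer open."""


class TxnManager:
    """Toy model of the abort semantics described above; not Hive's implementation."""

    def __init__(self) -> None:
        self._open: Set[int] = set()
        self._next_id = 1

    def open_txn(self) -> int:
        txn_id = self._next_id
        self._next_id += 1
        self._open.add(txn_id)
        return txn_id

    def abort(self, txn_id: int) -> None:
        # Administrative kill: drop all state associated with the transaction.
        self._open.discard(txn_id)

    def heartbeat(self, txn_id: int) -> None:
        if txn_id not in self._open:
            raise TransactionNotOpen(f"transaction {txn_id} is not open")

    def commit(self, txn_id: int) -> None:
        self.heartbeat(txn_id)  # an aborted transaction cannot commit
        self._open.remove(txn_id)
```

The client is not notified at the moment of the kill; it discovers the failure on its next interaction, which matches the behavior described above.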
The following features are not supported in MapR Hive 2.1:
- Hive on Spark
You cannot use Spark as an execution engine for Hive. However, you can run Hive and Spark on the same cluster. You can also use Spark SQL and Drill to query Hive tables.
- HDFS encryption in Hive tables
- HBase 0.9x with Hive 2.1
Only HBase 1.x is compatible with Hive 2.1.
- LLAP with Hive 2.1, because Apache Slider is not part of the MapR ecosystem.
- Apache Knox and Apache Ranger
HiveServer2 HTTP mode does not support using the X-Forwarded-Host header for authorization and audits.
- Masking and filtering of rows and columns, because Apache Ranger is not part of the MapR ecosystem.