Drill 1.6.0 Release Notes

The following release notes are for the 1.6.0 version of the Apache Drill component included in the MapR distribution for Apache Hadoop. Release notes for prior releases are posted on the Apache Drill web site.

Version 1.6.0
Release Date April 4, 2016
MapR Version Interoperability MapR Drill 1.6.0 is certified on the MapR v5.1.0 converged data platform. See Interoperability Matrices and Drill Support Matrix.
Download Location See Installing Drill.

Noteworthy New Features in the MapR Distribution of Drill

This release provides enhanced query improvements with the following bug fixes and improvements:

Additional bug fixes and enhancements listed in the Apache Drill 1.6.0 Release Notes.

Resolved Issues

The following table lists resolved issues in Drill 1.6.0:
Issue Description
MD-803 LIMIT 0 optimization applies to the ranking and value window functions.
MD-784 LIMIT 0 optimization applies to the Hive UDF xpath_double.
MD-781 LIMIT 0 optimization applies to the TRIM function.
MD-608 Complex data functions FLATTEN and KVGEN no longer return null data when you issue queries through the ODBC driver.
MD-604 EXTRACT function with seconds no longer returns incorrect data when you issue queries through the ODBC driver.
MD-361 Drill honors the Hive skip.header.line.count and skip.footer.line.count table properties and skips header and footer lines when querying Hive tables created with these properties.

Known Issues

The following table lists known issues in Drill 1.6.0:
Issue Description
MD-851 Pushdown is disabled for columns with TIMESTAMP values which makes these columns case-insensitive. See Limitations.
MD-845 LIMIT 0 optimization does not apply when used in a query with a FULL OUTER JOIN and the planner.enable_mergejoin option is set to true.
MD-817 LIMIT 0 optimization does not apply when a query contains math operations between an interval type and a constant.
MD-802 LIMIT 0 optimization applies to the AVG window function.
MD-794 LIMIT 0 optimization does not apply when the return type is VARBINARY.
MD-787 LIMIT 0 optimization does not apply to the Hive BIN function.
MD-786 LIMIT 0 does not apply to the Hive HEX function.
MD-747 SUM function queries with a decimal argument and LIMIT 0 optimization returns the wrong data types and nullability.
MD-727 If you query a column that does not exist in the records of the queried table, Drill does not return any rows when it should return rows with null values. See Limitations.

Limitations

(MD-851) Pushdown filtering is disabled on columns with TIMESTAMP values, which results in these columns being case-insensitive. For example, if a JSON table is created with the following JSON records where the “access” columns are not of a particular case:
{"_id":"ID1","name":"Alice","age":"25","access":"2013-10-23 11:11:11","DOB":{"$dateDay":"1988-01-01"},"salary":"50.00"}
{"_id":"ID2","name":"Bob","age":"21","access":"2013-10-23 11:11:11","DOB":{"$dateDay":"1988-01-01"},"salary":"150.00"}
{"_id":"ID3","name":"Frank","age":"32","ACCESS":"2013-10-23 11:11:11","dob":{"$dateDay":"1988-01-01"},"salary":"250.00"}

Running a query on the “access” column returns all values because Drill does not push the filtering process down to HPE Ezmeral Data Fabric Database to get the subset of columns from records where the name is "access" in lowercase. Instead, HPE Ezmeral Data Fabric Database returns every value from each record that has a column named "access", regardless of the case. If some records include a column named "ACCESS" or "Access," these values are also returned in the result set.

SELECT * FROM dfs.`/tmp/data/student_db` where access = timestamp '2013-10-23 11:11:11';
+------+----------------------+------+-------------+-----------+---------+
| _id  |        access        | age  |     dob     |   name    | salary  |
+------+----------------------+------+-------------+-----------+---------+
| ID1  | 2013-10-23 11:11:11  | 25   | 1988-01-01  | Alice  | 50.00   |
| ID2  | 2013-10-23 11:11:11  | 21   | 1980-12-12  | Bob     | 150.00  |
| ID3  | 2013-10-23 11:11:11  | 32   | 1970-10-12  | Frank      | 250.00  |
+------+----------------------+------+-------------+-----------+---------+
3 rows selected (0.223 seconds)
In comparison, you can see that a query on the “dob” column in the same table produces only one result because the column does not contain TIMESTAMP values and is therefore case-sensitive:
SELECT * FROM dfs.`/tmp/data/student_db` where dob = date '1988-01-01';
+------+----------------------+------+-------------+-------+---------+
| _id  |        ACCESS        | age  |     dob     | name  | salary  |
+------+----------------------+------+-------------+-------+---------+
| ID3  | 2013-10-23 11:11:11  | 32   | 1988-01-01  | Frank  | 250.00  |
+------+----------------------+------+-------------+-------+---------+
1 row selected (0.359 seconds)

Although there are three records with “dob” columns, there is only one record with the column name in lowercase.

(MD-727) If you query one column that is partially or completely non-existent in the records of the queried table, Drill does not return any rows for the records when it should return the rows with null values. For example, if a JSON table is created from the following JSON records where only one record has a “time” column with a value and the other two records do not have time columns:
{"_id":"student1", "name":"Alice", "street":"123 Ballmer Av", "zipcode":12345, "state":"CA"},
{"_id":"student2", "name":"Bob", "time":"2016-03-08", "street":"1 Infinite Loop", "zipcode":12345, "state":"CA"},
{"_id":"student3", "name":"Frank", "street":"435 Walker Ct", "zipcode":12345, "state":"CA"}
The query returns only one row for the record in which the time column exists with a value instead of returning three rows with two rows having null values in the column.
SELECT `time` FROM dfs.`/user/root/students`;
+-----------+
| time      |
+-----------+
| 2016-03-08|
+-----------+
In comparison, if you query all of the columns in the table, you can see that the query returns all records and includes null values for records missing a time column.
SELECT * FROM dfs.`/user/root/students`;
+-----------+--------+--------+------------------+----------+-------------+
|    _id    |  name  | state  |      street      | zipcode  |    time     |
+-----------+--------+--------+------------------+----------+-------------+
| student1  | Alice  | CA     | 123 Ballmer Av   | 12345.0  | null        |
| student2  | Bob    | CA     | 1 Infinite Loop  | 12345.0  | 2016-03-08  |
| student3  | Frank  | CA     | 435 Walker Ct    | 12345.0  | null        |
+-----------+--------+--------+------------------+----------+-------------+
3 rows selected (0.314 seconds)