Understanding the DELETE FROM Operation

In EEP 6.3.1 and later, you can use the DELETE FROM operation with HPE Ezmeral Data Fabric Database JSON tables.

Delete All Data from a Table

To delete all data from a HPE Ezmeral Data Fabric DatabaseJSON table use the following operator:

DELETE FROM <table_name>;

Example. In this example we create a table, insert data, and delete all rows:

DROP TABLE IF EXISTS customer;
CREATE TABLE customer(doc_id STRING, first_name STRING, last_name STRING) STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES("maprdb.table.name" = "/customer","maprdb.column.id" = "doc_id");
INSERT INTO TABLE customer VALUES ("001", "Max", "Born"), ("002", "Demmy", "John"), ("003", "Robby", "Smart");
SELECT doc_id, first_name, last_name FROM customer;
Table 1. Result of the Query
doc_id first_name last_name
001 Max Born
002 Demmy John
003 Robby Smart

The following query gives an empty set. Deletions are supported only for HPE Ezmeral Data Fabric Database JSON tables and transactional tables.

DELETE FROM customer;
SELECT doc_id, first_name, last_name FROM customer;

Example. In this example, we try to delete data from a non-transactional and non-MapR Database JSON table:

DROP TABLE IF EXISTS simple_data;
CREATE TABLE simple_data (id INT);
INSERT INTO TABLE simple_data VALUES (1), (2), (3);
DELETE FROM simple_data;

The result is:

FAILED: SemanticException Operation is not supported. Table is nor ACID neither MapRDbJSON

Delete a Single Row from a Table

To delete a single row from a HPE Ezmeral Data Fabric Database JSON table, use the following syntax:

DELETE FROM <table_name> WHERE <id> = <value>;

Where:

<table_name> is the HPE Ezmeral Data Fabric Database JSON table.

<id> is a key column of the MapR Database JSON table. It corresponds to the maprdb.column.id property.

<value> is the value to be deleted.

Example. In this example, we create a table, insert data, and delete a single row:

DROP TABLE IF EXISTS customer;
CREATE TABLE customer(doc_id STRING, first_name STRING, last_name STRING) STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES("maprdb.table.name" = "/customer","maprdb.column.id" = "doc_id");
INSERT INTO customer VALUES ("001", "Max", "Born"), ("002", "Demmy", "John"), ("003", "Robby", "Smart");
SELECT doc_id, first_name, last_name FROM customer;
Table 2. Result of the Query
doc_id first_name last_name
001 Max Born
002 Demmy John
003 Robby Smart

The following query deletes a single row using the WHERE clause:

DELETE FROM customer WHERE doc_id = "002";
SELECT doc_id, first_name, last_name FROM customer;
Table 3. Result of the Query
doc_id first_name last_name
001 Max Born
003 Robby Smart

Note. Deletions are supported only for key columns of HPE Ezmeral Data Fabric Database JSON tables.

Example. In this example, we try to use a column other than a key column of the MapR Database JSON table in deletion.

DROP TABLE IF EXISTS customer;
CREATE TABLE customer(doc_id STRING, first_name STRING, last_name STRING) STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES("maprdb.table.name" = "/customer","maprdb.column.id" = "doc_id");
INSERT INTO customer VALUES ("001", "Max", "Born"), ("002", "Demmy", "John"), ("003", "Robby", "Smart");
SELECT doc_id, first_name, last_name FROM customer;

Column first_name is not the key column of the table.

DELETE FROM customer WHERE first_name = "Max";

The result is:

FAILED: SemanticException Deletion over column first_name is forbidden. Use only key column of MapR Db Json table: doc_id

Delete Several Rows from a Table

To delete several rows from a table, use the following syntax:

DELETE FROM <table_name> WHERE <id> IN (<value1>, <value2>, ...);

Where:

<table_name> is the HPE Ezmeral Data Fabric Database JSON table.

<id> is a key column of the MapR Database JSON table. It corresponds to the maprdb.column.id property.

<value1>, <value2>, are values to be deleted.

Example. In this example, we create a table, insert data, and delete several rows:

DROP TABLE IF EXISTS customer;
CREATE TABLE customer(doc_id STRING, first_name STRING, last_name STRING) STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES("maprdb.table.name" = "/customer","maprdb.column.id" = "doc_id");
INSERT INTO TABLE customer VALUES ("001", "Max", "Born"), ("002", "Demmy", "John"), ("003", "Robby", "Smart");
SELECT doc_id, first_name, last_name FROM customer;
Table 4. Result of the Query
doc_id first_name last_name
001 Max Born
002 Demmy John
003 Robby Smart

The following query deletes several rows using the WHERE ... IN clause:

DELETE FROM customer WHERE doc_id IN ("001", "002");
SELECT doc_id, first_name, last_name FROM customer;
Table 5. Result of the Query
doc_id first_name last_name
003 Robby Smart

Delete All Rows in a Table Except Listed Rows

To delete all rows from a table except a listed row, use the following syntax:

DELETE FROM <table_name> WHERE <id> NOT IN (<value1>, <value2>, ...);

Where:

<table_name> is the HPE Ezmeral Data Fabric Database JSON table.

<id> is a key column of the MapR Database JSON table. It corresponds to the maprdb.column.id property.

<value1>, <value2>, are values to be preserved.

Example. In this example, we create a table, insert data, and delete all rows except the listed rows:

DROP TABLE IF EXISTS customer;
CREATE TABLE customer(doc_id STRING, first_name STRING, last_name STRING) STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES("maprdb.table.name" = "/customer","maprdb.column.id" = "doc_id");
INSERT INTO TABLE customer VALUES ("001", "Max", "Born"), ("002", "Demmy", "John"), ("003", "Robby", "Smart");
SELECT doc_id, first_name, last_name FROM customer;
Table 6. Result of the Query
doc_id first_name last_name
001 Max Born
002 Demmy John
003 Robby Smart

The following query deletes all rows except the listed rows:

DELETE FROM customer WHERE doc_id NOT IN ("003");
SELECT doc_id, first_name, last_name FROM customer;
Table 7. Result of the Query
doc_id first_name last_name
003 Robby Smart

Limitations of the DELETE FROM Operation

The following are three limitations of the current implementation:

  • The current implementation does not support arbitrary conditions in the WHERE clause of the DELETE statement even if a key column is used.

    Example. In this example, DELETE FROM is used with an arbitrary condition:

    DELETE FROM customer WHERE doc_id == "003" OR doc_id <> "005";
    The result is:
    FAILED:
            SemanticException This condition is not supported for MapR Db Json deletions. Supported
            WHERE clauses are: <id> = value, <id> IN (value1, value2, ...), <id> NOT IN (value1, value2,
            ...)
  • The current implementation does not support subqueries in the WHERE clause.
  • The current implementation does not support deletions in the MERGE statement.