Understanding the MERGE Statement

Simple `merge.maprdb.column.id` is the join key

Consider merging the following example source and target tables:

Table 1. `customer_source`
id	first_name	last_name	age
001	Dorothi	Hogward	7777
002	Alex	Bowee	7777
088	Robert	Dowson	25

Table 2. `customer_db_json_target`
id	first_name	last_name	age
001	John	Smith	45
002	Michael	Watson	27
003	Den	Brown	33

You can use the following SQL-standard MERGE statement:

MERGE into customer_db_json_target trg 
USING customer_source src 
ON src.id = trg.id 
WHEN MATCHED THEN UPDATE SET age = src.age 
WHEN NOT MATCHED THEN 
INSERT VALUES (src.id, src.first_name, src.last_name, src.age);

The result is:

id	first_name	last_name	age
001	John	Smith	7777
002	Michael	Watson	7777
003	Den	Brown	33
088	Robert	Dowson	25

NOTE The age column is updated and a new id column is inserted.

Simple `merge.maprdb.column.id` is not the join key

Merging when merge.maprdb.column is not the join key is not recommended.

DELETE syntax in the MERGE statement

This section describes how to use the DELETE syntax in the MERGE statement for HPE Ezmeral Data Fabric Database JSON tables. Included are examples of usage and limitations.

Consider two tables: tgt which is the target table of the MERGE statement, and src, which is the source table from which data will be taken. Both tables use MapRDBJsonStorageHandler to store data. The following table shows the initial contents of the tgt table:

Table 3. Initial contents of `tgt` table before `MERGE`
id	Value
1	AAA
2	BBB
3	CCC
4	DDD
5	EEE

The following table shows the initial contents of the src table:

Table 4. Initial contents of `src` table
id	Value
1	AAA
222	BBB---
3	CCC
444	DDD---
5	EEE

The following merge statement contains a WHEN MATCHED THEN DELETE clause. It means that if the id from the tgt table equals the id from the src table, the row is removed from the tgt table. When the value of id does not match, a new row is inserted into the tgt table:

MERGE INTO tgt 
USING src ON tgt.id=src.id 
WHEN MATCHED THEN DELETE 
WHEN NOT MATCHED THEN INSERT VALUES (src.id, src.value);

The following table shows the result of the merge:

Table 5. Contents of `tgt` table after merge
id	Value
2	BBB
222	BBB---
4	DDD
444	DDD---

Here we removed rows with id 1, 3, and 5 from the tgt table because they existed in the src table, and they matched values from the tgt table. We did not touch rows with id 2 and 4, because there were no such values in the src table. We inserted new rows with id values 222 and 444 because they existed in the src table and did not exist in the tgt table.

Limitations

The preceding solution has three limitations:

Subqueries are not supported as a source when DELETE is used.
The source table should be a HPE Ezmeral Data Fabric Database JSON table when deletion is used in a MERGE operator.
The DELETE operator is not supported with additional conditions after WHEN MATCHED. Use either a single UPDATE or DELETE.

Limitation #3 means that queries like the following are not supported:

MERGE INTO tgt
USING src
ON tgt._id = src._id
WHEN MATCHED AND [boolean expression1] THEN DELETE
WHEN MATCHED AND [boolean expression2] THEN UPDATE
WHEN NOT MATCHED THEN INSERT

Multiple source rows match a given target row (cardinality violation)

Consider merging the two tables customer_db_json and customer_new:

Table 6. `customer_db_json`
id	first_name	last_name	age
001	John	Smith	45
002	Michael	Watson	27
003	Den	Brown	33

And:

Table 7. `customer_new`
id	first_name	last_name	age
001	Dorothi	Hogward	77
001	Dorothi	Hogward	77
088	Robert	Dowson	25

To MERGE customer_new and customer_db_json:

MERGE INTO customer_db_json trg 
USING customer_new src ON src.id = trg.id 
WHEN MATCHED THEN UPDATE 
SET first_name = src.first_name, 
last_name = src.last_name 
WHEN NOT MATCHED THEN INSERT VALUES
(src.id, src.first_name, src.last_name, src.age);

This example causes an exception because of duplicate values in the id column in the customer_new table:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating cardinality_violation(_col0)

To avoid cardinality violation, set hive.merge.cardinality.check=false, but in this case the result is unpredictable because there is no rule that defines the order of duplicated data that will be inserted by using the MERGE statement.

Merge on mixed data types

The merge operation also supports mixed data types, such as arrays, maps, and structures.

Consider two tables mixed_types_source and mixed_types_target:

Table 8. `mixed_types_source`
doc_id	user_info
1	{"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"}
2	{"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}
3	{"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"}
4	{"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"}

And:

Table 9. `mixed_types_target`
id	user_info
1	{"name":"Lexx","surname":"Comfuzer","age":31,"gender":"MALE"}

To merge mixed_types_source and mixed_types_target:

MERGE INTO mixed_types_target trg 
USING mixed_types_source src 
ON src.doc_id = old.doc_id 
WHEN MATCHED THEN UPDATE 
SET user_info = src.user_info 
WHEN NOT MATCHED THEN INSERT VALUES 
(src.doc_id, src.user_info);

The result is:

Table 10. Result of `MERGE` operator
id	first_name
1	`{"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"}`
2	`{"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}`
3	`{"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"}`
4	`{"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"}`

Note that you cannot update only a part of a complex structure field. For example, suppose you have a structure stored as one field in a Hive table:

{"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}

You cannot update only the age field in the structure. You can only replace all values of the structure with new ones. For details, see Understanding the UPDATE Statement.

Merge into external HPE Ezmeral Data Fabric Database JSON tables

The MERGE operator is also available for external HPE Ezmeral Data Fabric Database JSON tables. You can use the MERGE statement to insert and update values in external MapR database JSON table targets.

Merge into partitioned HPE Ezmeral Data Fabric Database JSON tables

Partitioned HPE Ezmeral Data Fabric Database JSON tables are not supported.

Merge into temporary HPE Ezmeral Data Fabric Database JSON tables

The MERGE operator is also available for temporary HPE Ezmeral Data Fabric Database JSON tables. Use temporary tables as target tables for merge. No additional syntax is needed.