Integrate Pig and HBase
This document shows an example of integrating Pig with HBase. The example loads data from the file system into Pig and then stores the data in an HBase table.
IMPORTANT: This component is deprecated. Hewlett Packard Enterprise recommends using an alternate product. Deprecated components are either in maintenance or have reached the end of their maintenance lifecycle. For more information, see Discontinued Ecosystem Components.
Configuring Pig and HBase
No additional configuration is needed to integrate HBase and Pig.
Pig and HBase Integration Example
- Create sample data, and upload the data to the file system:
- Create a sample data file:
vim input.csv
- Add data to the file, one record per line:
1,aaa,bbb
2,ccc,ddd
3,rrr,fff
4,ttt,yyy
- Upload the data to the file system:
hadoop fs -put input.csv /user/mapr/input.csv
- Create a sample table in HBase:
- Start the HBase shell:
hbase shell
- Create a table:
hbase(main):012:0> create 'sample_names', 'info'
- Load the data into Pig, and store the data in HBase:
- Start the Pig shell:
pig
- Load the data into Pig:
raw_data = LOAD '/user/mapr/input.csv' USING PigStorage(',') AS (listing_id: chararray, fname: chararray, lname: chararray);
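- Store the data in HBase. HBaseStorage uses the first field of each tuple (listing_id) as the row key, so only fname and lname are mapped to columns: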
STORE raw_data INTO 'sample_names' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ('info:fname info:lname');
- Verify the data in HBase:
- Start the HBase shell:
hbase shell
- Query the data:
hbase(main):017:0> scan 'sample_names'
The result is:
ROW COLUMN+CELL
1 column=info:fname, timestamp=1574946889082, value=aaa
1 column=info:lname, timestamp=1574946889082, value=bbb
2 column=info:fname, timestamp=1574946889091, value=ccc
2 column=info:lname, timestamp=1574946889091, value=ddd
3 column=info:fname, timestamp=1574946889091, value=rrr
3 column=info:lname, timestamp=1574946889091, value=fff
4 column=info:fname, timestamp=1574946889091, value=ttt
4 column=info:lname, timestamp=1574946889091, value=yyy
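The steps above can also be run without interactive shells. The following is a minimal sketch, not a supported procedure. It assumes the hadoop, hbase, and pig clients are on the PATH and that the sample_names table does not exist yet. The script file name load_hbase.pig is hypothetical, and the -f (overwrite) option of hadoop fs -put, the -n (non-interactive) option of hbase shell, and the field-count check are additions that do not appear in the steps above.

```shell
#!/bin/sh
# Sketch: scripted version of the walkthrough.
# load_hbase.pig is an illustrative file name, not part of the original steps.
set -eu

# Create the sample data, one record per line.
cat > input.csv <<'EOF'
1,aaa,bbb
2,ccc,ddd
3,rrr,fff
4,ttt,yyy
EOF

# Sanity check: every record needs exactly three comma-separated fields
# (row key, fname, lname) to match the schema in the LOAD statement.
if ! awk -F',' 'NF != 3 { bad = 1 } END { exit bad }' input.csv; then
    echo "input.csv: expected 3 fields per line" >&2
    exit 1
fi

# Write the same LOAD and STORE statements to a Pig script file.
cat > load_hbase.pig <<'EOF'
raw_data = LOAD '/user/mapr/input.csv' USING PigStorage(',') AS (listing_id: chararray, fname: chararray, lname: chararray);
STORE raw_data INTO 'sample_names' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:fname info:lname');
EOF

# Run the cluster-side steps only where all three client tools are installed.
if command -v hadoop >/dev/null 2>&1 &&
   command -v hbase >/dev/null 2>&1 &&
   command -v pig >/dev/null 2>&1; then
    hadoop fs -put -f input.csv /user/mapr/input.csv
    echo "create 'sample_names', 'info'" | hbase shell -n
    pig -f load_hbase.pig
    echo "scan 'sample_names'" | hbase shell -n
else
    echo "hadoop, hbase, or pig not found; generated files only" >&2
fi
```

On a machine without the client tools, the script only writes input.csv and load_hbase.pig, which makes it possible to review both files before running them against a cluster.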