Integrate Pig and HBase
This document shows an example of a Pig and HBase integration. The goal of the integration is to load data from the MapR File System into Pig and then store the data in an HBase table.
IMPORTANT: This component is deprecated. Hewlett Packard Enterprise recommends using an alternate product. For more information, see Discontinued Ecosystem Components.
Configuring Pig and HBase
No additional configuration is needed to integrate HBase and Pig.
Pig and HBase Integration Example
- Create sample data, and upload the data to the MapR File System:
- Create a sample data file:
vim input.csv
- Add data to the file:
1,aaa,bbb
2,ccc,ddd
3,rrr,fff
4,ttt,yyy
- Upload the data to the MapR File System:
hadoop fs -put input.csv /user/mapr/input.csv
- Create a sample table in HBase:
- Start the HBase shell:
hbase shell
- Create a table:
hbase(main):012:0> create 'sample_names', 'info'
- Load the data into Pig, and store the data in HBase:
- Start the Pig shell:
pig
- Load the data into Pig:
raw_data = LOAD '/user/mapr/input.csv' USING PigStorage(',') AS (listing_id: chararray, fname: chararray, lname: chararray);
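- Store the data in HBase. The first field (listing_id) becomes the row key, and the remaining fields map, in order, to the listed columns: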
STORE raw_data INTO 'sample_names' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ('info:fname info:lname');
- Verify the data in HBase:
- Start the HBase shell:
hbase shell
- Query the data:
hbase(main):017:0> scan 'sample_names'
The result is:
ROW    COLUMN+CELL
1      column=info:fname, timestamp=1574946889082, value=aaa
1      column=info:lname, timestamp=1574946889082, value=bbb
2      column=info:fname, timestamp=1574946889091, value=ccc
2      column=info:lname, timestamp=1574946889091, value=ddd
3      column=info:fname, timestamp=1574946889091, value=rrr
3      column=info:lname, timestamp=1574946889091, value=fff
4      column=info:fname, timestamp=1574946889091, value=ttt
4      column=info:lname, timestamp=1574946889091, value=yyy
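The sample file can also be created without an editor, and the way its records map to HBase cells can be previewed locally before anything is stored. The following is a minimal sketch, not part of the original procedure: it writes the same four records with a here-document and uses awk to print the row key and column pairs that the scan is expected to show. The file name expected_cells.txt is a hypothetical name chosen for illustration, and the sketch does not contact HBase, so it produces no timestamps.

```shell
#!/bin/sh
# Sketch only: runs locally and does not require Hadoop, Pig, or HBase.
set -eu

# Write the same four sample records used in the example.
cat > input.csv <<'EOF'
1,aaa,bbb
2,ccc,ddd
3,rrr,fff
4,ttt,yyy
EOF

# Preview the mapping: the first field is the row key, and the second and
# third fields go to info:fname and info:lname, matching the column list
# passed to HBaseStorage. expected_cells.txt is a hypothetical file name.
awk -F',' '{
    printf "%s column=info:fname, value=%s\n", $1, $2
    printf "%s column=info:lname, value=%s\n", $1, $3
}' input.csv > expected_cells.txt

cat expected_cells.txt
```

After running the sketch, upload input.csv with the hadoop fs -put command shown in the example, and compare the preview with the scan output, ignoring the timestamp fields.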