Get Started with Pig
About this task
IMPORTANT: This component is deprecated. Hewlett Packard Enterprise recommends using an alternate product. For more information, see Discontinued Ecosystem Components.
In this tutorial, we will use Pig to run a MapReduce application that counts the words in the /in/constitution.txt file located in the mapr user's directory on the cluster, and stores the results in the wordcount directory.
Procedure
- Download the ZIP file that contains constitution.txt and then extract the constitution.txt file.
- Load the file onto the cluster and place it in the directory /user/mapr/in.
- In the terminal, type the command pig to start the Pig shell.
- At the grunt> prompt, type the following lines (press ENTER after each):
  A = LOAD '/user/mapr/in' USING TextLoader() AS (words:chararray);
  B = FOREACH A GENERATE FLATTEN(TOKENIZE(*));
  C = GROUP B BY $0;
  D = FOREACH C GENERATE group, COUNT(B);
  STORE D INTO '/user/mapr/wordcount';
  After you type the last line, Pig starts a MapReduce application to count the words in the file constitution.txt.
- When the MapReduce application is complete, type quit to exit the Pig shell, then look at the contents of the directory /user/mapr/wordcount (the path given in the STORE statement) to see the results.
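The statements typed at the grunt> prompt can also be kept in a script file. The following is a minimal sketch of the same word count with one comment per step. The file name wordcount.pig and the relation and field names (lines, words, grouped, counts, line, word, n) are illustrative assumptions and do not appear in the procedure above. The paths are the ones used in the procedure.

```pig
-- wordcount.pig (hypothetical file name)
-- Read every line of every file under /user/mapr/in as a single chararray field.
lines   = LOAD '/user/mapr/in' USING TextLoader() AS (line:chararray);

-- TOKENIZE splits each line into a bag of words; FLATTEN emits one row per word.
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Collect identical words into one group per distinct word.
grouped = GROUP words BY word;

-- For each group, output the word and the number of rows in its bag.
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;

-- Write the (word, count) pairs to the output directory.
-- STORE is the statement that triggers the MapReduce application.
STORE counts INTO '/user/mapr/wordcount';
```

Assuming the script is saved under that name, running pig wordcount.pig from the terminal should execute it without opening the Pig shell. The run typically fails if the output directory /user/mapr/wordcount already exists, so remove or rename it before repeating the run.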