Using ADLS for Data Input or Output

You can use Azure Data Lake Store (ADLS) as a source or destination for your application data.

Prerequisites

For general information about the features of ADLS, refer to the Azure Data Lake Store documentation.

For information about configuring ADLS as storage for a Hadoop cluster, refer to the official Apache documentation.

The Azure Data Lake Store access path syntax is:
adl://<Account Name>.azuredatalakestore.net/
You can use ADLS the same way as any other file system by substituting the adl scheme for maprfs, hdfs, webhdfs, and so on. In the examples that follow, <username> occupies the account-name position of the path.
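
As a small illustration of the path syntax, the following sketch builds an adl:// path from an account name and an optional path inside the store. The function name adl_uri and the account name myaccount are hypothetical and are used only for this example.

```shell
# Hypothetical helper: build an ADLS path from an account name and a path.
# Usage: adl_uri <account-name> [path]
adl_uri() {
    local account="$1"
    local path="${2:-}"
    # Strip one leading slash so the result has exactly one slash after the host.
    path="${path#/}"
    printf 'adl://%s.azuredatalakestore.net/%s\n' "$account" "$path"
}
```

The result can then be passed to any hadoop fs command, for example: hadoop fs -ls "$(adl_uri myaccount /testdir)"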

Procedure

  1. Create a directory and list the contents of the ADLS root:
    [mapr@node4 ~]$ hadoop fs -mkdir adl://<username>.azuredatalakestore.net/testdir
    
    [mapr@node4 ~]$ hadoop fs -ls adl://<username>.azuredatalakestore.net/
    
    Found 1 items
    drwxr-xr-x - 9d3f4f74-8337-4dae-ad77-f63459438553 331c9f66-6875-4e13-a74f-458dd23e4bde 0 2018-04-16 09:09 adl://<username>.azuredatalakestore.net/testdir
  2. Put data into ADLS from your local file system:
    [mapr@node4 ~]$ hadoop fs -put testfile adl://<username>.azuredatalakestore.net/testdir
    
    [mapr@node4 ~]$ hadoop fs -ls adl://<username>.azuredatalakestore.net/testdir
    
    Found 1 items
    -rw-r--r-- 1 9d3f4f74-8337-4dae-ad77-f63459438553 331c9f66-6875-4e13-a74f-458dd23e4bde 0 2018-04-16 09:10 adl://<username>.azuredatalakestore.net/testdir/testfile
  3. Delete data from ADLS:
    [mapr@node4 ~]$ hadoop fs -rm -r adl://<username>.azuredatalakestore.net/testdir
                            
    [mapr@node4 ~]$ hadoop fs -ls adl://<username>.azuredatalakestore.net/
  4. Run YARN jobs with your input and output stored in ADLS. The input file must exist in ADLS, so upload it again if you deleted it in the previous step:
    yarn jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0-mapr-1710-SNAPSHOT.jar wordcount \
        adl://<username>.azuredatalakestore.net/testdir/testfile \
        adl://<username>.azuredatalakestore.net/wordcountout
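
The file system steps of the procedure (create a directory, put a file, list it, delete it) can be sketched as a single script. This is a minimal sketch, not a definitive implementation: the names run, adls_roundtrip, and DRY_RUN are hypothetical, and only the hadoop fs commands already shown above are used. With DRY_RUN=1 the commands are printed instead of executed, which makes it possible to review them before touching the store. The YARN step is left out because the jar path depends on the installed Hadoop version.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create testdir in ADLS, upload a local file, list it, then remove testdir.
# Stops at the first command that fails.
# Usage: adls_roundtrip <account-name> <local-file>
adls_roundtrip() {
    local root="adl://$1.azuredatalakestore.net"
    local file="$2"
    run hadoop fs -mkdir "$root/testdir" &&
    run hadoop fs -put "$file" "$root/testdir" &&
    run hadoop fs -ls "$root/testdir" &&
    run hadoop fs -rm -r "$root/testdir"
}
```

For example, setting DRY_RUN=1 and calling adls_roundtrip myaccount testfile prints the four hadoop fs commands with the account name substituted, without contacting ADLS.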