Accessing Data with NFS v3

Describes how Data Fabric works with NFS v3.

Unlike other Hadoop distributions that only allow cluster data import or export as a batch operation, Data Fabric lets you mount the cluster itself using NFS for the HPE Ezmeral Data Fabric so that your applications can read and write data directly. Data Fabric allows direct file modification and multiple concurrent reads and writes using POSIX semantics. With an NFS-mounted cluster, you can read and write data directly with standard tools, applications, and scripts. For example, you could run a MapReduce application that outputs to a CSV file, then import the CSV file directly into SQL over the NFS mount.
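
As a minimal sketch of that last workflow, assume the cluster is mounted at /mapr/my.cluster.com, the job wrote its output to /user/alice/out/part-00000.csv, and a MySQL database named mydb with a table named results already exists with local_infile enabled (all of these names are hypothetical):

    # Inspect the job output directly through the NFS mount; no copy step is needed.
    head -1 /mapr/my.cluster.com/user/alice/out/part-00000.csv

    # Load the same file into MySQL straight from the mounted path.
    mysql --local-infile=1 -e "LOAD DATA LOCAL INFILE '/mapr/my.cluster.com/user/alice/out/part-00000.csv' INTO TABLE results FIELDS TERMINATED BY ','" mydb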

Data Fabric exports each cluster as the directory /mapr/<cluster name> (for example, /mapr/my.cluster.com). If you create a mount point with the local path /mapr, then Hadoop FS paths and NFS v3 paths to the cluster will be the same. This makes it easy to work on the same files using NFS v3 and Hadoop. In a multi-cluster setting, the clusters share a single namespace, and you can see them all by mounting the top-level /mapr directory.
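
For example, assuming a cluster named my.cluster.com mounted at the local path /mapr and a hypothetical directory /user/alice, the same files are reachable through either interface:

    # Through the Hadoop FS shell:
    hadoop fs -ls /user/alice

    # Through the NFS v3 mount, using the same relative path under the cluster directory:
    ls /mapr/my.cluster.com/user/alice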

WARNING Data Fabric uses version 3 of the NFS protocol. NFS version 4 bypasses the port mapper and attempts to connect only to the default port. If you are running NFS on a non-standard port, mounts from NFS version 4 clients time out. Use the -o nfsvers=3 option to specify NFS v3.
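
For instance, on a Linux client a mount command that forces NFS v3 might look like the following sketch; the node name nfsserver and the local mount point /mapr are placeholders for your environment:

    # Request protocol version 3 explicitly so the client does not attempt an NFS v4 mount.
    sudo mount -t nfs -o nfsvers=3,nolock nfsserver:/mapr /mapr
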
CAUTION NFS v3 clients can cache stale atime values from earlier reads, so you might observe incorrect atime values. To mitigate this, clear the client caches before checking file timestamps.
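
On a Linux client, one way to clear the cached attributes before checking timestamps is to drop the kernel caches; this is a general-purpose flush that requires root, not a Data Fabric-specific command:

    # Flush dirty pages, then drop the page, dentry, and inode caches,
    # which also discards cached NFS file attributes such as atime.
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches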

You can mount the cluster on a Linux, Mac, or Windows client. Before you begin, make sure you know the hostname and directory of the NFS v3 share you plan to mount.
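
From a Linux client, you can verify both pieces of information by querying the server's export list; nfsserver is a placeholder for the hostname of your NFS node:

    # List the directories exported by the NFS v3 server.
    showmount -e nfsserver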