MapR File System

The MapR Data Platform provides a unified data solution for structured data (tables) and unstructured data (files).

MapR File System (MapR-FS) is a random read-write distributed file system that allows applications to concurrently read and write directly to disk. The Hadoop Distributed File System (HDFS), by contrast, supports only append-only writes and can read only from closed files. Because HDFS is layered on top of the existing Linux file system, it requires more input/output (I/O) operations, which decreases cluster performance. MapR-FS also eliminates the NameNode, which is a single point of failure in other Hadoop distributions, and enables special features for data management and high availability.
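On a random read-write file system, a file can be updated in place at any offset and read while another handle still has it open. The paragraph above says HDFS allows neither. The Python sketch below shows both operations using only standard file I/O. It assumes the file is reached through a POSIX-compatible path, such as one exposed by a MapR-FS NFS or POSIX-client mount. The mount path mentioned in the comment and the file name are hypothetical, and the example uses a local temporary directory so it runs anywhere. It illustrates the access pattern only, not MapR-FS internals.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes at `offset`, leaving the rest of the file intact."""
    with open(path, "r+b") as f:  # "r+b": read/write access, no truncation
        f.seek(offset)
        f.write(data)


def read_before_close(path: str, extra: bytes) -> bytes:
    """Append `extra`, then read the file while the writer still holds it open."""
    with open(path, "ab") as writer:
        writer.write(extra)
        writer.flush()  # hand the bytes to the OS; the writer stays open
        with open(path, "rb") as reader:
            return reader.read()


if __name__ == "__main__":
    # Stand-in for a hypothetical path on a MapR-FS mount.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "job.status")
        with open(path, "wb") as f:
            f.write(b"status=PENDING\n")
        # Random write: replace "PENDING" with "SUCCESS" (same length) in place.
        overwrite_in_place(path, len(b"status="), b"SUCCESS")
        print(read_before_close(path, b"step=2\n"))
```

Running the script prints `b'status=SUCCESS\nstep=2\n'`. The in-place overwrite replaces exactly seven bytes, so the new value must be the same length as the old one. Changing the length of a value in the middle of the file would require rewriting every byte that follows it.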

The MapR-FS storage architecture is written in C/C++, which eliminates the performance impact of Java garbage collection. The architecture is also designed to prevent lock contention.

The following table highlights some of the MapR-FS features:
Feature | Description
Storage pools | A group of disks to which MapR-FS writes data.
Containers | An abstract entity that stores files and directories in MapR-FS. A container always belongs to exactly one volume and can hold namespace information, file chunks, or table chunks for that volume.
CLDB (Container Location Database) | A service that tracks the location of every container.
Volumes | A management entity that stores and organizes containers. Volumes are used to distribute metadata, set permissions on data in the cluster, and back up data. A volume consists of a single name container and a number of data containers.
Direct Access NFS | Enables applications to read data from and write data directly to the cluster.
POSIX Clients | The loopbacknfs and FUSE-based POSIX clients connect to one or more MapR clusters and allow app servers, web servers, and applications to write data directly and securely to the MapR cluster.
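The table describes several relationships. Every container belongs to exactly one volume, each volume has a single name container plus some number of data containers, and the CLDB tracks where each container lives. These relationships can be captured in a small in-memory model. The Python sketch below is a toy illustration under stated assumptions. The class names, the sequential container-ID allocation, and the node names are all hypothetical and do not reflect MapR's actual data structures or APIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Container:
    cid: int     # container ID (hypothetical sequential allocation)
    volume: str  # owning volume: a container belongs to exactly one volume
    kind: str    # "name" (namespace information) or "data" (file/table chunks)


@dataclass
class Volume:
    name: str
    name_container: Container  # exactly one name container per volume
    data_containers: list = field(default_factory=list)


class ToyCLDB:
    """Hypothetical stand-in for the CLDB: records where every container lives."""

    def __init__(self) -> None:
        self._locations: dict = {}  # container ID -> list of node names
        self._next_cid = 1

    def new_container(self, volume: str, kind: str, nodes: list) -> Container:
        """Allocate a container ID and record the node(s) assumed to hold it."""
        container = Container(self._next_cid, volume, kind)
        self._next_cid += 1
        self._locations[container.cid] = list(nodes)
        return container

    def locate(self, cid: int) -> list:
        """Return the node(s) recorded for a container ID."""
        return list(self._locations[cid])


def create_volume(name: str, data_count: int, cldb: ToyCLDB, nodes: list) -> Volume:
    """Create a volume with one name container and `data_count` data containers."""
    volume = Volume(name, cldb.new_container(name, "name", nodes))
    for _ in range(data_count):
        volume.data_containers.append(cldb.new_container(name, "data", nodes))
    return volume


if __name__ == "__main__":
    cldb = ToyCLDB()
    vol = create_volume("projects", 3, cldb, ["node-a", "node-b"])
    print(vol.name_container, [c.cid for c in vol.data_containers])
    print(cldb.locate(vol.data_containers[0].cid))
```

Because every container ID resolves to its location through one lookup service, a client that knows a container ID does not need to search the cluster. The toy version simply uses a dictionary for that map.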

For more information, see MapR-FS on MapR Converge Community.