File System

Discusses the features of the Data Fabric distributed file system and compares it to the Hadoop Distributed File System (HDFS).

The Data Fabric distributed file system provides a unified data solution for structured data (tables) and unstructured data (files). The file system is fully compliant with POSIX and Hadoop and is case sensitive.

The Data Fabric file system is a random-access, read-write distributed file system that allows applications to read and write data concurrently, directly to disk. By contrast, the Hadoop Distributed File System (HDFS) supports only append-only writes and can read only from closed files. Because HDFS is layered over the existing Linux file system, it generates a large number of input/output (I/O) operations, which decrease cluster performance. The Data Fabric distributed file system also eliminates the NameNode, which can be a single point of failure in other Hadoop distributions, and provides additional features for data management and high availability.
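To illustrate the difference between random read-write access and append-only writes, the following Python sketch overwrites bytes in the middle of an existing file and then reads the file through a second descriptor while the first one is still open. It uses only standard POSIX calls (`os.pwrite` and `os.pread`, available on Unix) and assumes the file is reachable through an ordinary POSIX path, for example a cluster accessed through one of the POSIX clients. The function name `overwrite_then_read` is hypothetical, this is not a Data Fabric API, and the demo uses a local temporary file so it runs anywhere.

```python
import os
import tempfile


def overwrite_then_read(path: str, offset: int, data: bytes) -> bytes:
    """Overwrite `data` at `offset` in place, then read the whole file back
    through a second descriptor while the writer is still open.
    (Hypothetical helper for illustration; not a Data Fabric API.)"""
    writer = os.open(path, os.O_RDWR)
    try:
        # pwrite writes at an absolute offset without moving the file position,
        # so this modifies existing bytes instead of appending to the end.
        os.pwrite(writer, data, offset)
        # Open an independent read descriptor before the writer is closed.
        reader = os.open(path, os.O_RDONLY)
        try:
            size = os.fstat(reader).st_size
            return os.pread(reader, size, 0)
        finally:
            os.close(reader)
    finally:
        os.close(writer)


if __name__ == "__main__":
    # A local temporary file stands in for a file on a POSIX-mounted cluster.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello world")
    print(overwrite_then_read(f.name, 6, b"WORLD"))  # b'hello WORLD'
    os.remove(f.name)
```

On a file system that accepts only appends, the `pwrite` at offset 6 would have no equivalent; the data would have to be rewritten to a new file.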

The storage architecture of the Data Fabric distributed file system is written in C/C++ rather than Java, so it is not affected by Java garbage-collection pauses. The architecture is also designed to prevent lock contention.

The following table highlights some of the features of the Data Fabric file system:
| Feature | Description |
|---|---|
| Storage pools | A group of disks to which the Data Fabric file system writes data. |
| Containers | An abstract entity that stores files and directories in the Data Fabric file system. A container always belongs to exactly one volume and can hold namespace information, file chunks, or table chunks for that volume. |
| CLDB | The Container Location Database, a service that tracks the location of every container. |
| Volumes | A management entity that stores and organizes containers. Volumes are used to distribute metadata, set permissions on data in the cluster, and back up data. A volume consists of a single name container and a number of data containers. |
| Direct Access NFS | Enables applications to read and write data directly to the cluster. |
| POSIX Clients | The loopbacknfs and FUSE-based POSIX clients connect to one or more Data Fabric clusters and allow app servers, web servers, and applications to write data directly and securely to the Data Fabric cluster. |
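The relationships among volumes, containers, and the CLDB described in the table can be modelled as a small toy sketch. This is not the Data Fabric implementation or API: the class names, the methods, the sequential container IDs, and the idea of storing a plain list of node names per container are all hypothetical assumptions chosen for illustration. The sketch only encodes what the table states: each container belongs to exactly one volume, a volume has one name container plus any number of data containers, and the CLDB answers "where is this container?".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    cid: int
    volume: str  # each container belongs to exactly one volume
    kind: str    # "name" (namespace information) or "data" (file/table chunks)


@dataclass
class Volume:
    name: str
    name_container: Container  # exactly one name container per volume
    data_containers: List[Container] = field(default_factory=list)


class CLDB:
    """Toy container-location lookup: container ID -> nodes holding it.
    (Hypothetical model; node names and ID scheme are assumptions.)"""

    def __init__(self) -> None:
        self._locations: Dict[int, List[str]] = {}
        self._next_cid = 1

    def _new_container(self, volume: str, kind: str, nodes: List[str]) -> Container:
        container = Container(self._next_cid, volume, kind)
        self._locations[container.cid] = list(nodes)  # record where it lives
        self._next_cid += 1
        return container

    def create_volume(self, name: str, nodes: List[str]) -> Volume:
        # A new volume starts with its single name container.
        return Volume(name, self._new_container(name, "name", nodes))

    def add_data_container(self, vol: Volume, nodes: List[str]) -> Container:
        container = self._new_container(vol.name, "data", nodes)
        vol.data_containers.append(container)
        return container

    def locate(self, cid: int) -> List[str]:
        # Return a copy so callers cannot mutate the location table.
        return list(self._locations[cid])


if __name__ == "__main__":
    cldb = CLDB()
    vol = cldb.create_volume("projects", ["node-a", "node-b"])
    data = cldb.add_data_container(vol, ["node-b", "node-c"])
    print(vol.name_container.cid, data.cid)  # 1 2
    print(cldb.locate(data.cid))             # ['node-b', 'node-c']
```

Keeping location lookup in one service, separate from the containers themselves, mirrors the division of roles in the table: volumes and containers organize data, while the CLDB only tracks where each container resides.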