Cluster Scalability

The CLDB does not track the location of tables and files directly. Instead, it tracks the file system containers that hold them. Because this architecture keeps the CLDB small, it is practical to store tens of exabytes in a data-fabric cluster, regardless of the number of tables and files.

The location of containers in a cluster is tracked by that cluster's container location database (CLDB). CLDBs are updated only when a container is moved, when a node fails, or in response to periodic block change reports. The update rate is therefore relatively low, even for very large clusters. Because container locations change so rarely, the data-fabric filesystem can cache them for a long time and does not have to query the CLDB often.
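The caching behavior described above can be illustrated with a minimal Python sketch. It is not the data-fabric client implementation: the class ContainerLocationCache, its methods, the container ID, and the node names are all hypothetical, and a plain dictionary stands in for the CLDB.

```python
from typing import Callable, Dict, List


class ContainerLocationCache:
    """Hypothetical client-side cache of container locations."""

    def __init__(self, lookup: Callable[[int], List[str]]) -> None:
        self._lookup = lookup  # stand-in for a query to the CLDB
        self._locations: Dict[int, List[str]] = {}
        self.cldb_queries = 0  # how many times the CLDB was consulted

    def locate(self, container_id: int) -> List[str]:
        # Query the CLDB only on a cache miss.
        if container_id not in self._locations:
            self.cldb_queries += 1
            self._locations[container_id] = self._lookup(container_id)
        return self._locations[container_id]

    def invalidate(self, container_id: int) -> None:
        # Drop a stale entry, for example after a container has moved.
        self._locations.pop(container_id, None)


# Assumed data: one container replicated on three made-up nodes.
fake_cldb = {2049: ["node-a", "node-b", "node-c"]}
cache = ContainerLocationCache(lambda cid: fake_cldb[cid])

for _ in range(1000):
    nodes = cache.locate(2049)

print(cache.cldb_queries)  # 1
```

In this sketch, a thousand lookups of the same container cost a single CLDB query. A further query is needed only after invalidate is called, which mirrors the rare events (a container move or a node failure) that change a container's location.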

Moreover, CLDBs are very small in comparison to Apache Hadoop namenodes. A namenode tracks metadata and block information for every file, as well as the location of every block in every file. Because blocks are typically 200 MB in size on average, the total number of objects that a namenode tracks is very large. CLDBs, however, track containers, which are much larger objects, so the location information in a CLDB can be 100 to 1000 times smaller than the location information in a namenode. CLDBs do not track information about tables and files.
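A rough calculation shows where a ratio of this order can come from. The sketch below counts tracked objects only, ignoring replication and per-object metadata. The 200 MB block size is the figure given above. The 1 PB data set and the 32 GB container size are assumptions chosen for illustration and are not stated in this document.

```python
MB = 10**6
GB = 10**9
PB = 10**15


def tracked_objects(total_bytes: int, object_bytes: int) -> int:
    """Number of objects needed to cover total_bytes, rounded up."""
    return -(-total_bytes // object_bytes)


data = 1 * PB                                   # assumed data set size
blocks = tracked_objects(data, 200 * MB)        # objects a namenode tracks
containers = tracked_objects(data, 32 * GB)     # assumed container size
ratio = blocks / containers

print(blocks)      # 5000000
print(containers)  # 31250
print(ratio)       # 160.0
```

Under these assumptions, the namenode tracks 160 times as many objects as the CLDB. By the same arithmetic, the 100 to 1000 range corresponds to containers that are 100 to 1000 times larger than a 200 MB block, that is, roughly 20 GB to 200 GB.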