Column Families in JSON Tables

JSON tables store data in column families. A column family is a collection of fields that are stored together on disk. You can use column families to improve the performance of your queries.

Each table has a default column family, which is the default storage for all fields in the documents of a table. You can create additional column families to store the data for a collection of fields in a separate location on disk. Queries and other operations that touch only the data in one column family are more efficient than the same operations on data that is stored together with the rest of the table. You can also cache values from a column family in memory.

Default Column Families

Suppose you have three JSON documents in a table and all three documents have the field a.

Figure 1. Schematic diagrams of three JSON documents, showing fields but not values, each document with a field named a

Every JSON table is created with a default column family. Because you have not yet created any non-default column families, all of the data in the table resides in the default column family.

Using Column Families to Optimize Data Access

To optimize data access for your applications, you plan to place some data that will be heavily queried in a new column family at path a.b, where b is a field that does not exist yet. Fields do not have to exist before you create column families on them.

Figure 2. The same three JSON documents, showing where the new column family will be created

You create a column family at the path a.b with the name CF1.

When you create field b, it will belong to the column family CF1. All values of b, as well as the values of all fields that might later be created under b, will be stored together on disk. Applications can read data directly from this column family without reading the rest of the document, which makes queries faster and more efficient.

Figure 3. The three JSON documents with column family CF1 in black
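
The following Python sketch models how a field path relates to the column family CF1 at a.b. It is an illustration only, not the storage engine's implementation: the function name column_family_for and the dictionary that maps paths to column family names are hypothetical, and the sketch assumes that a field belongs to a column family when its path equals the column family path or lies under it.

```python
def column_family_for(field_path, families, default="default"):
    """Return the name of the column family that would hold field_path.

    families maps a dotted JSON path to a column family name (hypothetical model).
    A field belongs to a family when its path equals the family path
    or lies under it; otherwise it stays in the default column family.
    """
    parts = field_path.split(".")
    # Compare whole path segments, walking from the full path toward the root.
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in families:
            return families[prefix]
    return default


families = {"a.b": "CF1"}
print(column_family_for("a.b", families))    # CF1
print(column_family_for("a.b.x", families))  # CF1
print(column_family_for("a.c", families))    # default
```

The comparison works on whole path segments, so a sibling field such as a.bc is not mistaken for a field under a.b.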

Creating Multiple Column Families

You can create up to 64 column families in a JSON table. The column families can be at any location in your documents. For example, these two documents both use the same non-default column families at the paths a.b, a.b.c, and d.

Figure 4. Two JSON documents that use the same non-default column families, which are highlighted in orange, blue, and green
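
As a rough illustration of nested column families, the Python sketch below groups the leaf fields of one document by column family for the paths a.b, a.b.c, and d. The column family names (CF_AB, CF_ABC, CF_D), the sample document, and the helper functions are hypothetical. The sketch also assumes that when column family paths are nested, a field is assigned to the deepest matching path; the source text does not state this rule explicitly.

```python
def leaf_paths(doc, prefix=""):
    """Yield (dotted_path, value) for every non-dict value in a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from leaf_paths(value, path)
        else:
            yield path, value


def group_by_family(doc, families, default="default"):
    """Group leaf fields by column family; the deepest matching path wins (assumption)."""
    groups = {}
    for path, value in leaf_paths(doc):
        parts = path.split(".")
        name = default
        for end in range(len(parts), 0, -1):
            candidate = ".".join(parts[:end])
            if candidate in families:
                name = families[candidate]
                break
        groups.setdefault(name, {})[path] = value
    return groups


# Hypothetical column family names for the three paths used in the text.
families = {"a.b": "CF_AB", "a.b.c": "CF_ABC", "d": "CF_D"}
doc = {"a": {"x": 1, "b": {"y": 2, "c": {"z": 3}}}, "d": {"w": 4}, "e": 5}

groups = group_by_family(doc, families)
for name in sorted(groups):
    print(name, groups[name])
```

In this model, a.x and e stay in the default column family, while the fields under a.b, a.b.c, and d are each grouped separately.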

Column Family Best Practices

If the path at which you want to create a column family already exists, make sure that the path and any fields under it contain no data. Data that exists at the path before it is converted to a column family might become inaccessible after the conversion.
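
One way to apply this practice is to check sample documents before creating the column family. The Python sketch below is a minimal illustration that works on documents already loaded as nested dictionaries. The function has_data_at is hypothetical, and it assumes that a missing path or an empty nested document counts as "no data".

```python
def has_data_at(doc, path):
    """Return True if the dotted path exists in doc and holds something other than {}."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False  # the path does not exist, so there is no data to lose
        node = node[key]
    # Assumption: an empty nested document counts as "no data"; anything else is data.
    return node != {}


print(has_data_at({"a": {"b": {"x": 1}}}, "a.b"))  # True
print(has_data_at({"a": {"b": {}}}, "a.b"))        # False
print(has_data_at({"a": {}}, "a.b"))               # False
```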

Applications and Column Families

Applications do not need to be aware of the existence of column families. They perform CRUD operations using the paths of fields in a document. For example, to update any of the fields under a.c, an application does not need to be aware that the field is in the column family at the path a.c. The application simply moves through the document along the path to the field.
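
The idea of moving through a document along a path can be sketched in Python with plain nested dictionaries. The helpers get_field and set_field and the field name count are hypothetical and do not represent any client API. The point of the sketch is that nothing in the code mentions a column family.

```python
def get_field(doc, path):
    """Follow a dotted path through nested dicts and return the value found there."""
    node = doc
    for key in path.split("."):
        node = node[key]
    return node


def set_field(doc, path, value):
    """Set the value at a dotted path, creating intermediate documents as needed."""
    *parents, last = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


doc = {"a": {"c": {"count": 1}}}
set_field(doc, "a.c.count", 2)      # update a field under a.c by path only
print(get_field(doc, "a.c.count"))  # 2
```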

Column Family Limitation

You cannot define a column family at a path that includes an array element. For example, the following command fails because the path a.b[0] contains an array index:

    maprcli table cf create -path /tbl-mcf -cfname abc -force true -jsonpath a.b[0]
    ERROR (22) -  Malformed path "a.b[0]", valid format is like "a.b.c".

For information about array fields, see JSON Document Field Paths.
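
A script that generates column family paths could screen them before calling maprcli. The Python sketch below approximates the format described in the error message (dot-separated field names with no array index). The function is_valid_cf_path and the regular expression are assumptions made for illustration; the actual validation rules applied by maprcli may be stricter or different.

```python
import re

# Assumption: a path segment is any non-empty run of characters
# that contains no ".", "[" or "]".
_SEGMENT = r"[^.\[\]]+"
_PATH = re.compile(rf"{_SEGMENT}(?:\.{_SEGMENT})*")


def is_valid_cf_path(json_path):
    """Return True if json_path looks like "a.b.c", with no array index."""
    return _PATH.fullmatch(json_path) is not None


print(is_valid_cf_path("a.b.c"))   # True
print(is_valid_cf_path("a.b[0]"))  # False
```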