Creating Column Families

Explains how to create column families using the Control System, the CLI, or the HBase shell.

About this task

There are several methods that you can use to create column families in HPE Ezmeral Data Fabric Database tables. To create column families, you must have the following permissions:

  • readAce and writeAce on the volume
  • lookupdir on directories in the path to the table
  • createrenamefamilyperm on the table

Creating Column Families Using the Control System

About this task

To create a column family from the Control System, go to Data > Tables and complete the following procedure.
NOTE This option is not available on the Kubernetes version of the Control System.

Procedure

  1. Click one of the following:
    • Take me to Add Column Family, after creating a new table.
    • Add Column Family, in the All pane of the Column Families tab on the table information page.

      See Viewing Table Information.

  2. In the Properties pane of the Add Column Family page, specify the following properties as needed. The tables below describe the fields in the Properties pane.
    Properties for JSON tables
    Field: Field Description
    Column Family Name: The name of the column family.
    JSON Path: The path to the column family in dotted notation. For example, suppose the table contains JSON documents of this general structure:
    {
         "_id" : "ID",
         "a" :
              {
                   "b" :
                        {
                             "c" : "value"
                        },
                   "e" : "value"
              }
    }
    To create a column family at a new field d nested within b, specify the path a.b.d.
    NOTE Ensure that the field at which you want to create the column family does not yet exist. If the field exists, it could become inaccessible after the column family is created.
    Compression: The compression setting to use for the column family. Valid options are off, lzf, lz4, and zlib. The default setting is the same as the compression setting for the directory where the table is located.
    Time-to-Live: The age, in seconds, after which data in this column family is purged. A value of 0 keeps the data indefinitely.
    NOTE If the Time-to-Live value of an existing column family in a JSON table is not 0, you cannot add another column family.
    In Memory: Determines whether preference is given to values of this column family for storage with row keys. Because row keys are cached in memory in preference to row data, column-family data that is stored inline with the row keys is also cached in memory.

    For all column families in a table together, up to 200 bytes of row data are stored inline with each row key. Storing data inline with a row key might speed retrieval of the data from a column family because disk access can often be avoided. For each column family, up to 32 bytes can be stored inline with each row key even if this setting is disabled (No), but preference is given to column families where it is enabled (Yes). A column family can have more than 32 bytes stored inline if this setting is enabled.

    If the total number of bytes for all column families together exceeds 200 for a row, then preference for inclusion within the inline storage for that row is given to column families that have this setting enabled.

    NOTE Either all of the data for a column family is stored inline with the row key, or none is. If the contents of a column family for a particular row are larger than the maximum number of bytes that can be stored inline for that column family, no data is stored inline for that column family.
    By default, this setting is enabled.
    Properties for binary tables
    Field: Field Description
    Column Family Name: The name of the column family.
    Version:
    • Minimum: The minimum number of versions of column values to keep. The default is zero.
    • Maximum: The maximum number of versions of column values to keep. The default is one.
    Compression: The compression setting to use for the column family. Valid options are off, lzf, lz4, and zlib. The default setting is the same as the compression setting for the directory where the table is located.
    Time-to-Live: The age, in seconds, after which data in this column family is purged. A value of 0 keeps the data indefinitely.
    In Memory: Determines whether preference is given to values of this column family for storage with row keys. Because row keys are cached in memory in preference to row data, column-family data that is stored inline with the row keys is also cached in memory.

    For all column families in a table together, up to 200 bytes of row data are stored inline with each row key. Storing data inline with a row key might speed retrieval of the data from a column family because disk access can often be avoided. For each column family, up to 32 bytes can be stored inline with each row key even if this setting is disabled (No), but preference is given to column families where it is enabled (Yes). A column family can have more than 32 bytes stored inline if this setting is enabled.

    If the total number of bytes for all column families together exceeds 200 for a row, then preference for inclusion within the inline storage for that row is given to column families that have this setting enabled.

    NOTE Either all of the data for a column family is stored inline with the row key, or none is. If the contents of a column family for a particular row are larger than the maximum number of bytes that can be stored inline for that column family, no data is stored inline for that column family.
    By default, this setting is enabled.
  3. Select Basic or Advanced to set up access controls (shown under the User Access Control pane) for the displayed column family, as needed. The options displayed differ depending on which you select, as explained below. For permission descriptions, see the JSON Table Data Access Control Permission Options or Binary Table Data Access Control Permission Options table below.
    NOTE By default, all permissions are given to the user creating the table. You can use either the default permissions that are automatically displayed or proceed to define new permissions for this column family.
    To grant or block access for users, groups, or roles:
    • In the Basic settings, select the type (public, user, group, or role) from the drop-down menu, specify the name of the user, group, or role, and select one or more checkboxes to grant permissions.
      TIP Click the corresponding icon to create a copy of the associated access control setting or to remove the associated access control expression.
      To add Access Control Expressions (ACEs) for another user, group, or role, click Add Another and repeat this step.
    • In the Advanced settings, specify the public (p), user (u), group (g), or role (r) entries that have or do not have the type of access, using boolean expressions and subexpressions built from the following operators:
      • ! — Negation operator.
      • & — AND operation.
      • | — OR operation.
      Use parentheses, (), for subexpressions.
      NOTE You cannot specify user, group, or role individually if access is granted to all users (public).

      Alternatively, click the icon associated with the type of access to open the Access Control Expression window and define access for public or for users, groups, and roles. See Defining ACEs Using the Access Control Expression Builder for more information.

    NOTE If you switch from Basic to Advanced, the basic settings, if any, are carried over to the advanced settings. If you switch from Advanced to Basic, all the settings are lost because the subexpressions and AND (&) and negation (!) operations that are supported by advanced settings are not supported in the basic settings.
    JSON Table Data Access Control Permission Options
    Option: Option Description
    Read Data: Can do column reads. Reads require permission both at the column-family level and at the field level. This permission is inherited by fields within the column family.
    Write Data: Can do column writes. Writes require permission both at the column-family level and at the field level. This permission is inherited by fields within the column family.
    Traverse Data: Can pass over fields in JSON documents. For example, suppose that a JSON table contains documents of this general structure:
    {
         "_id" : "ID",
         "a" :
              {
                   "b" : "value",
                   "c" : "value"
              }
    }
    Suppose further that the user sjohnson has read permission on a.b, but not on a. For sjohnson to read a.b, the user needs the traverse permission on a. The user can then pass over field a to a.b. This permission is inherited by fields within the column family.
    Set Compression: Can set or change the compression setting for the column family.
    Binary Table Data Access Control Permission Options
    Option: Option Description
    Read Data: Can do column reads. Reads require permission both at the column-family level and at the field level. This permission is inherited by fields within the column family.
    Write Data: Can do column writes. Writes require permission both at the column-family level and at the field level. This permission is inherited by fields within the column family.
    Append Data: Can do column appends. Column appends require permission both at the column-family level and at the column level.
    Set Version: Can set or change the maximum and minimum number of versions of column values to keep.
    Set Compression: Can set or change the compression setting for the column family.
  4. Click Add Column Family to add the column family to the table. The name of the newly created column family appears in the All pane of the table information page.
  5. Optionally, add field permissions to the newly created column family.
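
The boolean syntax accepted by the Advanced settings in step 3 can be illustrated with a couple of expressions. The following sketch is a dry run that only prints a command. The user, group, and role names, the table path, and the column family name are hypothetical. It also assumes that entries are written as u:<name>, g:<name>, and r:<name>, and that maprcli table cf create accepts such expressions through -readperm and -writeperm options; confirm both against the table cf create reference before relying on them.

```shell
#!/bin/sh
# Hypothetical ACE strings for illustration only.
READ_ACE='u:sjohnson | g:analysts'        # user sjohnson OR group analysts
WRITE_ACE='g:analysts & !r:contractor'    # group analysts AND NOT role contractor

# Print (do not run) a command that would apply the two expressions.
# Assumption: -readperm and -writeperm take ACE strings as their values.
print_cf_cmd_with_aces() {
    printf 'maprcli table cf create -path %s -cfname %s -readperm "%s" -writeperm "%s"\n' \
        "$1" "$2" "$3" "$4"
}

# Dry run with a hypothetical table path and column family name.
print_cf_cmd_with_aces /tables/demo cf2 "$READ_ACE" "$WRITE_ACE"
```

Quoting each expression keeps the shell from interpreting |, &, and ! itself.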

Creating Column Families Using the CLI or the REST API

About this task

To create a column family in a JSON table, include the -jsonpath and -force parameters:
maprcli table cf create -path <path> -cfname <name_of_column_family> -jsonpath <json_path> -force true
For the full list of options, see the table cf create command.
The -jsonpath parameter specifies the path to the column family. The path is in dotted notation. For example, suppose the table contained JSON documents that were of this general structure:
{
     "_id" : "ID",
     "a" :
          {
               "b" : 
                    {
                         "c" : "value"
                    },
               "e" : "value"
          }
}
Suppose you want to create a column family at a new field d, at the path a.b.d, because you plan to store image files in fields in that column family.
IMPORTANT Ensure that the field at which you want to create the column family does not yet exist. Also ensure that there are no secondary indexes defined on the field. If the field does exist or is a field in an index, the data in the field could become inaccessible after you create the column family.

By default, every time you try to create a non-default column family in a JSON table, this command fails and returns a warning message that you should ensure there is no existing data at the specified path. Set the -force parameter to true if you want to override this warning mechanism and create a column family.
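
As a sketch of the a.b.d example above, the following script assembles the command from the syntax shown earlier. The table path /tables/demo and the column family name cf_images are hypothetical. The script only prints the command (a dry run), so it can be inspected before being run against a real table.

```shell
#!/bin/sh
# Hypothetical values for illustration; replace them with your own.
TABLE_PATH="/tables/demo"   # assumed table path
CF_NAME="cf_images"         # assumed column family name
JSON_PATH="a.b.d"           # dotted path from the example above

# Assemble the JSON-table form of the command, including -force true.
build_json_cf_cmd() {
    printf 'maprcli table cf create -path %s -cfname %s -jsonpath %s -force true\n' \
        "$1" "$2" "$3"
}

# Dry run: print the command instead of executing it.
build_json_cf_cmd "$TABLE_PATH" "$CF_NAME" "$JSON_PATH"
```

Because -force true overrides the warning, check that no data or secondary index exists at the path before running the printed command.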

The command to create a column family for a binary table is:
maprcli table cf create -path <path> -cfname <name_of_column_family>
For the full list of options for this command, see the table cf create command.

The format of the value of the -path parameter depends on whether you are creating a table on a local cluster or a remote cluster.
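
The difference between the two path formats can be sketched as follows. This example assumes that a table on the local cluster is addressed by its path from the volume mount point, and that a table on a remote cluster is addressed with a /mapr/<cluster-name> prefix. Treat that convention as an assumption and confirm it in the table cf create reference. The cluster name sanfrancisco, the path /volume1/test, and the column family name cf2 are hypothetical.

```shell
#!/bin/sh
# table_path CLUSTER PATH
# Prints PATH unchanged for the local cluster (empty CLUSTER), or prefixes
# /mapr/CLUSTER for a remote cluster. The prefix convention is an assumption.
table_path() {
    if [ -z "$1" ]; then
        printf '%s\n' "$2"
    else
        printf '/mapr/%s%s\n' "$1" "$2"
    fi
}

# Dry run: print the binary-table command for the local and the remote case.
for cluster in "" sanfrancisco; do
    printf 'maprcli table cf create -path %s -cfname %s\n' \
        "$(table_path "$cluster" /volume1/test)" cf2
done
```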

Creating a Column Family for a Binary Table Using HBase Shell

About this task

After starting the HBase shell, run the alter command. Type help to see a list of commands and their syntax.
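
As an illustration, the standard HBase shell form of alter for adding a column family is alter '<table>', NAME => '<family>'. The sketch below only prints such a statement. It assumes that the binary table is identified by its path; the path /volume1/test and the family name cf2 are hypothetical.

```shell
#!/bin/sh
# Print an HBase shell statement that adds a column family to a table.
# Assumptions: the table is addressed by its path, and the family is new.
hbase_add_cf_stmt() {
    printf "alter '%s', NAME => '%s'\n" "$1" "$2"
}

# Dry run with hypothetical values.
hbase_add_cf_stmt /volume1/test cf2
# One way to execute the statement is to pipe it into the HBase shell:
#   hbase_add_cf_stmt /volume1/test cf2 | hbase shell
```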