HPE Ezmeral Data Fabric 6.2 is In Maintenance and transitions to "End of Maintenance" in June 2024. Please see the latest documentation.

Cluster Scalability

The CLDB does not track tables and files directly. Instead, it tracks the file system containers that hold them. Because this architecture keeps the CLDB small, it is practical to store tens of exabytes in a data-fabric cluster, regardless of the number of tables and files.

The location of containers in a cluster is tracked by that cluster's container location database (CLDB). A CLDB is updated only when a container is moved, when a node fails, or when a periodic block change report arrives. The update rate is therefore relatively low, even for very large clusters. As a result, the data-fabric file system can cache container locations for long periods and rarely needs to query the CLDB.
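The caching behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCLDB`, `CachingClient`, and the explicit `invalidate` step are hypothetical names invented for illustration and are not part of any Data Fabric API. The sketch shows only that repeated reads of the same container need a single location lookup until the container moves.

```python
class ToyCLDB:
    """Hypothetical stand-in for a CLDB: maps container id -> node name."""

    def __init__(self, locations):
        self._locations = dict(locations)
        self.lookups = 0  # number of location queries served

    def locate(self, container_id):
        self.lookups += 1
        return self._locations[container_id]

    def move(self, container_id, new_node):
        # A container move is one of the few events that updates the CLDB.
        self._locations[container_id] = new_node


class CachingClient:
    """Hypothetical client that caches container locations indefinitely."""

    def __init__(self, cldb):
        self._cldb = cldb
        self._cache = {}

    def node_for(self, container_id):
        # Query the CLDB only on a cache miss.
        if container_id not in self._cache:
            self._cache[container_id] = self._cldb.locate(container_id)
        return self._cache[container_id]

    def invalidate(self, container_id):
        # Assumption: the client drops an entry once it learns it is stale.
        self._cache.pop(container_id, None)


if __name__ == "__main__":
    cldb = ToyCLDB({1: "node-a", 2: "node-b"})
    client = CachingClient(cldb)

    for _ in range(1000):
        client.node_for(1)
    print(cldb.lookups)  # 1000 reads, one CLDB query

    cldb.move(1, "node-c")
    client.invalidate(1)
    print(client.node_for(1), cldb.lookups)
```

In this model the number of CLDB queries depends on how often containers move, not on how often data is read.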

Moreover, CLDBs are very small compared to Apache Hadoop NameNodes. A NameNode tracks metadata and block information for all files, as well as the location of every block in every file. Because a typical block is 200 MB in size, the total number of objects that a NameNode tracks is very large. CLDBs, however, track containers, which are much larger objects, so their location information can be 100 to 1000 times smaller than that of a NameNode. CLDBs do not track information about tables and files.
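The size comparison above is simple arithmetic on object counts. The following sketch works it through for 1 PB of data. The 200 MB block size comes from the text. The two container sizes (20 GB and 200 GB) are assumptions chosen only to reproduce the 100x and 1000x ends of the quoted range; they are not documented Data Fabric values, and the function names are illustrative.

```python
MB = 10**6
GB = 10**9
PB = 10**15

BLOCK_SIZE = 200 * MB  # typical block size quoted in the text

# Assumption: illustrative container sizes, not documented values.
CONTAINER_SIZES = (20 * GB, 200 * GB)


def tracked_objects(total_bytes: int, object_bytes: int) -> int:
    """Number of fixed-size objects needed to cover total_bytes (ceiling division)."""
    return -(-total_bytes // object_bytes)


def location_ratio(total_bytes: int, container_bytes: int,
                   block_bytes: int = BLOCK_SIZE) -> float:
    """How many block locations exist per container location for the same data."""
    blocks = tracked_objects(total_bytes, block_bytes)
    containers = tracked_objects(total_bytes, container_bytes)
    return blocks / containers


if __name__ == "__main__":
    data = 1 * PB
    print("blocks tracked:", tracked_objects(data, BLOCK_SIZE))
    for size in CONTAINER_SIZES:
        print("container size (GB):", size // GB,
              "containers tracked:", tracked_objects(data, size),
              "ratio:", location_ratio(data, size))
```

Under these assumptions, 1 PB corresponds to 5,000,000 block locations but only 50,000 or 5,000 container locations, which matches the 100 to 1000 times reduction described above.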