Impala Installation Steps

This topic provides instructions for using package managers to download and install Impala from the EEP repository.

About this task

IMPORTANT This component is deprecated. Hewlett Packard Enterprise recommends using an alternate product. For more information, see Discontinued Ecosystem Components.
Install the Impala package on nodes in the cluster that you have designated to run Impala. Install the Impala server on every node designated to run impalad. Install the statestore and catalog packages on only one node in the cluster.
NOTE Install the statestore and catalog services together on a machine separate from the Impala servers.

Complete the following steps to install the Impala package, the Impala server, the statestore service, and the catalog service:

Procedure

  1. Install the mapr-impala package on all the nodes designated to run Impala. To install the package, issue the following command:
    $ sudo yum install mapr-impala
  2. In /opt/mapr/impala/impala-<version>/conf/env.sh, complete the following steps:
    1. Verify that the statestore address is set to the address where you plan to run the statestore service.
      IMPALA_STATE_STORE_HOST=<IP address hosting statestore>
    2. Change the catalog service address to the address where you plan to run the catalog service.
      CATALOG_SERVICE_HOST=<IP address hosting catalog service>
    3. Add the mem_limit and num_threads_per_disk parameters to IMPALA_SERVER_ARGS to allocate a specific amount of memory to Impala and to limit the number of threads per disk for each Impala server daemon. The default Impala memory setting is high, which can cause Impala to conflict with other frameworks running in the cluster. Setting these parameters explicitly can reduce such resource conflicts.
      export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
          -log_dir=${IMPALA_LOG_DIR} \
          -state_store_port=${IMPALA_STATE_STORE_PORT} \
          -use_statestore -state_store_host=${IMPALA_STATE_STORE_HOST} \
          -catalog_service_host=${CATALOG_SERVICE_HOST} \
          -be_port=${IMPALA_BACKEND_PORT} \
          -mem_limit=<absolute notation or percentage of physical memory> \
          -num_threads_per_disk=<n>}

    See Additional Impala Configuration Options for more information about these options and other options that you can modify in env.sh.

    WARNING

    The default maximum heap space allocated to the file system file server should provide enough memory for the file server to run concurrently with Impala; however, you can modify it if needed. To modify the maximum heap space, open /opt/mapr/conf/warden.conf and change the service.command.mfs.heapsize.maxpercent parameter. After you modify the parameter, issue the following command to restart Warden:

    service mapr-warden restart

    Refer to warden.conf for more Warden configuration information.

  3. Verify that the following property is configured in hive-site.xml on all the nodes:
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://<metastore_server_host>:9083</value>
    </property>
  4. Install the Impala components.
    1. To install the statestore service, issue the following command:
      $ sudo yum install mapr-impala-statestore
    2. To install the catalog service, issue the following command:
      $ sudo yum install mapr-impala-catalog
    3. To install the Impala server, issue the following command:
      $ sudo yum install mapr-impala-server
  5. Run configure.sh to refresh the node configuration.
    /opt/mapr/server/configure.sh -R
  6. If the Hive metastore has MapR-SASL enabled, copy $HIVE_HOME/conf/hive-site.xml to $IMPALA_HOME/conf/. Repeat this step whenever hive-site.xml is modified.
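
The configuration checks in steps 2, 3, and 6 can be scripted. The following is a minimal sketch, not part of the product: the function names env_var_is_set, metastore_uri, and sync_if_changed are hypothetical, the file paths are supplied by the caller, and the hive-site.xml lookup assumes that the <name> and <value> elements sit on adjacent lines, as in the snippet in step 3.

```shell
#!/usr/bin/env bash
# Sketch only: helpers for double-checking steps 2, 3, and 6.
# All function names are hypothetical; callers pass in the file paths.

# Succeeds if FILE assigns a non-empty value to VAR,
# for example IMPALA_STATE_STORE_HOST=10.0.0.5 in env.sh.
env_var_is_set() {
  local file=$1 var=$2
  grep -Eq "^[[:space:]]*(export[[:space:]]+)?${var}=[^[:space:]]+" "$file"
}

# Prints the value of hive.metastore.uris from a hive-site.xml file.
# Returns non-zero if the property is missing or empty.
# Assumes <name> and <value> are on adjacent lines.
metastore_uri() {
  local file=$1 value
  value=$(grep -A1 '<name>hive.metastore.uris</name>' "$file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
  [ -n "$value" ] && printf '%s\n' "$value"
}

# Copies SRC to DEST only when the two files differ or DEST is missing,
# which makes the copy in step 6 safe to repeat.
sync_if_changed() {
  local src=$1 dest=$2
  if ! cmp -s "$src" "$dest"; then
    cp "$src" "$dest"
  fi
}
```

For example, after sourcing the functions, env_var_is_set /opt/mapr/impala/impala-<version>/conf/env.sh CATALOG_SERVICE_HOST reports through its exit status whether the catalog address is filled in, and sync_if_changed "$HIVE_HOME/conf/hive-site.xml" "$IMPALA_HOME/conf/hive-site.xml" repeats step 6 only when the file has changed.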

Results

At this point, the Impala servers, catalog, and statestore should be running. For instructions on how to run a simple Impala query and how to query HPE Ezmeral Data Fabric Database tables, refer to Example: Running an Impala SQL Query and Query HPE Ezmeral Data Fabric Database and HBase Tables with Impala.
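
One way to confirm that the services came up is to look for their processes on each node. The following is a minimal sketch under stated assumptions: daemon_status is a hypothetical helper, the process names statestored and catalogd are assumed from upstream Apache Impala (only impalad is named in this topic), and pgrep is assumed to be available on the node.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a daemon with the given process name is alive.
# daemon_status is a hypothetical helper, not a product command.

daemon_status() {
  local name=$1
  # pgrep -x matches the process name exactly.
  if pgrep -x "$name" >/dev/null 2>&1; then
    echo "$name: running"
  else
    echo "$name: not running"
    return 1
  fi
}

# Example calls (process names other than impalad are assumptions):
#   daemon_status impalad       # on each Impala server node
#   daemon_status statestored   # on the statestore node
#   daemon_status catalogd      # on the catalog node
```

Because the statestore and catalog services run on only one node, a "not running" result for those two names is expected on the other nodes.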