Sqoop

IMPORTANT This component is deprecated. Hewlett Packard Enterprise recommends using an alternate product. For more information, see Discontinued Ecosystem Components.

Apache Sqoop™ is a tool designed to efficiently transfer bulk data between Apache Hadoop and structured datastores, such as relational databases.

This documentation provides information for using Sqoop and Sqoop2, but does not duplicate the Apache Sqoop™ documentation on the Apache Sqoop website.

The following table describes the differences between Sqoop1 and Sqoop2:

Feature: Specialized connectors for all major RDBMS

  Sqoop1: Available.

  Sqoop2: Not available. However, you can use the generic-jdbc-connector, which has been tested on these databases:

    • MySQL
    • Microsoft SQL Server
    • Oracle (Not supported in Sqoop 1.99.7)
    • PostgreSQL

  The generic JDBC connector should also work with any other JDBC-compliant database, although specialized connectors probably give better performance.

Feature: Data transfer from RDBMS to Hive

  Sqoop1: Done automatically.

  Sqoop2: Must be done manually in two stages:

    1. Import data from RDBMS into MapR File System.
    2. Load data into Hive using the LOAD DATA command.
      NOTE As of Sqoop 1.99.7, you can also use the kite-connector to load data into Hive.

Feature: Data transfer from Hive to RDBMS

  Sqoop1: Must be done manually in two stages:

    1. Extract data from Hive into MapR File System, as a text file or as an Avro file.
    2. Export the output of step 1 to an RDBMS using Sqoop.

  Sqoop2: Must be done manually in two stages:

    1. Extract data from Hive into MapR File System, as a text file or as an Avro file.
      NOTE As of Sqoop 1.99.7, you can also use the kite-connector to extract data from Hive.
    2. Export the output of step 1 to an RDBMS using Sqoop.

Feature: Integrated Kerberos security

  Sqoop1: Supported.

  Sqoop2: Supported.

Feature: Password encryption

  Sqoop1: Not supported.

  Sqoop2: Supported as of Sqoop 1.99.7.
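
As a rough illustration of the Hive rows in the table above, the following Python sketch assembles the Sqoop1 command lines and the Hive statement involved. It only builds and prints the commands; it does not run them. The JDBC URL, user name, table names, and directory are hypothetical placeholders, the helper function names are invented for this example, and the Ctrl-A delimiter passed to the export assumes the Hive extract was written as a text file with Hive's default field delimiter.

```python
import shlex


def sqoop1_hive_import(jdbc_url, username, table, hive_table):
    """Sqoop1: a single import command that also loads the data into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                      # prompt for the password on the console
        "--table", table,
        "--hive-import",           # load into Hive as part of the import
        "--hive-table", hive_table,
    ]


def hive_load_statement(staging_dir, hive_table):
    """Manual path, stage 2: load files already imported into the file system."""
    return f"LOAD DATA INPATH '{staging_dir}' INTO TABLE {hive_table};"


def sqoop1_export(jdbc_url, username, table, export_dir):
    """Hive to RDBMS, stage 2: export files previously extracted from Hive."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,
        "--export-dir", export_dir,
        # Assumption: the extract is a text file using Hive's default
        # Ctrl-A (octal 001) field delimiter.
        "--input-fields-terminated-by", r"\001",
    ]


if __name__ == "__main__":
    url = "jdbc:mysql://db.example.com/sales"   # hypothetical database
    print(shlex.join(sqoop1_hive_import(url, "etl", "orders", "orders")))
    print(hive_load_statement("/user/etl/orders", "orders"))
    print(shlex.join(sqoop1_export(url, "etl", "orders_out", "/user/etl/orders_extract")))
```

Building each command as an argument list rather than a single string keeps quoting explicit, which matters for values such as the delimiter escape. Check the options against the Sqoop version installed on your cluster before running anything.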