Planning and Initial Deployment

Before migrating from Apache Hadoop to MapR Hadoop, there are several considerations to work through.

The first phase of migration is planning. In this phase you identify the requirements and goals of the migration, anticipate potential issues, and define a strategy.

The requirements and goals of the migration depend on several factors:

  • Data migration: can you move your datasets individually, or must the data be moved all at once?
  • Downtime: can you tolerate downtime, or is it important to complete the migration with no interruption in service?
  • Customization: what custom patches or applications are running on the cluster?
  • Storage: is there enough space to store the data during the migration?
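The storage question above can be roughed out with simple arithmetic. The following is a minimal sketch, not a MapR tool: the function names are hypothetical, and the threefold replication and 20% headroom are assumptions you should replace with your own cluster's settings.

```python
TB = 10**12  # bytes per terabyte (decimal)

def required_raw_bytes(logical_bytes, replication=3, headroom=0.2):
    """Estimate the raw disk capacity needed on the target cluster.

    logical_bytes: size of the data before replication.
    replication:   assumed number of copies kept of each block (assumption: 3).
    headroom:      assumed extra fraction reserved for temporary and
                   shuffle data during the migration (assumption: 20%).
    """
    return logical_bytes * replication * (1 + headroom)

def has_enough_space(logical_bytes, free_raw_bytes, replication=3, headroom=0.2):
    """Return True if the free raw capacity covers the estimated requirement."""
    return free_raw_bytes >= required_raw_bytes(logical_bytes, replication, headroom)

if __name__ == "__main__":
    dataset = 10 * TB  # hypothetical dataset size
    free = 40 * TB     # hypothetical free raw capacity on the new nodes
    print("Enough space:", has_enough_space(dataset, free))
```

If the data must be moved all at once rather than dataset by dataset, run the estimate against the total size; if datasets can move individually, the largest single dataset is the figure that matters for each step.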

The MapR Hadoop distribution is 100% plug-and-play compatible with Apache Hadoop, so you do not need to make changes to your applications to run them on a MapR cluster. MapR Hadoop automatically configures compression and memory settings, task heap sizes, and local volumes for shuffle data.

Initial Deployment

The initial MapR deployment phase consists of installing, configuring, and testing the MapR cluster and any ecosystem components (such as Hive or Pig) on an initial set of nodes. Once the MapR cluster is deployed, you can begin migrating data and applications.

To deploy the MapR cluster on the selected nodes, see Installing MapR and MapR Ecosystem Components.