Mirroring Topics from an Apache Kafka Cluster to the HPE Cluster

You can use MirrorMaker to mirror data continuously from Apache Kafka clusters to streams in HPE Ezmeral Data Fabric Streams clusters.

Prerequisites

  • Because this procedure requires that MirrorMaker be run from the HPE Ezmeral Data Fabric cluster, ensure that the mapr-kafka package is installed on the node that you choose to run MirrorMaker from.
  • Configure the node as a mapr client.
  • Ensure that the ID of the user that runs MirrorMaker has the produceperm and topicperm permissions on the destination stream.
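
The permission prerequisite could be met with something like the following sketch. It assumes the destination stream is /newStream (the path used in the example later in this topic) and that a hypothetical user named mmuser runs MirrorMaker. By default the script only prints the maprcli command so that it can be reviewed first. Check the maprcli stream edit reference for your release before relying on the exact options.

```shell
#!/bin/sh
# Assumed values for illustration; replace with your own.
STREAM_PATH="/newStream"   # destination stream (from the example in this topic)
MM_USER="mmuser"           # hypothetical user that runs MirrorMaker
DRY_RUN="${DRY_RUN:-1}"    # 1 = print the command only, 0 = execute it

# Build the maprcli command that sets the produce and topic permissions.
# The u:<user> form is an access control expression for a single user.
grant_cmd() {
  printf 'maprcli stream edit -path %s -produceperm u:%s -topicperm u:%s\n' \
    "$1" "$2" "$2"
}

CMD="$(grant_cmd "$STREAM_PATH" "$MM_USER")"
if [ "$DRY_RUN" = "1" ]; then
  echo "$CMD"
else
  $CMD
fi
```

Setting a permission this way is likely to replace the expression currently stored for it, so include any users or groups that already need access in the same expression.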

About this task

Mirroring can run continuously. Alternatively, you can stop mirroring after you migrate the consumers and producers for your applications from your Apache Kafka cluster to the data-fabric cluster where the stream is located. See Migrating Apache Kafka 0.9.0 Applications to HPE Ezmeral Data Fabric Streams for details.

After you start MirrorMaker, it launches a configurable number of consumer threads that read topics in the Kafka cluster and a number of producers that write the messages from those topics into topics in HPE Ezmeral Data Fabric Streams.

Figure 1. Mirroring from Apache Kafka to HPE Ezmeral Data Fabric Streams

Before running MirrorMaker, you create a file that contains the required configuration parameters for the consumers that read from the Apache Kafka cluster. You also create a file that contains the required configuration parameters for the producers that publish to the stream in the HPE Ezmeral Data Fabric cluster. You point to these files in the MirrorMaker command.

To specify which topics you want to mirror, use the whitelist parameter to provide a Java-style regular expression that matches the names of the topics that you want mirrored.
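
To preview which topic names a whitelist value would select, the matching can be approximated in shell. This is only a stand-in: it uses grep -E (POSIX extended regular expressions) rather than Java's regex engine, which agrees for simple patterns such as na_west.* but not for Java-only syntax. It also assumes that the whole topic name must match the expression. The function name and the topic names are hypothetical.

```shell
#!/bin/sh
# Print the topic names (one per line on stdin) that a whitelist value selects.
# Approximation only: grep -E is not Java regex, but simple patterns agree.
whitelist_filter() {
  # Commas are interpreted as the regex-choice symbol, so map ',' to '|'.
  pattern="$(printf '%s' "$1" | tr ',' '|')"
  # -x requires the whole topic name to match, not just a substring.
  grep -E -x -- "$pattern"
}

# Hypothetical topic names for illustration.
printf '%s\n' na_west_orders na_west_returns na_east_orders eu_orders |
  whitelist_filter 'na_west.*,eu_.*'
```

With these inputs the filter prints na_west_orders, na_west_returns, and eu_orders, and drops na_east_orders.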

Procedure

  1. Create a file that contains the required properties and values for consumers to use. When you run MirrorMaker, you point to this file by using the consumer.config parameter.
    The descriptions of these properties, except for the last, are taken from the documentation for Apache Kafka. The last property is not documented.
    Property                 Description
    group.id                 A unique string that identifies the consumer group the consumers started by MirrorMaker belong to.
    bootstrap.servers        A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping; this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
    shallow.iterator.enable  Set the value to false, as in the example later in this topic.
  2. Create a file that contains the required properties and values for producers to use. When you run MirrorMaker, you point to this file by using the producer.config parameter.
    Property                         Description
    streams.producer.default.stream  The path and name of the stream in the HPE Ezmeral Data Fabric cluster that the topics will be mirrored to.
    auto.create.topics.enable        Set the value to true so that the producers can create topics in the destination stream automatically.
  3. Run MirrorMaker with this command to start mirroring topics from Apache Kafka to HPE Ezmeral Data Fabric Streams:

    Syntax

    /opt/mapr/kafka/kafka-0.9.0/bin/kafka-mirror-maker.sh \
    --consumer.config <File that lists consumer properties and values> \
    --num.streams <Number of consumer threads> \
    --producer.config <File that lists producer properties and values> \
    --whitelist=<Java-style regular expression for specifying the topics to mirror>
    Parameter        Description
    consumer.config  The path and name of the file that lists the consumer properties and their values.
    num.streams      The number of mirror consumer threads to create. If you start multiple MirrorMaker processes, check the distribution of partitions on the source cluster. If the number of consumption streams per MirrorMaker process is too high, some of the mirroring threads remain idle because the consumer rebalancing algorithm does not assign them any partitions.
    producer.config  The path and name of the file that lists the producer properties and their values.
    whitelist        A Java-style regular expression for specifying the topics to copy. Commas (',') are interpreted as the regex-choice symbol ('|').

    This parameter is required.
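
The num.streams guidance can be made concrete with a little arithmetic. This sketch assumes that every MirrorMaker process uses the same group.id and that each source partition is consumed by at most one thread in the group, so any threads beyond the partition count sit idle. The function name and the partition and process counts are hypothetical.

```shell
#!/bin/sh
# Estimate how many consumer threads would be idle.
# $1 = total source partitions matched by the whitelist
# $2 = number of MirrorMaker processes
# $3 = num.streams value used by each process
idle_threads() {
  total=$(( $2 * $3 ))
  if [ "$total" -gt "$1" ]; then
    echo $(( total - $1 ))
  else
    echo 0
  fi
}

# Hypothetical sizing: 6 partitions, 2 processes.
idle_threads 6 2 4   # 8 threads for 6 partitions: prints 2
idle_threads 6 2 3   # 6 threads for 6 partitions: prints 0
```

Under these assumptions, keeping the product of processes and num.streams at or below the partition count avoids idle threads.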

Example

In this example, the file that lists the properties and values for the consumers that will read messages from the topics in Apache Kafka is named consumers.props. It contains this list:

group.id=cg1
bootstrap.servers=10.10.100.87:9093
shallow.iterator.enable=false

The file that lists the properties and values for the producers that will publish messages to topics in HPE Ezmeral Data Fabric Streams is named producers.props. It contains this list:

streams.producer.default.stream=/newStream
auto.create.topics.enable=true

The topics to mirror all have names that begin with na_west, so the command uses "na_west.*" as the regular expression for the whitelist parameter.

Here is the command, run from the /opt/mapr/kafka/kafka-0.9.0 directory:

bin/kafka-mirror-maker.sh --consumer.config consumers.props \
--num.streams 2 --producer.config producers.props --whitelist="na_west.*"
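
Before launching MirrorMaker, it can help to confirm that both files define the properties described in the procedure. The following sketch recreates the example files in a scratch directory and checks them. The has_props helper and the scratch directory are assumptions made for illustration and are not part of MirrorMaker.

```shell
#!/bin/sh
# Succeed only if the file ($1) sets every property named in the other arguments.
has_props() {
  file="$1"
  shift
  for key in "$@"; do
    # Compare the text before the first '=' with the property name exactly.
    awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$file" || {
      echo "missing $key in $file" >&2
      return 1
    }
  done
}

# Recreate the example files from this topic in a scratch directory.
workdir="$(mktemp -d)"
cat > "$workdir/consumers.props" <<'EOF'
group.id=cg1
bootstrap.servers=10.10.100.87:9093
shallow.iterator.enable=false
EOF
cat > "$workdir/producers.props" <<'EOF'
streams.producer.default.stream=/newStream
auto.create.topics.enable=true
EOF

has_props "$workdir/consumers.props" group.id bootstrap.servers &&
  has_props "$workdir/producers.props" \
    streams.producer.default.stream auto.create.topics.enable &&
  echo "property files look complete"
```

In practice the same helper could be pointed at the real consumers.props and producers.props files instead of the scratch copies.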