Configuration Parameters

This topic describes configuration parameters that are either specific to HPE Ezmeral Data Fabric Streams or supported from Apache Kafka.

Table 1. AdminClient configuration parameters specific to HPE Ezmeral Data Fabric Streams
Parameter Description
streams.admin.default.stream When set during creation of the AdminClient instance, this parameter makes the AdminClient instance use the specified stream for all administrative operations.

Syntax:

/mapr/<cluster name>/<volume name>/<stream name>
streams.rpc.timeout.ms Specifies the length of time in milliseconds to wait for a response from the HPE Ezmeral Data Fabric Streams server if a soft mount is configured (fs.mapr.hardmount is set to false). Default: 120000 Minimum: 30000
NOTE Applicable as of MapR 6.0.1; used instead of fs.mapr.rpc.timeout.

In producer and consumer applications, set streams.rpc.timeout.ms to a value greater than 50000 to avoid Message Fetch RPC overload.
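A minimal sketch, assuming the AdminClient is configured from a java.util.Properties object in the same way as Apache Kafka's AdminClient.create(Properties). The class AdminPropsSketch, its methods, and the cluster, volume, and stream names are hypothetical, and the sketch stops short of creating the client so that it has no external dependencies. It only encodes what Table 1 states: the stream path syntax and the 30000 ms minimum for streams.rpc.timeout.ms.

```java
import java.util.Properties;

// Hypothetical helper that assembles the AdminClient settings from Table 1.
// Creating the client (for example with Apache Kafka's AdminClient.create(props))
// is intentionally left out.
class AdminPropsSketch {

    // Builds a path in the documented form /mapr/<cluster name>/<volume name>/<stream name>.
    static String streamPath(String cluster, String volume, String stream) {
        return "/mapr/" + cluster + "/" + volume + "/" + stream;
    }

    static Properties build(String defaultStream, long rpcTimeoutMs) {
        // Table 1 lists a minimum of 30000 ms for streams.rpc.timeout.ms.
        if (rpcTimeoutMs < 30000) {
            throw new IllegalArgumentException("streams.rpc.timeout.ms must be at least 30000");
        }
        Properties props = new Properties();
        // The AdminClient uses this stream for its administrative operations.
        props.setProperty("streams.admin.default.stream", defaultStream);
        // Only takes effect when a soft mount is configured (fs.mapr.hardmount=false).
        props.setProperty("streams.rpc.timeout.ms", Long.toString(rpcTimeoutMs));
        return props;
    }
}
```

For example, `AdminPropsSketch.build(AdminPropsSketch.streamPath("my.cluster.com", "apps", "pipeline"), 60000)` yields properties that point administrative operations at /mapr/my.cluster.com/apps/pipeline, with a timeout above the 50000 ms guideline.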

Table 2. Consumer configuration parameters specific to HPE Ezmeral Data Fabric Streams
Parameter Description
streams.consumer.buffer.memory Specifies how much memory to use for caching pre-fetched messages. Messages in subscribed topics and partitions are pre-fetched and cached to improve performance. Default: 64MB
streams.consumer.default.stream Specifies the path and name of the stream that the consumer uses when it subscribes to a topic without specifying a stream.
streams.rpc.timeout.ms Specifies the length of time in milliseconds to wait for a response from the HPE Ezmeral Data Fabric Streams server if a soft mount is configured (fs.mapr.hardmount is set to false). Default: 305000 Minimum: 300000

In producer and consumer applications, set streams.rpc.timeout.ms to a value greater than 50000 to avoid Message Fetch RPC overload.
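The fallback rule for streams.consumer.default.stream can be illustrated with a small, hypothetical Java helper. It assumes that a fully qualified topic name has the form <stream path>:<topic name>; the class ConsumerStreamPropsSketch, its resolveTopic method, and the example paths are illustrative only and are not part of the HPE Ezmeral Data Fabric Streams API.

```java
import java.util.Properties;

// Hypothetical helper for the consumer-specific settings in Table 2.
class ConsumerStreamPropsSketch {

    static Properties build(String defaultStream, long rpcTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("streams.consumer.default.stream", defaultStream);
        // See Table 2 for the default and minimum of this soft-mount timeout.
        props.setProperty("streams.rpc.timeout.ms", Long.toString(rpcTimeoutMs));
        return props;
    }

    // Assumption: a fully qualified topic name looks like "<stream path>:<topic>".
    // A bare topic name falls back to the configured default stream.
    static String resolveTopic(String topic, Properties props) {
        if (topic.contains(":")) {
            return topic; // the caller already named a stream
        }
        String stream = props.getProperty("streams.consumer.default.stream");
        if (stream == null) {
            throw new IllegalStateException("no stream given and no default stream configured");
        }
        return stream + ":" + topic;
    }
}
```

With a default stream of /mapr/my.cluster.com/apps/pipeline, subscribing to the bare topic name clicks would resolve to /mapr/my.cluster.com/apps/pipeline:clicks under this model, while a name that already includes a stream is left unchanged.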

Table 3. Consumer configuration parameters supported from Apache Kafka
Parameter Description
auto.commit.interval.ms The interval, in milliseconds, at which offsets are committed automatically when enable.auto.commit is true. Default: 1000 ms
auto.offset.reset Specifies what HPE Ezmeral Data Fabric Streams should do when there is no initial offset, such as when a consumer starts reading from a partition. Default: latest
earliest
Reset the offset to the offset of the earliest message in the partition.
latest
Reset the offset to the offset of the latest message in the partition.
enable.auto.commit If true, periodically commits the highest offsets of the messages fetched by the consumer in all of the partitions for the topics that the consumer is subscribed to. Default: true
fetch.min.bytes The minimum amount of data the server should return for a fetch request. If insufficient data is available, the server will wait for this minimum amount of data to accumulate before answering the request.

This minimum applies to the totality of what a consumer has subscribed to.

This parameter works in conjunction with the timeout specified in the call to poll. If the minimum number of bytes has not accumulated by the time the timeout expires, the poll returns with nothing.

For example, suppose the value is set to 6 bytes and the poll timeout is set to 100 ms. If 5 bytes are available and no further bytes arrive before the 100 ms expire, the poll returns with nothing. Default: 1 byte

fetch.max.bytes The maximum amount of data the server should return for a fetch request. If the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch is still returned to ensure that the consumer can make progress.
NOTE This parameter is new as of MapR 6.0.1.
fetch.max.wait.ms The maximum amount of time the HPE Ezmeral Data Fabric Streams server will block before answering the fetch request if there isn't sufficient data to satisfy the requirement given by fetch.min.bytes.
group.id A string up to 2457 bytes long that uniquely identifies the group of consumer processes to which this consumer belongs. By setting the same group ID, multiple consumer processes indicate that they are all part of the same consumer group. Putting consumers into groups provides benefits that are described in Consumer Groups.

A group can consist of a single consumer.

max.poll.records Places an upper bound on the number of records returned from each call to poll.
NOTE This parameter is new as of MapR 6.0.1.
max.partition.fetch.bytes The number of bytes of message data to attempt to fetch for each partition in each poll request. These bytes will be read into memory for each partition, so this parameter helps control the memory that the consumer uses. Default: 64KB

This value must be at least as large as the maximum message size that the server allows; otherwise, producers can send messages that are larger than the consumer can fetch.

If the first record batch in the first non-empty partition of the fetch is larger than this configuration, the record batch is still returned to ensure that the consumer can make progress.
NOTE This is a behavior change as of MapR 6.0.1.
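As a sketch, the consumer parameters in Table 3 are plain string keys in a java.util.Properties object, the kind of object a Kafka consumer is built from (for example with Apache Kafka's KafkaConsumer(Properties) constructor, together with deserializer settings that are omitted here). The class KafkaConsumerPropsSketch, its methods, and the values 6 and 500 are hypothetical. Measuring the group.id length in UTF-8 bytes is an assumption, and pollReturnsData is a deliberately simplified model of the fetch.min.bytes example, not the server's actual algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper for the Apache Kafka consumer settings in Table 3.
class KafkaConsumerPropsSketch {

    static Properties build(String groupId, String offsetReset) {
        // Table 3 limits group.id to 2457 bytes; UTF-8 encoding is assumed here.
        if (groupId.getBytes(StandardCharsets.UTF_8).length > 2457) {
            throw new IllegalArgumentException("group.id is longer than 2457 bytes");
        }
        // Only the two documented values are accepted.
        if (!offsetReset.equals("earliest") && !offsetReset.equals("latest")) {
            throw new IllegalArgumentException("auto.offset.reset must be earliest or latest");
        }
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        props.setProperty("auto.offset.reset", offsetReset);
        // Commit offsets automatically at the documented default interval.
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "1000");
        // Illustrative values, not recommendations.
        props.setProperty("fetch.min.bytes", "6");
        props.setProperty("max.poll.records", "500");
        return props;
    }

    // Simplified model of the fetch.min.bytes example, evaluated when the poll
    // timeout expires: fewer bytes than the minimum means the poll returns nothing.
    static boolean pollReturnsData(long availableBytes, long fetchMinBytes) {
        return availableBytes >= fetchMinBytes;
    }
}
```

Under this model, `pollReturnsData(5, 6)` returns false, matching the 5-byte case in the fetch.min.bytes example, while `pollReturnsData(6, 6)` returns true.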

Table 4. Producer configuration parameters specific to HPE Ezmeral Data Fabric Streams
Parameter Description
streams.buffer.max.time.ms Messages are buffered in the producer for at most the specified time. A thread flushes all messages that have been buffered for longer than the specified time. Default: 3000 ms
streams.parallel.flushers.per.partition If true, the producer can have multiple parallel send requests to the server for each topic partition, so messages may be sent out of order. Default: true
streams.producer.default.stream Specifies the stream that the producer will use by default if the producer does not provide the name of a stream when specifying a topic to write to.
Syntax:
/mapr/<cluster name>/<volume name>/<stream name>
fs.mapr.hardmount Specifies whether to use a hard mount or a soft mount for connections to the HPE Ezmeral Data Fabric Streams server.

Default: true (hard mount).

If a value for this parameter is set in the core-site.xml file, the value in that file is ignored.
fs.mapr.rpc.timeout Specifies the length of time in seconds to wait for a response from the HPE Ezmeral Data Fabric Streams server if the configuration parameter fs.mapr.hardmount is set to false. Default: 300. Minimum value: 30.
NOTE Applicable to MapR 6.0.0 and earlier. As of MapR 6.0.1, use streams.rpc.timeout.ms.

If a soft mount is used, the timeout expires while a producer is waiting for a response from the HPE Ezmeral Data Fabric Streams server, and the producer sent the message with the KafkaProducer.send(ProducerRecord<K,V> record, Callback callback) method, then the callback is invoked with the error EAGAIN, which means "Resource temporarily unavailable."

streams.rpc.timeout.ms Specifies the length of time in milliseconds to wait for a response from the HPE Ezmeral Data Fabric Streams server if a soft mount is configured (fs.mapr.hardmount is set to false). Default: 30000 Minimum: 30000

In producer and consumer applications, set streams.rpc.timeout.ms to a value greater than 50000 to avoid Message Fetch RPC overload.
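The following hypothetical helper shows one way the producer-specific parameters might be combined. The class StreamsProducerPropsSketch, its method signature, and the example values are assumptions; the sketch only encodes rules stated in Table 4. streams.rpc.timeout.ms matters only when fs.mapr.hardmount is false, values of 50000 or less are rejected per the guideline above, and parallel flushers are turned off when message order must be preserved.

```java
import java.util.Properties;

// Hypothetical helper for the producer settings in Table 4.
class StreamsProducerPropsSketch {

    // softMountTimeoutMs == null keeps the default hard mount.
    static Properties build(String defaultStream, boolean preserveOrder, Long softMountTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("streams.producer.default.stream", defaultStream);
        // Flush buffered messages after at most 3000 ms (the documented default).
        props.setProperty("streams.buffer.max.time.ms", "3000");
        // Parallel flushers can send messages out of order, so disable them when order matters.
        props.setProperty("streams.parallel.flushers.per.partition", Boolean.toString(!preserveOrder));
        if (softMountTimeoutMs == null) {
            // streams.rpc.timeout.ms only applies to soft mounts, so it is not set here.
            props.setProperty("fs.mapr.hardmount", "true");
        } else {
            if (softMountTimeoutMs <= 50000) {
                throw new IllegalArgumentException("use a streams.rpc.timeout.ms value greater than 50000");
            }
            props.setProperty("fs.mapr.hardmount", "false");
            props.setProperty("streams.rpc.timeout.ms", Long.toString(softMountTimeoutMs));
        }
        return props;
    }
}
```

Passing null for the timeout keeps the hard mount; passing a value such as 60000L switches to a soft mount, where a send that times out would surface EAGAIN in the callback as described for fs.mapr.rpc.timeout.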

Table 5. Producer configuration parameters supported from Apache Kafka
Parameter Description
buffer.memory The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are generated faster than they can be delivered to the server, the producer blocks. Default: 33554432 bytes (32 MB)
client.id Producers can tag records with a client ID that identifies the producer. Consumers can then be aware of which producer sent a message or set of messages. Apache Drill or other analytic tools querying messages can include this ID in the filters for their queries. Default: No client ID.
metadata.max.age.ms The producer generally refreshes the topic metadata from the server when there is a failure. It also polls for this data regularly, at the interval this parameter specifies. Default: 300000 ms (5 minutes)
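The defaults in Table 5 are easiest to check when the arithmetic is written out. In this short sketch, KafkaProducerPropsSketch is a hypothetical name, and setting buffer.memory and metadata.max.age.ms explicitly to their documented defaults is only for illustration, since leaving them unset should give the same values.

```java
import java.util.Properties;

// Hypothetical helper for the Apache Kafka producer settings in Table 5.
class KafkaProducerPropsSketch {

    // Documented defaults, written as products so the arithmetic is visible.
    static final long DEFAULT_BUFFER_MEMORY = 32L * 1024 * 1024;  // 33554432 bytes
    static final long DEFAULT_METADATA_MAX_AGE_MS = 300L * 1000;  // 300000 ms, 5 minutes

    static Properties build(String clientId) {
        Properties props = new Properties();
        props.setProperty("buffer.memory", Long.toString(DEFAULT_BUFFER_MEMORY));
        props.setProperty("metadata.max.age.ms", Long.toString(DEFAULT_METADATA_MAX_AGE_MS));
        // client.id has no default; set it only when consumers should see who produced a record.
        if (clientId != null && !clientId.isEmpty()) {
            props.setProperty("client.id", clientId);
        }
        return props;
    }
}
```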