Resolved Issues (MapR 6.1.1)

Lists the issues that were resolved in MapR version 6.1.1.

The following MapR issues, which were reported by customers, are resolved in Version 6.1.1.

Apache Kafka

MS-791: KafkaConsumer position does not honor TTL.

MS-946: Consumer poll returns messages before a subscription or re-balance operation with ConsumerRebalanceListener on PartitionsAssigned callback is complete.

MS-947: Kafka message headers should not be Null but can be empty.

Control System

MON-4844: Navigating to Usage tab when editing a volume causes OOPS error.

MON-4845: API server fails to start if the user is not mapr.

MON-4902: Default snapshot policies and times are incorrect.

MON-5075: Cannot modify existing volume quota due to insufficient permissions.

MON-5102: MCS does not allow configuring SMTP without a username and password.

MON-5131: OOPS error when adding an SMTP provider other than smtp, Office 365, and Gmail.

MON-5169: MCS call to schedule/list returns an error.

MON-5194: Incorrect scale on Y-axis for CPU Utilization graph on Overview page.

MON-5206, MON-5141: Volume modification fails when the metrics feature is not enabled.

MON-5270: Memory leak in API server.

MON-5271: Remove the limit on the number of results returned for MAC addresses when configuring a virtual IP.

MON-5272: User name should not be hardcoded as mapr in MCS.

MON-5275: MCS displays MAC addresses incorrectly.

MON-5338: I/O errors occur and the table summary page is not loaded when huge MapR DB tables are queried.

MON-5489: Volume filter options should not be pre-defined but should allow filtering on any volume attribute.

MON-5491: Detailed alarm descriptions were missing in the alarm popup view and the volume details view.

MON-5510: The Disk Space Available column uses the wrong units in the Nodes view.

MON-5517: Enhance the alarm summary command to return the alarm occurrences for each alarm type, indicate whether the alarm is a warning or an error, and specify the total number of occurrences for each alarm type.

MON-5672: The Security Certificate Expiry Alarm (NODE_ALARM_CERTIFICATE_NEAR_EXPIRATION) is incorrectly labeled.

MON-5774: API Server crashes when the tmp directories are mounted with the noexec option.

MON-5900: Resource Manager URL is invalid on Ubuntu.

Flume

MS-770: Array Index Out of Bounds Exception.

Filesystem

CORE-387: MapR user tickets get overwritten when Kerberos authentication is enabled.

CORE-416: The configure.sh script must support environments where root activity is not allowed.

CORE-427: Local Spark shuffle volumes, if damaged, need to be automatically recreated when NodeManager starts.

CORE-472: The disk list command runs very slowly and fails to identify the right MapR-FS disk when more than one symlink points to the same disk.

CORE-476: Running configure.sh with the -R option erases the value of the MAPR_JMXAUTH variable in the env_override.sh file. This prevents NodeManager from starting.

CORE-480: Volume rack path is not updated for volumes with local path set when moving nodes.

CORE-517: The configure.sh script should not silently alter permissions in the /etc/shadow file. Added the -no-auto-permission-update option as the fix.

CORE-566: The hoststats process crashes continuously and causes cluster auditing to fail.

CORE-571: Oozie server does not start because of a missing jar file.

MFS-1984: The maprcli dashboard info command returns incorrect compression statistics.

MFS-1985: POSIX Client service (FUSE) does not auto start on system reboot on Ubuntu 16.

MFS-2015: The maprcli dashboard info command returns incorrect memory statistics.

MFS-2019: API Server hangs intermittently and fails to access CLDB servers in a multi-NIC environment.

MFS-2051: Improve the clarity of NFS logs.

MFS-2055: CLDB crashes when processing alarms.

MFS-2062: Remote mirroring fails repeatedly even after a source CLDB that went down is restarted and operational.

MFS-2143: MFS does not preserve excluded volume audit data operations on restart.

MFS-2144: Spark streaming tasks are stuck indefinitely when looking up tablets.

MFS-2209: Exception occurs when calling the getVolName() function.

MFS-2211: Name Container master freezes during resync of orphan entries and causes MFS and the resync operation to restart frequently.

MFS-2218: MFS randomly crashes if errors occur when reading data.

MFS-2260: MapR jobs fail since the file client fails to check the MapR filesystem to determine the status of the RPCs sent previously to MFS, before resending them.

MFS-2266: Spark encounters RPC errors when reading files from volumes with wiresecurity enabled.

MFS-2273: Storage Pools fail randomly when MFS is restarted, and many containers go offline without a valid replica.

MFS-2275: Spark jobs fail intermittently when trying to retrieve rows from MapR DB tables.

MFS-2294: CLDB crashes when registering NFS version 4 servers.

MFS-2298: CRC errors occur randomly in Storage Pools and cause them to go offline.

MFS-2306: Master CLDB crashes when MFS nodes are added or removed frequently.

MFS-2307: Add an internal cluster level flag to prevent storage pools from going offline when Read CRC errors are encountered.

MFS-2323: NFS Server version 4 boot script should not contain hardcoded user and group (mapr:mapr).

MFS-2343: The node list command should not display nodes which contain only the POSIX client (edge node). Using the node list command without the -clientsonly true or the -nfsnodes true option does not list edge nodes. To include edge nodes, use the -nfsnodes true or the -clientsonly true option.

MFS-2344: The NODE_ALARM_NO_HEARTBEAT (No Heartbeat) alarm should not be raised for POSIX clients (edge nodes). CLDB has a new parameter cldb.ignore.posix.only.hb.alarm that controls whether this alarm is raised for edge nodes.

MFS-2392: gfsck fails on secure clusters due to a missing library.

MFS-2423: POSIX only clients should be immediately removed when marked dead.

MFS-2444: FUSE process does not remove shared memory segments resulting in volumes failing to mount.

MFS-2462: Crash in MapR DB when looking up role memberships.

MFS-2498: FUSE does not honor the product build value.

MFS-2573: Mirroring fails with a CLDB internal error when an invalid container ID is found on the source cluster.

MFS-2610: Persistent Volume mounts hang when tickets expire.

MFS-2628: Null Pointer Exception in CLDB Server.

MFS-2630: The mrconfig info threads command crashes the MFS process when attempting to retrieve volume aces, when Extended Attributes are not enabled.

MFS-2631: CLDB shuts down when adding or removing NFS servers.

MFS-2632: CLDB operations fail with the Server Retry error.

MFS-2659: MFS process hangs intermittently in environments with multiple NICs.

MFS-2694: Stack overflow in MFS when deleting a container chain.

MFS-2695: Cross cluster mirroring fails after enabling the Snapshot Lite feature.

MFS-2725: NFS Server version 3 crashes randomly when trying to satisfy mount requests.

MFS-2731: MFS does not automatically retry reconnecting to CLDB after a connection reset request.

MFS-2732: RPC connections between MFS and CLDB fail intermittently with Connection Reset by Peer errors.

MFS-2757: MapR service status is displayed incorrectly due to change in systemd.

MFS-2767: File client tries to connect to the same failed CLDB node repeatedly.

MFS-4480: NFS Server version 4 crashes intermittently.

MFS-4485: FUSE client fails to work with a scoped impersonation ticket.

MFS-4531: Snapshots of mirrors are not deleted after mirroring completes.

MFS-4551: loopbacknfs does not log any messages to the loopbacknfs.log file.

MFS-4562: File stat on the FUSE mount indicates the block size as a fixed value (512) instead of the client's block size.
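
The block size mentioned in MFS-4562 can be inspected from a client with a stat call. The following is a minimal Python sketch, assuming the value in question is the preferred I/O block size (st_blksize). It stats a temporary local file so that it runs anywhere; on a cluster you would pass a path on the FUSE mount instead (no specific mount path is assumed here).

```python
import os
import tempfile

def reported_block_size(path):
    """Return the preferred I/O block size that stat() reports for path."""
    return os.stat(path).st_blksize

# Illustration only: a temporary local file stands in for a file on the FUSE mount.
with tempfile.NamedTemporaryFile() as f:
    blksize = reported_block_size(f.name)
    print(f"st_blksize = {blksize} bytes")
```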

MFS-4585: CLDB exception occurs when an NFS heartbeat reports a failed Virtual IP.

MFS-4597: Warden and the maprcli command intermittently cannot start dependent services.

MFS-4605: CLDB shuts down when ACL size exceeds the threshold value.

MFS-4667: Replicated operations fail and cause frequent resyncs.

MFS-4776: FUSE RPCs fail intermittently.

MFS-5356: The getAces() API raises a Null Pointer Exception when called on a non-existing object.

MFS-5405: Client sends the NODE_ALARM_SERVICE_NODEMANAGER_DOWN alarm but CLDB raises the NODE_ALARM_SERVICE_OPENTSDB_DOWN alarm.

MFS-5422: The create() API does not create files with the same permissions as the parent directory.

MFS-5430: NFS Server is unable to parse lines exceeding 8K characters in the exports file.
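
On a release without the MFS-5430 fix, it can help to find the exports entries that are affected. The following is a rough Python sketch that reports the line numbers exceeding a length limit. The limit of 8192 characters is an assumption about what "8K" means here, and the sample lines are made up for illustration.

```python
def long_lines(lines, limit=8192):
    """Return 1-based numbers of lines longer than limit characters.

    limit=8192 is an assumed reading of "8K characters".
    """
    return [
        number
        for number, line in enumerate(lines, start=1)
        if len(line.rstrip("\n")) > limit
    ]

# Hypothetical input: one short line and one oversized line.
sample = ["short line\n", "x" * 9000 + "\n"]
print(long_lines(sample))
```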

MFS-5482: Memory leak in CLDB master instance.

MFS-5488: Jobs on random nodes fail to create FileClient.

MFS-5502: Path lookup error occurs when client nodes run a newer version of MapR than the CLDB server nodes.

MFS-5711: Unable to access files on EC tier due to I/O error and StripeletIO failure.

MFS-6585: Applications intermittently fail to detect updates to the ticket file.

MFS-6587: Avoid flooding the CLDB log with invalid snapshot ID messages.

MFS-6667: hoststats creates defunct Python processes.

MFS-6748: Automatic offload does not trigger EC offload.

MFS-6873: Cross cluster move operation fails on FUSE.

MFS-6874: Node Manager fails repeatedly during log aggregation.

MFS-8452: Memory leak in loopback NFS.

MFS-8459: Fixed volume access problems for volumes that reused the volume ID of deleted volumes.

MFS-10328: maprlogin renew (ticket renewal) fails to refresh group memberships.

MFS-10743: Node Manager fails to report container failure and loops between slave CLDBs, without contacting the new master CLDB.

MFS-10825: Cluster is unable to self-heal from the VOLUME_ALARM_DEGRADED_EC_STRIPES (Warm-Tier Data Node Down) alarm, and rebuild does not occur.

MFS-10845: Volume creation fails to honor the credentials of the impersonated user while creating the parent directory.

MFS-11002: fsck crashes due to an inode reservation issue.

MFS-11109: Filesystem crashes due to a leak in orphanage reservation.

MFS-11171: Restrict the tenant ticket so that it cannot mount non-tenant volumes in POSIX.

MFS-11221: Drill query crashes in MapR client due to an unexpected exception during fragment initialization.

MFS-11243: Memory leak in FUSE. Added a FUSE tunable (fuse.max.cache.pages) to limit the amount of memory that each FUSE process can use when working with a large number of open files.

MFS-11295: Offloading fails when mastgateway is stuck in compaction state.

MFS-11442: FUSE client does not honour the location of the cluster configuration file as defined by the parameter fuse.cluster.conf.location.

MFS-11609: Hadoop distcp jobs fail when using CLDB hostname and port.

MFS-11647: SlowOPs trace function does not work for NFSv3.

MFS-11674: Enable gfsck to perform CRC checks without blocking the operations on the EC frontend volume. See gfsck for the new -D|--crc option.

MFS-11682: CLDB volume dump fails with an RPC error due to an unknown session key.

MFS-11729: The mrconfig info threads command crashes the filesystem when hardlinks are not enabled.

MFS-11731: Suppress redundant incorrect build version alarms.

MFS-11740: Filesystem crashes when compacting memory.

MFS-11779: MFS dumps core due to stack overflow.

MFS-11823: Service ticket renewal does not honour duration.

MFS-11838: Jobs fail with the "Too many open files" error.

MFS-15415: Volume dump restore fails with error 20020 (ENOTICKET) despite having a ticket and using it successfully for a long time.

Hadoop

MAPRHADOOP-61: Kerberos fails for services when a custom ticket location is set in the env.sh file.

MAPRHADOOP-83: Upgrade Tomcat servers to their latest version or remove them if they are not needed.

MAPRHADOOP-102: Error occurs in ACEs when the Hive resource downloader internally copies files from the MapR filesystem to the local filesystem.

MAPRHADOOP-131: Update Jersey to its latest 1.X version.

MapR-DB

MAPRDB-1236: The Tiny Bucket Flush alarm is raised even when the node has sufficient memory.

MAPRDB-1589: Incorrect key sorting when using the orderby clause with conditions.

MAPRDB-1719: DB server crashes when columnset is used without initialization.

MAPRDB-1732: Inserting data into MapR-DB fails intermittently with an Invalid Argument error.

MAPRDB-1889: The Java API findById() intermittently fails to retrieve complete projection details from JSON documents.

MAPRDB-1985: The mapr dbshell find command crashes when run on a table with a huge number of tablets.

MAPRDB-1995: MapR-DB raises intermittent false VOLUME_ALARM_TABLE_REPL_LAG_HIGH alarms for replicated streams.

MAPRDB-2062: Failed to scan table on a remote secure cluster using the mapr dbshell utility because of a wrong ticket that was sent to ZooKeeper.

MAPRDB-2072: Data Access Gateway (DAG) fails to fetch indexes as it queries indexes as the mapr user instead of the impersonated user.

MAPRDB-2091: MapR-DB hangs due to inodes being recycled even when they are in use.

MAPRDB-2092: In MapR-DB, adding a table index or replicating a table fails if the cluster administrator (MAPR_USER) does not have write access to the parent volume of the table.

MAPRDB-2098: MapR DB crashes when multiple threads modify the size_ variable while calculating the serialised JSON document size.

MAPRDB-2103: PUT operations on binary tables fail when the values of the wireSecurityEnabled field vary between the FileClient and the FileServer.

MAPRDB-2120: Drill query on MapR DB intermittently fails with a DB Scan exception.

MAPRDB-2125: OJAI APIs fail to connect to ZooKeeper.

MAPRDB-2159: DB Autosetup, Indexing, and Replication fail as MFS is unable to reach the local Gateway.

MAPRDB-2201: Memory leak in BaseJsonTable caused by a dangling reference of MetaTable.

MAPRDB-2254: AppendStream fails when Gateway closes the inactive stream, and raises the replication lag alarm.

MAPRDB-2267: DB crashes during heap memory allocation.

MAPRDB-2303: The Replication Lag alarm does not display the actual lag value.

MAPRDB-2315: MapR-DB scan fails on large tables.

MAPRDB-2323: The Table Replication (VOLUME_ALARM_TABLE_REPL_ERROR) alarms are missing information such as the actual bucket FID that produced the alarm, if applicable, and the error code and description of the replication error.

Performance

MS-560: MapR cluster nodes experience high network traffic from mapr-stream clients.

MAPRDB-1727: Delay in data retrieval on MFS nodes with a large number of outstanding active buckets and high usage of DB memory.

MAPRDB-2113: MapR-DB needs to select the most appropriate index in cases where more than one index has been defined over the same field of a MapR-DB table.

MAPRDB-2156: When running queries with a set timeout, the number of threads on the MapR client increases up to 500, exhausting the Thread Pool, and causing the client to stop responding completely, even after all queries time out.

MAPRDB-2250: Too many BatchGet operations in parallel when secondary indexes are present on a table cause MapR DB to crash. Added a parameter mfs.db.max.concurrent.internal.ops to regulate the number of parallel BatchGet operations.

MAPRMR-8: Reduce the number of input splits that are generated when a job is processed through the CombineFileInputFormat() function. Added the parameter mapreduce.input.fileinputformat.split.maxblocknum that determines the number of blocks that can be added to one split.
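
To illustrate how a per-split block cap relates to the number of splits in MAPRMR-8, the following Python sketch computes a rough lower bound on the split count. It is plain arithmetic, not the CombineFileInputFormat logic, which also takes split size and data locality into account. The block counts and cap values are illustrative assumptions.

```python
import math

def min_split_count(total_blocks, max_blocks_per_split):
    """Fewest splits possible when each split holds at most max_blocks_per_split blocks."""
    if max_blocks_per_split <= 0:
        raise ValueError("max_blocks_per_split must be positive")
    return math.ceil(total_blocks / max_blocks_per_split)

# Hypothetical job with 1000 input blocks.
print(min_split_count(1000, 1))   # 1000
print(min_split_count(1000, 64))  # 16
```

A larger cap lets more blocks be combined into one split, which lowers the number of splits the job has to process.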

MFS-2078: Speed up FUSE path lookups. Added the fuse.negative.timeout parameter to cache negative lookup results.

MFS-2082: Optimize directory lookup and traversal to avoid overwhelming MFS with RPCs.

MFS-2324: Optimize disk space reserved for tiering operations.

MFS-2608: Priority of child threads does not change when the priority of their parent process is changed.

MFS-2638: Avoid re-sorting results in CLDB for the default output of the maprcli alarm list -sortby alarmtype command.

MFS-2691: Optimize fetching of Muted and Raised Alarms.

MFS-2711: Optimize removal of expired snapshots to free up CLDB CPU from background activity.

MFS-2749: The Alarm History feature needs to be disabled on large clusters as it can degrade performance. Added a parameter cldb.disable.alarm.history to disable alarm history.

MFS-3291: File client does not honor the number of flusher threads set in the core-site.xml file. See the fs.mapr.threads parameter.

MFS-4532: Jobs fail with I/O error or are very slow to complete.

MFS-4670: CLDB process consumes 78.8 GB approximately every 6-7 hours and triggers CLDB failover very often.

MFS-4687: Memory leak in CLDB.

MFS-4750: Disk Balancer is unable to move containers from full Storage Pools as they fail the Volume underweight check. Added a tunable, prevent.volume.skew.by.diskbalancer, to let the Disk Balancer allow or prevent volume skew.

MFS-4805

Fixed memory leak in NFS Server version 3 that occurs when profiling memory. Added the following entities in /opt/mapr/conf/nfsserver.conf:

MemDebugEnable - Set to true to enable Memory Debugging.
HighMemLimitMB - Sets the maximum amount of memory that the NFS Server can use.

MFS-5227: NFS Server hangs or is very slow and causes replication and resync failures.

MFS-5700: CLDB master failover time is very high.

MFS-6539: NFS Version 3 Server on edge nodes does not have the ulimit setting, as warden is not available on these nodes.

MFS-5724: The MFS configuration parameter mfs.max.restore.count is not being honored causing mirror resync operations to be delayed due to the lack of sufficient restore slots.

MFS-6547: Massive delay in mounting the configured mount points after starting the NFS service.

MFS-6666: NFS server should throttle RPCs to avoid overwhelming CLDB.

MFS-6785: The response from the mrconfig info containers rw command is slow on a cluster with large number of volumes.

MFS-6869: Fuse client should limit the number of RPCs to prevent overwhelming CLDB.

MFS-7181: FileClient defaults to 8KB reads instead of 512KB.

MFS-8475: The createsystemvolumes.sh script took hours to complete when adding a new node to a cluster with a large number of volumes.

MFS-11111: Queuing and CLI RPC processing are slow in CLDB.

Security

COMSECURE-331: Security vulnerability in the JNDI-bindable DataSources library.

COMSECURE-334: Security vulnerability in the DOM4j XML framework.

COMSECURE-335: Security vulnerability in the Jasper library.

CORE-290: The /opt/mapr directory contains files and directories with insecure permissions.

CORE-293: After upgrading system security packages, mapr-zookeeper and mapr-warden are not properly started with systemd. The ps command reports them as started, while systemd reports errors when trying to start these services.

CORE-384: Remote Code Execution vulnerability in the ZooKeeper Java JMX server. Added a parameter JMXDISABLE to enable or disable loading ZooKeeper JMX parameters.

MAPRDB-2251: Standardize JMX handling for Java processes to prevent vulnerabilities.

MAPRDB-2255: Stream ACE u:mapr | has the potential to lock out the administration of the stream.

MAPRHADOOP-63: Security vulnerability in jackson-databind.

CORE-562, MAPRHADOOP-123: Security vulnerabilities in MapR 6.x JAR files.

MAPRHADOOP-58, MAPRHADOOP-64, MAPRHADOOP-136, MAPRHADOOP-137: Multiple security vulnerabilities in Hadoop.

MAPRYARN-241: Remote Code Execution vulnerability in the YARN Java JMX Server.

MFS-2336: File Client impersonation does not honour the permissions of the actual user.

MFS-2493: The /tmp/cldbinfo/unreachableCldbs is created with insecure permissions.

MFS-2551: Local Privilege Escalation vulnerability in the maprexecute command.

MFS-2645: Fixed a buffer overflow in NFS Server version 3.

MFS-2661: Snapshot creation fails due to a permission error.

MFS-2685: The maprcli commands use the wrong ticket to communicate with ZooKeeper in secure, cross cluster environments.

MFS-2700: FUSE kernel sends the wrong user credentials to the MapR FUSE Process.

MFS-2708: Disk failure related log files have insecure permissions.

MFS-3310: Need an alert to warn about expiring SSL certificates. Added the Security Certificate Expiry Alarm.

MFS-5229: Remote Code Execution vulnerability in the MAST Gateway JMX Server.

MFS-5234: Remote Code Execution vulnerability in the CLDB JMX Server.

MFS-5235: Remote Code Execution vulnerability in the Gateway JMX Server.

MFS-5236: Remote Code Execution vulnerability in the Warden JMX Server. Added a new parameter warden.enable.jmxremote that must be explicitly set to true to enable the Warden JMX Server.

MapR Streams

MS-762: Customer Streams face cursor commit failures.

MS-557: Commits fail for MapR Streams on volumes that were previously mirror volumes but are now standard volumes.

Upgrade

MS-925: After upgrade to EEP 6.2 (Spark 2.4.0), Kafka/MapR Streams cannot be consumed.

MFS-2079: After upgrading to MapR version 6.1.0, the volume Name Container hangs when assigning volume names for volumes created with MapR version < 4.0.1.

MFS-2469: After upgrading MapR from version 5.2.2 to 6.1.0, slave CLDB nodes are stuck during initialization.

MFS-2553: Ecosystem jobs using ZooKeeper fail after upgrading to the MapR 6.1 EBF patch.

MFS-2560: CLDB on a MapR 5.2.2 cluster gets overwhelmed with RPC calls when mirroring from a MapR 6.1.x cluster.

MFS-2561: The VOLUME_ALARM_DATA_UNDER_REPLICATED alarm is generated frequently after upgrading MapR from version 3.0.1 to version 6.1.0.

MFS-2675: Certify MapR version 6.1.0 on RHEL 7.7.

MON-3922: When upgrading to MapR 6.x, ensure that volumes prior to MapR version 6.0, which lack volume aces, are handled gracefully after upgrade.

MON-4862: After upgrading from MapR 5.2.1 to MapR 6.1, API server fails to start with an M5 license without tables support installed.

MON-4892: Snapshot tab in MCS indicates that license upgrade is needed after upgrading from MapR 5.2.1 to MapR 6.1 with M5 license installed.

YARN

MAPRMR-4: With centralized logging, YARN does not populate stderr and stdout logs.

MAPRMR-19

Applications fail with the Jobstatus not available exception. The ApplicationMaster has already finished processing each job but the Job History Server has not yet updated job statuses. This causes the failures. Two options have been added to YARN to retry fetching job statuses.

yarn.app.mapreduce.job.update-status-max-retries - The number of times to retry.
yarn.app.mapreduce.job.update-status-retry-interval - The interval to wait before each retry attempt.
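
The effect of the two MAPRMR-19 options can be pictured as a bounded retry loop. The following Python sketch shows the general pattern only; it is not the YARN implementation. The function name, the use of None for "status not yet available", and the interval being in seconds are all assumptions made for illustration.

```python
import time

def fetch_with_retries(fetch, max_retries, retry_interval_s, sleep=time.sleep):
    """Call fetch() until it returns a status, retrying up to max_retries times.

    fetch returns None while the status is not yet available (assumed convention).
    """
    attempts = 0
    while True:
        status = fetch()
        if status is not None:
            return status
        if attempts >= max_retries:
            raise RuntimeError("job status not available after retries")
        attempts += 1
        sleep(retry_interval_s)  # wait before the next attempt

# Simulated history server: the status appears on the third call.
responses = iter([None, None, "SUCCEEDED"])
result = fetch_with_retries(lambda: next(responses), max_retries=3, retry_interval_s=0)
print(result)
```

With retries disabled (max_retries=0), the first unavailable status would fail immediately, which corresponds to the behavior described in the issue.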

MAPRYARN-127: Resource Manager fails with a Concurrent Modification Exception.

MAPRYARN-155: Containers fail to launch if property names contain a dash (-) in the launch_container.sh script.

MAPRYARN-161: Deletion of History Server logs is stopped when an invalid application directory is found within the log aggregation directory.

MAPRYARN-171: YARN preemption does not occur with Fair and DRF scheduling policies.

MAPRYARN-191: YARN API requests via CLI do not return any result when cluster has Label-Based-Scheduling enabled.

MAPRYARN-192: MapReduce jobs fail if their labels contain the logical operator (&&).

MAPRYARN-193: Resource Manager crashes when sorting Collections using the FairShare comparator.

MAPRYARN-195: Resource Manager exits with a FATAL error.

MAPRYARN-203: Resource preemption fails and returns a Null Pointer Exception.

MAPRYARN-210: Use per-node local volumes for YARN log aggregation instead of a single volume. Added the Local Log Aggregation Feature.

MAPRYARN-221

Containers hang in LOCALIZING state. Added two options:

yarn.nodemanager.timeout-localizing-container - The maximum time to wait to localize resources for containers.
yarn.nodemanager.check-interval-localizing-container.ms - The frequency at which the ApplicationMaster checks the running time of the localizing container.
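
The two MAPRYARN-221 options describe a periodic check against a time limit. The following Python sketch shows what one such check could look like. It is a simplified illustration, not the YARN code: the container IDs, the time values, and the data structure are hypothetical, and the time units are left abstract.

```python
def overdue_containers(localizing_since, now, timeout):
    """Return IDs of containers that have been localizing for longer than timeout.

    localizing_since maps a container ID to the time its localization started.
    """
    return sorted(
        cid for cid, start in localizing_since.items() if now - start > timeout
    )

# Hypothetical snapshot taken at time 100 with a timeout of 40 time units.
started = {"container_a": 0, "container_b": 50, "container_c": 95}
print(overdue_containers(started, now=100, timeout=40))
# ['container_a', 'container_b']
```

In this picture, the check interval controls how often such a pass runs, and the timeout controls which containers it flags.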

MAPRYARN-223: Maximum idle time of the Jetty connection should be configurable.

MAPRYARN-244: Resource Manager hangs when trying to shut down after CLDB failover.

MAPRYARN-246: Resource Manager hangs when there is a space in the name of the queue in the fair-scheduler.xml file.

MAPRYARN-249: Resources needed to preempt should not have negative vcore values.

MAPRYARN-250: Job History Server (JHS) hangs under heavy load when scanning MFS for job history files. Added a parameter, mapreduce.jobhistory.intermediate-done-scan-timeout, to set the timeout in milliseconds for rescanning the done_intermediate user directory.

MAPRYARN-258: Publish system metrics in batches so as to avoid overloading the Application Timeline Server (ATS).

MAPRYARN-261: Administrator users who are not part of the mapr group are not able to view the logs of the running jobs of another user.

MAPRYARN-276: Resource Manager crashes with a Null Pointer Exception.

MAPRYARN-284: YARN kills container but the task process is not killed.

MAPRYARN-287: When the ClientRMService processes an application kill request, the application diagnostics should report the user and the host that issued the kill request.

MAPRYARN-291: On RHEL 8.2, Warden must run Node Manager with umask 022 on MapR Core 6.1.0.
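
For context on the umask value in MAPRYARN-291, the following Python sketch shows how a umask of 022 turns the usual requested modes into the permissions that new files and directories end up with. The requested modes 0666 and 0777 are the common defaults for files and directories, used here as an illustration.

```python
def effective_mode(requested_mode, umask):
    """Permission bits that remain after the umask clears its bits."""
    return requested_mode & ~umask

UMASK = 0o022

print(oct(effective_mode(0o666, UMASK)))  # 0o644 (rw-r--r--) for files
print(oct(effective_mode(0o777, UMASK)))  # 0o755 (rwxr-xr-x) for directories
```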