Resolved Issues

Lists the issues that were resolved in MapR version 6.1.0.

The following MapR issues, which were reported by customers, are resolved in Version 6.1.
Component Number Description Resolution
CLDB 31346 The number of resyncs remains large for an extended time after a couple of nodes are stopped/restarted in the cluster

The issue occurs when there are a large number of snapshots on a cluster. New, configurable alarms warn about this issue.

See the following resources:
31620 Volume mirroring floods log with misleading error The bug occurs when mirroring happens between clusters that have both internal and external IPs set in mapr-clusters.conf. The getIPTypeForCluster method in CLDBRpcCommonUtils is unable to determine whether the IP type is internal or external. The workaround is to put the internal IP in mapr-clusters.conf and keep the external IP in the env.sh file.

See: Starting the Mirror

Upgrade 31038 Prior to 6.1, the log4j.properties file was not automatically updated in an upgrade All files update automatically.
Upgrade 29752 Prior to 6.1, if Oozie was not upgraded to the EEP 4.0.0 version, the Oozie process would fail following a manual upgrade from MapR 5.2.x/EEP 3.0.1 to MapR 6.0 The upgrade process works correctly.
FileClient 31024 When the hadoop fs -rmr command is run to remove a list of files/directories, if a file/directory is not present at the time of removal, the command returns an error, 'no such file or directory', and terminates without removing the remaining files and/or directories from the given list. When the hadoop fs -rmr command is run to remove a list of files and/or directories, if a file/directory from the list is not present on the system, the command now removes the remaining files and/or directories from the list.
30987 Memory leak in DoPathWalk This leak is fixed.
31026 FileClient should use one source port to connect to any server This issue occurred because FileClient was using multiple source ports to connect to file servers, thus exhausting all available ports. FileClient now connects on a single source port.

See Client Side Port Binding in What's New in Version 6.1.0

31129 File Client crashed at mapr::fs::CidCache::GetBinding The issue occurred because the code assumed that the number of volume replicas would not exceed 7. This issue is fixed.
31146 The SQL queries are failing intermittently The issue occurred because user impersonation was faulty. This issue is fixed.
31738 Create request should take setattr credentials from ticket To avoid this issue, all requests should take the credentials from the ticket, and not use the context user. This problem is fixed. See Managing the FUSE-Based POSIX Client
31804 FileClient hangs at WaitUntilEnqueued This is a backport of the fix in #24266, where the Idle Flusher needed to be disabled to resolve hangs, for MapR version 5.2.2. This fix is backported.
FileServer MFS-15415 Volume dump restore fails with error 20020 (ENOTICKET) despite having a ticket and using it successfully for a long time The issue is fixed.
26792 createTTVolume.sh needs to reliably determine the MFS state before deciding to recreate the local NM volume This bug caused rolling updates to fail. This issue is fixed.
30063 FCR sent during disk I/O error causes all 3 copies of container to be unavailable This bug occurred because the primary filesystem instance reported incorrect replica chain management information. This issue is fixed.
30917 EIO on a read op of a local volume cid deletes/offlines the container but local volume is not recreated This issue occurred because volume IDs were not being passed in HandleDamagedVolumes() when called from the ContainerOffline case, causing the volumes to be deleted and not recreated automatically. This bug is fixed.
31007 FUSE clients do not honor impersonation constraints in servicewithimpersonation tickets This issue was caused when FUSE failed to honor constraints for a servicewithimpersonation ticket which includes impersonatedgids constraints. This issue is fixed, and FUSE now enforces such constraints. A support advisory is available at

https://support.datafabric.hpe.com/s/article/FUSE-Clients-do-not-honor-impersonation-constraints-in-servicewithimpersonation-tickets?language=en_US

31301 Some snapcids on mirror volumes that are marked for delete never get deleted This issue is fixed.
31361 Rename operation fails on tables and streams This issue occurred when the destination table or stream of a rename operation already existed; that is, in mv x y, y already exists. This issue is fixed.
31365 MFS Rpc thread on CLDB node is running out of CPU This issue occurred because the process that reads data from, and writes data to, the key-value store was not offloaded to the compression thread. This issue is fixed.
31453 MFS on CLDB secondary instances crashed due to failure in kvstorerangedelete (ENOENT) This bug occurred due to corruption of the structure that holds information about volume containers. The corruption occurred when a large number of containers were present in the volume. This issue is fixed. Documentation that describes how to set the alarm for too many containers is available at Configuring the Alarm Threshold Using the CLI (see CLUSTER_ALARM_TOO_MANY_SNAPSHOT_CONTAINERS), and cldb.conf.
31981 Disk failure on Master (A) container when reporting loss of B to CLDB, can cause all copies to be unavailable The problem arose because there was no sanity check for the validity of a replica when the reconnection timer expired. This check has been added.
FS::ACE 26280 "hadoop mfs -setace" does not accept groups with spaces in the name Spaces in AD group names caused the issue. This issue is fixed. Spaces in AD group names are correctly parsed.
30245 Permission denied error due to aceCache when readdir served by the primary node of NC This issue occurred due to a null character in the ACE expression, as the ACE for execute file was null. This issue has been resolved.
FS::Audit 30928 expandaudit does not resolve most of fids in a huge volume This issue is fixed by making logging more explanatory.
FS::Fuse 31662 Fuse : OPEN with O_TRUNC fails with permission denied error This issue is fixed.
31730 Fuse: mkdir fails with EBUSY The issue is caused by inode number reuse. The workaround is to set the fuse.use.compressed.inode.format parameter to 1, as documented in Configuring the MapR FUSE-Based POSIX Client.
32030 Fuse Assert in fs/client/fuse/cc/fuse_special_ll.c This issue was caused by Fuse reading debugs_assert() from the wrong location. This issue is now fixed.
32050 Fuse needs to honor prod_build This issue was caused by Fuse reading debugs_assert() from the wrong location. This issue is now fixed.
32074 Memory leak in fuse cache This leak is fixed.
FS::Snapshot 31051 MCS & maprcli command are not showing correct Volume size This issue occurred because some snapshot containers might have had a delayed deletion. This issue is fixed.
hoststats 31858 Hoststats process is not coming up after changing the default value (5660) of "mfs.server.port" in /opt/mapr/conf/mfs.conf The problem occurred because the ports were hardcoded in MapR settings. Using a port other than the default caused the shared memory key to fail. This issue is fixed.
MCS MON-3709 The list of services in the Services page does not refresh per the configured refresh rate settings. The list of services in the Services page refreshes per the configured refresh rate settings.
MapR Monitoring SPYG-1010 The Grafana dashboard shows "No data points" for Volume metrics Dashboard entries are correct.
SPYG-916 MapR monitoring index is not loaded correctly. The MapR monitoring index loads correctly.
MapR Event Store for Apache Kafka 31074 [Kafka 1.0] incorrect behavior for offsetsForTimes when streams.rpc.timeout.ms is configured Behavior corrected for offsetsForTimes when streams.rpc.timeout.ms is configured.
Elasticsearch ES-27 Elasticsearch fails to start correctly Elasticsearch starts correctly.
MapR Database 29278 Puts to tablets that failed with out-of-space errors continue to fail on the first put, even though there is sufficient space; subsequent puts succeed. When space becomes available, the very first put no longer returns an error.
30489 Client rpc trace messages for slow operations are not printing tablet fid The messages include tablet fid.
31092 MapR filesystem nodes are crashing when there is a burst of read requests Corrected batching of read requests to avoid the condition that was causing the crash.
31297 Using the Python happybase module to scan a MapR Database table via HBase Thrift causes the Thrift server to hang Addressed underlying issue that was causing the hang.
31766 MapR filesystem node crash causes client RPC timeouts during get and put operations Corrected the underlying logic for notification calls when creating new MapR Database buckets.
31901 Memory corruption occurs when deleting/freeing memory Corrected race conditions that were causing the corruption.
32262 Incremental bulkload via MapReduce fails to load data, even though it does not report an error Corrected the underlying logic that was causing the incremental load to fail without returning an error.
NFS 31810 Unable to mount specific volume, when cldb.reject.root is enabled Enabling cldb.reject.root caused NFS mounts to fail. This issue is fixed.
security 31935 All versions of MapR have Serious Ticket Vulnerability Enabling Authority Escalation This issue was caused by CLDB generating tickets from falsified credentials. This issue is fixed. A security bulletin is available at https://support.datafabric.hpe.com/s/article/MapR-Ticket-Credentials-can-become-compromised?language=en_US and the vulnerabilities list is updated and available at Security Vulnerabilities
Warden 31628 maprcli urls show the wrong value for hbmaster after shutdown of current HBmaster This issue occurred because maprcli had stale information on an HBase Master process that was killed. This issue is fixed.
YARN 25473 Aggregated log is written with wrong ownership Addressed the underlying issue that caused the aggregated log to be written with the wrong owner.
31082 ResourceManager address does not resolve after redirect via a proxy request Fixed incorrect condition that updated the current ResourceManager address.
31174 In the primary application log, the following error repeatedly occurred: "Tez AM is trying to access Timeline server and it fails" TEZ AM is released after job or session is completed.
31200 YARN job submission using server side uid resolution fails with ownership exception The problem occurred because the username was not read from the submitted ticket. This issue is fixed.
31487 Containers fail in LOCALIZING state Implemented clean up of subprocesses spawned by Shell when the process exits. Localization failures are available in the container diagnostics.
31679 CVE-2016-6811: A user who can escalate to the Yarn user can possibly run arbitrary commands as the root user Fixed the underlying security vulnerability.
32178 Thread leak when executing a Sqoop job Fixed thread leak.
32304 Fails to refresh labels for nodes in the cluster DefaultContainerExecutor sets proper permissions.
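
The FileClient entry for issue 31024 above describes a change in how a multi-path removal handles a missing entry: instead of stopping at the first missing path, the command carries on with the rest of the list. The following is a minimal sketch of that continue-past-missing pattern on a local filesystem. It is an analogy only, not the hadoop fs implementation, and the function name remove_all is hypothetical.

```python
import shutil
from pathlib import Path


def remove_all(paths):
    """Remove every path in `paths`, continuing past entries that are missing.

    Hypothetical helper for illustration; it works on the local filesystem
    and is not the hadoop fs -rmr code. Returns the list of missing paths
    so the caller can still report them.
    """
    missing = []
    for raw in paths:
        path = Path(raw)
        if not path.exists() and not path.is_symlink():
            # Record the miss, but keep processing the rest of the list.
            missing.append(raw)
            continue
        if path.is_dir() and not path.is_symlink():
            shutil.rmtree(path)  # recursive removal for directories
        else:
            path.unlink()  # plain files and symlinks
    return missing
```

Collecting the missing paths rather than raising on the first one lets the caller report every error after the whole list has been processed, which matches the resolved behavior described in the table.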