Issues Resolved in 5.2.1

The following customer-reported issues, observed in Version 5.2.0, are resolved in Version 5.2.1.

Product Number Description Resolution
AWS SDK jar 24566 An older version of the aws-sdk jar was built with MapR. With this fix, MapR upgraded the aws-sdk jar from version 1.7.4 to 1.7.15.
Build 24992 Installing a MapR patch caused jar files to be removed from under the drill/drill-1.4.0/jars/ directory. With this fix, jar files are no longer incorrectly removed.
CLDB 14105 When nodes attempted to register with duplicate IDs, CLDB did not register the nodes and did not log meaningful error messages. With this fix, when nodes attempt to register with duplicate IDs, CLDB will log appropriate error messages.
CLDB 24413 CLDB was crashing when the volume replication factor was greater than 3. With this fix, CLDB will not crash when the volume replication factor is greater than 3.
CLDB 24647 On a node with multiple host IDs, CLDB crashed and failed over to a new CLDB when a stale host ID was removed. With this fix, CLDB will not crash and fail over when a stale host ID is removed.
CLDB 24651 CLDB threw an exception and failed over when the snapshots list was iterated over while snapshots were being created. With this fix, CLDB will no longer fail over when the snapshots list is iterated over while new snapshots are being created.
CLDB 24662 Intermittently, CLDB was shutting down because of a race between initialization and use of the license. With this fix, the license will be completely initialized before being used.
CLDB 24770 Under high load, CLDB was sometimes caught in a deadlock when updating volume info and volume snapshot count simultaneously. With this fix, there will no longer be a deadlock when updating volume info and volume snapshot count.
CLDB 25708 After a rolling upgrade of certain nodes to 4.1 or later, operations from these nodes to nodes running MapR 4.0.2 and prior versions stalled because MapR 4.0.2 and older versions did not process the new RPC introduced with MapR 4.1. With this fix, operations on nodes running MapR 4.0.2 will not stall.
CLDB 26214 During a rolling upgrade, if the slave CLDB was upgraded before the master CLDB, the slave CLDB could crash when accessing new KvStore tables. With this fix, the slave CLDB will not crash on reading new tables, even if they do not exist.
CLDB 26335 When snapshots were deleted as part of a volume remove, the CLDB tables that store snapshot info were not purged at a fast rate. As a result, the cid 1 container gradually grew in size. With this fix, snapshot tables will be purged properly when volumes are removed and the size of the cid 1 container will not grow.
DB 24745 An assertion failure occurred in MapR-FS due to zero-length field names in OJAI documents. With this fix, the assertion failure will no longer occur.
DB 24807 The run time of MapR tasks with counters was slow for file output commits. Because MapR-DB used a time-based trigger for bucket flushing, any buckets that were unused for 5 minutes were flushed. These unused buckets were flushed every 2 seconds in batches of 12. If there were many buckets, regardless of size, that were unused for that period of time, the load caused by the flushing impacted performance. With this fix, the time-based bucket flush is disabled, preventing the slow performance.
DB 25241 In MapR-DB, when using the HBase Java FuzzyRowFilter filter, the wrong result was returned. This occurred because the mask preprocessing converted 1 to 2 and 0 to -1. With this fix, the correct results are returned.
DB 25333 An exception occurs when a JSON document is re-inserted into the same row after a table’s time-to-live has expired. With this fix, the exception will no longer occur and re-insertion will complete successfully.
DB 25401 The cumulative cost becomes a negative value when a MapR-DB table has more than 2147483647 rows. With this fix, the return type for the Java getNumRows() API is changed to long and the correct value is preserved.
DB-JSON 26338 When using an "In" condition with the QueryCondition API on only the _id field, the last record was omitted. With this fix, all records are returned.
DB-Marlin 24408 When running multiple producers as separate threads within a process, with a very small value for buffer.memory (say 1 KB), some producers could stall due to a lack of buffer memory. With this fix, the default value for minimum buffer memory is increased to 10 KB.
FileClient 24053 The client crashed if there was an error during client initialization. With this fix, the client will not crash if there is an error during initialization.
FileClient 25471 The readdir operation was returning incorrect entries when the child entries were volumes, because of an issue with volume attributes on the client side. With this fix, volume attributes will be set correctly for lookup and readdir operations.
MapR-FS 12856 When the hadoop fs -rmr command is run, it reads the entire directory contents into memory before starting to delete anything, resulting in an Out of Memory error. This fix includes a new hadoop mfs -rmr path command that:
  • Does not build the entire readdir file list in memory; once 1 MB of readdir data is reached, the command unlinks and removes those directories.
  • Does not fetch the attributes of the entries in readdir.
MapR-FS 20644 Sometimes, when mirroring a large number of containers, the volume mirror thread crashed, resulting in a CLDB failover. With this fix, the mirroring process will be resilient to a large number of containers.
MapR-FS 22044 The CLDB logs were growing to a large size with stdout and stderr messages when a user's ticket expired. With this fix, the log level of messages related to ticket expiration has been changed to Debug, so the CLDB logs will no longer grow to a large size when a user's ticket expires.
MapR-FS 23652 The POSIX loopbacknfs client did not automatically refresh renewed service tickets. With this fix, the POSIX loopbacknfs client will:
  • Automatically use the renewed service ticket without requiring a restart if the ticket is replaced before expiration (ticket expiry time + grace period of 55 minutes). If the ticket is replaced after expiration (which is ticket expiry time + grace period of 55 minutes), the POSIX loopbacknfs client will not refresh the ticket as the mount will become stale.
  • Allow impersonation if a service ticket is replaced before ticket expiration (which is ticket expiry time + grace period of 55 minutes) with a servicewithimpersonation ticket.
  • Honor all changes in user/group IDs of the renewed ticket.
MapR-FS 23975 In version 5.1, MFS was failing to start on some Docker containers because it was trying to determine the number of NUMA nodes from /sys/devices/system/node. With this fix, MFS will work on Docker containers.
MapR-FS 24022 Mirroring of a volume on a container which does not have a master container caused the mirror thread to hang. With this fix, mirroring will not hang when the container associated with the volume has no master.
MapR-FS 24139 If limit spread was enabled and the nodes were more than 85% full, CLDB did not allocate containers for IOs on non-local volumes. With this fix, CLDB will now allocate new containers to ensure that the IO does not fail.
MapR-FS 24155 Disk setup was timing out if running trim on flash drives took a long time. With this fix, disk setup will complete successfully and the warning message (“Starting Trim of SSD drives, it may take a long time to complete”) is entered in the log file.
MapR-FS 24159 The mtime was updated whenever a hard link was created. Also, when a hard link was created from the FUSE mount point, although the ctime was updated, the update timestamp only showed the minutes and seconds and not the nanoseconds. With this fix, mtime will not change on the hard link and when a hard link is created from the FUSE mount point, the timestamp for ctime will include nanoseconds.
MapR-FS 24249 When running map/reduce jobs with older versions of the MapR classes, a system hang or other issues occurred because the older classes linked to the native library installed on cluster nodes that were updated to a newer MapR version. With this fix, the new fs.mapr.bailout.on.library.mismatch parameter detects mismatched libraries, fails the map/reduce job, and logs an error message. The parameter is enabled by default. You can disable the parameter on all the TaskTracker nodes and resubmit the job for the task to continue to run. To disable the parameter, you must set it to false in the core-site.xml file (see the sample configuration entries after this table).
MapR-FS 24352 Mirror synchronization was not optimized. In this patch, mirror synchronization has been optimized for changes in a small percentage of the inodes. During a mirror resync operation, the destination sends the version number recorded from the last mirror resync operation. While scanning inodes to identify those that have changed since the last resync operation, MFS now compares the version number sent by the destination with the version of the allocation group, which keeps track of all the inodes. If the allocation group version is:
  • Higher than the last resync version, then MFS will check for the changed inodes in the allocation group.
  • Less than or equal to the last resync version, MFS will not read all the inodes in the allocation group because the allocation group has not changed since the last resync operation.
MapR-FS 24585 Excessive logging in the CLDB audit caused the cldbaudit.log file to grow to a large size. With this fix, queries to CLDB for the ZK string will no longer be logged for auditing, reducing the size of the cldbaudit.log file.
MapR-FS 24618 Remote mirror volumes could not be created on secure clusters using MCS even when the appropriate tickets were present. With this fix, remote mirror volumes can now be created on secure clusters using MCS.
MapR-FS 24630 Under some conditions, using the 'ls' command with the --full-time option produced incorrect results that showed a negative number. With this fix, the correct timestamp is supplied.
MapR-FS 24660 MFS crashed because the maximum number of slots for background delete operations was not adequate. Incoming client operations reserving these slots were hanging and causing MFS to crash. With this fix, MFS will not crash because the number of slots for background operations has been increased.
MapR-FS 24712 During container resynchronization, the same scratch space was being reused by internal parallel operations resulting in corruption. With this fix, internal parallel operations will use separate scratch spaces.
MapR-FS 24846 If the topology of a node changed, then after a CLDB failover, the list of nodes under a topology could not be determined because the new non-leaf topologies were not being updated. With this fix, the inner nodes of the topology graph will be updated correctly and the list of nodes under an inner (non-leaf) topology will be determined correctly.
MapR-FS 24915 Running the expandaudit utility on volumes can result in very large (more than 1GB) audit log files due to incorrect GETATTR (get attributes) cache handling. With this fix, the expandaudit utility has been updated so that it will not perform subsequent GETATTR calls if the original call to the same file identifier failed.
MapR-FS 24965 On large clusters, the bind sometimes failed with a message indicating port unavailability when running MapReduce jobs, specifically reducer tasks. With this fix, if the new fs.mapr.bind.retries configuration parameter in the core-site.xml file is set to true, the client will retry the bind during client initialization for 5 minutes before failing. By default, the fs.mapr.bind.retries configuration parameter is set to false.
MapR-FS 24971 When the mirroring operation started after a CLDB failover, it sometimes sent requests to a slave CLDB where data was stale, resulting in the mirroring operation hanging. If the CLDB failover happened again during this time, the new CLDB master discarded data resynchronized by the old mirroring operation but marked the mirroring operation as successful. This resulted in a data mismatch between source and destination. With this fix, mirroring requests will be sent to the master CLDB node only.
MapR-FS 25041 Whenever a newly added node was made the master of the name container, MFS crashed while deleting files in the background. With this fix, MFS will not crash when a newly added node is made the master of the name container.
MapR-FS 25184 If limit spread was enabled and the nodes were more than 85% full, CLDB did not allocate containers for IOs on local volumes. With this fix, CLDB will now allocate new containers to ensure that the IO does not fail.
MapR-FS 25290 In a secure environment, while writes were in progress, num_groups got corrupted and caused the FUSE process to crash. With this fix, the FUSE process will not crash while writes are in progress.
MapR-FS 25308 MFS crashed when mirroring a mirror volume that was promoted to a read/write volume and edited, and then reverted to a mirror volume. With this fix, MFS will not crash when resynchronizing a mirror volume that was promoted to a read/write volume and edited, and then reverted to a mirror volume.
MapR-FS 25337 When too many files were open, writes through FUSE were failing with EAGAIN messages. With this fix:
  • The limit for open files is 64K.
  • If the number of open files exceeds the limit, an ENFILE message (rather than EAGAIN) will be logged.
  • If a request is stuck and/or failing, an error will be logged periodically.
MapR-FS 25426 The server was rejecting encrypted writes because the expected length did not match the RPC data length, and this caused the server to crash. With this fix, the server will no longer crash because the expected length will always match the RPC data length for encrypted writes.
MapR-FS 25590 Sometimes the SP-to-fileserver map became inconsistent across different kvstore tables due to a race condition, which caused container lookups from the slave CLDB to fail. With this fix, kvstore tables will be made consistent if they are inconsistent.
MapR-FS 25775 While uncaching was in progress, MFS writes were taking a long time. With this fix, a better uncaching algorithm (which uses CPU efficiently) improves the overall speed of MFS (including writes) while uncaching is in progress.
MapR-FS 25829 The libMapRClient library required a JVM to be installed on the client machine, which is not required by C and C++ programs. With this fix, the libMapRClient library no longer requires a JVM to be installed on the client machine for C and C++ programs.
MapR-FS 25848 After a rolling upgrade of namespace container nodes to 5.2, ACE information was incorrectly set on certain operations, causing those operations to fail. With this patch, ACE information will be discarded after a rolling upgrade.
MapR-FS 25856 In the event of a CLDB failover, a table on the unreachable node is deleted and re-created by the CLDB master. Sometimes, multiple container lookup threads from slave CLDBs trying to open or access that table during the failover caused a CLDB exception. With this fix, multiple threads can safely access the unreachable-node table.
MapR-FS 26025 A corrupt encrypted write results in a data decryption failure. As a result of the decryption failure, MFS returns an EINVAL. The master node for the write crashes when it receives an EINVAL from the replicas. In this case, the decryption failure should have resulted in an EBADMSG instead of an EINVAL. With this fix, an EBADMSG is returned in case of a decryption failure of data. Upon encountering an EBADMSG, MFS sends an ErrServerRetry to the client. The client revalidates the CRC, tries decrypting the encrypted buffers, and then retries the write operation, making the client more resilient to memory and network corruption.
MapR-FS 26054 Sometimes, the container was getting stuck in resync state because the resync operation was hanging. With this fix, the resync operation will no longer hang.
MapR-FS 26062 After installing patch 41809 on v5.2, the FUSE-based POSIX client failed to start. With this fix, the FUSE-based POSIX client will now start when the command to start the service is run.
MapR-FS 26093 Sometimes, MFS crashed after promoting a destination mirror volume to a read-write volume. With this fix, MFS will not crash after promoting a destination mirror volume to a read-write volume.
MapR-FS 26094 Sometimes MFS crashed because there were many SP cleaner threads running between the low and high thresholds. With this fix, MFS will not crash because the cleaner is disabled below the high threshold.
MapR-FS 26288 During a rolling upgrade, if the slave CLDB was upgraded before the master CLDB, the slave CLDB could crash when accessing new KvStore tables. With this fix, the slave CLDB will not crash on reading new tables, even if they do not exist.
MapR-FS 26336 MFS was crashing during a truncate operation when all of the following were true:
  • An ACE was set on the file.
  • The file had more than one filelet.
  • The final truncated size ended within one of the direct blocks of the last filelet.
With this fix, MFS will no longer crash during the truncate operation.
MapR-FS 26351 During disksetup, even if the mfs.ssd.trim.enabled configuration parameter was set to false, the device was getting trim calls. With this fix, MFS will not attempt to trim if the configuration parameter is set to false.
Hive, Tez 20965 When working with multiple clusters, synchronization issues were causing MapRFileSystem to throw a NullPointerException. With this fix, MapRFileSystem has been improved to better support working with multiple clusters and contains fixes for the synchronization issues.
Hoststats 11349 Hoststats did not work on POSIX edge nodes. With this fix, hoststats works on POSIX client edge nodes as well, so that statistics are displayed in MCS.
JobTracker 24700 The Job Tracker user interface failed with a NullPointerException when a user submitted a Hive job with a null value in a method. With this fix, the Job Tracker interface does not fail when a Hive job is run with a null value in a method.
MapReduce 24505 A job failed when the JvmManager went into an inconsistent state. With this fix, jobs no longer fail as a result of the JvmManager entering an inconsistent state.
MapReduce 25599 A race condition in the jobtracker-start script could cause Warden to start multiple JobTrackers. With this fix, the start script loops and waits for a successful start of the JobTracker before exiting, thus closing the race condition window.
MapReduce 25695 It was not possible to restrict the web access port range, so the YARN MapReduce Application Master could open a web port anywhere in the ephemeral port range of the node where it was running. With this change, the YARN MapReduce Application Master will only open its web port within the range specified by the mapred parameter yarn.app.mapreduce.am.job.client.port-range.
MCS 23257 In MCS, new NFS VIPs were visible in the NFS HA > VIP Assignments tab, but not in the NFS HA > NFS Setup tab. With this fix, the NFS VIPs will be available in both the NFS HA > VIP Assignments tab and the NFS HA > NFS Setup tab.
NFS 24315 If you used the NFS client and the dd command with iflag=direct, an incorrect amount of data may have been read. With this fix, the dd command will read exactly the expected amount of data when iflag=direct is set.
NFS 24446 Due to incorrect attribute cache handling in NFS server, the getattr call sometimes returned stale mtime because the attribute cache was not getting updated properly at the time of setattr. With this fix, the attributes are now properly cached.
NFS 24658 CLDB returned “no master” and an empty list for a container lookup, which the NFS server could not handle, because when multiple servers are down, there can be no master for a container. With this fix, the NFS server will handle an empty node list for container lookups.
NFS:Loopback 23652 The POSIX loopbacknfs client did not automatically refresh renewed service tickets. With this fix, the POSIX loopbacknfs client will:
  • Automatically use the renewed service ticket without requiring a restart if the ticket is replaced before expiration (ticket expiry time + grace period of 55 minutes). If the ticket is replaced after expiration (which is ticket expiry time + grace period of 55 minutes), the POSIX loopbacknfs client will not refresh the ticket as the mount will become stale.
  • Allow impersonation if a service ticket is replaced before ticket expiration (which is ticket expiry time + grace period of 55 minutes) with a servicewithimpersonation ticket.
  • Honor all changes in user/group IDs of the renewed ticket.
Pkg/deployment 24309 Symlinks that existed in a MapR 5.1 installation were not re-created during an upgrade to MapR 5.2. This problem resulted when the mapr-hadoop-core package was updated on a cluster with the incorrect version of the mapr-core-internal package. This problem can occur during an upgrade from any older MapR version to a newer MapR version. With this fix, the mapr-hadoop-core package has a new dependency for a specific version of mapr-core-internal. If the correct version of mapr-core-internal is not present, an error message is generated, and the mapr-hadoop-core package cannot be installed. Note that this fix is effective for MapR 5.2.1 or later installations.
RPC 24610 In a secure cluster, when there are intermittent connection drops (between MFS-MFS or client-MFS), the client and/or server could crash during authentication. With this fix, the client and/or server will not crash during authentication if there are intermittent connection drops.
Streams 23563 High CPU utilization occurred when the default buffering time for MapR Streams was set to 0. With this fix, CPU utilization and latency are reduced by having the TimeBasedFlusher active only when there is work to do.
UI:CLI 24280 Running the maprcli dashboard info command occasionally threw a TimeoutException error. With this fix, the internal timeout value was increased to provide more allowance for command processing.
Warden 24119 Warden adjusted the FileServer (MFS) and Node Manager (NM) memory incorrectly when NM and TaskTracker (TT) were on the same node. This could result in too much memory being allocated to MFS. With this fix, Warden does not adjust MFS memory when NM and TT are on the same node. Memory adjustment is implemented only when TT and MapR-FS (but not NM) are on the same node.
Warden 24562 CLDB (container location database) performance suffered because Warden gave the CLDB service a lower CPU priority. With this fix, Warden uses a new algorithm to set the correct CPU priority for the CLDB service.
Yarn 24477 Jobs failed if a local volume was not available and directories for mapreduce could not be initialized. With this fix, jobs no longer fail, and local volume recovery is enhanced.
YARN 25387 With the capacity scheduler enabled, adding a node that did not contain a label could result in a null pointer exception (NPE). With this fix, errors are no longer generated when the capacity scheduler is enabled.
YARN 25412 MapReduce jobs failed if the Application Master (AM) was restarted for any reason (for example, because of a node failure) during a job commit and left a control file that prevented subsequent commit attempts. With this fix, MAPREDUCE-5485 is backported to MapR 5.1. MAPREDUCE-5485 adds a clean-up of commit-stage files: if the first commit attempt fails, temporary files are removed, allowing the next repeatable commit attempt to write them again without throwing an exception. To benefit from this fix, you must set the mapreduce.fileoutputcommitter.algorithm.version parameter to "2" in the mapred-site.xml file (see the sample configuration entries after this table).
YARN 25654 At startup, while processing application-recovery data, the ResourceManager (RM) failed with a null pointer exception. With this fix, the ResourceManager starts correctly when processing application-recovery data.
Yarn/security 25448 A user's temporary log files for running jobs were not readable by another user from the same group in the RM UI. An exception with the message, "Exception reading log file. User 'mapr' doesn't own requested log file" was generated. With this fix, users in the same primary group can access user logs of other users in the group.
Yarn/Warden 25695 It was not possible to restrict the web access port range, so the YARN MapReduce Application Master could open a web port anywhere in the ephemeral port range of the node where it was running. With this change, the YARN MapReduce Application Master will only open its web port within the range specified by the mapred parameter yarn.app.mapreduce.am.job.client.port-range.
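
Several of the fixes above (issues 24249, 24965, 25412, and 25695) are enabled or tuned through configuration parameters in the core-site.xml and mapred-site.xml files. The entries below are a minimal sketch of how those parameters might be set, assuming the standard Hadoop XML property format. The values for fs.mapr.bailout.on.library.mismatch, fs.mapr.bind.retries, and mapreduce.fileoutputcommitter.algorithm.version follow the descriptions in the table; the range shown for yarn.app.mapreduce.am.job.client.port-range is only a placeholder to be replaced with a range valid for your environment.

  <!-- core-site.xml (sketch) -->
  <property>
    <name>fs.mapr.bailout.on.library.mismatch</name>
    <value>false</value> <!-- false disables the library-mismatch check (issue 24249); enabled (true) by default -->
  </property>
  <property>
    <name>fs.mapr.bind.retries</name>
    <value>true</value> <!-- retry the bind during client initialization for 5 minutes (issue 24965); false by default -->
  </property>

  <!-- mapred-site.xml (sketch) -->
  <property>
    <name>mapreduce.fileoutputcommitter.algorithm.version</name>
    <value>2</value> <!-- required to benefit from the MAPREDUCE-5485 backport (issue 25412) -->
  </property>
  <property>
    <name>yarn.app.mapreduce.am.job.client.port-range</name>
    <value>50000-50100</value> <!-- placeholder range; the Application Master web port is opened within this range (issue 25695) -->
  </property>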