Issues Resolved in 5.2.1

The following customer-reported issues, observed in Version 5.2.0, are resolved in Version 5.2.1.

Product Number Description Resolution
AWS SDK jar 24566 An older version of the aws-sdk jar was built with MapR. With this fix, MapR upgraded the aws-sdk jar from version 1.7.4 to 1.7.15.
Build 24992 Installing a MapR patch caused jar files to be removed from under the drill/drill-1.4.0/jars/ directory. With this fix, jar files are no longer incorrectly removed.
CLDB 14105 When nodes attempted to register with duplicate IDs, CLDB did not register the nodes and did not log meaningful error messages. With this fix, when nodes attempt to register with duplicate IDs, CLDB will log appropriate error messages.
CLDB 24413 CLDB was crashing when the volume replication factor was greater than 3. With this fix, CLDB will not crash when the volume replication factor is greater than 3.
CLDB 24647 On a node with multiple host IDs, CLDB crashed and failed over to a new CLDB when a stale host ID was removed. With this fix, CLDB will not crash and fail over when a stale host ID is removed.
CLDB 24651 CLDB threw an exception and failed over when the snapshots list was iterated over while snapshots were being created. With this fix, CLDB will no longer fail over when the snapshots list is iterated over while new snapshots are being created.
CLDB 24662 Intermittently, CLDB was shutting down because of a race between initialization and use of the license. With this fix, the license will be completely initialized before being used.
CLDB 24770 Under high load, CLDB was sometimes caught in a deadlock when updating volume info and volume snapshot count simultaneously. With this fix, there will no longer be a deadlock when updating volume info and volume snapshot count.
CLDB 25708 After a rolling upgrade of certain nodes to 4.1 or later, operations from these nodes to nodes running MapR 4.0.2 and prior versions stalled because MapR 4.0.2 and older versions did not process the new RPC introduced with MapR 4.1. With this fix, operations on nodes running MapR 4.0.2 will not stall.
CLDB 26214 During a rolling upgrade, if the slave CLDB was upgraded before the master CLDB, the slave CLDB could crash when accessing new KvStore tables. With this fix, the slave CLDB will not crash on reading new tables, even if they do not exist.
CLDB 26335 When snapshots were deleted as part of a volume remove, the CLDB tables that store snapshot info were not purged at a fast rate. As a result, the cid 1 container gradually grew in size. With this fix, snapshot tables will be purged properly when volumes are removed and the size of the cid 1 container will not grow.
DB 24745 An assertion failure occurred in MapR-FS due to zero-length field names in OJAI documents. With this fix, the assertion failure will no longer occur.
DB 24807 The run time of MapR tasks with counters was slow for file output commits. Because MapR-DB used a time-based trigger for bucket flushing, any buckets that were unused for 5 minutes were flushed. These unused buckets were flushed every 2 seconds in batches of 12. If there were many buckets, regardless of size, that were unused for that period of time, the load caused by the flushing impacted performance. With this fix, the time-based bucket flush is disabled, preventing the slow performance.
DB 25241 In MapR-DB, when using the HBase Java FuzzyRowFilter filter, the wrong result was returned. This occurred because the mask preprocessing converted 1 to 2 and 0 to -1. With this fix, the correct results are returned.
DB 25333 An exception occurs when a JSON document is re-inserted into the same row after a table’s time-to-live has expired. With this fix, the exception will no longer occur and re-insertion will complete successfully.
DB 25401 The cumulative cost becomes a negative value when a MapR-DB table has more than 2147483647 rows. With this fix, the return type for the Java getNumRows() API is changed to long and the correct value is preserved.
DB-JSON 26338 When using an "In" condition with the QueryCondition API on only the _id field, the last record was omitted. With this fix, all records are returned.
DB-Marlin 24408 When running multiple producers as separate threads within a process, with a very small value for buffer.memory (say 1 KB), some producers could stall due to a lack of buffer memory. With this fix, the default value for minimum buffer memory is increased to 10 KB.
FileClient 24053 The client crashed if there was an error during client initialization. With this fix, the client will not crash if there is an error during initialization.
FileClient 25471 The readdir operation was returning incorrect entries when the child entries were volumes, because of an issue with volume attributes on the client side. With this fix, volume attributes will be set correctly for lookup and readdir operations.
MapR-FS 12856 When the hadoop fs -rmr command is run, it reads the entire directory contents into memory before starting to delete anything, resulting in an Out of Memory error. This fix includes a new hadoop mfs -rmr path command that:
  • Does not build the entire readdir file list in memory; once 1 MB of readdir data is reached, the command unlinks and removes those directories.
  • Does not fetch the attributes of the entries in readdir.
MapR-FS 20644 Sometimes, when mirroring a large number of containers, the volume mirror thread crashed, resulting in a CLDB failover. With this fix, the mirroring process will be resilient to a large number of containers.
MapR-FS 22044 The CLDB logs were growing to a large size with stdout and stderr messages when a user's ticket expired. With this fix, the log level of messages related to ticket expiration has been changed to Debug, so the CLDB logs will no longer grow to a large size when a user's ticket expires.
MapR-FS 23652 The POSIX loopbacknfs client did not automatically refresh renewed service tickets. With this fix, the POSIX loopbacknfs client will:
  • Automatically use the renewed service ticket without requiring a restart if the ticket is replaced before expiration (ticket expiry time + grace period of 55 minutes). If the ticket is replaced after expiration (which is ticket expiry time + grace period of 55 minutes), the POSIX loopbacknfs client will not refresh the ticket as the mount will become stale.
  • Allow impersonation if a service ticket is replaced before ticket expiration (which is ticket expiry time + grace period of 55 minutes) with a servicewithimpersonation ticket.
  • Honor all changes in user/group IDs of the renewed ticket.
MapR-FS 23975 In version 5.1, MFS was failing to start on some Docker containers because it was trying to determine the number of NUMA nodes from /sys/devices/system/node. With this fix, MFS will work on Docker containers.
MapR-FS 24022 Mirroring of a volume on a container which does not have a master container caused the mirror thread to hang. With this fix, mirroring will not hang when the container associated with the volume has no master.
MapR-FS 24139 If limit spread was enabled and the nodes were more than 85% full, CLDB did not allocate containers for IOs on non-local volumes. With this fix, CLDB will now allocate new containers to ensure that the IO does not fail.
MapR-FS 24155 Disk setup was timing out if running trim on flash drives took a long time. With this fix, disk setup will complete successfully and the warning message (“Starting Trim of SSD drives, it may take a long time to complete”) is entered in the log file.
MapR-FS 24159 The mtime was updated whenever a hard link was created. Also, when a hard link was created from the FUSE mount point, although the ctime was updated, the update timestamp only showed the minutes and seconds and not the nanoseconds. With this fix, mtime will not change on the hard link and when a hard link is created from the FUSE mount point, the timestamp for ctime will include nanoseconds.
MapR-FS 24249 When running map/reduce jobs with older versions of the MapR classes, a system hang or other issues occurred because the older classes linked to the native library installed on cluster nodes that were updated to a newer MapR version. With this fix, the new fs.mapr.bailout.on.library.mismatch parameter detects mismatched libraries, fails the map/reduce job, and logs an error message. The parameter is enabled by default. You can disable the parameter on all the TaskTracker nodes and resubmit the job for the task to continue to run. To disable the parameter, you must set it to false in the core-site.xml file (see the sample configuration entries after this table).
MapR-FS 24352 Mirror synchronization was not optimized. In this patch, mirror synchronization has been optimized for changes in a small percentage of the inodes. During a mirror resync operation, the destination sends the version number recorded from the last mirror resync operation. While scanning inodes to identify those that have changed since the last resync operation, MFS now compares the version number sent by the destination with the version of the allocation group, which keeps track of all the inodes. If the allocation group version is:
  • Higher than the last resync version, then MFS will check for the changed inodes in the allocation group.
  • Less than or equal to the last resync version, MFS will not read all the inodes in the allocation group because the allocation group has not changed since the last resync operation.
MapR-FS 24585 Excessive logging in the CLDB audit caused the cldbaudit.log file to grow to a large size. With this fix, queries to CLDB for the ZK string will no longer be logged for auditing, reducing the size of the cldbaudit.log file.
MapR-FS 24618 Remote mirror volumes could not be created on secure clusters using MCS even when the appropriate tickets were present. With this fix, remote mirror volumes can now be created on secure clusters using MCS.
MapR-FS 24630 Under some conditions, using the 'ls' command with the --full-time option produced incorrect results that showed a negative number. With this fix, the correct timestamp is supplied.
MapR-FS 24660 MFS crashed because the maximum number of slots for background delete operations was not adequate. Incoming client operations reserving these slots were hanging and causing MFS to crash. With this fix, MFS will not crash because the number of slots for background operations has been increased.
MapR-FS 24712 During container resynchronization, the same scratch space was being reused by internal parallel operations resulting in corruption. With this fix, internal parallel operations will use separate scratch spaces.
MapR-FS 24846 If the topology of a node changed, then after a CLDB failover, the list of nodes under a topology could not be determined because the new non-leaf topologies were not being updated. With this fix, the inner nodes of the topology graph will be updated correctly and the list of nodes under an inner (non-leaf) topology will be determined correctly.
MapR-FS 24915 Running the expandaudit utility on volumes can result in very large (more than 1GB) audit log files due to incorrect GETATTR (get attributes) cache handling. With this fix, the expandaudit utility has been updated so that it will not perform subsequent GETATTR calls if the original call to the same file identifier failed.
MapR-FS 24965 On large clusters, the bind sometimes failed with a message indicating port unavailability when running MapReduce jobs, specifically reducer tasks. With this fix, if the new fs.mapr.bind.retries configuration parameter in the core-site.xml file is set to true, the client will retry the bind during client initialization for 5 minutes before failing. By default, the fs.mapr.bind.retries configuration parameter is set to false.
MapR-FS 24971 When the mirroring operation started after a CLDB failover, it sometimes sent requests to a slave CLDB where data was stale, resulting in the mirroring operation hanging. If the CLDB failover happened again during this time, the new CLDB master discarded data resynchronized by the old mirroring operation but marked the mirroring operation as successful. This resulted in a data mismatch between source and destination. With this fix, mirroring requests will be sent to the master CLDB node only.
MapR-FS 25041 Whenever a newly added node was made the master of the name container, MFS crashed while deleting files in the background. With this fix, MFS will not crash when a newly added node is made the master of the name container.
MapR-FS 25184 If limit spread was enabled and the nodes were more than 85% full, CLDB did not allocate containers for IOs on local volumes. With this fix, CLDB will now allocate new containers to ensure that the IO does not fail.
MapR-FS 25290 In a secure environment, while writes were in progress, num_groups got corrupted and caused the FUSE process to crash. With this fix, the FUSE process will not crash while writes are in progress.
MapR-FS 25308 MFS crashed when mirroring a mirror volume that was promoted to a read/write volume and edited, and then reverted to a mirror volume. With this fix, MFS will not crash when resynchronizing a mirror volume that was promoted to a read/write volume and edited, and then reverted to a mirror volume.
MapR-FS 25337 When too many files were open, writes through FUSE were failing with EAGAIN messages. With this fix:
  • The limit for open files is 64K.
  • If the number of open files exceeds the limit, an ENFILE message (rather than EAGAIN) will be logged.
  • If a request is stuck and/or failing, an error will be logged periodically.
MapR-FS 25426 The server was rejecting encrypted writes because the expected length did not match the RPC data length, and this caused the server to crash. With this fix, the server will no longer crash because the expected length will always match the RPC data length for encrypted writes.
MapR-FS 25590 Sometimes the SP-to-fileserver map became inconsistent across different kvstore tables due to a race condition, which caused container lookups from the slave CLDB to fail. With this fix, kvstore tables will be made consistent if they are inconsistent.
MapR-FS 25775 While uncaching was in progress, MFS writes were taking a long time. With this fix, a better uncaching algorithm (which uses CPU efficiently) improves the overall speed of MFS (including writes) while uncaching is in progress.
MapR-FS 25829 The libMapRClient library required a JVM to be installed on the client machine, which is not required by C and C++ programs. With this fix, the libMapRClient library no longer requires a JVM to be installed on the client machine for C and C++ programs.
MapR-FS 25848 After a rolling upgrade of namespace container nodes to 5.2, ACE information was incorrectly set on certain operations, causing those operations to fail. With this patch, ACE information will be discarded after a rolling upgrade.
MapR-FS 25856 In the event of a CLDB failover, a table on the unreachable node is deleted and re-created by the CLDB master. Sometimes, multiple container lookup threads from slave CLDBs trying to open or access that table during the failover caused a CLDB exception. With this fix, multiple threads can safely access the unreachable-node table.
MapR-FS 26025 A corrupt encrypted write results in a data decryption failure. As a result of the decryption failure, MFS returns an EINVAL. The master node for the write crashes when it receives an EINVAL from the replicas. In this case, the decryption failure should have resulted in an EBADMSG instead of an EINVAL. With this fix, an EBADMSG is returned in case of a decryption failure of data. Upon encountering an EBADMSG, MFS sends an ErrServerRetry to the client. The client revalidates the CRC, tries decrypting the encrypted buffers, and then retries the write operation, making the client more resilient to memory and network corruption.
MapR-FS 26054 Sometimes, the container was getting stuck in resync state because the resync operation was hanging. With this fix, the resync operation will no longer hang.
MapR-FS 26062 After installing patch 41809 on v5.2, the FUSE-based POSIX client failed to start. With this fix, the FUSE-based POSIX client will now start when the command to start the service is run.
MapR-FS 26093 Sometimes, MFS crashed after promoting a destination mirror volume to a read-write volume. With this fix, MFS will not crash after promoting a destination mirror volume to a read-write volume.
MapR-FS 26094 Sometimes MFS crashed because there were many SP cleaner threads running between the low and high thresholds. With this fix, MFS will not crash because the cleaner is disabled below the high threshold.
MapR-FS 26288 During a rolling upgrade, if the slave CLDB was upgraded before the master CLDB, the slave CLDB could crash when accessing new KvStore tables. With this fix, the slave CLDB will not crash on reading new tables, even if they do not exist.
MapR-FS 26336 MFS was crashing during a truncate operation when all of the following were true:
  • An ACE was set on the file.
  • The file had more than one filelet.
  • The final truncated size ended within one of the direct blocks of the last filelet.
With this fix, MFS will no longer crash during the truncate operation.
MapR-FS 26351 During disksetup, even if the mfs.ssd.trim.enabled configuration parameter was set to false, the device was getting trim calls. With this fix, MFS will not attempt to trim if the configuration parameter is set to false.
Hive, Tez 20965 When working with multiple clusters, synchronization issues were causing MapRFileSystem to throw a NullPointerException. With this fix, MapRFileSystem has been improved to better support working with multiple clusters and contains fixes for the synchronization issues.
Hoststats 11349 Hoststats did not work on POSIX edge nodes. With this fix, hoststats works on POSIX client edge nodes as well, so that statistics are displayed in MCS.
JobTracker 24700 The Job Tracker user interface failed with a NullPointerException when a user submitted a Hive job with a null value in a method. With this fix, the Job Tracker interface does not fail when a Hive job is run with a null value in a method.
MapReduce 24505 A job failed when the JvmManager went into an inconsistent state. With this fix, jobs no longer fail as a result of the JvmManager entering an inconsistent state.
MapReduce 25599 A race condition in the jobtracker-start script could cause Warden to start multiple JobTrackers. With this fix, the start script loops and waits for a successful start of the JobTracker before exiting, thus closing the race condition window.
MapReduce 25695 It was not possible to restrict the web access port range, so the YARN MapReduce Application Master could open a web port anywhere in the ephemeral port range of the node where it was running. With this change, the YARN MapReduce Application Master will only open its web port within the range specified by the mapred parameter yarn.app.mapreduce.am.job.client.port-range.
MCS 23257 In MCS, new NFS VIPs were visible in the NFS HA > VIP Assignments tab, but not in the NFS HA > NFS Setup tab. With this fix, the NFS VIPs will be available in both the NFS HA > VIP Assignments tab and the NFS HA > NFS Setup tab.
NFS 24315 If you used the NFS client and the dd command with iflag=direct, an incorrect amount of data may have been read. With this fix, the dd command will read exactly the expected amount of data when iflag=direct is set.
NFS 24446 Due to incorrect attribute cache handling in NFS server, the getattr call sometimes returned stale mtime because the attribute cache was not getting updated properly at the time of setattr. With this fix, the attributes are now properly cached.
NFS 24658 CLDB returned “no master” and an empty list for a container lookup, which the NFS server could not handle, because when multiple servers are down, there can be no master for a container. With this fix, the NFS server will handle an empty node list for container lookups.
NFS:Loopback 23652 The POSIX loopbacknfs client did not automatically refresh renewed service tickets. With this fix, the POSIX loopbacknfs client will:
  • Automatically use the renewed service ticket without requiring a restart if the ticket is replaced before expiration (ticket expiry time + grace period of 55 minutes). If the ticket is replaced after expiration (which is ticket expiry time + grace period of 55 minutes), the POSIX loopbacknfs client will not refresh the ticket as the mount will become stale.
  • Allow impersonation if a service ticket is replaced before ticket expiration (which is ticket expiry time + grace period of 55 minutes) with a servicewithimpersonation ticket.
  • Honor all changes in user/group IDs of the renewed ticket.
Pkg/deployment 24309 Symlinks that existed in a MapR 5.1 installation were not re-created during an upgrade to MapR 5.2. This problem resulted when the mapr-hadoop-core package was updated on a cluster with the incorrect version of the mapr-core-internal package. This problem can occur during an upgrade from any older MapR version to a newer MapR version. With this fix, the mapr-hadoop-core package has a new dependency for a specific version of mapr-core-internal. If the correct version of mapr-core-internal is not present, an error message is generated, and the mapr-hadoop-core package cannot be installed. Note that this fix is effective for MapR 5.2.1 or later installations.
RPC 24610 In a secure cluster, when there are intermittent connection drops (between MFS-MFS or client-MFS), the client and/or server could crash during authentication. With this fix, the client and/or server will not crash during authentication if there are intermittent connection drops.
Streams 23563 High CPU utilization occurred when the default buffering time for MapR Streams was set to 0. With this fix, CPU utilization and latency are reduced by having the TimeBasedFlusher active only when there is work to do.
UI:CLI 24280 Running the maprcli dashboard info command occasionally threw a TimeoutException error. With this fix, the internal timeout value was increased to provide more allowance for command processing.
Warden 24119 Warden adjusted the FileServer (MFS) and Node Manager (NM) memory incorrectly when NM and TaskTracker (TT) were on the same node. This could result in too much memory being allocated to MFS. With this fix, Warden does not adjust MFS memory when NM and TT are on the same node. Memory adjustment is implemented only when TT and MapR-FS (but not NM) are on the same node.
Warden 24562 CLDB (container location database) performance suffered because Warden gave the CLDB service a lower CPU priority. With this fix, Warden uses a new algorithm to set the correct CPU priority for the CLDB service.
Yarn 24477 Jobs failed if a local volume was not available and directories for mapreduce could not be initialized. With this fix, jobs no longer fail, and local volume recovery is enhanced.
YARN 25387 With the capacity scheduler enabled, adding a node that did not contain a label could result in a null pointer exception (NPE). With this fix, errors are no longer generated when the capacity scheduler is enabled.
YARN 25412 MapReduce jobs failed if the Application Master (AM) was restarted for any reason (for example, because of a node failure) during a job commit and left a control file that prevented subsequent commit attempts. With this fix, MAPREDUCE-5485 is backported to MapR 5.1. MAPREDUCE-5485 adds a clean-up of commit-stage files: if the first commit attempt fails, temporary files are removed, allowing the next repeatable commit attempt to write them again without throwing an exception. To benefit from this fix, you must set the mapreduce.fileoutputcommitter.algorithm.version parameter to "2" in the mapred-site.xml file (see the sample configuration entries after this table).
YARN 25654 At startup, while processing application-recovery data, the ResourceManager (RM) failed with a null pointer exception. With this fix, the ResourceManager starts correctly when processing application-recovery data.
Yarn/security 25448 A user's temporary log files for running jobs were not readable by another user from the same group in the RM UI. An exception with the message, "Exception reading log file. User 'mapr' doesn't own requested log file" was generated. With this fix, users in the same primary group can access user logs of other users in the group.
Yarn/Warden 25695 It was not possible to restrict the web access port range, so the YARN MapReduce Application Master could open a web port anywhere in the ephemeral port range of the node where it was running. With this change, the YARN MapReduce Application Master will only open its web port within the range specified by the mapred parameter yarn.app.mapreduce.am.job.client.port-range.
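
Several of the fixes above (issues 24249, 24965, 25412, and 25695) are enabled or tuned through configuration parameters in the core-site.xml and mapred-site.xml files. The entries below are a minimal sketch of how those parameters might be set, assuming the standard Hadoop XML property format. The values for fs.mapr.bailout.on.library.mismatch, fs.mapr.bind.retries, and mapreduce.fileoutputcommitter.algorithm.version follow the descriptions in the table; the range shown for yarn.app.mapreduce.am.job.client.port-range is only a placeholder to be replaced with a range valid for your environment.

  <!-- core-site.xml (sketch) -->
  <property>
    <name>fs.mapr.bailout.on.library.mismatch</name>
    <value>false</value> <!-- false disables the library-mismatch check (issue 24249); enabled (true) by default -->
  </property>
  <property>
    <name>fs.mapr.bind.retries</name>
    <value>true</value> <!-- retry the bind during client initialization for 5 minutes (issue 24965); false by default -->
  </property>

  <!-- mapred-site.xml (sketch) -->
  <property>
    <name>mapreduce.fileoutputcommitter.algorithm.version</name>
    <value>2</value> <!-- required to benefit from the MAPREDUCE-5485 backport (issue 25412) -->
  </property>
  <property>
    <name>yarn.app.mapreduce.am.job.client.port-range</name>
    <value>50000-50100</value> <!-- placeholder range; the Application Master web port is opened within this range (issue 25695) -->
  </property>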