Troubleshooting Object Store

Provides methods for troubleshooting issues in Object Store.

Before You Troubleshoot

Verify that Object Store is properly installed and enabled, as described in Installing HPE Ezmeral Data Fabric Object Store and Enabling the HPE Ezmeral Data Fabric Object Store. Enabling the HPE Ezmeral Data Fabric Object Store includes several important steps required to use Object Store successfully, including steps for setting up certificates. If certificates are not properly configured, applications cannot access Object Store.

You can also perform the following pre-troubleshooting verification checks:

Check access to the Object Store UI
The Object Store UI is the Object Store entry point. If Object Store is installed and running, you should be able to access the Object Store UI from the MCS (management control system). Go to https://<node-ip-address>:8443/app/mcs/#/app/login and log in. Click the Data tab and look for Object Store in the dropdown. If you see Object Store in the dropdown, Object Store is installed and running. If you do not see it, the CLDB S3 server quorum is not set up properly or has not finished setting up.
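For a quick reachability check from the command line, you can also probe the MCS port with curl; this is only an illustrative check, and -k skips certificate verification for this test alone:
curl -k -I https://<node-ip-address>:8443/app/mcs/
If the port does not respond, verify that the MCS (apiserver) service is running before troubleshooting Object Store itself.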
Check the status of the CLDB and S3 server quorum
To check the status of the CLDB and S3 server quorum, run maprcli dump cldbstate -json. In the output, look for s3Info, which contains the status of all S3 servers.
  • When the status of all S3 servers is running, you should be able to access Object Store through the Object Store UI. If you followed all the instructions in Enabling the HPE Ezmeral Data Fabric Object Store and the status is not yet running, it may simply need more time to change.
  • If the s3State is AWAITING_FEATURE_ENABLE, restart the CLDB service, as shown in the example after this list. See node services.
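For example, you can filter the JSON output for the S3 state and, if needed, restart the CLDB with standard maprcli syntax (substitute your CLDB node name; the exact JSON layout can vary by release):
# show only the S3-related state from the CLDB dump
maprcli dump cldbstate -json | grep -i -A 20 s3Info
# restart the CLDB service on the CLDB node
maprcli node services -name cldb -action restart -nodes <cldb-node>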
Verify that users have permission to log in to Object Store
Before a user who is listed in LDAP/AD can access Object Store, the cluster administrator (typically the mapr user) must first give the user permission to log in. In the MCS, go to Admin > User Settings and click the Permissions tab. Add the user and assign Login permission to the user, then click Save Changes.
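If you prefer the command line, the cluster administrator can typically grant the same Login permission with maprcli acl edit; this is a sketch, so verify the options against your release:
# grant cluster login permission to a user
maprcli acl edit -type cluster -user <username>:login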

Logging

Object Store generates log files for the following components:
  • MOSS
  • CLDB S3 server module
  • MSI (interface module between MOSS and the file system)
The following table lists and describes the log files produced by Object Store:
Log File Description
moss.log
  • Contains MOSS server logs.
  • Located in /opt/mapr/logs/moss.log.
  • To increase Object Store server logging, change logging in /opt/mapr/conf/moss.conf to DEBUG. The system outputs debug messages to moss.log.
moss.fileclient.log
  • Contains MOSS file client related messages.
  • Located in /opt/mapr/logs/moss.fileclient.log.
  • To increase Object Store client logging, change the fs.mapr.trace property in /opt/mapr/conf/moss-core-site.xml to DEBUG.
moss.out
  • Logs all service orchestration messages and messages for failures and crashes.
  • Located in /opt/mapr/logs/moss.out.
cldb.log
  • Logs the CLDB S3 server module debug information.
  • Located in /opt/mapr/logs/cldb.log.
  • To generate the CLDB S3 server module debug log messages, run:
    maprcli setloglevel cldb -classname <class-name> -loglevel DEBUG -node <cldb-node>
mfs.log
  • Contains MSI log information.
  • Located in /opt/mapr/logs/mfs.log-5.
  • To set the log level for MSI to DEBUG, run:
    maprcli trace setlevel -module MSI -level DEBUG
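After raising a log level, a simple way to catch problems is to follow the relevant log while you reproduce the issue; for example, using standard tail and grep:
# follow the MOSS server log and highlight likely problems
tail -f /opt/mapr/logs/moss.log | grep -iE "error|fail"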

Debugging

You can debug MOSS with the mc admin profile command or with the GNU debugger (GDB).

Before a user can run mc commands, the /opt/mapr/conf/ca/chain-ca.pem file must be available in ~/.mc/certs/CAs/ on the node running mc; the simplest way is to create a symbolic link to the file in the user's directory. To create the symbolic link for a user, run:
su - <user>
mkdir -p ~/.mc/certs/CAs
ln -s /opt/mapr/conf/ca/chain-ca.pem ~/.mc/certs/CAs/chain-ca.pem
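To confirm that the link is in place and resolves to the certificate chain before running mc commands, you can, for example, run:
ls -l ~/.mc/certs/CAs/chain-ca.pem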
MOSS Profile
The mc admin profile command returns information about MOSS thread activity. Run the start command, wait a few seconds, and then run the stop command. The command outputs a zip file that you can unzip to access text files. View the text files (with vim or cat) to see the activity of the MOSS threads.

Run the mc admin profile command, as shown:

/opt/mapr/bin/mc admin profile start --type goroutines mapralias

/opt/mapr/bin/mc admin profile stop mapralias
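The stop command writes a zip archive to the current working directory; the file name varies, so <profile-archive> below is a placeholder for whatever name the command reports. For example:
# extract the profile and page through the per-thread text files
unzip <profile-archive>.zip -d moss-profile
less moss-profile/*.txt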
Debug with the GNU Debugger (GDB)
Running the debugger is helpful if MOSS crashes.
Run the debugger, as shown in the following example:
gdb /opt/mapr/server/moss <moss/core/path>
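Once the core file is loaded, standard GDB commands such as the following help locate the crash site:
(gdb) info threads
(gdb) bt full
(gdb) thread apply all bt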

Debugging Bucket Metrics

Bucket metrics provide you with account-level and bucket-level statistics, such as the total size of an account, the total object count, historical usage of buckets, and so on. The MOSS server includes an SRM (storage recovery metrics) component that automatically recovers metrics for buckets, updates statistics, and reclaims space when any issues occur; for example, if a put operation does not complete. SRM loads metrics into the BucketList table.

The BucketList table is the source of truth for statistics; it also records the last time a bucket recovery occurred. You can access the BucketList table through the mc lb and mc stat commands or in the Object Store UI. You can also get statistics from the OLT statsFid when you run the mrconfig s3 bucketstats command.

The following table describes the interfaces through which you can access bucket metrics:
Interface Description
Object Store UI
  • Collectd collects statistics from the BucketList table and pushes them to OpenTSDB. The Object Store UI and Grafana query OpenTSDB to plot and chart data.
  • The graphs may not always reflect changes to a bucket or show accurate data.
mc stat
  • Returns bucket-level statistics.
mc lb
  • Returns aggregated statistics from the BucketList table for all buckets.
  • Gives additional statistics, such as inProgressCount, inProgressSize, and deleteMarker count.
mrconfig s3 bucketstats
  • Returns statistics from the OLT statsFid for a given bucket.
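For example, to compare the mc view of a bucket with the file-system view from mrconfig (mapralias and <bucketName> are placeholders; paths assume a default installation):
/opt/mapr/bin/mc stat mapralias/<bucketName>
/opt/mapr/server/mrconfig s3 bucketstats <bucketName>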
Resolving Issues with Bucket Metrics
The following issues could result in inaccurate metrics. If the solutions provided do not resolve the issue, you can manually trigger recovery. Recovery is performed by the MOSS server on the node where the master copy of the OLT table's first tablet is hosted.

Verify that SRM Triggered on the Correct MOSS Node
Each bucket is assigned to a different MOSS server for recovery. If recovery is triggered on the wrong node, the system outputs the following error:
mc: <ERROR> Unable to start bucket recovery. We encountered an internal error, please try again.: cause(bucket not assigned).
You can view recovery details in the table dump:
/opt/mapr/server/tools/mossdb dump table -type bucket /var/objstore/domains/primary/BucketListTable
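Because the dump can be large, you can filter it for a single bucket; the output format varies by release, so treat this as an illustrative filter:
/opt/mapr/server/tools/mossdb dump table -type bucket /var/objstore/domains/primary/BucketListTable | grep -i -A 5 "<bucketName>"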
Metrics do not display or do not update in the Object Store UI.

If you cannot see statistics for objects uploaded to Object Store, or if the statistics displayed are not accurate, there may be an issue with collectd or OpenTSDB. Collectd pulls data from the nodes, and OpenTSDB is needed to view charts in the Object Store UI. Collectd should be installed and running on all nodes; if collectd stops running on a node, statistics are not displayed for objects. OpenTSDB should be installed and running on at least one node.

If the collectd or OpenTSDB service is not running, restart the service.
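Assuming the monitoring services are Warden-managed (the default for the data-fabric monitoring stack), you can check and restart them with maprcli; for example:
# list the services configured on a node
maprcli service list -node <node-name>
# restart collectd (use -name opentsdb for OpenTSDB) on the affected node
maprcli node services -name collectd -action restart -nodes <node-name>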

Metrics in the Object Store UI are accurate but are not reflected correctly elsewhere.

If any MOSS server or file server node is down, statistics are not pushed to the BucketList table.

Each bucket is associated with a MOSS server. The SRM component in the MOSS server updates the BucketList table. If the BucketList table is not updated, this could indicate that a MOSS server assigned to the bucket is down.
  • Verify that all MOSS servers are running (one way to check is shown after this list). If a MOSS server or file server is down, restart it. Once restarted, the server should automatically push the data to the bucket and update the metrics.
  • If restarting the MOSS server does not work, run mrconfig s3 bucketstats <bucketName> and look at the OLT statsFid.
  • You can also look at the logs and enable debugging to see if you can identify the issue in the debug log.
  • If you need an immediate statistics update, run mrconfig s3 refreshstats <bucketName>. This command sends a request for all tables to push individual statistics to the file server. Eventually, the aggregated statistics are pushed to the BucketList table. If you do not run mrconfig s3 refreshstats <bucketName>, statistics are refreshed automatically at the next recovery.
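One way to check which nodes are running the MOSS (s3server) service, assuming the service is registered under that name on your cluster, is to list node services and filter:
maprcli node list -columns hostname,svc | grep -iE "s3|moss"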

Debugging Volumes

Every account is associated with a volume and every bucket is associated with a volume. An account is associated with a root volume, which stores metadata for the account, including users, groups, and policies. Bucket volumes store data and metadata.

Account and bucket volumes are not exposed externally, and users do not interact directly with the volumes; however, you may need to see volume details if issues related to a volume arise or the system raises an alarm, for example:
  • A container is not accessible.
  • You need to run fsck.
  • An offload fails, which triggers an offload failure alarm in the UI.
In an offload failure scenario, you may not recognize the volume. Should this occur, you can look at the volume name in the volume list to identify which bucket the offload failure is related to and then respond accordingly. You can also look at the logs to see why the offload failed.
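For example, to find the bucket volume by name and inspect its details (bucket-volume names generally contain the bucket name, so adjust the grep pattern as needed):
maprcli volume list -columns volumename,mountdir | grep -i "<bucketName>"
maprcli volume info -name <volumeName> -json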

Find the Volume Name and Details
Working with Bucket Volumes provides instructions for finding the name of a volume and viewing volume details.

Delete a Volume to Reclaim Space
If you want to reclaim space, you can remove objects; however, deleting objects is a time-consuming process. Instead, you may prefer to delete the volume if deleting all buckets in the volume is feasible.

To delete the root volume for an account, you must delete the account. When the account is deleted, the root volume is automatically deleted.

To delete a bucket volume, run:
maprcli volume remove -deletes3bucket true -name <volumeName>
CAUTION This command works only on volumes in which all buckets are non-WORM buckets. Running this command against a volume that contains WORM buckets fails.

Additional Tips

In addition to the troubleshooting information provided in this topic, you may also find the following tips helpful:
Check if a bucket exists
To see if a bucket exists, check the MOSS file client log by running:
grep -nira "<bucketName>" /opt/mapr/logs/moss.fileclient.log | more

Check if a bucket is created
To see if a bucket is created, you can check the mfs log by running:
grep -nir "<bucketName>" /opt/mapr/logs/mfs.log-3
The output will state S3BucketCreateFinish.

Get the container ID for a bucket
The request to create a bucket goes to the CLDB and the CLDB creates a volume for the bucket. You can look at the CLDB log to gather data for troubleshooting, such as the container ID, by running:
grep -nira "<volumeName>" /opt/mapr/logs/cldb.log
TIP Working with Bucket Volumes provides instructions for finding the name of a volume associated with a bucket.
You can dump the container to get additional details, such as the file server node IP address, for example:
maprcli dump containerinfo -ids <containerID> -json
Issues running s3cmd
If the system returns an error similar to the following when you run an s3cmd command, such as s3cmd ls s3://:
ERROR: SSL certificate verification failure: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)
Run the following command to resolve the error:
s3cmd --ca-certs=/opt/mapr/conf/ca/chain-ca.pem ls s3://
When you run this command, the system returns an error about the access key and prompts you for the keys. To add an access key, run:
s3cmd --configure

Then follow the instructions at the prompt.
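To avoid passing the certificate option on every call, you can also record the CA path in ~/.s3cfg; the option name below is the one used by recent s3cmd releases, so verify it against your version:
ca_certs_file = /opt/mapr/conf/ca/chain-ca.pem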

Issues running the AWS CLI
If you try to run the following AWS CLI command:
aws s3 ls s3:// --endpoint-url https://<hostname>:9000
you may get an error similar to the following:
SSL validation failed for https://<hostname>:9000/ [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1125)
To resolve the error, point the AWS configuration to the /opt/mapr/conf/ca/chain-ca.pem file or export the path, as shown:
export AWS_CA_BUNDLE=/opt/mapr/conf/ca/chain-ca.pem
If that does not work, run:
aws configure
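Alternatively, you can persist the CA bundle in the AWS CLI configuration so that the export is not needed in every shell; for example:
# writes ca_bundle under the default profile in ~/.aws/config
aws configure set default.ca_bundle /opt/mapr/conf/ca/chain-ca.pem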