Monitoring the Cluster

Monitoring Cluster Health Using the Control System

Procedure

Log in to the Control System and click Overview.

The Overview page displays the following panes:

Node Health — the health of the nodes on the cluster, by service (default) or topology
Active Alarms — a summary of active alarms for the cluster
Cluster Utilization — CPU, memory, and disk space usage
YARN — the number of running and queued applications, the number of Node Managers, and the percentage of memory and CPUs used relative to the amount configured

NOTE During installation using the Installer, you can configure metrics and logging on the Monitoring page of the Installer user interface. The Control System relies on the metrics collection infrastructure to provide the graphs and charts in the panes. If that infrastructure is not installed, the panes cannot display metrics. You can add metrics collection or logging later by selecting the feature during an Incremental Install.

Viewing Cluster Utilization Information on the Control System

About this task

The Cluster Utilization pane on the Overview page displays the following:

CPU — Percentage of cores currently utilized and total cores
Memory — Percentage of memory currently utilized and total memory (in GB)
Disk — Percentage of disk space currently utilized and total disk space (in GB)

The Cluster Utilization pane also shows the amount of raw data and the savings (in percentage) after compression.
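As a rough illustration of the savings figure, the following sketch computes a percentage from a raw size and a compressed size. It assumes that savings means the share of the raw size removed by compression. The function name and the formula are illustrative assumptions and are not taken from the Control System.

```python
def compression_savings_percent(uncompressed, compressed):
    """Return the percentage of raw size saved by compression.

    Assumption: savings = (uncompressed - compressed) / uncompressed.
    Both arguments must be in the same unit (for example, GB).
    """
    if uncompressed <= 0:
        # No raw data, so there is nothing to save.
        return 0.0
    return round(100.0 * (uncompressed - compressed) / uncompressed, 1)


# Hypothetical values: 200 GB of raw data stored in 150 GB.
print(compression_savings_percent(200, 150))
```

Under this assumed definition, a cluster with no data reports 0.0 instead of raising a division error.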

The Utilization Trend pane shows CPU, memory, and disk usage trends for the last 24 hours by default. You can select a preset or specify a custom time range.

You can zoom in (by clicking and dragging the cursor in the pane) for a more granular view. Click Reset Zoom to zoom out and return to the selected date/time range view. If there were any alarms during the selected date/time range, the Alarms pane above shows:

When the alarm was raised
The severity of the alarm, shown as an icon for one of the following:
- an error
- a warning
- information

Monitoring Cluster Alarms on the Control System

About this task

See Viewing Active Cluster Alarms for more information.

Retrieving Cluster Information Using the CLI or REST API

About this task

The basic command to retrieve cluster health and disk space information is:

maprcli dashboard info -cluster <cluster>

The utilization field in the output shows the total and utilized amount of disk space, memory, and CPU for the cluster, which can also be visualized on the Control System. For example:

# /opt/mapr/bin/maprcli dashboard info -json
{
	"timestamp":1525230746268,
	"timeofday":"2018-05-01 08:12:26.268 GMT-0700 PM",
	"status":"OK",
	"total":1,
	"data":[
		{
			...
			"utilization":{
				"cpu":{
					"util":7,
					"total":8,
					"active":0
				},
				"memory":{
					"total":15886,
					"active":11281
				},
				"disk_space":{
					"total":273,
					"active":0
				},
				"compression":{
					"compressed":0,
					"uncompressed":0
				},
				"tiering":{
					"logicalUsed":0,
					"replicatedLogicalUsed":0,
					"replicatedTotalUsed":0,
					"ecTotalUsed":0,
					"cvTotalUsed":0,
					"offloaded":0,
					"recalled":0
				}
			},
			...
		}
	]
}

For information on all the fields returned by this command, see dashboard info.
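The following Python sketch shows one way to read the utilization field from a script. It runs the same maprcli command as the example above and derives memory and disk percentages from the active and total values. The helper names are hypothetical, and the sketch assumes that cpu.util is already a percentage and that active and total use the same unit; check both assumptions against the dashboard info reference before relying on them.

```python
import json
import subprocess

MAPRCLI = "/opt/mapr/bin/maprcli"  # path used in the example above


def fetch_dashboard_info(cluster=None):
    """Run `maprcli dashboard info -json` and return the parsed output."""
    cmd = [MAPRCLI, "dashboard", "info", "-json"]
    if cluster:
        cmd += ["-cluster", cluster]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def percent(active, total):
    """Return active/total as a percentage, or None when total is 0."""
    return round(100.0 * active / total, 1) if total else None


def summarize_utilization(info):
    """Summarize the utilization field of the first entry in `data`."""
    util = info["data"][0]["utilization"]
    return {
        # Assumption: "util" is the percentage of cores in use.
        "cpu_percent": util["cpu"]["util"],
        "cpu_cores": util["cpu"]["total"],
        "memory_percent": percent(util["memory"]["active"],
                                  util["memory"]["total"]),
        "disk_percent": percent(util["disk_space"]["active"],
                                util["disk_space"]["total"]),
    }


# Trimmed copy of the sample output shown above, so the sketch
# runs without a cluster.
SAMPLE = {
    "status": "OK",
    "total": 1,
    "data": [{
        "utilization": {
            "cpu": {"util": 7, "total": 8, "active": 0},
            "memory": {"total": 15886, "active": 11281},
            "disk_space": {"total": 273, "active": 0},
        }
    }],
}

print(summarize_utilization(SAMPLE))
```

On a cluster node, replace SAMPLE with the result of fetch_dashboard_info() to summarize live values.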