NodeManager Alarm

Explains how to resolve the alarm raised when the NodeManager service stops running on a node.

UI Column

NodeManager Alarm

Logged As

NODE_ALARM_SERVICE_NODEMANAGER_DOWN

Meaning

The NodeManager service on the node has stopped running.

Resolution

Go to the node information page or the Services page in the Control System to check whether NodeManager is running. Warden automatically tries three times to restart the service every 30 minutes (by default). You can change this 30-minute interval by setting the services.retryinterval.time.sec parameter, which takes a value in seconds, in the warden.conf file.

If Warden successfully restarts NodeManager, the alarm is cleared. If Warden is unable to restart NodeManager, see the troubleshooting information for further steps.
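
To confirm which retry interval is in effect, you can read the services.retryinterval.time.sec parameter from warden.conf. The following is a minimal sketch, not an official tool: the helper name read_retry_interval_sec, the sample file contents, and the fallback of 1800 seconds (the 30-minute default expressed in seconds) are assumptions made for illustration.

```python
def read_retry_interval_sec(conf_text, default=1800):
    """Return services.retryinterval.time.sec from warden.conf-style text.

    Falls back to `default` (assumed here to be 1800 s, i.e. 30 minutes)
    when the parameter is not set in the text.
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "services.retryinterval.time.sec":
            return int(value.strip())
    return default


# Hypothetical excerpt standing in for the real warden.conf contents.
sample = """\
# retry settings (illustrative values)
services.retryinterval.time.sec=1800
"""

interval = read_retry_interval_sec(sample)
print(f"Retry interval: {interval} s ({interval // 60} min)")
# prints: Retry interval: 1800 s (30 min)
```

To use it on a node, pass in the text of the actual warden.conf file instead of the sample string.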