Patching a Secure Cluster

Explains how to patch a secure cluster when you are unable to establish a secure connection.

About this task

Once the fix is complete, no further action is required, except that users must accept the new self-signed certificates the first time they access the Control System and other web interfaces, such as the JobTracker UI and the ResourceManager UI.

Procedure

  1. Perform the following steps on any cluster node:
    1. Download the script from https://package.ezmeral.hpe.com/scripts/mcs/. For example:
      wget https://package.ezmeral.hpe.com/scripts/mcs/fixssl
    2. Run the following command to update the permissions on the file:
      chmod 755 fixssl
    3. Run the following command to run the script:
      sudo ./fixssl

      When you run the script, output similar to the following is displayed:

      
      Creating 10 year self signed certificate with subjectDN='CN=*.us-west-2.compute.internal'
      Certificate stored in file </tmp/tmpfile-mapcert.3743>
      Certificate was added to keystore
                                      
      *****************************************************************************************
      * In order for your cluster to work, please copy the following files in /opt/mapr/conf  *
      * to all the nodes in the cluster, to the same directory: ssl_keystore ssl_truststore   *
      * After copying the files to the other nodes, please restart CLDB, Webserver, and any   *
      * other service that utilizes https (Jobtracker, tasktracker)                           *
      * (See doc for more details if you do not wish to have downtime in your cluster)        *
      *****************************************************************************************
      
  2. On each node in the cluster, back up existing certificates and copy the certificates to all other nodes in the cluster. For example:
    $ maprcli node list -columns ip
     hostname ip
     ip-172-31-18-196.us-west-2.compute.internal 172.31.18.196
     ip-172-31-18-197.us-west-2.compute.internal 172.31.18.197
     ip-172-31-18-198.us-west-2.compute.internal 172.31.18.198
     ip-172-31-18-199.us-west-2.compute.internal 172.31.18.199
     ip-172-31-18-200.us-west-2.compute.internal 172.31.18.200
                        
    $ ssh 172.31.18.200 "mv /opt/mapr/conf/ssl_keystore /opt/mapr/conf/ssl_keystoreold"
                        
    $ ssh 172.31.18.200 "mv /opt/mapr/conf/ssl_truststore /opt/mapr/conf/ssl_truststoreold"
                        
     $ scp /opt/mapr/conf/ssl_keystore /opt/mapr/conf/ssl_truststore mapr@172.31.18.200:/opt/mapr/conf
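     The per-node backup and copy can also be scripted. The following is a minimal sketch, assuming it runs on the node where fixssl was executed, that passwordless SSH is configured for the mapr user, and that LOCAL_IP (a hypothetical placeholder) is set to that node's own IP:

     ```shell
     #!/bin/sh
     # Sketch: back up the old stores on every other node, then push the new ones out.
     # LOCAL_IP is a hypothetical placeholder -- substitute the IP of this node.
     LOCAL_IP=172.31.18.196

     # Column 2 of `maprcli node list -columns ip` holds the IP; NR>1 skips the header.
     for ip in $(maprcli node list -columns ip | awk 'NR>1 {print $2}'); do
       [ "$ip" = "$LOCAL_IP" ] && continue   # this node already has the new files
       ssh "mapr@$ip" "mv /opt/mapr/conf/ssl_keystore /opt/mapr/conf/ssl_keystoreold && \
                       mv /opt/mapr/conf/ssl_truststore /opt/mapr/conf/ssl_truststoreold"
       scp /opt/mapr/conf/ssl_keystore /opt/mapr/conf/ssl_truststore "mapr@$ip:/opt/mapr/conf"
     done
     ```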
                     
  3. Restart the secondary CLDB services. To do this, first determine which cluster nodes are running the CLDB service, and then determine which node is running the primary CLDB; the secondary instances are the remaining CLDB nodes. For example:
    $ maprcli node list -columns configuredservice -filter '[configuredservice==cldb]'
     hostname                                     configuredservice                                   ip             
     ip-172-31-18-198.us-west-2.compute.internal  webserver,cldb,fileserver,nfs,hoststats,jobtracker  172.31.18.198  
     ip-172-31-18-199.us-west-2.compute.internal  webserver,cldb,fileserver,nfs,hoststats,jobtracker  172.31.18.199  
     ip-172-31-18-200.us-west-2.compute.internal  webserver,cldb,fileserver,nfs,hoststats,jobtracker  172.31.18.200  
                        
     $ maprcli node cldbmaster
     cldbmaster                                                                           
     ServerID: 8868598593037642491 HostName: ip-172-31-18-199.us-west-2.compute.internal  
                        
      $ maprcli node services -cldb restart -nodes 172.31.18.198 172.31.18.200
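     The primary/secondary lookup above can be automated. A sketch, assuming the maprcli output formats shown in the listings and that -nodes also accepts hostnames:

     ```shell
     #!/bin/sh
     # Sketch: find the primary CLDB host, then restart CLDB on the other CLDB nodes.

     # Pull the hostname that follows the "HostName:" token in `node cldbmaster` output.
     primary=$(maprcli node cldbmaster | \
       awk '{for (i = 1; i < NF; i++) if ($i == "HostName:") print $(i + 1)}')

     # Column 1 of the filtered node list is the hostname; NR>1 skips the header.
     secondaries=$(maprcli node list -columns configuredservice \
       -filter '[configuredservice==cldb]' | awk 'NR>1 {print $1}' | grep -Fxv "$primary")

     maprcli node services -cldb restart -nodes $secondaries
     ```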
                    
  4. Restart half of the TaskTracker and NodeManager services.
    1. List all TaskTracker or NodeManager Hosts. For example:
      
       $ maprcli node list -columns configuredservice -filter '[configuredservice==tasktracker]or[configuredservice==nodemanager]'
       hostname                                     configuredservice                     ip             
       ip-172-31-18-196.us-west-2.compute.internal  fileserver,tasktracker,nfs,hoststats  172.31.18.196  
       ip-172-31-18-197.us-west-2.compute.internal  fileserver,tasktracker,nfs,hoststats  172.31.18.197
                              
    2. Restart the TaskTracker and NodeManager services on half of the nodes that run them. For example, the following command restarts both services on the specified node; if either service is not configured on a node, the command reports an error for that service and continues:
      
       $ maprcli node services -multi '[{ "name": "tasktracker", "action": "restart"}, { "name": "nodemanager", "action": "restart"}]' -nodes 172.31.18.196
       ERROR (10002) -  Service: nodemanager is not configured on node: ip-172-31-18-196.us-west-2.compute.internal  
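       Selecting "half of the nodes" can also be scripted. A sketch, assuming the node-list format shown above; the remaining half is restarted later in step 6:

       ```shell
       #!/bin/sh
       # Sketch: restart TaskTracker/NodeManager on the first half of the worker nodes.
       workers=$(maprcli node list -columns configuredservice \
         -filter '[configuredservice==tasktracker]or[configuredservice==nodemanager]' \
         | awk 'NR>1 {print $NF}')          # last column is the IP

       total=$(printf '%s\n' "$workers" | wc -l)
       half=$(( (total + 1) / 2 ))          # round up so an odd count is covered
       first=$(printf '%s\n' "$workers" | head -n "$half")

       maprcli node services \
         -multi '[{ "name": "tasktracker", "action": "restart"}, { "name": "nodemanager", "action": "restart"}]' \
         -nodes $first
       ```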
                              
  5. Restart JobTracker and ResourceManager services.
    1. List all nodes running JobTracker or ResourceManager. For example:
      
      $ maprcli node list -columns configuredservice -filter '[configuredservice==jobtracker]or[configuredservice==resourcemanager]'
      hostname                                     configuredservice                                   ip            
      ip-172-31-18-198.us-west-2.compute.internal  webserver,cldb,fileserver,nfs,hoststats,jobtracker  172.31.18.198 
      ip-172-31-18-199.us-west-2.compute.internal  webserver,cldb,fileserver,nfs,hoststats,jobtracker  172.31.18.199 
      ip-172-31-18-200.us-west-2.compute.internal  webserver,cldb,fileserver,nfs,hoststats,jobtracker  172.31.18.200
                              
    2. Restart the JobTracker and ResourceManager services. For example, the following command restarts both services on the specified nodes; if either service is not configured on a node, the command reports an error for that service and continues:
      
      $ maprcli node services -multi '[{ "name": "jobtracker", "action": "restart"}, { "name": "resourcemanager", "action": "restart"}]' -nodes 172.31.18.198 172.31.18.199 172.31.18.200
      ERROR (10002) -  Service: resourcemanager is not configured on node: ip-172-31-18-199.us-west-2.compute.internal
      ERROR (10002) -  Service: resourcemanager is not configured on node: ip-172-31-18-200.us-west-2.compute.internal
      ERROR (10002) -  Service: resourcemanager is not configured on node: ip-172-31-18-198.us-west-2.compute.internal 
                              
  6. Restart the remaining TaskTracker and NodeManager services. For example, the following command restarts both services on the specified node; if either service is not configured on a node, the command reports an error for that service and continues:
    $ maprcli node services -multi '[{ "name": "tasktracker", "action": "restart"}, { "name": "nodemanager", "action": "restart"}]' -nodes 172.31.18.197
    ERROR (10002) -  Service: nodemanager is not configured on node: ip-172-31-18-197.us-west-2.compute.internal
                    
  7. Restart additional secure services (Oozie, HistoryServer, Webserver, HiveServer2, Hue). For example, the following command can be run with the IPs or hostnames of all nodes in the cluster, as it will only restart the services that it finds:
    $ maprcli node services \
        -multi '[{ "name": "hue", "action": "restart"},
            { "name": "historyserver", "action": "restart"},
            { "name": "webserver", "action": "restart"},
            { "name": "oozie", "action": "restart"},
            { "name": "hs2", "action": "restart"}]' \
        -nodes 172.31.18.198 172.31.18.199 172.31.18.200 172.31.18.196 172.31.18.197
                    
  8. Restart the primary CLDB service. For example:
    $ maprcli node cldbmaster
    cldbmaster                                                                           
    ServerID: 8868598593037642491 HostName: ip-172-31-18-199.us-west-2.compute.internal  
      
    $ maprcli node services -cldb restart -nodes 172.31.18.199               

Results

The fixssl script performs the following steps on a node in a secure cluster:
  1. Updates manageSSLKeys.sh to use the new certificate cipher algorithm.
  2. Backs up the existing certificates so that new versions can be generated with the new cipher algorithm:
    • /opt/mapr/conf/ssl_keystore is renamed to /opt/mapr/conf/ssl_keystore_old
    • /opt/mapr/conf/ssl_truststore is renamed to /opt/mapr/conf/ssl_truststore_old
  3. Runs the following command to generate new versions of the keystore and truststore files:
    /opt/mapr/manageSSLKeys.sh create -N <clustername> -ug <maprusername>:<maprgroup>
    • The cluster name is retrieved from /opt/mapr/conf/mapr-clusters.conf.
    • The mapr user and mapr group are retrieved from /opt/mapr/conf/daemon.conf.
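
The sequence the script performs can be sketched as follows. The awk field positions are assumptions about the usual layout of these files (cluster name as the first token on the first line of mapr-clusters.conf; mapr.daemon.user= and mapr.daemon.group= key-value lines in daemon.conf):

```shell
#!/bin/sh
# Sketch of the script's backup-and-regenerate sequence (simplified).
mv /opt/mapr/conf/ssl_keystore   /opt/mapr/conf/ssl_keystore_old
mv /opt/mapr/conf/ssl_truststore /opt/mapr/conf/ssl_truststore_old

# Cluster name: first token on the first line of mapr-clusters.conf (assumption).
cluster=$(awk 'NR==1 {print $1}' /opt/mapr/conf/mapr-clusters.conf)

# User and group: key=value lines in daemon.conf (assumed key names).
user=$(awk -F= '$1 == "mapr.daemon.user" {print $2}' /opt/mapr/conf/daemon.conf)
group=$(awk -F= '$1 == "mapr.daemon.group" {print $2}' /opt/mapr/conf/daemon.conf)

/opt/mapr/manageSSLKeys.sh create -N "$cluster" -ug "$user:$group"
```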