guts

guts is a tool for measuring and analyzing performance. In the default mode, it prints one line every second, reporting the number of operations or bytes processed during each one-second interval. guts is an internal utility and is subject to change without notice.

guts provides information on entities such as:

CPU: Indicates whether the CPU is idle or busy

RPCs: Number of RPCs, RPCs-in, RPCs-out, bytes-in, and bytes-out

MFS: Number of local-writes, local-reads, and other operations (such as lookup, create and remove)

Log: Number of log writes, log flushes, and log force-flushes

IO: Number of disk operations per second (read/write) and disk I/O per second in MB (read/write)

Syntax

```
/opt/mapr/bin/guts
    guts
    -help
  instance:<id>  time:unix time:all time:none (add timestamp to output)
  key:5660 (is server port)
  shmid:<shared memory id> (client's shared memory id, check ipcs)
  threadcpu:core threadcpu:all
  cpu:none  cpu:all
  net:sum  net:msum  net:ksum  net:all  net:none
  disk:none disk:ops disk:mb disk:all
  diskMajor:major# of disk
  ssd:none ssd:all
  cache:none cache:small cache:med cache:all
  cleaner:small  cleaner:all  cleaner:none
  fs:rw   fs:all  fs:none
  kv:all  kv:none
  btree:all  btree:none
  allocator:all allocator:none
  rpc:none  rpc:op  rpc:all rpc:debug
  db:none db:op db:get db:put db:scan db:all
  dbrepl:none dbrepl:op dbrepl:all
  streams:none streams:op streams:all
  dsec:infinity (run time in sec.)
  period:n (output every n sec.)
  cache:small  cache:med  cache:all  cache:none
  log:all  log:none
  btree:all btree:none
  resync:all resync:none
  io:all io:small
  hb:all io:none
  gateway:all gateway:op gateway:lc gateway:none
  mastgateway:all mastgateway:tier mastgateway:db mastgateway:mfsops mastgateway:none
  fstier:all fstier:none
  nfs:all nfs:none
  moss:all moss:basic moss:none
  client:none client:db client:fs client:all (requires shmid parameter)
  clientpid:<process id of a running client process>
  nfs4client:all
  fuse:all shmid:<shared memory id> (posix client's shared memory id, check fuse logs)
  header:all  header:none (doesn't seem to work)
  flush:none  flush:line (if line, then output is flushed on every output line)
defaults: time:none net:none disk:none rpc:op db:op db:put dbrepl:none streams:none fs:rw cache:small kv:none
  cleaner:short log:none btree:none resync:none period:1
```

Interpreting Output

The prefix c identifies client metrics. The suffix P refers to the number of pending RPCs. The suffix C denotes the number of completed RPCs.

The pending (P) metrics are a snapshot of the RPCs pending at the moment the output line is printed. The completed (C) metrics report the increase during the last print interval.
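Because client column names are built from a prefix, an operation letter, and a suffix, they can be decoded mechanically. The following Python sketch is a hypothetical helper (`decode_client_column` is not part of guts); the operation letters are collected from the client metric tables in this document, and the `R` suffix covers the row columns such as cgR.

```python
from typing import Tuple

# Hypothetical helper, not part of guts. Operation letters come from the
# client metric tables in this document (cmP, cgP, crP, ccP, cuP, ...).
CLIENT_OPS = {
    "m": "all RPCs",  # cmP / cmC
    "g": "get",
    "p": "put",
    "s": "scan",
    "i": "increment",
    "a": "append",
    "r": "read",
    "w": "write",
    "c": "create",
    "u": "unlink",
}

SUFFIXES = {
    "P": "pending (snapshot when the line is printed)",
    "C": "completed (increase during the last interval)",
    "R": "rows",
}


def decode_client_column(name: str) -> Tuple[str, str]:
    """Return (operation, meaning) for a three-letter client column name."""
    if len(name) != 3 or name[0] != "c":
        raise ValueError(f"not a client column: {name!r}")
    op, suffix = name[1], name[2]
    if op not in CLIENT_OPS or suffix not in SUFFIXES:
        raise ValueError(f"unknown client column: {name!r}")
    return CLIENT_OPS[op], SUFFIXES[suffix]


for column in ("cgP", "cwC", "cpR"):
    print(column, decode_client_column(column))
```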

Parameters and Output

CPU: cpu:all — Percentage of idle time of each CPU on the system in the last second.

IO

The metrics are ior and iow, which are displayed by default.

ior — The first number reports the number of I/O reads on the machine in the last second. The second number reports the amount of data read, in MB, in the last second.
iow — The first number reports the number of I/O writes on the machine in the last second. The second number reports the amount of data written, in MB, in the last second.
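Dividing the MB value of an ior or iow pair by its operation count gives a rough average request size. The sketch below assumes that guts MB means 1024 × 1024 bytes; since guts prints whole megabytes, the estimate is coarse for light workloads. The function name is illustrative only.

```python
def average_io_size_kb(ops: int, mb: float) -> float:
    """Approximate average request size, in KB, for one ior or iow pair.

    Assumes guts MB means 1024 * 1024 bytes. Because guts prints whole
    megabytes, the result is only a rough estimate for small intervals.
    """
    if ops == 0:
        return 0.0
    return mb * 1024 / ops


# iow pair "73 4" taken from the sample output under Running Guts
print(round(average_io_size_kb(73, 4), 1))  # prints 56.1
```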

Disk

disk:ops — Number of I/O requests (read+write) for each disk in the last second.
disk:mb — Amount of I/O in MB (read+write) for each disk in the last second.
disk:all — The preceding two numbers for each disk in the last second. The first number is from disk:ops, the second number is from disk:mb.

Filesystem

fs:rw — Reports MFS file system activities. Reported metrics are:

read — The first number reports the number of remote reads in the last second. The second number reports the amount of data read in MB in the last second.
write — The first number reports the number of remote writes in the last second. The second number reports the amount of data written in MB in the last second.
lread / lwrite — Similar to read and write, but for local reads and writes.

In addition, guts displays the following client filesystem metrics (with client:fs or client:all):

crP — Number of read RPCs pending when the line is printed.
crC — Number of read RPCs completed in the last second.
cwP — Number of write RPCs pending when the line is printed.
cwC — Number of write RPCs completed in the last second.
ccP — Number of create RPCs pending when the line is printed.
ccC — Number of create RPCs completed in the last second.
cuP — Number of unlink RPCs pending when the line is printed.
cuC — Number of unlink RPCs completed in the last second.

RPC

Reports the following metrics:

rpc:none — Does not display any RPC-related metrics.
rpc:op — Displays the rpc metric.
rpc:all — Displays the rpc, im, and om metrics.
rpc — Number of RPC calls received in the last second.
im — Amount of RPC data received, in MB, in the last second.
om — Amount of RPC data sent, in MB, in the last second.

Cache

cache:small — Metrics on inode and dentry cache, which are displayed by default. The metrics reported are:

icache (inode cache) — The first number reports the number of inode cache lookups in the last second. The second number reports the number of inode cache lookup misses in the last second.
dcache (dentry cache) — The first and second numbers report dcache lookups and lookup misses in the last second, respectively.
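Since each cache column pairs lookups with lookup misses, the hit ratio for an interval is (lookups − misses) / lookups. A minimal Python sketch, with a hypothetical function name, returning None when there were no lookups:

```python
from typing import Optional


def hit_ratio(lookups: int, misses: int) -> Optional[float]:
    """Fraction of icache or dcache lookups that hit during one interval.

    Returns None when there were no lookups, since no ratio is defined.
    """
    if lookups == 0:
        return None
    return (lookups - misses) / lookups


# dcache pair "163 1" taken from the sample output under Running Guts
print(f"{hit_ratio(163, 1):.3f}")  # prints 0.994
```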

Network

net:sum — Total network traffic received and transmitted, in bytes, across all network interfaces on the machine.
net:msum — Total network traffic in megabytes.
net:ksum — Total network traffic in kilobytes.
net:all — Not yet implemented.
net:none — Does not display any network-related metrics.

Metrics returned are:

nI — Total network traffic received, in bytes, in the last second, summed across all network interfaces on the machine.
nO — Total network traffic sent, in bytes, in the last second, summed across all network interfaces on the machine.
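Network link speeds are usually quoted in megabits per second, so converting nI or nO makes it easier to compare against the capacity of the interfaces. This sketch assumes the byte count covers `interval_s` seconds (one second by default) and uses decimal units; the function name and the sample reading are hypothetical.

```python
def to_megabits_per_second(num_bytes: int, interval_s: float = 1.0) -> float:
    """Convert an nI or nO byte count into megabits per second.

    Uses decimal units (1 Mbit = 1,000,000 bits), which is how network
    link speeds are usually quoted.
    """
    return num_bytes * 8 / interval_s / 1_000_000


# A hypothetical reading of nI = 125000000 bytes in one second
print(to_megabits_per_second(125_000_000))  # prints 1000.0
```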

Database

db:get — Displays metrics related to gets. Columns for individual RPC types follow this naming pattern, where OP is the operation name (for example, put, get, or sc for scan):

rOP — Number of RPCs of type OP completed in the last second.
rOPR — Number of rows processed by RPCs of type OP in the last second.
tOPR — Number of OP rows processed by all RPC types in the last second.
cOP — Number of RPCs of type OP currently in progress (a snapshot, not a differential).
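The r/t/c pattern can be decoded from the column name alone. The Python sketch below is a hypothetical helper (`decode_db_column` is not part of guts) and only applies to columns that follow the pattern, such as rput, rgetR, tputR, or ctlk; columns like rsf or raSg are outside the pattern and would be decoded incorrectly.

```python
from typing import Tuple


def decode_db_column(name: str) -> Tuple[str, str]:
    """Split a pattern column (rOP, rOPR, tOPR, cOP) into (OP, meaning).

    Hypothetical helper: only valid for columns that follow the pattern.
    """
    if len(name) < 2:
        raise ValueError(f"column name too short: {name!r}")
    prefix = name[0]
    if prefix == "t" and name.endswith("R"):
        return name[1:-1], "OP rows processed by all RPC types"
    if prefix == "r" and name.endswith("R"):
        return name[1:-1], "rows processed by RPCs of type OP"
    if prefix == "r":
        return name[1:], "RPCs of type OP completed"
    if prefix == "c":
        return name[1:], "RPCs of type OP in progress (snapshot)"
    raise ValueError(f"column does not follow the rOP/tOPR/cOP pattern: {name!r}")


for column in ("rput", "rputR", "tputR", "cput", "rscR"):
    print(column, decode_db_column(column))
```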

Cleaner Metrics

guts displays the following cleaner metrics:

di — Number of inodes dirtied by update operations in the last second.
ic — Number of inodes cleaned by the drainer in the last second.
dd — Number of data blocks dirtied by update operations in the last second.
dc — Number of data blocks cleaned by the drainer in the last second.
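Comparing the dirtied and cleaned counters over several lines gives a rough sense of whether the drainer keeps up with updates. The sketch below keeps a running total of (dirtied − cleaned); it assumes each pair (di/ic for inodes, dd/dc for data blocks) counts the same kind of item over the same interval, and the function name is hypothetical. A total that keeps rising over many intervals suggests a growing backlog, but a few seconds of data says little on its own.

```python
from typing import Iterable, List, Tuple


def dirty_backlog(samples: Iterable[Tuple[int, int]]) -> List[int]:
    """Running total of (dirtied - cleaned) across consecutive guts lines.

    Assumes each pair counts the same kind of item (di/ic or dd/dc)
    over the same interval.
    """
    totals = []
    running = 0
    for dirtied, cleaned in samples:
        running += dirtied - cleaned
        totals.append(running)
    return totals


# (di, ic) pairs from the first three lines of the sample output under Running Guts
print(dirty_backlog([(337, 22), (6, 0), (8, 0)]))  # prints [315, 321, 329]
```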

Operational Metrics

guts displays the following operational metrics:

rput — Number of put RPCs completed in the last second.
rputR — Sum of put rows completed in the last second, from all put RPCs.
tputR — Sum of put rows completed in the last second, from all RPCs (put, increment, checkAndPut, append, and so on).
cput — Number of put RPCs currently in progress. This is not a differential, but displays the number of outstanding put RPCs at that particular instant.
rget — Number of get RPCs completed in the last second.
rgetR — Sum of get rows completed in the last second, from all get RPCs.
tgetR — Sum of get rows completed in the last second, from all RPCs (get, increment, checkAndPut, append, and so on).
cget — Number of get RPCs currently in progress. This is not a differential, but displays the number of outstanding get RPCs at that particular instant.
rsc — Number of scan RPCs completed in the last second.
rscR — Sum of scan rows returned in the last second, from all scan RPCs.
csc — Number of scan RPCs currently in progress. This is not a differential, but shows the number of outstanding scan RPCs at that particular instant.
rinc — Number of increment RPCs completed in the last second.
cinc — Number of increment RPCs currently in progress. This is not a differential, but shows the number of outstanding increment RPCs at that particular instant.
rchk — Number of checkAndPut/checkAndDelete RPCs completed in the last second.
rapp — Number of append RPCs completed in the last second.
rtlk — Number of tablet lookup RPCs completed in the last second.
ctlk — Number of tablet lookup RPCs currently in progress. This is not a differential, but shows the number of outstanding lookup RPCs at that particular instant.
rbulkb — Number of bulk-import-bucket RPCs completed in the last second.
rbulks — Number of bulk-import-segment RPCs completed in the last second.
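Dividing a row counter by the matching RPC counter (for example rputR / rput, or rgetR / rget) shows how many rows each RPC carries on average, which hints at how much the clients batch. A minimal sketch with a hypothetical function name:

```python
from typing import Optional


def rows_per_rpc(rpcs_completed: int, rows: int) -> Optional[float]:
    """Average rows per RPC for one interval, e.g. rputR / rput or rgetR / rget.

    Returns None when no RPCs of that type completed in the interval.
    """
    if rpcs_completed == 0:
        return None
    return rows / rpcs_completed


# rput = 3 and rputR = 1506 in the third line of the sample output under Running Guts
print(rows_per_rpc(3, 1506))  # prints 502.0
```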

Put Metrics

guts displays the following put metrics:

rput — Number of put RPCs completed in the last second.
rputR — Sum of put rows completed in the last second, from all put RPCs.
tputR — Sum of put rows completed in the last second, from all RPCs (put, increment, checkAndPut, append, and so on).
cput — Number of put RPCs currently in progress. This value is not a differential, but displays the number of outstanding put RPCs at that particular instant.
rsf — Reserved free memory in MemIndex, in MB. If this value falls very low, put RPCs can be throttled. This value is not a differential.
bucketWR — The first number reports the number of bucket writes (calls to MFS) in the last second. The second number reports the amount of bucket writes, in MB, in the last second.
fl — Number of bucket flushes fired in the last second.
ffl — Number of force-flushes of buckets in the last second. If the bucket was flushed before it reached its optimal size, then the flush is counted as a force-flush.
sfl — Number of segments touched by the bucket-flushes in the last second.
mcom — Number of segments mini-packed in the last second.
fcom — Number of segments packed fully in the last second.
ccom — Number of segment packs running currently. This value is not a differential.
scr — Number of segment creates in the last second.
spcr — Number of spill creates in the last second.

Get Metrics

guts displays the following get metrics:

rget — Number of get RPCs completed in the last second.
rgetR — Sum of get rows completed in the last second, from all get RPCs.
tgetR — Sum of get rows completed in the last second, from all RPCs (get, increment, checkAndPut, append, and so on).
cget — Number of get RPCs currently in progress. This is not a differential, but displays the number of outstanding get RPCs at that particular instant.
vcM — Size of the value cache in MB. This value is not a differential.
cL — Number of value-cache lookups in the last second.
vcH — Number of value-cache hits in the last second.
bget — Number of bucket gets in the last second. This is 0 if there are no active buckets.
sg — Number of segment gets in the last second. This is normally equal to tgetR minus vcH (the number of value-cache hits).
spg — Number of spill gets in the last second, calculated as the sum of (segments × spills per segment) minus the number of bloom-filter skips.
bskp — Number of spill gets that were avoided by the bloom filter in the last second.
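The relationships among the get metrics can be written down directly: the value-cache hit rate is vcH / cL, sg is normally close to tgetR − vcH, and spg follows the formula given for it. In the sketch below the function names and the sample numbers are hypothetical, and what the spg sum ranges over (here, a list of (segments, spills per segment) pairs) is an assumption, since the description does not say.

```python
from typing import Iterable, Optional, Tuple


def value_cache_hit_rate(lookups: int, hits: int) -> Optional[float]:
    """vcH / cL for one interval; None when there were no lookups."""
    if lookups == 0:
        return None
    return hits / lookups


def expected_segment_gets(tget_rows: int, vc_hits: int) -> int:
    """Rough expectation for sg: tgetR minus value-cache hits (vcH)."""
    return tget_rows - vc_hits


def spill_gets(segment_spills: Iterable[Tuple[int, int]], bloom_skips: int) -> int:
    """spg: sum(segments * spills per segment) minus bloom-filter skips (bskp).

    Each tuple is (segments, spills_per_segment); the grouping is an assumption.
    """
    return sum(segments * spills for segments, spills in segment_spills) - bloom_skips


# Hypothetical numbers for one interval
print(value_cache_hit_rate(200, 150))     # prints 0.75
print(expected_segment_gets(1000, 150))   # prints 850
print(spill_gets([(2, 3), (1, 4)], 5))    # prints 5
```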

Scan Metrics

guts displays the following scan metrics:

rsc — Number of scan RPCs completed in the last second.
rscR — Sum of scan rows returned in the last second, from all scan RPCs.
csc — Number of scan RPCs currently in progress. This is not a differential, but shows the number of outstanding scan RPCs at that particular instant.
bsc — Number of buckets scanned in the last second.
ssc — Number of segments scanned in the last second.
spsc — Number of spills scanned in the last second.
spscR — Number of rows scanned from spills in the last second.
ldbr — Number of ldb blocks read in the last second.
blkr — Number of data blocks read in the last second (across spills, buckets, and so on).
raSg — Number of segments for which read-ahead was done in the last second.
raSp — Number of spills for which read-ahead was done in the last second.
nAdv — Number of fadvise calls made to MFS for scan read-ahead in the last second.
raBl — Sum of blocks in the fadvise calls made to MFS for scan read-ahead in the last second.

Cumulative Metrics

guts displays the following cumulative metrics:

cmP — Number of client RPCs pending when the line is printed.
cmC — Number of client RPCs completed in the last second.

DB Metrics

guts displays the following client database metrics (with client:db or client:all):

cgP — Total pending get RPCs.
cgC — Total completed get RPCs.
cpP — Total pending put RPCs.
cpC — Total completed put RPCs.
csP — Total pending scan RPCs.
csC — Total completed scan RPCs.
ciP — Total pending increment RPCs.
ciC — Total completed increment RPCs.
caP — Total pending append RPCs.
caC — Total completed append RPCs.
cgR — Total client get rows.
cpR — Total client put rows.
csR — Total client scan rows.
ciR — Total client increment rows.
caR — Total client append rows.

Example Usage

The following example demonstrates viewing client metrics. Perform the following steps:

Find the process ID of the client program.
Find all the shared memory segments (shmem) for this program:
```
ipcs -mp | grep <pid>
  998080521  root       30030      21850
  998113290  root       30030      30030
  ^^^^^^^^^
  shmem ID
```
Here, there are two shared memory segments — one between the client and MFS, and the other between the client and guts.

Identify the correct shmem segment for guts:

```
ipcs | grep 998113290
  0x00000000 998113290  root       666        2288       1          dest
ipcs | grep 998080521
  0x00000000 998080521  root       660        20971520   1          dest
                                           ^^^^^^^^
                                           size
```

The 20 MB segment (20971520 bytes) is shared between the client and MFS, so select the other segment, with ID 998113290.
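The segment selection above can be scripted by parsing `ipcs -m` style output and discarding the 20 MB client-to-MFS segment. This Python sketch uses hypothetical function names and assumes the Linux column order (key, shmid, owner, perms, bytes, ...); treating 20971520 bytes as the fixed size of the MFS segment is also an assumption taken from this example, so check it against your own output.

```python
from typing import Iterable, List, Tuple

# Size of the client-to-MFS segment in the example above (20 MB).
# Treating this size as fixed is an assumption.
MFS_SEGMENT_BYTES = 20971520


def parse_ipcs_m(text: str) -> List[Tuple[int, int]]:
    """Return (shmid, bytes) pairs from Linux `ipcs -m` style output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data lines start with a hex key: key, shmid, owner, perms, bytes, ...
        if len(fields) >= 5 and fields[0].startswith("0x"):
            rows.append((int(fields[1]), int(fields[4])))
    return rows


def pick_guts_shmid(ipcs_text: str, client_shmids: Iterable[int]) -> int:
    """Pick the client's segment that is not the 20 MB client-to-MFS segment."""
    wanted = set(client_shmids)
    candidates = [
        shmid
        for shmid, size in parse_ipcs_m(ipcs_text)
        if shmid in wanted and size != MFS_SEGMENT_BYTES
    ]
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one candidate segment, found {candidates}")
    return candidates[0]


sample = """\
0x00000000 998113290  root       666        2288       1          dest
0x00000000 998080521  root       660        20971520   1          dest
"""
print(pick_guts_shmid(sample, [998080521, 998113290]))  # prints 998113290
```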

Run guts:

```
/opt/mapr/bin/guts client:all shmid:998113290
  Printing only client statistics
  cmP    cmC    cgP    cgC    cpP    cpC    csP    csC    ciP    ciC    caP   caC    crP    crC    cwP    cwC    ccP    ccC    cuP    cuC
    0      0      0      0      0      0      0      0      0      0      0     0      0      0      0      0      0      0      0      0
    0      0      0      0      0      0      0      0      0      0      0     0      0      0      0      0      0      0      0      0
```

Pass the shmem ID and one of the following client options:

none — Used when printing MFS/dbserver statistics
db — Prints client statistics for DB operations
fs — Prints client statistics for filesystem operations
all — Prints all client statistics
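When guts is driven from a script, it helps to build the argument list in one place and validate the client option. The sketch below is a hypothetical wrapper (`guts_client_argv` is not part of MapR); the optional `flush:line` argument comes from the Syntax listing, which says it flushes output on every line, something that matters when the output is piped to another program.

```python
import shlex
from typing import List

CLIENT_OPTIONS = ("none", "db", "fs", "all")


def guts_client_argv(shmid: int, option: str = "all", flush_each_line: bool = False) -> List[str]:
    """Build the guts command line for client statistics."""
    if option not in CLIENT_OPTIONS:
        raise ValueError(f"client option must be one of {CLIENT_OPTIONS}, got {option!r}")
    argv = ["/opt/mapr/bin/guts", f"client:{option}", f"shmid:{shmid}"]
    if flush_each_line:
        # flush:line makes guts flush every output line (see Syntax)
        argv.append("flush:line")
    return argv


print(shlex.join(guts_client_argv(998113290)))
# prints /opt/mapr/bin/guts client:all shmid:998113290
```

The resulting list can be passed to `subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)` to read the output line by line; stop guts by terminating the process.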

CLDB Guts

The cldbguts utility prints information about active container reports, full container reports, registration requests, MapR-FS heartbeats, NFS server heartbeats, and containers. For more information, see cldbguts.

NFS Guts

guts displays the following NFS metrics:

req — Number of requests received from all the NFS clients to this NFS server in the last second.
dpC — Number of calls from NFS clients that were dropped because the server ran out of ONC handles (likely because the cluster is responding slowly or an NFS client is flooding the NFS server).
inReadReq — Number of incoming read requests from NFS clients.
outReadResp — Number of outgoing read responses to NFS clients.
inReadDataReq — Size (buffer length) of incoming read requests from NFS clients.
outReadDataResp — Size (buffer length) of outgoing read responses to NFS clients.
inWriteReq — Number of incoming write requests from NFS clients.
outWriteResp — Number of outgoing write responses to NFS clients.
inWriteDataReq — Size (buffer length) of incoming write requests from NFS clients.
outWriteDataResp — Size (buffer length) of outgoing write responses to NFS clients.

Running Guts

Start guts on the node for which you need to collect metrics.

```
/opt/mapr/bin/guts
  00 01 02 03 04 05 06 07   rpc   lpc     write   lwrite   bwrite      read    lread      icache       dcache     di  ic    dd  dc       ior      iow  rput  rputR  cput  tputR  rget  rgetR  cget  tgetR   rsc    rscR  csc
  86 90 84 84 87 93 81 84     5     6     0   0    1   0    0   0     0   0    3   0     8     0    163     1    337  22    13  16     1   0   73   4     0      0     0      1     0      0     0      0     0       0    0
  62 77 70 82 93 61 50 84    12    20     0   0    3   0    0   0     0   0   10   0    27     0     41     0      6   0     3   0     0   0    0   0     0      0     0      3     0      0     0      0     0       0    0
  63 78 59 56 84 64 32 86     4     5     0   0    5   0    0   0     0   0    0   0     5     0     27     0      8   0    22   0     0   0    0   0     3   1506     0   1506     0      0     0      0     0       0    0
  83 76 77 82 68 69 82 67     1     0     0   0    0   0    0   0     0   0    0   0     0     0      0     0      0   0     0   0     0   0    0   0     0      0     0      0     0      0     0      0     0       0    0
  94 49 91 56 75 48 57 92     1     0     0   0    0   0    0   0     0   0    0   0     0     0      0     0      0   0     0   0     0   0    0   0     0      0     0      0     0      0     0      0     0       0    0
  97 96 99 89 93 94 82 95     2     0     0   0    1   0    0   0     0   0    0   0     1     0      8     0      2   0     1   0     0   0    0   0     0      0     0      0     0      0     0      0     0       0    0
  99 99 96 97 99 98 99 82    19     6     0   0    1   0    0   0     0   0    3   0   186     0     18     0      0   0     0   0     0   0    0   0     0      0     0      1     0      0     0      0     0       0    0
```
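In the default output, some header names (write, read, icache, ior, and so on) cover two numbers while others cover one, so the columns do not line up one-to-one with the header. The following Python sketch maps header names to values for one line; `parse_guts_line` is a hypothetical helper, and the set of paired columns is read off the sample above, so other option combinations may need a different set. The 00 to 07 columns are per-CPU idle percentages, and their count depends on the number of CPUs.

```python
from typing import Dict, List

# Columns that print two numbers in the sample output above. This set is read
# off that sample; other option combinations may print different columns.
PAIRED_COLUMNS = {"write", "lwrite", "bwrite", "read", "lread", "icache", "dcache", "ior", "iow"}


def parse_guts_line(header: str, line: str) -> Dict[str, List[int]]:
    """Map each header name to its value(s) for one line of default guts output."""
    values = [int(token) for token in line.split()]
    parsed: Dict[str, List[int]] = {}
    pos = 0
    for name in header.split():
        width = 2 if name in PAIRED_COLUMNS else 1
        parsed[name] = values[pos:pos + width]
        pos += width
    if pos != len(values):
        raise ValueError(f"header expects {pos} values, line has {len(values)}")
    return parsed


HEADER = ("00 01 02 03 04 05 06 07 rpc lpc write lwrite bwrite read lread icache dcache "
          "di ic dd dc ior iow rput rputR cput tputR rget rgetR cget tgetR rsc rscR csc")
LINE = ("86 90 84 84 87 93 81 84 5 6 0 0 1 0 0 0 0 0 3 0 8 0 163 1 "
        "337 22 13 16 1 0 73 4 0 0 0 1 0 0 0 0 0 0 0")

row = parse_guts_line(HEADER, LINE)
print(row["dcache"], row["iow"], row["di"])  # prints [163, 1] [73, 4] [337]
```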

To stop collecting metrics, press ^C.