Verbose Data
Data is reported in this form either when --verbose is used or when at least one
requested type of data has no brief form, such as any detail data or
inodes, processes or slabs. Specifying some of the lustre output options with --lustopts,
such as B, D and M, will also force verbose format.
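The rule above can be sketched in Python. This is only an illustration of the rule as stated, not collectl's own logic. The function name `uses_verbose` is hypothetical, and so is the assumption that detail data is requested with uppercase subsystem letters. The letters `i` and `y` come from the -si and -sy sections of this page. Processes are not modeled because their letter is not given here, and B, D and M may not be the only --lustopts values that force verbose format.

```python
# Subsystems named on this page as having no brief form: inodes (-si), slabs (-sy).
NO_BRIEF_SUMMARY = {"i", "y"}
# The --lustopts values this page names as forcing verbose (possibly not exhaustive).
FORCING_LUSTOPTS = {"B", "D", "M"}


def uses_verbose(verbose_flag, subsystems, lustopts=""):
    """Return True if output would be in verbose form under the stated rule.

    Assumption: an uppercase subsystem letter means detail data.
    """
    if verbose_flag:
        return True
    if any(s.isupper() or s in NO_BRIEF_SUMMARY for s in subsystems):
        return True
    return any(opt in FORCING_LUSTOPTS for opt in lustopts)


print(uses_verbose(False, "cd"))      # only subsystems with a brief form
print(uses_verbose(False, "cdi"))     # inodes have no brief form
print(uses_verbose(False, "l", "B"))  # forced by a lustre output option
```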
CPU, collectl -sc
# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# USER NICE SYS WAIT IRQ SOFT STEAL IDLE INTR CTXSW PROC RUNQ RUN AVG1 AVG5 AVG15
These are the percentages of time the system is running in each of the modes, noting that
these are averaged across all CPUs. While the User and Sys modes are self-explanatory, the others
may not be:
User |
Time spent in User mode, not including time spent in "nice" mode. |
Nice |
Time spent in Nice mode, that is, running processes whose priority has been lowered by
the nice command and which have the "N" status flag set when examined with "ps". |
Sys |
This is time spent in "pure" system time. |
Wait |
Also known as "iowait", this is the time the CPU was idle during an
outstanding disk I/O request. This is not considered to be part of the total or
system times reported in brief mode. |
Irq |
Time spent processing interrupts and also considered to be part of
the summary system time reported in "brief" mode. |
Soft |
Time spent processing soft interrupts and also considered to be part
of the summary system time reported in "brief" mode. |
Steal |
Time spent in involuntary wait state while the hypervisor was servicing
another virtual processor. |
This next set of fields applies to processes:
Proc | Process creations/sec. |
Runq | Number of processes in the run queue. |
Run | Number of processes in the run state. |
Avg1, Avg5, Avg15 | Load average over the last 1, 5 and 15 minutes. |
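The relationship between the verbose columns and the summary system time shown in brief mode can be sketched as follows. This is a minimal illustration, not collectl's own code. The function name `brief_sys_pct` and the dictionary layout are hypothetical, and the sample percentages are made up. The only thing taken from the descriptions above is that Irq and Soft count toward the brief system time while Wait does not.

```python
def brief_sys_pct(cpu):
    """Summary system percentage as brief mode would fold it together.

    `cpu` is a hypothetical dict of verbose-mode percentages keyed by
    lowercase column name. Irq and Soft are added to Sys; Wait is not.
    """
    return cpu["sys"] + cpu["irq"] + cpu["soft"]


# Made-up sample values for one interval.
sample = {"user": 12, "nice": 0, "sys": 5, "wait": 3,
          "irq": 1, "soft": 2, "steal": 0, "idle": 77}
print(brief_sys_pct(sample))  # 5 + 1 + 2 = 8
```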
Disks, collectl -sd
# DISK SUMMARY (/sec)
#KBRead RMerged Reads SizeKB KBWrit WMerged Writes SizeKB
KBRead | KB read/sec |
RMerged |
Read requests merged per second when being dequeued.
These statistics are not available in older kernels which
only record disk statistics in /proc/stat. |
Reads | Number of reads/sec |
SizeKB | Average read size in KB |
KBWrit | KB written/sec |
WMerged |
Write requests merged per second when being dequeued. |
Writes | Number of writes/sec |
SizeKB | Average write size in KB |
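As a rough sketch, the two SizeKB columns can be thought of as the KB rate divided by the request rate for the same interval. The helper name `avg_size_kb` is hypothetical, and the assumption that the average is derived this way, with zero reported for an idle interval, is mine rather than something stated above.

```python
def avg_size_kb(kb_per_sec, ops_per_sec):
    """Average request size in KB; 0 when no requests were issued."""
    if ops_per_sec == 0:
        return 0
    return kb_per_sec / ops_per_sec


# Made-up sample: 2048 KB/sec read using 16 reads/sec.
print(avg_size_kb(2048, 16))  # 128.0
print(avg_size_kb(0, 0))      # 0, idle interval
```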
Inodes/Filesystem, collectl -si
# INODE SUMMARY
# Dentries File Handles Inodes
# Number Unused Alloc % Max Number
40585 39442 576 0.17 38348
DCache |
Number | Number of entries in directory cache |
Unused | Number of unused entries in directory cache |
Handles | Number of allocated file handles |
% Max | Percentage of maximum available file handles |
Inode | Number of used inode handles |
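The "% Max" column is a simple ratio of allocated file handles to the system maximum. The sketch below shows the arithmetic only. The function name `pct_of_max` is hypothetical, and the maximum of 338800 is an assumed value chosen so that it is consistent with the 576 handles and 0.17 shown in the sample line; the real maximum is not given above.

```python
def pct_of_max(allocated, maximum):
    """Allocated file handles as a percentage of the maximum available."""
    if maximum == 0:
        return 0.0
    return 100.0 * allocated / maximum


# 576 comes from the sample output; 338800 is an assumed maximum.
print(round(pct_of_max(576, 338800), 2))  # 0.17
```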
NOTE - as of this writing I'm baffled by the dentry unused field. No matter how
many files and/or directories I create, this number goes up! Shouldn't it go down?
Infiniband, collectl -sx
# INFINIBAND SUMMARY (/sec)
# KBIn PktIn SizeIn KBOut PktOut SizeOut Errors
KBIn | KB received/sec. |
PktIn | Packets received/sec. |
SizeIn | Average incoming packet size in KB |
KBOut | KB transmitted/sec. |
PktOut | Packets transmitted/sec. |
SizeOut | Average outgoing packet size in KB |
Errs | Count of current errors. Since these
are typically infrequent, it is felt that reporting them as a rate would result
in either not seeing them OR round-off hiding their values. |
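The round-off concern can be illustrated with a small sketch. It assumes a 10 second sampling interval and a column that displays whole numbers only; both are assumptions made for the example, and `as_rate` is a hypothetical helper rather than anything collectl provides.

```python
def as_rate(count, interval_secs):
    """Per-second rate truncated to a whole number, as an integer column would show it."""
    return int(count / interval_secs)


errors_in_interval = 3
interval = 10  # seconds, assumed sampling interval

print(as_rate(errors_in_interval, interval))  # 0: the errors disappear as a rate
print(errors_in_interval)                     # 3: the raw count stays visible
```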
Lustre
Lustre Client, collectl -sl
There are several formats here, controlled by the --lustopts switch. Detail
data is available for these as well. Specifying -sL results in
data broken out by the file system and --lustopts O further breaks it out by OST.
# LUSTRE CLIENT SUMMARY
# KBRead Reads KBWrite Writes
KBRead | KB/sec delivered to the client. |
Reads | Reads/sec delivered to the client,
not necessarily from the lustre
storage servers. |
KBWrite | KB written/sec delivered to the storage servers. |
Writes | Writes/sec delivered to the storage servers. |
# LUSTRE CLIENT SUMMARY: METADATA
# KBRead Reads KBWrite Writes Open Close GAttr SAttr Seek Fsynk DrtHit DrtMis
KBRead | KB/sec delivered to the client. |
Reads | Reads/sec delivered to the client,
not necessarily from the lustre storage servers. |
KBWrite | KB written/sec delivered to the storage servers. |
Writes | Writes/sec delivered to the storage servers. |
Open | File opens/sec |
Close | File closes/sec |
GAttr | getattrs/sec |
SAttr | setattrs/sec |
Seek | seeks/sec |
Fsync | fsyncs/sec |
DrtHit | dirty hits/sec |
DrtMis | dirty misses/sec |
# LUSTRE CLIENT SUMMARY: READAHEAD
# KBRead Reads KBWrite Writes Pend Hits Misses NotCon MisWin FalGrb LckFal Discrd ZFile ZerWin RA2Eof HitMax Wrong
KBRead | KB/sec delivered to the client. |
Reads | Reads/sec delivered to the client,
not necessarily from the lustre storage servers. |
KBWrite | KB written/sec delivered to the storage servers. |
Writes | Writes/sec delivered to the storage servers. |
Pend | Pending issued pages |
Hits | prefetch cache hits |
Misses | prefetch cache misses |
NotCon | The current pages read that were not consecutive with the previous ones. |
MisWin | Miss inside window. The pages that were expected to be in the
prefetch cache but weren't. They were probably
reclaimed due to memory pressure |
LckFal | Failed grab_cache_pages. Tried to prefetch page but it was locked. |
Discrd | Read but discarded. Prefetched pages (but not read by the application)
have been discarded either because of memory pressure or lock
revocation. |
ZFile | Zero length file. |
ZerWin | Zero size window. |
RA2Eof | Read ahead to end of file |
HitMax | Hit maximum readahead issue. The read-ahead window has grown to the
maximum specified by max_read_ahead_mb |
# LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages)
#RdK Rds 1K 2K ... WrtK Wrts 1K 2K ...
This display shows the size of rpc buffer distribution buckets in K-pages. You can find the
page size for your system in the header (collectl --showheader).
RdK | KBs read/sec |
Rds | Reads/sec |
nK | Number of pages of this size read |
WrtK | KBs written/sec |
Wrts | Writes/sec |
nK | Number of pages of this size written |
Lustre Meta-Data Server, collectl -sl
# LUSTRE FILESYSTEM SUMMARY
#<------------- MDS --------------->
#Close Getattr Reint Sync
Close | Number of file closes/sec. |
Getattr | Number of getattrs/sec. |
Reint | Reintegrated operations/sec, which are inode
modifications and unlinks. |
Sync | Number of syncs/sec. |
This display is very similar to the RPC buffers display in that it reports the number of
I/O requests of each size. In this case these are requests sent to the disk driver.
Note that this report is only available for HP's SFS.
# LUSTRE DISK BLOCK LEVEL SUMMARY
#Rds RdK 0.5K 1K ... Wrts WrtK 0.5K 1K ...
Rds | Reads/sec |
RdK | KBs read/sec |
nK | Number of blocks of this size read |
Wrts | Writes/sec |
WrtK | KBs written/sec |
nK | Number of blocks of this size written |
Lustre Object Storage Server, collectl -sl
# LUSTRE FILESYSTEM SUMMARY
#<----------------- OST ---------------->
#KBRead Reads KBWrite Writes
KBRead | KB/sec read |
Reads | Reads/sec |
KBWrite | KB/sec written |
Writes | Writes/sec |
Lustre Object Storage Server, collectl -sl --lustopts B
# LUSTRE FILESYSTEM SUMMARY
#<--------read----------------writes-----------------
#RdK Rds 1K 2K ... WrtK Wrts 1K 2K ....
RdK | KBs read/sec |
Rds | Reads/sec |
nK | Number of pages of this size read |
WrtK | KBs written/sec |
Wrts | Writes/sec |
nK | Number of pages of this size written |
Lustre Object Storage Server, collectl -sl --lustopts D
# LUSTRE DISK BLOCK LEVEL SUMMARY
#RdK Rds 0.5K 1K ... WrtK Wrts 0.5K 1K ...
RdK | KBs read/sec |
Rds | Reads/sec |
nK | Number of blocks of this size read |
WrtK | KBs written/sec |
Wrts | Writes/sec |
nK | Number of blocks of this size written |
Memory, collectl -sm
# MEMORY STATISTICS
#<------------------------Physical Memory-----------------------><-----------Swap----------><-Inactive->
# TOTAL USED FREE BUFF CACHED SLAB MAPPED COMMIT TOTAL USED FREE TOTAL IN OUT
Total |
Total physical memory |
Used |
Used physical memory. This does not include memory used by the kernel itself. |
Commit |
According to Red Hat: "An estimate of how much RAM you would need to make a 99.99% guarantee
that there never is OOM (out of memory) for this workload." |
Swap Total |
Total Swap |
Swap Used |
Used Swap |
Swap Free |
Free Swap |
Inactive |
Inactive pages. On earlier kernels this number is the sum of the clean, dirty
and laundry pages. |
Pages/Sec In |
Total number of pages read by block devices |
Pages/Sec Out |
Total number of pages written by block devices |
Network, collectl -sn
The entries for error counts are actually the total of several types of errors.
To get individual error counts, you must report details on individual
interfaces in plot format by specifying -P. Transmission errors are categorized
by errors, dropped, fifo, collisions and carrier.
Receive errors are broken out for errors, dropped, fifo and fragments.
# NETWORK SUMMARY (/sec)
# KBIn PktIn SizeIn MultI CmpI ErrIn KBOut PktOut SizeO CmpO ErrOut
KBIn |
Incoming KB/sec |
PktIn |
Incoming packets/sec |
SizeI |
Average incoming packet size in bytes |
MultI |
Incoming multicast packets/sec |
CmpI |
Incoming compressed packets/sec |
ErrIn |
Incoming errors/sec |
KBOut |
Outgoing KB/sec |
PktOut |
Outgoing packets/sec |
SizeO |
Average outgoing packet size in bytes |
CmpO |
Outgoing compressed packets/sec |
ErrOut |
Outgoing errors/sec |
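The way ErrIn and ErrOut combine the individual error types can be sketched as a simple sum. The category names below are the ones listed at the start of this section; the function name `total_errors`, the dictionary layout and the sample counts are hypothetical and only illustrate the idea.

```python
# Error categories as listed in the text above.
TX_ERROR_TYPES = ("errors", "dropped", "fifo", "collisions", "carrier")
RX_ERROR_TYPES = ("errors", "dropped", "fifo", "fragments")


def total_errors(counters, kinds):
    """Sum the listed error categories; a missing category counts as zero."""
    return sum(counters.get(kind, 0) for kind in kinds)


# Made-up per-interval counts for one interface.
tx = {"errors": 1, "dropped": 2, "fifo": 0, "collisions": 4, "carrier": 0}
rx = {"errors": 0, "dropped": 3, "fifo": 1, "fragments": 0}

print(total_errors(tx, TX_ERROR_TYPES))  # 7
print(total_errors(rx, RX_ERROR_TYPES))  # 4
```

Seeing which category contributed still requires the per-interface detail in plot format, as noted above.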
NFS, collectl -sf
These statistics will be reported for V3 servers by default but you can
choose a different version and/or client data via --nfsopts. They correspond
to the net, rpc and protocol specific sections of the nfsstat utility.
# NFS SERVER (/sec)
#<----------Network-------><----------RPC---------><---NFS V3--->
#PKTS UDP TCP TCPCONN CALLS BADAUTH BADCLNT READ WRITE
Pkts | Total network packets, which is the sum of UDP and TCP |
UDP | Number of UDP packets/sec |
TCP | Number of TCP packets/sec |
TCPConn | Number of TCP connections/sec |
Calls | Number of RPC calls/sec |
BadAuth | Number of authentication failures/sec |
BadClnt | Number of unknown clients/sec |
Read | Number of reads/sec |
Write | Number of writes/sec |
NFS, collectl -sf --nfsopts C
The data reported for clients is slightly different, specifically the
retrans and authref fields.
# NFS CLIENT (/sec)
#<----------RPC---------><---NFS V3--->
#CALLS RETRANS AUTHREF READ WRITE
Calls | Number of RPC calls/sec |
Retrans | Retransmitted calls |
Authref | Authentication failed |
Read | Number of reads/sec |
Write | Number of writes/sec |
Slabs, collectl -sy
As of the 2.6.22 kernel, there is a new slab allocator, called SLUB, and since
there is not a 1:1 mapping between what it reports and the older slab allocator,
the format of this listing will depend on which allocator is being used. The following
format is for the older allocator.
# SLAB SUMMARY
#<------------Objects------------><--------Slab Allocation-------><--Caches--->
# InUse Bytes Alloc Bytes InUse Bytes Total Bytes InUse Total
Objects |
InUse |
Total number of objects that are currently in use. |
Bytes |
Total size of all the objects in use. |
Alloc |
Total number of objects that have been allocated but not necessarily in use. |
Bytes |
Total size of all the allocated objects whether in use or not. |
Slab Allocation |
InUse |
Number of slabs that have at least one active object in them. |
Bytes |
Total size of all the slabs. |
Total |
Total number of slabs that have been allocated whether in use or not. |
Bytes |
Total size of all the slabs that have been allocated whether in use or not. |
Caches |
InUse |
Not all caches are actually in use. This includes only those with non-zero
counts. |
Total |
This is the count of all caches, whether currently in use or not. |
This is the format for the new SLUB allocator:
# SLAB SUMMARY
#<---Objects---><-Slabs-><-----memory----->
# In Use Avail Number Used Total
One should note that this report summarizes those slabs being monitored. In general
this represents all slabs, but if filtering is being used these numbers will only
apply to those slabs that have matched the filter.
Objects |
InUse |
The total number of objects that have been allocated to processes. |
Avail |
The total number of objects that are available in the currently allocated slabs.
This includes those that have already been allocated to processes. |
Slabs |
Number |
This is the number of individual slabs that have been allocated and
are taking physical memory. |
Memory |
Used |
Used memory corresponds to those objects that have been allocated to
processes. |
Total |
Total physical memory allocated to processes. When there is no filtering
in effect, this number will be equal to the Slabs field reported by -sm. |
Sockets, collectl -ss
# SOCKET STATISTICS
# <-------------Tcp-------------> Udp Raw <---Frag-->
#Used Inuse Orphan Tw Alloc Mem Inuse Inuse Inuse Mem
Used | Total number of sockets allocated, which can include additional types such as domain. |
Tcp |
Inuse | Number of TCP connections in use |
Orphan | Number of TCP orphaned connections |
Tw | Number of connections in TIME_WAIT |
Alloc | TCP sockets allocated |
Mem | |
Udp |
Inuse | Number of UDP connections in use |
Raw |
Inuse | Number of RAW connections in use |
Frag |
Inuse | |
Mem | |
TCP, collectl -st
# TCP SUMMARY (/sec)
# PureAcks HPAcks Loss FTrans
PureAcks | ACKs/sec that only contain acks (i.e. no data). |
HPAcks | Fast-path acks/sec. |
Loss | Packets/sec TCP thinks have been lost coming in. |
FTrans | Fast retransmissions/sec. |