Aspersa User's Manual
The diskstats tool

The diskstats tool is related to iostat, but has some advantages. It separates out reads and writes, for example, and computes some values that iostat reports in either incorrect or confusing ways. It is also menu-driven and interactive, with several different ways to aggregate the data, and it integrates well with the collect tool. These properties make it very convenient for quickly drilling down into I/O performance at the desired level of granularity.

Command-Line Options and Environment Variables

The tool has the following command-line options, which must come first on the command-line, before any filenames:

-c COLS
    An Awk regex that selects which columns to include. Any columns whose names match the regex will be included in the output.
-d DEVICES
    An Awk regex that selects which devices to include. Any devices (disks) whose names match the regex will be included in the output.
-g GROUPBY
    The group-by mode, which specifies how to aggregate the disk performance data. The following options are permitted:
    disk
        Each line of output shows one disk device.
    sample
        Each line of output shows one sample of statistics.
    all
        Each line of output shows one sample and one disk device.
    The default value is 'disk', which (unlike iostat) shows one line for each disk in the collected statistics; this makes it easy to begin by filtering out devices you do not want to examine.
-i INTERVAL
    In sample mode, include INTERVAL seconds per sample. The default is for every sample to result in one line of output. If you set this to 60, for example, then each line will contain one minute's worth of input data.
-k KEEPFILE
    A file to save diskstats samples in. This is used when you specify no input file, in which case diskstats will capture and store samples of diskstats data; if you want to keep them after the program exits, then specify a non-default filename here.
-n SAMPLES
    When in data-gathering mode (collecting samples live instead of reading an input file), stop collecting after SAMPLES samples.
-s INTERVAL
    When in data-gathering mode, both sample and redisplay /proc/diskstats every INTERVAL seconds (default 1).
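
To make the three group-by modes concrete, here is a minimal Python sketch. It is not the tool's code: the rows are made-up data, the group function is a hypothetical helper, and plain summation stands in for the rates and averages that the tool actually reports.

```python
from collections import defaultdict

# Hypothetical per-sample, per-device values: (timestamp, device, MB read).
rows = [
    (1.0, "sda", 2.0), (1.0, "sdb", 0.0),
    (2.0, "sda", 4.0), (2.0, "sdb", 1.0),
]


def group(rows, mode):
    """Sum the values per group key; mode is 'disk', 'sample', or 'all'."""
    keyfuncs = {
        "disk": lambda ts, dev: dev,        # one line per device
        "sample": lambda ts, dev: ts,       # one line per sample
        "all": lambda ts, dev: (ts, dev),   # one line per sample and device
    }
    totals = defaultdict(float)
    for ts, dev, value in rows:
        totals[keyfuncs[mode](ts, dev)] += value
    return dict(totals)


for mode in ("disk", "sample", "all"):
    print(mode, group(rows, mode))
```

In 'disk' mode the two samples collapse into one line per device, in 'sample' mode the two devices collapse into one line per timestamp, and in 'all' mode nothing is collapsed.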

How it Works

This program works in two main modes. One way is to process a file with saved disk statistics, which you specify on the command line. The other way is to start a background process gathering samples at intervals and saving them into a file, and process this file in the foreground. In both cases, the tool is interactively controlled by keystrokes, so you can redisplay and slice the data flexibly and easily. If the tool is not attached to a terminal, it doesn't run interactively; it just processes and prints its output, then exits. Otherwise it loops until you exit with the 'q' key.

The input is simple: the contents of /proc/diskstats, followed by timestamp lines in the format TS timestamp.nanoseconds ISO8601-timestamp. You can generate a sample file easily as follows:

    $ while sleep 1; do
        cat /proc/diskstats >> stats.txt
        date +'TS %s.%N %F %T' >> stats.txt
      done

This is exactly what the tool itself does to generate its own input. This makes it easy to capture data even if you don't have the tool installed, transport it to another server, and analyze it there.
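
If you want to inspect such a file yourself, the format is easy to parse. The following Python sketch is only an illustration and not part of the tool: the function name parse_samples, the field names, and the example data are all assumptions. The field order follows the standard Linux /proc/diskstats layout (major, minor, device name, then the counters).

```python
from typing import Dict, List, Tuple

# Names chosen here for the 11 counters that follow "major minor name"
# on each /proc/diskstats line (the labels themselves are an assumption).
FIELDS = (
    "reads", "reads_merged", "sectors_read", "ms_reading",
    "writes", "writes_merged", "sectors_written", "ms_writing",
    "in_progress", "ms_io", "weighted_ms_io",
)

Sample = Tuple[float, Dict[str, Dict[str, int]]]


def parse_samples(text: str) -> List[Sample]:
    """Split saved input into (timestamp, {device: counters}) samples."""
    samples: List[Sample] = []
    devices: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "TS":
            # A TS line closes the block of device lines that precede it.
            samples.append((float(parts[1]), devices))
            devices = {}
        elif len(parts) >= 14:
            devices[parts[2]] = dict(zip(FIELDS, map(int, parts[3:14])))
    return samples


# Made-up input in the format described above.
example = """\
   8       0 sda 100 5 2000 300 50 2 800 120 0 400 420
TS 1297205887.156653000 2011-02-08 22:58:07
   8       0 sda 110 5 2200 330 50 2 800 120 1 420 450
TS 1297205888.161613000 2011-02-08 22:58:08
"""

for ts, devs in parse_samples(example):
    print(ts, devs["sda"]["reads"], devs["sda"]["sectors_read"])
```

Lines with fewer than 14 fields are skipped, and any extra trailing fields are ignored.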

Example Usage

Let us start with a small sample file that is used for test cases. It is stored in the Subversion repository. Start the tool as follows. Each subsequent code listing will show the output it generates.

    [baron@ginger aspersa]$ diskstats t/samples/diskstats-001.txt
    #ts  device       rd_mb_s rd_cnc rd_rt wr_mb_s wr_cnc wr_rt busy in_prg
    {4}  ram0             0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {4}  cciss/c0d0       0.0    0.0   0.0     0.5    0.0   0.6   0%      0
    {4}  cciss/c0d0p1     0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {4}  cciss/c0d0p2     0.0    0.0   0.0     0.5    0.0   0.6   0%      0
    {4}  cciss/c0d1       9.6    1.4  25.1    23.3    0.0   0.1  13%     13
    {4}  cciss/c1d0       0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {4}  dm-0             0.0    0.0   0.0     0.4    0.0   0.7   0%      0
    {4}  md0              0.0    0.0   0.0     0.0    0.0   0.0   0%      0

The default display shown above has a line of column headers, followed by one line of output for every disk in the system. These lines are aggregated from the first to the last sample of disk statistics in the file; in this example, you can see that the input file contains 4 samples of /proc/diskstats from the system where this data was collected. The leftmost column specifies how many seconds' worth of samples were aggregated into each line.

The columns are as follows:

#ts
    The number of seconds of samples in the line. If there is only one sample, then the timestamp itself is shown, without the {curly braces}.
device
    The device name. If there is more than one device, then instead the number of devices aggregated into the line is shown, in {curly braces}.
rd_mb_s
    The average number of megabytes read per second during the sampled interval.
rd_cnc
    The average concurrency of the read operations, as computed by Little's Law (a result from queueing theory). In the example above, you can see that cciss/c0d1 had on average 1.4 reads in progress at all times during the sampled interval.
rd_rt
    The average response time of the read operations, in milliseconds.
wr_mb_s
    Megabytes written per second, average.
wr_cnc
    Write concurrency, similar to read concurrency.
wr_rt
    Write response time, similar to read response time.
busy
    The fraction of time that the device had at least one request in progress; this is what iostat calls %util (which is a misleading name).
in_prg
    The number of requests that were in progress. Unlike the read and write concurrencies, which are averages that are generated from reliable numbers, this number is an instantaneous sample, and you can see that it might represent a spike of requests, rather than the true long-term average.
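
As a rough illustration of where numbers like these come from, the following Python sketch derives the read-side columns from two successive snapshots of one device's counters. Everything in it is an assumption for illustration: the counter names, the sample values, and the formulas, which are ordinary iostat-style arithmetic with 512-byte sectors. The tool's own calculations may differ in detail.

```python
def read_columns(prev: dict, curr: dict, elapsed_s: float) -> dict:
    """Derive read-side columns from two counter snapshots (hypothetical names)."""
    reads = curr["reads"] - prev["reads"]
    sectors = curr["sectors_read"] - prev["sectors_read"]
    ms_reading = curr["ms_reading"] - prev["ms_reading"]
    ms_io = curr["ms_io"] - prev["ms_io"]

    rd_s = reads / elapsed_s                               # reads per second
    rd_mb_s = sectors * 512 / (1024 * 1024) / elapsed_s    # MB read per second
    rd_rt = ms_reading / reads if reads else 0.0           # avg response time, ms
    # Little's Law: concurrency = arrival rate * response time (in seconds).
    rd_cnc = rd_s * rd_rt / 1000.0
    busy = ms_io / (elapsed_s * 1000.0)                    # fraction of time busy
    return {"rd_s": rd_s, "rd_mb_s": rd_mb_s, "rd_rt": rd_rt,
            "rd_cnc": rd_cnc, "busy": busy}


# Made-up counter values, one second apart.
prev = {"reads": 1000, "sectors_read": 40000, "ms_reading": 20000, "ms_io": 5000}
curr = {"reads": 1400, "sectors_read": 60480, "ms_reading": 30000, "ms_io": 5900}
cols = read_columns(prev, curr, elapsed_s=1.0)
print({name: round(value, 2) for name, value in cols.items()})
```

With these inputs the device served 400 reads in one second at 25 ms each, which gives a concurrency of 10 and a throughput of 10 MB per second, while being busy 90% of the time.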

In addition to the above columns, there are a few columns that are hidden by default. If you press the 'c' key, and then press Enter, you will blank out the regular expression pattern that selects columns to display, and you will then see all columns:

    Enter a column pattern:
    #ts  device        rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt  wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg
    {4}  ram0           0.0     0.0     0.0     0%    0.0   0.0   0.0     0.0     0.0     0%    0.0   0.0   0%      0
    {4}  cciss/c0d0     0.0     0.0     0.0     0%    0.0   0.0  17.7    28.1     0.5    86%    0.0   0.6   0%      0
    {4}  cciss/c0d0p1   0.0     0.0     0.0     0%    0.0   0.0   0.0     0.0     0.0     0%    0.0   0.0   0%      0
    {4}  cciss/c0d0p2   0.0     0.0     0.0     0%    0.0   0.0  17.7    28.1     0.5    86%    0.0   0.6   0%      0
    {4}  cciss/c0d1   458.1    21.5     9.6     0%    1.4  25.1 985.0    24.2    23.3     0%    0.0   0.1  13%     13
    {4}  cciss/c1d0     0.0     0.0     0.0     0%    0.0   0.0   0.0     0.0     0.0     0%    0.0   0.0   0%      0
    {4}  dm-0           0.0     0.0     0.0     0%    0.0   0.0  99.3     4.0     0.4     0%    0.0   0.7   0%      0
    {4}  md0            0.0     0.0     0.0     0%    0.0   0.0   0.0     0.0     0.0     0%    0.0   0.0   0%      0

The additional columns are as follows:

rd_s
    The number of reads per second.
rd_avkb
    The average size of the reads, in kilobytes.
rd_mrg
    The percentage of read requests that were merged together in the disk scheduler before reaching the device.
wr_s, wr_avkb, and wr_mrg
    These are analogous to their rd_* cousins.

If you press the '?' key, you will bring up the interactive help menu that shows which keys control the program.

    You can control this program by key presses:
    ------------------- Key ------------------- ---- Current Setting ----
    A, D, S) Set the group-by mode              disk
    c) Enter an awk regex to match column names cnc|rt|mb|busy|prg
    d) Enter an awk regex to match disk names   (none)
    i) Set the sample size in seconds           (none)
    s) Set the redisplay interval in seconds    1
    p) Pause the program
    q) Quit the program
    ------------------- Press any key to continue -----------------------
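
The current column pattern in the help menu, cnc|rt|mb|busy|prg, explains why only some columns appear by default. Here is a small Python sketch of that kind of name matching. It uses Python's re module rather than Awk, which behaves the same way for a simple alternation like this one; select_columns is a hypothetical helper, and #ts and device are left out because the listings show them regardless of the pattern.

```python
import re

ALL_COLUMNS = ["rd_s", "rd_avkb", "rd_mb_s", "rd_mrg", "rd_cnc", "rd_rt",
               "wr_s", "wr_avkb", "wr_mb_s", "wr_mrg", "wr_cnc", "wr_rt",
               "busy", "in_prg"]


def select_columns(pattern, columns=ALL_COLUMNS):
    """Return the columns whose names match the pattern anywhere in the name."""
    # An empty pattern matches every name, like blanking the pattern in the tool.
    regex = re.compile(pattern)
    return [name for name in columns if regex.search(name)]


print(select_columns("cnc|rt|mb|busy|prg"))
print(select_columns("^rd_"))
```

The first call selects the eight columns of the default display; the second shows how a pattern such as ^rd_ would restrict the output to the read-side columns.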

If any of these isn't obvious, it should become obvious as we continue with the guided tour. First, let's switch the display from disk-per-line to sample-per-line by pressing the 'S' key so that the group-by mode becomes SAMPLE.

    #ts  device       rd_mb_s rd_cnc rd_rt wr_mb_s wr_cnc wr_rt busy in_prg
    2.0  {8}             10.2    1.4  23.9    24.6    0.0   0.2  12%     18
    4.0  {8}              8.6    1.3  27.4    13.2    0.0   0.1  11%     17
    6.0  {8}              8.8    1.4  25.5    24.1    0.0   0.1  11%      9
    7.0  {8}             12.1    2.0  23.8    48.7    0.1   0.2  22%      5

Now you can see the timestamp of each sample, followed by the information that all 8 devices were grouped together for each line. Let's put this to use to actually find out what is happening on this server, in terms of I/O. Switch back to DISK aggregation mode, with the 'D' key:

    #ts  device       rd_mb_s rd_cnc rd_rt wr_mb_s wr_cnc wr_rt busy in_prg
    {4}  ram0             0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {4}  cciss/c0d0       0.0    0.0   0.0     0.5    0.0   0.6   0%      0
    {4}  cciss/c0d0p1     0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {4}  cciss/c0d0p2     0.0    0.0   0.0     0.5    0.0   0.6   0%      0
    {4}  cciss/c0d1       9.6    1.4  25.1    23.3    0.0   0.1  13%     13
    {4}  cciss/c1d0       0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {4}  dm-0             0.0    0.0   0.0     0.4    0.0   0.7   0%      0
    {4}  md0              0.0    0.0   0.0     0.0    0.0   0.0   0%      0

It's obvious that cciss/c0d1 is the only disk this system is really using. Let's filter on that disk so we can see it more clearly. Press the 'd' key, and then type 'c0d1' to create a pattern that matches that device name:

    Enter a disk/device pattern: c0d1
    #ts  device       rd_mb_s rd_cnc rd_rt wr_mb_s wr_cnc wr_rt busy in_prg
    {4}  cciss/c0d1       9.6   11.5  25.1    23.3    0.1   0.1 102%     13

Now, switch back to grouping by samples, with the 'S' key:

    #ts  device       rd_mb_s rd_cnc rd_rt wr_mb_s wr_cnc wr_rt busy in_prg
    2.0  cciss/c0d1      10.2   11.2  23.9    22.8    0.1   0.1  92%     18
    4.0  cciss/c0d1       8.6   10.2  27.4    12.6    0.1   0.1  91%     17
    6.0  cciss/c0d1       8.8   10.8  25.5    24.0    0.1   0.1  89%      9
    7.0  cciss/c0d1      12.1   16.2  23.8    44.1    0.2   0.1 172%      5

It looks like this device's usage is fairly consistent from second to second; it is not spiking up and down. But the device is quite busy, mostly with reads: there are more than ten reads in progress at a time, on average, and the response time of those reads is very high compared to that of the writes. In fact, this is a RAID controller with a battery-backed write cache, and the writes are simply going to the cache and returning, but of course the reads have to be serviced from the spindles, and that is rather slow. (This device is still performing more slowly than it should. It turned out on deeper diagnosis that something was wrong with the hardware.)
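
The figure of more than ten concurrent reads can be cross-checked by hand with Little's Law, using the values for cciss/c0d1 from the listings above (rd_s and rd_avkb come from the all-columns display). This is only a sanity check in Python, and it assumes that a kilobyte is 1024 bytes and a megabyte is 1024 kilobytes:

```python
# Values for cciss/c0d1 taken from the listings above.
rd_s = 458.1       # reads per second
rd_avkb = 21.5     # average read size, in kilobytes
rd_rt_ms = 25.1    # average read response time, in milliseconds

rd_mb_s = rd_s * rd_avkb / 1024    # throughput = rate * average size
rd_cnc = rd_s * rd_rt_ms / 1000    # Little's Law: L = lambda * W

print(round(rd_mb_s, 1))   # 9.6
print(round(rd_cnc, 1))    # 11.5
```

Both results agree with the rd_mb_s and rd_cnc values shown for this device in the filtered display.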

Now, what if we had a large sample, once per second, that was collected over tens of minutes? That would scroll off the screen in sample mode. To control this, we can set the size of the window that encloses a 'sample'. Press the 'i' key, and set the sample size:

    Enter a sample size: 2
    #ts  device       rd_mb_s rd_cnc rd_rt wr_mb_s wr_cnc wr_rt busy in_prg
    2.0  cciss/c0d1      18.8   21.4  25.5    35.5    0.2   0.1 183%     17
    6.0  cciss/c0d1       7.4    9.5  24.8    23.0    0.1   0.1  88%      5

Here you can see that we have zoomed out. If you have a large sample, for example many minutes, you can set the sampling interval to 60 or 300 or something like that to fit everything onto one screen.
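
The general idea of widening the sample window can be sketched as fixed-size bucketing by timestamp. This Python example is an assumption-laden illustration only: window_samples is a hypothetical helper, the data is made up, and the way the tool itself chooses window boundaries, labels each line, and combines the values is not reproduced here.

```python
import math


def window_samples(samples, size):
    """Bucket (timestamp, value) pairs into fixed windows of `size` seconds."""
    windows = {}
    for ts, value in samples:
        start = math.floor(ts / size) * size   # start of the enclosing window
        windows.setdefault(start, []).append(value)
    # One output line per window: (window start, mean of the values in it).
    return [(start, sum(vals) / len(vals))
            for start, vals in sorted(windows.items())]


# Ten minutes of hypothetical once-per-second samples.
samples = [(float(t), 1.0 + (t % 2)) for t in range(600)]
lines = window_samples(samples, 60)
print(len(lines))   # 10
```

With a window of 60 seconds, 600 one-second samples shrink to 10 lines, which fits comfortably on one screen.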

Press 'q' to quit the program, and let's start it up in its other mode, without a file. I'll start it on my laptop. Because I did not give it a saved file to analyze, it will start a process in the background to gather current disk activity and iteratively print it out, a sample at a time. The main device on my laptop is /dev/sda2, so that is what I will start with to avoid seeing many lines of output per iteration:

    [baron@ginger aspersa]$ diskstats -d sda2
    #ts  device       rd_mb_s rd_cnc rd_rt wr_mb_s wr_cnc wr_rt busy in_prg
    {1}  sda2             0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {1}  sda2             0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {1}  sda2             0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {1}  sda2             0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {1}  sda2             0.0    0.0   0.0     0.0    0.0   0.0   0%      0

It is hard to show on this manual page, but each line above appears after a 1-second delay. As you let the program run, the background process continues appending to the file, and the foreground process continues to analyze the last sample appended every second. You can change the sampling and display interval with the 's' key.

    [baron@ginger aspersa]$ diskstats -d sda2
    #ts  device       rd_mb_s rd_cnc rd_rt wr_mb_s wr_cnc wr_rt busy in_prg
    {1}  sda2             0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {1}  sda2             0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {1}  sda2             0.0    0.0   0.0     0.0    0.1  21.0   6%      0
    {1}  sda2             0.0    0.0   0.0     0.0    0.1  21.0   6%      0
    {1}  sda2             0.0    0.0   0.0     0.0    0.1  55.0   6%      2
    {1}  sda2             0.0    0.0   0.0     0.1    0.2  29.9   5%      2
    Enter a redisplay interval: 5
    #ts  device       rd_mb_s rd_cnc rd_rt wr_mb_s wr_cnc wr_rt busy in_prg
    {7}  sda2             0.0    0.0   0.0     0.0    0.1  29.8   3%      0
    {1}  sda2             0.0    0.0   0.0     0.0    0.0   0.0   0%      0
    {1}  sda2             0.0    0.0   0.0     0.0    0.0   0.0   0%      0

Notice that after changing the redisplay interval, the headers are reprinted, followed by a summary of all the gathered statistics up to that point; after that, new lines appear once every 5 seconds.