Aspersa User's Manual
The ioprofile tool

The ioprofile tool captures a process's I/O activity with lsof and strace and summarizes it. The result is a tabular display that shows where the process spent its time on I/O operations; the tool produces it by performing a cross-tabulation ("pivot table") of the I/O calls.

Use this tool with care. strace is generally safe, but there is always a chance that a bug could cause problems for the process you're profiling. strace can also add significant overhead to the traced process. The faster the I/O normally is, the larger the relative overhead can be, so the effect may be especially dramatic on a database backed by a FusionIO card, for example.

Command-Line Options and Environment Variables

The tool has the following command-line options, which must come first on the command line, before any filenames:

-a FUNCTION
Specifies the aggregation function to perform on each cell of the tabular output. By default, it is 'sum', but 'avg' is also available.
-b BINARY
Specifies the name of a process to trace and summarize. The default value is 'mysqld'.
-c CELL
Specifies what value to place into the cells of the tabular output. By default, it is 'times', which means that the cells contain the timing information about I/O operations. You can specify 'count' for a simple count, and 'sizes' for the size of the operations, in bytes.
-g GROUPBY
Specifies the item by which the I/O operations are aggregated. By default, they are aggregated by 'filename'. You can aggregate them by 'pid' to get a per-thread view of the I/O, and by 'all' to get an overall view.
-k KEEPFILE
Specifies a file to hold the strace data. The specified file will not be removed when the program finishes, so you can re-analyze it if you wish.
-p PID
Specifies a process ID to trace and summarize. Causes -b to be ignored.
-s SLEEPTIME
Specifies how long to profile, in seconds. The default value is 30.

Any additional command-line arguments are treated as filenames containing previously gathered lsof and strace output to summarize. In this case, the tool doesn't gather any traces; it merely processes the ones you give it.

How it Works

The ioprofile tool begins by capturing a single sample of lsof output, which identifies the profiled process's file descriptors and the corresponding filenames. It then starts strace and waits the specified amount of time, after which it stops strace and processes the results.

It's important to know how strace is stopped, because processes that are being traced can be in a delicate state. System calls can be interrupted when the trace is started, for example, and traced processes have unusual signal-handling semantics. So ioprofile starts strace in the background, waits the specified time, and then kills strace, first with SIGINT (that's what Ctrl-C at the terminal would send, and how strace is normally stopped when run interactively) and then with SIGTERM. It then sends a SIGCONT to the process that was being traced, because in some cases it may be left in a stopped state after strace exits. It is not clear whether this is fully reliable and safe. If you know a better way to do this, please share your knowledge.
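The stop sequence described above can be sketched in shell. This is a rough illustration, not the tool's actual code: background sleep commands stand in for strace and the traced process, and the variable names are made up for the example.

```shell
#!/bin/sh
# Stand-ins: background sleeps play the roles of strace and the traced process.
sleep 60 & STRACE_PID=$!
sleep 60 & TRACED_PID=$!

kill -INT "$STRACE_PID"                        # what Ctrl-C would send to an interactive strace
kill -TERM "$STRACE_PID" 2>/dev/null || true   # follow-up SIGTERM in case SIGINT didn't stop it
kill -CONT "$TRACED_PID"                       # resume the tracee if it was left stopped
wait "$STRACE_PID" 2>/dev/null || true         # reap the strace stand-in

kill "$TRACED_PID" 2>/dev/null || true         # cleanup for this demo only; ioprofile
                                               # leaves the traced process running
echo "strace stopped"
```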

After the trace is complete, ioprofile parses the output into an intermediate format that's easier to manipulate: one line per function call captured by strace, carrying information such as the process ID, size, elapsed time, and filename of the call. Filenames are initially gathered from lsof and correlated with file descriptor numbers; thereafter, any new file the process opens can be correlated by extracting the filename from the arguments to the open system call.
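The fd-to-filename correlation can be sketched with awk over fabricated lsof and strace samples. The line formats below are simplified for illustration, and the real parser handles many more cases:

```shell
#!/bin/sh
# Fabricated samples (simplified formats; real lsof/strace output is richer):
cat > /tmp/lsof.sample <<'EOF'
mysqld 1234 mysql 3u REG 8,1 1024 99 /data/data/ibdata1
EOF
cat > /tmp/strace.sample <<'EOF'
1234 open("/data/data/ib_logfile0", O_RDWR) = 4 <0.000020>
1234 pwrite(3, "x", 512, 0) = 512 <0.000141>
1234 write(4, "x", 512) = 512 <0.000100>
EOF

# Emit one line per call: pid, function, size, elapsed time, filename.
awk '
  NR == FNR { sub(/[a-z]+$/, "", $4); fd[$4] = $NF; next }  # lsof: "3u" -> fd 3
  {
    t = $NF; gsub(/[<>]/, "", t)                  # elapsed time from trailing <...>
    if (match($0, /open\("[^"]*"/))               # learn new fds from open() calls
      fd[$(NF-1)] = substr($0, RSTART + 6, RLENGTH - 7)
    if (match($0, /^[0-9]+ [a-z]+\([0-9]+,/)) {   # calls that operate on an fd
      split($2, p, /[(,]/)
      print $1, p[1], $(NF-1), t, fd[p[2]]
    }
  }
' /tmp/lsof.sample /tmp/strace.sample | tee /tmp/intermediate.sample
```

Here the pwrite on fd 3 resolves through the lsof map, while the write on fd 4 resolves through the earlier open() call.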

After making the information into an easy-to-process format, ioprofile passes it through an aggregator, which is sort of the equivalent of an SQL GROUP BY query. By default, it aggregates the calls by filename, with one column per function, and the sum of the elapsed time in the cells.
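The GROUP BY step can be sketched with awk over fabricated intermediate-format lines. This is only the default aggregation (sum of elapsed time by filename); the real tool also pivots the result into one column per function:

```shell
#!/bin/sh
# Fabricated intermediate-format lines: pid, function, size, time, filename.
printf '%s\n' \
  '1234 pread 16384 0.003721 /data/a.ibd' \
  '1234 pread 16384 0.001952 /data/a.ibd' \
  '1235 write   512 0.000100 /data/b.ibd' |
awk '{ sum[$5] += $4 }                      # GROUP BY filename, SUM(time)
     END { for (f in sum) printf "%.6f %s\n", sum[f], f }' |
sort -rn | tee /tmp/bytime.sample
```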

ioprofile aggregates a list of specific I/O calls. This list is hard-coded into the tool, and is currently any call that matches the regular expression /read|write|sync|open|close|getdents|seek/. If this list needs to be expanded, please file a bug report.
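The filtering can be illustrated with grep over fabricated strace lines. Whether the tool matches the whole line or only the parsed call name is an implementation detail; this sketch simply matches whole lines:

```shell
#!/bin/sh
# Fabricated strace lines; only calls matching the hard-coded pattern survive.
printf '%s\n' \
  '1234 pread64(12, "x", 16384, 0) = 16384 <0.003721>' \
  '1234 futex(0x7f1a2b3c, FUTEX_WAIT, 2, NULL) = 0 <0.000010>' \
  '1234 fsync(9) = 0 <0.000500>' |
grep -E 'read|write|sync|open|close|getdents|seek' | tee /tmp/matched.sample
```

The futex call is dropped; pread64 matches "read" and fsync matches "sync".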

Example Usage

Here is an example of the tool's default output, on a sample file that you can find in the Subversion repository:

$ ioprofile t/samples/ioprofile-001.txt
     total      pread       read     pwrite      write filename
 10.094264  10.094264   0.000000   0.000000   0.000000 /data/data/abd_2dia/aia_227_228.ibd
  8.356632   8.356632   0.000000   0.000000   0.000000 /data/data/abd_2dia/aia_227_223.ibd
  0.048850   0.046989   0.000000   0.001861   0.000000 /data/data/abd/aia_instances.ibd
  0.035016   0.031001   0.000000   0.004015   0.000000 /data/data/abd/vo_difuus.ibd
  0.013360   0.000000   0.001723   0.000000   0.011637 /var/log/mysql/mysql-relay.002113
  0.008676   0.000000   0.000000   0.000000   0.008676 /data/data/master.info
  0.002060   0.000000   0.000000   0.002060   0.000000 /data/data/ibdata1
  0.001490   0.000000   0.000000   0.001490   0.000000 /data/data/ib_logfile1
  0.000555   0.000000   0.000000   0.000000   0.000555 /var/log/mysql/mysql-relay-log.info
  0.000141   0.000000   0.000000   0.000141   0.000000 /data/data/ib_logfile0
  0.000100   0.000000   0.000000   0.000100   0.000000 /data/data/abd/9fvus.ibd

This output is sorted in descending order by the leftmost column, and should be fairly self-explanatory. Let's look at a few different ways to process the same dataset. First, let's aggregate by count of operations instead of by elapsed time, to see how many times each function was executed against each file:

$ ioprofile -c count t/samples/ioprofile-001.txt
     total      pread       read     pwrite      write filename
      4282       4282          0          0          0 /data/data/abd_2dia/aia_227_223.ibd
      2713       2713          0          0          0 /data/data/abd_2dia/aia_227_228.ibd
       390          0         47          0        343 /var/log/mysql/mysql-relay.002113
       343          0          0          0        343 /data/data/master.info
        30          8          0         22          0 /data/data/abd/vo_difuus.ibd
        19          7          0         12          0 /data/data/abd/aia_instances.ibd
        16          0          0         16          0 /data/data/ib_logfile1
        16          0          0          0         16 /var/log/mysql/mysql-relay-log.info
         6          0          0          6          0 /data/data/ibdata1
         1          0          0          1          0 /data/data/ib_logfile0
         1          0          0          1          0 /data/data/abd/9fvus.ibd

It's interesting that the #1 time consumer isn't #1 in number of operations, isn't it? We can re-examine the times, showing the average time per call instead of the sum:

$ ioprofile -a avg t/samples/ioprofile-001.txt
     total      pread       read     pwrite      write filename
  0.003721   0.003721   0.000000   0.000000   0.000000 /data/data/abd_2dia/aia_227_228.ibd
  0.002571   0.006713   0.000000   0.000155   0.000000 /data/data/abd/aia_instances.ibd
  0.001952   0.001952   0.000000   0.000000   0.000000 /data/data/abd_2dia/aia_227_223.ibd
  0.001167   0.003875   0.000000   0.000182   0.000000 /data/data/abd/vo_difuus.ibd
  0.000343   0.000000   0.000000   0.000343   0.000000 /data/data/ibdata1
  0.000141   0.000000   0.000000   0.000141   0.000000 /data/data/ib_logfile0
  0.000100   0.000000   0.000000   0.000100   0.000000 /data/data/abd/9fvus.ibd
  0.000093   0.000000   0.000000   0.000093   0.000000 /data/data/ib_logfile1
  0.000035   0.000000   0.000000   0.000000   0.000035 /var/log/mysql/mysql-relay-log.info
  0.000034   0.000000   0.000037   0.000000   0.000034 /var/log/mysql/mysql-relay.002113
  0.000025   0.000000   0.000000   0.000000   0.000025 /data/data/master.info

Another way to aggregate the data is to look at the size of the operations (in bytes), rather than the elapsed time:

$ ioprofile -c sizes t/samples/ioprofile-001.txt
     total      pread       read     pwrite      write filename
  90800128   90800128          0          0          0 /data/data/abd_2dia/aia_227_223.ibd
  52150272   52150272          0          0          0 /data/data/abd_2dia/aia_227_228.ibd
    999424          0          0     999424          0 /data/data/ibdata1
    638976     131072          0     507904          0 /data/data/abd/vo_difuus.ibd
    327680     114688          0     212992          0 /data/data/abd/aia_instances.ibd
    305263          0     149662          0     155601 /var/log/mysql/mysql-relay.002113
    217088          0          0     217088          0 /data/data/ib_logfile1
     22638          0          0          0      22638 /data/data/master.info
     16384          0          0      16384          0 /data/data/abd/9fvus.ibd
      1088          0          0          0       1088 /var/log/mysql/mysql-relay-log.info
       512          0          0        512          0 /data/data/ib_logfile0

It's also possible to report by process ID (thread ID), instead of by filename:

$ ioprofile -g pid t/samples/ioprofile-001.txt
     total      pread       read     pwrite      write pid
  9.580759   9.580759   0.000000   0.000000   0.000000 22782
  8.187935   8.187935   0.000000   0.000000   0.000000 20974
  0.300581   0.300581   0.000000   0.000000   0.000000 2370
  0.181209   0.181209   0.000000   0.000000   0.000000 2369
  0.088197   0.088197   0.000000   0.000000   0.000000 2366
  0.081061   0.077990   0.001723   0.000793   0.000555 10013
  0.038928   0.038928   0.000000   0.000000   0.000000 2373
  0.036679   0.036679   0.000000   0.000000   0.000000 2372
  0.020577   0.020577   0.000000   0.000000   0.000000 2371
  0.020313   0.000000   0.000000   0.000000   0.020313 10012
  0.010502   0.010502   0.000000   0.000000   0.000000 2368
  0.005529   0.005529   0.000000   0.000000   0.000000 2367
  0.002172   0.000000   0.000000   0.002172   0.000000 2375
  0.002020   0.000000   0.000000   0.002020   0.000000 2374
  0.001923   0.000000   0.000000   0.001923   0.000000 2385
  0.001636   0.000000   0.000000   0.001636   0.000000 2377
  0.000982   0.000000   0.000000   0.000982   0.000000 2378
  0.000141   0.000000   0.000000   0.000141   0.000000 2365

Finally, you can aggregate by the entire dataset, so you simply get the function calls and the desired statistic, not broken out by filename or thread ID:

$ ioprofile -g all t/samples/ioprofile-001.txt
18.561144 TOTAL
18.528886 pread
 0.020868 write
 0.009667 pwrite
 0.001723 read