While reading Brendan Gregg's book "BPF Performance Tools", I wanted to test the newly created tools.
In particular, I wanted to see if disk saturation issues could be detected with the bio* tools.
In this article, we will see how a VirtualBox VM can be configured to test these tools.
The test protocol
The test protocol will be quite simple. It consists of launching a very common and simple Java application: Spring PetClinic.
Before the application is started, a
bio* command will be executed to monitor it.
The test will be run twice. A first time on an unthrottled machine. And a second time on a machine which has an I/O bandwidth limitation set to 1 MB/s.
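The whole protocol can be sketched as a small shell script. This is only an illustration of the sequence of steps: the paths and the choice of biotop as the monitoring tool are my assumptions, and the script requires root privileges and a bcc installation.

```shell
#!/bin/sh
# Sketch of the test protocol. Assumes Spring PetClinic has been built
# in ~/spring-petclinic and that bcc's biotop is installed.

# Start every run from a cold page cache.
echo 3 | sudo tee /proc/sys/vm/drop_caches

# Monitor block I/O in the background while the application starts.
sudo biotop -C -r 10 > biotop.log &

# Launch the application under test.
java -jar ~/spring-petclinic/target/*.jar
```

The same script is run once without any limit, then once after the 1 MB/s limit has been applied to the VM.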
With the VBoxManage tool, it is possible to access many more VM configuration parameters than the ones exposed in the VirtualBox GUI.
In this article, we will specifically focus on two subcommands: VBoxManage bandwidthctl and VBoxManage storageattach. With bandwidthctl, we are going to create a disk bandwidth limit that can be used by a VM. With storageattach, we are going to associate a VM disk with the limit we just created.
Note: the two commands listed above must be run while the VM is stopped.
I have a VM called Archlinux ready to be used. It has a single disk on which all the files are stored (OS, /boot; let's keep things simple).
First, I need to create a limit.
That limit will be created with a very high value so that it does not slow down the startup of the VM.
Because, yes, I did try with a base value of 1 KB/s, and GRUB took minutes just to load its own files.
VBoxManage bandwidthctl Archlinux add "Disk bandwidth limit" --type disk --limit 1000M
Then, the VM disk has to be associated with the limit.
VBoxManage storageattach Archlinux --storagectl "SATA" --port 0 --bandwidthgroup "Disk bandwidth limit"
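To check that the group was taken into account, the bandwidth groups defined for the VM can be listed. This is a sketch; the exact output format varies across VirtualBox versions.

```shell
# List the bandwidth groups defined for the Archlinux VM,
# along with their type and current limit.
VBoxManage bandwidthctl Archlinux list
```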
To start the application, a simple
java -jar target/*.jar will work.
Between two tests, it will be necessary to clean up the Linux page cache with
echo 3 | sudo tee /proc/sys/vm/drop_caches.
Optionally, it will be possible to verify that the main jar is no longer cached with the vmtouch command.
Test without limits
Before starting the application, let’s verify that the petclinic jar is not in cache.
[alarm@archlinux ~]$ vmtouch -v ./spring-petclinic/target/*.jar
./spring-petclinic/target/spring-petclinic-2.2.0.BUILD-SNAPSHOT.jar
[                                                            ] 0/11643

           Files: 1
     Directories: 0
  Resident Pages: 0/11643  0/45M  0%
         Elapsed: 0.000353 seconds
When the application is launched, we can see that the java process loads dozens of MB from the disk. This corresponds not only to the uberjar itself, but also to all the files needed by the JVM, including its own libraries and class files. These I/Os are performed with an excellent latency (<1 ms on average).
[alarm@archlinux ~]$ sudo biotop -C -r 10
Tracing... Output every 1 secs. Hit Ctrl-C to end
[...]
22:44:29 loadavg: 0.22 0.35 0.35 4/159 4093

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
4076   java             R 8   0   sda      497  31376.0  0.49
4076   C2 CompilerThre  R 8   0   sda      22   1632.0   0.59
4076   bash             R 8   0   sda      14   696.0    0.39
4076   C1 CompilerThre  R 8   0   sda      6    348.0    0.44
509    bash             R 8   0   sda      3    140.0    2.28

22:44:30 loadavg: 0.37 0.38 0.36 4/163 4097

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
4076   java             R 8   0   sda      640  8792.0   0.48
4076   background-prei  R 8   0   sda      165  2044.0   0.51
4076   C2 CompilerThre  R 8   0   sda      1    124.0    0.57

22:44:31 loadavg: 0.37 0.38 0.36 4/162 4098

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
4076   java             R 8   0   sda      391  4100.0   0.52
4076   background-prei  R 8   0   sda      254  3908.0   0.49
4076   Thread-0         R 8   0   sda      37   316.0    0.45

22:44:32 loadavg: 0.37 0.38 0.36 4/162 4098

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
4076   java             R 8   0   sda      313  3480.0   0.59
Once the application is stopped, we can see that the petclinic jar is in the FS cache.
We can therefore either use
vmtouch -e or
echo 3 | sudo tee /proc/sys/vm/drop_caches to get it out.
Since I want as much I/O as possible, I chose to empty the cache entirely.
[alarm@archlinux ~]$ vmtouch -v ./spring-petclinic/target/*.jar
./spring-petclinic/target/spring-petclinic-2.2.0.BUILD-SNAPSHOT.jar
[ooooOooOooooOoOoOOOOOOooooOooOOooooooooOooooooooooOooooooOoO] 8777/11643

           Files: 1
     Directories: 0
  Resident Pages: 8777/11643  34M/45M  75.4%
         Elapsed: 0.000901 seconds
Test with limits
This time, I configure the VM so that it has a 1 MB/s disk bandwidth limit. I did try with smaller values, but they sometimes hung the VM entirely for a few minutes. It seems the background tasks are noisy enough to occasionally hit the limit on their own. Anyway, 1 MB/s is a good value, given that petclinic's uberjar weighs 45 MB.
So let’s run the command below.
VBoxManage bandwidthctl Archlinux set "Disk bandwidth limit" --limit 1M
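Since bandwidthctl set only updates an existing group, switching between the throttled and unthrottled configurations is a one-liner in each direction. A sketch, reusing the group created earlier:

```shell
# Throttle the disk to 1 MB/s for the test run...
VBoxManage bandwidthctl Archlinux set "Disk bandwidth limit" --limit 1M

# ...then restore the generous limit once the test is done.
VBoxManage bandwidthctl Archlinux set "Disk bandwidth limit" --limit 1000M
```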
This time, the result is very different. The loading time is noticeably longer than before. Even without formally measuring it, I can confirm that the bandwidth limit works as expected.
But what's even more telling is the output of biotop. We can clearly see that the java process cannot read files faster than 1 MB/s.
And not only that: to read this single MB, the average latency is between 60 and 110 ms. That is more than 100x longer than in the first run.
We can see how
biotop can detect a disk saturation issue.
[alarm@archlinux ~]$ sudo biotop -C -r 10
Tracing... Output every 1 secs. Hit Ctrl-C to end
[...]
22:32:14 loadavg: 0.49 0.43 0.27 2/138 3933

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
3932   java             R 8   0   sda      9    1060.0   110.47

22:32:15 loadavg: 0.49 0.43 0.27 2/138 3933

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
3932   java             R 8   0   sda      17   1020.0   59.48

22:32:16 loadavg: 0.49 0.43 0.27 7/138 3933

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
3932   java             R 8   0   sda      11   996.0    90.63

22:32:17 loadavg: 0.49 0.43 0.27 6/143 3938

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
3932   java             R 8   0   sda      16   980.0    62.48
Some surprises / Lessons learned
I initially planned to test three other BPF commands, but they turned out not to work as expected.
The biosnoop command prints a line for every block I/O. When a JVM starts, it performs so many I/Os that the output is too noisy to be usable.
The bitesize command shows the size of requested I/Os. It does not take the actual bandwidth of these I/Os into account, so it is designed to investigate a different problem from the one I was studying.
The biolatency command was interesting, but it emphasizes the disk latency, as its name suggests.
In my previous listing, we can see that disk latency was indeed higher.
But given it is the consequence of a throughput problem, I think
biotop is better suited for this job.
VBoxManage bandwidthctl can also be used to create limits on network cards. It could be used to test other new tools like soconnect and to detect network congestion.
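As a sketch of that network variant (the group name is mine, and the NIC index must match your VM's adapter):

```shell
# Create a network bandwidth group limited to 1 MB/s.
VBoxManage bandwidthctl Archlinux add "Net bandwidth limit" \
    --type network --limit 1M

# Attach the VM's first network adapter to that group.
VBoxManage modifyvm Archlinux --nicbandwidthgroup1 "Net bandwidth limit"
```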
The BPF Performance Tools book is full of new tools. I am curious to see how test cases can be designed to experiment on each one of them.
If you have any question/comment, feel free to send me a tweet at @pingtimeout. And if you enjoyed this article and want to support my work, you can always buy me a coffee ☕️.