While reading Brendan Gregg's book "BPF Performance Tools", I wanted to test the newly created tools.
In particular, I wanted to see if disk saturation issues could be detected with the bio* tools.
In this article, we will see how a VirtualBox VM can be configured to test these tools.
The test protocol
The test protocol will be quite simple. It consists of launching a very common and simple Java application: Spring PetClinic.
Before the application is started, a
bio* command will be executed to monitor it.
The test will be run twice. A first time on an unthrottled machine. And a second time on a machine which has an I/O bandwidth limitation set to 1 MB/s.
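The whole protocol can be sketched as a small shell script. This is only an illustration of the sequence of steps: the paths and the choice of biotop as the monitoring tool are my assumptions, and the script requires root privileges and a bcc installation.

```shell
#!/bin/sh
# Sketch of the test protocol. Assumes Spring PetClinic has been built
# in ~/spring-petclinic and that bcc's biotop is installed.

# Start every run from a cold page cache.
echo 3 | sudo tee /proc/sys/vm/drop_caches

# Monitor block I/O in the background while the application starts.
sudo biotop -C -r 10 > biotop.log &

# Launch the application under test.
java -jar ~/spring-petclinic/target/*.jar
```

The same script is run once without any limit, then once after the 1 MB/s limit has been applied to the VM.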
With the VBoxManage tool, it is possible to access many more VM configuration parameters than the ones exposed in the VirtualBox GUI.
In this article, we will specifically focus on two subcommands: VBoxManage bandwidthctl and VBoxManage storageattach. With bandwidthctl, we are going to create a disk bandwidth limit that can be used by a VM. With storageattach, we are going to associate a VM disk with the limit we just created.
Note: the two commands listed above must be run while the VM is stopped.
I have a VM called Archlinux ready to be used. It has a single disk on which all the files are stored (OS, /boot; let's keep things simple).
First, I need to create a limit.
That limit will be created with a very high value so that it does not slow down the startup of the VM.
Because, yes, I did try with a base value of 1 KB/s, and GRUB took minutes just to load its own files.
VBoxManage bandwidthctl Archlinux add "Disk bandwidth limit" --type disk --limit 1000M
Then, the VM disk has to be associated with the limit.
VBoxManage storageattach Archlinux --storagectl "SATA" --port 0 --bandwidthgroup "Disk bandwidth limit"
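To check that the group was taken into account, the bandwidth groups defined for the VM can be listed. This is a sketch; the exact output format varies across VirtualBox versions.

```shell
# List the bandwidth groups defined for the Archlinux VM,
# along with their type and current limit.
VBoxManage bandwidthctl Archlinux list
```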
To start the application, a simple
java -jar target/*.jar will work.
Between two tests, it will be necessary to clean up the Linux page cache with
echo 3 | sudo tee /proc/sys/vm/drop_caches.
Optionally, it will be possible to verify that the main jar is no longer cached with the vmtouch command.
Test without limits
Before starting the application, let’s verify that the petclinic jar is not in cache.
[alarm@archlinux ~]$ vmtouch -v ./spring-petclinic/target/*.jar
./spring-petclinic/target/spring-petclinic-2.2.0.BUILD-SNAPSHOT.jar
[                                                            ] 0/11643

           Files: 1
     Directories: 0
  Resident Pages: 0/11643  0/45M  0%
         Elapsed: 0.000353 seconds
When the application is launched, we can see that the java process loads dozens of MB from the disk. This corresponds not only to the uberjar itself, but also to all the files needed by the JVM, including its own libraries and class files. These I/Os are performed with an excellent latency (<1 ms on average).
[alarm@archlinux ~]$ sudo biotop -C -r 10
Tracing... Output every 1 secs. Hit Ctrl-C to end
[...]
22:44:29 loadavg: 0.22 0.35 0.35 4/159 4093

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
4076   java             R 8   0   sda      497  31376.0  0.49
4076   C2 CompilerThre  R 8   0   sda      22   1632.0   0.59
4076   bash             R 8   0   sda      14   696.0    0.39
4076   C1 CompilerThre  R 8   0   sda      6    348.0    0.44
509    bash             R 8   0   sda      3    140.0    2.28

22:44:30 loadavg: 0.37 0.38 0.36 4/163 4097

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
4076   java             R 8   0   sda      640  8792.0   0.48
4076   background-prei  R 8   0   sda      165  2044.0   0.51
4076   C2 CompilerThre  R 8   0   sda      1    124.0    0.57

22:44:31 loadavg: 0.37 0.38 0.36 4/162 4098

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
4076   java             R 8   0   sda      391  4100.0   0.52
4076   background-prei  R 8   0   sda      254  3908.0   0.49
4076   Thread-0         R 8   0   sda      37   316.0    0.45

22:44:32 loadavg: 0.37 0.38 0.36 4/162 4098

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
4076   java             R 8   0   sda      313  3480.0   0.59
Once the application is stopped, we can see that the petclinic jar is in the FS cache.
We can therefore either use
vmtouch -e or
echo 3 | sudo tee /proc/sys/vm/drop_caches to get it out.
Since I want as much I/O as possible, I chose to empty the cache entirely.
[alarm@archlinux ~]$ vmtouch -v ./spring-petclinic/target/*.jar
./spring-petclinic/target/spring-petclinic-2.2.0.BUILD-SNAPSHOT.jar
[ooooOooOooooOoOoOOOOOOooooOooOOooooooooOooooooooooOooooooOoO] 8777/11643

           Files: 1
     Directories: 0
  Resident Pages: 8777/11643  34M/45M  75.4%
         Elapsed: 0.000901 seconds
Test with limits
This time, I configure the VM so that it has a 1 MB/s disk bandwidth limit. I did try with smaller values, but they sometimes hung the VM entirely for a few minutes. It seems the background tasks are noisy enough to occasionally hit the limit on their own. Anyway, 1 MB/s is a good value, given that petclinic's uberjar weighs 45 MB.
So let’s run the command below.
VBoxManage bandwidthctl Archlinux set "Disk bandwidth limit" --limit 1M
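Since bandwidthctl set only updates an existing group, switching between the throttled and unthrottled configurations is a one-liner in each direction. A sketch, reusing the group created earlier:

```shell
# Throttle the disk to 1 MB/s for the test run...
VBoxManage bandwidthctl Archlinux set "Disk bandwidth limit" --limit 1M

# ...then restore the generous limit once the test is done.
VBoxManage bandwidthctl Archlinux set "Disk bandwidth limit" --limit 1000M
```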
This time, the result is very different. The loading time is noticeably longer than before. Even without formally measuring it, I can confirm that the bandwidth limit works as expected.
But what's even more telling is the output of biotop. We can clearly see that the java process cannot read files faster than 1 MB/s.
And not only that: to read this single MB, the average latency is between 60 and 110 ms. That is more than 100x longer than in the first run.
We can see how
biotop can detect a disk saturation issue.
[alarm@archlinux ~]$ sudo biotop -C -r 10
Tracing... Output every 1 secs. Hit Ctrl-C to end
[...]
22:32:14 loadavg: 0.49 0.43 0.27 2/138 3933

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
3932   java             R 8   0   sda      9    1060.0   110.47

22:32:15 loadavg: 0.49 0.43 0.27 2/138 3933

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
3932   java             R 8   0   sda      17   1020.0   59.48

22:32:16 loadavg: 0.49 0.43 0.27 7/138 3933

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
3932   java             R 8   0   sda      11   996.0    90.63

22:32:17 loadavg: 0.49 0.43 0.27 6/143 3938

PID    COMM             D MAJ MIN DISK     I/O  Kbytes   AVGms
3932   java             R 8   0   sda      16   980.0    62.48
Some surprises / Lessons learned
I initially planned to test three other BPF commands, but they turned out not to work as expected.
The biosnoop command prints a line for every block I/O. When a JVM starts, it performs so many I/Os that the output is too noisy to be usable.
The bitesize command shows the size of requested I/Os. It does not take the actual bandwidth of these I/Os into account, so it is designed to investigate a different problem from the one I was studying.
The biolatency command was interesting, but it emphasizes the disk latency, as its name suggests.
In my previous listing, we can see that disk latency was indeed higher.
But given it is the consequence of a throughput problem, I think
biotop is better suited for this job.
VBoxManage bandwidthctl can also be used to create limits on network cards. It could be used to test other new tools like soconnect and to detect network congestion.
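As a sketch of that network variant (the group name is mine, and the NIC index must match your VM's adapter):

```shell
# Create a network bandwidth group limited to 1 MB/s.
VBoxManage bandwidthctl Archlinux add "Net bandwidth limit" \
    --type network --limit 1M

# Attach the VM's first network adapter to that group.
VBoxManage modifyvm Archlinux --nicbandwidthgroup1 "Net bandwidth limit"
```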
The BPF Performance Tools book is full of new tools. I am curious to see how test cases can be designed to experiment on each one of them.
If you have any question/comment, feel free to send me a tweet at @pingtimeout. And if you enjoyed this article and want to support my work, you can always buy me a coffee ☕️.