Limit a VM's bandwidth to test some BPF tools

While reading Brendan Gregg's book "BPF Performance Tools", I wanted to try out the tools it introduces.

In particular, I wanted to see if disk saturation issues could be detected with biosnoop, biotop, biolatency and/or bitesize. In this article, we will see how a VirtualBox VM can be configured to test these tools.

The test protocol

The test protocol is quite simple. It consists of launching a very common and simple Java application: Spring PetClinic. Before the application is started, a bio* command will be executed to monitor it.

The test will be run twice: a first time on an unthrottled machine, and a second time on a machine whose I/O bandwidth is limited to 1 MB/s.

Setup

Using VirtualBox's VBoxManage tool, it is possible to access many more VM configuration parameters than the ones available in the VirtualBox GUI.

In this article, we will specifically focus on two commands: VBoxManage storageattach and VBoxManage bandwidthctl. With bandwidthctl, we are going to create a disk bandwidth limit that can be used by a VM. With storageattach, we are going to associate a VM disk with the limit we just created.

The two commands listed above must be run while the VM is stopped.
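If you are not sure whether the machine is currently running, VBoxManage can tell you. The following is standard VBoxManage usage, not specific to this setup:

# List the VMs that are currently running; the target VM must not appear here.
VBoxManage list runningvms
# If it does appear, power it off before changing its storage configuration.
VBoxManage controlvm Archlinux poweroff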

I have a VM called Archlinux ready to be used. It has a single disk on which all the files are stored (OS, /home and /boot, let's keep things simple). First, I need to create a limit. That limit will be created with a very high value so that it does not slow down the startup of the VM. Because, yes, I did try with an initial value of 1 KB/s, and GRUB took minutes just to load its own files.

VBoxManage bandwidthctl Archlinux add "Disk bandwidth limit" --type disk --limit 1000M

Then, the VM disk has to be associated with the limit.

VBoxManage storageattach Archlinux --storagectl "SATA" --port 0 --bandwidthgroup "Disk bandwidth limit"
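Before booting the VM, it can be reassuring to check that the group was created with the expected limit. The bandwidthctl list subcommand prints the existing groups (the exact output format depends on your VirtualBox version):

VBoxManage bandwidthctl Archlinux list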

To start the application, a simple java -jar target/*.jar will do. Between two tests, it will be necessary to clean up the Linux page cache with echo 3 | sudo tee /proc/sys/vm/drop_caches. Optionally, the vmtouch command can be used to verify that the main jar is no longer cached.
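Put together, the sequence executed between two tests looks like this (a minimal sketch; the jar path matches the one used in the listings below):

# Drop the page cache, dentries and inodes, so that every read hits the disk.
echo 3 | sudo tee /proc/sys/vm/drop_caches
# Optionally confirm that the jar is no longer resident in memory.
vmtouch -v ./spring-petclinic/target/*.jar
# Start the application.
java -jar ./spring-petclinic/target/*.jar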

Test without limits

Before starting the application, let’s verify that the petclinic jar is not in cache.

[alarm@archlinux ~]$ vmtouch -v ./spring-petclinic/target/*.jar
./spring-petclinic/target/spring-petclinic-2.2.0.BUILD-SNAPSHOT.jar
[                                                            ] 0/11643

           Files: 1
     Directories: 0
  Resident Pages: 0/11643  0/45M  0%
         Elapsed: 0.000353 seconds

When the application is launched, we can see that the java process loads dozens of MB from the disk. This corresponds not only to the uberjar itself, but also to all the files needed by the JVM, including rt.jar. These I/Os are performed with an excellent latency (less than 1 ms on average).

[alarm@archlinux ~]$ sudo biotop -C -r 10
Tracing... Output every 1 secs. Hit Ctrl-C to end

[...]
22:44:29 loadavg: 0.22 0.35 0.35 4/159 4093

PID    COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
4076   java             R 8   0   sda        497 31376.0   0.49
4076   C2 CompilerThre  R 8   0   sda         22  1632.0   0.59
4076   bash             R 8   0   sda         14   696.0   0.39
4076   C1 CompilerThre  R 8   0   sda          6   348.0   0.44
509    bash             R 8   0   sda          3   140.0   2.28

22:44:30 loadavg: 0.37 0.38 0.36 4/163 4097

PID    COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
4076   java             R 8   0   sda        640  8792.0   0.48
4076   background-prei  R 8   0   sda        165  2044.0   0.51
4076   C2 CompilerThre  R 8   0   sda          1   124.0   0.57

22:44:31 loadavg: 0.37 0.38 0.36 4/162 4098

PID    COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
4076   java             R 8   0   sda        391  4100.0   0.52
4076   background-prei  R 8   0   sda        254  3908.0   0.49
4076   Thread-0         R 8   0   sda         37   316.0   0.45

22:44:32 loadavg: 0.37 0.38 0.36 4/162 4098

PID    COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
4076   java             R 8   0   sda        313  3480.0   0.59

Once the application is stopped, we can see that the petclinic jar is in the filesystem cache. We can therefore use either vmtouch -e (shown after the listing below) or echo 3 | sudo tee /proc/sys/vm/drop_caches to get it out. Since I want as much I/O as possible, I choose to empty the cache entirely.

[alarm@archlinux ~]$ vmtouch -v ./spring-petclinic/target/*.jar
./spring-petclinic/target/spring-petclinic-2.2.0.BUILD-SNAPSHOT.jar
[ooooOooOooooOoOoOOOOOOooooOooOOooooooooOooooooooooOooooooOoO] 8777/11643

           Files: 1
     Directories: 0
  Resident Pages: 8777/11643  34M/45M  75.4%
         Elapsed: 0.000901 seconds
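As mentioned above, vmtouch can also evict individual files instead of flushing the whole cache; its -e flag does exactly that:

# Evict only the petclinic jar from the page cache.
vmtouch -e ./spring-petclinic/target/*.jar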

Test with limits

This time, I configure the VM so that it has a 1 MB/s disk bandwidth limit. I did try with smaller values, but they sometimes hung the VM entirely for a few minutes. It seems that the background tasks are noisy enough to occasionally reach such a limit. Anyway, 1 MB/s is a good value, given that petclinic's uberjar weighs 45 MB.

So let’s run the command below.

VBoxManage bandwidthctl Archlinux set "Disk bandwidth limit" --limit 1M

This time, the result is very different. I can already see that the loading time is much longer than before. Without formally measuring it, I can confirm that the bandwidth limit works as expected.

But what’s even more telling is the output of biotop. We can clearly see that the java process cannot read files faster than 1 MB/s. Not only that, we can also see that, to read this single MB per second, the average I/O latency sits between 50 and 100 ms. That is 50x to 100x longer than in the first run.

This shows how biotop can detect a disk saturation issue.

[alarm@archlinux ~]$ sudo biotop -C -r 10
Tracing... Output every 1 secs. Hit Ctrl-C to end

[...]
22:32:14 loadavg: 0.49 0.43 0.27 2/138 3933

PID    COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
3932   java             R 8   0   sda          9  1060.0 110.47

22:32:15 loadavg: 0.49 0.43 0.27 2/138 3933

PID    COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
3932   java             R 8   0   sda         17  1020.0  59.48

22:32:16 loadavg: 0.49 0.43 0.27 7/138 3933

PID    COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
3932   java             R 8   0   sda         11   996.0  90.63

22:32:17 loadavg: 0.49 0.43 0.27 6/143 3938

PID    COMM             D MAJ MIN DISK       I/O  Kbytes  AVGms
3932   java             R 8   0   sda         16   980.0  62.48
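Once the test is done, the limit can be raised back, or removed entirely. Note that, as far as I know, a bandwidth group can only be removed once no disk references it anymore, hence the intermediate storageattach call in the sketch below:

# Restore a limit high enough to be unnoticeable.
VBoxManage bandwidthctl Archlinux set "Disk bandwidth limit" --limit 1000M
# Or remove the group entirely: detach it from the disk first...
VBoxManage storageattach Archlinux --storagectl "SATA" --port 0 --bandwidthgroup none
# ...then delete it.
VBoxManage bandwidthctl Archlinux remove "Disk bandwidth limit"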

Some surprises / Lessons learned

I initially planned to test three other BPF commands, but things did not turn out as expected.

The biosnoop command prints a line for each block I/O request. When a JVM starts, it issues so many I/Os that the output is too noisy to be usable.
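If you still want to skim biosnoop's output, standard shell filtering can help; a possible approach (just a plain grep on the COMM column, nothing biosnoop-specific) is:

# Keep only the block I/Os issued by the java process.
sudo biosnoop | grep java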

The bitesize command shows the size of the requested I/Os. It does not take the actual throughput of these I/Os into account, so it is designed to investigate a different problem from the one I was studying.

Finally, the biolatency command was interesting, but it emphasizes disk latency, as its name suggests. In the previous listing, we can see that the disk latency was indeed higher. But since that latency is a consequence of a throughput problem, I think biotop is better suited for this job.
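For completeness, here is how biolatency could be invoked in this scenario, assuming the BCC version of the tool; it prints one block I/O latency histogram per interval:

# Print a block I/O latency histogram every second, five times.
sudo biolatency 1 5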

Digging deeper

The VBoxManage bandwidthctl command can also be used to create limits on network interfaces. This can be used to test other tools, like soconnect, and to detect network congestion.
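A sketch of what this could look like, based on the VBoxManage documentation (the group name and the NIC index are arbitrary choices of mine):

# Create a 1 MB/s network bandwidth group...
VBoxManage bandwidthctl Archlinux add "Net bandwidth limit" --type network --limit 1M
# ...and attach it to the first network adapter of the VM.
VBoxManage modifyvm Archlinux --nicbandwidthgroup1 "Net bandwidth limit"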

The BPF Performance Tools book is full of new tools. I am curious to see how test cases can be designed to experiment with each one of them.


If you have any questions or comments, feel free to send me a tweet at @pingtimeout. And if you enjoyed this article and want to support my work, you can always buy me a coffee ☕️.