Tuesday, August 27, 2013

How not to measure memory impact?

Today, I was asked on Twitter to have a look at this article by Brice LEPORINI and provide feedback. This could not fit in a "comments" section, so I wrote an article. Here it goes!

The experiment

Brice wrote an experiment to measure the cost of going immutable, compared to staying mutable.

Although the exercise is very risky, he emphasizes that his results cannot be used to draw bad/flame/godwin conclusions like "mutable is soooo better than immutable, you functional suckers!". There are a lot of flawed benchmarks on the Internet that claim to be the only truth, so let's also rejoice when experiment reports try to be objective and complete, and come with a disclaimer about their limits.

My specs

I ran the code on my laptop, without any application running in the background (not even an X server), to make sure the system was quiet. I used OpenJDK 1.7.0_40. My CPU is a quad-core Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz.

Also, my memory bandwidth is about 2.5 GB/s. This is an important piece of information.

The attempt

I ran the test with the following parameters:
  • Parallel GC, PrintGCDetails, PrintTenuringDistribution, Xms1g, Xmx1g
  • Parallel GC, PrintGCDetails, PrintTenuringDistribution, Xms6g, Xmx6g, Xmn5g
  • CMS, PrintGCDetails, PrintTenuringDistribution, Xms1g, Xmx1g
  • CMS, PrintGCDetails, PrintTenuringDistribution, Xms6g, Xmx6g, Xmn5g
  • G1, PrintGCDetails, PrintTenuringDistribution, Xms1g, Xmx1g
  • G1, PrintGCDetails, PrintTenuringDistribution, Xms6g, Xmx6g, Xmn5g

CMS and G1 runs were only for science ;-)

The results

Here is the table of results that I built from these runs, sorted by full average:

Test              Memory       GC        Iterations  Max    Min   Sampled Avg  Full Avg
ImmutabilityTest  6mx 6ms 5mn  G1        10000       38612  1164  1803         1361
ImmutabilityTest  6mx 6ms 5mn  Parallel  10000       11813  1093  1605         1217
MutabilityTest    6mx 6ms 5mn  G1        10000       13058  1027  1310         1180
ImmutabilityTest  6mx 6ms 5mn  CMS       10000       12182  1081  1603         1171
ImmutabilityTest  1mx 1ms      CMS       10000       13522  1031  1586         1101
ImmutabilityTest  1mx 1ms      Parallel  10000       12756  1023  1613         1084
MutabilityTest    6mx 6ms 5mn  Parallel  10000       31577  941   1336         997
MutabilityTest    1mx 1ms      Parallel  10000       6289   869   1072         895
MutabilityTest    1mx 1ms      CMS       10000       9604   860   1068         893
MutabilityTest    6mx 6ms 5mn  CMS       10000       6518   770   973          819

For each test, two files were generated:
  • One file that has the console output (run-xxxxx)
  • One file for the GC log

The file names follow this convention: <TYPE>-<GC>-<MX>-<MS>-<MN?>.log, where
  • TYPE is "run" or "gc", depending on the log
  • GC is "parallel", "cms" or "g1", depending on the GC that was used
  • MX is the max heap size in GB
  • MS is the min heap size in GB
  • MN is the young generation size in GB
They are available on this Github repository.

So, what can we conclude from these results? Well, not much, I'm afraid.

There are lots of problems with this experiment on my machine. One of them is that we are measuring operations that take about 1 millisecond. This resolution is very low, and we cannot see significant differences at this timescale.

Also, a look at the GC logs shows that the GC durations, for PSYoungGen and ParNew, are spread between 1 ms and 10 ms. To put it differently: a GC cycle takes between 1 and 10 times the duration of an iteration. Therefore, a single GC cycle has a huge impact on our microbenchmark.
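You can check this on your own runs with a quick one-liner. This is a sketch, assuming the classic HotSpot format where each pause duration appears as ", 0.0007280 secs]"; "gc-parallel-1-1.log" is only an example file name.

```shell
# Extract every GC pause duration from a log and convert it to milliseconds.
# The pattern anchors on the ", <seconds> secs]" suffix of each GC event,
# which avoids matching the user/sys/real times at the end of the line.
grep -o ', [0-9][0-9]*\.[0-9]* secs\]' gc-parallel-1-1.log \
  | awk '{ printf "%.1f ms\n", $2 * 1000 }'
```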

Now, the real stuff. If we look at the "Parallel GC, 1mx, 1ms" GC log, we can see two different phases in memory utilization: first the immutable test, then the mutable test.


Now, if we look at the allocation rate, things get really interesting.

See that almost flat cloud of dots at ~2.5 GB/s? It means that, at that time, the application was creating garbage at that rate. At the beginning of this article, I mentioned that my memory bandwidth is about 2.5 GB/s; now you can see it clearly.
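The allocation rate itself can be estimated straight from the GC log. This is a sketch under an extra assumption: the log was produced with -XX:+PrintGCTimeStamps (not part of the flag list above), so each line starts with "<seconds>: "; "gc-parallel-1-1.log" is only an example name.

```shell
# Estimate the allocation rate between consecutive young collections.
# Bytes allocated between GC n and GC n+1 are approximated by
# (young occupancy before GC n+1) - (young occupancy after GC n).
awk '/PSYoungGen:/ {
  t = $1 + 0                                  # leading timestamp, in seconds
  match($0, /PSYoungGen: [0-9]+K->[0-9]+K/)   # e.g. "629K->256K"
  s = substr($0, RSTART + 12, RLENGTH - 12)
  gsub(/K/, "", s); split(s, occ, "->")       # occ[1]=before, occ[2]=after
  if (prev_t != "")
    printf "%.0f KB/s\n", (occ[1] - prev_after) / (t - prev_t)
  prev_t = t; prev_after = occ[2]
}' gc-parallel-1-1.log
```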

The ImmutabilityTest is actually limited by the size of the wires between my CPU and my memory chips. To put it differently, the ImmutabilityTest does not measure how fast we can create immutable date holders, it measures how fast data can go from my CPU to my RAM.

Therefore, all my ImmutabilityTest results in the previous table are invalid.

Final Thoughts

It is quite common for a microbenchmark to measure something different from what we expect. This is another example of such a misconception: the results were quite stable, which made us confident in the experiment, and yet they were completely wrong.

I think Brice had noticed the limits of this benchmark, but did not push the analysis to its conclusion. I hope this article added some clarification and will provide material for future experiments.

Btw, I did not mention CMS or G1 because CMS never ran and G1 needed a bit of tuning. This could also be an interesting topic :-)

Tuesday, August 13, 2013

GC log format change

Recently, I noticed that System.gc() calls are reported differently in GC logs. This article describes the changes and the evolution of the GC log format for System.gc() calls.

Disclaimer

System.gc() calls are bad (like in "baaaaaaad"). However, they are fairly easy to detect, so fixing them does not take long.

This article is mostly about 1.7.0_40, even if other versions are referenced.

The Hotspot GC log format is always subject to change without notice. The information given here may not be correct for previous or future versions of the Hotspot JVM.

System.gc() detection

Finding out whether you have System.gc() calls is easy: just grep your GC logs for "Full GC (System)". If it matches, you have some of those nasty calls. If it does not, you are clean.
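In practice, that check is a one-liner ("gc.log" being a placeholder name for your log file):

```shell
# If this prints anything, something in the application (or a library it
# uses) calls System.gc() explicitly.
grep "Full GC (System)" gc.log
```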

But with a recent JVM, things get complicated. Consider the following program:

public class Test {
    public static void main(String... args) {
        System.gc();
    }
}

Run it with an old JVM and you will get a classic "Full GC (System)" line in the logs.

$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

$ java -verbose:gc -XX:+PrintGCDetails Test
[GC [PSYoungGen: 629K->256K(36736K)] 629K->256K(120768K), 0.0007280 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
[Full GC (System) [PSYoungGen: 256K->0K(36736K)] [PSOldGen: 0K->120K(84032K)] 256K->120K(120768K) [PSPermGen: 2566K->2566K(21248K)], 0.0028790 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 

But with the current latest stable version, System.gc() calls are reported as regular full GC.

$ java -version
java version "1.7.0_40"
OpenJDK Runtime Environment (IcedTea 2.4.1) (ArchLinux build 7.u40_2.4.1-2-x86_64)
OpenJDK 64-Bit Server VM (build 24.0-b50, mixed mode)

$ java -verbose:gc -XX:+PrintGCDetails Test 
[GC [PSYoungGen: 634K->352K(36864K)] 634K->352K(120832K), 0.0018710 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[Full GC [PSYoungGen: 352K->0K(36864K)] [ParOldGen: 0K->226K(83968K)] 352K->226K(120832K) [PSPermGen: 2726K->2725K(21504K)], 0.0076090 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 

Things get even trickier: in this particular case, you can still deduce that this was a System.gc() call:
  • It cannot have been triggered by heap over-utilization, since the old gen was empty (0 KB)
  • It cannot have been triggered by a PermGen overflow, since the perm gen was not full (2.7 MB)
  • It cannot have been triggered by heap fragmentation, since the old gen was empty, the parallel GC was used and the heap did not grow
  • It cannot have been triggered by an oversized object allocation, since the heap occupancy did not increase

However, in the general case, the old gen is rarely completely empty, so we generally cannot identify System.gc() calls that way.

What about CMS?

Using CMS, we have a bit more information.

$ java -verbose:gc -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC Test 
[Full GC[CMS: 0K->227K(84032K), 0.0111990 secs] 673K->227K(121856K), [CMS Perm : 2727K->2726K(21248K)], 0.0112790 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 

A "Full GC" line with CMS means a promotion failure, a concurrent mark failure or a System.gc() call. On recent JVMs, Full GC events caused by a promotion failure contain the string "promotion failed", and those caused by a concurrent mode failure contain the string "concurrent mode failure".

In this particular case, and on this particular JVM version, we can infer that this line is a call to System.gc().
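Putting that elimination into commands, a rough triage could look like this. It is a sketch, assuming "promotion failed" and "concurrent mode failure" are the marker strings, and "gc.log" is a placeholder name:

```shell
# Count the known failure causes first, then list the "Full GC" lines
# matching neither string: those are candidate System.gc() calls.
# "|| true" keeps the pipeline going when a count is zero.
grep -c 'promotion failed' gc.log || true
grep -c 'concurrent mode failure' gc.log || true
grep 'Full GC' gc.log | grep -Ev 'promotion failed|concurrent mode failure'
```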

What about G1?

Strangely enough, G1 is the easiest GC to deal with in this situation. The output is the following:

$ java -verbose:gc -XX:+PrintGCDetails -XX:+UseG1GC Test 
[Full GC 323K->226K(7168K), 0.0222030 secs]
   [Eden: 1024.0K(24.0M)->0.0B(3072.0K) Survivors: 0.0B->0.0B Heap: 323.0K(124.0M)->226.6K(7168.0K)]
 [Times: user=0.01 sys=0.01, real=0.02 secs] 

Usually, detailed G1 logs come with a LOT of information. Here, such a small amount of data must mean that this is not a regular young or mixed collection. This is definitely a System.gc() call.

What about Java 8?

The pattern is different for Java 8 VMs. Indeed, such calls are reported with the string "System.gc()".

$ java -version
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b99)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b41, mixed mode)

With the parallel GCs:

$ java -verbose:gc -XX:+PrintGCDetails Test
[GC (System.gc()) [PSYoungGen: 634K->352K(36864K)] 634K->352K(120832K), 0.0185000 secs] [Times: user=0.01 sys=0.10, real=0.02 secs] 
[Full GC (System.gc()) [PSYoungGen: 352K->0K(36864K)] [ParOldGen: 0K->252K(83968K)] 352K->252K(120832K), [Metaspace: 2585K->2585K(110592K)], 0.0037170 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 

With CMS:

$ java -verbose:gc -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC Test
[Full GC (System.gc()) [CMS: 0K->258K(84032K), 0.0108710 secs] 673K->258K(121856K), [Metaspace: 2585K->2585K(110592K)], 0.0114610 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] 

With G1:

$ java -verbose:gc -XX:+PrintGCDetails -XX:+UseG1GC Test
[Full GC (System.gc())  372K->252K(7168K), 0.0146070 secs]
   [Eden: 1024.0K(24.0M)->0.0B(3072.0K) Survivors: 0.0B->0.0B Heap: 372.4K(124.0M)->253.0K(7168.0K)], [Metaspace: 2585K->2585K(110592K)]
 [Times: user=0.01 sys=0.01, real=0.01 secs] 
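With this labelling, detection becomes a trivial fixed-string match again ("gc.log" being a placeholder name):

```shell
# Count explicit System.gc() collections in a Java 8 GC log.
# -F treats the pattern as a fixed string, so the parentheses and dots
# need no escaping.
grep -Fc "(System.gc())" gc.log
```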


Final Thoughts

Now you know that you have to look even more closely at GC logs to spot suspicious activity, like Full GC events that are actually System.gc() calls.

Btw, life must be really hard for the folks who write GC log parsers. It surely takes time to keep up with all these changes!

Saturday, August 10, 2013

Resampling the audio track of a video

MOOC platforms like Coursera and MIT OpenCourseWare are fantastic. But sometimes the videos provided are not really audible, especially when you are in noisy public transportation. This article shows how to solve this issue.

The issue

Take any course from MIT OCW. It likely comes from a classroom, with some echo and a little noise. The problem is that when you are in a bus and want to watch your video lessons, you are very easily disturbed by the environment (horns, brake noises, people speaking loudly).

How can we transform a video so that the volume is higher? ffmpeg + sox is a very efficient combination for that.

The process

First, we have to extract the audio track from the video, then amplify the volume, and finally rebuild the whole video using the amplified audio track.

Here, I assume that the file you want to modify is stored in ${video}.

To extract the audio track into a file named "${video}.wav", use this command:
ffmpeg -i $video ${video}.wav

Then, find the maximum amplification you can apply without clipping the signal:
sox ${video}.wav -n stat 2>&1 | grep "Volume adjustment"

This will return something like "Volume adjustment: 2.041".
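To avoid copying the number by hand, the value can be captured in a shell variable. This sketch assumes the value is the last whitespace-separated field of the "Volume adjustment" line, as in the output above:

```shell
# Grab the maximum safe amplification factor reported by sox.
vol=$(sox "${video}.wav" -n stat 2>&1 | awk '/Volume adjustment/ { print $NF }')
echo "Amplifying by ${vol}"
```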

Use this value to amplify the audio track (here, 2.041):
sox -v 2.041 ${video}.wav ${video}.louder.wav

Finally, rebuild the video using the amplified audio track:
ffmpeg -i $video -i ${video}.louder.wav \
  -map 0:0 -map 1:0 \
  -strict -2 \
  -acodec aac -ab 96k -vcodec copy resampled.${video}
Note that you may need to use the audio codec "libfaac", depending on your version of ffmpeg and how it was built. In that case, drop the parameter "-strict -2", which is only required to enable the experimental "aac" audio codec.

That's it. Now you can clean up the temporary files that were generated:
rm ${video}.wav ${video}.louder.wav
mv ${video} ${video}.old
mv resampled.${video} ${video}

Final thoughts

That was simple. Now you can enjoy and learn without being bothered by a too-low volume.

Any other issue with MOOCs or video encoding/decoding/resampling? Let's discuss it in the comments!