mardi 13 août 2013

GC log format change

Recently, I noticed that System.gc() calls are reported differently in GC logs. This articles reports the changes that happened and the evolution of GC logs format for System.gc() calls.

Disclaimer

System.gc() calls are bad (like in "baaaaaaad"). However, they are almost easy to detect so it did not took long to fix them.

This article is mostly about 1.7.0_40, even if other versions are referenced.

Hotspot GC logs format is always subject to change without notice. The information given here may not be correct for previous or future versions of Hotspot JVM.

System.gc() detection

To find if you have System.gc() calls, it is easy. You just have to grep your GC logs on "Full GC (System)". If it matches, then you have some of those nasty calls. If it does not, you are clean.

But using a recent JVM, things gets complicated. Consider the following program :

public class Test {
    public static void main(String... args) {
        System.gc();
    }
}

Run it with an old JVM and you will get a classic "Full GC (System)" line in the logs.

$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

$ java -verbose:gc -XX:+PrintGCDetails Test
[GC [PSYoungGen: 629K->256K(36736K)] 629K->256K(120768K), 0.0007280 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
[Full GC (System) [PSYoungGen: 256K->0K(36736K)] [PSOldGen: 0K->120K(84032K)] 256K->120K(120768K) [PSPermGen: 2566K->2566K(21248K)], 0.0028790 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 

But with the current latest stable version, System.gc() calls are reported as regular full GC.

$ java -version
java version "1.7.0_40"
OpenJDK Runtime Environment (IcedTea 2.4.1) (ArchLinux build 7.u40_2.4.1-2-x86_64)
OpenJDK 64-Bit Server VM (build 24.0-b50, mixed mode)

$ java -verbose:gc -XX:+PrintGCDetails Test 
[GC [PSYoungGen: 634K->352K(36864K)] 634K->352K(120832K), 0.0018710 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[Full GC [PSYoungGen: 352K->0K(36864K)] [ParOldGen: 0K->226K(83968K)] 352K->226K(120832K) [PSPermGen: 2726K->2725K(21504K)], 0.0076090 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 

Things get even more tricky : in that particular case, you can deduce that this was a System.gc() calls :
  • It cannot be triggered by heap over-utilization since the old gen occupancy was empty (0 KB)
  • It cannot be triggered by PermGen overflow since the perm gen was not full (2.7 MB)
  • It cannot be triggered by heap fragmentation since the old gen was empty, the parallel GC were used and the heap did not grow
  • It could be triggered by an oversized object allocation since the heap occupancy did not increased

However in the general case, the old gen is rarely completely empty so we should generally not be able to find System.gc() calls that way.

What about CMS ?

Using CMS, we have a bit more information.

$ java -verbose:gc -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC Test 
[Full GC[CMS: 0K->227K(84032K), 0.0111990 secs] 673K->227K(121856K), [CMS Perm : 2727K->2726K(21248K)], 0.0112790 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 

A "Full GC" line with CMS means promotion failure, concurrent mark failure or System.gc(). On recent JVM, Full GC logs based on promotion failure contain the string "promotion failure". As well, Full GC logs caused by CMF contains the string "concurrent mode failure".

In this particular case and particular JVM version, we can infer that this line is a call to System.gc().

What about G1 ?

Strangely enough, G1 is the most easy GC to deal with, in this situation. The output is the following :

$ java -verbose:gc -XX:+PrintGCDetails -XX:+UseG1GC Test 
[Full GC 323K->226K(7168K), 0.0222030 secs]
   [Eden: 1024.0K(24.0M)->0.0B(3072.0K) Survivors: 0.0B->0.0B Heap: 323.0K(124.0M)->226.6K(7168.0K)]
 [Times: user=0.01 sys=0.01, real=0.02 secs] 

Usually, G1 logs with details comes with a LOT of information. Here, so little data must mean that this is not a regular young or mixed collection. This is definitely a System.gc() call.

What about Java 8 ?

This pattern is not the same for Java 8 VM. Indeed, such calls will be reported with the string "System.gc()".

$ java -version
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b99)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b41, mixed mode)

With the parallel GCs :

$ java -verbose:gc -XX:+PrintGCDetails Test
[GC (System.gc()) [PSYoungGen: 634K->352K(36864K)] 634K->352K(120832K), 0.0185000 secs] [Times: user=0.01 sys=0.10, real=0.02 secs] 
[Full GC (System.gc()) [PSYoungGen: 352K->0K(36864K)] [ParOldGen: 0K->252K(83968K)] 352K->252K(120832K), [Metaspace: 2585K->2585K(110592K)], 0.0037170 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 

With CMS :

$ java -verbose:gc -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC Test
[Full GC (System.gc()) [CMS: 0K->258K(84032K), 0.0108710 secs] 673K->258K(121856K), [Metaspace: 2585K->2585K(110592K)], 0.0114610 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] 

With G1 :

$ java -verbose:gc -XX:+PrintGCDetails -XX:+UseG1GC Test
[Full GC (System.gc())  372K->252K(7168K), 0.0146070 secs]
   [Eden: 1024.0K(24.0M)->0.0B(3072.0K) Survivors: 0.0B->0.0B Heap: 372.4K(124.0M)->253.0K(7168.0K)], [Metaspace: 2585K->2585K(110592K)]
 [Times: user=0.01 sys=0.01, real=0.01 secs] 


Final Thoughts

Now, you know that you have to look even more closely at GC logs to find suspicious activity like Full GC calls that look like System.gc() calls.

Btw, life must be really hard for the folks that write GC logs parsers. It surely takes time to keep up with all that changes !

Aucun commentaire:

Enregistrer un commentaire