vendredi 26 avril 2013

WANTED : Stacktraces

Lately, I ran into a situation on my production environment : I was getting NullPointerExceptions that did not have any stacktrace. In this article, I will describe what happened and analyse the cause of such a strange behaviour. This article is inspired by Andrey Vityuk's blog, where I found the first pieces of information I needed to fully understand the problem, so, thanks Andrey !

The problem

Until then, you were having a day just like every other one, getting rid of bugs in a production system, one by one. Everything else being done, you open that old ticket that mentions NullPointerException. Fine, usual business, seems not critical, and you think "let's just grab the production logs of yesterday !".

But then, you realise that all you have are lines like these :
2012-12-12 15:42:01 [main] ERROR Exception during feature xyz
java.lang.NullPointerException
2012-12-12 15:47:22 [main] ERROR Exception during feature xyz
java.lang.NullPointerException
2012-12-12 15:48:00 [main] ERROR Exception during feature xyz
java.lang.NullPointerException


The first thing you think is "Did the developper really write log.error("Error during feature xyz\n" + e.getClass()) ?". And also "How am I supposed to solve this bug without knowing where the NPE is thrown ?".

So you open the java file that contains the feature xyz and there, you see something that you did not expect :
log.error("Exception during feature xyz", e);


Your logging framework is correctly used, nothing to report here, but then why do you have no stacktrace ?

Guess what ? It's Hotspot's fault !

Fortunately, you find Andrey Vityuk's blog and discover that Hotspot can simply throw Exceptions that do not have stacktraces for performance sake.

Indeed, if Hotspot detects that you have a hot method that throws exceptions, it can decide to remove the stacktrace so that the exception throwing mechanism is faster. Some people might call this "a bug", I won't.

We should always remember that Hotspot job is to make our code run faster. Period.

When Hotspot optimises code, it rewrites that code, it merges methods, swaps instructions and compiles it to native code. Once this is done, you don't have anything that look like your Java classes. Unfortunately, this comes at a very high price when Exceptions starts to mess around since Hotspot has to do the exact opposite work to rebuild the consistent stacktrace that you are expecting.

By definition, Exceptions are exceptional (or at least, they should be). In some code bases, we can find exceptions that are used to implement business cases. This can for sure be considered as a code smell.

The code

Consider the following code :

public class StackTraceGenerator {
    public static void main(String[] args) {
        for (int i = 0; i < 16_000; i++) {
            System.err.print("iteration " + i + ", ");
            throwAndCatchNullPointerException();
        }
    }

    private static void throwAndCatchNullPointerException() {
        try {
            throwNPE();
        } catch (NullPointerException e) {
            System.err.print(e.hashCode() + ", ");
            e.printStackTrace();
        }
    }

    private static int throwNPE() {
        return ((Object) null).hashCode();
    }
}


If you run this code with C2 (-server), you may eventually see that it generates exceptions without stacktraces. On my machine, it happens around iteration 10,050.

$ java -server fr.pingtimeout.StackTraceGenerator
iteration 0, 1442002549, java.lang.NullPointerException
        at fr.pingtimeout.StackTraceGenerator.throwNPE(StackTraceGenerator.java:27)
        at fr.pingtimeout.StackTraceGenerator.throwAndCatchNullPointerException(StackTraceGenerator.java:19)
        at fr.pingtimeout.StackTraceGenerator.main(StackTraceGenerator.java:13)
[...]
iteration 10015, 533361599, java.lang.NullPointerException
        at fr.pingtimeout.StackTraceGenerator.throwNPE(StackTraceGenerator.java:27)
        at fr.pingtimeout.StackTraceGenerator.throwAndCatchNullPointerException(StackTraceGenerator.java:19)
        at fr.pingtimeout.StackTraceGenerator.main(StackTraceGenerator.java:13)
iteration 10016, 27836516, java.lang.NullPointerException
iteration 10017, 27836516, java.lang.NullPointerException
[...]


Exceptions without Stacktraces, behind the scenes

In the print statement, I included the hashCode of the exception that was thrown. This is completely intended since it shows that after the optimisation is applied, the same exception is thrown for each call to throwNPE.

Indeed, the way Hotspot optimises this code is the following :
  • Create an instance of the exception that should be thrown
  • Clear its stacktrace
  • Cache this exception
  • Throw this exception instead of creating a new one for future calls


So hotspot does not only truncate stacktrace but also optimises the instance creation. At first, I thought that it was an optimisation in the way try/catch blocks work under heavy pressure, but it is not the case. Simply put, this optimisation consists in a singleton for a given exception, the rest of the code stays identical. Main gain : Hotspot does not have to rebuild a new stacktrace from optimised code each time the exception is thrown.

This behaviour is triggered by the JVM flag OmitStackTraceInFastThrow which is activated by default. We could get rid of it using -XX:-OmitStackTraceInFastThrow but I would not do that.

Final thoughts

This exception was the result of a coding mistake and happened enough time so that hotspot optimised it. It should be easily reproducible on any machine and be fixed right away.

We should not reduce the performance of a whole system in order for a bug to be easily fixed.

Ever faced this situation ? Another war story to share ? Let's discuss it in the comments !

1 commentaire:

  1. Good research Pierre!
    From my point of view this optimisation itself is a really useful. Thought, it is a bad smell to build you logic on exceptions , but so many frameworks/applications out there rely on exceptions for validation or some sort of it. It's unlikely we can completely get rid of that.
    In production system I would rather monitor logs found errors earlier before they even get optimized.

    RépondreSupprimer