It’s quite common practice to prepare immutable data during class initialization and save the results in static final fields. In fact, this is exactly what static initializers are designed for.

Here is a typical example that builds some static table at initialization time:

public class StaticExample {
    static final long[] TABLE = new long[100_000_000];

    static {
        TABLE[0] = 0;
        for (int i = 1; i < TABLE.length; i++) {
            TABLE[i] = nextValue(TABLE[i - 1]);
        }
    }

    private static long nextValue(long seed) {
        return seed * 0x123456789L + 11;
    }

    ...
}

On my laptop with JDK 11.0.1, the static initializer fills the array of 100M elements in about 540 ms.

Now let’s simply remove static and fill the array in an instance initializer, which runs as part of the constructor:

public class NonStaticExample {
    final long[] TABLE = new long[100_000_000];

    {
        TABLE[0] = 0;
        for (int i = 1; i < TABLE.length; i++) {
            TABLE[i] = nextValue(TABLE[i - 1]);
        }
    }

    private static long nextValue(long seed) {
        return seed * 0x123456789L + 11;
    }

    public static void main(String[] args) {
        new NonStaticExample();
    }
}

The constructor fills a similar array in 138 ms. Almost 4 times faster!

Why is the static initializer slow?

This must be related to JIT compilation, so let’s run the test with
-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining

   443   75     3   StaticExample::<clinit> (45 bytes)
                       @ 34   StaticExample::nextValue (10 bytes)   not inlineable
   444   76 %   4   StaticExample::<clinit> @ 15 (45 bytes)
   445   74 %   3   StaticExample::<clinit> @ 15 (45 bytes)   made not entrant
                       @ 34   StaticExample::nextValue (10 bytes)   failed initial checks

Oops… When compiling the static initializer (called <clinit> in a class file), both C1 and C2 failed to inline the nextValue method. Here we get to the first problem:

HotSpot does not inline methods of uninitialized classes.

The explicit check can be found in the source code. Since the invocation of <clinit> is part of the class initialization procedure, the class is not considered initialized while <clinit> is running.
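
The "initialization in progress" state is also visible at the language level. The following is a minimal sketch, not HotSpot's inlining check itself; the class InitInProgress and its members are hypothetical names made up for this illustration. A static method called from within <clinit> reads a static final field before the initializer has assigned it.

```java
public class InitInProgress {
    // Runs first, while <clinit> is still executing: ANSWER is not assigned yet.
    static final int SEEN_DURING_INIT = peek();

    // Not a compile-time constant, so the assignment happens inside <clinit>.
    static final int ANSWER = Integer.parseInt("42");

    // Reads the field through a method, which makes the forward reference legal.
    static int peek() {
        return ANSWER;
    }

    public static void main(String[] args) {
        System.out.println("during <clinit>: " + SEEN_DURING_INIT); // prints 0
        System.out.println("after <clinit>:  " + peek());           // prints 42
    }
}
```

The method peek() is the same code in both calls; the only difference is whether the class has finished initializing.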

Surprise in recent JDK updates

Would you expect JDK updates 11.0.2 and 8u202 to fix the problem? Just try to run the above example. What took 540 ms on JDK 11.0.1 now takes 60 seconds on JDK 11.0.2! :open_mouth:

However, the output of -XX:+PrintCompilation was the same as before: <clinit> was still compiled. What caused the dramatic slowdown then? Time to engage async-profiler.

Most of the CPU time is spent inside the JVM runtime, in SharedRuntime::resolve_static_call_C(). But why?

We’ve seen that class initialization is a complicated procedure which ensures that the static initializer executes in a thread-safe manner at most once. However, there was a zero-day bug, JDK-8215634, that allowed the HotSpot JVM to invoke a static method in violation of the JVMS. I explained this problem in detail on Stack Overflow.

The bug has been fixed in JDK 11.0.2 and 8u201, but at the cost of a terrible performance degradation :disappointed:
Now, if the class is uninitialized, the resolved invokestatic target is not saved in the constant pool cache, so each invocation of a static method has to go through the resolution procedure again and again.
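
For reference, the thread-safety guarantee of class initialization can be observed with a small sketch. It demonstrates only the specified behavior (a thread that calls a static method of a class being initialized by another thread waits until initialization completes), not the bug or its fix. The names InitBlocking, Slow and callWhileInitializing are hypothetical.

```java
import java.util.concurrent.CountDownLatch;

public class InitBlocking {
    // Signals that Slow's static initializer has started running.
    static final CountDownLatch STARTED = new CountDownLatch(1);

    static class Slow {
        static boolean ready;

        static {
            STARTED.countDown();
            try {
                Thread.sleep(200); // simulate a long-running initializer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ready = true;
        }

        static boolean isReady() {
            return ready;
        }
    }

    // Calls Slow.isReady() while another thread is still inside Slow.<clinit>
    // and returns the value the calling thread observes.
    static boolean callWhileInitializing() {
        try {
            // The lambda body runs in the new thread and triggers Slow.<clinit> there.
            Thread initializer = new Thread(() -> Slow.isReady());
            initializer.start();
            STARTED.await();               // <clinit> is now running in the other thread
            boolean seen = Slow.isReady(); // waits until <clinit> has completed
            initializer.join();
            return seen;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ready seen by the waiting thread: " + callWhileInitializing());
    }
}
```

The waiting thread never sees ready == false, because its call cannot proceed until the initializing thread has left the static initializer.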

Deoptimization knock-out

It sounds unbelievable, but the above slowdown is not even the worst one. Let’s slightly modify the example by moving the array update out of <clinit>:

public class StaticExample {
    static final long[] TABLE = new long[100_000_000];

    static {
        TABLE[0] = 0;
        for (int i = 1; i < TABLE.length; i++) {
            calcNextValue(i);
        }
    }

    private static void calcNextValue(int index) {
        TABLE[index] = TABLE[index - 1] * 0x123456789L + 11;
    }

    ...
}

The algorithm hasn’t changed, right? Except that now it takes forever.
Or, to be precise, more than 20 minutes :sleeping::sleeping::sleeping:

The compilation log shows desperate attempts to compile the method, but they all eventually result in deoptimization and a fallback to the interpreter.

   610  238   4   StaticExample::calcNextValue (21 bytes)
   610  238   4   StaticExample::calcNextValue (21 bytes)   made not entrant
   611  239   4   StaticExample::calcNextValue (21 bytes)
   611  239   4   StaticExample::calcNextValue (21 bytes)   made not entrant
   611  240   4   StaticExample::calcNextValue (21 bytes)
   612  240   4   StaticExample::calcNextValue (21 bytes)   made not entrant
   612  241   4   StaticExample::calcNextValue (21 bytes)
   612  241   4   StaticExample::calcNextValue (21 bytes)   made not entrant

It turns out that access to a static field from a static method of an uninitialized class may be an overwhelming obstacle for the HotSpot compiler.

Will it be fixed anytime soon?

Yes, to some extent. The bug is known as JDK-8188133 and is addressed in OpenJDK 13, with a possible later backport to OpenJDK 11.

Unfortunately, the fix covers only one particular case: when <clinit> is the root method of the compilation. It’s all too easy to break this precondition if the hot loop moves from <clinit> to some other method called by the static initializer.

    static {
        prepareTable();
    }
    
    private static void prepareTable() {
        TABLE[0] = 0;
        for (int i = 1; i < TABLE.length; i++) {
            calcNextValue(i);
        }
    }

Here prepareTable() becomes the compilation root, and all the problems of an uninitialized class come back.

How to live with this knowledge then?

The good news is that the workaround is pretty straightforward:

Just don’t do heavy computation in an uninitialized class directly.

If you put the computation logic in a helper class with no static initializer, it won’t suffer from the performance penalty: the helper class is already fully initialized by the time its methods run.

public class StaticExample {
    static final long[] TABLE = Helper.prepareTable();

    private static class Helper {

        static long[] prepareTable() {
            long[] table = new long[100_000_000];
            for (int i = 1; i < table.length; i++) {
                table[i] = nextValue(table[i - 1]);
            }
            return table;
        }

        static long nextValue(long seed) {
            return seed * 0x123456789L + 11;
        }
    }
}
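
To check the workaround on a particular JDK, a rough comparison can be sketched as follows. The classes InitComparison, InClass and ViaHelper are hypothetical names, the table is much smaller than 100M so the sketch finishes quickly, and the timings include class loading, so they are only indicative. Initialization is forced with Class.forName and timed with System.nanoTime().

```java
import java.util.Arrays;

public class InitComparison {
    static final int SIZE = 1_000_000; // far smaller than the 100M used in the post

    // Heavy computation directly in the static initializer.
    static class InClass {
        static final long[] TABLE = new long[SIZE];

        static {
            for (int i = 1; i < TABLE.length; i++) {
                TABLE[i] = nextValue(TABLE[i - 1]);
            }
        }

        static long nextValue(long seed) {
            return seed * 0x123456789L + 11;
        }
    }

    // Same computation delegated to a helper class with no static initializer.
    static class ViaHelper {
        static final long[] TABLE = Helper.prepareTable();

        static class Helper {
            static long[] prepareTable() {
                long[] table = new long[SIZE];
                for (int i = 1; i < table.length; i++) {
                    table[i] = nextValue(table[i - 1]);
                }
                return table;
            }

            static long nextValue(long seed) {
                return seed * 0x123456789L + 11;
            }
        }
    }

    // Forces initialization of the given class and returns the elapsed nanoseconds.
    // A class literal such as InClass.class does not trigger initialization by itself.
    static long timeInitNanos(Class<?> c) {
        long start = System.nanoTime();
        try {
            Class.forName(c.getName(), true, c.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long inClass = timeInitNanos(InClass.class);
        long viaHelper = timeInitNanos(ViaHelper.class);
        System.out.println("in <clinit>:  " + inClass / 1_000_000 + " ms");
        System.out.println("via helper:   " + viaHelper / 1_000_000 + " ms");
        System.out.println("same content: " + Arrays.equals(InClass.TABLE, ViaHelper.TABLE));
    }
}
```

Both variants produce the same table; how large the timing gap is depends on the JDK version, as described above.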

See also

UPDATE: For some reason I missed the recent post by Claes Redestad on the same topic. Sorry about that. I still think my article complements it with some interesting details.