Beware of computation in static initializer
It’s quite common practice to prepare immutable data during class initialization and save the results in static final fields.
In fact, this is exactly what static initializers are designed for.
Here is a typical example that builds some static table at initialization time:
public class StaticExample {
    static final long[] TABLE = new long[100_000_000];

    static {
        TABLE[0] = 0;
        for (int i = 1; i < TABLE.length; i++) {
            TABLE[i] = nextValue(TABLE[i - 1]);
        }
    }

    private static long nextValue(long seed) {
        return seed * 0x123456789L + 11;
    }
    ...
}
On my laptop with JDK 11.0.1, the static initializer fills the 100M-element array in about 540 ms.
Now let’s simply remove static and fill the array in an instance initializer, which the compiler folds into the constructor:
public class NonStaticExample {
    final long[] TABLE = new long[100_000_000];

    {
        TABLE[0] = 0;
        for (int i = 1; i < TABLE.length; i++) {
            TABLE[i] = nextValue(TABLE[i - 1]);
        }
    }

    private static long nextValue(long seed) {
        return seed * 0x123456789L + 11;
    }

    public static void main(String[] args) {
        new NonStaticExample();
    }
}
The constructor fills an identical array in 138 ms. Almost 4 times faster!
Why is the static initializer slow?
This must be related to JIT compilation, so let’s run the test with -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining:
443 75 3 StaticExample::<clinit> (45 bytes)
@ 34 StaticExample::nextValue (10 bytes) not inlineable
444 76 % 4 StaticExample::<clinit> @ 15 (45 bytes)
445 74 % 3 StaticExample::<clinit> @ 15 (45 bytes) made not entrant
@ 34 StaticExample::nextValue (10 bytes) failed initial checks
Oops… When compiling the static initializer (called <clinit> in a class file), both C1 and C2 failed to inline the nextValue method. Here we get to the first problem:
HotSpot does not inline methods of uninitialized classes.
The explicit check can be found in the source code.
Since the invocation of <clinit> is part of the class initialization procedure, the class is not considered initialized while <clinit> is running.
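The "initialization in progress" state can also be observed from plain Java, without looking at the JIT. The following is a minimal sketch, not taken from the HotSpot sources: the classes InitProbe and SlowInit and the 200 ms wait are assumptions made for illustration. A second thread that calls a static method of SlowInit has to wait until <clinit>, which is still running in the first thread, completes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo classes; none of these names come from the article.
class InitProbe {
    static final CountDownLatch CLINIT_STARTED = new CountDownLatch(1);
    static final CountDownLatch RELEASE_CLINIT = new CountDownLatch(1);

    /**
     * Returns true if a second thread could NOT complete SlowInit.touch()
     * while SlowInit.<clinit> was still running in the first thread.
     * Meaningful only on the first call, before SlowInit is initialized.
     */
    static boolean secondThreadWaitsForClinit() {
        CountDownLatch secondDone = new CountDownLatch(1);
        // Lambdas (not method references) so that SlowInit is first touched
        // inside the new threads, never on the calling thread.
        Thread first = new Thread(() -> SlowInit.touch());
        Thread second = new Thread(() -> {
            SlowInit.touch();
            secondDone.countDown();
        });
        try {
            first.start();
            CLINIT_STARTED.await();          // <clinit> is now running in 'first'
            second.start();
            // Assumed grace period: if 'second' were not blocked, it would finish well within it.
            boolean finishedEarly = secondDone.await(200, TimeUnit.MILLISECONDS);
            RELEASE_CLINIT.countDown();      // let <clinit> complete
            first.join();
            second.join();
            return !finishedEarly;
        } catch (InterruptedException e) {
            RELEASE_CLINIT.countDown();      // do not leave <clinit> parked forever
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second thread had to wait: " + secondThreadWaitsForClinit());
    }
}

class SlowInit {
    static {
        InitProbe.CLINIT_STARTED.countDown();
        try {
            InitProbe.RELEASE_CLINIT.await(); // keep the class "in initialization"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static void touch() {
        // Intentionally empty: calling it is enough to require initialization.
    }
}
```

The thread that runs <clinit> is itself allowed to call static methods of the class recursively, which is why the loop in the static initializer works at all; it is only that the JIT cannot treat the class as initialized yet.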
Surprise in recent JDK updates
Would you expect JDK updates 11.0.2 and 8u202 to fix the problem? Just try running the above example. What took 540 ms on JDK 11.0.1 now takes 60 seconds on JDK 11.0.2!
However, the output of -XX:+PrintCompilation was the same as before: <clinit> was still compiled. What caused the dramatic slowdown then?
Time to engage async-profiler.
Most of the CPU time is spent inside the JVM runtime, in SharedRuntime::resolve_static_call_C().
But why?
We’ve seen that class initialization is a complicated procedure which ensures that the static initializer executes in a thread-safe manner at most once. However, there was a zero-day bug, JDK-8215634, that allowed the HotSpot JVM to invoke a static method in violation of the JVMS. I explained this problem in detail on Stack Overflow.
The bug has been fixed in JDK 11.0.2 and 8u201, but at the cost of a terrible performance degradation. Now, if the class is uninitialized, the resolved invokestatic target is not saved in the constant pool cache, so each invocation of a static method has to go through the resolution procedure again and again.
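To check whether a particular JDK build shows this effect, a rough probe along the following lines may help. It is a sketch under assumptions: the class name ClinitCallCost and the call count are made up for illustration, and System.nanoTime() around a single loop is a crude measurement, not a proper benchmark. It times the same static call once from inside <clinit> and once after the class has been initialized.

```java
// Hypothetical probe; the name and the call count are illustrative only.
class ClinitCallCost {
    static final int CALLS = 1_000_000;   // far fewer than the article's 100M, to keep runs short
    static final long SUM_DURING_INIT;
    static final long NANOS_DURING_INIT;

    static {
        // The loop sits directly in <clinit>, so every nextValue() call is an
        // invokestatic on a class whose initialization is still in progress.
        long start = System.nanoTime();
        long v = 0;
        for (int i = 0; i < CALLS; i++) {
            v = nextValue(v);
        }
        SUM_DURING_INIT = v;
        NANOS_DURING_INIT = System.nanoTime() - start;
    }

    private static long nextValue(long seed) {
        return seed * 0x123456789L + 11;
    }

    /** Same loop, but it can only run once the class is fully initialized. */
    static long sumAfterInit() {
        long v = 0;
        for (int i = 0; i < CALLS; i++) {
            v = nextValue(v);
        }
        return v;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long sum = sumAfterInit();
        long nanosAfterInit = System.nanoTime() - start;
        System.out.printf("during <clinit>: %d ms, after init: %d ms (sums equal: %b)%n",
                NANOS_DURING_INIT / 1_000_000, nanosAfterInit / 1_000_000,
                sum == SUM_DURING_INIT);
    }
}
```

How large the gap is, or whether there is one at all, depends on the JDK version and on warm-up, so the numbers are only a hint about where to point a profiler.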
Deoptimization knock-out
It sounds unbelievable, but the above slowdown is not even the worst one.
Let’s slightly modify the example by moving the array update out of <clinit>:
public class StaticExample {
    static final long[] TABLE = new long[100_000_000];

    static {
        TABLE[0] = 0;
        for (int i = 1; i < TABLE.length; i++) {
            calcNextValue(i);
        }
    }

    private static void calcNextValue(int index) {
        TABLE[index] = TABLE[index - 1] * 0x123456789L + 11;
    }
    ...
}
The algorithm hasn’t changed, right? Except that now it takes forever.
Or, to be precise, more than 20 minutes.
The compilation log shows desperate attempts to compile the method, but they all eventually result in deoptimization and a fallback to the interpreter:
610 238 4 StaticExample::calcNextValue (21 bytes)
610 238 4 StaticExample::calcNextValue (21 bytes) made not entrant
611 239 4 StaticExample::calcNextValue (21 bytes)
611 239 4 StaticExample::calcNextValue (21 bytes) made not entrant
611 240 4 StaticExample::calcNextValue (21 bytes)
612 240 4 StaticExample::calcNextValue (21 bytes) made not entrant
612 241 4 StaticExample::calcNextValue (21 bytes)
612 241 4 StaticExample::calcNextValue (21 bytes) made not entrant
It turns out that accessing a static field from a static method of an uninitialized class can be an overwhelming obstacle for the HotSpot compiler.
Will it be fixed anytime soon?
Yes, to some extent. The bug is known as JDK-8188133 and is addressed in OpenJDK 13, with a possible backport to OpenJDK 11 later.
Unfortunately, the fix covers only one particular case: when <clinit> is the root method of the compilation.
It’s too easy to break this precondition if the hot loop moves from <clinit> to some other method called by the static initializer:
static {
    prepareTable();
}

private static void prepareTable() {
    TABLE[0] = 0;
    for (int i = 1; i < TABLE.length; i++) {
        calcNextValue(i);
    }
}
Here prepareTable() becomes the compilation root, and all the problems of the uninitialized class return.
How to live with this knowledge then?
The good news is that the workaround is pretty straightforward:
Just don’t do heavy computation in an uninitialized class directly.
If you put the computation logic in a helper class with no static initializer, it won’t suffer from the performance penalty.
public class StaticExample {
    static final long[] TABLE = Helper.prepareTable();

    private static class Helper {
        static long[] prepareTable() {
            long[] table = new long[100_000_000];
            for (int i = 1; i < table.length; i++) {
                table[i] = nextValue(table[i - 1]);
            }
            return table;
        }

        static long nextValue(long seed) {
            return seed * 0x123456789L + 11;
        }
    }
}
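To compare the two layouts on your own JDK, something like the following scaled-down harness could be used. It is only a sketch: the class names InClassTable, HelperTable and TableBench, the helper method millisToInit, and the table size are assumptions for illustration, and timing the first access to a class also includes class loading, so treat the output as a rough indication.

```java
import java.util.function.Supplier;

// Hypothetical comparison harness; all names and the table size are illustrative.
class InClassTable {
    static final long[] TABLE = new long[TableBench.SIZE];

    static {
        // Hot loop directly in <clinit>: the class is not initialized yet.
        for (int i = 1; i < TABLE.length; i++) {
            TABLE[i] = nextValue(TABLE[i - 1]);
        }
    }

    private static long nextValue(long seed) {
        return seed * 0x123456789L + 11;
    }
}

class HelperTable {
    // Hot loop runs in Helper, whose own (trivial) initialization has
    // already completed by the time prepareTable() executes.
    static final long[] TABLE = Helper.prepareTable();

    private static class Helper {
        static long[] prepareTable() {
            long[] table = new long[TableBench.SIZE];
            for (int i = 1; i < table.length; i++) {
                table[i] = nextValue(table[i - 1]);
            }
            return table;
        }

        static long nextValue(long seed) {
            return seed * 0x123456789L + 11;
        }
    }
}

class TableBench {
    // Compile-time constant, so reading it does not trigger initialization of TableBench.
    static final int SIZE = 1_000_000;

    /** Times the first access supplied by the caller, which triggers class initialization. */
    static long millisToInit(Supplier<long[]> firstAccess) {
        long start = System.nanoTime();
        long[] table = firstAccess.get();
        long elapsed = System.nanoTime() - start;
        if (table.length != SIZE) {
            throw new IllegalStateException("unexpected table size");
        }
        return elapsed / 1_000_000;
    }

    public static void main(String[] args) {
        long inClass = millisToInit(() -> InClassTable.TABLE);
        long viaHelper = millisToInit(() -> HelperTable.TABLE);
        System.out.printf("in-class <clinit>: %d ms, helper class: %d ms%n", inClass, viaHelper);
    }
}
```

Both variants produce the same table, so any difference in the printed times comes from where the loop runs, not from what it computes.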
See also
UPDATE: For some reason I missed the recent post by Claes Redestad on the same topic. Sorry about that. I still think my article complements it with some interesting details.