Goodbye Perm-Gen, hello Metaspace

What went wrong

After a TIM release, servers started running out of memory: the JVM's memory usage kept increasing, and GC ran continuously.

Investigation

The usual tedious investigation followed. It was easy to spin up a new production instance to test on, and create heap dumps.

From the other end, we knew which groups of changes were in the release. We just needed to work back from each end and meet in the middle.

Meeting in the middle

The metaspace usage of the cyclic servers showed a huge change in the time period under investigation:

The what usage?

The permanent generation was removed from HotSpot in Java 8 (JRockit never had it). Metaspace serves many of the same purposes, holding classloader and class metadata. It grows dynamically and is allocated from native memory, outside the Java heap.

The JVM flags for TIM still limited the PermGen size (a setting Java 8 ignores), but nothing limited the metaspace, so it was able to grow without bound and consumed native memory beyond the JVM's heap slab.
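For anyone wanting to watch this from inside the JVM rather than from a dashboard, here is a minimal sketch using the standard java.lang.management API. The class name MetaspaceReport is made up for illustration, and it assumes a HotSpot JVM that reports a memory pool named "Metaspace". A negative max is how the API signals that no limit is defined.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Illustrative helper, not part of TIM.
class MetaspaceReport {
    private static final long MIB = 1024L * 1024L;

    // Returns the pool named "Metaspace", or null if this JVM doesn't report one.
    static MemoryPoolMXBean metaspacePool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    // One-line summary of current metaspace usage.
    static String describe() {
        MemoryPoolMXBean pool = metaspacePool();
        if (pool == null) {
            return "no Metaspace pool reported";
        }
        MemoryUsage usage = pool.getUsage();
        // getMax() is -1 when the maximum is undefined, i.e. no limit is set.
        String max = usage.getMax() < 0 ? "unlimited" : (usage.getMax() / MIB) + " MiB";
        return "Metaspace used=" + (usage.getUsed() / MIB) + " MiB"
                + ", committed=" + (usage.getCommitted() / MIB) + " MiB"
                + ", max=" + max;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run without any metaspace flags, the max should come back as unlimited, which is the situation described above.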

Why?

Normally our PermGen/metaspace usage is pretty stable. Our apps tend not to use the things that typically cause leaks, such as complicated app servers with specialised class loaders. Note, though, that even processing XSLT could cause this issue.

One of the changes in this release was to implement all of TIM's system variable interfaces using lambdas, built dynamically with LambdaMetafactory.

(There's a whole other talk in here about how proxies and interface default methods don't get along.)

But.... why?

The current hypothesis is that lambda classes aren't cached, unlike proxy classes, so every time a variable implementation was built a new class was defined. That is consistent with the heap dumps we took, which showed far more lambda classes implementing the variables than there are variables.
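The effect is easy to reproduce in isolation. The sketch below is not TIM's code: LambdaClassDemo and currentValue are invented names, and Supplier stands in for a system variable interface. It calls LambdaMetafactory.metafactory directly, twice, for the same target method and compares the classes of the two results.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Supplier;

// Illustrative only; names are hypothetical.
class LambdaClassDemo {
    // Stand-in for whatever a system variable would return.
    static String currentValue() {
        return "value";
    }

    // Builds a Supplier backed by currentValue() by calling the metafactory directly.
    @SuppressWarnings("unchecked")
    static Supplier<String> build() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle impl = lookup.findStatic(
                    LambdaClassDemo.class, "currentValue", MethodType.methodType(String.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "get",                                  // name of Supplier's single method
                    MethodType.methodType(Supplier.class),  // factory type: no captured arguments
                    MethodType.methodType(Object.class),    // erased signature of get()
                    impl,
                    MethodType.methodType(String.class));   // signature enforced when get() is called
            return (Supplier<String>) site.getTarget().invoke();
        } catch (Throwable t) {
            throw new IllegalStateException("could not build lambda", t);
        }
    }

    public static void main(String[] args) {
        Supplier<String> a = build();
        Supplier<String> b = build();
        // Compares the generated classes, not the instances.
        System.out.println("same class? " + (a.getClass() == b.getClass()));
    }
}
```

On HotSpot we would expect the two classes to differ, since each metafactory call spins its own class. A lambda written in source code doesn't hit this, because its call site is linked only once. Do the build step per request and the class count climbs with traffic.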

Solutions and recommendations

Simply marking the system variable implementations as singletons fixed the issue. In general, this is something to be careful of if you've managed to find a reason for dealing with the pain of LambdaMetafactory.
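If you don't have a container to declare singletons for you, the same idea can be done by hand: build each implementation once and reuse it. This is a rough sketch under that assumption, not how TIM does it. VariableCache is an invented name, Supplier again stands in for a variable interface, and the builder function is whatever expensive, class-generating step you want to run only once per variable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Illustrative only; names are hypothetical.
class VariableCache {
    private final Map<String, Supplier<String>> byName = new ConcurrentHashMap<>();
    private final Function<String, Supplier<String>> builder;

    // builder is the step that would otherwise run (and define a class) on every lookup.
    VariableCache(Function<String, Supplier<String>> builder) {
        this.builder = builder;
    }

    // Returns the cached implementation, building it the first time a name is seen.
    Supplier<String> get(String name) {
        return byName.computeIfAbsent(name, builder);
    }
}
```

With this in place the number of generated classes is bounded by the number of variables, not by the number of lookups.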

Limit the metaspace size in the JVM:

-XX:MetaspaceSize=125m -XX:MaxMetaspaceSize=500m

Aftermath