Tuesday, July 21, 2020

Long Garbage Collection time alert during processing.

Recently our J2EE application was migrated to a Spring Boot architecture. Although I call it a Spring Boot architecture here, it is not a fully Spring Boot powered application: the front end is not Spring MVC or a REST service but JSF 2.x, and we run on a standalone Tomcat rather than the embedded one.

With all of that in place, the application ran for around two weeks with no issues, a good migration. Great work!

But one fine day, boom! We received a Long Garbage Collection alert. Tomcat was restarted, everything looked OK, and we were back to work. Nothing specific was found in the application logs, so we assumed it was just an isolated case.

Two weeks later, the alert fired again. There were no deadlocks in the thread dump. We were unable to take a heap dump because it takes at least 30 minutes to create one, and the system is unavailable to customers while the dump is being written. By now the issue was happening almost daily, during the day and at midnight when there was no significant load on the system. We knew it was not load related; something else was triggering it.
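If you are in a similar spot, a heap dump can also be triggered programmatically (for example from an admin-only job scheduled for a quiet hour) instead of waiting for someone to run jmap by hand. Below is a minimal sketch using the standard HotSpot diagnostic MXBean; the output path and the scheduling idea are assumptions, not what we actually had in place.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumpHelper {

    /**
     * Writes a heap dump to the given .hprof file.
     * live = true dumps only reachable objects, which keeps the file smaller.
     */
    public static void dumpHeap(String hprofFilePath, boolean live) throws Exception {
        HotSpotDiagnosticMXBean diagnostic = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        diagnostic.dumpHeap(hprofFilePath, live);
    }

    public static void main(String[] args) throws Exception {
        // File name is just an example; recent JDKs require the .hprof extension.
        dumpHeap("/tmp/app-heap.hprof", true);
    }
}
```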


Above is the garbage collection pattern for the Eden space (young generation). You can see that once the issue was triggered, collections kept running continuously for a long time.



Above is the old generation memory. You can see that it was growing slowly, and that garbage collection on it only started being invoked later. Out of respect for the data, I had to mask some of the details here; the important thing is the shape of the graph.
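For reference, the numbers behind graphs like these can also be read straight from the JVM. The sketch below polls the standard memory-pool and collector MXBeans; the pool and collector names shown in the comments are the usual ones for G1 and may differ on other collectors or JVM versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class GcSnapshot {

    public static void main(String[] args) {
        // Per-pool usage: with G1 the pools are typically named
        // "G1 Eden Space", "G1 Survivor Space" and "G1 Old Gen".
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-20s used=%d bytes%n",
                    pool.getName(), pool.getUsage().getUsed());
        }

        // Cumulative collection counts and times per collector,
        // e.g. "G1 Young Generation" and "G1 Old Generation".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-25s count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```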

Meanwhile, in the Tomcat console, we found long-lasting HTTP sessions on the server. Some were a few days old, yet still held in memory. This was a requirement of the application, but it has the negative side effect of long-lived sessions. We fixed that, and there were no more long-lasting sessions. We also made sure all streams and IO/HTTP connections were closed after use and optimised some of the memory usage in the application, but the issue kept occurring.
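For completeness, those code-level changes were roughly of the shape below. This is only an illustrative sketch, not our exact code; the 30-minute timeout and method names are assumptions. The idea is to cap the idle lifetime of a session instead of letting it live for days, and to close streams and connections with try-with-resources so they cannot leak on an exception path.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import javax.servlet.http.HttpSession;

public class CleanupSketch {

    // Cap how long an idle session may live; 30 minutes here is just an example value.
    static void limitSessionLifetime(HttpSession session) {
        session.setMaxInactiveInterval(30 * 60); // seconds
    }

    // try-with-resources closes the stream even if readLine() throws,
    // unlike a manual close() that can be skipped on the error path.
    static String readFirstLine(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        }
    }
}
```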

One midnight we were finally able to get a heap dump. The analysis showed a lot of String objects in memory (no choice there; strings are heavily used in the application for various logic, report and email generation, and so on). We used Eclipse MAT and VisualVM to analyse the dump. Eclipse MAT's leak detection analysis pointed to a memory leak in the class loader.

The screenshots above are from Eclipse MAT.
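As a side note on the String count mentioned above: heavy string concatenation in loops, which is typical for report and email text assembly, creates many short-lived intermediate String and char[] objects that end up dominating a heap dump. A small hedged sketch of the pattern, not taken from our code:

```java
import java.util.List;

public class EmailBodyBuilder {

    // Concatenating with += inside the loop allocates a new String each iteration;
    // a single StringBuilder keeps the intermediate garbage down.
    static String buildBody(List<String> lines) {
        StringBuilder body = new StringBuilder(lines.size() * 64); // rough capacity hint
        for (String line : lines) {
            body.append(line).append('\n');
        }
        return body.toString();
    }
}
```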

Interestingly, VisualVM gave us a hint that the large objects were being held by Jasper Reports classes. However, we had no idea how this happened or how to reproduce it in our DEV or UAT environments, since the issue did not occur consistently. We analysed the user behaviour around the times the issue occurred and eventually found a way to reproduce it.

Marvellous! The culprit was Jasper Reports. When the PDF does not have enough space for a string generated at runtime, the Jasper Reports library gets stuck and keeps accumulating memory internally. We allocated more space in the PDF to fit the string, and the issue was fixed. 😊😊😊
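To make the Jasper part concrete, the report was filled and exported roughly along the lines of the sketch below; the parameter name and paths are assumptions. The actual fix was in the report design itself (giving the text field enough room for the runtime-generated string), not in this Java code.

```java
import java.util.HashMap;
import java.util.Map;

import net.sf.jasperreports.engine.JREmptyDataSource;
import net.sf.jasperreports.engine.JasperExportManager;
import net.sf.jasperreports.engine.JasperFillManager;
import net.sf.jasperreports.engine.JasperPrint;

public class ReportSketch {

    // Fills a compiled (.jasper) report and writes it out as a PDF.
    static void generatePdf(String compiledReportPath, String runtimeText, String pdfPath) throws Exception {
        Map<String, Object> parameters = new HashMap<>();
        // The runtime-generated string enters the report here; in our case the
        // text field receiving it in the report design was too small for it.
        parameters.put("RUNTIME_TEXT", runtimeText); // parameter name is hypothetical

        JasperPrint filled = JasperFillManager.fillReport(
                compiledReportPath, parameters, new JREmptyDataSource());
        JasperExportManager.exportReportToPdfFile(filled, pdfPath);
    }
}
```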

(The GC algorithm in use is G1.)

A few hints I can give:

  • Check the sessions in the Tomcat console; this will show you long-lasting sessions along with their timing.
  • Check the threads in the Tomcat console; this will show you whether any thread is stuck in the Serving state, which could indicate a problematic thread (see the sketch after this list).
  • Check the large objects in VisualVM.
  • If yours is a customer-facing application, use Google Analytics to see the user flow.
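For the thread check, much of the same information the Tomcat console shows can also be pulled from the JVM itself. The sketch below dumps all threads and checks for deadlocks via ThreadMXBean; it is a generic illustration, not tied to Tomcat's own thread naming.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadCheck {

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // Full thread dump, including held monitors and ownable synchronizers.
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.out.printf("%-40s %s%n", info.getThreadName(), info.getThreadState());
        }

        // Deadlock detection; returns null when no threads are deadlocked.
        long[] deadlocked = threads.findDeadlockedThreads();
        System.out.println(deadlocked == null
                ? "No deadlocks"
                : deadlocked.length + " deadlocked threads");
    }
}
```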