Tracking down a NullPointerException in uPortal


While debugging a problem on a project a few months ago, we made an interesting discovery.

A portlet was throwing a NullPointerException when it tried to access the PortletSession, as seen in the following code.

request.getPortletSession().getAttribute("key");

Further investigation showed that it was the return value of getPortletSession() that was null, not the request. Hunting through the Pluto code, I discovered that there was no way for it to return null unless the underlying HttpServletRequest.getSession() call returned null. The problem occurred only in the performance test environment and was not reproducible in individual testing.

The Servlet specification says that getSession() is the same as getSession(true) and will always return a Session object. It does say that certain error conditions may generate a stack trace. Nowhere does it allow the call to return null when create is true.

So we opened up the Tomcat code. This was an interesting experience. In the first 3-5 lines of code there was an immediate return of null if a particular local variable was not set. Going further, there were at least 4 ways null could be returned, despite the 'create' value being set to true. Clearly, upholding the specification was not the priority in this code. This information did not fix any problem for us, though. We still needed to figure out the root cause and make it go away.
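Given that getPortletSession() can come back null in practice, a portlet can guard the access rather than chain the calls. The following is only a minimal sketch of that idea. PortletSessionLike and PortletRequestLike are hypothetical stand-ins for the real Portlet API types, used so that the example compiles without javax.portlet on the classpath, and SafeSessionAccess is an illustrative name, not something from uPortal or Pluto.

```java
import java.util.Optional;

// Hypothetical stand-in for the session type; only the one method used here.
interface PortletSessionLike {
    Object getAttribute(String name);
}

// Hypothetical stand-in for the request type.
interface PortletRequestLike {
    // May return null in practice, as described in the text above.
    PortletSessionLike getPortletSession();
}

final class SafeSessionAccess {
    private SafeSessionAccess() {}

    // Reads an attribute without assuming the session exists.
    // Returns Optional.empty() when the session or the attribute is missing.
    static Optional<Object> readAttribute(PortletRequestLike request, String key) {
        PortletSessionLike session = request.getPortletSession();
        if (session == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(session.getAttribute(key));
    }
}
```

A guard like this does not address the root cause. It only turns the NullPointerException into a condition the portlet can handle deliberately.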

We didn't have a lot to go on. Of the 4 cases that could return null, 3 were easy to rule out as 'not likely' (like the webapp context local variable not being set). That left only one likely culprit: if a Session object existed, the code checked whether it was still valid. That check simply looked at a variable on the session and also at timeout conditions. So our best guess was that something timeout related was going on.

At the same time, we discovered that the condition was happening in exactly the same place every time, and that this place was taking a long time to render because a web service call was slow to resolve. Lennard then looked at some timeout values. The channel render timeout in uPortal was set to a minute, but the timeout of the Tomcat connection was set to 20 seconds. So Tomcat was timing out before uPortal stopped waiting for the portlet to render. Lennard set out to prove this theory by inserting a sleep() into a portlet to force the problem to reproduce. He succeeded.

Lennard then changed the timeouts and ran the test case again, and no longer saw the problem. So, although there is no precise line of code we can point to, we're sure Tomcat is somehow causing the session to be considered 'invalid' if code tries to retrieve it after the connection timeout has passed.
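The relationship between the two timeouts can be written down as a small sanity check. This is a sketch only: TimeoutCheck and containerOutlastsRender are illustrative names, and how the two values are actually read from the uPortal and Tomcat configuration is left out.

```java
import java.time.Duration;

final class TimeoutCheck {
    private TimeoutCheck() {}

    // True when the container's connection timeout is strictly longer than
    // the portal's render timeout, which is the arrangement that made the
    // problem described above go away.
    static boolean containerOutlastsRender(Duration containerTimeout, Duration renderTimeout) {
        return containerTimeout.compareTo(renderTimeout) > 0;
    }
}
```

With the values from the failing environment, containerOutlastsRender(Duration.ofSeconds(20), Duration.ofMinutes(1)) is false, which flags the mismatch.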

There are a few lessons to be learned here.

  • Timeouts need to be set up correctly. Specifically, the Tomcat timeout should always be greater than the channel render timeout.
  • Access the Session as early in your render as possible, and before any calls to outside systems whose execution time you do not control.
  • You can't always believe the specification at face value. Open source is great because you can go investigate what the code is really doing.
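The second lesson, touching the session before any slow external call, can be sketched as follows. The types SessionStore and SessionSource are hypothetical stand-ins for the Portlet API session and request, and the Supplier stands in for the web service call. None of these names come from the original code.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the session type.
interface SessionStore {
    Object getAttribute(String name);
}

// Hypothetical stand-in for the request type.
interface SessionSource {
    SessionStore getPortletSession();
}

final class RenderOrdering {
    private RenderOrdering() {}

    static String render(SessionSource request, Supplier<String> slowRemoteCall) {
        // 1. Read everything needed from the session first, while the
        //    request is still fresh.
        SessionStore session = request.getPortletSession();
        Object value = (session == null) ? null : session.getAttribute("key");

        // 2. Only then make the call whose duration we do not control.
        String payload = slowRemoteCall.get();

        // 3. Combine the already-captured session value with the result.
        return value + ":" + payload;
    }
}
```

Because the session value is captured in a local variable before the remote call starts, a timeout during that call can no longer surface as a null session later in the render.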

In reality, this only changes the way an error is displayed on screen to the end user. The underlying web service is still taking longer than a minute to run, and no user is going to wait around that long. We had to put some caching in place and try to make sure users never trigger that web service call directly.

---- Cris J H