
Moodle's race with CAS Server


A client was testing our integration between CAS Server 3.5.2.1 and Moodle 2.6. The lead tester reported that it often took two authentication attempts to log in to CAS Server after being redirected there from Moodle. Interestingly, I was rarely able to reproduce the symptom myself. The exact symptom is that a user sees the login page, enters their credentials, and after hitting the Login button, is shown an empty login form. There is no error message and no pre-populated username field; just the empty login form. Notably, it only seemed to be a problem when the initial authentication was coming from Moodle. If the user was already authenticated, there was no problem.

After turning on Spring Webflow's logging, I was able to trace through to the login action, but there Spring Webflow complained about a missing session. This is odd because CAS Server's default Java web session length is 5 minutes. (That's fairly short for most web applications, but CAS only needs a Java web session during the initial user authentication.) So I opened the Network tab in Chrome's Developer Tools. What I found was that Moodle was performing a Gateway request (/cas/login?service=...&gateway=true) immediately before requiring user authentication (/cas/login?service=...).
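To make that pattern easier to spot in a captured list of requests, here is a minimal Python sketch that flags a Gateway request followed directly (ignoring non-CAS requests in between) by a plain login request for the same service. It assumes the CAS login path ends in /cas/login and treats gateway=true as the Gateway flag, as in the URLs above; the helper names and the example.edu hosts are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def cas_login_kind(url):
    """Classify a URL as ("gateway", service), ("login", service), or
    (None, None) if it is not a /cas/login request with a service parameter."""
    parts = urlsplit(url)
    if not parts.path.endswith("/cas/login"):
        return None, None
    params = parse_qs(parts.query)  # also percent-decodes the service URL
    if "service" not in params:
        return None, None
    service = params["service"][0]
    if params.get("gateway", [""])[0] == "true":
        return "gateway", service
    return "login", service


def gateway_then_login(urls):
    """Return the services for which a Gateway request was immediately
    followed (ignoring non-CAS requests) by a plain login request."""
    cas_requests = [cas_login_kind(u) for u in urls]
    cas_requests = [r for r in cas_requests if r[0] is not None]
    hits = []
    for (kind_a, svc_a), (kind_b, svc_b) in zip(cas_requests, cas_requests[1:]):
        if kind_a == "gateway" and kind_b == "login" and svc_a == svc_b:
            hits.append(svc_a)
    return hits


if __name__ == "__main__":
    svc = "https%3A%2F%2Fmoodle.example.edu%2Flogin%2Findex.php"  # hypothetical
    network_log = [
        "https://moodle.example.edu/login/index.php",
        f"https://cas.example.edu/cas/login?service={svc}&gateway=true",
        "https://moodle.example.edu/login/index.php",
        f"https://cas.example.edu/cas/login?service={svc}",
    ]
    print(gateway_then_login(network_log))
    # ['https://moodle.example.edu/login/index.php']
```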

It took a little more digging to understand 1) why Moodle was doing a Gateway authentication request immediately before requiring user authentication, 2) why CAS Server was (sometimes) having amnesia when the user logged in, and 3) most importantly, how to fix it. Here's a breakdown of each of those questions:

1) Why was Moodle doing a Gateway authentication request immediately before requiring user authentication?

The answer to this question is fairly straightforward. In Moodle's /auth/cas/auth.php, loginpage_hook() calls phpCAS::checkAuthentication(). This benignly named* method actually invokes CAS's Gateway feature: if the user already has an SSO session, CAS gives the calling service a service ticket; otherwise, it returns immediately without one. I would argue that phpCAS::forceAuthentication() should be called here (and not two if statements later), which would have the effect the developers really wanted: not making two round trips to CAS Server in less than 2 seconds.
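The round-trip argument can be sketched as a tiny model. This is a hypothetical simplification, not Moodle's or phpCAS's actual code: it ignores whatever the intervening if statements check and simply counts the redirects to CAS Server with and without the Gateway call placed in front of forced authentication.

```python
def cas_round_trips(has_sso_session, use_gateway_first):
    """Hypothetical, simplified login hook: return the list of redirects to
    CAS Server made before the user is logged in (or, with no SSO session,
    is looking at the CAS login form)."""
    trips = []
    if use_gateway_first:
        # checkAuthentication(): /cas/login?service=...&gateway=true
        trips.append("gateway")
        if has_sso_session:
            return trips  # CAS issued a service ticket on the Gateway trip
        # No SSO session: CAS redirected straight back without a ticket.
    # forceAuthentication(): /cas/login?service=...
    # (issues a ticket if an SSO session exists, otherwise shows the form)
    trips.append("login")
    return trips


if __name__ == "__main__":
    for sso in (False, True):
        for gateway_first in (True, False):
            print(sso, gateway_first, cas_round_trips(sso, gateway_first))
```

In this model, a user without an SSO session pays one extra round trip on the Gateway-first path, while a user with an SSO session takes one trip either way.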

I didn't do the research, but my guess is that this less-than-performant code has been in Moodle since CAS authentication was introduced. I thought it was curious that no one had noticed this issue for so long. Still, it explains why the Gateway call is followed by a standard authentication request. But besides adding a little extra load to CAS Server, why would that cause CAS Server to reject the first login form submission?

* Note: I personally think the method names of the PHP CAS Client are misleading, so I completely understand why this mistake was made.

2) Why was CAS Server (sometimes) having amnesia when the user logged in?

This answer came fairly quickly as well. CAS Server version 3.5.1 introduced the TerminateWebSessionListener, which ends the user's Java web session after the login webflow ends. This saves server memory, since the session is freed almost immediately instead of waiting 5 minutes for it to time out. By default, the session expires 2 seconds after the webflow ends.

This is where the light bulb went off. Moodle's gateway call was starting a Java web session in CAS Server, and after the immediate redirect back to Moodle, that session was set to expire in 2 seconds. If the round trips are fast enough (network latency, etc.), the user's second visit to CAS Server arrives before the session expires, and the existing session is used to show the login page. But unless the user can type in their credentials and submit them within this narrow window, the session no longer exists when the user hits the Login button. Spring Webflow then starts a new session and restarts the login webflow, and the user is left staring at a shiny new login page, scratching their head and wondering why.
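Here is a toy Python model of the race. It assumes servlet-style inactivity expiry (a session dies once max_inactive seconds pass without a request) and that a reused session keeps the shortened lifetime the listener gave it. SessionStore, login_attempt, and the timings are hypothetical; time_to_die stands in for timeToDieInSeconds, and times are passed in explicitly rather than slept through.

```python
import itertools
from dataclasses import dataclass

DEFAULT_TIMEOUT = 300  # CAS Server's default Java web session length: 5 minutes


@dataclass
class Session:
    last_access: float
    max_inactive: float


class SessionStore:
    """Toy servlet-style session store: a session expires once max_inactive
    seconds pass without a request. Times are passed in explicitly."""

    def __init__(self):
        self._sessions = {}
        self._ids = itertools.count(1)

    def get(self, sid, now, create=True):
        """Return (sid, session), creating a new session if the old one is
        missing or expired and create is True; otherwise (None, None)."""
        session = self._sessions.get(sid)
        if session is not None and now - session.last_access >= session.max_inactive:
            del self._sessions[sid]  # inactive too long: expired
            session = None
        if session is None:
            if not create:
                return None, None
            sid = next(self._ids)
            session = Session(last_access=now, max_inactive=DEFAULT_TIMEOUT)
            self._sessions[sid] = session
        session.last_access = now
        return sid, session


def login_attempt(time_to_die, gateway_at, login_page_at, submit_at):
    """Return True if the user's first login submission finds its session."""
    store = SessionStore()
    # 1) Moodle's Gateway request starts a session; the login webflow ends
    #    at once, and the listener shortens the session's lifetime.
    sid, session = store.get(None, gateway_at)
    session.max_inactive = time_to_die
    # 2) The browser returns for the login page with the same session cookie.
    #    A still-live session is reused, shortened lifetime and all.
    sid, _ = store.get(sid, login_page_at)
    # 3) The user submits credentials. A missing session means the webflow
    #    restarts and the user sees an empty login form.
    _, session = store.get(sid, submit_at, create=False)
    return session is not None


if __name__ == "__main__":
    print(login_attempt(2, gateway_at=0, login_page_at=0.5, submit_at=8))  # False
    print(login_attempt(2, gateway_at=0, login_page_at=3, submit_at=8))    # True
    print(login_attempt(0, gateway_at=0, login_page_at=0.5, submit_at=8))  # True
```

With the 2-second default, a fast redirect followed by an 8-second pause to type credentials loses the race, while a slow redirect (3 seconds) happens to win because a fresh 5-minute session is created. Setting time_to_die to 0 models the fix in section 3.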

So the cause was a race condition between Moodle and CAS Server that occurs when CAS Server is version 3.5.1 or later and the client/server responses are quick, and admittedly, responses are only getting faster these days.

3) How do we fix it?

The fix from the CAS Server side is actually pretty simple. In the cas-servlet.xml file, the terminateWebSessionListener bean's timeToDieInSeconds property needs to be set to zero (0), like this: 

<bean id="terminateWebSessionListener" class="org.jasig.cas.web.flow.TerminateWebSessionListener"
      p:serviceManagerUrl="${cas.securityContext.serviceProperties.service}"
      p:timeToDieInSeconds="0" />

This ends the Java web session immediately after the login webflow completes. From there, running mvn clean package and deploying the new cas.war fixed things right up.
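Since the whole fix hinges on one attribute, a quick pre-build check can help. This is a hypothetical helper (not part of CAS) that uses Python's standard ElementTree to read timeToDieInSeconds from cas-servlet.xml, accepting either the p: namespace attribute shown above or a nested property element. None means the property isn't set, so the listener's default (2 seconds) applies.

```python
import xml.etree.ElementTree as ET

BEANS_NS = "http://www.springframework.org/schema/beans"
P_NS = "http://www.springframework.org/schema/p"


def time_to_die_seconds(xml_text, bean_id="terminateWebSessionListener"):
    """Return timeToDieInSeconds for the given bean as an int, or None if the
    bean or the property is not found."""
    root = ET.fromstring(xml_text)
    for bean in root.iter(f"{{{BEANS_NS}}}bean"):
        if bean.get("id") != bean_id:
            continue
        # p-namespace shorthand: p:timeToDieInSeconds="0"
        value = bean.get(f"{{{P_NS}}}timeToDieInSeconds")
        if value is not None:
            return int(value)
        # Long form: <property name="timeToDieInSeconds" value="0"/>
        for prop in bean.findall(f"{{{BEANS_NS}}}property"):
            if prop.get("name") == "timeToDieInSeconds":
                return int(prop.get("value"))
        return None
    return None


SAMPLE = """<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:p="http://www.springframework.org/schema/p">
  <bean id="terminateWebSessionListener"
        class="org.jasig.cas.web.flow.TerminateWebSessionListener"
        p:serviceManagerUrl="${cas.securityContext.serviceProperties.service}"
        p:timeToDieInSeconds="0" />
</beans>"""

if __name__ == "__main__":
    print(time_to_die_seconds(SAMPLE))  # 0
```

In a build script, you could refuse to run mvn clean package unless this returns 0 for your actual cas-servlet.xml.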

Epilogue

Recently, another client was experiencing this issue with a non-Moodle application. It was PHP-based, and although they said they were not calling phpCAS::checkAuthentication(), I recommended changing timeToDieInSeconds anyway. This seems to have fixed it. I wasn't able to analyze this one as closely as the first issue, but I'm guessing there are other scenarios where this fix might help.
