Challenges of Custom Protocol Handlers in Java

Over the past few days I have learned a lot about (and struggled mightily with) custom URL protocol handlers in Java. I'm grateful that the authors of Java included a mechanism for supporting custom URL protocols -- e.g. classpath:// to refer to a resource in the Java classpath -- but now that I have worked with that mechanism myself in some detail, I believe it needs to be revisited and (ideally) enhanced.

For some time now I've been using a custom protocol handler to support the classpath protocol in my pet project: Cernunnos. Task implementations in Cernunnos are normally written to read resources like XML files, .properties files, and XSL Transformations from URLs. This approach means that these resources may reside in the file system, at an HTTP or FTP address, or any location that can be referenced and reached by a URL in Java. Adding a protocol handler for classpath:// makes these Task implementations more flexible.

There are potentially three ways to support custom protocol handlers in Java:

  1. Implement URLStreamHandlerFactory, instantiate your class, and pass the instance to the URL.setURLStreamHandlerFactory method.
  2. Create a class named Handler that extends URLStreamHandler and place it in a package hierarchy that ends with the name of the protocol you want to support. For example, you could create com.mycompany.protocols.classpath.Handler to support the classpath:// protocol. Additionally, you must set the 'java.protocol.handler.pkgs' system property -- at startup -- to include the part of your package structure before the protocol name ('com.mycompany.protocols' in this example).
  3. Instantiate a custom URLStreamHandler and provide a reference to it whenever you construct a URL object.
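To make the list above concrete, here is a minimal sketch of a classpath handler wired in through approach #3. It is not the Cernunnos code: the names ClasspathHandler and ClasspathUrlDemo are made up for illustration, and I'm assuming the single-slash form (classpath:/path/to/resource) so that the whole resource name lands in the path part of the URL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

/** Hypothetical handler that resolves classpath: URLs through a class loader. */
class ClasspathHandler extends URLStreamHandler {
    private final ClassLoader loader;

    ClasspathHandler(ClassLoader loader) {
        this.loader = loader;
    }

    @Override
    protected URLConnection openConnection(URL u) throws IOException {
        // Strip leading slashes so "classpath:/a/b" and "classpath:a/b" both work.
        String path = u.getPath().replaceFirst("^/+", "");
        URL resource = loader.getResource(path);
        if (resource == null) {
            throw new IOException("Resource not found on classpath: " + path);
        }
        // Delegate to whatever real URL (file:, jar:, ...) the class loader found.
        return resource.openConnection();
    }
}

class ClasspathUrlDemo {
    /** Approach #3: hand the handler to the URL constructor explicitly. */
    static URL classpathUrl(String spec) throws IOException {
        ClassLoader loader = ClasspathUrlDemo.class.getClassLoader();
        return new URL(null, spec, new ClasspathHandler(loader));
    }

    public static void main(String[] args) throws IOException {
        URL url = classpathUrl("classpath:/java/lang/Object.class");
        try (InputStream in = url.openStream()) {
            System.out.println("first byte: " + in.read());
        }
    }
}
```

The same handler class could be returned from a URLStreamHandlerFactory (approach #1) or renamed to Handler and placed in a protocol-named package (approach #2); only the registration differs.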

Until just a few days ago, I had been using approach #1 in Cernunnos. It worked perfectly until I tried to use Cernunnos in a servlet container -- Tomcat, in this case. The documentation for approach #1 clearly states that "[t]his method can be called at most once in a given Java Virtual Machine." Guess what? Tomcat had already called it. I got an error like the following:


java.lang.Error: factory already defined
        at java.net.URL.setURLStreamHandlerFactory(URL.java:1074)
        at org.danann.cernunnos.runtime.ScriptRunner.<clinit>(ScriptRunner.java:43)
        at org.apache.jsp.WEB_002dINF.jsp.index_jsp._jspService(index_jsp.java:89)
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332)
        ...
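The failure above is easy to reproduce outside a container. The following is a small sketch, with the hypothetical names FactoryOnceDemo and tryInstall, that installs a do-nothing factory twice and reports whether each attempt succeeded. Catching java.lang.Error is normally poor practice; it is done here only to show the behavior, and anything other than the "factory already defined" error is rethrown.

```java
import java.net.URL;
import java.net.URLStreamHandlerFactory;

class FactoryOnceDemo {
    /** Tries to install the factory; returns false if the JVM already has one. */
    static boolean tryInstall(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error e) {
            // The JVM signals a second registration with a plain java.lang.Error.
            if ("factory already defined".equals(e.getMessage())) {
                return false;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        // Returning null tells the JVM to fall back to its built-in handlers.
        URLStreamHandlerFactory factory = protocol -> null;
        System.out.println("first attempt installed:  " + tryInstall(factory));
        System.out.println("second attempt installed: " + tryInstall(factory));
    }
}
```

A guard like this avoids the crash but not the underlying problem: if the container got there first, your factory is simply never consulted.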

Approach #2 has some bitter pills of its own where application containers are concerned:

  • Handler implementations must be visible to the System class loader, and therefore they must be packaged separately from your application and grafted to your container "after market"
  • The 'java.protocol.handler.pkgs' system property has to be in place before the JVM first resolves a handler for your protocol, so (in most cases) you have to dig into and modify the scripts that launch your container

These steps not only introduce meaningful (and unwanted) complexity, they're not governed by standards, which means they would need to be re-explored, re-created, and re-documented for every container where your application ultimately gets deployed. What a pain. This approach is ugly, but I fear it's the most viable.
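For reference, the naming convention behind approach #2 can be sketched in a few lines. This is an illustration of the convention only, not the JDK's actual lookup code, and HandlerLookupSketch and candidateHandlerClasses are names I made up. The property value is a list of package prefixes separated by '|', and each prefix is combined with the protocol name and the fixed class name Handler.

```java
import java.util.ArrayList;
import java.util.List;

class HandlerLookupSketch {
    /**
     * Illustration only, not JDK code: lists the class names implied by the
     * naming convention for a given property value and protocol.
     */
    static List<String> candidateHandlerClasses(String pkgsProperty, String protocol) {
        List<String> candidates = new ArrayList<>();
        // The property holds package prefixes separated by '|'.
        for (String prefix : pkgsProperty.split("\\|")) {
            String trimmed = prefix.trim();
            if (!trimmed.isEmpty()) {
                candidates.add(trimmed + "." + protocol + ".Handler");
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Typically supplied at launch, e.g.
        //   -Djava.protocol.handler.pkgs=com.mycompany.protocols
        // The default below is only a stand-in for this demo.
        String pkgs = System.getProperty("java.protocol.handler.pkgs", "com.mycompany.protocols");
        System.out.println(candidateHandlerClasses(pkgs, "classpath"));
    }
}
```

Because the property is a single shared list, adding your prefix in a container means appending to whatever value is already there rather than overwriting it.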

There's a very serious flaw in approach #3, which takes it completely off the table (in my book): it only works for URL objects created within your code. Custom URL protocol handlers are attractive to me because they allow existing technologies to access new types of resources and be used in innovative new ways. Individual technologies must be recoded to use approach #3, which defeats the purpose.
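The limitation is easy to see in a small sketch. The class and method names below are hypothetical, and the handler is a stub that never opens a connection, since only URL construction matters here. The first method does what most existing libraries do with a URL string; the second is what approach #3 requires.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

class ExplicitHandlerLimitDemo {
    /** Stub handler: enough to construct URLs, not to read from them. */
    static final URLStreamHandler STUB = new URLStreamHandler() {
        @Override
        protected URLConnection openConnection(URL u) {
            throw new UnsupportedOperationException("stub handler");
        }
    };

    /** What third-party code typically does: build a URL from a bare string. */
    static boolean parsesWithoutHandler(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // No registered handler for the protocol, so construction fails.
            return false;
        }
    }

    /** Approach #3: works, but only where you control the constructor call. */
    static boolean parsesWithHandler(String spec) {
        try {
            new URL(null, spec, STUB);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String spec = "classpath:/app.properties";
        System.out.println("plain constructor:    " + parsesWithoutHandler(spec));
        System.out.println("explicit handler arg: " + parsesWithHandler(spec));
    }
}
```

Any library that receives the URL as a string and constructs the object itself takes the first path, which is why it would have to be recoded.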

drew wills