Resourceful caching

One of the strengths of the Wicket framework is that while it enables sky-high feats of abstraction on the trapeze of application state, a tire swing is available when you need a state closer to the ground. For stateless resources, on the other hand, all you’ll ever need is a caching safety net.

The original, and still conceptually dominant, model of web resources is a filesystem hierarchy exposed as a URL path. The Java servlet specification follows this tradition, requiring that the servlet container serve files in directories found under the application root. However, many users bypass the servlet container for this task and use a plain web server, because for static files the servlet container adds no value and may be slower.

Wicket doesn’t follow the traditional resource model of serving a filesystem hierarchy to the web; it organizes resources by the Java namespace, that is, by classes and packages. Consider some of the resources linked from this page:

/resources/com.typeturner.pages.Home/typeturner.css
/resources/com.typeturner.Typeturner/image?name=ganymede
/resources/com.typeturner.Typeturner/code-rss

The first is a stylesheet. Before compilation it is a plain file in the source hierarchy; at runtime, it is a classpath resource. While that resolution mechanism is straightforward, it is entirely foreign to Apache and other web servers. There is no way to serve typeturner.css except through Wicket, and therefore no way to bypass Wicket or the servlet container.
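To make the classpath idea concrete, here is a simplified illustration (not Wicket’s actual code) of how a package-resource URL like the first one maps to a classpath resource name: the scope class’s package becomes a directory path, and the file name is resolved relative to it, much as Home.class.getResource("typeturner.css") would. The class name PackageResourcePath is made up for this sketch, and it only covers plain package files, not dynamic resources like the other two URLs.

```java
/**
 * Simplified illustration (not Wicket's implementation) of mapping a package-resource
 * URL such as /resources/com.typeturner.pages.Home/typeturner.css to a classpath name.
 */
public final class PackageResourcePath {

    private static final String PREFIX = "/resources/";

    /** Returns the classpath resource name for a package-resource URL path. */
    static String toClasspathName(String urlPath) {
        if (!urlPath.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a resource URL: " + urlPath);
        }
        String rest = urlPath.substring(PREFIX.length());
        int slash = rest.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("missing file name: " + urlPath);
        }
        String scopeClass = rest.substring(0, slash); // e.g. com.typeturner.pages.Home
        String fileName = rest.substring(slash + 1);  // e.g. typeturner.css

        // The file lives in the scope class's package, so drop the simple class name.
        int lastDot = scopeClass.lastIndexOf('.');
        String pkg = lastDot < 0 ? "" : scopeClass.substring(0, lastDot);
        String dir = pkg.isEmpty() ? "" : pkg.replace('.', '/') + "/";
        return dir + fileName;
    }

    public static void main(String[] args) {
        String name = toClasspathName("/resources/com.typeturner.pages.Home/typeturner.css");
        System.out.println(name); // com/typeturner/pages/typeturner.css
    }
}
```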

Why does Wicket offer this funky organization scheme? For one thing, there are organizational advantages to keeping resources near the UI classes that use them (if you disagree, you can always nest them in folders below the app root and refer to them the old-fashioned way). But that misses the larger context: this is only the simplest case of Wicket’s resource scheme, which enables an application to serve any kind of resource dynamically. For dynamic resources there is no filesystem arrangement to hark back to (cgi-bin?), so using a Java namespace is the most natural option available.

The second resource listed above is a photo uploaded to this weblog, served by a resource mounted as image in the Typeturner namespace and selected by its name parameter. It is a blob in the database, so it’s slow. The third resource is the RSS feed, generated on the fly using Rome. These are far from flat files that Apache could serve, but they are still web resources that take time to produce, and we would really like to optimize them (even on today’s wonderful hardware).

Let’s assume you’ve added correct Last-Modified headers to your custom Wicket resources. (That is, let’s assume you’ve done the hard part already. But whatever. If you care about web performance you need to set those anyway, so git ’er done!) Wicket already does this for static package resources, so, check. Let’s also assume Apache sits in front of your servlet container as a reverse proxy, over AJP or plain HTTP, because that is really easy to set up.
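For a generated resource, a correct Last-Modified usually means the time of the newest underlying change. Here is a minimal sketch, assuming (hypothetically) that each feed entry row carries an updated timestamp: take the newest one, truncate it to whole seconds because HTTP dates carry no milliseconds, and format it as an HTTP date. How the value is handed to Wicket depends on the resource class you use, so that part is left out; FeedLastModified and its methods are illustrative names.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Locale;

/**
 * Sketch: deriving a Last-Modified value for a generated resource such as a feed.
 * Assumes (hypothetically) each entry has an "updated" timestamp.
 */
public final class FeedLastModified {

    // Preferred HTTP-date layout: English names, two-digit day, always GMT.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** The feed changed when its newest entry changed, truncated to whole seconds. */
    static Instant lastModified(List<Instant> entryUpdates) {
        return entryUpdates.stream()
                .max(Instant::compareTo)
                .orElse(Instant.EPOCH) // empty feed: treat as "never modified"
                .truncatedTo(ChronoUnit.SECONDS);
    }

    /** Formats an instant for the Last-Modified response header. */
    static String toHttpDate(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    public static void main(String[] args) {
        List<Instant> updates = List.of(
                Instant.parse("2008-06-01T09:00:00.250Z"),
                Instant.parse("2008-06-03T11:05:30.900Z"));
        System.out.println(toHttpDate(lastModified(updates))); // Tue, 03 Jun 2008 11:05:30 GMT
    }
}
```

The truncation is not cosmetic: if a resource compares a millisecond timestamp against the seconds-only date a browser echoes back, it will look newer on every request and never earn a 304.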

Okay, then: it turns out that Apache has been experimenting with a caching module for a while, and in version 2.2 (the version in Ubuntu’s Hardy Heron) it is quite stable. So while Coderspiel has actually told this story before in the context of the fiendish Varnish cache, having caching solidly built into Apache makes it worth adding to any reasonably busy Wicket application.

Apache offers separate disk and memory caches, and although the memory cache is probably great and all, if your main concern is huge images that bog down the MySQL and Java pipeline, you should probably start with the disk cache. With it, resource size hardly matters and, magically, it’s as if you had implemented an annoying-to-maintain filesystem-based image upload system instead of a database-based one. (And don’t ever come down, freebase!)

Um, code… there is not much of that. In Ubuntu, after you enable the cache and disk_cache modules with a2enmod, the cache is rashly enabled for all requests. That might not be a great idea if your application is mostly dynamic, like most Wicket applications. It shouldn’t cause failures, because Wicket sets a bunch of cache-busting headers on its pages, but you may as well restrict the cache to the requests it can actually help with: resources. So in mods-available/disk_cache.conf, make it so:

CacheEnable disk /resources/
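In effect, that one line turns the cache into a path-prefix filter. Below is a simplified model (not Apache’s code, and CacheScope is an illustrative name) of which requests become cache candidates: the three resource URLs from earlier match, while a page URL (the "/" here is a hypothetical example) passes straight through to Wicket. Matching is only the first step; whether a response is actually stored still depends on its headers.

```java
import java.net.URI;
import java.util.List;

/**
 * Simplified model of "CacheEnable disk /resources/": only request paths at or
 * below /resources/ are considered for caching. Not Apache's actual matching code.
 */
public final class CacheScope {

    static final String CACHED_PREFIX = "/resources/";

    /** True if the request path falls under the cached prefix (query string not considered). */
    static boolean cacheCandidate(String requestUri) {
        String path = URI.create(requestUri).getPath();
        return path.startsWith(CACHED_PREFIX);
    }

    public static void main(String[] args) {
        List<String> requests = List.of(
                "/resources/com.typeturner.pages.Home/typeturner.css",
                "/resources/com.typeturner.Typeturner/image?name=ganymede",
                "/resources/com.typeturner.Typeturner/code-rss",
                "/"); // hypothetical page URL
        for (String r : requests) {
            System.out.println(r + " -> " + (cacheCandidate(r) ? "cache candidate" : "pass through"));
        }
    }
}
```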

Bug alert! Apache’s mod_cache seems to mishandle some 304 Not Modified responses that are not accompanied by an Expires header, and Wicket isn’t setting Expires for unmodified resources. This mostly affects Safari, which sends Cache-Control: max-age=0 with no If-Modified-Since header when it reloads a page. The Apache cache then revalidates with Wicket but passes Wicket’s brief 304 response back to Safari, which isn’t expecting it, since its request was unconditional. Bah! Coderspiel will do what it can to see that this is fixed.
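The rule being broken is that a 304 only answers a conditional request. Here is a minimal sketch of the If-Modified-Since decision as HTTP/1.1 describes it, from the point of view of whoever answers the browser (origin or cache): with no If-Modified-Since header, as in Safari’s reload, the only correct answer is a full 200. ConditionalGet is a hypothetical name, and the sketch ignores other validators such as ETag.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

/** Sketch of the If-Modified-Since rule: 304 is only valid for a conditional request. */
public final class ConditionalGet {

    static int status(Instant lastModified, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200; // unconditional request: always send the full resource
        }
        Instant since;
        try {
            since = ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
        } catch (DateTimeParseException e) {
            return 200; // an invalid date is treated like a plain GET
        }
        // HTTP dates have one-second resolution, so compare at that precision.
        boolean modified = lastModified.truncatedTo(ChronoUnit.SECONDS).isAfter(since);
        return modified ? 200 : 304;
    }

    public static void main(String[] args) {
        Instant lm = Instant.parse("2008-06-03T11:05:30Z");
        System.out.println(status(lm, null));                            // 200
        System.out.println(status(lm, "Tue, 03 Jun 2008 11:05:30 GMT")); // 304
        System.out.println(status(lm, "Mon, 02 Jun 2008 08:00:00 GMT")); // 200
    }
}
```

A cache that revalidates on the browser’s behalf gets the 304 for itself; what it owes an unconditional client is the stored body with a 200.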

Restart Apache and start curl -I-ing headers. The first request will fetch the resource as usual, but after that (given the same request headers) you should see an Age header reporting how many seconds the item has been in the cache. If it’s missing, the response isn’t being cached, for whatever reason. Also, if you use AJP you’ll be used to seeing Jetty or your other container listed as the server, but when Apache serves a cached resource it lists itself as the server.
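If you’d rather script the check, here is a rough Java equivalent of curl -I using the JDK’s java.net.http client (Java 11 or later). It sends the same HEAD request twice, so the request headers match, and prints the status, Age, and Server headers. CacheProbe is an illustrative name, and the URL is whatever resource you pass on the command line.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** A curl -I stand-in: HEAD a URL twice and report whether a cache answered. */
public final class CacheProbe {

    /** An Age header indicates the response was served from a cache. */
    static boolean servedFromCache(HttpHeaders headers) {
        return headers.firstValue("Age").isPresent();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java CacheProbe <url>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest head = HttpRequest.newBuilder(URI.create(args[0]))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        // The first request may fill the cache; the second may be answered from it.
        for (int i = 1; i <= 2; i++) {
            HttpResponse<Void> resp = client.send(head, HttpResponse.BodyHandlers.discarding());
            HttpHeaders h = resp.headers();
            System.out.printf("request %d: status=%d age=%s server=%s cached=%b%n",
                    i, resp.statusCode(),
                    h.firstValue("Age").orElse("-"),
                    h.firstValue("Server").orElse("-"),
                    servedFromCache(h));
        }
    }
}
```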

This should be faster even for static package resources, because the request won’t have to snake its way through the JVM, but the difference would be unremarkable. The real boon is a shortcut to data from the database and, to an even greater extent, to transformations of that data.

With timestamps and caching, finally you can serve web resources by trapeze without fear of neck breakage.

Codercomments

If you tell Wicket that the resource is cacheable, then Wicket already does last-modified checks and so on.

But Wicket doesn’t (and can’t) know the last modified time of database blobs, or an RSS feed, unless the application tracks and returns it in lastModifiedTime.
