Baking for Google

In my last dispatch on the glories of rich, stateful OOP for the Web, I mentioned the little issue of Wicket’s funny-looking URLs, but purposely left out the oft-confused issue of search-engine crawlability.

Like humans exchanging links, search engines need URLs to be consistent. And you might assume that since your Wicket application presents static URLs to users, it would do the same for search engines. But then you would have forgotten about cookies.

Wicket enables Java’s URL rewriting feature for cookieless browsing. Since Google crawls the Web as a cookieless “user,” your bookmarkable links (no matter how pretty you’ve made them through URL mounting) will have a decidedly ugly (and inappropriate) jsessionid appended to them.
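To make the problem concrete: the session id is tacked on as a path parameter, so the same page ends up with a different URL for every session. Here is a minimal sketch in plain Java, not anything Wicket or Databinder ships, that strips the parameter to recover the canonical URL. The class name SessionIdStripper and its strip method are made up for illustration, and the pattern assumes the container uses the usual ";jsessionid=" form.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Wicket or Databinder.
final class SessionIdStripper {

    // Assumes the conventional ";jsessionid=<id>" path parameter. The id is
    // taken to end at the next ';', '?' or '#', or at the end of the string.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^;?#]*", Pattern.CASE_INSENSITIVE);

    // Returns the URL with any jsessionid path parameter removed.
    static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip(
                "http://databinder.net/recipe/app/show/id/19;jsessionid=66kiemewvs53"));
    }
}
```

Two crawls of the same recipe page would normally carry two different ids, and so two different URLs, which is exactly the inconsistency a search engine has to sort out.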

While it would be nice to turn off URL rewriting just for search engines, that would require request sniffing, which is complicated and hard to test. (Google, please index my site now!) Another option, one employed by sites all over the Web facing similar problems, is to reduce support for browsers with cookies disabled by turning off URL rewriting altogether.
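For a sense of what request sniffing involves, here is a deliberately naive sketch in plain Java. It guesses whether a request comes from a crawler by looking for known substrings in the User-Agent header. The class name CrawlerSniffer and the marker list are assumptions for illustration; the list is incomplete, and nothing here is part of Wicket or Databinder.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Wicket or Databinder.
final class CrawlerSniffer {

    // Assumed, incomplete list of lowercase substrings found in crawler User-Agents.
    private static final List<String> BOT_MARKERS =
            Arrays.asList("googlebot", "slurp", "msnbot");

    // Returns true if the User-Agent contains one of the known markers.
    static boolean looksLikeCrawler(String userAgent) {
        if (userAgent == null) {
            return false; // no header: treat as an ordinary visitor
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String marker : BOT_MARKERS) {
            if (ua.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeCrawler(
                "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
    }
}
```

The check itself is trivial. The trouble is everything around it: the list needs upkeep, any client can claim any User-Agent, and the only real test is waiting for a crawler to come by.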

This was tricky to do previously in Wicket, but that’s been remedied. I’ve made it very easy to reduce cookieless support in the latest Databinder snapshot. Just call the new setCookielessSupported(false) method in DataApplication. When users with cookies disabled try to browse to a page that requires a session, they’ll be presented with a customized session expired page.

As you’re deciding whether or not to enable this anti-feature, keep in mind that cookieless users will still be able to visit every bookmarkable page in your site. They just won’t be able to submit Wicket forms, follow action links, and so on. You have to weigh the needs of the few who choose to disable a highly useful, ten-year-old Web feature against the millions who might find your content more easily through search engines.

I’ve disabled cookieless session support for the recipe book, as it’s the example most appropriate for indexing. Google hasn’t updated its inbound links as of this writing; sometimes that can take weeks.

I hope you enjoy the new feature. Or if you’re one of those people who browses with cookies turned off, go ahead and contribute a search engine sniffer to Wicket.

Unless you’re too busy with your stealth browsing.

Comments

Funny thing is that your site gets mentioned as number 2, with this URL:

http://databinder.net/recipe/app/show/id/19;jsessionid=66kiemewvs53

Yep, that is with the jsessionid in it. It’s not pretty, but it’s further proof that it doesn’t get in the way of Google’s indexing.

I was surprised it was ranked that high. I guess there aren’t too many British Lemon Meringue Pie recipes on the web.

Meaningless observation:

Google has updated their inbound links to remove the jsessionid, and our very own pie recipe is now number 1. One explanation is that googlebot thinks the page is new, and it likes new pages. Another is that googlebot likes pretty URLs as much as humans do. Anyway, this rules.

mmmm…. coooookies….
