The right tool for the slob

Somebody noticed that Lucene doesn’t use Lucene to index its own site; it presents a Google site search like everybody else. And not just that but their sub-project Nutch, which is specifically for web searching, doesn’t use itself. Natch.

We’ve heard this story before, and the defendants have a common response: we would love to use our wonderful web software for our own web site, but—damn thee, cruel fates!—it’s not the right tool for this highly particular job.

How is it that some fancy-pants framework is always the right tool for an abstract job and PHP is the right tool for a real job?

It seems like someone’s selling a load of crap. But in this case, no: Lucene is not a load of crap. It’s a very down to earth project that does something people need and does it well enough to be respected—even outside the Java community. Lucene is just fine, thank you.

So if the project produces a free product that is worth using for individual sites, why don’t they use it for their own site again? Whatever things a personal site search engine is good for (triggered updates, customized results display, not feeding daddy Googlebucks) apply as well to Lucene’s home site. Aren’t they a little enthusiastic about sending their own software out to the world? Probably they are, but the culture of large scale Java projects, and Apache specifically, is against it. Random comment: runs the Apache httpd server and serves static pages. Allowing every sub-project to run its own software on that server would be pretty insane. Look at how many there are.

If it’s “insane” to let programmers of Apache projects run their own software on Apache servers, that doesn’t say much for their revered vetting process for projects and programmers. Yes, system administration isn’t the same as programming—got it—but any project for the web ought to have someone competent in that as well. It matters.

Java deployment is treated as some kind of Apollo mission, to be undertaken only with careful planning and a healthy budget. Open source projects generally have neither of those, so Java projects are left writing code for a void. Good projects get hooked up with a patron company that pays for practical, internal applications to develop alongside the public framework. Bad projects build elaborately useless public frameworks and scoff at wasting their precious brains on mere applications programming. That everyone is scared of Java deployment plays right into their lazy, conceited plans.

And it’s not the case that every project with a complicated deployment reverts to static HTML or PHP for its own webfront. The Hibernate blog runs on Seam. This weblog is rendered using various technologies it hyperventilates over. And freaking Smalltalk programmers—what could possibly be harder to deploy?—host their own site, their “CMS” if you must, for the Seaside web framework. The site isn’t always fast, it’s probably not always running, but at least the framework has the courage to show itself.

This isn’t 1999. You don’t need a server dedicated to one Java application; a virtualized server is perfectly adequate. Take one meaty server and create virtuals for every project to use, or not use, as suits its disposition. Certainly the exposed site should be an Apache Web Server, but there’s this little thing called mod_proxy_ajp—perhaps the Apache project has heard about it? (If not, it’s discussed in the updated Databinder deployment tutorial.) You can isolate software.

Anyway, eating other people’s dog food is disgusting.


Look at the list of projects that hosts. There are some 55 or so top-level projects, plus tons more in the incubator and in sub-projects. Not all of them are web projects, but many of them are. And they’re not just Java – Tcl, Perl, C++, etc. Letting all of those projects run their own code on a server for demo purposes would be a nightmare in terms of administration, security, and resources. If you think you can click your heels together and say “virtual servers” and make all of that go away, you’re kidding yourself.

If you feel like donating enough hardware and admin time to Apache to make your dream a reality, go for it. Otherwise, you’re not helping.

It’s not my dream, or my cause to help Apache. (I have independent open source projects to help.) But I will point out lame excuses when I see them: virtualization supports any programming language (come on…) and any operating system, without heel clicking. I use several myself, and my hosting company is either “insane” or they’ve set it up so that I can’t bring down other sites, or compromise their security. Apache could do the same, and very cost-effectively for 55 virtual servers. But it hasn’t been their priority to put project code onto the real web, and it’s one of the many reasons I have lukewarm feelings towards the megaproject.

Good luck to you/them!

That was the most ridiculous article I have read in a long time. Lucene is an absolutely incredible package that has been and will continue to be the driving force behind many advanced web-based systems. A simple web site like the homepage of this product, and of many of the other Apache products, simply does not need the horsepower and features of Lucene. It would work for them, but I’d rather they spend those extra few hours developing than integrating it into the site with no real benefit. Google search works fine. Why use a sledgehammer to drive in a tack?

Most likely, too, the site is not managed by the developers themselves; it seems to be a template for Apache projects. Anyway… what is your real point?

Perhaps you should read the post until you discover its “real point” so you’ll know what to defend.

“How is it that some fancy-pants framework is always the right tool for an abstract job and PHP is the right tool for a real job?”

Comments like this seem like they’re aimed at the Lucene/Nutch developers. However, those developers do not control or administer the Apache servers powering For all you know (unless you’ve had conversations with them to the contrary), they’d love to have their own site indexed with their own tools, but that’s not their decision.

Your choice of quoted passage (“…Allowing every sub-project to run its own software on that server would be pretty insane…”) seems to be a slam against those in charge of those servers. And I agree with you that, in an ideal world, Apache would have virtualized servers for each sub-project so they could demo what they need.

But there is absolutely no question that hosting ~55 virtualized servers requires more resources (hardware plus staff) than hosting ~55 static Web sites. If you took a look at their public financial records, you would see that in 2005-06, they only had a bit under $100K in donations to their non-profit (other years are lower), and they say elsewhere on their site that “While all of the administrative effort is currently through unpaid volunteers, we need financial assistance to purchase bandwidth and keep our servers running.”

Ask the Slicehost guys how far $100K will go. Heck, Apache shells out $20K/year just for co-lo services, and that’s just for their ~55 static Web sites.

“Yes, system administration isn’t the same as programming—got it—but any project for the web ought to have someone competent in that as well. It matters.”

It’s not a lack of talent; it’s a lack of funds, and perhaps time. They appear to be fairly frugal, by looking at their financial records, and for a non-profit, that’s a good thing. You can’t afford to be reliant on donations remaining steady year-in and year-out, nor can you rely on corporate backers (one management shuffle and your funding may dry up). So they’re building a reserve fund to cover operational costs, and they’re keeping operational costs down so that fund can keep the lights on for a longer period of time.

Nor can you, J. Random Outsider, demand that unpaid volunteers do things to make you happy, so if the issue is lack of time from people with the talent, that’s kinda the way it is. Perhaps with enough contributions, they would elect to fund a sysadmin position that could resolve issues with lack of volunteer time.

“Apache could do the same, and very cost-effectively for 55 virtual servers.”

Perhaps they’ve done the cost estimates and come to a different conclusion. Remember that firms like Slicehost are amortizing their investment over many more virtual servers than ~55. While ~55 seems like a lot, it may fall in that dead zone where you can’t get by with just a rack-shelf of PCs, and where the infrastructure to “do it right” costs enough that, even spread over 55, it is prohibitively expensive.

Small caveat: So long as you are hosting your own site (or using a system like Amazon’s EC2) Smalltalk websites are easy to deploy.

  • put the image and VM on the server
  • set up Apache proxypass rules


Mark, thanks for taking the time to comment.

As “J. Random Outsider” to Apache I’m not making any demands; I’m engaging in criticism. I know as well as you that 55 is not a big number; the suggestion to the contrary is what got on my nerves in the first place. And likewise I do assume that the project that’s grown from the world’s most popular web server has the talent to administer a virtualized setup.

So what’s left are funding and priorities. Their priority is a squeaky clean image, a site for their projects that never goes down and never slows down. That doesn’t interest me in the slightest. If I’m looking for software to run on the web I want to see it running on the web, and if the site holding it is less than 100% stable I understand that perfectly, because I’m a programmer. Who is Apache making their site for?

It’s not like Wikipedia always works perfectly. (Remember in 2003?) Or even wealthy businesses like Facebook: do people stop using those sites when they fail on occasion? The entire popular web is active and therefore unstable; against that backdrop Apache’s insistence on static front ends is out of touch. They could build a playground for projects; instead it’s a museum.

On the micro scale, a smallish project can run on a $30 / month virtual server just fine. Why would such a project, that is worth $360 a year to itself, move to Apache and host like they’re on Go Daddy?

I’d wager that a re-prioritized $20k / year could include a single virtualizing server to be used optionally by projects through proxying, and that Apache’s interest in the topic hasn’t gone far enough to do an estimate. They’ve probably got a line of Web 0.1 sysadmins stomping their feet saying it will never happen on their watch. But I predict that it will happen, within ten years, or Apache will no longer be able to attract interesting web projects.

So, I guess we’ll find out.

James, I was hoping a Smalltalk person would stop by and say that. ;)

“As “J. Random Outsider” to Apache I’m not making any demands; I’m engaging in criticism.”

Rants get downgraded as “criticism” in my book, simply for tone. The post that these comments are attached to, IMHO, qualifies as a rant. That gets you more traffic and comments but may not get better results.

“Why would such a project, that is worth $360 a year to itself, move to Apache and host like they’re on Go Daddy?”

And here we get to the crux of the issue: why do some projects aim to be under the Apache umbrella?

I’m fairly certain it ain’t for the hosting. Probably, it’s for the cachet, possibly for the legal infrastructure a registered non-profit affords (e.g., centralized copyright ownership?).

Personally, I’d have little interest in having projects as subprojects of Apache. Hosting wouldn’t be the issue – there’s nothing preventing Lucene/Nutch from grabbing their own slice on Slicehost (or their preferred VPS) and using that as an adjunct to their Apache hosted area. My issue would be branding: in the world of open source, you are your brand. I’m not convinced that the boost in visibility from Apache will pay off long-term compared to the probable headaches in having “bosses” with some measure of control over the project.

“But I predict that it will happen, within ten years, or Apache will no longer be able to attract interesting web projects.”

I don’t know about that. Again, I don’t think projects choose to affiliate with Apache because of Apache’s hosting. I also don’t think projects choose not to affiliate with Apache because of Apache’s hosting. There are other criteria at play here (e.g., initial visibility boost, organization for IBM and kin to legally donate code to) that are probably more important.

Don’t get me wrong: I agree that Apache’s hosting is kinda bland, and the quantity and quality of documentation varies widely per project. They’re not the worst (I find Codehaus-hosted projects to be worse to navigate, for example), but they’re far from the best. Their priorities lie elsewhere, as you suggest. Our difference is that you appear to believe that they’ve made the wrong choice, while I believe I don’t know for certain what the other priorities are and therefore cannot pass judgment.

Mark, feel free to grade this post and this criticism however you wish in your book. I will do the same for open source projects and the sites they present themselves with. Thanks for stopping by!

For the record: Apache Infra does support virtualized machines (aka: “the zones”) for projects that want to use them – and Lucene takes advantage of a virtual server to run the automated build systems for each of the sub projects, but we try to minimize the load and network usage on these virtual machines since they are already fairly taxed.

That’s reason #2 why no one has ever bothered to put a lot of work into setting something up like this (any of the other 4 dozen times someone has brought it up).

Reason #1 is a lot simpler…

When I have a free hour, I can either spend it improving Lucene code (either adding features or fixing bugs), or I can spend it answering questions on the user email list, or I can spend it setting up some software so that instead of using the search box we get for free by running forrest we have a search box that points at a Lucene powered search application for our documentation.

This may seem silly to you, but I care more about the code and the users than I do about impressing people, so I’ve never chosen that third option.

You might disagree with my choices, but then again: I’m the one volunteering, and you’re the one posting criticisms of a project you don’t participate in … so I guess that says everything that needs to be said about our respective priorities.

Huh … your comments system seemed to pre-truncate(?) the first paragraph of my comment …

Don’t worry about my commenting system, Hoss. Sometimes its Markdown processor chews up blocks of text with colons, but I go back and escape them. Even for comments that attack me personally.

In addition to being the one posting criticism of things I don’t participate in, I’m also another one volunteering for open source projects. I’ve faced criticism of my “volunteer work” too. (None of this is as virtuous as feeding the homeless. We do enjoy programming, right?)

I know all about your Reason #1, and it has no place in a debate. I spend some of my free time coding for my project or others, some answering questions around the internet, some of it writing about topics I find compelling, some of it reading, and some of it completely wasted doing god knows what. Who cares how I or you choose to slice it? Certainly it’s no one else’s business.

I’m thrilled to hear that Apache has these “zones” and that you’re using them. It’s the most promising thing I’ve heard about the project in a while. To me it seems like rather a shame to just be using it for builds, which can normally execute in a common environment, but then I use my virtuals for builds as well as app servers, because it’s convenient.

Don’t underestimate the value to your project of “impressing people.” There’s nothing vain or unproductive about it. And please try to understand that independent criticism, though it’s rarely received with pleasure, has a long and important tradition. It’s not going anywhere, and the position taken in many open source projects, that the only legitimate criticism is self-criticism, is deeply misguided. A criticism-firewall promotes an isolationist culture that steadily drifts from popular reality (and therefore relevance).

Thanks for taking some of your free time to read and respond.

Wow - some of the people you’re not attacking in this post seem awfully defensive.

I don’t have a dog in this fight - but it sure is entertaining to watch.

Off topic; I used to work for Compaq, in their Netelligent division. We used a lot of gear that was produced by the company before they were bought by Compaq, but none of the Compaq-branded gear.

Why? The rules changed. NetWorth saw the value in having IT eat their own dogfood. Compaq .. not so much. IT had to buy the stuff and pay list price, just like any customer.

So we had a few Compaq servers and of course we had Compaq desktops. But we simply did not have the budget to upgrade network gear.

Yeah, I never thought about it with respect to hardware. I guess any time you’re not using your own work product it’s a threat to morale.

Spot-on. Eating your own dogfood is important to credibility. Whether ASF can afford to host the welter of technologies currently under their wing is a moot point to outsiders. It’s kind of like the joke about the redneck divorcing his new bride after finding out she was a virgin: “If she ain’t good enough for her own family…” That sort of thing…

Argue all you like about what’s feasible on the back end, but it doesn’t make a dent in the credibility problem.

And, frankly, I think that ASF’s efforts could stand some prioritizing after the explosion of supported projects in recent years. (How many servlet/JSP containers does anyone need, honestly?) I’d feel much better about donating (again) if I knew that my dollar was nurturing the best-of-breed, rather than germinating anything that looked promising. A bit of an exaggeration. But I’ve been wearing the project management hat a little too long to have much patience with mere chest-thumping. If you use your own product day in and day out, you’ll know where more bodies are buried. That brings up my comfort level, and my willingness to throw money at its PayPal account and stake some of my own credibility by recommending it to my bosses or my peers.
