Tsearch2: Too slow for you
I’ve been following in Andrew Smith’s footsteps for a database import of freedb, as he seems to be the only person publishing import scripts that cares about a normal database structure.
Andrew’s a PostgreSQL believer and his walk-through instructions are written for that database. I followed his instructions exactly on my first run through, but then decided to diverge in order to make the tables more ActiveRecord friendly.
I thought I might as well have a simple “discs” table with the disc titles as a varchar(255) right there. (This lead to more wasted text than I expected, because of duplicate disc entries, but those dups have got to be dealt with some day anyway.)
A full-text search of disc titles is more appropriate than a starts-with search, so I set up Tsearch2, following the lengthy instructions for word stemming with an iSpell English language dictionary.
Hours later, after plenty of vacuum full analyzing, I had an index that was far too slow to use. I had followed the instructions to the letter and double checked that the stemming dictionary was working, but it was just dog slow—several seconds per query.
That seemed strange, since Google manages to search the whole Internet in a fraction of a second. So I used Google to research other full-text indices. It turns out MySQL has built in full text indexing, so I gave it a go. I managed to load the tabulated file that I’d created for PostgreSQL into MySQL, then added a full-text index on disc titles.
While the info I had googled consistently pegged MySQL’s implementation as easiest to set-up, it also alluded to to it being slower than Tsearch2:Or, if you’re on PostgreSQL, you can simply use tsearch2. It still requires some parsing but works much better, faster and has proper stemming.
(Uh… “simply?” Simply reverse global warming, while you’re at it.)
I suspect that people assume Tsearch2 is faster because it’s harder to set up, and can do fancy stemming. But I compared the default setup for both engines, and my unusable multi-second Tsearch2 query came down to a fraction of a second in MySQL.
I’m sure a PostgreSQL groupie would say that I need to tune my index, or whatever-the-hell, but the default configuration’s performance is so poor that I won’t bother. I’d rather spend time tuning the implementation that started off 10 times faster.
Good bye for now PostgreSQL, maybe some day you’ll have a built in full-text that is worthy of your followers’ enthusiasm. Until then, I’m back to my bread and butter MySQL, chugging along for this little demo.
Update: That demo and my toyings with freedb are history. These days my baby is Databinder.
Codercomments
The downside of using full text indexing in MySQL is that it only works with MyISAM tables which are not transactional and could easily get corrupted. There are some plugins that add this feature to the Innodb tables but the ones that I found are not above version 1.0 yet…
Yeah, but they are working on it. If I needed full-text search for a site, keeping one table as MyISAM for a fast, built-in search is a free option I’d be glad to have.
Again, the mysql solution doesn’t work with the transactional database engines. So the speed of the full text search in mysql comes at the cost of data integrity. I know which I prefer.
No need to repeat comments from two and a half years ago, really.
Have a look at that website. PostgreSQL comes configured to run on low-memory systems. Increase it’s shared buffer size and the maximum shared buffer size of your kernel (if you’re running Linux) and you should see a drastic performance increase. I don’t see how you can have a “normal database structure” without foreign keys. Unless all of your tables are independent, which is seldom the case.
It seems the “Web site” box doesn’t actually get displayed anywhere. Less than optimal, here it is: http://www.revsys.com/writings/postgresql-performance.html.
I won’t be looking at that web site since I ceased to be interested in this question, oh, three years ago, but somebody is linking here from some hot and heavy debate wiki so I guess the fun will never end. The url from a comment is linked from the name in the signature. It’s optimally sneaky!
Hey kids! Comments on this post are closed (as in, will be deleted without hesitation or remorse). Please vent squirrelly rage somewhere else. Thanks.
Add a comment