Large indexes, dates, and Lucene

November 16, 2005

I spent a lot of time working out how to get the best possible performance out of our Lucene implementation at WhatsOnWhen. We have an awful lot of information, all of which is only valid for a range of dates. Fundamentally, this means that everything we do depends on running date range queries across a lot of content which, itself, carries date ranges for its validity.

I've just written up the technique we used to build a high-performance Lucene implementation for handling lots of dates; it's on Lucene's wiki for all to read.

If you have any comments or issues with it, feel free to drop me a mail - I'm always interested in hearing feedback on whether this works for you as well as it has worked for us. :)

The code isn't available at the moment: the indexing logic is embedded in our index-writing code, and our custom DateRange and DateRangeDateIterator classes use our internal debugging API. In principle, though, everything you need to construct your own implementation is covered in the article.

Date range processing has long been one of Lucene's weaknesses. I hope one of the core developers reads the article, decides it's a great idea, and builds it into the Lucene core for 2.0. It does exactly what it says on the tin: it allows huge ranges of dates to be processed without hitting boolean query limits. It also extends in both directions - the strategy can easily be built to support granularity as fine as milliseconds, as well as outwards to decades, centuries and, if one wished, further.
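To give a flavour of the general idea, here is a minimal sketch in plain Java. It is not our production code and it does not reproduce the wiki article; it assumes one plausible reading of the approach, in which each valid day is indexed at several granularities (year, month, day) and a query range is covered with the coarsest tokens that fit. The class name, method names, and token formats (Y2006, M200512, D20051116) are all made up for illustration, no Lucene calls are used, and java.time is used purely for convenience.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: not the WhatsOnWhen code and not part of Lucene.
class DateRangeTerms {

    // Tokens to index for one day on which a document is valid.
    // Indexing every granularity lets a coarse query term match the day.
    static List<String> tokensFor(LocalDate d) {
        return Arrays.asList(yearToken(d), monthToken(d), dayToken(d));
    }

    // Cover the inclusive range [start, end] with as few tokens as this
    // greedy walk allows: whole years first, then whole months, then days.
    static List<String> cover(LocalDate start, LocalDate end) {
        List<String> terms = new ArrayList<>();
        LocalDate cur = start;
        while (!cur.isAfter(end)) {
            LocalDate yearEnd = cur.withDayOfYear(cur.lengthOfYear());
            LocalDate monthEnd = cur.withDayOfMonth(cur.lengthOfMonth());
            if (cur.getDayOfYear() == 1 && !yearEnd.isAfter(end)) {
                terms.add(yearToken(cur));   // the whole year lies in range
                cur = cur.plusYears(1);
            } else if (cur.getDayOfMonth() == 1 && !monthEnd.isAfter(end)) {
                terms.add(monthToken(cur));  // the whole month lies in range
                cur = cur.plusMonths(1);
            } else {
                terms.add(dayToken(cur));    // ragged edge: single day
                cur = cur.plusDays(1);
            }
        }
        return terms;
    }

    static String yearToken(LocalDate d) {
        return String.format(Locale.ROOT, "Y%04d", d.getYear());
    }

    static String monthToken(LocalDate d) {
        return String.format(Locale.ROOT, "M%04d%02d",
                d.getYear(), d.getMonthValue());
    }

    static String dayToken(LocalDate d) {
        return String.format(Locale.ROOT, "D%04d%02d%02d",
                d.getYear(), d.getMonthValue(), d.getDayOfMonth());
    }
}
```

With this sketch, covering 16 November 2005 to 2 February 2007 yields 20 tokens (15 day tokens for the rest of November, M200512, Y2006, M200701, and two day tokens for February) instead of one term per day. OR-ing those tokens together stays comfortably under BooleanQuery's default limit of 1024 clauses, and adding finer or coarser levels (hours, decades) would follow the same pattern.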

It's a solid strategy that works. I hope you find it useful.

About This Article

This page contains an article posted on November 16, 2005 11:18 AM.

This weblog is licensed under a Creative Commons License.