Roller Searching - Powered by Lucene

Thanks to Min, we now have searching in Roller. He wrote a wicked-ass Lucene implementation using the util.concurrent package from Doug Lea. Here's how it works:

  • When Roller starts, it checks to see if the index is OK, and if not, rebuilds it. The index then goes into RAM and stays there until you destroy the servlet context - then it's written to disk. The location is configurable, but defaults to ${user.home} + File.separator + "roller-index".
  • A user's index is updated when they add/delete weblog entries.
  • A user can rebuild their own index via a button on the Website Settings page.
  • An Admin can rebuild a user's index from the "Admin" page and rebuild all users' indexes from the Config page.
  • The IndexManager is the central entry point, and it lives in RollerContext.getIndexManager(). For indexing, searching, etc. you use one of the following operations:

    - AddWeblogOperation
    - RebuildUserIndexOperation
    - RemoveWeblogOperation
    - SearchOperation

    After creating one of these ops, set any op-specific configuration options and then pass it to the IndexManager.executeIndexOperation() method.
  • Behind the scenes, there is a background thread running. This thread only performs one operation at a time. If an op is added while the thread is busy, the op is queued. Most Lucene operations can run concurrently. Lucene supports add, delete, read, query, and optimize, and the only methods that cannot be active at the same time are IndexReader::delete() and IndexWriter::add(). The ops that call these methods therefore go into the background thread's queue, which guarantees that they won't run at the same time. Searching doesn't interfere with these ops, so it can be run in any thread.
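
The queueing described in the last bullet can be sketched roughly like this. This is not Roller's code: IndexManagerSketch and its methods are hypothetical names, a plain in-memory list stands in for the Lucene index, and the sketch uses java.util.concurrent (the standard-library descendant of Doug Lea's util.concurrent) rather than the original package. Writes go through a single worker thread so they never overlap, while searches run in whatever thread calls them.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical stand-in for an index manager; not Roller's IndexManager.
class IndexManagerSketch {
    // One worker thread: queued write ops run strictly one at a time, in order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Toy "index" (assumption: a thread-safe list instead of a Lucene index).
    private final List<String> index = new CopyOnWriteArrayList<>();

    // Write ops are queued; if the worker is busy they simply wait their turn.
    void scheduleAdd(String entry) {
        writer.execute(() -> index.add(entry));
    }

    void scheduleRemove(String entry) {
        writer.execute(() -> index.remove(entry));
    }

    // Searching does not go through the queue; it runs in the caller's thread.
    List<String> search(String term) {
        return index.stream()
                .filter(e -> e.contains(term))
                .collect(Collectors.toList());
    }

    // Blocks until every op queued so far has finished.
    void flush() {
        try {
            writer.submit(() -> { }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lets already-queued ops finish, then stops the worker thread.
    void shutdown() {
        writer.shutdown();
    }

    public static void main(String[] args) {
        IndexManagerSketch mgr = new IndexManagerSketch();
        mgr.scheduleAdd("Roller Searching - Powered by Lucene");
        mgr.scheduleAdd("Another weblog entry");
        mgr.scheduleRemove("Another weblog entry");
        mgr.flush();
        System.out.println(mgr.search("Lucene"));
        mgr.shutdown();
    }
}
```

The single-thread executor is what gives the "one op at a time" guarantee here; no explicit locking around the write ops is needed because they never run side by side.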

I created a #showSearchForm macro that renders a <form> with a textbox (size=20) and a "Search" submit button. I also added this to all the current themes - so if you developed a theme for Roller - you might want to check it out (username: test, passwd: roller). You can edit it right on the site if you want, then copy/send me the adjusted files. CSS seems to need the most tweaking for these to look right.

Please enter any bugs/enhancements in Roller's JIRA instance. The only one I've seen so far is that a user has to build their index manually before they get any search results. I don't know that this is a bug, just wanted to mention it. Search doesn't cover comments yet either - an NPE from weblogMgr.getComments() (when adding a new post) kept me banging my head against the wall for an hour - so I commented it out.

Try it, you might like it. ;-)

2 minutes later: Here's a bug - if you update an entry numerous times, it shows up in the search results numerous times (the old version should be deleted and the entry re-indexed).
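
To illustrate the delete-then-re-index idea, here is a toy sketch. It does not use Lucene or Roller's operations: EntryIndexSketch and its methods are made-up names, and the "index" is just a list. The one thing it borrows from Lucene is that adding a document never overwrites an existing one, so an update has to remove the old copy itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy index; each document is {entryId, text}.
class EntryIndexSketch {
    private final List<String[]> docs = new ArrayList<>();

    // Blind add: indexing the same entry twice leaves two documents behind.
    void add(String entryId, String text) {
        docs.add(new String[] {entryId, text});
    }

    // Removes every document that belongs to the given entry.
    void delete(String entryId) {
        docs.removeIf(d -> d[0].equals(entryId));
    }

    // Update = delete the stale document(s), then index the new text.
    void update(String entryId, String text) {
        delete(entryId);
        add(entryId, text);
    }

    // Number of documents whose text contains the term.
    long hits(String term) {
        return docs.stream().filter(d -> d[1].contains(term)).count();
    }

    public static void main(String[] args) {
        EntryIndexSketch buggy = new EntryIndexSketch();
        buggy.add("entry-1", "Roller search");
        buggy.add("entry-1", "Roller search, edited");    // duplicate result
        System.out.println("add only: " + buggy.hits("Roller"));

        EntryIndexSketch fixed = new EntryIndexSketch();
        fixed.add("entry-1", "Roller search");
        fixed.update("entry-1", "Roller search, edited"); // old copy removed first
        System.out.println("delete + add: " + fixed.hits("Roller"));
    }
}
```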

Posted in Java at Jul 22 2003, 11:41:59 PM MDT 2 Comments

A correction: the search index is not stored in RAM. On a big site like JRoller that would be impossible. Instead, it is stored on disk.

Posted by Dave Johnson on July 26, 2004 at 09:24 PM MDT #

I have Roller running on Tomcat on Windows XP. The search index is not updated when a weblog entry is changed or added. I set the Search Index Directory to: c:\tomcat 4.1\webapps\roller\temp\cache\roller-index. Even forcing a rebuild of the index does not make search return entries that are in the blog. Any suggestions?

Posted by Dean on October 21, 2004 at 06:17 AM MDT #

