Lucene Sort and Filter
Apache Lucene is a great search technology. Usually you run searches against a search index and try to optimize the ranking of the results. Sometimes, though, you simply want to sort the results by a single field, such as a date.
Let’s discuss the following use case: you maintain a search index of HTML documents, and each time you create or update a page, an update of the search index is triggered automatically. Additionally, you would like to keep the history of the indexed documents in the index. The search index may consist of the following fields:
- id
- page
- body
- updated
- version
- live
The following table shows a snapshot of the index. The columns are the fields, and each row of the table represents one document of the Lucene index.

- id identifies each individual document. Lucene itself does not enforce such a field, but a unique key is needed to update or delete a specific document.
- page identifies each HTML page that is indexed.
- body is the text of the HTML page; ordinary searches look for terms in this field.
- updated is the date of the last update, represented as a long value.
- version stores the version history of the indexed pages.
- live is a flag that marks the current version of a page. This makes it easy to filter for the current page documents.
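The updated field holds a timestamp as a long. If that value ends up being indexed as text and compared as a string, a fixed-width, zero-padded encoding keeps the string order identical to the numeric order. The following is a minimal plain-Java sketch of that idea under this assumption; it is not part of the Lucene API, and the class name UpdatedFieldCodec and the width of 19 digits are chosen only for illustration.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: encodes a non-negative long
// timestamp as a fixed-width string so that string order equals numeric order.
final class UpdatedFieldCodec {
    // 19 digits are enough to hold Long.MAX_VALUE.
    static final int WIDTH = 19;

    static String encode(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("negative timestamps are not supported");
        }
        // Locale.ROOT avoids locale-specific digit characters.
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", millis);
    }

    static long decode(String text) {
        return Long.parseLong(text);
    }

    public static void main(String[] args) {
        String[] encoded = { encode(1000L), encode(999L), encode(20L) };
        Arrays.sort(encoded); // plain lexicographic sort
        for (String s : encoded) {
            System.out.println(s + " -> " + decode(s));
        }
    }
}
```

Without the padding, a string comparison would place "1000" before "999", which is the kind of surprise this encoding is meant to avoid.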
Now I would like to list the most recently updated pages. This is easily done with a search that sorts in reverse order on the field “updated”. Additionally, I want to keep only pages that are live, because otherwise the about.html page would be displayed twice.
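...
// Open a searcher on the index at indexPath
Searcher searcher = new IndexSearcher(indexPath);
// Match every document; the filter narrows the result afterwards
Query query = new MatchAllDocsQuery();
// Keep only documents whose live field is Y
Filter filter = new LiveFilter();
// Sort on "updated"; true means reverse order
Sort sort = new Sort("updated", true);
Hits hits = searcher.search(query, filter, sort);
...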
First, obtain a Lucene searcher for the search index. The query should match all documents in this case. The filter then keeps only documents whose live field equals Y, and the sort orders them by the field updated in reverse order. Be aware that the sort field (in this case “updated”) must not be tokenized.
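To make the intended result concrete, the same selection can be modelled in plain Java over an in-memory list: keep the rows whose live flag is Y, then order them by updated, newest first. This is only a sketch of the logic, not Lucene code; the classes PageDoc and LatestPages and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for one document of the index.
final class PageDoc {
    final int id;
    final String page;
    final long updated;
    final int version;
    final String live;

    PageDoc(int id, String page, long updated, int version, String live) {
        this.id = id;
        this.page = page;
        this.updated = updated;
        this.version = version;
        this.live = live;
    }
}

final class LatestPages {
    // Keeps only live documents and sorts them by "updated", newest first.
    static List<PageDoc> latestLive(List<PageDoc> docs) {
        List<PageDoc> result = new ArrayList<PageDoc>();
        for (PageDoc doc : docs) {
            if ("Y".equals(doc.live)) {
                result.add(doc);
            }
        }
        result.sort(Comparator.comparingLong((PageDoc d) -> d.updated).reversed());
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows: about.html exists in two versions, one of them live.
        List<PageDoc> docs = Arrays.asList(
            new PageDoc(1, "about.html", 100L, 1, "N"),
            new PageDoc(2, "about.html", 300L, 2, "Y"),
            new PageDoc(3, "index.html", 200L, 1, "Y"));
        for (PageDoc doc : latestLive(docs)) {
            System.out.println(doc.page + " (version " + doc.version + ")");
        }
    }
}
```

In this made-up data set, dropping the live check would list about.html once per stored version, which is exactly the duplication the filter is there to prevent.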
The LiveFilter class looks like this:
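import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.Filter;

// Permits only documents whose "live" field holds the term "Y".
public class LiveFilter extends Filter {
    public BitSet bits(IndexReader reader) throws IOException {
        // One bit per document number; a set bit lets the document through
        BitSet bitSet = new BitSet(reader.maxDoc());
        TermDocs docs = reader.termDocs(new Term("live", "Y"));
        try {
            while (docs.next()) {
                bitSet.set(docs.doc());
            }
        } finally {
            docs.close(); // release the underlying resources
        }
        return bitSet;
    }
}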
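The BitSet returned by bits holds one bit per document number, and a set bit means the document is permitted in the result. A plain-Java sketch of how such a bit set can be built and read back follows. It does not use Lucene; the class BitSetFilterSketch and the liveFlags array, a stand-in for the live value of each document indexed by document number, are hypothetical.

```java
import java.util.BitSet;

// Hypothetical illustration of the bit-per-document idea, without Lucene.
final class BitSetFilterSketch {
    // Sets the bit of every document number whose flag equals "Y".
    static BitSet liveBits(String[] liveFlags) {
        BitSet bits = new BitSet(liveFlags.length);
        for (int doc = 0; doc < liveFlags.length; doc++) {
            if ("Y".equals(liveFlags[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] liveFlags = { "N", "Y", "Y" };
        BitSet bits = liveBits(liveFlags);
        // Walk over the set bits only, i.e. the documents that pass.
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            System.out.println("document " + doc + " passes the filter");
        }
    }
}
```

The bit set costs one bit per document regardless of how many documents match, which keeps the check for a single document number cheap.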