Lucene custom scoring for numeric fields

In addition to standard term search with tf-idf similarity over a text content field, I would like scoring based on the "similarity" of numeric fields. This similarity would depend on the distance between the value in the query and the value in the document (e.g. a Gaussian with m = [user input], s = 0.5).
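Written out, assuming the unnormalized form (no 1/(s√(2π)) factor, so that the weight is exactly 1 when the document value x equals the query value m), the Gaussian weight and the s needed to reach a given weight r at distance d from m are:

```latex
\[ w(x) = \exp\!\left(-\frac{(x - m)^2}{2 s^2}\right), \qquad w(m) = 1 \]
\[ w(m \pm d) = r \;\Longrightarrow\; \frac{d^2}{2 s^2} = -\ln r \;\Longrightarrow\; s = \frac{d}{\sqrt{-2 \ln r}}, \qquad 0 < r < 1 \]
```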

For example, let's say documents represent people, and each person document has two fields:

  • description (full text)
  • age (numeric)

I want to run a query like

description:(x y z) age:30

but with age acting not as a filter but as part of the score (for a person aged 30 the multiplier would be 1.0, for a 25-year-old 0.8, etc.).
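A quick numeric check of the multipliers in this example, assuming the unnormalized Gaussian and age measured in years. It suggests that a 0.8 multiplier five years away from the queried age corresponds to s ≈ 7.48, whereas s = 0.5 makes the weight of a 25-year-old effectively zero (e^-50). The class and method names (`AgeGaussian`, `weight`, `sigmaFor`) are illustrative only, not part of Lucene.

```java
public class AgeGaussian {

    // Unnormalized Gaussian: exactly 1.0 when age == mean, decaying with distance.
    static double weight(double age, double mean, double sigma) {
        double d = age - mean;
        return Math.exp(-(d * d) / (2 * sigma * sigma));
    }

    // Sigma that makes weight(mean +/- distance) == target, for 0 < target < 1.
    static double sigmaFor(double distance, double target) {
        return Math.abs(distance) / Math.sqrt(-2 * Math.log(target));
    }

    public static void main(String[] args) {
        double mean = 30;
        double sigma = sigmaFor(5, 0.8); // roughly 7.48 years
        System.out.printf("sigma = %.2f%n", sigma);
        for (int age : new int[] {30, 25, 35, 20}) {
            System.out.printf("age %d -> %.3f%n", age, weight(age, mean, sigma));
        }
        // With sigma = 0.5 (in years) the weight collapses almost immediately.
        System.out.printf("sigma 0.5, age 25 -> %.3g%n", weight(25, mean, 0.5));
    }
}
```

Because the curve is symmetric, 25 and 35 get the same multiplier, and doubling the distance raises the multiplier to the fourth power (0.8 at 5 years, 0.8^4 ≈ 0.41 at 10 years).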

Can this be achieved in a sensible manner?

EDIT: I eventually found that this can be done by wrapping a ValueSourceQuery and a TermQuery in a CustomScoreQuery. See my solution below.
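In Lucene 3.0, `CustomScoreQuery` by default multiplies the sub-query score by the value-source score; subclasses can override its `customScore` method, so the Gaussian would presumably be applied there to the raw age value supplied by a `ValueSourceQuery` over the age field. The sketch below does not call Lucene at all: it only reproduces that arithmetic over made-up hits (the `Hit` class, the `rank` helper, and all scores are hypothetical), assuming a queried age of 30 and s = 7.5 years, to show how the age weight reorders results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CombinedScoreSketch {

    // Hypothetical stand-in for a matching document: its text (tf-idf) score and its age value.
    static final class Hit {
        final String id;
        final float textScore;
        final float age;

        Hit(String id, float textScore, float age) {
            this.id = id;
            this.textScore = textScore;
            this.age = age;
        }
    }

    // Same shape as a customScore override: the text score is multiplied by a
    // Gaussian of the raw age value (what the value-source part would supply).
    static float customScore(float subQueryScore, float valSrcScore, float mean, float sigma) {
        float d = valSrcScore - mean;
        float gauss = (float) Math.exp(-(d * d) / (2 * sigma * sigma));
        return subQueryScore * gauss;
    }

    // Sorts hits by combined score, highest first.
    static List<Hit> rank(List<Hit> hits, float mean, float sigma) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(
                (Hit h) -> customScore(h.textScore, h.age, mean, sigma)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("a", 1.0f, 45),  // best text match, but far from the queried age
                new Hit("b", 0.7f, 30),  // weaker text match, exact age
                new Hit("c", 0.9f, 27)); // good text match, close age
        for (Hit h : rank(hits, 30f, 7.5f)) {
            System.out.printf("%s: text=%.2f age=%.0f combined=%.3f%n",
                    h.id, h.textScore, h.age, customScore(h.textScore, h.age, 30f, 7.5f));
        }
    }
}
```

With these numbers, c (0.9 × about 0.92) outranks b (0.7 × 1.0), and a drops to last despite the best text score (1.0 × e^-2 ≈ 0.14). Because the combination is a product, the age weight only reorders documents that already match the text part.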

EDIT 2: Since Lucene versions change quickly, note that this was tested on Lucene 3.0 (Java).

asked by jakub.g, 24 March 2014 at 19:40