1: Implementing a filter in Lucene 3.6
Taking spatial search as an example, the code is as follows:
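// Imports required at the top of the enclosing source file:
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;

// Declared inside an enclosing class, hence "static".
public static final class DistanceFilter extends Filter
{
    private static final long serialVersionUID = 1L;
    // Not shown in the original listing. Assumed here: equatorial Earth radius
    // in kilometres, which makes Radius a distance in kilometres as well.
    private static final double EARTH_RADIUS = 6378.137;
    private float Radius; // search radius, same unit as EARTH_RADIUS
    private double x;     // latitude of the search centre, in degrees
    private double y;     // longitude of the search centre, in degrees

    // location has the form "latitude,longitude"
    public DistanceFilter(String location, float radius)
    {
        this.Radius = radius;
        String[] parts = location.split(",");
        this.x = Double.parseDouble(parts[0]);
        this.y = Double.parseDouble(parts[1]);
    }

    // Converts degrees to radians.
    private static double rad(double d)
    {
        return d * Math.PI / 180.0;
    }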
    // Marks every "restaurant" document that lies within Radius of (x, y).
    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException
    {
        OpenBitSet result = new OpenBitSet(reader.maxDoc());
        TermDocs td = reader.termDocs(new Term("type", "restaurant"));
        try
        {
            while (td.next())
            {
                // Loading the stored document for every hit is simple but slow.
                Document doc = reader.document(td.doc());
                String field = doc.get("location");
                if (field == null)
                {
                    continue; // no stored location, so the document cannot match
                }
                String[] loca = field.split(",");
                double destination_x = Double.parseDouble(loca[0]);
                double destination_y = Double.parseDouble(loca[1]);
                if (getDistance(x, y, destination_x, destination_y, Radius))
                {
                    result.set(td.doc());
                }
            }
        }
        finally
        {
            td.close(); // release the term-document enumeration
        }
        return result;
    }
    // Returns true when the great-circle (haversine) distance between (x, y)
    // and (dx, dy) is at most radius. x and dx are latitudes, y and dy are
    // longitudes, all in degrees. EARTH_RADIUS is a class constant expressed
    // in the same unit as radius.
    public boolean getDistance(double x, double y, double dx, double dy, float radius)
    {
        double lat1 = rad(x);
        double lon1 = rad(y);
        double lat2 = rad(dx);
        double lon2 = rad(dy);
        double a = lat1 - lat2;
        double b = lon1 - lon2;
        double s = 2 * Math.asin(Math.sqrt(Math.pow(Math.sin(a / 2), 2)
                + Math.cos(lat1) * Math.cos(lat2) * Math.pow(Math.sin(b / 2), 2)));
        s = s * EARTH_RADIUS;
        return s <= radius;
    }
}
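In Lucene 3.6 the TermDocs interface is very useful: obtained directly from an IndexReader, it enumerates every document that contains a given term, and the filter is built around it.
getDocIdSet(IndexReader reader) is the method inherited from Filter that every subclass must implement. It returns a DocIdSet: whenever a document satisfies the condition, result.set marks it as one to be returned, and unmarked documents are left out, which is how the filtering works.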
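The marking logic described above can be tried without a Lucene index. What follows is a minimal sketch rather than the author's code: java.util.BitSet stands in for OpenBitSet, a plain array of "latitude,longitude" strings stands in for the stored documents, and the Earth radius of 6378.137 km is an assumption, since the original listing does not show the value of EARTH_RADIUS. The class name DistanceSketch and its method names are hypothetical.

```java
import java.util.BitSet;

// Stand-alone sketch of the filter's core idea; no Lucene classes are used.
class DistanceSketch {
    // Assumption: equatorial Earth radius in kilometres.
    static final double EARTH_RADIUS_KM = 6378.137;

    // Haversine great-circle distance between two points given in degrees.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1);
        double p2 = Math.toRadians(lat2);
        double dLat = p1 - p2;
        double dLon = Math.toRadians(lon1) - Math.toRadians(lon2);
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(p1) * Math.cos(p2) * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * Math.asin(Math.sqrt(h)) * EARTH_RADIUS_KM;
    }

    // Sets bit i when locations[i] ("lat,lon") lies within radiusKm of the centre.
    static BitSet mark(String[] locations, double lat, double lon, double radiusKm) {
        BitSet result = new BitSet(locations.length);
        for (int docId = 0; docId < locations.length; docId++) {
            String[] parts = locations[docId].split(",");
            double docLat = Double.parseDouble(parts[0]);
            double docLon = Double.parseDouble(parts[1]);
            if (distanceKm(lat, lon, docLat, docLon) <= radiusKm) {
                result.set(docId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] locations = { "0,0", "0,1", "0,90" };
        // One degree of longitude on the equator is roughly 111 km, so only
        // the first two entries fall inside a 200 km radius.
        System.out.println(mark(locations, 0, 0, 200)); // prints {0, 1}
    }
}
```

The set bits play the same role as the documents marked by result.set in the filter: only those positions would be returned to the searcher.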
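2: Implementing a filter in Lucene 4.0
In Lucene 4.0 TermDocs was removed and other classes were provided in its place: specifically, TermsEnum/FieldsEnum/DocsEnum replace TermDocs/TermEnum/TermPositions. As a result, implementing a filter takes slightly more work. The code is as follows: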
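// Additional imports for Lucene 4.0: org.apache.lucene.index.AtomicReaderContext,
// org.apache.lucene.index.DocsEnum, org.apache.lucene.util.Bits, org.apache.lucene.util.BytesRef
@Override
public DocIdSet getDocIdSet(AtomicReaderContext context, Bits livedocs) throws IOException
{
    // Called once per index segment; document IDs are relative to that segment.
    OpenBitSet result = new OpenBitSet(context.reader().maxDoc());
    BytesRef term = new BytesRef("restaurant");
    // The last argument (needsFreqs = false) follows the original listing; the
    // available overloads depend on the exact 4.0 release, so check your version.
    DocsEnum docEnum = context.reader().termDocsEnum(livedocs, "type", term, false);
    if (docEnum == null)
    {
        return result; // the field or term does not exist in this segment
    }
    int document;
    while ((document = docEnum.nextDoc()) != DocsEnum.NO_MORE_DOCS)
    {
        Document doc = context.reader().document(document);
        // Unlike the 3.6 example, the coordinates are stored in two separate fields.
        String xfield = doc.get("X");
        String yfield = doc.get("Y");
        if (xfield == null || yfield == null)
        {
            continue; // no stored coordinates, so the document cannot match
        }
        double destination_x = Double.parseDouble(xfield);
        double destination_y = Double.parseDouble(yfield);
        if (getDistance(x, y, destination_x, destination_y, Radius))
        {
            result.set(document);
        }
    }
    return result;
}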
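The code shared with the Lucene 3.6 version is omitted here. Implementing a filter looks somewhat more involved in 4.0 than in 3.6.
The main differences are that IndexReader is replaced by AtomicReaderContext and TermDocs by DocsEnum; see the code above for the details.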
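The change in iteration style between the two versions can also be sketched without Lucene. In 3.6, TermDocs exposes a boolean next() together with doc(); in 4.0, DocsEnum.nextDoc() returns the document ID itself and signals the end with the sentinel NO_MORE_DOCS. The classes below are hypothetical stand-ins written purely for illustration and are not Lucene classes; the sentinel value Integer.MAX_VALUE mirrors Lucene's DocIdSetIterator.NO_MORE_DOCS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that mimic the two iteration styles; not Lucene classes.
class IterationSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // 3.6 style: next() advances and reports success, doc() reads the current ID.
    static final class OldStyleDocs {
        private final int[] docs;
        private int pos = -1;
        OldStyleDocs(int... docs) { this.docs = docs; }
        boolean next() { return ++pos < docs.length; }
        int doc() { return docs[pos]; }
    }

    // 4.0 style: nextDoc() returns the ID, or NO_MORE_DOCS once exhausted.
    static final class NewStyleDocs {
        private final int[] docs;
        private int pos = -1;
        NewStyleDocs(int... docs) { this.docs = docs; }
        int nextDoc() { return ++pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
    }

    // Loop shape used with TermDocs in the 3.6 filter.
    static List<Integer> collectOld(OldStyleDocs td) {
        List<Integer> out = new ArrayList<>();
        while (td.next()) {
            out.add(td.doc());
        }
        return out;
    }

    // Loop shape used with DocsEnum in the 4.0 filter.
    static List<Integer> collectNew(NewStyleDocs docsEnum) {
        List<Integer> out = new ArrayList<>();
        int doc;
        while ((doc = docsEnum.nextDoc()) != NO_MORE_DOCS) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectOld(new OldStyleDocs(2, 5, 9))); // prints [2, 5, 9]
        System.out.println(collectNew(new NewStyleDocs(2, 5, 9))); // prints [2, 5, 9]
    }
}
```

Both loops visit the same documents; the 4.0 form simply folds "advance" and "read" into a single call.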