First, let's look at a test method:
@Test
public void searchSimilar() throws Exception {
    IndexReader ir = IndexReader.open(FSDirectory.open(new File(INDEX_PATH)));// open the index
    ImageSearcher is = ImageSearcherFactory.createDefaultSearcher();// create an image searcher
    FileInputStream fis = new FileInputStream(SEARCH_FILE);// the query image file
    BufferedImage bi = ImageIO.read(fis);
    fis.close();// ImageIO.read(InputStream) does not close the stream itself
    ImageSearchHits ish = is.search(bi, ir);// search for images similar to the query image
    for (int i = 0; i < Math.min(10, ish.length()); i++) {// print the top 10 hits (sorted by score)
        System.out.println(ish.score(i) + ": " + ish.doc(i).getFieldable(DocumentBuilder.FIELD_NAME_IDENTIFIER).stringValue());
    }
    System.out.println("Separator*****************");
    Document d = ish.doc(0);// the best-matching hit
    ish = is.search(d, ir);// use that hit as a new query against the whole index
    for (int i = 0; i < Math.min(4, ish.length()); i++) {
        System.out.println(ish.score(i) + ": " + ish.doc(i).getFieldable(DocumentBuilder.FIELD_NAME_IDENTIFIER).stringValue());
    }
    ir.close();
}
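Hovering the mouse over search in is.search(bi, ir) shows that it calls search(BufferedImage arg0, IndexReader arg1), but when you try to open its implementation there are many candidates (see the figure).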
So which one is actually used? Look at line 4 of the test:
ImageSearcher is = ImageSearcherFactory.createDefaultSearcher();// create an image searcher
It calls createDefaultSearcher(). This is that method in the LIRE source:
/**
 * Returns a new default ImageSearcher with a predefined number of maximum
 * hits defined in the {@link ImageSearcherFactory#NUM_MAX_HITS} based on the {@link net.semanticmetadata.lire.imageanalysis.CEDD} feature
 *
 * @return the searcher instance
 */
public static ImageSearcher createDefaultSearcher() {
    return new GenericFastImageSearcher(NUM_MAX_HITS, CEDD.class, DocumentBuilder.FIELD_NAME_CEDD);
}
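createDefaultSearcher() just hands three things to a generic searcher: a hit limit, the feature class (CEDD.class) and the index field that stores that feature. Later, search() creates the feature object with descriptorClass.newInstance(). The sketch below imitates that pattern in plain Java. The Feature interface, the ToyCedd class, the field name and the hit limit of 100 are all hypothetical and made up for illustration. This is not LIRE's code.

```java
public class SearcherFactorySketch {
    // Hypothetical stand-in for a LIRE feature such as CEDD (not the real interface).
    public interface Feature {
        String name();
    }

    // Hypothetical feature class; reflection needs an accessible no-arg constructor.
    public static class ToyCedd implements Feature {
        public ToyCedd() {}
        public String name() { return "ToyCedd"; }
    }

    // Hypothetical generic searcher: remembers which feature class and which
    // index field to use, and creates a fresh feature object per query.
    public static class GenericSearcher {
        final int maxHits;
        final Class<? extends Feature> descriptorClass;
        final String fieldName;

        GenericSearcher(int maxHits, Class<? extends Feature> descriptorClass, String fieldName) {
            this.maxHits = maxHits;
            this.descriptorClass = descriptorClass;
            this.fieldName = fieldName;
        }

        Feature newFeature() throws ReflectiveOperationException {
            // Same idea as descriptorClass.newInstance(), using the non-deprecated form.
            return descriptorClass.getDeclaredConstructor().newInstance();
        }
    }

    static final int NUM_MAX_HITS = 100; // hypothetical hit limit

    public static GenericSearcher createDefaultSearcher() {
        return new GenericSearcher(NUM_MAX_HITS, ToyCedd.class, "hypothetical_cedd_field");
    }

    public static void main(String[] args) throws Exception {
        GenericSearcher s = createDefaultSearcher();
        System.out.println(s.newFeature().name() + " via field " + s.fieldName);
    }
}
```

Swapping the feature only means passing a different class and field name. The search code itself does not change, which is why the IDE shows so many search implementations.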
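So the default searcher is a GenericFastImageSearcher configured with the CEDD feature and the CEDD index field. Here is the matching search method in that class: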
public ImageSearchHits search(BufferedImage image, IndexReader reader) throws IOException {
    logger.finer("Starting extraction.");
    LireFeature lireFeature = null;
    SimpleImageSearchHits searchHits = null;
    try {
        lireFeature = (LireFeature) descriptorClass.newInstance();
        // Scaling image is especially with the correlogram features very important!
        BufferedImage bimg = image;
        if (Math.max(image.getHeight(), image.getWidth()) > GenericDocumentBuilder.MAX_IMAGE_DIMENSION) {
            bimg = ImageUtils.scaleImage(image, GenericDocumentBuilder.MAX_IMAGE_DIMENSION);
        }
        lireFeature.extract(bimg);
        logger.fine("Extraction from image finished");
        float maxDistance = findSimilar(reader, lireFeature);
        searchHits = new SimpleImageSearchHits(this.docs, maxDistance);
    } catch (InstantiationException e) {
        logger.log(Level.SEVERE, "Error instantiating class for generic image searcher: " + e.getMessage());
    } catch (IllegalAccessException e) {
        logger.log(Level.SEVERE, "Error instantiating class for generic image searcher: " + e.getMessage());
    }
    return searchHits;
}
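The if statement downscales the query image when its longer side exceeds GenericDocumentBuilder.MAX_IMAGE_DIMENSION (1024 by default).
Next, extract() computes the image's feature vector.
Then findSimilar() scans the index for the most similar images.
Then a SimpleImageSearchHits object is created to hold the results.
Finally, the results are returned. If the feature class cannot be instantiated, the exceptions are only logged and null is returned.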
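The downscaling step can be sketched with plain java.awt. The helper name scaleToMax and the choice of bilinear interpolation are my own assumptions for illustration. They are not necessarily what LIRE's ImageUtils.scaleImage does. The sketch keeps the aspect ratio and returns images that already fit unchanged, like the if check in search().

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class ScaleSketch {
    // Hypothetical helper: shrink so the longer side equals maxSide, keeping the aspect ratio.
    // Images whose longer side is already <= maxSide are returned unchanged.
    public static BufferedImage scaleToMax(BufferedImage img, int maxSide) {
        int w = img.getWidth(), h = img.getHeight();
        int longer = Math.max(w, h);
        if (longer <= maxSide) {
            return img;
        }
        double factor = (double) maxSide / longer;
        int nw = Math.max(1, (int) Math.round(w * factor));
        int nh = Math.max(1, (int) Math.round(h * factor));
        BufferedImage out = new BufferedImage(nw, nh, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(img, 0, 0, nw, nh, null); // draw the source scaled into the smaller canvas
        g.dispose();
        return out;
    }

    public static void main(String[] args) {
        BufferedImage big = new BufferedImage(2048, 1536, BufferedImage.TYPE_INT_RGB);
        BufferedImage small = scaleToMax(big, 1024);
        System.out.println(small.getWidth() + "x" + small.getHeight()); // 1024x768
    }
}
```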
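As for extract(), I looked at the CEDD implementation. It is a fairly involved algorithm. For the details, look up the CEDD (Color and Edge Directivity Descriptor) algorithm.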
Here is the code of findSimilar():
/**
 * @param reader
 * @param lireFeature
 * @return the maximum distance found for normalizing.
 * @throws java.io.IOException
 */
protected float findSimilar(IndexReader reader, LireFeature lireFeature) throws IOException {
    float maxDistance = -1f, overallMaxDistance = -1f;
    boolean hasDeletions = reader.hasDeletions();
    // clear result set ...
    docs.clear();
    // iterate over all document IDs (0 .. maxDoc()-1); numDocs() only counts live documents,
    // so bounding the loop by it would skip the last IDs whenever there are deletions
    int maxDoc = reader.maxDoc();
    for (int i = 0; i < maxDoc; i++) {
        // bugfix by Roman Kern
        if (hasDeletions && reader.isDeleted(i)) {
            continue;
        }
        Document d = reader.document(i);
        float distance = getDistance(d, lireFeature);
        assert (distance >= 0);
        // calculate the overall max distance to normalize score afterwards
        if (overallMaxDistance < distance) {
            overallMaxDistance = distance;
        }
        // if it is the first document:
        if (maxDistance < 0) {
            maxDistance = distance;
        }
        // if the array is not full yet:
        if (this.docs.size() < maxHits) {
            this.docs.add(new SimpleResult(distance, d));
            if (distance > maxDistance) maxDistance = distance;
        } else if (distance < maxDistance) {
            // if it is nearer to the sample than at least one of the current set:
            // remove the last one ...
            this.docs.remove(this.docs.last());
            // add the new one ...
            this.docs.add(new SimpleResult(distance, d));
            // and set our new distance border ...
            maxDistance = this.docs.last().getDistance();
        }
    }
    return maxDistance;
}
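The bookkeeping in findSimilar() is a bounded top-k selection. It fills the set up to maxHits, then replaces the current worst entry whenever a closer document shows up. Below is a minimal standalone sketch that works on plain float distances instead of Lucene documents. The Result class and its docId tie-breaker are hypothetical. The tie-breaker matters because a TreeSet treats elements whose compareTo returns 0 as duplicates, so two documents with the same distance would otherwise collapse into one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class TopKSketch {
    // Hypothetical result type: sorted by distance, ties broken by document id.
    public static final class Result implements Comparable<Result> {
        final float distance;
        final int docId;

        Result(float distance, int docId) {
            this.distance = distance;
            this.docId = docId;
        }

        @Override
        public int compareTo(Result o) {
            int c = Float.compare(distance, o.distance);
            return c != 0 ? c : Integer.compare(docId, o.docId);
        }
    }

    // Keeps the maxHits smallest distances, with the same logic as findSimilar():
    // fill the set first, then evict the farthest kept entry when a closer one arrives.
    public static List<Result> topK(float[] distances, int maxHits) {
        TreeSet<Result> docs = new TreeSet<>();
        float maxDistance = -1f; // current "border": largest distance in the set
        for (int i = 0; i < distances.length; i++) {
            float d = distances[i];
            if (docs.size() < maxHits) {
                docs.add(new Result(d, i));
                if (d > maxDistance) maxDistance = d;
            } else if (d < maxDistance) {
                docs.pollLast();                  // drop the farthest kept result
                docs.add(new Result(d, i));
                maxDistance = docs.last().distance; // new border
            }
        }
        return new ArrayList<>(docs); // ascending by distance
    }

    public static void main(String[] args) {
        float[] dist = {5f, 1f, 9f, 3f, 7f, 2f};
        for (Result r : topK(dist, 3)) {
            System.out.println(r.docId + " -> " + r.distance);
        }
    }
}
```

With the distances {5, 1, 9, 3, 7, 2} and maxHits = 3, the set ends up holding documents 1, 5 and 3 (distances 1.0, 2.0, 3.0). Only one comparison against the border is needed per document, so the scan stays linear in the index size.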
According to the Lucene API docs, hasDeletions() "Returns true if any documents have been deleted". When it does, isDeleted(i) is used to skip the deleted documents.
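Why the loop bound matters: as far as I understand the Lucene 3.x IndexReader API, document IDs run from 0 to maxDoc() - 1, while numDocs() counts only non-deleted documents. When hasDeletions() is true, a loop bounded by numDocs() therefore stops before the last IDs. The toy model below shows the difference. A hypothetical boolean array stands in for reader.isDeleted(i).

```java
public class DeletionScanSketch {
    // Hypothetical index model: deleted[i] == true means doc ID i was deleted.
    // maxDoc = deleted.length (IDs 0..maxDoc-1); numDocs = number of live docs.
    public static int countVisited(boolean[] deleted, int upperBound) {
        int visited = 0;
        for (int i = 0; i < upperBound; i++) {
            if (deleted[i]) continue; // plays the role of reader.isDeleted(i)
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        boolean[] deleted = {false, true, false, false, true, false}; // maxDoc = 6
        int numDocs = 0;
        for (boolean d : deleted) if (!d) numDocs++; // 4 live documents
        System.out.println("loop to numDocs visits " + countVisited(deleted, numDocs));        // 3
        System.out.println("loop to maxDoc visits " + countVisited(deleted, deleted.length)); // 4
    }
}
```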
Inside the loop, getDistance() computes the distance between the query feature and the feature stored in each document:
/**
 * Main similarity method called for each and every document in the index.
 *
 * @param document
 * @param lireFeature
 * @return the distance between the given feature and the feature stored in the document.
 */
protected float getDistance(Document document, LireFeature lireFeature) {
    tempBinaryValue = document.getBinaryValue(fieldName);
    if (tempBinaryValue != null && tempBinaryValue.length > 0) {
        cachedInstance.setByteArrayRepresentation(tempBinaryValue);
        return lireFeature.getDistance(cachedInstance);
    } else {
        logger.warning("No feature stored in this document! (" + descriptorClass.getName() + ")");
    }
    return 0f;
}
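The stored bytes are first loaded into cachedInstance. Then lireFeature.getDistance(cachedInstance) is called, which is the feature's own distance function. For the default searcher you can jump straight to that method in the CEDD class to see how the value is computed.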
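For intuition about what happens to the bytes stored in the index, here is a sketch that treats a feature as a histogram of doubles. It is serialized with java.nio.ByteBuffer and compared with the Tanimoto coefficient. CEDD descriptors are commonly compared with the Tanimoto coefficient, but I have not checked the exact formula, quantization or scaling in LIRE's CEDD.getDistance(). The class and method names and the 1 - T form are therefore assumptions.

```java
import java.nio.ByteBuffer;

public class TanimotoSketch {
    // Hypothetical serialization, analogous in spirit to a feature's byte-array representation.
    public static byte[] toBytes(double[] hist) {
        ByteBuffer buf = ByteBuffer.allocate(hist.length * Double.BYTES);
        for (double v : hist) buf.putDouble(v);
        return buf.array();
    }

    // Hypothetical inverse, playing the role of setByteArrayRepresentation().
    public static double[] fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        double[] hist = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < hist.length; i++) hist[i] = buf.getDouble();
        return hist;
    }

    // Tanimoto distance: 1 - (x.y) / (x.x + y.y - x.y); 0 means identical histograms.
    public static double tanimotoDistance(double[] x, double[] y) {
        double xy = 0, xx = 0, yy = 0;
        for (int i = 0; i < x.length; i++) {
            xy += x[i] * y[i];
            xx += x[i] * x[i];
            yy += y[i] * y[i];
        }
        double denom = xx + yy - xy;
        if (denom == 0) return 0; // both vectors are all zeros
        return 1 - xy / denom;
    }

    public static void main(String[] args) {
        double[] query = {2, 0, 1, 3};
        double[] stored = fromBytes(toBytes(new double[]{2, 1, 1, 2}));
        System.out.println(tanimotoDistance(query, query)); // 0.0
        System.out.println(tanimotoDistance(query, stored));
    }
}
```

For the two example histograms, x.y = 11, x.x = 14 and y.y = 10, so the distance is 1 - 11/13 = 2/13 (about 0.154).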
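findSimilar() then compares these distances and keeps only the maxHits closest documents.
Back in search(), the results are wrapped by the SimpleImageSearchHits constructor: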
public SimpleImageSearchHits(Collection<SimpleResult> results, float maxDistance) {
    this.results = new ArrayList<SimpleResult>(results.size());
    this.results.addAll(results);
    // this step normalizes and inverts the distance ...
    // although its now a score or similarity like measure its further called distance
    for (Iterator<SimpleResult> iterator = this.results.iterator(); iterator.hasNext(); ) {
        SimpleResult result = iterator.next();
        result.setDistance(1f - result.getDistance() / maxDistance);
    }
}
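A worked example of the normalization in that constructor: with distances 1, 2 and 4 and maxDistance = 4, the scores become 0.75, 0.5 and 0.0. The closest hit scores highest and the farthest kept hit scores 0. The helper below is a hypothetical sketch of just this formula.

```java
import java.util.ArrayList;
import java.util.List;

public class NormalizeSketch {
    // Same formula as the SimpleImageSearchHits constructor: score = 1 - distance / maxDistance.
    public static List<Float> toScores(List<Float> distances, float maxDistance) {
        List<Float> scores = new ArrayList<>(distances.size());
        for (float d : distances) {
            scores.add(1f - d / maxDistance);
        }
        return scores;
    }

    public static void main(String[] args) {
        List<Float> distances = List.of(1f, 2f, 4f); // sorted ascending, maxDistance = 4
        System.out.println(toScores(distances, 4f)); // [0.75, 0.5, 0.0]
    }
}
```

One edge case follows directly from the formula. If every kept hit has distance 0, maxDistance is 0, and 0f / 0f yields NaN scores.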
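That completes one search call. There is still a lot I don't understand; I just wanted to walk through the whole flow first. Pointers from anyone who knows this code are very welcome.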