PyLucene in Practice

fan_hai_ping

Category: Programming Basics. Tags: lucene, python, query, java, file, import

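Copyright notice: this is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.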

Article link: https://blog.csdn.net/fan_hai_ping/article/details/7966461

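PyLucene is a Python wrapper around Java Lucene. Its goal is to let Python programs use Lucene's text indexing and search capabilities. It is compatible with the latest version of Java Lucene. PyLucene embeds a Java VM with Lucene inside the Python process. More details about PyLucene are available at http://lucene.apache.org/pylucene/.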

This article describes how to build a search index with PyLucene and how to query it. For Lucene 3.0 installation instructions, see the earlier article.

The PyLucene installation packages for Win32 can be found at the following address:

http://code.google.com/a/apache-extras.org/p/pylucene-extra/downloads/list

Note: PyLucene requires the Java SDK to be installed.

1. Creating an index with PyLucene

The following code creates an index with PyLucene:

#!/usr/bin/env python
"""
Example of indexing with PyLucene 3.0
"""

import glob
import os

import lucene
from lucene import SimpleFSDirectory, File, Document, Field, \
    StandardAnalyzer, IndexWriter, Version


def luceneIndexer(docdir, indir):
    """
    Index the *.txt documents found in a directory.

    docdir: directory holding the documents to index
    indir:  directory where the index is stored
    """
    # Start the embedded Java VM before using any Lucene class.
    lucene.initVM()
    indexdir = SimpleFSDirectory(File(indir))
    analyzer = StandardAnalyzer(Version.LUCENE_30)
    # True creates a new index, overwriting any existing one;
    # at most 512 terms are indexed per field.
    index_writer = IndexWriter(indexdir, analyzer, True,
                               IndexWriter.MaxFieldLength(512))
    for tfile in glob.glob(os.path.join(docdir, '*.txt')):
        print("Indexing: %s" % tfile)
        document = Document()
        with open(tfile, 'r') as f:
            content = f.read()
        # Store the full text and index it through the analyzer.
        document.add(Field("text", content, Field.Store.YES,
                           Field.Index.ANALYZED))
        index_writer.addDocument(document)
        print("Done: %s" % tfile)
    # Merge index segments, report the document count, then release the index.
    index_writer.optimize()
    print(index_writer.numDocs())
    index_writer.close()

You must pass two arguments to the luceneIndexer() function:

1) the path of the directory that holds the documents to be indexed;

2) the path of the directory where the index is stored.
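
One way these two arguments might be supplied is from the command line. The sketch below is only an illustration: parse_indexer_args is a hypothetical helper that does not appear in the original script, and the call to luceneIndexer() is shown as a comment because it needs a working PyLucene installation.

```python
import argparse


def parse_indexer_args(argv):
    """Return (docdir, indir) parsed from a list of command-line arguments.

    Hypothetical helper, not part of the original script.
    """
    parser = argparse.ArgumentParser(
        description="Index the *.txt files of a directory with PyLucene")
    parser.add_argument("docdir",
                        help="directory holding the documents to index")
    parser.add_argument("indir",
                        help="directory where the index is stored")
    args = parser.parse_args(argv)
    return args.docdir, args.indir


# Intended use at the bottom of the indexing script (requires PyLucene):
#     import sys
#     docdir, indir = parse_indexer_args(sys.argv[1:])
#     luceneIndexer(docdir, indir)
```

With this in place the script could be run as, for example, `python indexer.py ./docs ./MyIndex`, where the script name and both paths are placeholders.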

2. Querying with PyLucene

The following code queries an index created with PyLucene:

#!/usr/bin/env python
"""
PyLucene retriever simple example
"""

import lucene
from lucene import SimpleFSDirectory, File, \
    StandardAnalyzer, IndexSearcher, Version, QueryParser

INDEXDIR = "./MyIndex"


def luceneRetriver(query):
    # Start the embedded Java VM before using any Lucene class.
    lucene.initVM()
    indir = SimpleFSDirectory(File(INDEXDIR))
    lucene_analyzer = StandardAnalyzer(Version.LUCENE_30)
    lucene_searcher = IndexSearcher(indir)
    # Parse the query against the "text" field, using the same
    # analyzer type that was used for indexing.
    my_query = QueryParser(Version.LUCENE_30, "text",
                           lucene_analyzer).parse(query)
    MAX = 1000  # maximum number of hits to return
    total_hits = lucene_searcher.search(my_query, MAX)
    print("Hits: %s" % total_hits.totalHits)
    for hit in total_hits.scoreDocs:
        print("Hit Score: %s Hit Doc: %s Hit String: %s"
              % (hit.score, hit.doc, hit.toString()))
        # Load the stored document and print its text.
        doc = lucene_searcher.doc(hit.doc)
        print(doc.get("text").encode("utf-8"))


luceneRetriver("really cool restaurant")

In this code the index directory is hard-coded as INDEXDIR = "./MyIndex". You could instead read the index directory from the command-line arguments (sys.argv).

When calling luceneRetriver(), you must pass a query as its argument.
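
A minimal sketch of reading both the index directory and the query from the command line is shown here. It rests on some assumptions: parse_retriever_args and the --index-dir option are hypothetical names that do not appear in the original script, and the default directory simply mirrors the hard-coded "./MyIndex".

```python
import argparse

DEFAULT_INDEXDIR = "./MyIndex"  # mirrors the hard-coded INDEXDIR value


def parse_retriever_args(argv):
    """Return (index_dir, query) parsed from a list of command-line arguments.

    Hypothetical helper, not part of the original script.
    """
    parser = argparse.ArgumentParser(
        description="Query an index created with PyLucene")
    parser.add_argument("--index-dir", default=DEFAULT_INDEXDIR,
                        help="directory where the index is stored")
    parser.add_argument("query", nargs="+",
                        help="query words, joined with single spaces")
    args = parser.parse_args(argv)
    return args.index_dir, " ".join(args.query)


# Intended use at the bottom of the query script (requires PyLucene):
#     import sys
#     INDEXDIR, query = parse_retriever_args(sys.argv[1:])
#     luceneRetriver(query)
```

Because luceneRetriver() reads the module-level INDEXDIR, assigning the parsed directory to that name before the call is enough here. Passing the directory as a second function parameter would be a cleaner alternative.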
