Zend Framework: Zend_Search_Lucene Examples

// Create fields with different characteristics
    $doc = new Zend_Search_Lucene_Document();
    // Field is not tokenized, but is indexed and stored within the index.
    // Stored fields can be retrieved from the index.
    $doc->addField(Zend_Search_Lucene_Field::Keyword('doctype', 'autogenerated'));
    // Field is not tokenized nor indexed, but is stored in the index.
    $doc->addField(Zend_Search_Lucene_Field::UnIndexed('created', time()));
    // Binary String valued Field that is not tokenized nor indexed,
    // but is stored in the index.
    $doc->addField(Zend_Search_Lucene_Field::Binary('icon', $iconData));
    // Field is tokenized and indexed, and is stored in the index.
    $doc->addField(Zend_Search_Lucene_Field::Text('annotation', 'Document annotation text'));
    // Field is tokenized and indexed, but is not stored in the index.
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', 'My document content'));

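Keyword fields are stored and indexed, so they can be searched and also displayed in search results. They are not split into separate words by tokenization. Enumerated database fields usually translate well to Keyword fields in Zend_Search_Lucene.

UnIndexed fields are not searchable, but they are returned with search hits. Database timestamps, primary keys, file system paths, and other identifiers are good candidates for UnIndexed fields.

Binary fields are neither tokenized nor indexed, but they are stored so they can be returned with search hits. They can hold any data encoded as a binary string, such as an icon.

Text fields are stored, indexed, and tokenized. They suit information such as subjects and titles, which must be searchable and also returned with search results.

UnStored fields are tokenized and indexed, but not stored in the index. Large amounts of text are best indexed with this field type. Storing data makes the index larger on disk, so if you need to search the data but not display it in search results, use an UnStored field. UnStored fields work best when a Zend_Search_Lucene index is combined with a relational database: you index the large data fields as UnStored fields for searching, and fetch the data itself from the relational database using a separate field that acts as an identifier.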

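Table 27.1. Zend_Search_Lucene_Field types

Field type   Stored   Indexed   Tokenized   Binary
Keyword      Yes      Yes       No          No
UnIndexed    Yes      No        No          No
Binary       Yes      No        No          Yes
Text         Yes      Yes       Yes         No
UnStored     No       Yes       Yes         No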
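
The field-type properties above can be written down as a small lookup, which makes it easy to check which type fits a given requirement. This is a plain-PHP sketch that does not call Zend_Search_Lucene; the `$fieldTypes` array and the `chooseFieldType()` helper are hypothetical names introduced only for illustration.

```php
<?php
// Hypothetical lookup table mirroring the field-type properties described above.
// It is not part of Zend_Search_Lucene.
$fieldTypes = [
    'Keyword'   => ['stored' => true,  'indexed' => true,  'tokenized' => false, 'binary' => false],
    'UnIndexed' => ['stored' => true,  'indexed' => false, 'tokenized' => false, 'binary' => false],
    'Binary'    => ['stored' => true,  'indexed' => false, 'tokenized' => false, 'binary' => true],
    'Text'      => ['stored' => true,  'indexed' => true,  'tokenized' => true,  'binary' => false],
    'UnStored'  => ['stored' => false, 'indexed' => true,  'tokenized' => true,  'binary' => false],
];

// Return the name of the field type with exactly the requested properties,
// or null when no type offers that combination.
function chooseFieldType(array $fieldTypes, bool $stored, bool $indexed, bool $tokenized, bool $binary = false): ?string
{
    $wanted = ['stored' => $stored, 'indexed' => $indexed, 'tokenized' => $tokenized, 'binary' => $binary];
    foreach ($fieldTypes as $name => $flags) {
        if ($flags === $wanted) {
            return $name;
        }
    }
    return null;
}

// Large body text that must be searchable but never displayed:
echo chooseFieldType($fieldTypes, false, true, true), "\n";
```

The returned name corresponds to the static factory method of the same name on Zend_Search_Lucene_Field, as used in the snippets in this post.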

   
    // Create an index
    // Setting the second argument to TRUE creates a new index
    $index = new Zend_Search_Lucene('/data/my-index', true);
    $doc = new Zend_Search_Lucene_Document();
    // Store document URL to identify it in search result.
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $docUrl));
    // Index document content
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $docContent));
    // Add document to the index.
    $index->addDocument($doc);
    // Write changes to the index.
    $index->commit();
   
    // Update an index
    // Open existing index
    $index = new Zend_Search_Lucene('/data/my-index');
    $doc = new Zend_Search_Lucene_Document();
    // Store document URL to identify it in search result.
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $docUrl));
    // Index document content
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $docContent));
    // Add document to the index.
    $index->addDocument($doc);
    // Write changes to the index.
    $index->commit();
   
    // Search the index
    $index = new Zend_Search_Lucene('/data/my_index');
    $index->find($query);
   
    // The search result is an array of Zend_Search_Lucene_Search_QueryHit objects
    $index = new Zend_Search_Lucene('/data/my_index');
    $hits = $index->find($query);
    foreach ($hits as $hit) {  
     echo $hit->id;   
     echo $hit->score;   
     echo $hit->title;   
     echo $hit->author;
    }
   
    // The original Zend_Search_Lucene_Document object can be obtained from the Zend_Search_Lucene_Search_QueryHit
    $index = new Zend_Search_Lucene('/data/my_index');
    $hits = $index->find($query);
    foreach ($hits as $hit) {
        // return Zend_Search_Lucene_Document object for this hit
        $document = $hit->getDocument();
        // return a Zend_Search_Lucene_Field object
        // from the Zend_Search_Lucene_Document
        $field = $document->getField('title');
        // return the string value of the Zend_Search_Lucene_Field object
        echo $document->getFieldValue('title');
        // same as getFieldValue()
        echo $document->title;
    }
   
    // Zend_Search_Lucene uses the same scoring algorithm as Java Lucene.
    // Search results are sorted by score: the higher the score, the earlier the hit appears.
    $hits = $index->find($query);
    foreach ($hits as $hit) {   
     echo $hit->id;   
     echo $hit->score;
    }
   
    // Single-term query
    $hits = $index->find('word1'); // query string
    // Build the query through the API:
    $term = new Zend_Search_Lucene_Index_Term('word1');   
    $query = new Zend_Search_Lucene_Search_Query_Term($term);   
    $hits = $index->find($query);
   
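    // Multi-term query
    $hits = $index->find('+word1 author:word2 -word3'); // query string
    // Build the query through the API:
    $query = new Zend_Search_Lucene_Search_Query_MultiTerm();
    // true: the term is required
    $query->addTerm(new Zend_Search_Lucene_Index_Term('word1'), true);
    // null: the term is optional (neither required nor prohibited)
    $query->addTerm(new Zend_Search_Lucene_Index_Term('word2', 'author'), null);
    // false: the term is prohibited
    $query->addTerm(new Zend_Search_Lucene_Index_Term('word3'), false);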
    $hits = $index->find($query);
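
To make the three signs concrete (`+` or true for required, `-` or false for prohibited, no prefix or null for optional), here is a toy matcher in plain PHP. It is only a sketch of the boolean logic, not the library's implementation: `matchesSignedTerms()` is a hypothetical helper, it ignores field names and scoring, and it assumes that a query with no required terms needs at least one optional term to match.

```php
<?php
// Hypothetical helper, not part of Zend_Search_Lucene.
// $docTerms:    the terms contained in one document.
// $signedTerms: list of [term, sign] pairs, where sign is
//               true (required), false (prohibited) or null (optional).
function matchesSignedTerms(array $docTerms, array $signedTerms): bool
{
    $hasRequired = false;
    $optionalHit = false;
    foreach ($signedTerms as [$term, $sign]) {
        $present = in_array($term, $docTerms, true);
        if ($sign === true) {
            $hasRequired = true;
            if (!$present) {
                return false; // a required term is missing
            }
        } elseif ($sign === false) {
            if ($present) {
                return false; // a prohibited term is present
            }
        } elseif ($present) {
            $optionalHit = true;
        }
    }
    // Assumption: without required terms, at least one optional term must match.
    return $hasRequired || $optionalHit;
}

$query = [['word1', true], ['word2', null], ['word3', false]];
var_dump(matchesSignedTerms(['word1', 'word2'], $query)); // required present, prohibited absent
var_dump(matchesSignedTerms(['word1', 'word3'], $query)); // prohibited term present
```

In the real library an optional term also influences the score of a hit; this sketch models matching only.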
   
    // Phrase query
    $query1 = new Zend_Search_Lucene_Search_Query_Phrase();
    // Add 'word1' at 0 relative position.
    $query1->addTerm(new Zend_Search_Lucene_Index_Term('word1'));
    // Add 'word2' at 1 relative position.
    $query1->addTerm(new Zend_Search_Lucene_Index_Term('word2'));
    // Add 'word3' at 3 relative position.
    $query1->addTerm(new Zend_Search_Lucene_Index_Term('word3'), 3);
   
    // Constructor signature:
    // Zend_Search_Lucene_Search_Query_Phrase([array $terms[, array $offsets[, string $field]]]);
    // Search for the phrase 'zend framework'
    $query = new Zend_Search_Lucene_Search_Query_Phrase(array('zend', 'framework'));
    // Searches for the phrase 'zend ????? download' and matches 'zend platform download', 'zend studio download', etc.
    $query = new Zend_Search_Lucene_Search_Query_Phrase(array('zend', 'download'), array(0, 2));
    // Search for 'zend framework' in the title field
    $query = new Zend_Search_Lucene_Search_Query_Phrase(array('zend', 'framework'), null, 'title');
   
    // Method signature:
    // Zend_Search_Lucene_Search_Query_Phrase::addTerm(Zend_Search_Lucene_Index_Term $term[, integer $position]);
    // Search for the phrase 'zend framework'
    $query = new Zend_Search_Lucene_Search_Query_Phrase();
    $query->addTerm(new Zend_Search_Lucene_Index_Term('zend'));
    $query->addTerm(new Zend_Search_Lucene_Index_Term('framework'));
    // Searches for the phrase 'zend ????? download' and matches 'zend platform download', 'zend studio download', etc.
    $query = new Zend_Search_Lucene_Search_Query_Phrase();
    $query->addTerm(new Zend_Search_Lucene_Index_Term('zend'), 0);
    $query->addTerm(new Zend_Search_Lucene_Index_Term('download'), 2);
    // Search for 'zend framework' in the title field
    $query = new Zend_Search_Lucene_Search_Query_Phrase();
    $query->addTerm(new Zend_Search_Lucene_Index_Term('zend', 'title'));
    $query->addTerm(new Zend_Search_Lucene_Index_Term('framework', 'title'));
    // Slop factor
    // Query without a gap.
    $query = new Zend_Search_Lucene_Search_Query_Phrase(array('word1', 'word2'));
    // Search for 'word1 word2', 'word1 ... word2'
    $query->setSlop(1);
    $hits1 = $index->find($query);
    // Search for 'word1 word2', 'word1 ... word2','word1 ... ... word2', 'word2 word1'
    $query->setSlop(2);
    $hits2 = $index->find($query);
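
The slop values in the comments above can be reproduced with a simplified model for a two-term phrase: the slop needed is how far the second term sits from the position directly after the first term. The sketch below is plain PHP and does not use the library; `minSlopForTwoTerms()` is a hypothetical helper, and the formula is an assumption chosen to agree with the matches listed in the comments, not the library's actual matching code.

```php
<?php
// Hypothetical helper, not part of Zend_Search_Lucene.
// For the phrase "$first $second", return the smallest slop under which the
// token list would match, or null if either term is missing.
function minSlopForTwoTerms(array $tokens, string $first, string $second): ?int
{
    $best = null;
    foreach ($tokens as $i => $a) {
        if ($a !== $first) {
            continue;
        }
        foreach ($tokens as $j => $b) {
            if ($b !== $second) {
                continue;
            }
            // An exact phrase has $second at position $i + 1, giving distance 0.
            $distance = abs($j - $i - 1);
            if ($best === null || $distance < $best) {
                $best = $distance;
            }
        }
    }
    return $best;
}

var_dump(minSlopForTwoTerms(['word1', 'word2'], 'word1', 'word2'));      // exact phrase
var_dump(minSlopForTwoTerms(['word1', 'x', 'word2'], 'word1', 'word2')); // one word in between
var_dump(minSlopForTwoTerms(['word2', 'word1'], 'word1', 'word2'));      // reversed order
```

Under this model a document matches when the returned value is less than or equal to the value passed to setSlop(), which is why the reversed order 'word2 word1' only appears once the slop is 2.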
   
    // Character set
    $doc = new Zend_Search_Lucene_Document();
    $docText = iconv('ISO-8859-1', 'ASCII//TRANSLIT', $docText);
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $docText));
    $query = iconv('', 'ASCII//TRANSLIT', $query);
    $hits = $index->find($query);
A simple example
    $index = new Zend_Search_Lucene('my-index', true);
   
    $doc = new Zend_Search_Lucene_Document();
   
    // Store document URL to identify it in search result.
    $doc->addField(Zend_Search_Lucene_Field::Text('url', 'http://www.eyuwo.com'));
   
    // Index document content
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', 'this is just a test of Zend_Search_lucene'));
   
    // Add document to the index.
    $index->addDocument($doc);
   
    // Write changes to the index.
    $index->commit();
    //$query = 'test'; // the search keyword
    $term = new Zend_Search_Lucene_Index_Term('test');
    $query = new Zend_Search_Lucene_Search_Query_Term($term);
    $hits = $index->find($query); // fetch the search results
    if (!empty($hits)) {
        foreach ($hits as $hit) {
            echo 'Score:' . $hit->score . '<br>';
            echo 'Url:' . $hit->url . '<br><hr>';
            echo 'ID:' . $hit->id . '<br><hr>';
        }
    } else {
        echo 'No matching results found.';
    }

Reposted from: https://www.cnblogs.com/wangbin/archive/2010/09/20/1831717.html
