Get Term frequency in Lucene using Zend Framework

I use Zend Framework to get a PHP version of Lucene. Currently, Zend is probably the best PHP implementation of Java Lucene.

Create a Zend_Search_Lucene object and call its termFreqs method to get the term frequency for a specific term. Zend has no equivalent of the Java termDocs enumeration, which provides Term => <docNum, freq>*; Zend's own termDocs returns only document IDs.
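To picture the Term => <docNum, freq>* structure, here is a minimal sketch in plain PHP that builds such a map from a few toy documents. It does not use Zend at all: the helper name build_postings, the sample texts, and the naive lowercase word split are assumptions made for illustration, and a real Lucene analyzer tokenizes differently.

```php
<?php
/**
 * Hypothetical helper: build a term => (docNum => freq) map from raw text.
 * Tokenization is a naive lowercase word split, not Lucene's analyzer.
 */
function build_postings(array $docs)
{
    $postings = array();
    foreach ($docs as $docNum => $text) {
        // str_word_count(..., 1) returns the list of words found in the string
        $words = str_word_count(strtolower($text), 1);
        foreach ($words as $word) {
            if (!isset($postings[$word][$docNum])) {
                $postings[$word][$docNum] = 0;
            }
            $postings[$word][$docNum]++;
        }
    }
    return $postings;
}

// Made-up documents, keyed by document number.
$docs = array(
    0 => 'Database systems and database design',
    1 => 'Search engines',
    2 => 'A database for search',
);
$postings = build_postings($docs);

// docNum => freq pairs for a single term
print_r($postings['database']);
```

Each entry of the inner array is one <docNum, freq> pair, so the number of entries is the document frequency of the term.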

So now I have to iterate over all terms to figure out the frequency. Still, better than nothing :)
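The iteration can be sketched roughly as follows. This is plain PHP with made-up data, assuming that each termFreqs call yields an array of the form docId => frequency; the helper name summarize_all_terms and the sample numbers are hypothetical.

```php
<?php
/**
 * Hypothetical helper: given term => (docId => freq) data, compute for
 * every term its document frequency and its total frequency.
 */
function summarize_all_terms(array $freqsByTerm)
{
    $summary = array();
    foreach ($freqsByTerm as $term => $freqs) {
        $summary[$term] = array(
            'doc_freq'   => count($freqs),      // documents containing the term
            'total_freq' => array_sum($freqs),  // occurrences across all documents
        );
    }
    return $summary;
}

// Made-up data standing in for one termFreqs() result per term.
$freqsByTerm = array(
    'database' => array(0 => 3, 4 => 1, 7 => 2),
    'lucene'   => array(4 => 5),
);

foreach (summarize_all_terms($freqsByTerm) as $term => $row) {
    echo "$term: df={$row['doc_freq']}, tf={$row['total_freq']}\n";
}
```

With a real index, the inner arrays would presumably come from calling $index->termFreqs($term) for each term returned by $index->terms().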

Sample code to get term frequency:

<?php
/** Zend_Search_Lucene */
require_once 'Zend/Search/Lucene.php';

$index_dir = 'lucene/index';
$index = new Zend_Search_Lucene($index_dir);

// The same word, looked up in two different fields
$term_title = new Zend_Search_Lucene_Index_Term('database', 'title');
$term_contents = new Zend_Search_Lucene_Index_Term('database', 'contents');

echo "total number documents in this index: ".$index->numDocs()."<hr>";

//print_r($index->getDirectory()); exit;

// get all field names
//print_r($index->getFieldNames()); exit;

// Returns true if index contains documents with the specified term.
//$index->hasTerm($term);

// get all terms
//print_r($index->terms());

tf_df($term_title);
tf_df($term_contents);

// Print term frequency, document frequency and matching titles for one term
function tf_df($term){
    global $index;
    echo "search for \"".$term->text."\" in ".$term->field."<br>";
    echo "Term Frequency: ";
    print_r($index->termFreqs($term));
    echo "<br>Document Frequency: ";
    echo $index->docFreq($term);
    echo "<br>";
    docs($term);
    echo "<hr>";
}

// Print the title of every document that contains the term
function docs($term){
    global $index;
    // Retrieving documents with termDocs() method
    $docIds = $index->termDocs($term);
    foreach ($docIds as $id) {
        $doc = $index->getDocument($id);
        $title = $doc->title;
        $contents = $doc->contents;
        echo $title."<br>";
    }
}

Sample code to search index:

<?php
require_once 'prss_lib.php';            // provides get_time_difference()
require_once 'Zend/Search/Lucene.php';
require_once "lib/stats/phpstats.inc";

//session_start();

// Remember the last query and category
if (isset($_GET['search'])) {
    $_SESSION['search'] = $search = $_GET['search'];
} else {
    $search = '';
}
if (isset($_GET['category'])) {
    $_SESSION['category'] = $category = $_GET['category'];
} else {
    $category = 'acad';
}
$start = date("r");
?>
<div align="center">
<img src="images/apache_simple.png" alt="Apache" />
<img src="images/lucene_green.gif" alt="Apache Lucene" />
<form action='search-index.php'>
<h4>Search: <input type="text" size="30" name="search" value="<?php if (isset($_SESSION['search'])) echo htmlspecialchars($_SESSION['search']); ?>">
<input type="submit" value="Search">
<input type="radio" name="category" value="acad" <?php if (!isset($_SESSION['category']) || $_SESSION['category'] === 'acad') echo 'checked'; ?>>Academic
<input type="radio" name="category" value="news" <?php if (isset($_SESSION['category']) && $_SESSION['category'] === 'news') echo 'checked'; ?>>News
<input type="radio" name="category" value="both" <?php if (isset($_SESSION['category']) && $_SESSION['category'] === 'both') echo 'checked'; ?>>Both
</h4>
</form>
</div>
<?php
// Pick the index directory that matches the selected category
switch ($category) {
    case 'acad': $index_dir = 'lucene/acad'; break;
    case 'news': $index_dir = 'lucene/news'; break;
    case 'both': $index_dir = 'lucene/index'; break;
    default: echo "Wrong parameter!"; exit;
}

if ($search) {
    //echo "<hr/>";
    $index = new Zend_Search_Lucene($index_dir);
    $total = $index->count();
    $hits = $index->find(strtolower($search));
    $hit_count = count($hits);
    echo "<h4> Index contains {$total} documents. ";
    //echo "<br>";
    $end = date("r");
    echo "Search for <font color = red>\"".htmlspecialchars($search)."\" </font> returned ".$hit_count." hits. ";
    $diff = get_time_difference($start, $end);
    echo "(".($diff['minutes']*60+$diff['seconds'])." seconds)</h4>";
    //echo "(".($end-$start)." seconds)</h4>";
    //$html = new Zend_Search_Lucene_Document_Html();
    // $weight = 0;
    foreach ($hits as $hit) {
        echo "<ul>";
        // The first three stored fields are taken to be link, title and content
        $arr = $hit->getDocument()->getFieldNames();
        $field_link = $arr[0];
        $field_title = $arr[1];
        $field_content = $arr[2];
        $link_text = trim($hit->$field_link);
        $title_text = trim($hit->$field_title);
        $content_text = trim($hit->$field_content);
        echo "<li><a href=\"".$link_text."\">".$title_text."</a> ( ID: $hit->id | Score:".sprintf('%.2f', $hit->score)." )<br>";
        echo "$content_text</li>";
        echo "</ul>";
    }
?>
<div align="center">
<form>
<h4>Search: <input type="text" size="30" name="search" value="<?php if (isset($search)) echo htmlspecialchars($search); ?>">
<input type="submit" value="Search">
<input type="radio" name="category" value="acad" <?php if (!isset($_SESSION['category']) || $_SESSION['category'] === 'acad') echo 'checked'; ?>>Academic
<input type="radio" name="category" value="news" <?php if (isset($_SESSION['category']) && $_SESSION['category'] === 'news') echo 'checked'; ?>>News
<input type="radio" name="category" value="both" <?php if (isset($_SESSION['category']) && $_SESSION['category'] === 'both') echo 'checked'; ?>>Both
</h4>
</form>
<a href="index.php">Back to Homepage</a>
</div>
<?php } ?>
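The search script measures elapsed time by comparing two date("r") strings, which only resolves whole seconds. As a possible alternative, here is a minimal sketch that times a call with microtime(true). The helper name timed_call is hypothetical, and array_filter is used only as a stand-in for the real search call so the snippet runs without an index.

```php
<?php
/**
 * Hypothetical helper: run a callable and report how long it took.
 * Returns array(result, elapsed seconds as a float).
 */
function timed_call($callable, array $args = array())
{
    $start   = microtime(true);   // current time as float seconds
    $result  = call_user_func_array($callable, $args);
    $elapsed = microtime(true) - $start;
    return array($result, $elapsed);
}

// Stand-in for the search: array_filter drops the empty string, leaving two items.
list($hits, $elapsed) = timed_call('array_filter', array(array('a', '', 'b')));
printf("%d hits (%.4f seconds)\n", count($hits), $elapsed);
```

With a real index, the callable would presumably be array($index, 'find') with the query string as the single argument.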
