A Daily Dose of Rails: acts_as_ferret

[url=http://ferret.davebalmain.com/trac/]Ferret[/url] is a full-text search engine for Ruby, modeled on [url=http://lucene.apache.org/]Apache Lucene[/url].

Installing Ferret is simple:
[code]
gem install ferret
[/code]

Ferret is a Ruby wrapper around a body of C code. It targets Ruby in general, not Rails specifically.
[url=http://projects.jkraemer.net/acts_as_ferret/wiki]Acts As Ferret[/url], on the other hand, is built for Rails.

There are two ways to install Acts As Ferret:
1. As a gem
[code]
gem install acts_as_ferret
[/code]
Then add the following to environment.rb:
[code]
require 'acts_as_ferret'
[/code]
2. As a plugin
[code]
ruby script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret
[/code]
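(Installing acts_as_ferret as a plugin is the recommended way, but it did not seem to work for me. Changing the svn URL to end in acts_as_ferret did not help either. To be revisited.)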

Let's start with a simple example. First add the following to the model that needs to be indexed:
[code]
class Member < ActiveRecord::Base
  # The option is :fields (plural), as in the later examples
  acts_as_ferret :fields => [:first_name, :last_name]
end
[/code]
Now suppose we run the following search:
[code]
members = Member.find_id_by_contents("Gregg")
[/code]
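This triggers the following:
1. An /index/development/member subdirectory is created under the Rails application directory. All index files are created there.
2. Each member's first and last name is added to the index. From then on, every add, update, or delete of a member [b]updates the index incrementally[/b].
To [b]rebuild the index[/b], delete the corresponding directory and restart the server. The index is regenerated on the next search.
3. ActsAsFerret calls Ferret's search_each method on our index.
4. We get back up to 10 results in the following form: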
[code]
members = [
  {:model => "Member", :id => "4", :score => "1.0"},
  {:model => "Member", :id => "4", :score => "0.93211"},
  {:model => "Member", :id => "4", :score => "0.32212"},
  ...
]
[/code]
For each member we get its id and its search score.
Even if more than 10 records match, only 10 results are returned by default.

Options
1. offset: defaults to 0
2. limit: defaults to 10; :all returns every result
3. sort
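As a rough illustration of how offset and limit select a window from the ranked hits, here is a plain-Ruby sketch. The window helper and the sample hits array are hypothetical stand-ins. This is not how acts_as_ferret implements the options internally.

```ruby
# Hypothetical helper: picks the slice of hits that :offset and :limit describe.
# `hits` stands in for the full ranked result list; it is not a Ferret object.
def window(hits, options = {})
  offset = options.fetch(:offset, 0)  # default offset is 0
  limit  = options.fetch(:limit, 10)  # default limit is 10
  rest = hits.drop(offset)
  limit == :all ? rest : rest.first(limit)
end

hits = (1..25).to_a
window(hits)                   # the first 10 hits
window(hits, :offset => 10)    # hits 11 through 20
window(hits, :limit => :all)   # all 25 hits
```

The defaults are read with fetch so that a caller only has to pass the option that differs.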

If we need the total number of hits:
[code]
total_results = Member.total_hits("Gregg")
[/code]
Or, alternatively:
[code]
results = []
total_results = Member.find_id_by_contents("Gregg") { |result|
  results.push result
}
[/code]

If we want the results as model objects:
[code]
results = []
total_results = Member.find_id_by_contents("Gregg") { |result|
  results.push Member.find(result[:id])
}
[/code]
There is a better way, find_by_contents:
[code]
@results = Member.find_by_contents("Gregg")
[/code]
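This does the following:
1. Uses find_id_by_contents to get the ids
2. Uses the returned ids to load the models
3. Returns an ActsAsFerret::SearchResults object, which behaves like an Array (think of its elements as ActiveRecord objects with a few extra features)
We can then do things like this: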
[code]
members = Member.find_by_contents("Gregg")

# It gives us total hits!
puts "Total hits = #{members.total_hits}"
for member in members
  puts "#{member.first_name} #{member.last_name}"

  # And the search Score!
  puts "Search Score = #{member.ferret_score}"
end
[/code]
Note that total_hits and ferret_score are the extra features mentioned above.
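The two-step lookup behind find_by_contents (ids first, then records) can be sketched in plain Ruby. Everything here is a stand-in: FakeMember, TABLE, fake_id_hits, and load_ranked are hypothetical names, and no Ferret or ActiveRecord call is made.

```ruby
# Stand-in for the members table: id => record. Purely illustrative data.
FakeMember = Struct.new(:id, :first_name, :last_name)
TABLE = {
  4 => FakeMember.new(4, "Gregg", "Pollack"),
  7 => FakeMember.new(7, "Gregg", "Jones")
}

# Step 1 stand-in: ranked id hits, shaped like the find_id_by_contents results.
def fake_id_hits
  [{:model => "Member", :id => "7", :score => 1.0},
   {:model => "Member", :id => "4", :score => 0.5}]
end

# Step 2: load each record by id, keep the search ranking, carry the score along.
# Hits whose id is no longer in the table are dropped.
def load_ranked(hits, table)
  hits.map { |hit|
    record = table[hit[:id].to_i]
    record && {:record => record, :score => hit[:score]}
  }.compact
end

load_ranked(fake_id_hits, TABLE).each do |row|
  puts "#{row[:record].first_name} #{row[:record].last_name} (#{row[:score]})"
end
```

Keeping the hit order matters here, because a plain lookup by a list of ids does not necessarily return rows in ranking order.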

Paginating search results
1. Add the following method to your model:
[code]
def self.full_text_search(q, options = {})
  return nil if q.nil? or q == ""
  default_options = {:limit => 10, :page => 1}
  options = default_options.merge options

  # get the offset based on what page we're on
  options[:offset] = options[:limit] * (options.delete(:page).to_i - 1)

  # now do the query with our options
  results = Member.find_by_contents(q, options)
  return [results.total_hits, results]
end
[/code]
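The offset arithmetic used for paging can be checked on its own. This is a minimal sketch: offset_for is a hypothetical helper, and clamping invalid pages to page 1 is an added assumption rather than something the method above does.

```ruby
# Offset of the first hit on a 1-based page: limit * (page - 1).
def offset_for(page, limit = 10)
  page = page.to_i
  page = 1 if page < 1  # assumption: treat missing or invalid pages as page 1
  limit * (page - 1)
end

offset_for(1)      # => 0
offset_for("3")    # => 20  (params values arrive as strings)
offset_for(2, 25)  # => 25
offset_for(nil)    # => 0
```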
Then modify application.rb:
[code]
def pages_for(size, options = {})
  default_options = {:per_page => 10}
  options = default_options.merge options
  pages = Paginator.new self, size, options[:per_page], (params[:page]||1)
  return pages
end
[/code]
Then modify the controller:
[code]
def search
  @query = params[:query]
  @total, @members = Member.full_text_search(@query, :page => (params[:page]||1))
  @pages = pages_for(@total)
end
[/code]
The code in the view:
[code]
<%= link_to 'Previous page', { :page => @pages.current.previous, :query => @query } if @pages.current.previous %>
<%= pagination_links(@pages, :params => { :query => @query }) %>
<%= link_to 'Next page', { :page => @pages.current.next, :query => @query } if @pages.current.next %>
[/code]

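Query strings
1. Searching for "Gregg Pollack" returns results that contain both "Gregg" [b]and[/b] "Pollack", in any field and in any order
2. Searching for "Gregg OR Pollack" returns results that contain "Gregg" [b]or[/b] "Pollack"
3. Searching for "Gregg~" runs a fuzzy query and returns results containing terms similar to "Gregg"
4. Searching for "first_name:Gregg" returns results whose first name is "Gregg"
5. Searching for "+first_name:Gregg -last_name:Jones" returns results whose first name is "Gregg" and whose last name is not "Jones"
For more complex queries see the [url=http://lucene.apache.org/java/docs/queryparsersyntax.html]Apache Lucene Parser Syntax[/url]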
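Field-scoped query strings like the ones in the list above can be assembled from hashes. The build_query helper below is a hypothetical sketch that only does string formatting. It assumes single-word values and does no quoting or escaping.

```ruby
# Hypothetical helper: builds a query string from required (+) and excluded (-)
# field values. Assumes single-word values; nothing is quoted or escaped.
def build_query(required = {}, excluded = {})
  parts  = required.map { |field, value| "+#{field}:#{value}" }
  parts += excluded.map { |field, value| "-#{field}:#{value}" }
  parts.join(" ")
end

build_query({:first_name => "Gregg"}, {:last_name => "Jones"})
# => "+first_name:Gregg -last_name:Jones"
```

The resulting string can then be passed wherever a literal query string is used in this article.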

Searching across multiple tables
[code]
class Book < ActiveRecord::Base
  acts_as_ferret :fields => [:title, :author_name]

  def author_name
    return "#{self.author.first_name} #{self.author.last_name}"
  end
end
[/code]
In other words, we can search on anything a model method returns, even tags:
[code]
class Book < ActiveRecord::Base
  acts_as_taggable
  acts_as_ferret :fields => [:title, :tags_with_spaces]

  def tags_with_spaces
    return self.tag_names.join(" ")
  end
end
[/code]
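To see what text such method-backed fields contribute, the idea can be tried with plain Ruby objects. AuthorStub, BookStub, and indexable_text are hypothetical stand-ins. No ActiveRecord, acts_as_taggable, or Ferret code is involved, and the sample values are made up.

```ruby
# Plain-Ruby stand-ins for the Book model and its author association.
AuthorStub = Struct.new(:first_name, :last_name)
BookStub = Struct.new(:title, :author, :tag_names) do
  def author_name
    "#{author.first_name} #{author.last_name}"
  end

  def tags_with_spaces
    tag_names.join(" ")
  end
end

# Hypothetical helper: collects the text each listed field would contribute,
# simply by calling the method of the same name on the record.
def indexable_text(record, fields)
  fields.each_with_object({}) do |field, doc|
    doc[field] = record.public_send(field).to_s
  end
end

book = BookStub.new("Example Title", AuthorStub.new("Gregg", "Pollack"), ["rails", "search"])
p indexable_text(book, [:title, :author_name, :tags_with_spaces])
```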

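Sorting
Suppose we want to search on title and also sort by title. The field used for sorting must be indexed untokenized, so it cannot double as the tokenized field we search on. We can add a separate field for sorting: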
[code]
acts_as_ferret :fields => {
  :title => {},
  :tags_with_spaces => {},
  :title_for_sort => {:index => :untokenized}
}

def title_for_sort
  return self.title
end
[/code]
Now we can search sorted by title in the controller:
[code]
s = Ferret::Search::SortField.new(:title_for_sort, :reverse => false)
@total, @members = Member.full_text_search(@query, {:page => (params[:page]||1), :sort => s})
[/code]

Storing data
By default acts_as_ferret only indexes data and does not store it. If the data is small, we can store it in the index to speed up searches:
[code]
acts_as_ferret :fields => {
  :title => {:store => :yes},
  :author_name => {:store => :yes}
}
[/code]
We add the following search method to the model:
[code]
def self.find_storage_by_contents(query, options = {})
  # Get the index that acts_as_ferret created for us
  index = self.ferret_index
  results = []

  # search_each is the core search function from Ferret, which acts_as_ferret hides
  total_hits = index.search_each(query, options) do |doc, score|
    result = {}

    # Store each field in a hash which we can reference in our views.
    # The stored field is :title; the view reads it back as result[:name].
    result[:name] = index[doc][:title]
    result[:author_name] = index[doc][:author_name]

    # We can even put the score in the hash, nice!
    result[:score] = score

    results.push result
  end
  return block_given? ? total_hits : [total_hits, results]
end
[/code]
This way we can fetch results without touching the database at all. The view code:
[code]
<% @results.each_with_index do |result, index| %>
<%= index %>. <%= result[:name] %> by
<%= result[:author_name] %> <br/>
Score: <%= result[:score] %>
<% end %>
[/code]

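Highlighting
Let's see how to use Ferret to get the Google-style effect of bolding keywords in search results.
This requires storing the searched fields, as described above.
We modify the search method from above: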
[code]
def self.find_storage_by_contents(query, options = {})
  index = self.ferret_index # Get the index that acts_as_ferret created for us
  results = []

  # search_each is the core search function from Ferret, which acts_as_ferret hides
  total_hits = index.search_each(query, options) do |doc, score|
    result = {}

    # Store each field in a hash which we can reference in our views.
    # The stored field is :title; the view reads it back as result[:name].
    result[:name] = index.highlight(query, doc,
                      :field => :title,
                      :pre_tag => "<strong>",
                      :post_tag => "</strong>",
                      :num_excerpts => 1)
    result[:author_name] = index.highlight(query, doc,
                      :field => :author_name,
                      :pre_tag => "<strong>",
                      :post_tag => "</strong>",
                      :num_excerpts => 1)
    result[:score] = score # We can even put the score in the hash, nice!

    results.push result
  end
  return block_given? ? total_hits : [total_hits, results]
end
[/code]
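To make the role of pre_tag and post_tag concrete, here is a plain-Ruby sketch of the wrapping step. highlight_terms is a hypothetical helper, not Ferret's highlighter: it matches whole words case-insensitively, does not produce excerpts, and does not HTML-escape the input text.

```ruby
# Hypothetical helper, not Ferret's highlighter: wraps whole-word,
# case-insensitive matches of each term in the given tags.
def highlight_terms(text, terms, pre_tag = "<strong>", post_tag = "</strong>")
  return text if terms.empty?
  pattern = Regexp.union(terms.map { |t| /\b#{Regexp.escape(t)}\b/i })
  text.gsub(pattern) { |match| "#{pre_tag}#{match}#{post_tag}" }
end

highlight_terms("Gregg Pollack and gregg", ["Gregg"])
# => "<strong>Gregg</strong> Pollack and <strong>gregg</strong>"
```

Because the tags are inserted as raw HTML, the original text should be escaped before highlighting if it can contain markup.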

Using Boost
If we want matches on title to score somewhat higher than matches on author, we can use the boost parameter:
[code]
acts_as_ferret :fields => {
  :title => {:boost => 2},
  :author => {:boost => 0}
}
[/code]
Note that the boost parameter only raises the score. It does not separate title matches from author matches in the results.
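The effect of a boost can be pictured as a per-field multiplier on the score. The sketch below is a deliberately simplified model with hypothetical names and made-up numbers. Ferret's real scoring involves more factors, so treat this only as an illustration of why boosted matches rank higher while staying in the same result list.

```ruby
# Simplified illustration only: each field's raw score is multiplied by its
# boost (1.0 when no boost is given) and the products are summed.
def boosted_score(field_scores, boosts)
  field_scores.inject(0.0) do |sum, (field, raw)|
    sum + raw * boosts.fetch(field, 1.0)
  end
end

boosts = {:title => 2.0, :author => 1.0}
hits = [
  {:id => 1, :score => boosted_score({:author => 0.5}, boosts)},  # author match
  {:id => 2, :score => boosted_score({:title => 0.5}, boosts)}    # title match
]

# Both kinds of match share one ranked list; the title match simply sorts first.
ranked = hits.sort_by { |hit| -hit[:score] }
puts ranked.map { |hit| hit[:id] }.inspect
```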

Using it in production
For performance reasons, in production it is best to run acts_as_ferret as a [url=http://projects.jkraemer.net/acts_as_ferret/wiki/DrbServer]DRb Server[/url].

Note: this article is based on the [url=http://www.railsenvy.com/2007/2/19/acts-as-ferret-tutorial]Acts_As_Ferret Tutorial[/url].