Nutch distributed search: cluster with the index on HDFS

leibnitz09

This process is also simple. The steps are:

a. Put each index into HDFS.

b. On each search server, make the three configuration files hdfs-site.xml, core-site.xml, and mapred-site.xml identical to the ones on the Hadoop slaves.

c. Look up the HDFS path of each index, then start the search servers one by one, each with its own index path.

d. Start the web container.

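Note:

Some people say that once you use distributed search, every query spawns a MapReduce job, so performance will be terrible...

I think these people have not understood how Hadoop works; that claim is plain nonsense.

The client sends its request to the search servers over RPC, and the servers carry out the actual search, which of course still relies on Lucene. Hadoop simply gives Lucene transparent file-stream access to the index files; no MapReduce job is started at all!

If you still don't believe it, start only HDFS with start-dfs.sh, so that no MapReduce daemons are running, and verify that search still works.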
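
The point about transparent file-stream access can be illustrated with a minimal Python sketch (Nutch and Lucene themselves are written in Java). The names `LocalStore`, `RemoteStore`, and `count_matches` are hypothetical and are not part of Hadoop or Lucene; `RemoteStore` merely stands in for HDFS and keeps its data in memory. The search code only asks for a readable stream, so swapping the storage backend does not change it, and no job is submitted anywhere.

```python
import io


class LocalStore:
    """Hypothetical in-memory stand-in for a local disk."""

    def __init__(self, files):
        self.files = files  # {path: bytes}

    def open(self, path):
        return io.BytesIO(self.files[path])


class RemoteStore:
    """Hypothetical stand-in for HDFS: same open() contract as LocalStore."""

    def __init__(self, files):
        self.files = files
        self.opened = []  # records which paths were streamed

    def open(self, path):
        self.opened.append(path)
        return io.BytesIO(self.files[path])


def count_matches(store, index_path, term):
    """Search code: it only needs a readable byte stream, nothing else."""
    needle = term.encode("utf-8")
    with store.open(index_path) as stream:
        # Count the lines that contain the term as a whole word.
        return sum(1 for line in stream if needle in line.split())


data = {"/idx/part-0": b"nutch hadoop\nlucene\nnutch\n"}
local_hits = count_matches(LocalStore(data), "/idx/part-0", "nutch")
remote = RemoteStore(data)
remote_hits = count_matches(remote, "/idx/part-0", "nutch")
```

Both calls run the same `count_matches` function; only the object that hands out the stream differs, which is the sense in which the storage layer is transparent to the search code.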

How Nutch search RPC calls work

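1. The client first obtains an RPCSearchBean proxy. When search(Query) is called, the proxy converts the method name and the query's parameters into an RPC.Invocation, marshals it (serializes the parameters), and sends it over a socket to the first server (the remote side) listed in search-servers.txt.

2. The remote side runs a resident thread, DistributedSearch$Server, that handles the client's requests. The process is:

a. Using a producer-consumer pattern, it creates a listener, handler(s), and a responder. The handler(s) take the queued calls and invoke the corresponding method on the local NutchBean. To do this, the parameters in the byte stream are deserialized into an Invocation, and the local invocation is then made from the class name, method name, and other parameters it carries.

3. The responder marshals the result back into a byte stream and sends it to the client, where the proxy deserializes it into the method's actual return object.

4. The same is repeated for the other servers listed in search-servers.txt.

5. All the results are merged. Done.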
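
The round trip in steps 1 to 5 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Nutch's actual code: JSON stands in for Hadoop's serialization, a direct method call stands in for the socket, and the classes `LocalSearchBean`, `SearchServer`, and `SearchProxy` as well as the function `marshal_call` are hypothetical names that only mirror the roles of NutchBean, DistributedSearch$Server, and the client-side proxy.

```python
import json


class LocalSearchBean:
    """Hypothetical stand-in for the NutchBean that searches one local index."""

    def __init__(self, docs):
        self.docs = docs  # {doc_id: text}

    def search(self, query, num_hits):
        # Score = number of occurrences of the query word in the document.
        hits = [{"doc": doc_id, "score": text.split().count(query)}
                for doc_id, text in self.docs.items()]
        hits = [h for h in hits if h["score"] > 0]
        hits.sort(key=lambda h: (-h["score"], h["doc"]))
        return hits[:num_hits]


def marshal_call(method, args):
    """Step 1: turn a method call into bytes (the role of the Invocation)."""
    return json.dumps({"method": method, "args": args}).encode("utf-8")


class SearchServer:
    """Hypothetical stand-in for the server-side handler and responder."""

    def __init__(self, bean):
        self.bean = bean

    def handle(self, request_bytes):
        # Step 2: deserialize the call and invoke the local method by name.
        call = json.loads(request_bytes.decode("utf-8"))
        result = getattr(self.bean, call["method"])(*call["args"])
        # Step 3: marshal the result back into bytes for the client.
        return json.dumps(result).encode("utf-8")


class SearchProxy:
    """Client-side proxy: sends the same call to every server and merges."""

    def __init__(self, servers):
        self.servers = servers  # in Nutch these come from search-servers.txt

    def search(self, query, num_hits):
        request = marshal_call("search", [query, num_hits])
        merged = []
        for server in self.servers:  # step 4: repeat for each server
            merged.extend(json.loads(server.handle(request).decode("utf-8")))
        # Step 5: merge all partial results and keep the global top hits.
        merged.sort(key=lambda h: (-h["score"], h["doc"]))
        return merged[:num_hits]


shard_a = SearchServer(LocalSearchBean({"a1": "nutch hadoop nutch", "a2": "lucene"}))
shard_b = SearchServer(LocalSearchBean({"b1": "nutch", "b2": "hadoop hdfs"}))
proxy = SearchProxy([shard_a, shard_b])
top = proxy.search("nutch", 2)
```

Note that nothing in this flow schedules a job: each server answers from its own index, and the only distributed part is the fan-out of calls and the final merge on the client.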

See also:

hdfs data flow-part reading
