Lesson 50: Hadoop MapReduce Inverted Index, Walkthrough and Hands-On Practice

Latest recommended article published on 2020-05-14 22:10:01

段智华



Category column: 大数据IMF传奇行动－Spark Hadoop


Article link: https://blog.csdn.net/duan_zhihua/article/details/50726368


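This article shows how to build an inverted index with Hadoop MapReduce. It walks through the data processing on a small example, covering the Map, Combiner, and Reduce phases, and provides the run results and the source code. The example is meant to help readers understand how an inverted index works and how it is applied in big data processing.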


1. Data files

[root@master invertedindex]# cat file1.txt
Spark is so powerful
[root@master invertedindex]# cat file2.txt
Spark is the most exciting thing happening in big data today
[root@master invertedindex]# cat file3.txt
Hello Spark Hello again Spark
[root@master invertedindex]#
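
To make the data flow over these three files concrete, here is a minimal plain-Java sketch that simulates the Map, Combiner, and Reduce phases in memory. It is not the article's source code and it does not use the Hadoop API. The class name `InvertedIndexSketch`, its method names, and the composite `word:file` key are assumptions chosen for illustration; the `file:count;` posting format is modeled on the job output shown in this article.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of an inverted-index job (no Hadoop classes).
public class InvertedIndexSketch {

    /** Map phase: emit ("word:file", 1) for every whitespace-separated token. */
    public static List<Map.Entry<String, Integer>> map(String fileName, String text) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<String, Integer>(word + ":" + fileName, 1));
            }
        }
        return out;
    }

    /**
     * Combiner phase: sum the counts per "word:file" key, then re-key by the
     * word alone so the value becomes "file:count".
     * Assumes words contain no ':' character (true for the sample data).
     */
    public static List<Map.Entry<String, String>> combine(List<Map.Entry<String, Integer>> mapped) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : mapped) {
            sums.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            int split = e.getKey().indexOf(':');
            String word = e.getKey().substring(0, split);
            String file = e.getKey().substring(split + 1);
            out.add(new SimpleEntry<String, String>(word, file + ":" + e.getValue()));
        }
        return out;
    }

    /** Reduce phase: concatenate every "file:count" posting of a word, each followed by ';'. */
    public static Map<String, String> reduce(List<Map.Entry<String, String>> combined) {
        // TreeMap keeps the words sorted, similar to the sorted keys a reducer receives.
        Map<String, String> index = new TreeMap<>();
        for (Map.Entry<String, String> e : combined) {
            index.merge(e.getKey(), e.getValue() + ";", String::concat);
        }
        return index;
    }

    /** Runs map and combine once per file (as one map task each), then a single reduce. */
    public static Map<String, String> build(Map<String, String> files) {
        List<Map.Entry<String, String>> shuffled = new ArrayList<>();
        for (Map.Entry<String, String> f : files.entrySet()) {
            shuffled.addAll(combine(map(f.getKey(), f.getValue())));
        }
        return reduce(shuffled);
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file1.txt", "Spark is so powerful");
        files.put("file2.txt", "Spark is the most exciting thing happening in big data today");
        files.put("file3.txt", "Hello Spark Hello again Spark");
        for (Map.Entry<String, String> e : build(files).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In this sketch the postings of a word appear in the order the files are iterated. In a real MapReduce job the order depends on how the values reach the reducer, so the same postings may come out in a different sequence.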

2. Run results

[root@master invertedindex]# hadoop dfs -cat /library/outputinvertedindex4/part-r-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

16/02/23 08:51:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Hello file3.txt:2
Spark file3.txt:2;file1.txt:1;