Word Frequency Counting with Hadoop Streaming

Latest recommended article published 2020-12-22 20:20:18

weixin_30362801

Original link: http://www.cnblogs.com/BigWatermelon/p/10844953.html

Create a directory in HDFS

bin/hdfs dfs -mkdir /input

Upload the file to be counted to HDFS
bin/hadoop fs -put /test.txt /input

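Run the count with Hadoop Streaming. With /bin/cat as the mapper and /usr/bin/wc as the reducer, the job reports the total line, word, and byte counts of the input rather than a per-word tally. The input path is /input/test.txt because the file was uploaded into the /input directory.
bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar -input /input/test.txt -output /user/results.txt -mapper /bin/cat -reducer /usr/bin/wc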
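
The effect of a cat mapper and a wc reducer can be approximated locally before submitting anything to a cluster. The following is a minimal sketch, assuming a POSIX shell; the file name sample.txt and its contents are invented for illustration and do not come from the original post. Hadoop Streaming sorts mapper output before it reaches the reducer, so `sort` stands in for that step here.

```shell
# Hypothetical sample input (not from the original post).
printf 'hello world\nhello hadoop\n' > sample.txt

# mapper = cat, shuffle/sort = sort, reducer = wc.
# wc prints three numbers: lines, words, bytes.
result=$(cat sample.txt | sort | wc)
echo "$result"
```

The line and word counts should match what a single-reducer job reports. The byte count on a real cluster may differ, since Streaming can add key/value separators to the records it passes to the reducer.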

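Delete the results.txt output directory. This is needed before re-running the job, because Hadoop refuses to write to an output path that already exists.

bin/hdfs dfs -rm -r /user/results.txt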

List the contents of the results.txt output directory

bin/hdfs dfs -ls /user/results.txt

View the result

bin/hdfs dfs -cat /user/results.txt/part-00000
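
A wc reducer yields only totals, so a per-word tally needs a different mapper and reducer. One possible sketch, simulated locally, is shown below. It is an alternative to the commands in this post, not part of them: the `mapper` and `reducer` function names and the words.txt file are hypothetical. The mapper emits one word per line, and the reducer relies on sorted input so that identical words sit next to each other.

```shell
# Hypothetical sample input (not from the original post).
printf 'hello world\nhello hadoop\n' > words.txt

# Mapper: turn every run of whitespace into a single newline,
# leaving one word per line.
mapper() { tr -s '[:space:]' '\n'; }

# Reducer: input arrives sorted, so uniq -c can count each run
# of identical words.
reducer() { uniq -c; }

# Local stand-in for map -> shuffle/sort -> reduce.
counts=$(mapper < words.txt | sort | reducer)
echo "$counts"
```

On a cluster, the same two commands could in principle be supplied through -mapper and -reducer, most easily by wrapping each in a small script shipped with the job. That part is untested here and should be treated as an assumption.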

Reposted from: https://www.cnblogs.com/BigWatermelon/p/10844953.html
