Make sure you have already installed and configured Hadoop before starting; if not, set that up first.
1. Prepare the data:
Write some English text into a txt file and name it a.txt. Its content is:
There are moments in life when you miss someone so much that you just want to pick them from your dreams and hug them for real!
Dream what you want to dream;go where you want to go;be what you want to be,because you have only one life and one chance to do all the things you want to do.
2. Start Hadoop and create a new directory in HDFS:
start-all
hadoop fs -mkdir /input_dir
3. Upload a.txt to that directory, then check that the upload succeeded:
hadoop fs -put D:/Program_Hadoop/wordcount/a.txt /input_dir
hadoop fs -ls /input_dir/
You can also print the file's contents (the last four lines of the output are the contents of a.txt; the first two lines are just a deprecation warning telling you not to execute HDFS commands through this script; use hdfs dfs -cat instead to avoid it):
hadoop dfs -cat /input_dir/a.txt
4. Run the example program to count the occurrences of each word:
hadoop jar D:/Program_Hadoop/hadoop-2.10.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar wordcount /input_dir /output_dir
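The wordcount example follows the classic MapReduce pattern: the map phase tokenizes each line and emits a (word, 1) pair per token, and the reduce phase sums the counts for each word. A minimal single-process sketch of that logic in plain Java (no Hadoop dependencies; the class and method names here are illustrative, not Hadoop's):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    public static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // "Map" phase: split each line into whitespace-delimited tokens.
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // "Reduce" phase (merged in-process here): sum the 1s per word.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Dream what you want to dream;go where you want to go"
        };
        count(lines).forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

In the real job the map and reduce phases run as separate distributed tasks with a shuffle in between; this sketch only mirrors the counting logic.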
5. View the results:
hdfs dfs -cat /output_dir/*
And with that, the word count is finished!
6. Finally, shut down Hadoop:
stop-all
Then close the cmd window.
III. Summary and Reflections
One question about the output above: why was "be,because" counted as a single word with a count of 1? The reason is in how the example tokenizes its input: the wordcount program splits each line with Java's StringTokenizer, whose default delimiters are whitespace characters only. Words joined by punctuation with no space between them, such as "be,because" and "dream;go" in a.txt, are therefore treated as one token. To count them separately, the mapper would have to split on punctuation as well.
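A quick way to confirm this behavior is to run StringTokenizer on one of the lines directly (a standalone sketch, no Hadoop required):

```java
import java.util.StringTokenizer;

public class TokenizerDemo {
    public static void main(String[] args) {
        // StringTokenizer's default delimiters are " \t\n\r\f" (whitespace only),
        // so punctuation does not split tokens.
        StringTokenizer itr = new StringTokenizer("be what you want to be,because");
        while (itr.hasMoreTokens()) {
            System.out.println(itr.nextToken());
        }
        // The last token printed is "be,because": one token, not two.
    }
}
```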