1 Start HDFS
[hadoop@node1 ~]$ start-dfs.sh
Starting namenodes on [node1]
node1: starting namenode, logging to /home/hadoop/apps/hadoop-2.7.6/logs/hadoop-hadoop-namenode-node1.out
node4: starting datanode, logging to /home/hadoop/apps/hadoop-2.7.6/logs/hadoop-hadoop-datanode-node4.out
node3: starting datanode, logging to /home/hadoop/apps/hadoop-2.7.6/logs/hadoop-hadoop-datanode-node3.out
node2: starting datanode, logging to /home/hadoop/apps/hadoop-2.7.6/logs/hadoop-hadoop-datanode-node2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/apps/hadoop-2.7.6/logs/hadoop-hadoop-secondarynamenode-node1.out
[hadoop@node1 ~]$
Visit the HDFS web UI at http://node1:50070
1.1 Upload files to HDFS
[hadoop@node1 ~]$ ll
total 8
drwxrwxr-x. 6 hadoop hadoop 95 Oct 16 10:18 apps
-rw-rw-r--. 1 hadoop hadoop 707 Oct 16 12:15 derby.log
drwxrwxr-x. 4 hadoop hadoop 28 Sep 14 19:02 hbase
drwxrwxr-x. 4 hadoop hadoop 32 Sep 14 14:44 hdfsdir
drwxrwxr-x. 5 hadoop hadoop 133 Oct 16 12:15 metastore_db
-rw-r--r--. 1 hadoop hadoop 48 Oct 12 17:12 words1.txt
drwxrwxr-x. 4 hadoop hadoop 29 Sep 16 22:58 zookeeper
[hadoop@node1 ~]$ more words1.txt
hello tom
hello jerry
hello henny
hello tom
[hadoop@node1 ~]$ hdfs dfs -mkdir -p /wc
[hadoop@node1 ~]$ hdfs dfs -put words1.txt /wc/1.log
[hadoop@node1 ~]$ hdfs dfs -put words1.txt /wc/2.log
[hadoop@node1 ~]$ hdfs dfs -put words1.txt /wc/3.log
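The three -put commands upload the same local file under three different names, so /wc now holds three identical copies of words1.txt. The same upload can also be done programmatically through the Hadoop FileSystem API; a minimal Scala sketch, assuming the NameNode RPC address hdfs://node1:9000 that is used later in this walkthrough:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsPut {
  def main(args: Array[String]): Unit = {
    // Connect to the NameNode as the hadoop user.
    val fs = FileSystem.get(new URI("hdfs://node1:9000"), new Configuration(), "hadoop")
    // copyFromLocalFile is the API equivalent of `hdfs dfs -put`.
    fs.copyFromLocalFile(new Path("/home/hadoop/words1.txt"), new Path("/wc/1.log"))
    fs.close()
  }
}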
2 Start the Spark cluster
[hadoop@node1 ~]$ /home/hadoop/apps/spark-1.6.3-bin-hadoop2.6/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/apps/spark-1.6.3-bin-hadoop2.6/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
node3: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/apps/spark-1.6.3-bin-hadoop2.6/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node3.out
node2: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/apps/spark-1.6.3-bin-hadoop2.6/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node2.out
2.1 Start the Spark shell
[hadoop@node1 ~]$ /home/hadoop/apps/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --master spark://node1:7077 --executor-memory 512m --total-executor-cores 2
The Spark master web UI at http://node1:8080/ should now show both workers and the running spark-shell application.
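The same master, memory, and core settings can also be set from code when packaging a standalone application instead of using shell flags; a minimal sketch for Spark 1.6, where spark.cores.max is the configuration key behind --total-executor-cores:

import org.apache.spark.{SparkConf, SparkContext}

object WordCountApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCountApp")           // hypothetical application name
      .setMaster("spark://node1:7077")
      .set("spark.executor.memory", "512m") // --executor-memory 512m
      .set("spark.cores.max", "2")          // --total-executor-cores 2
    val sc = new SparkContext(conf)
    // ... job logic goes here ...
    sc.stop()
  }
}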
2.2 Word count
scala> sc.textFile("hdfs://node1:9000/wc").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortBy(_._2,false).collect
res1: Array[(String, Int)] = Array((hello,12), (tom,6), (henny,3), (jerry,3))
scala>
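Each copy of words1.txt contains four hello, two tom, one henny, and one jerry; with three identical copies under /wc every count is tripled, which is why hello comes out as 12. The one-liner can be unpacked into named steps; the same pipeline, pasteable into the shell:

val lines  = sc.textFile("hdfs://node1:9000/wc")    // one record per line, across all files under /wc
val words  = lines.flatMap(_.split(" "))            // split each line into words
val pairs  = words.map((_, 1))                      // (word, 1) for every occurrence
val counts = pairs.reduceByKey(_ + _)               // sum the ones per word
val sorted = counts.sortBy(_._2, ascending = false) // order by count, descending
sorted.collect().foreach(println)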
2.3 Save the results to HDFS
scala> sc.textFile("hdfs://node1:9000/wc").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortBy(_._2,false).saveAsTextFile("hdfs://node1:9000/wc_out")
[hadoop@node1 ~]$ hdfs dfs -ls /
Found 4 items
drwxrwxrwx - hadoop supergroup 0 2018-09-19 15:47 /ceshi
drwxrwxrwx - hadoop supergroup 0 2018-09-19 15:45 /hbase
drwxr-xr-x - hadoop supergroup 0 2018-10-16 15:00 /wc
drwxr-xr-x - hadoop supergroup 0 2018-10-16 15:40 /wc_out
[hadoop@node1 ~]$ hdfs dfs -ls /wc_out
Found 4 items
-rw-r--r-- 3 hadoop supergroup 0 2018-10-16 15:40 /wc_out/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 11 2018-10-16 15:40 /wc_out/part-00000
-rw-r--r-- 3 hadoop supergroup 8 2018-10-16 15:40 /wc_out/part-00001
-rw-r--r-- 3 hadoop supergroup 20 2018-10-16 15:40 /wc_out/part-00002
[hadoop@node1 ~]$
[hadoop@node1 ~]$ hdfs dfs -cat /wc_out/part-*
(hello,12)
(tom,6)
(henny,3)
(jerry,3)
[hadoop@node1 ~]$
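The job writes three part files because sortBy keeps the RDD's partitioning, here three partitions matching the three input files. To get a single output file, the RDD can be coalesced to one partition before saving; a sketch, assuming a fresh output path (wc_out_single is hypothetical), since saveAsTextFile fails if the directory already exists:

sc.textFile("hdfs://node1:9000/wc")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
  .sortBy(_._2, ascending = false)
  .coalesce(1)                                       // one partition => one part file
  .saveAsTextFile("hdfs://node1:9000/wc_out_single") // hypothetical new output path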