Spark Learning Notes (2): Word Count with Spark and HDFS

1 Start HDFS

[hadoop@node1 ~]$ start-dfs.sh

Starting namenodes on [node1]
node1: starting namenode, logging to /home/hadoop/apps/hadoop-2.7.6/logs/hadoop-hadoop-namenode-node1.out
node4: starting datanode, logging to /home/hadoop/apps/hadoop-2.7.6/logs/hadoop-hadoop-datanode-node4.out
node3: starting datanode, logging to /home/hadoop/apps/hadoop-2.7.6/logs/hadoop-hadoop-datanode-node3.out
node2: starting datanode, logging to /home/hadoop/apps/hadoop-2.7.6/logs/hadoop-hadoop-datanode-node2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/apps/hadoop-2.7.6/logs/hadoop-hadoop-secondarynamenode-node1.out
[hadoop@node1 ~]$ 

Visit the HDFS web UI at http://node1:50070 to confirm the NameNode and DataNodes are up.

1.1 Upload files to HDFS

[hadoop@node1 ~]$ ll
total 8
drwxrwxr-x. 6 hadoop hadoop  95 Oct 16 10:18 apps
-rw-rw-r--. 1 hadoop hadoop 707 Oct 16 12:15 derby.log
drwxrwxr-x. 4 hadoop hadoop  28 Sep 14 19:02 hbase
drwxrwxr-x. 4 hadoop hadoop  32 Sep 14 14:44 hdfsdir
drwxrwxr-x. 5 hadoop hadoop 133 Oct 16 12:15 metastore_db
-rw-r--r--. 1 hadoop hadoop  48 Oct 12 17:12 words1.txt
drwxrwxr-x. 4 hadoop hadoop  29 Sep 16 22:58 zookeeper
[hadoop@node1 ~]$ more words1.txt 
hello tom
hello jerry
hello henny
hello tom

[hadoop@node1 ~]$ hdfs dfs -mkdir -p /wc
[hadoop@node1 ~]$ hdfs dfs -put words1.txt /wc/1.log
[hadoop@node1 ~]$ hdfs dfs -put words1.txt /wc/2.log
[hadoop@node1 ~]$ hdfs dfs -put words1.txt /wc/3.log


2 Start the Spark cluster

[hadoop@node1 ~]$ /home/hadoop/apps/spark-1.6.3-bin-hadoop2.6/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/apps/spark-1.6.3-bin-hadoop2.6/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
node3: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/apps/spark-1.6.3-bin-hadoop2.6/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node3.out
node2: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/apps/spark-1.6.3-bin-hadoop2.6/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node2.out

2.1 Start the Spark shell

[hadoop@node1 ~]$ /home/hadoop/apps/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --master spark://node1:7077 --executor-memory 512m --total-executor-cores 2

Visit the Spark Master web UI at http://node1:8080/ to confirm that both workers have registered and that the spark-shell application is running.
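Once the shell is up, it has already created a SparkContext named `sc` connected to the standalone master. A quick sanity check (a minimal sketch; the exact values depend on your cluster):

scala> sc.master                  // should print spark://node1:7077 if the shell attached to the cluster
scala> sc.defaultParallelism      // default parallelism, derived from the cores granted to this application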

2.2 Count the words

scala> sc.textFile("hdfs://node1:9000/wc").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortBy(_._2,false).collect
res1: Array[(String, Int)] = Array((hello,12), (tom,6), (henny,3), (jerry,3))   

scala> 
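The one-liner above chains several RDD transformations. Because words1.txt was uploaded three times into /wc, textFile reads all three files, so each word appears three times as often as in the original file (hence hello = 12). The same pipeline, broken into steps (a sketch you can run in the same shell):

scala> val lines  = sc.textFile("hdfs://node1:9000/wc")   // one RDD[String] over all files under /wc
scala> val words  = lines.flatMap(_.split(" "))           // split each line into words
scala> val pairs  = words.map((_, 1))                     // turn each word into a (word, 1) tuple
scala> val counts = pairs.reduceByKey(_ + _)              // sum the 1s per word
scala> val sorted = counts.sortBy(_._2, false)            // sort by count, descending
scala> sorted.collect                                     // bring the results back to the driver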

2.3 Save the results to HDFS

scala> sc.textFile("hdfs://node1:9000/wc").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortBy(_._2,false).saveAsTextFile("hdfs://node1:9000/wc_out")

[hadoop@node1 ~]$ hdfs dfs -ls /
Found 4 items
drwxrwxrwx   - hadoop supergroup          0 2018-09-19 15:47 /ceshi
drwxrwxrwx   - hadoop supergroup          0 2018-09-19 15:45 /hbase
drwxr-xr-x   - hadoop supergroup          0 2018-10-16 15:00 /wc
drwxr-xr-x   - hadoop supergroup          0 2018-10-16 15:40 /wc_out
[hadoop@node1 ~]$ hdfs dfs -ls /wc_out
Found 4 items
-rw-r--r--   3 hadoop supergroup          0 2018-10-16 15:40 /wc_out/_SUCCESS
-rw-r--r--   3 hadoop supergroup         11 2018-10-16 15:40 /wc_out/part-00000
-rw-r--r--   3 hadoop supergroup          8 2018-10-16 15:40 /wc_out/part-00001
-rw-r--r--   3 hadoop supergroup         20 2018-10-16 15:40 /wc_out/part-00002
[hadoop@node1 ~]$ 

[hadoop@node1 ~]$ hdfs dfs -cat /wc_out/part-*
(hello,12)
(tom,6)
(henny,3)
(jerry,3)
[hadoop@node1 ~]$ 
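saveAsTextFile writes one part file per RDD partition; here /wc contained three input files, so the result has three partitions and three part files. If a single output file is preferred, coalesce the RDD to one partition before saving (a sketch using a hypothetical output path /wc_out_single, which must not already exist):

scala> sc.textFile("hdfs://node1:9000/wc").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortBy(_._2,false).coalesce(1).saveAsTextFile("hdfs://node1:9000/wc_out_single")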
