I. Objective:
Use the official WordCount example that ships with Spark to get familiar with the two ways of submitting a Spark application: spark-submit and spark-shell.
II. Steps
1. Submitting with spark-submit
(1) Start HDFS
(2) Run the following from the Spark root directory (local[2] gives the driver two threads; a streaming job needs at least two locally, since one is occupied by the socket receiver):
bin/spark-submit --master local[2] \
--class org.apache.spark.examples.streaming.NetworkWordCount \
--name NetworkWordCount \
/opt/modules/spark-2.1.0-bin-2.7.3/examples/jars/spark-examples_2.11-2.1.0.jar bigdata.ibeifeng.com 9999
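NetworkWordCount reads lines from the socket, splits them on spaces, and counts each word within every batch. What one batch computes can be sketched with plain shell tools (an analogue for illustration, not the Spark code itself):

```shell
# Shell analogue of one NetworkWordCount batch:
# put each word on its own line, then count duplicates.
echo "spark hello spark" | tr ' ' '\n' | sort | uniq -c
# → "2 spark" and "1 hello" (uniq -c prefixes each line with its count)
```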
2. Submitting with spark-shell (when the Hive metastore has been configured)
(1) Start HDFS
(2) Start the shell:
./spark-shell --master local[2]
(3) Start the Hive metastore:
bin/hive --service metastore &
(4) Enter the following code:
import org.apache.spark.streaming.{Seconds, StreamingContext}
val ssc = new StreamingContext(sc, Seconds(4))
val lines = ssc.socketTextStream("bigdata.ibeifeng.com", 9999)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()
(5) Start nc (ideally before calling ssc.start(), otherwise the socket receiver will keep logging connection errors until a listener is up):
nc -lk 9999
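The chain above (flatMap on spaces, map to (word, 1), then reduceByKey(_ + _)) aggregates counts within a single 4-second batch; nothing is carried over between batches. A per-batch awk analogue (illustration only, not the Spark API):

```shell
# awk analogue of flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
# applied to one batch consisting of two input lines:
printf 'hello spark\nhello streaming\n' \
  | awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print "(" w "," count[w] ")" }' \
  | sort
# → (hello,2) (spark,1) (streaming,1), one pair per line
```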
3. Submitting with spark-shell (when no metastore has been configured)
[On Alibaba Cloud]
(1) Start HDFS
(2) Start spark-shell in local mode:
bin/spark-shell --master local[2]
(3) Install nc:
yum install -y nc
(4) Open port 9999:
nc -lk 9999
(5) Enter the following in the shell:
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
val ssc = new StreamingContext(sc,Seconds(10))
val dstream = ssc.socketTextStream("hadoop", 9999)
val resultDStream = dstream.flatMap(_.split(" ")).map((_,1)).reduceByKey(_ + _)
resultDStream.print()
ssc.start() // Start the computation
ssc.awaitTermination() // Wait for the computation to terminate
(6) Type test input at the nc prompt
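Before typing test input, it can help to confirm that something is actually listening on port 9999; otherwise the socket receiver only logs connection retries. A bash-only sketch using the /dev/tcp pseudo-device (localhost is an assumption here; use the host passed to socketTextStream, e.g. hadoop):

```shell
# Probe TCP port 9999 (bash's /dev/tcp pseudo-device; needs no extra tools).
if (exec 3<>/dev/tcp/localhost/9999) 2>/dev/null; then
  echo "port 9999 is open"
else
  echo "port 9999 is closed: start 'nc -lk 9999' first"
fi
```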