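1. Testing Kafka
Make sure Kafka works correctly before building anything on top of it.
(1) Start ZooKeeper
(2) Start Kafka
(3) Create a topic
(4) Start a producer and a consumer, and check that messages can be produced to and consumed from this topic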
2. Developing the Spark Streaming application
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaReceiverWordCount {
  def main(args: Array[String]): Unit = {
    // Exit on a wrong argument count; otherwise the pattern match below throws a MatchError.
    if (args.length != 4) {
      System.err.println("Usage: KafkaReceiverWordCount <zkQuorum> <group> <topics> <numThreads>")
      System.exit(1)
    }
    val Array(zkQuorum, group, topics, numThreads) = args

    // local[2]: the receiver occupies one thread, so at least two are needed to also process data.
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("KafkaReceiverWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(5)) // 5-second batches

    // Map each comma-separated topic to the number of consumer threads.
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    val messages = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)

    // Each record is a (key, message) pair; count the words in the message part.
    messages.map(_._2).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
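To make the argument handling concrete, the sketch below reproduces the topicMap expression from the program in plain Scala, with no Spark or Kafka dependency. The object name TopicMapSketch, the helper buildTopicMap, and the topic names are hypothetical and used only for illustration.

```scala
object TopicMapSketch {
  // Same expression as in the program: every topic gets the same thread count.
  def buildTopicMap(topics: String, numThreads: String): Map[String, Int] =
    topics.split(",").map((_, numThreads.toInt)).toMap

  def main(args: Array[String]): Unit = {
    // "topic_a" and "topic_b" are made-up topic names.
    println(buildTopicMap("topic_a,topic_b", "2"))
  }
}
```

A single topic without a comma works the same way and yields a one-entry map. A non-numeric thread count fails in toInt with a NumberFormatException.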
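3. Local test
Run the application with the appropriate arguments, then type the test text a a b b c c c into the Kafka producer on the server.
The word counts are printed to the console.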
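The counts to expect for this input can be checked without a cluster. The following is a minimal sketch that mirrors the flatMap / map / reduceByKey pipeline on an in-memory batch using ordinary Scala collections. The names WordCountSketch and countWords are hypothetical, and groupBy plus sum stands in for Spark's reduceByKey.

```scala
object WordCountSketch {
  // Mirrors flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _) on an in-memory batch.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Sorted only to make the printout stable; Spark's print() does not guarantee an order.
    countWords(Seq("a a b b c c c")).toSeq.sortBy(_._1).foreach(println)
  }
}
```

For the test text this gives a count of 2 for a, 2 for b, and 3 for c, which is what the streaming job should print, provided the whole line arrives within one 5-second batch.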
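4. Server test
Package the application with Maven, upload the jar to the server, and launch it with spark-submit.
Type the test text a a b b c c c into the Kafka producer.
The word counts are printed on the server.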