1. The following walks through how a Spark Streaming application connects to a client program over a socket.
2. In MyEclipse, create a Maven project named DStreamTest, create a package WordStream, and under that package create the class WordCount.scala (for the detailed setup, see my earlier post: https://blog.csdn.net/weixin_40393128/article/details/102669873, "Packaging and Running a Spark Scala Program with Maven in MyEclipse").
3. pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com2</groupId>
  <artifactId>DStreamTest</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <!-- Spark core; the _2.10 Scala-version suffix must match every other
         Spark artifact (spark-streaming below is also _2.10) -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.2</version>
      <scope>provided</scope>
    </dependency>
    <dependency> <!-- Spark Streaming -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>1.6.2</version>
      <scope>provided</scope>
    </dependency>
    <dependency> <!-- Log -->
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.17</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.12</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <!-- mixed scala/java compile -->
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <id>compile</id>
            <goals>
              <goal>compile</goal>
            </goals>
            <phase>compile</phase>
          </execution>
          <execution>
            <id>test-compile</id>
            <goals>
              <goal>testCompile</goal>
            </goals>
            <phase>test-compile</phase>
          </execution>
          <execution>
            <phase>process-resources</phase>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
      <!-- for fatjar -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.4</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>assemble-all</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <mainClass>WordStream.WordCount</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <repositories>
    <repository>
      <id>alimaven</id>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </repository>
  </repositories>
</project>
4. WordCount.scala
package WordStream

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Create a local StreamingContext with two working threads and a batch interval of 1 second.
// The master needs at least 2 cores: one for the socket receiver and one for processing,
// otherwise the job starves.
object WordCount {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("SocketWordFreq")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    // Create a DStream that will connect to hostname:port, e.g. localhost:8999
    val lines = ssc.socketTextStream("localhost", 8999)
    // Split each line into words
    val words = lines.flatMap(_.split(" "))
    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()
    ssc.start()            // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }
}
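The per-batch pipeline above (flatMap, map to (word, 1), reduceByKey) can be sketched on a plain Scala collection, without Spark, to see exactly what one batch produces. This is an illustrative helper of my own, not part of the project; `BatchWordCount` and `countWords` are made-up names, and `groupBy` plus a sum stands in for `reduceByKey`.

```scala
object BatchWordCount {
  // Mirrors the DStream pipeline on a plain Seq[String]:
  // flatMap -> map to (word, 1) -> reduce by key (here groupBy + sum).
  def countWords(lines: Seq[String]): Map[String, Int] = {
    val words = lines.flatMap(_.split(" "))           // split each line into words
    val pairs = words.map(word => (word, 1))          // pair each word with a count of 1
    pairs.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit =
    println(countWords(Seq("a b a", "b c")))          // Map(a -> 2, b -> 2, c -> 1)
}
```

Running this on two sample "lines" shows the same (word, count) pairs that wordCounts.print() would display for one batch.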
5. In the WordStream package, create a new Java class, ClientApp.java, with the following code.
package WordStream;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Despite the name, this program is the socket *server*: it listens on port 8999,
// and Spark's socketTextStream connects to it as the client.
public class ClientApp
{
    public static void main(String[] args)
    {
        try
        {
            System.out.println("Defining new Socket");
            ServerSocket soc = new ServerSocket(8999);
            System.out.println("Waiting for Incoming Connection");
            Socket clientSocket = soc.accept();
            System.out.println("Connection Received");
            OutputStream outputStream = clientSocket.getOutputStream();
            // Create the writer and reader once, outside the loop
            PrintWriter out = new PrintWriter(outputStream, true);
            BufferedReader read = new BufferedReader(new InputStreamReader(System.in));
            for (;;)
            {
                System.out.println("Waiting for user to input some words");
                String words = read.readLine();
                System.out.println("Words received; writing them to the socket");
                out.println(words);
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
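The handshake between ClientApp and the streaming job can be sketched in a few lines of self-contained Scala: a ServerSocket pushes one line of text, and a plain client socket reads it back, which is essentially what ssc.socketTextStream does on the Spark side. This is only an illustration; `SocketRoundTrip` and `roundTrip` are made-up names, and port 0 (any free port) is used here so the sketch never collides with a real service, whereas ClientApp uses the fixed port 8999.

```scala
import java.io.{BufferedReader, InputStreamReader, PrintWriter}
import java.net.{ServerSocket, Socket}

object SocketRoundTrip {
  def roundTrip(msg: String): String = {
    val server = new ServerSocket(0)  // port 0: bind to any free port
    val sender = new Thread(new Runnable {
      def run(): Unit = {
        val sock = server.accept()    // wait for one client, as ClientApp does
        val out = new PrintWriter(sock.getOutputStream, true)
        out.println(msg)              // push one line down the socket
        sock.close()
      }
    })
    sender.start()
    // "Spark" side: connect to the listener and read one line of text
    val client = new Socket("localhost", server.getLocalPort)
    val in = new BufferedReader(new InputStreamReader(client.getInputStream))
    val line = in.readLine()
    client.close()
    sender.join()
    server.close()
    line
  }

  def main(args: Array[String]): Unit =
    println(roundTrip("hello streaming"))
}
```

The key point this illustrates: ClientApp must be started first, because socketTextStream can only connect to a port that is already listening.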
6. Run maven install to package the whole project.
7. In the MyEclipse workspace, find the target directory of the DStreamTest project and copy DStreamTest-0.0.1-SNAPSHOT-jar-with-dependencies.jar into the "下载" (Downloads) folder.
8. Now run the client program and the Spark Streaming program in two separate terminals.
1) Open a terminal in the Downloads directory and run ClientApp with:
java -classpath DStreamTest-0.0.1-SNAPSHOT-jar-with-dependencies.jar WordStream.ClientApp
2) Open a terminal in any directory and submit the streaming program with:
/usr/local/spark/bin/spark-submit --class WordStream.WordCount /home/hadoop/下载/DStreamTest-0.0.1-SNAPSHOT-jar-with-dependencies.jar
Words typed into the terminal from step 1) now show up, counted, in this terminal.
9. Stopping the processes
In a terminal, list the listening processes to find the PIDs of the two Java processes:
netstat -nultp
Then kill those Java processes (the PIDs below are from my run; substitute your own). Both terminals from steps 1) and 2) will report "Killed".
kill -9 7947
kill -9 7969
10. Use updateStateByKey to accumulate word counts across batches
package WordStream

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Create a local StreamingContext with two working threads and a batch interval of 1 second.
// The master needs at least 2 cores to avoid starvation.
object WordCount {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("SocketWordFreq")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    // updateStateByKey requires a checkpoint directory to persist the running state
    ssc.checkpoint("file:///home/hadoop/下载/checkpoint")
    // Create a DStream that will connect to hostname:port, e.g. localhost:8999
    val lines = ssc.socketTextStream("localhost", 8999)
    // Split each line into words
    val words = lines.flatMap(_.split(" "))
    // Count each word in each batch, then merge into the running state
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.updateStateByKey(updateFunction)
    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()
    ssc.start()            // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }

  // Add this batch's counts for a key to the running total (None means no prior state)
  def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
    val newCount = runningCount.getOrElse(0) + newValues.sum
    Some(newCount)
  }
}
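How updateStateByKey carries counts from batch to batch can be simulated without Spark by folding the update function over a sequence of batches. This sketch is my own illustration; `StatefulCount` and `runBatches` are made-up names, and a plain Map stands in for the checkpointed state that Spark maintains per key.

```scala
object StatefulCount {
  // Same logic as updateFunction in WordCount above
  def update(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
    Some(runningCount.getOrElse(0) + newValues.sum)

  // Fold per-batch words into a running state map, the way updateStateByKey
  // carries state from one batch interval to the next.
  def runBatches(batches: Seq[Seq[String]]): Map[String, Int] =
    batches.foldLeft(Map.empty[String, Int]) { (state, batch) =>
      // (word, 1) pairs for this batch, grouped by word
      val perBatch = batch.groupBy(identity).map { case (w, ws) => (w, ws.map(_ => 1)) }
      // apply the update function to every key seen in this batch
      val touched = perBatch.map { case (w, ones) => w -> update(ones, state.get(w)).get }
      state ++ touched
    }

  def main(args: Array[String]): Unit =
    // "a" appears 3 times across the two batches, "b" once
    println(runBatches(Seq(Seq("a", "b"), Seq("a", "a"))))  // Map(a -> 3, b -> 1)
}
```

Unlike the reduceByKey version in step 4, which starts from zero every second, the totals here only ever grow, which is exactly what you see on the console once updateStateByKey is in place.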