Kafka, Flink, and Alibaba Cloud data preparation
Spring Boot integration with Kafka
Flink official documentation
Alibaba Cloud AMQP access guide
Spring Boot integration with Kafka
Flink for Kafka
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.4</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-core</artifactId>
        <version>1.9.2</version>
        <scope>compile</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka_2.11</artifactId>
        <version>1.9.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka_2.11 -->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.11</artifactId>
        <version>2.4.0</version>
        <scope>compile</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.11</artifactId>
        <version>1.9.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-java -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.9.2</version>
    </dependency>
    <!-- flink-streaming jar; the _2.11 suffix is the Scala version -->
    <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>1.9.2</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
Code
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.Collector;

import java.util.Properties;

public class FlinkForKafka {

    public static void main(String[] args) throws Exception {
        // Create the Flink execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka connection properties
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "IP:9092");
        properties.setProperty("group.id", "default"); // consumer group id
        String inputTopic = "test"; // topic to consume
        String outputTopic = "WordCount";

        // Source
        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<String>(inputTopic, new SimpleStringSchema(), properties);
        DataStream<String> stream = env.addSource(consumer);

        // Transformations
        // Apply Flink operators to the input text stream:
        // split on whitespace, count, key by word, apply a time window, aggregate
        DataStream<Tuple2<String, Integer>> wordCount = stream
                .flatMap((String line, Collector<Tuple2<String, Integer>> collector) -> {
                    String[] tokens = line.split("\\s");
                    // emit (word, 1) for each non-empty token
                    for (String token : tokens) {
                        if (token.length() > 0) {
                            collector.collect(new Tuple2<>(token, 1));
                        }
                    }
                })
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .keyBy(0)
                .timeWindow(Time.seconds(5))
                .sum(1);

        // Sink
        wordCount.print();

        // Execute
        env.execute("kafka streaming word count");
    }
}
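The flatMap logic above (split a line on whitespace and emit (word, 1) per token) can be sanity-checked in plain Java, without a Kafka cluster or a Flink runtime. This is an illustrative sketch with made-up names, not part of the original job:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TokenizeSketch {

    // Mirrors the job's flatMap + sum: split each line on whitespace,
    // drop empty tokens, and accumulate a count per word.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String token : line.split("\\s")) {
                if (token.length() > 0) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = wordCount(Arrays.asList("hello world", "hello flink"));
        System.out.println(c.get("hello") + " " + c.get("world") + " " + c.get("flink"));
        // prints: 2 1 1
    }
}
```

The Flink job computes the same per-word counts, except that they are aggregated per 5-second time window rather than over the whole input.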
1. If the following error appears at this point, it is resolved by importing Flink's lib jars.
Kafka producer and consumer setup
Consumer
@KafkaListener(topics = "test")
public void consume(String message) {
    System.out.println("receive msg " + message);
}
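For @KafkaListener to fire, the enclosing class must be a Spring bean (e.g. annotated with @Component), spring-kafka must be on the classpath, and the consumer needs broker settings. A minimal application.properties sketch, assuming a standard Spring Boot setup; the broker address and group id are placeholders to adjust for your environment:

```properties
spring.kafka.bootstrap-servers=IP:9092
spring.kafka.consumer.group-id=default
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer
```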
Producer
@RestController
public class KafkaProducer {

    private static final String MY_TOPIC = "test";

    @Autowired
    KafkaTemplate<String, String> kafkaTemplate;

    @PostMapping(value = "/kafka")
    public void produce(@RequestParam(value = "msg") String msg) {
        kafkaTemplate.send(MY_TOPIC, msg);
    }
}
Problem encountered
kafkaTemplate throws a NullPointerException.
Adding the following code resolves it:
@Autowired
KafkaTemplate kafkaTemplate;

public static AmqpJavaClientDemo amqpJavaClientDemo;

@PostConstruct
public void init() {
    amqpJavaClientDemo = this;
    amqpJavaClientDemo.kafkaTemplate = this.kafkaTemplate;
}
Usage
amqpJavaClientDemo.kafkaTemplate.send("test",content);
Explanation
The following annotations are required:
@Component
@Autowired: automatic dependency injection
@PostConstruct
A method annotated with @PostConstruct runs when the server loads the Servlet, and is called only once by the server, similar to the Servlet's init() method. It runs after the constructor and before init().
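The fix above is a static-bridge pattern: Spring injects the KafkaTemplate into the managed instance, and the @PostConstruct hook copies a reference to that instance into a static field so static code paths can reach the injected bean. The same idea can be illustrated without Spring; the class and field names below are made up for the sketch:

```java
public class StaticBridgeSketch {

    // stands in for the injected KafkaTemplate
    String template;

    // static handle to the managed instance, filled in by init()
    static StaticBridgeSketch instance;

    // In Spring this method would carry @PostConstruct and run exactly once,
    // after the constructor and after dependency injection.
    void init() {
        instance = this;
    }

    // A static code path that could not use @Autowired directly
    // can now reach the injected field through the static handle.
    static String sendViaStatic(String msg) {
        return "sent:" + msg + " via " + instance.template;
    }

    public static void main(String[] args) {
        StaticBridgeSketch bean = new StaticBridgeSketch();
        bean.template = "kafkaTemplate"; // simulates @Autowired injection
        bean.init();                     // simulates the @PostConstruct callback
        System.out.println(sendViaStatic("hello"));
        // prints: sent:hello via kafkaTemplate
    }
}
```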
Experiment result
The Alibaba Cloud server fetches data and writes it into Kafka; Flink consumes the topic in real time.
When packaging the jar, the following build configuration must be added:
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.5</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass>FlinkForKafka</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
Once the versions match, the jar can be uploaded through the Flink web UI.
Use Flink's command-line tool flink to submit the freshly packaged job to the cluster. The --class argument specifies which main class serves as the entry point. The command line's usage is covered in detail later.
$ bin/flink run --class com.flink.tutorials.java.api.projects.wordcount.WordCountKafkaInStdOut /Users/luweizheng/Projects/big-data/flink-tutorials/target/flink-tutorials-0.1.jar
The dashboard now shows one more Flink job.
The program's output is written to the .out files under the log directory in the Flink home directory; view the results with the following command:
$ tail -f log/flink--taskexecutor-.out
Stopping the job
In the web UI, just click Cancel.
From the command line:
First query the list of currently running jobs by executing bin/flink list; it shows one running job.
Then run ./bin/flink cancel with the JobID from the list output.