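1 Setting up a Flink job project
Flink development project
Flink projects are managed with Maven, and the project template on the official site is also Maven-based. A convenient property of the Flink dependencies is that they all share one version, so a single version property (flink.version in the snippets here) covers all of them.
Maven dependencies: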
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-cep_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-test-utils_2.11</artifactId>
    <version>${flink.version}</version>
    <scope>test</scope>
</dependency>
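flink-streaming-java_2.11 provides the core streaming functionality and the programming interfaces needed to write a job.
flink-cep_2.11 adds support for complex event processing (CEP), which lets you define complex event patterns.
flink-test-utils_2.11 is the unit test dependency. It lets the implemented logic run and be verified in a local test environment, which speeds up development and improves code quality.
Other dependencies: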
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
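Running Flink
The official site documents many deployment modes. Two are recommended here: a local environment and unit tests.
Steps to set up a local environment:
- Install JDK 1.8 or later.
- Download the Flink distribution from the official site: https://flink.apache.org/downloads.html
- Unpack the archive, then start the cluster: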
./bin/start-cluster.sh
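After startup, two processes run locally: the JobManager and the TaskManager.
The local environment can be managed through the web UI (by default at http://localhost:8081), which covers the cluster overview, job status, job upload, and so on.
Unit test setup:
From a development point of view, unit tests are a more useful environment than the local cluster.
First add the unit test dependency to the project: flink-test-utils_2.11
After that, write the tests as ordinary JUnit tests.
public class SimpleTest extends AbstractTestBase {
    @Test
    public void test(){
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        …
    }
}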
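Packaging and releasing the Flink project
Maven offers three packaging plugins:
maven-jar-plugin: Maven's default packaging plugin, used to build the project jar
maven-shade-plugin: builds an executable (fat) jar
maven-assembly-plugin: supports customized packaging, such as the layout used by Apache projects
Note: maven-shade-plugin sometimes fails with an "invalid LOC header (bad signature)" exception. Check the log for the last "processing JAR …**.jar" entry; the jar named there is the corrupted one. Delete it from the local Maven repository and download it again.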
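If the log does not make the culprit obvious, the corrupted archive can also be found by trying to read every jar under the local repository. The following is a minimal sketch that uses only java.util.zip; the class name CorruptJarFinder and the default path ~/.m2/repository are assumptions for illustration and are not part of Maven or Flink.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists jar files that cannot be read completely.
final class CorruptJarFinder {

    /** Returns true if the archive opens and every entry can be read to the end. */
    static boolean isReadable(Path jar) {
        byte[] buffer = new byte[8192];
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = zip.getInputStream(entries.nextElement())) {
                    while (in.read(buffer) != -1) {
                        // drain the entry; a damaged entry fails during the read
                    }
                }
            }
            return true;
        } catch (IOException | RuntimeException e) {
            // ZipException is an IOException; treat any read failure as "corrupt"
            return false;
        }
    }

    /** Walks the directory tree and returns all *.jar files that fail isReadable. */
    static List<Path> findCorruptJars(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".jar"))
                        .filter(p -> !isReadable(p))
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default location of the local Maven repository.
        Path repo = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"), ".m2", "repository");
        findCorruptJars(repo).forEach(System.out::println);
    }
}
```

Each path it prints is a candidate to delete and download again. Scanning a large repository reads every jar in full, so it can take a while.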
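A jar built by maven-jar-plugin contains none of the project's dependencies. For a simple job with no dependencies this is fine.
In practice our project depends on many external libraries, so the dependency jars have to be shipped with it. There are two options: maven-shade-plugin and maven-assembly-plugin. The official examples use shade, and so do we. In our tests the internal network had no access to the outside, and assembly failed with various errors caused by missing dependencies, so we dropped that option.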
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <executions>
        <!-- Run shade goal on package phase -->
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- add Main-Class to manifest file -->
                    <transformer
                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>org.yunzhong.MainClass</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
2 Flink unit tests
Unit test dependency:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-test-utils_2.11</artifactId>
    <version>${flink.version}</version>
    <scope>test</scope>
</dependency>
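A test data generator simulates the streaming source. The data format can be customized to the business case, so the logic can be tested along several dimensions.
The basic source types are RichParallelSourceFunction, which can run as several parallel instances, and RichSourceFunction, which runs as a single instance.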
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
import org.stsffap.cep.monitoring.events.MonitoringEvent;
import org.stsffap.cep.monitoring.events.PowerEvent;
import org.stsffap.cep.monitoring.events.TemperatureEvent;
import java.util.concurrent.ThreadLocalRandom;

public class MonitoringEventSource extends RichParallelSourceFunction<MonitoringEvent> {
    // volatile because cancel() is called from a different thread than run()
    private volatile boolean running = true;
    private final int maxRackId;
    private final long pause;
    private final double temperatureRatio;
    private final double powerStd;
    private final double powerMean;
    private final double temperatureStd;
    private final double temperatureMean;
    // number of rack IDs owned by this subtask, and the first ID it owns
    private int shard;
    private int offset;
    // number of loop iterations per subtask before the source stops by itself
    private int count = 10;

    public MonitoringEventSource(int maxRackId, long pause, double temperatureRatio, double powerStd, double powerMean,
            double temperatureStd, double temperatureMean) {
        this.maxRackId = maxRackId;
        this.pause = pause;
        this.temperatureRatio = temperatureRatio;
        this.powerMean = powerMean;
        this.powerStd = powerStd;
        this.temperatureMean = temperatureMean;
        this.temperatureStd = temperatureStd;
    }

    @Override
    public void open(Configuration configuration) {
        // split the rack ID range [0, maxRackId) across the parallel subtasks
        int numberTasks = getRuntimeContext().getNumberOfParallelSubtasks();
        int index = getRuntimeContext().getIndexOfThisSubtask();
        offset = (int) ((double) maxRackId / numberTasks * index);
        shard = (int) ((double) maxRackId / numberTasks * (index + 1)) - offset;
    }

    @Override
    public void run(SourceContext<MonitoringEvent> sourceContext) throws Exception {
        while (running) {
            MonitoringEvent monitoringEvent;
            final ThreadLocalRandom random = ThreadLocalRandom.current();
            if (shard > 0) {
                int rackId = random.nextInt(shard) + offset;
                // emit a temperature event with probability temperatureRatio, otherwise a power event
                if (random.nextDouble() >= temperatureRatio) {
                    double power = random.nextGaussian() * powerStd + powerMean;
                    monitoringEvent = new PowerEvent(rackId, power);
                } else {
                    double temperature = random.nextGaussian() * temperatureStd + temperatureMean;
                    monitoringEvent = new TemperatureEvent(rackId, temperature);
                }
                sourceContext.collect(monitoringEvent);
            }
            count--;
            if (count <= 0) {
                running = false;
            }
            Thread.sleep(pause);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
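The partitioning arithmetic in open() can be checked without a Flink runtime. The sketch below copies the two formulas into a plain Java class; the class name RackShards and the parallelism of 4 are assumptions chosen for illustration.

```java
// Hypothetical stand-alone copy of the offset/shard formulas from MonitoringEventSource.open().
final class RackShards {

    /** First rack ID owned by the subtask with the given index. */
    static int offset(int maxRackId, int numberTasks, int index) {
        return (int) ((double) maxRackId / numberTasks * index);
    }

    /** Number of rack IDs owned by the subtask with the given index. */
    static int shard(int maxRackId, int numberTasks, int index) {
        return (int) ((double) maxRackId / numberTasks * (index + 1))
                - offset(maxRackId, numberTasks, index);
    }

    public static void main(String[] args) {
        int maxRackId = 10;  // same value as MAX_RACK_ID in the test class
        int numberTasks = 4; // assumed parallelism, for illustration only
        for (int index = 0; index < numberTasks; index++) {
            System.out.println("subtask " + index
                    + ": offset=" + offset(maxRackId, numberTasks, index)
                    + ", shard=" + shard(maxRackId, numberTasks, index));
        }
    }
}
```

With 10 rack IDs and 4 subtasks, the subtasks own the ID ranges 0-1, 2-4, 5-6 and 7-9, so every ID belongs to exactly one subtask. When there are more subtasks than rack IDs, some subtasks get a shard of 0, which is why run() guards event generation with shard > 0.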
A Flink unit test looks like an ordinary JUnit test: the @Test annotation marks the entry point.
The test class needs to extend AbstractTestBase.
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.core.fs.FileSystem.WriteMode;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.IngestionTimeExtractor;
import org.apache.flink.test.util.AbstractTestBase;
import org.apache.flink.util.Collector;
import org.junit.Test;
import org.stsffap.cep.monitoring.events.MonitoringEvent;
import org.stsffap.cep.monitoring.sources.MonitoringEventSource;

public class SimpleTest extends AbstractTestBase {
    private static final int MAX_RACK_ID = 10;
    private static final long PAUSE = 100;
    private static final double TEMPERATURE_RATIO = 0.5;
    private static final double POWER_STD = 10;
    private static final double POWER_MEAN = 100;
    private static final double TEMP_STD = 20;
    private static final double TEMP_MEAN = 80;

    @Test
    public void test() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        DataStream<MonitoringEvent> inputEventStream = env.addSource(new MonitoringEventSource(MAX_RACK_ID, PAUSE,
                TEMPERATURE_RATIO, POWER_STD, POWER_MEAN, TEMP_STD, TEMP_MEAN))
                .assignTimestampsAndWatermarks(new IngestionTimeExtractor<>());
        inputEventStream.flatMap(new FlatMapFunction<MonitoringEvent, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void flatMap(MonitoringEvent value, Collector<String> out) throws Exception {
                StringBuilder builder = new StringBuilder();
                builder.append(value.getRackID());
                builder.append(value.getClass().getName());
                out.collect(builder.toString());
            }
        }).writeAsText("D://temp/flink-test/test.out",WriteMode.NO_OVERWRITE);
        env.execute();
        Thread.sleep(1000L);
    }
}
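The output path D://temp/flink-test/test.out only works on a Windows machine with a D: drive. Because WriteMode.NO_OVERWRITE refuses to write to an existing target, a second run against the same path is also likely to fail. One way around both issues is to create a fresh temporary directory for each run. This is a minimal sketch using only java.nio.file; the class name TestOutputPath and the method freshOutputFile are hypothetical helpers, not part of Flink.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for building a portable, per-run output location.
final class TestOutputPath {

    /** Returns a path to a not-yet-existing file inside a newly created temporary directory. */
    static Path freshOutputFile(String fileName) {
        try {
            Path dir = Files.createTempDirectory("flink-test");
            return dir.resolve(fileName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path out = freshOutputFile("test.out");
        // The printed location differs on every run and on every platform.
        System.out.println(out.toUri());
    }
}
```

The resulting location could then be handed to writeAsText in place of the hard-coded string, for example as out.toUri().toString(). Whether a file URI or a plain path string fits better depends on the Flink version in use and has not been verified here.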