Setting up a Flink Development Project

1 Building a Flink Job Project

The Flink development project

Flink projects are managed with Maven; the official project template is also Maven-based. A convenient property of the Flink dependencies is that they all share the same version, so defining a single version property covers all of them.
Maven dependencies:

         <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-java_2.11</artifactId>
                <version>${flink.version}</version>
         </dependency>
         <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-cep_2.11</artifactId>
                <version>${flink.version}</version>
         </dependency>
         <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-test-utils_2.11</artifactId>
                <version>${flink.version}</version>
                <scope>test</scope>
         </dependency>

The flink-streaming-java_2.11 dependency contains all the basic functionality and provides the complete streaming programming API.
The flink-cep_2.11 dependency adds support for complex event processing (CEP), i.e. defining complex event patterns.
The flink-test-utils_2.11 dependency is for unit testing: it lets the implemented logic run and be verified in a local test environment, which greatly improves development speed and code quality.
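
The ${flink.version} property referenced by every dependency is defined once in the pom's <properties> section; the value below is only an example, use whichever Flink release you actually deploy:

         <properties>
                <!-- example value, not prescribed by this article -->
                <flink.version>1.7.2</flink.version>
         </properties>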

Other dependencies:

         <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
                <version>${flink.version}</version>
         </dependency>
         <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-table_2.11</artifactId>
                <version>${flink.version}</version>
         </dependency>
         <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
                <version>${flink.version}</version>
         </dependency>
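
To illustrate where these optional dependencies come in, here is a minimal sketch (the topic name, broker address, and checkpoint path are placeholders, not values from this project) that uses the Kafka 0.11 connector as a source and RocksDB as the state backend:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class KafkaRocksDbSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // flink-statebackend-rocksdb_2.11: keep state in RocksDB, checkpoint to a file system path
        env.setStateBackend(new RocksDBStateBackend("file:///tmp/flink-checkpoints"));
        env.enableCheckpointing(60_000);

        // flink-connector-kafka-0.11_2.11: consume a Kafka topic as a DataStream
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-demo");
        DataStream<String> lines =
                env.addSource(new FlinkKafkaConsumer011<>("demo-topic", new SimpleStringSchema(), props));

        lines.print();
        env.execute("kafka-rocksdb-sketch");
    }
}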

Ways to run Flink

The official documentation covers many deployment modes. For development, two are recommended here: a local environment and unit tests.

Steps to set up the local environment:

  • Install JDK 1.8+.

  • Download the Flink distribution from the official site: https://flink.apache.org/downloads.html

  • Unpack the archive and start the local cluster:
    ./bin/start-cluster.sh
    This launches two local processes: a JobManager and a TaskManager.
    The local environment can then be managed through the web UI (by default at http://localhost:8081), which lets you inspect the cluster, check job status, upload jobs, and so on.

  • Building unit tests:
    From a development point of view, unit tests are the environment that really matters, much more so than the local cluster.
    First add the unit-test dependency flink-test-utils_2.11 to the project.
    After that, just write tests in the usual JUnit style:

    public class SimpleTest extends AbstractTestBase {
        @Test
        public void test() {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // build the DataStream job and call env.execute() here
        }
    }

  • Packaging and releasing the Flink project
    Maven offers the following three packaging plugins:

jar: the default Maven packaging plugin, used to create the project jar
maven-shade-plugin: used to build an executable (fat) jar
maven-assembly-plugin: supports customized packaging, for example the layout used by Apache projects
Note: maven-shade-plugin occasionally throws an "invalid LOC header (bad signature)" exception. Check the log and find the last "processing JAR …**.jar" entry; that jar is the corrupted one. Delete it from the local Maven repository and let Maven download it again.
The jar produced by maven-jar-plugin carries no dependency information. For a simple job with no external dependencies that is perfectly fine.
In practice, however, our projects depend on many external libraries, so the dependency jars have to be shipped together with the project. There are two ways to do this: maven-shade-plugin and maven-assembly-plugin. The official examples use the shade approach, and we do the same here. In our own tests the intranet had no access to the public network, so assembly kept failing with missing-dependency errors and we dropped that approach.

                  <plugin>
                       <groupId>org.apache.maven.plugins</groupId>
                       <artifactId>maven-shade-plugin</artifactId>
                       <version>2.3</version>
                       <executions>
                             <!-- Run shade goal on package phase -->
                             <execution>
                                    <phase>package</phase>
                                    <goals>
                                           <goal>shade</goal>
                                    </goals>
                                    <configuration>
                                           <transformers>
                                                 <!-- add Main-Class to  manifest file -->
                                                 <transformer
                                                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                                        <mainClass>org.yunzhong.MainClass</mainClass>
                                                 </transformer>
                                           </transformers>
                                    </configuration>
                             </execution>
                       </executions>
                </plugin>
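
After mvn clean package, the shaded (fat) jar under target/ can either be uploaded through the web UI of the local cluster or submitted from the command line with bin/flink run; the main class is taken from the manifest entry configured above.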

2 Flink Unit Tests

Unit-test dependency:

         <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-test-utils_2.11</artifactId>
                <version>${flink.version}</version>
                <scope>test</scope>
         </dependency>

A test data generator simulates the streaming source. The data format can be tailored to the business case, so the logic can be exercised from several angles.
The basic source base classes are RichParallelSourceFunction and RichSourceFunction.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
import org.stsffap.cep.monitoring.events.MonitoringEvent;
import org.stsffap.cep.monitoring.events.PowerEvent;
import org.stsffap.cep.monitoring.events.TemperatureEvent;

import java.util.concurrent.ThreadLocalRandom;

public class MonitoringEventSource extends RichParallelSourceFunction<MonitoringEvent> {

    private boolean running = true;

    private final int maxRackId;
    private final long pause;
    private final double temperatureRatio;
    private final double powerStd;
    private final double powerMean;
    private final double temperatureStd;
    private final double temperatureMean;

    private int shard;
    private int offset;

    // limit the number of emitted events so that the unit test terminates
    private int count = 10;

    public MonitoringEventSource(int maxRackId, long pause, double temperatureRatio, double powerStd,
            double powerMean, double temperatureStd, double temperatureMean) {
        this.maxRackId = maxRackId;
        this.pause = pause;
        this.temperatureRatio = temperatureRatio;
        this.powerMean = powerMean;
        this.powerStd = powerStd;
        this.temperatureMean = temperatureMean;
        this.temperatureStd = temperatureStd;
    }

    @Override
    public void open(Configuration configuration) {
        // split the rack id range evenly across the parallel subtasks
        int numberTasks = getRuntimeContext().getNumberOfParallelSubtasks();
        int index = getRuntimeContext().getIndexOfThisSubtask();

        offset = (int) ((double) maxRackId / numberTasks * index);
        shard = (int) ((double) maxRackId / numberTasks * (index + 1)) - offset;
    }

    @Override
    public void run(SourceContext<MonitoringEvent> sourceContext) throws Exception {
        while (running) {
            MonitoringEvent monitoringEvent;

            final ThreadLocalRandom random = ThreadLocalRandom.current();

            if (shard > 0) {
                int rackId = random.nextInt(shard) + offset;

                // emit either a power or a temperature event, controlled by temperatureRatio
                if (random.nextDouble() >= temperatureRatio) {
                    double power = random.nextGaussian() * powerStd + powerMean;
                    monitoringEvent = new PowerEvent(rackId, power);
                } else {
                    double temperature = random.nextGaussian() * temperatureStd + temperatureMean;
                    monitoringEvent = new TemperatureEvent(rackId, temperature);
                }

                sourceContext.collect(monitoringEvent);
            }

            count--;
            if (count <= 0) {
                running = false;
            }
            Thread.sleep(pause);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
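
For comparison, a non-parallel source extends RichSourceFunction instead. The following is a minimal sketch (not part of the monitoring example) that emits a fixed number of string events:

import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

// Minimal non-parallel test source: emits ten string events and then finishes.
public class SimpleStringSource extends RichSourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        int i = 0;
        while (running && i < 10) {
            ctx.collect("event-" + i++);
            Thread.sleep(100L);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}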

The unit test is written like any other JUnit test and is driven by the @Test annotation.
The test class needs to extend AbstractTestBase.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.core.fs.FileSystem.WriteMode;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.IngestionTimeExtractor;
import org.apache.flink.test.util.AbstractTestBase;
import org.apache.flink.util.Collector;
import org.junit.Test;
import org.stsffap.cep.monitoring.events.MonitoringEvent;
import org.stsffap.cep.monitoring.sources.MonitoringEventSource;
public class SimpleTest extends AbstractTestBase {

    private static final int MAX_RACK_ID = 10;
    private static final long PAUSE = 100;
    private static final double TEMPERATURE_RATIO = 0.5;
    private static final double POWER_STD = 10;
    private static final double POWER_MEAN = 100;
    private static final double TEMP_STD = 20;
    private static final double TEMP_MEAN = 80;

    @Test
    public void test() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        DataStream<MonitoringEvent> inputEventStream = env
                .addSource(new MonitoringEventSource(MAX_RACK_ID, PAUSE, TEMPERATURE_RATIO,
                        POWER_STD, POWER_MEAN, TEMP_STD, TEMP_MEAN))
                .assignTimestampsAndWatermarks(new IngestionTimeExtractor<>());

        inputEventStream.flatMap(new FlatMapFunction<MonitoringEvent, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public void flatMap(MonitoringEvent value, Collector<String> out) throws Exception {
                StringBuilder builder = new StringBuilder();
                builder.append(value.getRackID());
                builder.append(value.getClass().getName());
                out.collect(builder.toString());
            }
        }).writeAsText("D://temp/flink-test/test.out", WriteMode.NO_OVERWRITE);

        env.execute();
        Thread.sleep(1000L);
    }
}
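
One caveat with the test above: writeAsText ties it to a hard-coded local path (here a Windows path). A common alternative, sketched below and not taken from the original code, is to collect the output into a static in-memory sink and assert on it after env.execute():

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.flink.streaming.api.functions.sink.SinkFunction;

// In-memory sink for tests; the list is static so the test thread can read it after execute().
public class CollectingSink implements SinkFunction<String> {

    static final List<String> VALUES = Collections.synchronizedList(new ArrayList<>());

    @Override
    public void invoke(String value) {
        VALUES.add(value);
    }
}

// Usage in the test, replacing writeAsText(...):
//   stream.addSink(new CollectingSink());
//   env.execute();
//   org.junit.Assert.assertFalse(CollectingSink.VALUES.isEmpty());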