Preface
While learning Flink, a local demo I wrote threw an error during testing. It is a very simple word count:
package com.ieg.wc;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

/**
 * @date : 2021/1/5 20:15
 * Batch word count
 */
public class WordCount {
    public static void main(String[] args) throws Exception {
        // Create the execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Read data from a file
        String inputPath = "D:\\Project\\FlinkStudy\\src\\main\\resources\\hello.txt";
        DataSource<String> inputDataSet = env.readTextFile(inputPath);
        // Process the data set: split on spaces, map each word to (word, 1), then aggregate
        DataSet<Tuple2<String, Integer>> resultSet = inputDataSet.flatMap(new MyFlatMapper())
                // group by the word in position 0
                .groupBy(0)
                // sum the counts in position 1
                .sum(1);
        resultSet.print();
    }

    // Custom class implementing the FlatMapFunction interface
    public static class MyFlatMapper implements FlatMapFunction<String, Tuple2<String, Integer>> {
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
            // Split on spaces
            String[] words = value.split(" ");
            for (String word : words) {
                out.collect(new Tuple2<String, Integer>(word, 1));
            }
        }
    }
}
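Before touching Flink, it may help to see what the flatMap + groupBy(0).sum(1) pipeline computes in plain Java — a minimal, Flink-free sketch (the input strings below are made up; the content of hello.txt is not shown in this post):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PlainWordCount {
    // Mirrors MyFlatMapper plus groupBy(0).sum(1): split each line on
    // spaces and accumulate a count per word.
    public static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input lines
        System.out.println(count("hello world", "hello flink"));
    }
}
```

The Flink version prints the same pairs as Tuple2 values, e.g. (hello,2), once the job actually runs — which is exactly what fails below.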
Maven information
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>FlinkStudy</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-java -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.10.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-java -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.10.1</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</project>
Error message
Exception in thread "main" java.lang.NullPointerException: Cannot find compatible factory for specified execution.target (=local)
at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:104)
at org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:937)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:860)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:844)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:413)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1652)
at com.ieg.wc.WordCount.main(WordCount.java:30)
Then I started tracking it down step by step:
The exception is a NullPointerException.
The failing line is the DataSet.print() call, so let's look at the implementation of print():
public void print() throws Exception {
    List<T> elements = collect();
    for (T e : elements) {
        System.out.println(e);
    }
}
print() simply calls collect() and prints each element of the result. Next, the implementation of collect():
public List<T> collect() throws Exception {
    final String id = new AbstractID().toString();
    final TypeSerializer<T> serializer = getType().createSerializer(getExecutionEnvironment().getConfig());

    this.output(new Utils.CollectHelper<>(id, serializer)).name("collect()");
    JobExecutionResult res = getExecutionEnvironment().execute();

    ArrayList<byte[]> accResult = res.getAccumulatorResult(id);
    if (accResult != null) {
        try {
            return SerializedListAccumulator.deserializeList(accResult, serializer);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException("Cannot find type class of collected data type.", e);
        } catch (IOException e) {
            throw new RuntimeException("Serialization error while deserializing collected data", e);
        }
    } else {
        throw new RuntimeException("The call to collect() could not retrieve the DataSet.");
    }
}
Note this line: JobExecutionResult res = getExecutionEnvironment().execute(); — the implementation of execute():
public JobExecutionResult execute() throws Exception {
    return execute(getDefaultName());
}
Continuing down the call chain:
public JobExecutionResult execute(String jobName) throws Exception {
    final JobClient jobClient = executeAsync(jobName);

    try {
        if (configuration.getBoolean(DeploymentOptions.ATTACHED)) {
            lastJobExecutionResult = jobClient.getJobExecutionResult(userClassloader).get();
        } else {
            lastJobExecutionResult = new DetachedJobExecutionResult(jobClient.getJobID());
        }

        jobListeners.forEach(
                jobListener -> jobListener.onJobExecuted(lastJobExecutionResult, null));
    } catch (Throwable t) {
        jobListeners.forEach(jobListener -> {
            jobListener.onJobExecuted(null, ExceptionUtils.stripExecutionException(t));
        });
        ExceptionUtils.rethrowException(t);
    }

    return lastJobExecutionResult;
}
The first statement of this method calls executeAsync:
@PublicEvolving
public JobClient executeAsync(String jobName) throws Exception {
    checkNotNull(configuration.get(DeploymentOptions.TARGET), "No execution.target specified in your configuration file.");

    final Plan plan = createProgramPlan(jobName);
    final PipelineExecutorFactory executorFactory =
            executorServiceLoader.getExecutorFactory(configuration);

    checkNotNull(
            executorFactory,
            "Cannot find compatible factory for specified execution.target (=%s)",
            configuration.get(DeploymentOptions.TARGET));

    CompletableFuture<? extends JobClient> jobClientFuture = executorFactory
            .getExecutor(configuration)
            .execute(plan, configuration);

    try {
        JobClient jobClient = jobClientFuture.get();
        jobListeners.forEach(jobListener -> jobListener.onJobSubmitted(jobClient, null));
        return jobClient;
    } catch (Throwable t) {
        jobListeners.forEach(jobListener -> jobListener.onJobSubmitted(null, t));
        ExceptionUtils.rethrow(t);

        // make javac happy, this code path will not be reached
        return null;
    }
}
This code lives in org.apache.flink.api.java.ExecutionEnvironment.java.
Our error is exactly this precondition failing: the checked argument, executorFactory, is null. configuration is a field declared on this class.
(For comparison, the Scala org.apache.flink.api.scala.ExecutionEnvironment.scala that the above code relates to has no such configuration field.)
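The message in the stack trace is produced by Flink's Preconditions.checkNotNull overload that takes a %s-style message template: when the reference is null, it throws a NullPointerException carrying the formatted message. A simplified, self-contained sketch of that behavior (this is an illustration, not Flink's exact source):

```java
public class CheckNotNullDemo {
    // Simplified version of the precondition Flink uses: if the reference
    // is null, throw an NPE whose message is the formatted template.
    public static <T> T checkNotNull(T reference, String template, Object... args) {
        if (reference == null) {
            throw new NullPointerException(String.format(template, args));
        }
        return reference;
    }

    public static void main(String[] args) {
        Object executorFactory = null; // what getExecutorFactory(configuration) returned here
        try {
            checkNotNull(executorFactory,
                    "Cannot find compatible factory for specified execution.target (=%s)", "local");
        } catch (NullPointerException e) {
            // Reproduces the exact message from the stack trace above
            System.out.println(e.getMessage());
        }
    }
}
```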
Many blog posts say to add:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.10.1</version>
</dependency>
I already had this dependency. Checking mine again, I noticed an extra provided scope:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.12</artifactId>
    <version>1.10.1</version>
    <scope>provided</scope>
</dependency>
After removing the provided scope, the job ran successfully.
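Why does a provided scope end in a null factory? executorServiceLoader.getExecutorFactory(configuration) discovers PipelineExecutorFactory implementations via Java's ServiceLoader mechanism; when the jar that registers the matching executor is missing from the runtime classpath, the lookup simply finds nothing. A minimal, Flink-free sketch of that mechanism (the interface name here is invented for illustration; no provider is registered for it, mimicking the missing jar):

```java
import java.util.ServiceLoader;

public class ServiceLoaderDemo {
    // Stand-in for PipelineExecutorFactory. Nothing registers it in
    // META-INF/services, just like a jar absent at run time.
    public interface DemoExecutorFactory {}

    public static DemoExecutorFactory lookup() {
        DemoExecutorFactory found = null;
        for (DemoExecutorFactory f : ServiceLoader.load(DemoExecutorFactory.class)) {
            found = f; // never reached: no provider on the classpath
        }
        return found; // null, just like executorFactory in executeAsync
    }

    public static void main(String[] args) {
        System.out.println("factory = " + lookup()); // prints "factory = null"
    }
}
```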
While I'm at it, some notes on Maven.
Maven's lifecycle includes compiling, testing, and running, and clearly some dependencies are only needed in some of those phases. Some are used only for testing, such as junit. Some are unnecessary at compile time and only needed at run time, such as the MySQL driver: compilation only uses the JDBC interfaces, while the driver itself is loaded at run time. Still others are needed at compile time but should not be supplied at run time, because the container already provides them — servlet-api, for example, ships with Tomcat, so we only need it for compilation. In short, since POM 4, a dependency can declare a scope, which mainly governs how the dependency is deployed. The main values are compile, provided, runtime, test, and system.
- compile: the default scope; effective at run time and packaged into the artifact
- provided: effective at compile time; not supplied at run time and not packaged
- runtime: not needed for compilation but effective at run time; packaged into the artifact (interface/implementation separation)
- test: needed only for tests; not packaged
- system: a jar resolved from a path on the local system rather than from a repository (rarely used)
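As an illustration of the runtime scope described above (the MySQL driver case), a typical declaration looks like this — the version number is only an example:

```xml
<!-- Compiled against the JDBC interfaces in the JDK; the driver itself
     is only needed at run time, so it is excluded from compilation. -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.22</version>
    <scope>runtime</scope>
</dependency>
```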
Because my dependency was provided, it was not available at run time, which is effectively the same as not declaring it at all.