A First Flink Program

Previous post: an introductory overview of Flink

Download the Flink project template for Java

In a cmd window on Windows, run the following command to generate a Flink (Java) project template:

mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.12.0 -DgroupId=cn._51doit.flink -DartifactId=flink-java -Dversion=1.0 -Dpackage=cn._51doit.flink -DinteractiveMode=false

The execution process is shown in the figure.
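If the command succeeds, the archetype generates a project roughly like this (a sketch; the exact files depend on the archetype version):

flink-java/
    pom.xml
    src/main/java/cn/_51doit/flink/
        BatchJob.java
        StreamingJob.java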

Open the generated project template in IDEA via File -> Open.

Delete the unneeded boilerplate code, such as the BatchJob and StreamingJob classes. Then comment out <scope>provided</scope> in the project's pom file so the Flink dependencies are on the classpath when running inside the IDE, as shown in the figure.
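For reference, the relevant dependency entry in the quickstart pom looks roughly like this (a sketch; the exact artifact id and version properties depend on the archetype version):

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <!-- comment this out to run in the IDE; restore it before packaging for the cluster -->
    <!-- <scope>provided</scope> -->
</dependency>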

The full code:

package cn._51doit.flink.day01;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * Read data from the specified socket and count the words.
 */
public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
        //Create the Flink streaming execution environment (StreamExecutionEnvironment)
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //Create the DataStream
        //Source
        DataStream<String> lines = env.socketTextStream("Master",8888);
        //Transformations start
        SingleOutputStreamOperator<String> wordsDataStream = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String line, Collector<String> collector) throws Exception {
                String[] words = line.split(" ");
                for (String word : words) {
                    collector.collect(word);

                }
            }
        });

        //Pair each word with a 1
        SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = wordsDataStream.map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String word) throws Exception {
                return Tuple2.of(word, 1);
            }
        });

        //Group by the word
        KeyedStream<Tuple2<String, Integer>, String> keyed = wordAndOne.keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> tp) throws Exception {
                return tp.f0;
            }
        });

        //Aggregate
        SingleOutputStreamOperator<Tuple2<String, Integer>> summed = keyed.sum(1);

        //Transformations end

        //Call the Sink
        summed.print();

        //Start execution
        env.execute("StreamingWordCount");

    }
}

After starting the program, type data into the socket server, and the word counts are printed to the IDEA console.
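For example, start a socket server on the Master host with nc -lk 8888 and type:

hello flink
hello

The IDEA console then prints something like the following (the N> prefix is the index of the printing subtask, so the exact prefixes will vary; since sum(1) is a rolling keyed aggregate, every incoming record emits an updated count for its key):

3> (hello,1)
7> (flink,1)
3> (hello,2)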

Optimizing the code

package cn._51doit.flink.day01;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * Read data from the specified socket and count the words.
 */
public class StreamingWordCountv2 {
    public static void main(String[] args) throws Exception {
        //Create the Flink streaming execution environment (StreamExecutionEnvironment)
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //Create the DataStream
        //Source
        DataStream<String> lines = env.socketTextStream("Master",8888);
        //Transformations (optimized: flatMap emits the word-count tuples directly, so the separate map step is gone)
        SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String line, Collector<Tuple2<String, Integer>> collector) throws Exception {
                String[] words = line.split(" ");
                for (String word : words) {
                    //new Tuple2<String,Integer>(word,1);
                    collector.collect(Tuple2.of(word, 1));

                }
            }
        });

        //Group by the word
        KeyedStream<Tuple2<String, Integer>, String> keyed = wordAndOne.keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> tp) throws Exception {
                return tp.f0;
            }
        });

        //Aggregate
        SingleOutputStreamOperator<Tuple2<String, Integer>> summed = keyed.sum(1);

        //Transformations end

        //Call the Sink
        summed.print();

        //Start execution
        env.execute("StreamingWordCount");

    }
}

Source-code notes

Click into getExecutionEnvironment and you can see that the no-argument getExecutionEnvironment() simply delegates to the getExecutionEnvironment(Configuration) overload.

Click through into that overload and you can see the resolution logic.
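In Flink 1.12 it looks roughly like this (a paraphrased sketch of the Flink source; check the code of your exact version):

public static StreamExecutionEnvironment getExecutionEnvironment() {
    return getExecutionEnvironment(new Configuration());
}

public static StreamExecutionEnvironment getExecutionEnvironment(Configuration configuration) {
    return Utils.resolveFactory(threadLocalContextEnvironmentFactory, contextEnvironmentFactory)
            .map(factory -> factory.createExecutionEnvironment(configuration))
            .orElseGet(() -> StreamExecutionEnvironment.createLocalEnvironment(configuration));
}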

Explanation:

It decides which environment to create: if a context environment factory is present (the thread-local threadLocalContextEnvironmentFactory, or the global one installed when the job is submitted to a cluster), that factory's createExecutionEnvironment is called; otherwise, for a local run, it falls back to createLocalEnvironment. Inside createLocalEnvironment, a defaultLocalParallelism value is applied:

public static LocalStreamEnvironment createLocalEnvironment(Configuration configuration) {
		if (configuration.getOptional(CoreOptions.DEFAULT_PARALLELISM).isPresent()) {
			return new LocalStreamEnvironment(configuration);
		} else {
			Configuration copyOfConfiguration = new Configuration();
			copyOfConfiguration.addAll(configuration);
			copyOfConfiguration.set(CoreOptions.DEFAULT_PARALLELISM, defaultLocalParallelism);
			return new LocalStreamEnvironment(copyOfConfiguration);
		}
	}

defaultLocalParallelism is initialized from Runtime.getRuntime().availableProcessors(), i.e. the number of logical cores currently available to the JVM:

private static int defaultLocalParallelism = Runtime.getRuntime().availableProcessors();
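So by default a local job runs with one subtask per logical core. If you want a different parallelism, you can set it explicitly; a minimal sketch using the standard StreamExecutionEnvironment API:

import org.apache.flink.streaming.api.environment.LocalStreamEnvironment;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismDemo {
    public static void main(String[] args) {
        //Create a local environment with a fixed parallelism of 4
        LocalStreamEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(4);
        //Or override the default parallelism on any environment
        env.setParallelism(4);
        System.out.println(env.getParallelism()); //prints 4
    }
}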

WordCount with Java lambda expressions

The code:

package cn._51doit.flink.day01;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.LocalStreamEnvironment;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import java.util.Arrays;

/**
 * WordCount written with lambda expressions
 */
public class LambdaStreamingWordCount {
    public static void main(String[] args) throws Exception {
        //LocalStreamEnvironment only runs in local mode and is typically used for local testing
        LocalStreamEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(8);

        DataStreamSource<String> lines = env.socketTextStream("Master", 9999);

        //Use a Java 8 lambda expression
        SingleOutputStreamOperator<String> words = lines.flatMap((String line, Collector<String> out) ->
                Arrays.stream(line.split(" ")).forEach(out::collect));

        //Print the output
        words.print();

        //Start execution (execute may throw an exception)
        env.execute();


    }
}

Running this program fails: the generic type information cannot be determined. When using a lambda expression, you must declare the result type explicitly with returns(...); the fix is shown further below.

The error message:

Exception in thread "main" org.apache.flink.api.common.functions.InvalidTypesException: The return type of function 'main(LambdaStreamingWordCount.java:22)' could not be determined automatically, due to type erasure. You can give type information hints by using the returns(...) method on the result of the transformation call, or by letting your function implement the 'ResultTypeQueryable' interface.
	at org.apache.flink.api.dag.Transformation.getOutputType(Transformation.java:484)
	at org.apache.flink.streaming.api.datastream.DataStream.addSink(DataStream.java:1294)
	at org.apache.flink.streaming.api.datastream.DataStream.print(DataStream.java:975)
	at cn._51doit.flink.day01.LambdaStreamingWordCount.main(LambdaStreamingWordCount.java:26)
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The generic type parameters of 'Collector' are missing. In many cases lambda methods don't provide enough information for automatic type extraction when Java generics are involved. An easy workaround is to use an (anonymous) class instead that implements the 'org.apache.flink.api.common.functions.FlatMapFunction' interface. Otherwise the type has to be specified explicitly using type information.
	at org.apache.flink.api.java.typeutils.TypeExtractionUtils.validateLambdaType(TypeExtractionUtils.java:351)
	at org.apache.flink.api.java.typeutils.TypeExtractionUtils.extractTypeFromLambda(TypeExtractionUtils.java:176)
	at org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:515)
	at org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:168)
	at org.apache.flink.streaming.api.datastream.DataStream.flatMap(DataStream.java:637)
	at cn._51doit.flink.day01.LambdaStreamingWordCount.main(LambdaStreamingWordCount.java:22)

As the exception message suggests, one workaround is to fall back to an anonymous FlatMapFunction class (as in StreamingWordCount above); here the lambda is kept instead and the result type is declared with returns(...). The reworked code:

package cn._51doit.flink.day01;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.LocalStreamEnvironment;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import java.util.Arrays;

/**
 * WordCount written with lambda expressions
 */
public class LambdaStreamingWordCount {
    public static void main(String[] args) throws Exception {
        //LocalStreamEnvironment only runs in local mode and is typically used for local testing
        LocalStreamEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(8);

        DataStreamSource<String> lines = env.socketTextStream("Master", 9999);

        //Use a Java 8 lambda expression
        SingleOutputStreamOperator<String> words = lines.flatMap((String line, Collector<String> out) ->
                Arrays.stream(line.split(" ")).forEach(out::collect)).returns(Types.STRING); //with a lambda, declare the result type via returns(...)

        SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = words.map(w -> Tuple2.of(w, 1)).returns(Types.TUPLE(Types.STRING,Types.INT));

        SingleOutputStreamOperator<Tuple2<String, Integer>> summed = wordAndOne.keyBy(0).sum(1);

        //Print the output
        summed.print();

        //Start execution (execute may throw an exception)
        env.execute();

    }
}



Before starting the program, run the following command on the Linux host and type the words you want counted: nc -lk 9999

Note that a LocalStreamEnvironment created with createLocalEnvironment only runs locally; it does not support cluster deployment, i.e. this program cannot be submitted as a job.

Packaging and running on the cluster

Before packaging, the code must not be hard-coded: when creating the streaming execution environment, take the host and port from the program arguments. The full code:

package cn._51doit.flink.day01;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import java.util.Arrays;

/**
 * WordCount written with lambda expressions
 */
public class LambdaStreamingWordCount {
    public static void main(String[] args) throws Exception {
        //Create the Flink streaming execution environment (StreamExecutionEnvironment)
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //Create the DataStream
        //Source
        DataStreamSource<String> lines = env.socketTextStream(args[0], Integer.parseInt(args[1]));

        //Use a Java 8 lambda expression
        SingleOutputStreamOperator<String> words = lines.flatMap((String line, Collector<String> out) ->
                Arrays.stream(line.split(" ")).forEach(out::collect)).returns(Types.STRING); //with a lambda, declare the result type via returns(...)

        SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = words.map(w -> Tuple2.of(w, 1)).returns(Types.TUPLE(Types.STRING,Types.INT));

        SingleOutputStreamOperator<Tuple2<String, Integer>> summed = wordAndOne.keyBy(0).sum(1);

        //Print the output
        summed.print();

        //Start execution (execute may throw an exception)
        env.execute();

    }
}

The key line:

DataStreamSource<String> lines = env.socketTextStream(args[0], Integer.parseInt(args[1]));
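If you submit from the command line instead of the web UI, the host and port go at the end as program arguments. A hypothetical example (the jar name and entry class match this project; adjust paths as needed):

flink run -c cn._51doit.flink.day01.LambdaStreamingWordCount flink-java-1.0.jar Master 9999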

Package the project and upload it to the Linux environment. Files that should not be bundled into the jar can be configured in the pom: just restore the <scope>provided</scope> entries that were commented out earlier (I restored them all here).

Explanation:

When a dependency carries <scope>provided</scope> in the pom, it is not bundled into the jar at packaging time (it is excluded because the cluster already provides it).

For local runs, <scope>provided</scope> must be commented out, otherwise the dependency is missing at runtime.

Submit the flink-java-1.0.jar package in the web UI on port 8081: Submit New Job ---> Add New, select flink-java-1.0.jar, then fill in the jar's entry class, the program arguments (host and port), and the parallelism.

Click Submit and the job is created; see the details at:

http://master:8081/#/job/c023d4307839bd1a750bdaae37bc1903/overview



To view the printed data:

http://master:8081/#/task-manager/192.168.242.104:35955-7f8554/metrics

Note: the data in Flink's web UI is not updated in real time; you have to refresh manually as each record arrives.

That's all. Thanks for reading!

