Flink Batch Example in Java
Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities.
Prerequisites
- Unix-like environment (Linux, Mac OS X, Cygwin)
- git
- Maven (we recommend version 3.0.4)
- Java 7 or 8
- IntelliJ IDEA or Eclipse IDE
git clone https://github.com/apache/flink.git
cd flink
mvn clean package -DskipTests # this will take up to 10 minutes
Datasets
For the batch processing data we'll use the datasets available here: datasets. In this example we'll use movies.csv and ratings.csv. Create a new Java project and put the two files in a folder in the application's base directory.
Example
We're going to run a job that computes the average rating per movie genre over the entire dataset.
Environment and datasets
First, create a new Java file. I'm going to name it AverageRating.java.
The first thing we'll do is create the execution environment and load the CSV files into datasets, like this:
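import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
// The imports go at the top of AverageRating.java; the statements go inside main, declared with "throws Exception".
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Movies: three fields per row, read as <Long, String, String>
DataSet<Tuple3<Long, String, String>> movies = env.readCsvFile("ml-latest-small/movies.csv")
    .ignoreFirstLine()          // skip the header row
    .parseQuotedStrings('"')    // treat "..." as a single string field
    .ignoreInvalidLines()       // skip rows that fail to parse
    .types(Long.class, String.class, String.class);

// Ratings: keep only the second and third columns, read as <Long, Double>
DataSet<Tuple2<Long, Double>> ratings = env.readCsvFile("ml-latest-small/ratings.csv")
    .ignoreFirstLine()
    .includeFields(false, true, true, false)
    .types(Long.class, Double.class);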
This creates a dataset of <Long, String, String> tuples for the movies, skipping the header line and any invalid lines and parsing quoted strings. It also creates a dataset of <Long, Double> tuples for the movie ratings, which skips the header line and keeps only the movie ID and rating columns.
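To make the includeFields(false, true, true, false) mask more concrete, here is a minimal plain-Java sketch of what such a mask does to a single row. It is not Flink's CSV reader: the class name FieldMaskSketch, the helper selectFields and the sample row are made up for illustration, and the column order userId,movieId,rating,timestamp is an assumption about the ratings file.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is not how Flink implements its CSV reader.
class FieldMaskSketch {

    // Keeps the columns whose mask entry is true, in their original order.
    static List<String> selectFields(String csvLine, boolean... mask) {
        String[] columns = csvLine.split(",");
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < columns.length && i < mask.length; i++) {
            if (mask[i]) {
                kept.add(columns[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical row, assuming the layout userId,movieId,rating,timestamp
        String row = "1,31,2.5,1260759144";
        List<String> kept = selectFields(row, false, true, true, false);

        // The two surviving columns map onto the <Long, Double> tuple types.
        long movieId = Long.parseLong(kept.get(0));
        double rating = Double.parseDouble(kept.get(1));
        System.out.println(movieId + " -> " + rating); // prints "31 -> 2.5"
    }
}
```

With the mask applied, the first and last columns never reach the tuple, which is why the ratings dataset has only two fields.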
Flink Processing
Here we'll process the datasets with Flink. The result will be a List of <String, Double> tuples, where the String holds the genre and the Double holds the average rating.
First we'll join the ratings dataset with the movies dataset on the movieId present in both. From each joined pair we'll create a new tuple with the movie name, genre and score, where the genre is the first one listed for the movie. Then we group these tuples by genre, sum the scores within each genre and divide the sum by the number of ratings in that genre, which gives the average we want.
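The join, group and average steps can be checked on a handful of rows before running them on a cluster. The following is a rough plain-Java sketch of the same logic, not Flink code. The class AverageByGenreSketch, its Movie and Rating holders and the sample rows are all hypothetical. It needs Java 8 for the lambda.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the join / group / average steps; not Flink code.
class AverageByGenreSketch {

    static final class Movie {
        final long id;
        final String title;
        final String genres; // pipe-separated, e.g. "Comedy|Romance"
        Movie(long id, String title, String genres) {
            this.id = id;
            this.title = title;
            this.genres = genres;
        }
    }

    static final class Rating {
        final long movieId;
        final double score;
        Rating(long movieId, double score) {
            this.movieId = movieId;
            this.score = score;
        }
    }

    static Map<String, Double> averageByGenre(List<Movie> movies, List<Rating> ratings) {
        // Join step: index movies by id so each rating can look up its movie.
        Map<Long, Movie> byId = new HashMap<>();
        for (Movie m : movies) {
            byId.put(m.id, m);
        }

        // Group step: genre -> {sum of scores, number of ratings}.
        Map<String, double[]> totals = new TreeMap<>();
        for (Rating r : ratings) {
            Movie m = byId.get(r.movieId);
            if (m == null) {
                continue; // inner join: a rating without a matching movie is dropped
            }
            String genre = m.genres.split("\\|")[0]; // first listed genre only
            double[] t = totals.computeIfAbsent(genre, g -> new double[2]);
            t[0] += r.score;
            t[1] += 1;
        }

        // Average step: divide each sum by its count.
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, double[]> e : totals.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the real dataset.
        List<Movie> movies = Arrays.asList(
            new Movie(1, "Movie A", "Comedy|Romance"),
            new Movie(2, "Movie B", "Drama"),
            new Movie(3, "Movie C", "Comedy"));
        List<Rating> ratings = Arrays.asList(
            new Rating(1, 4.0), new Rating(1, 3.0),
            new Rating(3, 5.0), new Rating(2, 2.5));
        System.out.println(averageByGenre(movies, ratings)); // prints "{Comedy=4.0, Drama=2.5}"
    }
}
```

The Flink pipeline that follows performs the same three steps with join, groupBy and reduceGroup, so each one can be distributed across the cluster.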
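// Additional imports for the top of AverageRating.java
import java.util.List;
import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.types.DoubleValue;
import org.apache.flink.types.StringValue;
import org.apache.flink.util.Collector;

List<Tuple2<String, Double>> distribution = movies.join(ratings)
    .where(0)    // movieId field of the movies tuples
    .equalTo(0)  // movieId field of the ratings tuples
    .with(new JoinFunction<Tuple3<Long, String, String>, Tuple2<Long, Double>, Tuple3<StringValue, StringValue, DoubleValue>>() {
        // Mutable value objects, reused for every joined pair
        private StringValue name = new StringValue();
        private StringValue genre = new StringValue();
        private DoubleValue score = new DoubleValue();
        private Tuple3<StringValue, StringValue, DoubleValue> result = new Tuple3<>(name, genre, score);

        @Override
        public Tuple3<StringValue, StringValue, DoubleValue> join(Tuple3<Long, String, String> movie, Tuple2<Long, Double> rating) throws Exception {
            name.setValue(movie.f1);
            genre.setValue(movie.f2.split("\\|")[0]);  // keep only the first listed genre
            score.setValue(rating.f1);
            return result;
        }
    })
    .groupBy(1)  // group the joined tuples by genre
    .reduceGroup(new GroupReduceFunction<Tuple3<StringValue, StringValue, DoubleValue>, Tuple2<String, Double>>() {
        @Override
        public void reduce(Iterable<Tuple3<StringValue, StringValue, DoubleValue>> iterable, Collector<Tuple2<String, Double>> collector) throws Exception {
            StringValue genre = null;
            int count = 0;
            double totalScore = 0;
            // Sum the scores and count the ratings in this genre
            for (Tuple3<StringValue, StringValue, DoubleValue> movie : iterable) {
                genre = movie.f1;
                totalScore += movie.f2.getValue();
                count++;
            }
            // Emit one (genre, average rating) pair per group
            collector.collect(new Tuple2<>(genre.getValue(), totalScore / count));
        }
    })
    .collect();  // runs the job and returns the results as a List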
With this you'll have a working Flink batch-processing application. Enjoy!
Original article: https://www.freecodecamp.org/news/apache-flink-batch-example-in-java/