Flink Development Demo

What is Apache Flink? Architecture

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.

Here, we explain the important aspects of Flink's architecture.

Processing Unbounded and Bounded Data

Any kind of data is produced as a stream of events. Credit card transactions, sensor measurements, machine logs, and user interactions on a website or mobile application: all of this data is generated as a stream.

Data can be processed as unbounded or as bounded streams.

1. Unbounded streams have a start but no defined end. They do not terminate and keep providing data as it is generated. Unbounded streams must be processed continuously, i.e., events must be handled promptly after they are ingested. It is not possible to wait for all input data to arrive, because the input is unbounded and will not be complete at any point in time. Processing unbounded data typically requires that events are ingested in a specific order, such as the order in which they occurred, so that the completeness of the results can be reasoned about.

2. Bounded streams have a defined start and end. A bounded stream can be processed by ingesting all of its data before performing any computation. Ordered ingestion is not required, because a bounded data set can always be sorted. Processing of bounded streams is also known as batch processing. The sketch right after this list contrasts the two kinds of sources.
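
As a small, concrete illustration of the difference, here is a minimal DataStream sketch; the host, port, and sample values are placeholders chosen for illustration.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedVsUnbounded {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Unbounded stream: a socket source keeps producing events until the job is cancelled.
        DataStream<String> unbounded = env.socketTextStream("localhost", 9999);

        // Bounded stream: a fixed collection has a defined start and end.
        DataStream<String> bounded = env.fromElements("flink", "bounded", "unbounded");

        unbounded.print();
        bounded.print();
        env.execute("bounded-vs-unbounded");
    }
}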

Apache Flink excels at processing both unbounded and bounded data sets. Precise control of time and state enables Flink's runtime to run any kind of application on unbounded streams. Bounded streams are processed internally by algorithms and data structures that are specifically designed for fixed-size data sets, yielding excellent performance.

Convince yourself by exploring the use cases that have been built on top of Flink.

Deploy Applications Anywhere

Apache Flink is a distributed system and requires compute resources in order to execute applications. Flink integrates with all common cluster resource managers, such as Hadoop YARN, Apache Mesos, and Kubernetes, but it can also be set up to run as a standalone cluster.

Flink is designed to work well with each of the resource managers listed above. This is achieved by resource-manager-specific deployment modes that allow Flink to interact with each resource manager in its idiomatic way.

When deploying a Flink application, Flink automatically identifies the required resources based on the application's configured parallelism and requests them from the resource manager. In case of a failure, Flink replaces the failed container by requesting new resources. All communication to submit or control an application happens via REST calls, which eases the integration of Flink into many environments.

Run Applications at Any Scale

Flink is designed to run stateful streaming applications at any scale. Applications may be parallelized into thousands of tasks that are distributed across a cluster and executed concurrently. Therefore, an application can leverage virtually unlimited amounts of CPU, main memory, disk, and network IO. Moreover, Flink easily maintains very large application state. Its asynchronous and incremental checkpointing algorithm ensures minimal impact on processing latency while guaranteeing exactly-once state consistency.

Users have reported impressive scalability numbers for Flink applications running in their production environments, for example:

  • applications processing multiple trillions of events per day,
  • applications maintaining multiple terabytes of state, and
  • applications running on thousands of cores.

Leverage In-Memory Performance

Stateful Flink applications are optimized for local state access. Task state is always kept in memory or, if the state size exceeds the available memory, in access-efficient on-disk data structures. Hence, tasks perform all computations by accessing local, often in-memory, state, yielding very low processing latency. Flink guarantees exactly-once state consistency in the case of failures by periodically and asynchronously checkpointing the local state to durable storage.
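
As a rough sketch of what this looks like in code, the snippet below enables periodic, asynchronous checkpoints for a Flink 1.13 job. The 60-second interval and the HDFS checkpoint path are assumptions chosen for illustration, not values from the original article.

import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Keep task state in memory on the TaskManagers.
        env.setStateBackend(new HashMapStateBackend());

        // Checkpoint local state to durable storage every 60 seconds with exactly-once guarantees.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
        // The checkpoint path is an assumption; point it at your own HDFS or file system.
        env.getCheckpointConfig().setCheckpointStorage("hdfs://hadoop001:9000/flink/checkpoints");

        // A trivial pipeline so the sketch actually runs.
        env.fromElements(1, 2, 3).print();
        env.execute("checkpointing-setup");
    }
}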

When it comes to Flink development, you have to understand its layered APIs.

Flink offers three layered APIs. Each API offers a different trade-off between conciseness and expressiveness and targets different use cases.

ProcessFunctions

ProcessFunctions are the most expressive function interfaces that Flink offers. Flink provides ProcessFunctions to process individual events from one or two input streams, or events that were grouped in a window. ProcessFunctions give fine-grained control over time and state. A ProcessFunction can arbitrarily modify its state and register timers that will trigger a callback function in the future. Hence, ProcessFunctions can implement complex per-event business logic, as required for many stateful event-driven applications.

The following example shows a KeyedProcessFunction that operates on a KeyedStream and matches START and END events. When a START event is received, the function remembers its timestamp in state and registers a timer four hours into the future. If an END event is received before the timer fires, the function computes the duration between the END and the START event, clears the state, and returns the value. Otherwise, the timer simply fires and clears the state.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

/**
* Matches keyed START and END events and computes the difference between
* both elements' timestamps. The first String field is the key attribute,
* the second String attribute marks START and END events.
*/
public static class StartEndDuration
  extends KeyedProcessFunction<String, Tuple2<String, String>, Tuple2<String, Long>> {

  private ValueState<Long> startTime;
  
  @Override
  public void open(Configuration conf) {
    // obtain state handle
    startTime = getRuntimeContext()
      .getState(new ValueStateDescriptor<Long>("startTime", Long.class));
  }

  /** Called for each processed event. */
  @Override
  public void processElement(
      Tuple2<String, String> in,
      Context ctx,
      Collector<Tuple2<String, Long>> out) throws Exception {
  
      switch (in.f1) {
        case "START":
          // set the start time if we receive a start event.
          startTime.update(ctx.timestamp());
          // register a timer in four hours from the start event.
          ctx.timerService()
            .registerEventTimeTimer(ctx.timestamp() + 4 * 60 * 60 * 1000);
          break;
        case "END":
          // emit the duration between start and end event
          Long sTime = startTime.value();
          if (sTime != null) {
            out.collect(Tuple2.of(in.f0, ctx.timestamp() - sTime));
            // clear the state
            startTime.clear();
          }
        default:
          // do nothing
      }
  }

  /** Called when a timer fires. */
  @Override
  public void onTimer(
    long timestamp,
    OnTimerContext ctx,
    Collector<Tuple2<String, Long>> out) {

    // Timeout interval exceeded. Cleaning up the state.
    startTime.clear();
  }
}

The example illustrates the expressive power of the KeyedProcessFunction but also highlights that it is a rather verbose interface.

The DataStream API

The DataStream API provides primitives for many common stream processing operations, such as windowing, record-at-a-time transformations, and enriching events by querying an external data store. The DataStream API is available for Java and Scala and is based on functions such as map(), reduce(), and aggregate(). Functions can be defined by extending an interface or as Java or Scala lambda functions.

The following example shows how to sessionize a clickstream and count the number of clicks per session.

// a stream of website clicks
DataStream<Click> clicks = ...

DataStream<Tuple2<String, Long>> result = clicks
  // project clicks to userId and add a 1 for counting
  .map(
    // define function by implementing the MapFunction interface.
    new MapFunction<Click, Tuple2<String, Long>>() {
      @Override
      public Tuple2<String, Long> map(Click click) {
        return Tuple2.of(click.userId, 1L);
      }
    })
  // key by userId (field 0)
  .keyBy(0)
  // define session window with 30 minute gap
  .window(EventTimeSessionWindows.withGap(Time.minutes(30L)))
  // count clicks per session. Define function as lambda function.
  .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));

SQL and the Table API

Flink features two relational APIs, the Table API and SQL. Both APIs are unified APIs for batch and stream processing, i.e., queries are executed with the same semantics on unbounded, real-time streams or bounded, recorded streams and produce the same results. The Table API and SQL leverage Apache Calcite for parsing, validation, and query optimization. They can be seamlessly integrated with the DataStream and DataSet APIs and support user-defined scalar, aggregate, and table-valued functions.

Flink's relational APIs are designed to ease the definition of data analytics, data pipelining, and ETL applications.

The following example shows the SQL query that sessionizes a clickstream and counts the number of clicks per session. This is the same use case as in the DataStream API example.

SELECT userId, COUNT(*)
FROM clicks
GROUP BY SESSION(clicktime, INTERVAL '30' MINUTE), userId
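
For comparison, a rough Table API sketch of the same sessionization query might look like the following. It assumes a StreamTableEnvironment named tableEnv with a registered clicks table whose clicktime column is an event-time attribute; those names are assumptions, not part of the original example.

import static org.apache.flink.table.api.Expressions.$;
import static org.apache.flink.table.api.Expressions.lit;

import org.apache.flink.table.api.Session;
import org.apache.flink.table.api.Table;

// 30-minute session windows on the event-time column clicktime,
// then count clicks per user and session.
Table result = tableEnv.from("clicks")
    .window(Session.withGap(lit(30).minutes()).on($("clicktime")).as("s"))
    .groupBy($("s"), $("userId"))
    .select($("userId"), $("userId").count().as("cnt"));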

Flink features several libraries for common data processing use cases. The libraries are typically embedded in an API and are not fully self-contained. Hence, they can benefit from all features of the API and be integrated with other libraries.

Complex Event Processing (CEP):

Pattern detection is a very common use case for event stream processing. Flink's CEP library provides an API to specify patterns of events (think of regular expressions or state machines). The CEP library is integrated with Flink's DataStream API, so patterns are evaluated on DataStreams. Applications of the CEP library include network intrusion detection, business process monitoring, and fraud detection.
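
As a rough illustration (not from the original article), the sketch below uses the flink-cep dependency to flag three consecutive failed logins within one minute. LoginEvent, its fields, and the logins stream are made-up assumptions.

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

// Hypothetical event type: public class LoginEvent { public String userId; public boolean success; }

// Pattern: three failed logins in a row within one minute.
Pattern<LoginEvent, ?> threeFailures = Pattern.<LoginEvent>begin("fail")
        .where(new SimpleCondition<LoginEvent>() {
            @Override
            public boolean filter(LoginEvent e) {
                return !e.success;
            }
        })
        .times(3).consecutive()
        .within(Time.minutes(1));

// Evaluate the pattern per user; logins is an assumed DataStream<LoginEvent>.
PatternStream<LoginEvent> matches = CEP.pattern(logins.keyBy(e -> e.userId), threeFailures);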

DataSet API:

The DataSet API is Flink's core API for batch processing applications. The primitives of the DataSet API include map, reduce, (outer) join, co-group, and iterate. All operations are backed by algorithms and data structures that operate on serialized data in memory and spill to disk if the data size exceeds the memory budget. The data processing algorithms of Flink's DataSet API are inspired by traditional database operators such as hybrid hash-join and external merge-sort. As of Flink 1.12, the DataSet API has been soft-deprecated.
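
To give a feel for the join primitive, which the WordCount example below does not use, here is a small, self-contained DataSet sketch; the sample data is made up for illustration.

import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class DataSetJoinSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // (userId, name) and (userId, clicks) sample data sets
        DataSet<Tuple2<Integer, String>> users =
                env.fromElements(Tuple2.of(1, "alice"), Tuple2.of(2, "bob"));
        DataSet<Tuple2<Integer, Long>> clicks =
                env.fromElements(Tuple2.of(1, 10L), Tuple2.of(2, 3L));

        // Join on userId (field 0 of both tuples) and emit (name, clicks).
        users.join(clicks)
             .where(0)
             .equalTo(0)
             .with(new JoinFunction<Tuple2<Integer, String>, Tuple2<Integer, Long>, Tuple2<String, Long>>() {
                 @Override
                 public Tuple2<String, Long> join(Tuple2<Integer, String> user, Tuple2<Integer, Long> click) {
                     return Tuple2.of(user.f1, click.f1);
                 }
             })
             .print();
    }
}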

Gelly:

Gelly is a library for scalable graph processing and analysis. Gelly is implemented on top of and integrated with the DataSet API. Hence, it benefits from the DataSet API's scalable and robust operators. Gelly features built-in algorithms such as label propagation, triangle enumeration, and PageRank, but also provides a Graph API that eases the implementation of custom graph algorithms.

The two examples below illustrate Flink programming. They assume you already have an environment set up; if you don't, refer to this article:

Flink + Iceberg environment setup

WordCount

After starting the Flink cluster, go to the Flink installation directory.
Running the ls command shows an examples directory that contains the WordCount example.

Let's first submit a job from the command line:
[root@hadoop001 flink-1.13.6]# ./bin/flink run examples/streaming/WordCount.jar

(Screenshot: the word count results.)
The example's source code consists of just three files:

WordCount.java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.examples.java.wordcount;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.MultipleParameterTool;
import org.apache.flink.examples.java.wordcount.util.WordCountData;
import org.apache.flink.util.Collector;
import org.apache.flink.util.Preconditions;

/**
 * Implements the "WordCount" program that computes a simple word occurrence histogram over text
 * files.
 *
 * <p>The input is a plain text file with lines separated by newline characters.
 *
 * <p>Usage: <code>WordCount --input &lt;path&gt; --output &lt;path&gt;</code><br>
 * If no parameters are provided, the program is run with default data from {@link WordCountData}.
 *
 * <p>This example shows how to:
 *
 * <ul>
 *   <li>write a simple Flink program.
 *   <li>use Tuple data types.
 *   <li>write and use user-defined functions.
 * </ul>
 */
public class WordCount {

    // *************************************************************************
    //     PROGRAM
    // *************************************************************************

    public static void main(String[] args) throws Exception {

        final MultipleParameterTool params = MultipleParameterTool.fromArgs(args);

        // set up the execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // make parameters available in the web interface
        env.getConfig().setGlobalJobParameters(params);

        // get input data
        DataSet<String> text = null;
        if (params.has("input")) {
            // union all the inputs from text files
            for (String input : params.getMultiParameterRequired("input")) {
                if (text == null) {
                    text = env.readTextFile(input);
                } else {
                    text = text.union(env.readTextFile(input));
                }
            }
            Preconditions.checkNotNull(text, "Input DataSet should not be null.");
        } else {
            // get default test text data
            System.out.println("Executing WordCount example with default input data set.");
            System.out.println("Use --input to specify file input.");
            text = WordCountData.getDefaultTextLineDataSet(env);
        }

        DataSet<Tuple2<String, Integer>> counts =
                // split up the lines in pairs (2-tuples) containing: (word,1)
                text.flatMap(new Tokenizer())
                        // group by the tuple field "0" and sum up tuple field "1"
                        .groupBy(0)
                        .sum(1);

        // emit result
        if (params.has("output")) {
            counts.writeAsCsv(params.get("output"), "\n", " ");
            // execute program
            env.execute("WordCount Example");
        } else {
            System.out.println("Printing result to stdout. Use --output to specify output path.");
            counts.print();
        }
    }

    // *************************************************************************
    //     USER FUNCTIONS
    // *************************************************************************

    /**
     * Implements the string tokenizer that splits sentences into words as a user-defined
     * FlatMapFunction. The function takes a line (String) and splits it into multiple pairs in the
     * form of "(word,1)" ({@code Tuple2<String, Integer>}).
     */
    public static final class Tokenizer
            implements FlatMapFunction<String, Tuple2<String, Integer>> {

        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // normalize and split the line
            String[] tokens = value.toLowerCase().split("\\W+");

            // emit the pairs
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }
}
WordCountPojo.java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.examples.java.wordcount;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.core.fs.FileSystem.WriteMode;
import org.apache.flink.examples.java.wordcount.util.WordCountData;
import org.apache.flink.util.Collector;

/**
 * This example shows an implementation of WordCount without using the Tuple2 type, but a custom
 * class.
 */
@SuppressWarnings("serial")
public class WordCountPojo {

    /**
     * This is the POJO (Plain Old Java Object) that is being used for all the operations. As long
     * as all fields are public or have a getter/setter, the system can handle them
     */
    public static class Word {

        // fields
        private String word;
        private int frequency;

        // constructors
        public Word() {}

        public Word(String word, int i) {
            this.word = word;
            this.frequency = i;
        }

        // getters setters
        public String getWord() {
            return word;
        }

        public void setWord(String word) {
            this.word = word;
        }

        public int getFrequency() {
            return frequency;
        }

        public void setFrequency(int frequency) {
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return "Word=" + word + " freq=" + frequency;
        }
    }

    public static void main(String[] args) throws Exception {

        final ParameterTool params = ParameterTool.fromArgs(args);

        // set up the execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // make parameters available in the web interface
        env.getConfig().setGlobalJobParameters(params);

        // get input data
        DataSet<String> text;
        if (params.has("input")) {
            // read the text file from given input path
            text = env.readTextFile(params.get("input"));
        } else {
            // get default test text data
            System.out.println("Executing WordCount example with default input data set.");
            System.out.println("Use --input to specify file input.");
            text = WordCountData.getDefaultTextLineDataSet(env);
        }

        DataSet<Word> counts =
                // split up the lines into Word objects (with frequency = 1)
                text.flatMap(new Tokenizer())
                        // group by the field word and sum up the frequency
                        .groupBy("word")
                        .reduce(
                                new ReduceFunction<Word>() {
                                    @Override
                                    public Word reduce(Word value1, Word value2) throws Exception {
                                        return new Word(
                                                value1.word, value1.frequency + value2.frequency);
                                    }
                                });

        if (params.has("output")) {
            counts.writeAsText(params.get("output"), WriteMode.OVERWRITE);
            // execute program
            env.execute("WordCount-Pojo Example");
        } else {
            System.out.println("Printing result to stdout. Use --output to specify output path.");
            counts.print();
        }
    }

    // *************************************************************************
    //     USER FUNCTIONS
    // *************************************************************************

    /**
     * Implements the string tokenizer that splits sentences into words as a user-defined
     * FlatMapFunction. The function takes a line (String) and splits it into multiple Word objects.
     */
    public static final class Tokenizer implements FlatMapFunction<String, Word> {

        @Override
        public void flatMap(String value, Collector<Word> out) {
            // normalize and split the line
            String[] tokens = value.toLowerCase().split("\\W+");

            // emit the pairs
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Word(token, 1));
                }
            }
        }
    }
}
WordCountData.java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.examples.java.wordcount.util;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

/**
 * Provides the default data sets used for the WordCount example program. The default data sets are
 * used, if no parameters are given to the program.
 */
public class WordCountData {

    public static final String[] WORDS =
            new String[] {
                "To be, or not to be,--that is the question:--",
                "Whether 'tis nobler in the mind to suffer",
                "The slings and arrows of outrageous fortune",
                "Or to take arms against a sea of troubles,",
                "And by opposing end them?--To die,--to sleep,--",
                "No more; and by a sleep to say we end",
                "The heartache, and the thousand natural shocks",
                "That flesh is heir to,--'tis a consummation",
                "Devoutly to be wish'd. To die,--to sleep;--",
                "To sleep! perchance to dream:--ay, there's the rub;",
                "For in that sleep of death what dreams may come,",
                "When we have shuffled off this mortal coil,",
                "Must give us pause: there's the respect",
                "That makes calamity of so long life;",
                "For who would bear the whips and scorns of time,",
                "The oppressor's wrong, the proud man's contumely,",
                "The pangs of despis'd love, the law's delay,",
                "The insolence of office, and the spurns",
                "That patient merit of the unworthy takes,",
                "When he himself might his quietus make",
                "With a bare bodkin? who would these fardels bear,",
                "To grunt and sweat under a weary life,",
                "But that the dread of something after death,--",
                "The undiscover'd country, from whose bourn",
                "No traveller returns,--puzzles the will,",
                "And makes us rather bear those ills we have",
                "Than fly to others that we know not of?",
                "Thus conscience does make cowards of us all;",
                "And thus the native hue of resolution",
                "Is sicklied o'er with the pale cast of thought;",
                "And enterprises of great pith and moment,",
                "With this regard, their currents turn awry,",
                "And lose the name of action.--Soft you now!",
                "The fair Ophelia!--Nymph, in thy orisons",
                "Be all my sins remember'd."
            };

    public static DataSet<String> getDefaultTextLineDataSet(ExecutionEnvironment env) {
        return env.fromElements(WORDS);
    }
}

The example is fairly simple, so it needs no further explanation.

We can also use the web UI to deploy and monitor a job.
Web UI deployment

Open the URL http://192.168.1.13:8081 in a browser, where 192.168.1.13 is the server on which Flink is deployed.

Then click the Submit New Job menu on the left, click the Add New button in the upper-right corner of the page to upload the job's jar, enter the entry class (the class with the main method), add any program arguments on the next line, and click the Submit button to submit the job.

Web UI monitoring

After the job is submitted, you can check it under the Jobs > Running Jobs and Completed Jobs menus.
For this job, Flink has two operators. The first is the source operator, which reads data from the collection source. The second is the transformation operator, which aggregates the word counts. Refer to the documentation to learn more about DataStream operators.

You can also view the timeline of the job execution.
You have successfully run a Flink application! Feel free to pick any other JAR archive from the examples/ folder or deploy your own job.

Table API Read/Write Example

This example reads and writes Iceberg tables with SQL and the Table API. Let's start with the write example.

package com.demo.iceberg;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class TableAPIWriteIceberg {
    static final Log log = LogFactory.getLog(TableAPIWriteIceberg.class);
    public static void main(String[] args) {
        try {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tabenv = StreamTableEnvironment.create(env);
            env.enableCheckpointing(1000);

            //1 create catalog
            System.out.println("before create catalog");
            log.info("before create catalog");
            tabenv.executeSql("" +
                    "create catalog hadoop_iceberg_dev with " +
                    "('type'='iceberg'," +
                    "'catalog-type'='hadoop'," +
                    "'warehouse'='hdfs://hadoop001:9000/iceberg/flink'" +
                    ")");

            //2 using catalog
            System.out.println("before use catalog");
            log.info("before use catalog");
            tabenv.useCatalog("hadoop_iceberg_dev");

            //3 create database under the above hadoop catalog
            System.out.println("before create database");
            log.info("before create database");
            tabenv.executeSql("create database if not exists iceberg ");

            //4 using iceberg db
            System.out.println("before use database");
            log.info("before use database");
            tabenv.useDatabase("iceberg");

            //5 create iceberg table
            //tabenv.executeSql("drop table if exists hadoop_iceberg_dev.iceberg.userAddr");
            System.out.println("before create table");
            log.info("before create table");
            tabenv.executeSql("" +
                    "create table  hadoop_iceberg_dev.iceberg.userAddr( userId int, city string, dt string) partitioned by (dt)");

            //6 insert data
            System.out.println("before insert");
            log.info("before insert");
            tabenv.executeSql("insert into  hadoop_iceberg_dev.iceberg.userAddr values(1,'SH','2022-12-12'),(2,'SH','2022-12-12'),(3,'BJ','2022-12-13')");
            System.out.println("after insert");
            log.info("after insert");
        }catch(Throwable e){
//            System.out.println("--------------"+System.currentTimeMillis()+"-----------");
//            e.printStackTrace(System.out);
            log.error("TableAPIWriteIceberg error",e);
        }
    }
}

Notes on the example:

The program first creates a catalog whose tables are stored in Hadoop (the HDFS warehouse URL is specified).

It then uses the newly created catalog (similar to use database in MySQL).

It then creates a database, creates a table, and inserts data into it. Package the job jar and submit it to the cluster.

Did the data really get inserted? You can verify it with the read example, or query it directly with Hive SQL. Let's look at the read example first; the code is as follows:

package com.demo.iceberg;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
//com.demo.iceberg.TableAPIReadIceberg
public class TableAPIReadIceberg {
    static final Log log = LogFactory.getLog(TableAPIReadIceberg.class);
    public static void main(String[] args) {

     try{
        //StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

//         EnvironmentSettings.Builder batchBuilder = EnvironmentSettings.newInstance().inBatchMode();
//
//         TableEnvironment tabenv = TableEnvironment.create(env);
         final EnvironmentSettings settings =
                 EnvironmentSettings.newInstance().inBatchMode().build();
        final TableEnvironment tabenv = TableEnvironment.create(settings);
        Configuration conf = tabenv.getConfig().getConfiguration();
        //conf for streaming read
        conf.setBoolean("table.dynamic-table-options.enabled",true);

        //tabenv.enableCheckpointing(1000);
        //1 create catalog
        log.info("before create catalog");
        tabenv.executeSql("" +
                "create catalog hadoop_iceberg_dev with " +
                "('type'='iceberg'," +
                "'catalog-type'='hadoop'," +
                "'warehouse'='hdfs://hadoop001:9000/iceberg/flink'" +
                ")");

        //2 batch read
        //  tabenv.executeSql(" select *from hadoop_iceberg_dev.iceberg.userAddr").print();

        //3 streaming read
         log.info("before select");
         log.info("tabenv class name:"+tabenv.getClass().getName());
         System.out.println("tabenv class name:"+tabenv.getClass().getName());
        //tabenv.executeSql(" select * from hadoop_iceberg_dev.iceberg.userAddr /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s')*/ ");
         tabenv.executeSql(" select *from hadoop_iceberg_dev.iceberg.userAddr").print();
         log.info("after select");
    }catch(Throwable e){
        System.out.println("--------------"+System.currentTimeMillis()+"-----------");
        e.printStackTrace(System.out);
         log.error("TableAPIReadIceberg error",e);
    }
    }
}

When running it, submitting the job through the web UI reported an error that appears to be a bug in Flink 1.13.6, so let's try submitting the job from the command line instead.
Because the command line runs on the server side, the jar must first be uploaded to a directory on the server. The format is flink run -c main.class xxx.jar:

./flink run -c com.demo.iceberg.TableAPIReadIceberg ../examples/demo/flinkdemo1136-1.0-SNAPSHOT.jar

You can see that the query results have been printed. You can also check the job in the web UI's job monitoring pages.
For everyone's convenience, here is the pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>flinkdemo1136</artifactId>
    <version>1.0-SNAPSHOT</version>



    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <flink.version>1.13.6</flink.version>
        <scala.version>2.12.10</scala.version>
        <scala.binary.version>2.12</scala.binary.version>
        <log4j.version>1.2.17</log4j.version>
        <slf4j.version>1.7.22</slf4j.version>
        <hive.version>3.1.2</hive.version>
        <!--<iceberg.version>0.11.1</iceberg.version>-->
        <iceberg.version>0.13.1</iceberg.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table -->

        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-common -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-api-java -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-api-java-bridge -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-planner -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-table-planner-blink -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <!-- <dependency>
             <groupId>org.apache.iceberg</groupId>
             <artifactId>iceberg-flink-runtime</artifactId>
             <version>${iceberg.version}</version>
         </dependency>-->

        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-flink-runtime-1.13</artifactId>
            <version>${iceberg.version}</version>
        </dependency>
        <!--added for 0.12 iceberg version, with class not found bug -->
        <dependency>
            <groupId>org.apache.iceberg</groupId>
            <artifactId>iceberg-core</artifactId>
            <version>${iceberg.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.1.3</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.flink/flink-clients -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.76</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
            <version>1.11.6</version>
        </dependency>

        <!-- Flink Dependency -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-hive_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>

        <!-- Hive Dependency -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
            <scope>provided</scope>
        </dependency>

        <!--CDC-->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.49</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>com.ververica</groupId>
            <artifactId>flink-connector-mysql-cdc</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>2.4.1</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>


</project>

That's it for this article; thanks for reading this far. Flink has many more application scenarios, and I hope to cover them in future posts.
