Flink sink to Doris: a worked example

Add flink-doris-connector and the required Flink Maven dependencies

The configuration below follows the official documentation.
Flink 1.13.* and earlier:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_${scala.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_${scala.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<!-- flink table -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-common</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge_${scala.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_${scala.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<!-- flink-doris-connector -->
<dependency>
  <groupId>org.apache.doris</groupId>
  <artifactId>flink-doris-connector-1.13_2.12</artifactId>
  <!--artifactId>flink-doris-connector-1.12_2.12</artifactId-->
  <!--artifactId>flink-doris-connector-1.11_2.12</artifactId-->
  <version>1.0.3</version>
</dependency>    

Flink 1.14.*:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_${scala.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_${scala.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<!-- flink table -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_${scala.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<!-- flink-doris-connector -->
<dependency>
  <groupId>org.apache.doris</groupId>
  <artifactId>flink-doris-connector-1.14_2.12</artifactId>
  <version>1.0.3</version>
</dependency>  
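
Both dependency blocks reference ${flink.version} and ${scala.version}; a minimal properties sketch is below. The versions shown are illustrative assumptions, and note that scala.version here must be the Scala binary version used in the artifact suffix:

<properties>
    <!-- illustrative versions; match them to your cluster -->
    <flink.version>1.14.4</flink.version>
    <!-- Scala binary version, used as the artifact suffix -->
    <scala.version>2.12</scala.version>
</properties>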

This example uses Flink 1.14. Flink 1.15 came out just today; they release fast, and I can barely keep up.

Create the table

CREATE TABLE dbname.`worker` (
  `startTime` datetime NOT NULL,
  `id` int NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `age` int DEFAULT NULL,
  `city` varchar(255) NOT NULL,
  `salary` int NOT NULL
) ENGINE = olap
DUPLICATE KEY(`startTime`, `id`, `name`)
PARTITION BY RANGE(`startTime`)()
DISTRIBUTED BY HASH(`name`)
PROPERTIES (
.......
);

Remember to add at least one partition. If the partition list is left empty, loads will fail with an error and no data can be imported.
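
For example, a partition covering the sample data's 2022-05-06 startTime can be added like this (a sketch; the partition name and range are illustrative):

ALTER TABLE dbname.`worker` ADD PARTITION p20220506 VALUES LESS THAN ("2022-05-07");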

Mock data source

Flink jobs are almost always fed streaming data, and I didn't want to set up Kafka just for this, so I wrote a data source of my own.

import java.util
import java.util.Random

import org.apache.flink.streaming.api.functions.source.SourceFunction

// Emits one JSON-encoded worker record per second until cancelled.
class MyDataSource extends SourceFunction[String] {

  @volatile var running: Boolean = true

  override def run(sourceContext: SourceFunction.SourceContext[String]): Unit = {
    val random: Random = new Random()

    var id: Int = 0

    val nameList: util.ArrayList[String] = new util.ArrayList[String]()
    nameList.addAll(util.Arrays.asList("aa", "bb", "cc", "dd"))

    val cityList: util.ArrayList[String] = new util.ArrayList[String]()
    cityList.addAll(util.Arrays.asList("苏州", "无锡", "常州", "南京"))

    while (running) {
      id += 1
      // pick a random list index and generate fresh age/salary values for each record
      val r: Int = random.nextInt(nameList.size())
      val age: Int = random.nextInt(20)
      val salary: Int = random.nextInt(5000) + 10000
      val name: String = nameList.get(r)
      val city: String = cityList.get(r)
//      val str: String = JSON.toJSONString(new worker(id, name, age, city, salary), JSON.DEFAULT_GENERATE_FEATURE)
      val str: String = "{\"startTime\":\"2022-05-06\",\"id\":" + id + ",\"name\":\"" + name + "\",\"age\":" + age + ",\"city\":\"" + city + "\",\"salary\":" + salary + "}"
      sourceContext.collect(str)
      Thread.sleep(1000L)
    }
  }

  // stop the emit loop when Flink cancels the source
  override def cancel(): Unit = running = false
}

Sink to Doris

public static void main(String[] args) {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Stream Load properties: batches are sent as a JSON array,
    // and strip_outer_array has Doris unwrap that array on load
    Properties pro = new Properties();
    pro.setProperty("format", "json");
    pro.setProperty("strip_outer_array", "true");

    DataStreamSource<String> stream = env.addSource(new MyDataSource());
    stream.print();
    stream.addSink(
            DorisSink.sink(
                    DorisReadOptions.builder().build(),
                    DorisExecutionOptions.builder()
                            .setBatchSize(3)        // flush after 3 buffered rows...
                            .setBatchIntervalMs(1L) // ...or after 1 ms, whichever comes first
                            .setMaxRetries(3)
                            .setStreamLoadProp(pro).build(),
                    DorisOptions.builder()
                            .setFenodes("xxx.xxx.xxx.xxx.xxx:8030")
                            .setTableIdentifier("dbname.worker")
                            .setUsername("username")
                            .setPassword("password").build()
            ));

    try {
        env.execute("Flink2Doris");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Wrapping it up

Given how I actually use it, I wrapped the sink in a small helper class so it's more convenient to call.

public class MySinkF {
    // Builds a Doris sink for the given table; connection settings are fixed here.
    public SinkFunction<String> MySinkDoris(String tablename) {
        Properties pro = new Properties();
        pro.setProperty("format", "json");
        pro.setProperty("strip_outer_array", "true");

        return DorisSink.sink(
                DorisReadOptions.builder().build(),
                DorisExecutionOptions.builder()
                        .setBatchSize(3)
                        .setBatchIntervalMs(1L)
                        .setMaxRetries(3)
                        .setStreamLoadProp(pro).build(),
                DorisOptions.builder()
                        .setFenodes("xxx.xxx.xxx.xxx.xxx:8030")
                        .setTableIdentifier("dbname." + tablename)
                        .setUsername("username")
                        .setPassword("password").build()
        );
    }
}

Likewise, fenodes, username, and password rarely change, so they can be read from a config file.
The calling code then becomes a one-liner:

stream.addSink(new MySinkF().MySinkDoris("worker"));
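
If you do move the connection settings into a file, a minimal sketch of loading them might look like this (the file name doris.properties and the key names are assumptions for illustration, not part of the original code):

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class DorisConf {
    // Loads fenodes/username/password from doris.properties on the classpath.
    public static Properties load() {
        Properties p = new Properties();
        try (InputStream in = DorisConf.class.getClassLoader()
                .getResourceAsStream("doris.properties")) {
            if (in == null) {
                throw new IOException("doris.properties not found on classpath");
            }
            p.load(in);
        } catch (IOException e) {
            throw new RuntimeException("cannot load doris.properties", e);
        }
        return p;
    }
}

MySinkDoris can then call DorisConf.load() once and pass p.getProperty("fenodes"), p.getProperty("username"), and p.getProperty("password") into the DorisOptions builder.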

A small upgrade

My real tables use the AGGREGATE model, with some columns aggregated via REPLACE, so I also tested whether that works as expected.

Create a new table

CREATE TABLE test_db.`worker_replace` (
  `startTime` datetime NOT NULL,
  `id` int NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `age` int DEFAULT NULL,
  `city` varchar(255) NOT NULL,
  `salary` int REPLACE NOT NULL
) ENGINE = olap
AGGREGATE KEY(`startTime`, `id`, `name`, `age`, `city`)
PARTITION BY RANGE(`startTime`)()
DISTRIBUTED BY HASH(`name`)
PROPERTIES (
......
);

Data source

This time the records come straight from a file:

{"startTime" : "2022-05-06 00:00:00","id" : 1,"name" : "dd","age" :14,"city" : "南京","salary" : 888}
{"startTime" : "2022-05-06 00:00:00","id" : 2,"name" : "dd","age" :14,"city" : "南京","salary" : 888}
{"startTime" : "2022-05-06 00:00:00","id" : 3,"name" : "dd","age" :14,"city" : "南京","salary" : 888}
{"startTime" : "2022-05-06 00:00:00","id" : 4,"name" : "dd","age" :14,"city" : "南京","salary" : 888}
{"startTime" : "2022-05-06 00:00:00","id" : 5,"name" : "dd","age" :14,"city" : "南京","salary" : 888}
{"startTime" : "2022-05-06 00:00:00","id" : 6,"name" : "dd","age" :14,"city" : "南京","salary" : 888}
{"startTime" : "2022-05-06 00:00:00","id" : 7,"name" : "dd","age" :14,"city" : "南京","salary" : 888}
{"startTime" : "2022-05-06 00:00:00","id" : 8,"name" : "dd","age" :14,"city" : "南京","salary" : 888}
{"startTime" : "2022-05-06 00:00:00","id" : 9,"name" : "dd","age" :14,"city" : "南京","salary" : 888}
{"startTime" : "2022-05-06 00:00:00","id" : 10,"name" : "dd","age" :14,"city" : "南京","salary" : 888}
{"startTime" : "2022-05-06 00:00:00","id" : 1,"name" : "dd","age" :14,"city" : "南京","salary" : 999}
{"startTime" : "2022-05-06 00:00:00","id" : 2,"name" : "dd","age" :14,"city" : "南京","salary" : 999}
{"startTime" : "2022-05-06 00:00:00","id" : 3,"name" : "dd","age" :14,"city" : "南京","salary" : 999}
{"startTime" : "2022-05-06 00:00:00","id" : 4,"name" : "dd","age" :14,"city" : "南京","salary" : 999}

Final code

public static void main(String[] args) {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // parallelism 1 keeps the file's row order, which makes the REPLACE effect easy to see
    DataStreamSource<String> data = env.readTextFile("your_path").setParallelism(1);
    data.print();
    data.addSink(new MySinkF().MySinkDoris("worker_replace"));

    try {
        env.execute("doris replace");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Finally, inspect the table: the results match expectations. Rows with id 1 through 4 appear twice in the file, and only the later salary value (999) is kept.
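
A quick way to check (a sketch; REPLACE keeps the value from the most recently loaded row, so this assumes the 999 rows were loaded after the 888 rows):

SELECT id, salary
FROM test_db.`worker_replace`
ORDER BY id;
-- expected: salary = 999 for id 1-4, salary = 888 for id 5-10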
