Reading Parquet files with Flink SQL


1. Add the Maven dependency

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-parquet_2.12</artifactId>
    <!-- use the full release number that matches your Flink cluster -->
    <version>1.11.0</version>
</dependency>

2. Create a Flink dynamic table backed by the files

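import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public static void main(String[] args) {

    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(2);
    final StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

    // Root directory that contains the Parquet files.
    String dataPath = "/Users/klook/Downloads/user_profile/";

    // Register a dynamic table over the files. The partition columns
    // (`year`, `month`, `day`) are also declared in the column list.
    tableEnv.executeSql("CREATE TABLE `user_profile_data` (\n" +
            "  `device_id` STRING,\n" +
            "  `last_week_click_num` INT,\n" +
            "  `last_month_click_num` INT,\n" +
            "  `last_week_searchpv` INT,\n" +
            "  `year` STRING,\n" +
            "  `month` STRING,\n" +
            "  `day` STRING\n" +
            ") PARTITIONED BY (`year`, `month`, `day`)\n" +
            "WITH (\n" +
            "  'connector' = 'filesystem',\n" +
            "  'path' = '" + dataPath + "',\n" +
            "  'format' = 'parquet'\n" +
            ")");

    // The original post did not show the query; this one is only an example.
    String sql = "SELECT `device_id`, `last_week_click_num` FROM `user_profile_data`";

    final Table activity_base = tableEnv.sqlQuery(sql);
    final DataStream<Row> activityBaseStream = tableEnv.toAppendStream(activity_base, Row.class);
    // sink2Mongo is the author's own helper (not shown in the post);
    // judging by its name, it writes the stream to MongoDB.
    sink2Mongo(activityBaseStream);

    try {
        env.execute();
    } catch (Exception e) {
        e.printStackTrace();
    }
}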

Once the table is registered, you can query it with SQL, or convert the result to a DataStream for further processing.
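Because the table is partitioned by `year`, `month` and `day`, a typical query filters on those columns. The following is a minimal sketch of building such a query string for one day. The class name `PartitionQuery`, the selected columns, and the assumption that partition values are zero-padded strings (for example `month` = '07') are illustrative and not taken from the original job.

```java
import java.time.LocalDate;
import java.util.Locale;

/** Builds a query string that reads a single day's partition of user_profile_data. */
final class PartitionQuery {

    // Assumption: partition values are zero-padded strings such as year=2020, month=07, day=05.
    static String forDay(LocalDate date) {
        return String.format(
                Locale.ROOT,
                "SELECT `device_id`, `last_week_click_num` FROM `user_profile_data` "
                        + "WHERE `year` = '%04d' AND `month` = '%02d' AND `day` = '%02d'",
                date.getYear(), date.getMonthValue(), date.getDayOfMonth());
    }

    public static void main(String[] args) {
        // The resulting string can be passed to tableEnv.sqlQuery(...).
        System.out.println(forDay(LocalDate.of(2020, 7, 5)));
    }
}
```

If your directories use a different value format (for example `month=7`), adjust the format specifiers accordingly.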

3. Notes

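If a job submitted as a jar in production fails with an error saying that no factory for 'parquet' could be found, copy the flink-parquet jar into the lib directory of the Flink installation and restart the cluster so that it is picked up.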

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Could not find any factory for identifier 'parquet' that implements 'org.apache.flink.table.factories.FileSystemFormatFactory' in the classpath.
Available factory identifiers are:
csv
json
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:302)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198)
	at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:149)
	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:699)
	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:232)
	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:916)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
	at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
Caused by: org.apache.flink.table.api.ValidationException: Could not find any factory for identifier 'parquet' that implements 'org.apache.flink.table.factories.FileSystemFormatFactory' in the classpath.
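To narrow down this error, it can help to check whether the Parquet format classes are visible at all from the place where the job runs. The sketch below is only a diagnostic aid: the class name `ParquetFactoryCheck` is made up for this example, and the factory class name is the one that appears in the Flink 1.11 stack traces in this post. A class being loadable does not by itself prove that factory discovery will succeed, but a `false` result is consistent with the missing-jar explanation.

```java
/** Checks whether a class is visible to the class loader that loaded this check. */
final class ParquetFactoryCheck {

    // Taken from the stack traces in this post (Flink 1.11); other versions may differ.
    static final String PARQUET_FACTORY =
            "org.apache.flink.formats.parquet.ParquetFileSystemFormatFactory";

    /** Returns true if the named class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ParquetFactoryCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("parquet factory on classpath: " + isOnClasspath(PARQUET_FACTORY));
    }
}
```

Calling `isOnClasspath(PARQUET_FACTORY)` at the start of the job's `main` method, before the table is created, shows what the submitted program can actually see.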

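If you hit the following error, check that the partition columns (here `year`, `month` and `day`) are included in the SQL statement.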

Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.flink.formats.parquet.ParquetFileSystemFormatFactory$ParquetInputFormat.lambda$open$0(ParquetFileSystemFormatFactory.java:171)
	at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684)
	at org.apache.flink.formats.parquet.ParquetFileSystemFormatFactory$ParquetInputFormat.open(ParquetFileSystemFormatFactory.java:169)
	at org.apache.flink.formats.parquet.ParquetFileSystemFormatFactory$ParquetInputFormat.open(ParquetFileSystemFormatFactory.java:128)
	at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:85)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:213)
Command exiting with ret '0'
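An index of -1 is what a name lookup returns when nothing matches, so one plausible reading of this trace is that a partition key could not be found among the table's columns. As a rough pre-flight check under that assumption, the sketch below compares the partition keys with the declared columns before the DDL is submitted. The class and method names are hypothetical, and the column lists are typed in by hand rather than parsed from the SQL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Reports partition keys that were not declared as table columns. */
final class PartitionColumnCheck {

    /** Returns the partition keys that are missing from the declared column list. */
    static List<String> missingPartitionColumns(List<String> columns, List<String> partitionKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : partitionKeys) {
            if (!columns.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // `day` is deliberately left out of the column list to show a failing case.
        List<String> columns = Arrays.asList("device_id", "last_week_click_num", "year", "month");
        List<String> partitionKeys = Arrays.asList("year", "month", "day");
        System.out.println(missingPartitionColumns(columns, partitionKeys)); // prints [day]
    }
}
```

An empty result means every partition key is declared; anything else names the columns to add to the CREATE TABLE statement.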