Flink: writing to a day-partitioned Hive table in real time


References:
https://www.cnblogs.com/codetouse/p/12746612.html
https://www.cnblogs.com/kancy/p/13912443.html
https://www.e-learn.cn/content/wangluowenzhang/227836

1. Problem:

2022-02-07 17:04:49,196 WARN org.apache.flink.yarn.configuration.YarnLogConfigUtil [] - The configuration directory ('/opt/flink/conf') already contains a LOG4J config file. If you want to use logback, then please delete or rename the log configuration file.
2022-02-07 17:04:49,391 INFO org.apache.hadoop.yarn.client.api.impl.TimelineReaderClientImpl [] - Initialized TimelineReader URI=http://hadoop11.nb:8198/ws/v2/timeline/, clusterId=yarn-cluster
java.lang.LinkageError: ClassCastException: attempting to castjar:file:/data/work/cmbh/project/flinkJob/bigdata-1.0-SNAPSHOT-jar-with-dependencies.jar!/javax/ws/rs/ext/RuntimeDelegate.class to jar:file:/opt/flink/lib/javax.ws.rs-api-2.0.jar!/javax/ws/rs/ext/RuntimeDelegate.class
at javax.ws.rs.ext.RuntimeDelegate.findDelegate(RuntimeDelegate.java:146)
at javax.ws.rs.ext.RuntimeDelegate.getInstance(RuntimeDelegate.java:120)
at javax.ws.rs.core.MediaType.valueOf(MediaType.java:179)
at com.sun.jersey.core.header.MediaTypes.<clinit>(MediaTypes.java:65)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.initReaders(MessageBodyFactory.java:182)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.initReaders(MessageBodyFactory.java:175)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.init(MessageBodyFactory.java:162)
at com.sun.jersey.api.client.Client.init(Client.java:343)
at com.sun.jersey.api.client.Client.access$000(Client.java:119)
at com.sun.jersey.api.client.Client$1.f(Client.java:192)
at com.sun.jersey.api.client.Client$1.f(Client.java:188)
at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
at com.sun.jersey.api.client.Client.<init>(Client.java:188)
at com.sun.jersey.api.client.Client.<init>(Client.java:171)
at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.serviceInit(TimelineConnector.java:122)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.client.api.impl.TimelineReaderClientImpl.serviceInit(TimelineReaderClientImpl.java:99)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.client.api.impl.AHSv2ClientImpl.serviceInit(AHSv2ClientImpl.java:63)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:207)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.flink.yarn.YarnClusterClientFactory.getClusterDescriptor(YarnClusterClientFactory.java:82)
at org.apache.flink.yarn.YarnClusterClientFactory.createClusterDescriptor(YarnClusterClientFactory.java:61)
at org.apache.flink.yarn.YarnClusterClientFactory.createClusterDescriptor(YarnClusterClientFactory.java:43)
at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:73)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1957)
at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:137)
at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1834)
at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:801)
at com.leadeon.service.bos_login_dtl$.main(bos_login_dtl.scala:63)
at com.leadeon.service.bos_login_dtl.main(bos_login_dtl.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)

2. Cause:

The hadoop-common dependency version did not match the cluster. With the old 2.6.5 dependency bundled into the fat jar, a copy of the javax.ws.rs classes (RuntimeDelegate) ends up in bigdata-1.0-SNAPSHOT-jar-with-dependencies.jar and clashes with /opt/flink/lib/javax.ws.rs-api-2.0.jar, which produces the LinkageError above. Declaring hadoop-common 3.1.1 (the cluster's Hadoop version) with provided scope keeps it out of the fat jar:

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
<!--            <version>2.6.5</version>-->
            <version>3.1.1</version>
            <scope>provided</scope>
        </dependency>
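
To double-check which copy of the conflicting class actually wins at runtime, you can print where the class was loaded from. A minimal sketch (not part of the original job; the object name is illustrative, and the same two lines can simply be pasted at the top of the job's main method before submitting):

object ClassSourceCheck {
  def main(args: Array[String]): Unit = {
    // Print the jar that javax.ws.rs.ext.RuntimeDelegate is resolved from.
    // If this points at the fat jar instead of /opt/flink/lib/javax.ws.rs-api-2.0.jar,
    // a conflicting copy is still being bundled.
    val codeSource = Class.forName("javax.ws.rs.ext.RuntimeDelegate").getProtectionDomain.getCodeSource
    println(if (codeSource != null) codeSource.getLocation else "loaded from the bootstrap classpath")
  }
}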

3. Implementation steps:

3.1 pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.leadeon</groupId>
    <artifactId>bigdata</artifactId>
    <version>1.0-SNAPSHOT</version>

    <packaging>jar</packaging>
    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <scala.version>2.11.12</scala.version>
        <flink.version>1.13.3</flink.version>
        <mockito.version>2.8.9</mockito.version>
        <powermock.version>1.7.4</powermock.version>
        <encoding>UTF-8</encoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-filesystem_2.11</artifactId>
            <version>1.11.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.12</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.1.1</version>
            <scope>provided</scope>
        </dependency>

    </dependencies>

    <!--scala类打包需要-->
    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <classifier>dist</classifier>
                    <appendAssemblyId>true</appendAssemblyId>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>${scala.version}</scalaVersion>
                    <recompileMode>incremental</recompileMode>
                    <useZincServer>false</useZincServer>
                    <args>
                        <arg>-unchecked</arg>
                        <arg>-deprecation</arg>
                        <arg>-feature</arg>
                    </args>
                    <jvmArgs>
                        <jvmArg>-Xms1024m</jvmArg>
                        <jvmArg>-Xmx1024m</jvmArg>
                    </jvmArgs>
                    <javacArgs>
                        <javacArg>-source</javacArg>
                        <javacArg>${java.version}</javacArg>
                        <javacArg>-target</javacArg>
                        <javacArg>${java.version}</javacArg>
                        <javacArg>-Xlint:all,-serial,-path</javacArg>
                    </javacArgs>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.antlr</groupId>
                <artifactId>antlr4-maven-plugin</artifactId>
                <version>4.7</version>
                <executions>
                    <execution>
                        <id>antlr</id>
                        <goals>
                            <goal>antlr4</goal>
                        </goals>
                        <phase>none</phase>
                    </execution>
                </executions>
                <configuration>
                    <outputDirectory>src/test/java</outputDirectory>
                    <listener>true</listener>
                    <treatWarningsAsErrors>true</treatWarningsAsErrors>
                </configuration>
            </plugin>
        </plugins>
    </build>


</project>

3.2 Custom HDFS partition path (bucketer)

Note: create each day's partition ahead of time (the day before at the latest); otherwise the new partition is never registered in the Hive metastore and its data cannot be queried.
alter table bus_login_dtl_bak2 add if not exists partition(dt='20220208')
location '/apps/hive/datahouse/cmbh_real_log/bus_login_dtl/dt=20220208';

package com.leadeon.utils;

import org.apache.flink.streaming.connectors.fs.Clock;
import org.apache.flink.streaming.connectors.fs.bucketing.BasePathBucketer;
import org.apache.hadoop.fs.Path;

import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;

public class HdfsBasePathBucketer extends BasePathBucketer<String> {
    private static final long serialVersionUID = 1L;
    // One formatter per bucketer instance; each sink subtask uses it single-threaded.
    private final SimpleDateFormat formatter = new SimpleDateFormat("yyyyMMdd");

    @Override
    public Path getBucketPath(Clock clock, Path basePath, String element) {
        // Compute the date for every record so that a long-running job rolls over to a new
        // dt=yyyyMMdd directory at midnight; a value computed once at construction time
        // would pin all output to the day the job started.
        String dateString = formatter.format(new Date());
        return new Path(basePath + File.separator + "dt=" + dateString);
    }

}
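
The ALTER TABLE ... ADD PARTITION statement above still has to run once per day. One way to automate it is a small scheduled job that registers tomorrow's partition through Hive JDBC. A minimal sketch, assuming a hive-jdbc dependency on the classpath; the HiveServer2 address and database name are placeholders, not the original environment:

import java.sql.DriverManager
import java.text.SimpleDateFormat
import java.util.Calendar

object AddTomorrowPartition {
  def main(args: Array[String]): Unit = {
    // Compute tomorrow's dt value, e.g. 20220209.
    val cal = Calendar.getInstance()
    cal.add(Calendar.DAY_OF_MONTH, 1)
    val dt = new SimpleDateFormat("yyyyMMdd").format(cal.getTime)

    // Register the partition in the metastore before the sink starts writing into it.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://hiveserver2-host:10000/cmbh_real_log")
    try {
      val stmt = conn.createStatement()
      stmt.execute(
        s"ALTER TABLE bus_login_dtl_bak2 ADD IF NOT EXISTS PARTITION (dt='$dt') " +
          s"LOCATION '/apps/hive/datahouse/cmbh_real_log/bus_login_dtl/dt=$dt'")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}

Scheduled from cron shortly before midnight, this keeps the metastore in step with the dt=yyyyMMdd directories that the bucketer creates.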

3.3 Flink job (consume Kafka data into Hive)

package com.leadeon.service

import com.typesafe.config.ConfigFactory
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.apache.kafka.clients.CommonClientConfigs
import org.apache.kafka.clients.consumer.ConsumerConfig
import java.io.File
import java.text.SimpleDateFormat
import java.util.{Date, Properties}

import com.leadeon.utils.HdfsBasePathBucketer
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink

object bos_login_dtl {
  def main(args: Array[String]): Unit = {
    if (args.length != 1) {
      println("usage: set application.conf file path!")
      return
    }
    val Array(brokers, kafkaTopic, group, times,parallelism,  outputHDFS) = getConf(args.head)
    val date = new SimpleDateFormat("yyyyMMdd").format(new Date)

    // 1. Get the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(5000)
    env.setParallelism(parallelism.toInt)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    // 2. Kafka consumer configuration
    val props = new Properties()
    props.setProperty(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG,brokers)
    props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, group)
    props.setProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")
    props.setProperty(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000")
    props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
    props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
    props.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest")

    // 3. Create the Kafka source
    val kafkaDataStream= env.addSource(new FlinkKafkaConsumer[String](kafkaTopic,
      new SimpleStringSchema(), props))

    // 4. Write to HDFS (bucketed by day)

    val sink = new BucketingSink[String](outputHDFS)
    //    sink.setBucketer(new DateTimeBucketer("yyyyMMdd"))
    sink.setBucketer(new HdfsBasePathBucketer)
    // Max size per part file: 256 MB; when reached, the current file is closed and a new one is created
    sink.setBatchSize(1024 * 1024 * 256L)
    // Rollover interval for part files (`times` seconds from the config, e.g. 30 minutes); checked together with the `batchSize` limit above
    sink.setBatchRolloverInterval(times.toLong * 1000L)
    // Inactive-bucket threshold: a bucket with no writes for this long is closed
    sink.setInactiveBucketThreshold(3 * 60 * 1000L)
    // How often to check for inactive buckets
    sink.setInactiveBucketCheckInterval(30 * 1000L)
    // Add the sink to write the transformed records to HDFS
//   kafkaDataStream.map(_.replaceAll("\\|#\\$", "\t")).addSink(sink)
    kafkaDataStream.map(new MapOperator).addSink(sink)
    // Print the raw records (for debugging)
    kafkaDataStream.print()
    env.execute("login_streaming")


  }

  /**
    * Read the basic job configuration from an application.conf file.
    * @param path path to the config file
    * @return Array(brokers, topics, group, times, parallelism, outputHDFS)
    */
  def getConf(path: String): Array[String] = {
    val conf = ConfigFactory.parseFile(new File(path))
    Array(conf.getString("conf.brokers"),
      conf.getString("conf.topics"),
      conf.getString("conf.group"),
      conf.getString("conf.times"),
      conf.getString("conf.parallelism"),
      conf.getString("conf.outputHDFS")
    )
  }
}

import org.apache.flink.api.common.functions.MapFunction

class MapOperator extends MapFunction[String, String] {
  @throws[Exception]
  override def map(line: String): String = { // Transformation: replace the |#$ field delimiter with a tab
    val str = line.replaceAll("\\|#\\$", "\t")
    println("processed record ************ " + str)
    str
  }
}
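
The getConf helper above reads its settings from a conf block in an application.conf (HOCON) file, which is passed as the single program argument. A minimal sketch of such a file; all values here are placeholders, not the original environment:

conf {
  brokers = "kafka-host1:9092,kafka-host2:9092"
  topics = "bos_login_dtl"
  group = "bos_login_dtl_group"
  # rollover interval in seconds (the sink multiplies it by 1000), e.g. 30 minutes
  times = "1800"
  parallelism = "4"
  outputHDFS = "hdfs:///apps/hive/datahouse/cmbh_real_log/bus_login_dtl"
}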

3.4 Launch script

#!/bin/bash
export HADOOP_CLASSPATH=/usr/hdp/3.1.4.0-315/hadoop
export HBASE_CONF_DIR=/usr/hdp/3.1.4.0-315/hbase/conf
export YARN_CONF_DIR=/usr/hdp/3.1.4.0-315/hadoop-yarn/conf
/opt/flink/bin/flink run -m yarn-cluster -p 4 -yjm 1024m -ytm 4096m -ys 5 -c com.leadeon.service.bos_login_dtl /data/work/cmbh/project/flinkJob/bigdata-1.0-SNAPSHOT-jar-with-dependencies.jar /data/work/cmbh/project/flinkJob/application.conf