Installing Flink 1.10.0 on Linux and Starting the SQL Client to Read Data from Hive 1.2.1

wangyu_qiuxue

Category: Flink for beginners. Tags: big data, hadoop, flink

Article link: https://blog.csdn.net/wangyu_qiuxue/article/details/105349239

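This article describes how to install Flink 1.10.0 on Linux and start a YARN session after configuring it. Problems encountered along the way included dependency conflicts and Hadoop version incompatibilities; they were resolved by adding missing jar files, fixing version conflicts, and compiling from source. In the end, the SQL Client successfully read data from Hive 1.2.1.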

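First, download the Flink 1.10.0 tgz package from the official website; the installation steps are the same as in the first half of the previous article. Then edit FLINK_HOME/conf/sql-client-defaults.yaml: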

catalogs:
  - name: myhive   # any name you like
    type: hive
    hive-conf-dir: /etc/hive/conf  # directory containing hive-site.xml
    hive-version: 1.2.1    # Hive version
execution:
  # select the implementation responsible for planning table programs
  # possible values are 'blink' (used by default) or 'old'
  planner: blink
  # 'batch' or 'streaming' execution
  type: batch  # either 'streaming' or 'batch' works here
  # allow 'event-time' or only 'processing-time' in sources
  time-characteristic: event-time
  # interval in ms for emitting periodic watermarks
  periodic-watermarks-interval: 200
  # 'changelog' or 'table' presentation of results
  result-mode: table
  # maximum number of maintained rows in 'table' presentation of results
  max-table-result-rows: 1000000
  # parallelism of the program
  parallelism: 1
  # maximum parallelism
  max-parallelism: 128
  # minimum idle state retention in ms
  min-idle-state-retention: 0
  # maximum idle state retention in ms
  max-idle-state-retention: 0
  # current catalog ('default_catalog' by default)
  current-catalog: myhive
  # current database of the current catalog (default database of the catalog by default)
  current-database: secoo_tmp
  # controls how table programs are restarted in case of failures
  restart-strategy:
    # strategy type
    # possible values are "fixed-delay", "failure-rate", "none", or "fallback" (default)
    type: fallback
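
YAML is indentation sensitive, so `catalogs` and `execution` both have to be top-level keys (starting in column 0) for the SQL Client to pick them up. A minimal sanity check, assuming a POSIX shell with `grep` and `cut` available; the helper name `top_level_keys` is hypothetical and not part of Flink:

```shell
# Hypothetical helper: print the top-level keys of a YAML file,
# i.e. keys that start in column 0 and are followed by a colon.
top_level_keys() {
  grep -E '^[A-Za-z_][A-Za-z0-9_-]*:' "$1" | cut -d: -f1
}

# Usage (assumes FLINK_HOME is set in your environment):
# top_level_keys "$FLINK_HOME/conf/sql-client-defaults.yaml"
```

If `execution` is missing from the output, the key is probably indented by accident and is being parsed as part of the preceding block.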

Configure /etc/profile:

export HADOOP_HOME=/usr/hdp/2.4.0.0-169/hadoop
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CLASSPATH=`hadoop classpath` # very important: without this, flink commands fail with an error
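
Because a missing `HADOOP_CLASSPATH` only shows up later as an error from the flink commands, it can help to verify the three variables right after reloading the profile (for example with `source /etc/profile`). The following is a small sketch in POSIX shell; the function name `check_flink_env` is an assumption made for illustration:

```shell
# Hypothetical helper: report which required variables are unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_flink_env() {
  missing=0
  for name in HADOOP_HOME YARN_CONF_DIR HADOOP_CLASSPATH; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_flink_env && echo "environment looks complete"
```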

Start yarn-session.sh from the Flink installation directory:

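./bin/yarn-session.sh -n 5 -tm 4096 -s 4 -nm APP_NAME -qu QUEUE_NAME -d

(The -d flag keeps the session running on YARN after the client exits, instead of it being killed immediately. The queue is selected with -qu; -q only displays the available YARN resources.)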
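
To avoid retyping the flags, the command can be assembled by a small wrapper. This is only a sketch: the function name `yarn_session_cmd`, the example names `flink-sql` and `default`, and the default values are assumptions, and the function prints the command rather than running it. It uses the flags `-tm`, `-s`, `-nm`, `-qu`, and `-d`; note that `-qu` selects the queue, while `-q` only queries resources.

```shell
# Hypothetical wrapper: print the yarn-session.sh command (dry run).
yarn_session_cmd() {
  app_name=$1          # application name shown in YARN (-nm)
  queue=$2             # YARN queue to submit to (-qu)
  tm_mem=${3:-4096}    # TaskManager memory in MB (-tm)
  slots=${4:-4}        # slots per TaskManager (-s)
  echo "./bin/yarn-session.sh -tm $tm_mem -s $slots -nm $app_name -qu $queue -d"
}

# Usage (example names only):
# yarn_session_cmd flink-sql default
```

Once the printed command looks right, run it from the Flink installation directory.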

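yarn-session parameters
  -n: number of TaskManagers;
  -d: run in detached mode;
  -id: the YARN application ID (used to attach to a running session);
  -j: path to the Flink jar file;
  -jm: memory for the JobManager container (default unit: MB);
  -nl: YARN node label for the YARN application;
  -nm: custom name for the application on YARN;
  -q: display available YARN resources (memory,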