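Compiling and Installing Impala

Preface

This post covers building Impala-cdh5-2.12.0_5.16.1 from source and installing it.

Background

Our company needs to migrate from HDFS to CHDFS on Tencent Cloud. CHDFS implements the HDFS protocol and is billed by the amount of data actually stored, which saves a fair amount of money. During testing, however, we found that Impala is not compatible with CHDFS: it reports that the ofs scheme is not supported (see the log line below), so the Impala source itself has to be changed. After applying the partial source changes provided by colleagues at Tencent Cloud, we had to build and package Impala ourselves. The notes below record the pitfalls hit during the build.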

E0312 14:35:09.345242 358875 impala-server.cc:285] Currently configured default filesystem: CHDFSHadoopFileSystemAdapter. fs.defaultFS (ofs://f4mjxuquwlp-p2WP.chdfs.ap-shanghai.myqcloud.com/) is not supported

Downloading the source

https://codeload.github.com/cloudera/Impala/zip/cdh5-2.12.0_5.16.1

Installing dependencies with yum

yum -y install git ant libevent-devel automake libtool flex bison gcc-c++ openssl-devel make cmake
yum -y install doxygen.x86_64 glib-devel python-devel bzip2-devel svn krb5-workstation
yum -y install openldap-devel db4-devel python-setuptools python-pip cyrus-sasl* postgresql postgresql-server ant-nodeps lzo-devel lzop

Build

1. Set the IMPALA_HOME environment variable.
2. Set JAVA_HOME in /etc/default/bigtop-utils.
3. Run the build command: ./buildall.sh -notests -so
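
Before kicking off a long build, it can help to check the first two steps. The following is only a minimal sketch: check_build_env is a hypothetical helper of mine, not something shipped with Impala, and it assumes that bigtop-utils sets JAVA_HOME with a plain assignment line.

```shell
# check_build_env IMPALA_HOME_DIR BIGTOP_FILE
# Hypothetical pre-flight helper (not part of Impala).
# Returns 0 when buildall.sh exists under the source tree and the
# bigtop-utils file contains a JAVA_HOME assignment; otherwise prints
# what is missing and returns 1.
check_build_env() {
  home_dir=$1
  bigtop_file=$2
  status=0
  if [ ! -f "$home_dir/buildall.sh" ]; then
    echo "missing: $home_dir/buildall.sh (is IMPALA_HOME correct?)"
    status=1
  fi
  if ! grep -Eq '^[[:space:]]*(export[[:space:]]+)?JAVA_HOME=' "$bigtop_file" 2>/dev/null; then
    echo "missing: JAVA_HOME assignment in $bigtop_file"
    status=1
  fi
  return $status
}

# Usage (commented out so the snippet itself has no side effects):
# check_build_env "$IMPALA_HOME" /etc/default/bigtop-utils \
#   && (cd "$IMPALA_HOME" && ./buildall.sh -notests -so)
```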

Problems encountered during the build and how to fix them

1. Python packages download too slowly

Downloading Python dependencies
~/Impala-cdh5-2.12.0_5.16.1/infra/python/deps ~/wangkai/Impala-cdh5-2.12.0_5.16.1
Getting package info from https://pypi.python.org/simple/allpairs/
File with matching digest already exists, skipping AllPairs-2.0.1.tar.gz
Getting package info from https://pypi.python.org/simple/boto3/
File with matching digest already exists, skipping boto3-1.2.3.tar.gz
Getting package info from https://pypi.python.org/simple/simplejson/
File with matching digest already exists, skipping simplejson-3.3.0.tar.gz
Getting package info from https://pypi.python.org/simple/botocore/
File with matching digest already exists, skipping botocore-1.3.30.tar.gz
Getting package info from https://pypi.python.org/simple/python_dateutil/
File with matching digest already exists, skipping python-dateutil-2.5.2.tar.gz
Getting package info from https://pypi.python.org/simple/six/

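The log shows that the build downloads Python packages and skips any file that already exists with a matching digest. Downloads were slow on our server, so some packages can be downloaded manually and dropped into this directory:
$IMPALA_HOME/infra/python/deps/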
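
To see which archives still need to be fetched by hand, a small check like the one below may be enough. It is a sketch only: missing_deps is a hypothetical helper, and the file names in the usage line are simply the ones that appear in the log above.

```shell
# missing_deps DEPS_DIR FILE...
# Hypothetical helper: prints every FILE that is not yet present in
# DEPS_DIR, one per line, so you know what to download manually.
missing_deps() {
  deps_dir=$1
  shift
  for f in "$@"; do
    [ -f "$deps_dir/$f" ] || echo "$f"
  done
}

# Usage (file names taken from the log above):
# missing_deps "$IMPALA_HOME/infra/python/deps" \
#   AllPairs-2.0.1.tar.gz boto3-1.2.3.tar.gz simplejson-3.3.0.tar.gz
```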

The download logic lives in ./buildall.sh:

bootstrap_dependencies() {
  # Populate necessary thirdparty components unless it's set to be skipped.
  if [[ "${SKIP_TOOLCHAIN_BOOTSTRAP}" = true ]]; then
    echo "SKIP_TOOLCHAIN_BOOTSTRAP is true, skipping download of Python dependencies."
    echo "SKIP_TOOLCHAIN_BOOTSTRAP is true, skipping toolchain bootstrap."
  else
    echo "Downloading Python dependencies"
    # Download all the Python dependencies we need before doing anything
    # of substance. Does not re-download anything that is already present.
    # Download the Python packages
    if ! "$IMPALA_HOME/infra/python/deps/download_requirements"; then
      echo "Warning: Unable to download Python requirements."
      echo "Warning: bootstrap_virtualenv or other Python-based tooling may fail."
    else
      echo "Finished downloading Python dependencies"
    fi

    echo "Downloading and extracting toolchain dependencies."
    "$IMPALA_HOME/bin/bootstrap_toolchain.py"
    echo "Toolchain bootstrap complete."
  fi
}

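If the downloads are too slow, download the packages manually and drop them into the directory above. For later builds you can also comment out the download code shown above to skip the verification step, which is slow as well.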
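
Instead of commenting code out, the check at the top of bootstrap_dependencies() suggests another option: export SKIP_TOOLCHAIN_BOOTSTRAP before building. This is a sketch under two assumptions: that nothing else in the build resets the variable, and that everything is already in place, since the same switch also skips the toolchain bootstrap.

```shell
# bootstrap_dependencies() in buildall.sh (quoted above) skips both the
# Python download and the toolchain bootstrap when this variable is "true".
# Assumption: nothing else in the build resets it.
export SKIP_TOOLCHAIN_BOOTSTRAP=true

# Then build as before (left commented out so the snippet has no side effects):
# cd "$IMPALA_HOME" && ./buildall.sh -notests -so
```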

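2. C++ packages download too slowly
Downloading and extracting toolchain dependencies.
After the Python packages, the Impala build goes on to download the C++ (toolchain) packages. These downloads are just as slow, so you can download them yourself and drop them in.
The directory is: $IMPALA_HOME/toolchain/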
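3. Jar downloads fail
The build failed for me while compiling the fe project. The main cause was that jars referenced in the pom files could not be downloaded: some artifacts no longer exist in the configured repositories, probably because this version is fairly old.
The error looks like the following. Quite a few artifacts could not be found this way, and others were just slow to download. The slow ones can be downloaded by hand and placed into the local repository.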

[WARNING] The POM for net.sourceforge.czt.dev:cup-maven-plugin:jar:1.6-cdh is missing, no dependency information available
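
For the "download it yourself" route, it helps to know where Maven expects an artifact in the local repository. The sketch below derives that directory from the coordinates in the warning above. local_repo_path and the M2_REPO override are hypothetical names of mine; the default location ~/.m2/repository is Maven's standard one.

```shell
# local_repo_path GROUP_ID ARTIFACT_ID VERSION
# Hypothetical helper: prints the directory where Maven looks for the
# artifact in the local repository. M2_REPO is an illustrative override;
# by default Maven uses ~/.m2/repository.
local_repo_path() {
  repo=${M2_REPO:-$HOME/.m2/repository}
  group_path=$(printf '%s' "$1" | tr '.' '/')
  printf '%s/%s/%s/%s\n' "$repo" "$group_path" "$2" "$3"
}

# Usage: check whether the plugin from the warning is already present.
# ls "$(local_repo_path net.sourceforge.czt.dev cup-maven-plugin 1.6-cdh)"
#
# A jar downloaded by hand can be registered with Maven's install plugin
# (the jar file name here is an assumption):
# mvn install:install-file -Dfile=cup-maven-plugin-1.6-cdh.jar \
#   -DgroupId=net.sourceforge.czt.dev -DartifactId=cup-maven-plugin \
#   -Dversion=1.6-cdh -Dpackaging=jar
```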

After repeated attempts, I found that the pom.xml of the impala-parent project has to be modified by manually adding some repository URLs:

<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.impala</groupId>
  <artifactId>impala-parent</artifactId>
  <version>0.1-SNAPSHOT</version>
  <packaging>pom</packaging>
  <name>Apache Impala Parent POM</name>

  <properties>
    <surefire.reports.dir>${env.IMPALA_LOGS_DIR}/fe_tests</surefire.reports.dir>
    <jacoco.skip>true</jacoco.skip>
    <jacoco.data.file>${env.IMPALA_FE_TEST_COVERAGE_DIR}/jacoco.exec</jacoco.data.file>
    <jacoco.report.dir>${env.IMPALA_FE_TEST_COVERAGE_DIR}</jacoco.report.dir>
    <test.hive.testdata>${project.basedir}/../testdata/target/AllTypes.txt</test.hive.testdata>
    <backend.library.path>${env.IMPALA_HOME}/be/build/debug/service:${env.IMPALA_HOME}/be/build/release/service</backend.library.path>
    <beeswax_port>21000</beeswax_port>
    <impalad>localhost</impalad>
    <testExecutionMode>reduced</testExecutionMode>
    <hadoop.version>${env.IMPALA_HADOOP_VERSION}</hadoop.version>
    <hive.version>${env.IMPALA_HIVE_VERSION}</hive.version>
    <hive.major.version>${env.IMPALA_HIVE_MAJOR_VERSION}</hive.major.version>
    <sentry.version>${env.IMPALA_SENTRY_VERSION}</sentry.version>
    <hbase.version>${env.IMPALA_HBASE_VERSION}</hbase.version>
    <parquet.version>${env.IMPALA_PARQUET_VERSION}</parquet.version>
    <kite.version>${env.IMPALA_KITE_VERSION}</kite.version>
    <thrift.version>${env.IMPALA_THRIFT_JAVA_VERSION}</thrift.version>
    <impala.extdatasrc.api.version>1.0-SNAPSHOT</impala.extdatasrc.api.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <kudu.version>${env.KUDU_JAVA_VERSION}</kudu.version>
    <commons-io.version>2.6</commons-io.version>
    <slf4j.version>1.7.25</slf4j.version>
    <junit.version>4.12</junit.version>
    <!-- Beware compatibility requirements with Thrift and
         KMS; see IMPALA-4210. -->
    <httpcomponents.core.version>4.2.5</httpcomponents.core.version>
    <yarn-extras.version>${project.version}</yarn-extras.version>
    <eclipse.output.directory>eclipse-classes</eclipse.output.directory>
    <guava.version>11.0.2</guava.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <repositories>
    <repository>
      <id>apache.snapshots</id>
      <name>Apache Development Snapshot Repository</name>
      <url>https://repository.apache.org/content/repositories/snapshots/</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>

    <repository>
      <id>cdh.rcs.releases.repo</id>
      <url>https://repository.cloudera.com/content/groups/cdh-releases-rcs</url>
      <name>CDH Releases Repository</name>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <repository>
      <id>alimavenspring</id>
      <url>https://maven.aliyun.com/repository/spring/</url>
    </repository>
    <repository>
      <id>tengxun</id>
      <url>https://search.maven.org/artifact/</url>
    </repository>
    <repository>
      <id>alimaven</id>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </repository>
    <repository>
      <id>clouderanew</id>
      <url>https://repository.cloudera.com/artifactory/repo/</url>
    </repository>
    <repository>
      <id>cdh.releases.repo</id>
      <url>https://repository.cloudera.com/content/repositories/releases</url>
      <name>CDH Releases Repository</name>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>cdh.snapshots.repo</id>
      <url>https://repository.cloudera.com/content/repositories/snapshots</url>
      <name>CDH Snapshots Repository</name>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>cloudera.thirdparty.repo</id>
      <url>https://repository.cloudera.com/content/repositories/third-party</url>
      <name>Cloudera Third Party Repository</name>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>

    <!-- This is needed for java-cup. TODO add the plugin to our maven repo -->
    <repository>
      <id>sonatype-nexus-snapshots</id>
      <name>Sonatype Nexus Snapshots</name>
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>clouderanew</id>
      <url>https://repository.cloudera.com/artifactory/repo/</url>
    </pluginRepository>
    <pluginRepository>
      <id>alimaven</id>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </pluginRepository>
    <pluginRepository>
      <id>alimavenspring</id>
      <url>https://maven.aliyun.com/repository/spring/</url>
    </pluginRepository>
    <pluginRepository>
      <id>tengxun</id>
      <url>https://search.maven.org/artifact/</url>
    </pluginRepository>
    <pluginRepository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </pluginRepository>
    <pluginRepository>
      <id>cloudera.thirdparty.repo</id>
      <url>https://repository.cloudera.com/content/repositories/third-party</url>
      <name>Cloudera Third Party Repository</name>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </pluginRepository>
    <pluginRepository>
      <id>cloudera.snapshot.repo</id>
      <url>https://repository.cloudera.com/content/repositories/snapshots</url>
      <name>Cloudera Snapshot Repository</name>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </pluginRepository>

    <!-- This is needed for the cup maven plugin. TODO add the plugin to our maven repo -->
    <pluginRepository>
      <id>sonatype-nexus-snapshots</id>
      <name>Sonatype Nexus Snapshots</name>
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </pluginRepository>
  </pluginRepositories>

</project>

END: after countless hurdles, the build finally succeeded.

Deployment

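I. Modify four scripts
1. bin/start-catalogd.sh: the catalogd startup script, which connects to the metastore.
Add:
source $IMPALA_HOME/bin/impala-config.sh
CATALOGD_ARGS=" -log_dir=/var/log/impala "
2. bin/start-statestored.sh: starts statestored, which tracks the health and location of the impalad instances in the cluster.
Add:
source $IMPALA_HOME/bin/impala-config.sh
STATESTORED_ARGS="-log_dir=/var/log/impala -state_store_port=24000"
3. bin/set-classpath.sh (the changes were shown in a screenshot that is not available here)
4. bin/start-impalad.sh (the changes were shown in a screenshot that is not available here)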
II. Startup
nohup ./start-statestored.sh &
nohup ./start-catalogd.sh &
nohup ./start-impalad.sh &   # compute nodes only need to start this one
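
If you would rather keep each daemon's console output than let nohup pick a file, the same startup sequence can be wrapped in a small function. This is a sketch, not part of Impala: start_daemon is a hypothetical helper, and /var/log/impala in the usage lines is simply the log_dir configured above.

```shell
# start_daemon BIN_DIR SCRIPT_NAME LOG_DIR
# Hypothetical helper: starts BIN_DIR/SCRIPT_NAME in the background with
# nohup, sends its console output to LOG_DIR/SCRIPT_NAME.out, and prints
# the PID of the background process. LOG_DIR must already exist.
start_daemon() {
  nohup "$1/$2" > "$3/$2.out" 2>&1 &
  echo $!
}

# Usage, in the same order as above (commented out to avoid side effects):
# start_daemon "$IMPALA_HOME/bin" start-statestored.sh /var/log/impala
# start_daemon "$IMPALA_HOME/bin" start-catalogd.sh    /var/log/impala
# start_daemon "$IMPALA_HOME/bin" start-impalad.sh     /var/log/impala
```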
III. Connection test
(The connection test was shown in a screenshot that is not available here.)

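Closing notes

1. Building Impala takes some patience; many packages download very slowly.
2. My unfamiliarity with Maven configuration cost me a lot of detours. I plan to study Maven configuration management properly.
One more note: the mvn download logs can be found under $IMPALA_HOME/logs, where you can see the actual download progress.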
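
A quick way to watch that progress is to grep the logs for download lines. The sketch below assumes that Maven's usual "Downloading" / "Downloaded" lines appear somewhere under that directory. I did not record the exact file names, so it searches recursively, and recent_downloads is a hypothetical helper.

```shell
# recent_downloads LOG_DIR [COUNT]
# Hypothetical helper: prints the last COUNT (default 20) log lines under
# LOG_DIR that mention a download ("Downloading ..." or "Downloaded ...").
recent_downloads() {
  grep -rh 'Download' "$1" | tail -n "${2:-20}"
}

# Usage:
# recent_downloads "$IMPALA_HOME/logs" 50
```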

References

https://github.com/TencentEMapReduce/impala/commit/14cf694293a60174fd3c064f76ee7708d98fc2c7
https://blog.csdn.net/qqqq0199181/article/details/98515118
