Component versions:
Hive 1.1.0
CDH 5.15.0
Atlas 1.2.0
CentOS 8
Maven 3.6.3
Installing the Java environment
yum install java-1.8.0-openjdk.x86_64

Tip: to migrate from CentOS 8 to CentOS Stream 8:
dnf --disablerepo '*' --enablerepo=extras swap centos-linux-repos centos-stream-repos
dnf distro-sync
Building and installing Atlas
Download the source package
wget https://archive.apache.org/dist/atlas/1.2.0/apache-atlas-1.2.0-sources.tar.gz
Extract the archive and build
tar -xzf apache-atlas-1.2.0-sources.tar.gz
cd apache-atlas-sources-1.2.0
sudo mvn -Phadoop-2 -Pdist,embedded-hbase-solr -DskipTests -Dmaven.javadoc.skip=true clean package

Path of the built Atlas package
/apache-atlas-sources-1.2.0/distro/target/apache-atlas-1.2.0-server/apache-atlas-1.2.0
Move the package to the installation directory:
mv apache-atlas-1.2.0 /home/software/apache-atlas-1.2.0-server
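Solr configuration
Edit the Solr configuration file
vi /home/software/apache-atlas-1.2.0-server/solr/bin/solr.in.sh
Set the ZooKeeper host:
ZK_HOST="xxxx:2181/solr"
Set the time zone:
SOLR_TIMEZONE="Asia/Shanghai"
Adding the Solr node in ZooKeeper
Create the solr znode from the ZooKeeper client shell: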
create /solr "solr"

Start Solr
bin/solr start


Integrating Atlas with Solr
Edit the Atlas configuration file
vi atlas-application.properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=xxxxxx:2181/solr
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true
Copy the solr directory from the Atlas conf directory into the Solr installation directory:
cp -r /home/software/apache-atlas-1.2.0-server/conf/solr ./solr_conf
Then run the following from SOLR_HOME:
bin/solr create -c vertex_index -d solr_conf -shards 1 -replicationFactor 1
bin/solr create -c edge_index -d solr_conf -shards 1 -replicationFactor 1
bin/solr create -c fulltext_index -d solr_conf -shards 1 -replicationFactor 1
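
The three create commands differ only in the collection name, so they can be driven from one loop. The following is a minimal sketch, assuming it is run from SOLR_HOME; `SOLR_BIN`, `DRY_RUN`, and the function name `create_atlas_collections` are illustrative names introduced here, not part of Solr or Atlas.

```shell
# Sketch: create the three Atlas index collections in one loop.
# SOLR_BIN and DRY_RUN are hypothetical variables used only by this script.
SOLR_BIN="${SOLR_BIN:-bin/solr}"

create_atlas_collections() {
    for collection in vertex_index edge_index fulltext_index; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of running it.
            echo "$SOLR_BIN create -c $collection -d solr_conf -shards 1 -replicationFactor 1"
        else
            # Stop at the first failure so a partial setup is noticed.
            "$SOLR_BIN" create -c "$collection" -d solr_conf -shards 1 -replicationFactor 1 || return 1
        fi
    done
}
```

Running `DRY_RUN=1 create_atlas_collections` first prints the three commands for review; calling it without `DRY_RUN` executes them.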

Integrating Atlas with Kafka
Edit the Atlas configuration file
vi atlas-application.properties
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
# ZooKeeper address
atlas.kafka.zookeeper.connect=xxxxxxx:2181
# Kafka broker address
atlas.kafka.bootstrap.servers=xxxxxxx:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=true
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000
atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab
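
Editing this many keys by hand in vi is easy to get wrong, so the changes can also be scripted. The sketch below is one possible approach, not part of Atlas: `set_prop` is a hypothetical helper that replaces a key's line if it exists and appends it otherwise. It assumes lines are written as `key=value` with no spaces around `=`, as in the listing above.

```shell
# set_prop FILE KEY VALUE: replace KEY's line if present, otherwise append it.
# Hypothetical helper; assumes "key=value" lines without spaces around "=".
set_prop() {
    file=$1 key=$2 value=$3
    [ -f "$file" ] || return 1
    tmp="$file.tmp.$$"
    # awk compares the text before the first "=" as a plain string,
    # so the dots in KEY are not treated as regex wildcards.
    # Note: awk -v interprets backslash escapes in VALUE.
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "=" }
        $1 == k { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, `set_prop atlas-application.properties atlas.notification.embedded false` updates the existing line in place, while a key that is missing from the file is added at the end. Commented-out lines are left untouched.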
Create the topics
bin/atlas_kafka_setup.py

Integrating Atlas with HBase
Edit the Atlas configuration file
vi atlas-application.properties
# ZooKeeper address used by HBase
atlas.audit.hbase.zookeeper.quorum=xxxxxx:2181
# ZooKeeper address used by HBase
atlas.graph.storage.hostname=xxxxxxx:2181
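
Properties files only recognize comments on lines that start with `#` or `!`; a `#` after a value is read as part of that value, so a trailing note on a ZooKeeper address ends up inside the address. A small check like the following can flag such lines before Atlas is started. This is a sketch; `find_inline_comments` is a hypothetical helper name.

```shell
# find_inline_comments FILE: print "line-number:line" for each property line
# whose value contains whitespace followed by "#".
# Lines that are themselves comments (starting with # or !) are skipped.
find_inline_comments() {
    grep -nE '^[[:space:]]*[^#![:space:]][^=]*=.*[[:space:]]#' "$1"
}
```

Running `find_inline_comments atlas-application.properties` prints nothing and returns a non-zero status when the file is clean.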
Integrating Atlas with Hive
Note
Because Hive in this environment is version 1.1.0, column-level lineage is not shown by default. Hive needs two patches:
14706
13112
Rebuild Hive, then replace the original JARs with the rebuilt hive-exec-1.1.0-cdh5.15.0-core.jar and hive-exec-1.1.0-cdh5.15.0.jar.
Later Hive versions can skip this step.
Edit the Atlas configuration file
vi atlas-application.properties
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
Place the Atlas hive-hook dependency JARs under Hive's auxlib directory.
Path of the hive-hook JARs:
distro/target/apache-atlas-1.2.0-hive-hook/apache-atlas-hive-hook-1.2.0/hook/hive
Note: add atlas-application.properties to atlas-plugin-classloader-1.2.0.jar:
zip -u atlas-plugin-classloader-1.2.0.jar atlas-application.properties
In addition, the following extra dependencies are required:
<dependency>
<groupId>com.fasterxml.jackson.jaxrs</groupId>
<artifactId>jackson-jaxrs-json-provider</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-jaxb-annotations</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.jaxrs</groupId>
<artifactId>jackson-jaxrs-base</artifactId>
<version>${jackson.version}</version>
</dependency>
Download the corresponding dependency JARs and place them under auxlib.

Configure hive-site.xml
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
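
Before restarting Hive, it can help to confirm that the hook entry actually landed in the file. The sketch below assumes the `<name>` and `<value>` elements sit on adjacent lines, as in the snippet above; `hive_hook_configured` is a hypothetical helper name, and `grep -A` is a GNU/BSD extension rather than strict POSIX.

```shell
# hive_hook_configured FILE: succeed if the Atlas HiveHook is set as a post hook.
# Assumes <value> is on the line directly after <name>.
hive_hook_configured() {
    grep -A1 '<name>hive.exec.post.hooks</name>' "$1" \
        | grep -q 'org.apache.atlas.hive.hook.HiveHook'
}
```

A call such as `hive_hook_configured hive-site.xml && echo "hook present"` gives a quick yes or no; the path to hive-site.xml depends on the installation.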
Modify the CDH configuration:



Restart Hive so the configuration takes effect, then check the settings from the Hive CLI:
set hive.exec.post.hooks;
set hive.aux.jars.path;

Start Atlas
bin/atlas_start.py
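
Atlas can take a while to become reachable after the start script returns, so a simple retry loop is useful before opening the UI. The sketch below is a generic helper, not an Atlas tool: `wait_for` is a hypothetical name, and the usage line assumes a default local install (port 21000, admin/admin credentials, the `/api/atlas/admin/version` endpoint); adjust these to the actual deployment.

```shell
# wait_for ATTEMPTS DELAY COMMAND...: rerun COMMAND until it succeeds
# or ATTEMPTS runs have failed. Hypothetical helper, not part of Atlas.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard the command's output; only its exit status matters here.
        "$@" >/dev/null 2>&1 && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Assumed usage against a default local install:
# wait_for 30 10 curl -sf -u admin:admin http://localhost:21000/api/atlas/admin/version
```

The helper returns 0 as soon as the command succeeds and 1 if every attempt fails, so it can gate later steps in a setup script.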


Table lineage

Column lineage





