Background
In Metadata Management Tool Atlas Study Notes: Introduction, I introduced Atlas as a metadata management tool. Now it is time to compile it and integrate it with some common big-data components.
Environment
Operating system: CentOS 7, 64-bit virtual machine, managed with VMware Workstation;
Software and versions:
Software     Version
Maven        3.6.0 (must be 3.5.0 or later, otherwise Atlas will not compile)
Hadoop       2.7.2
ZooKeeper    3.4.9
Kafka        kafka_2.11-0.11.0.0 (Scala 2.11, Kafka 0.11.0.0)
HBase        1.3.1
Hive         2.3.7
MySQL        5.7.26
Solr         7.7.3
Atlas        1.2.0
Installing Atlas
The installation and configuration of the components from Maven through MySQL are skipped here (look them up yourself); this post only records the Solr and Atlas setup.
Solr
Binary download: https://www.apache.org/dyn/closer.lua/lucene/solr/7.7.3/solr-7.7.3.tgz?action=download
Extract it:
[root@scentos szc]# tar -zxvf solr-7.7.3.tgz
Solr must not be run as root, so, much like Elasticsearch, create a dedicated user for it, set a password (solr), and change the owner of the Solr root directory to that user:
[root@scentos szc]# useradd solr
[root@scentos szc]# passwd solr
Changing password for user solr.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[root@scentos szc]# chown -R solr:solr solr-7.7.3/
Switch to the solr user, enter the Solr root directory, and edit the configuration file bin/solr.in.sh:
[root@scentos szc]# su solr
[solr@scentos szc]$ cd solr-7.7.3/
[solr@scentos solr-7.7.3]$ vim bin/solr.in.sh
Uncomment the ZK_HOST line and set it to your own ZooKeeper ip:port:
ZK_HOST="scentos:2181"
Atlas
Download: https://dlcdn.apache.org/atlas/1.2.0/apache-atlas-1.2.0-sources.tar.gz
Extract and build:
[root@scentos szc]# tar -zxvf apache-atlas-1.2.0-sources.tar.gz
[root@scentos szc]# mv apache-atlas-sources-1.2.0/ atlas-1.2.0/
[root@scentos szc]# cd atlas-1.2.0/
[root@scentos atlas-1.2.0]# export MAVEN_OPTS="-Xms2g -Xmx2g"
[root@scentos atlas-1.2.0]# mvn clean -DskipTests install && mvn clean -DskipTests package -Pdist
After the build finishes, enter the distro/target/apache-atlas-1.2.0-server/apache-atlas-1.2.0 directory; this is our atlas-server root directory.
Starting Atlas
Start Hadoop, ZooKeeper, HBase, Kafka, Hive, and MySQL
Here are the startup scripts:
Start Hadoop
start-all.sh
mr-jobhistory-daemon.sh start historyserver
httpfs.sh start
Start ZooKeeper
/home/szc/zookeeper/zookeeper-3.4.9/bin/zkServer.sh start
Start HBase
/home/szc/hbase-1.3.1/bin/start-hbase.sh
Start Kafka
/home/szc/kafka_2.11-0.11.0.0/bin/kafka-server-start.sh -daemon /home/szc/kafka_2.11-0.11.0.0/config/server.properties
Start MySQL
MySQL runs on Windows and is started by opening Navicat; first run the following SQL to set the time zone:
set global time_zone='+8:00';
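To confirm the change took effect, you can query the variable back in the same client; this is only a sanity check, and MySQL may display the offset in normalized form:

```sql
-- should show the +8 offset set above
select @@global.time_zone;
```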
Start Hive
/home/szc/apache-hive-2.3.7/bin/hive --service metastore &
Start Solr
Start Solr as the solr user:
[solr@scentos solr-7.7.3]$ bin/solr start
Visit IP:8983 from Windows to see the Solr web UI.
Configuring Atlas
Enter the atlas-server root directory:
[root@scentos atlas-1.2.0]# cd distro/target/apache-atlas-1.2.0-server/apache-atlas-1.2.0/
[root@scentos apache-atlas-1.2.0]#
Integrating HBase
In conf/atlas-application.properties, change atlas.graph.storage.backend from hbase2 to hbase, and set atlas.graph.storage.hostname to the ZooKeeper URL:
atlas.graph.storage.backend=hbase
....
atlas.graph.storage.hostname=scentos:2181
In conf/atlas-env.sh, add the HBase configuration directory:
export HBASE_CONF_DIR=/home/szc/hbase-1.3.1/conf
Integrating Solr
In conf/atlas-application.properties, set atlas.graph.index.search.solr.zookeeper-url to the ZooKeeper URL:
atlas.graph.index.search.solr.zookeeper-url=scentos:2181
In the Solr root directory, run the following commands as the solr user to create the Solr collections:
[solr@scentos solr-7.7.3]$ bin/solr create -c vertex_index -d /home/szc/atlas-1.2.0/distro/target/apache-atlas-1.2.0-server/apache-atlas-1.2.0/conf/solr/ -shards 3 -replicationFactor 2
Created collection 'vertex_index' with 3 shard(s), 2 replica(s) with config-set 'vertex_index'
[solr@scentos solr-7.7.3]$ bin/solr create -c edge_index -d /home/szc/atlas-1.2.0/distro/target/apache-atlas-1.2.0-server/apache-atlas-1.2.0/conf/solr/ -shards 3 -replicationFactor 2
Created collection 'edge_index' with 3 shard(s), 2 replica(s) with config-set 'edge_index'
[solr@scentos solr-7.7.3]$ bin/solr create -c fulltext_index -d /home/szc/atlas-1.2.0/distro/target/apache-atlas-1.2.0-server/apache-atlas-1.2.0/conf/solr/ -shards 3 -replicationFactor 2
Created collection 'fulltext_index' with 3 shard(s), 2 replica(s) with config-set 'fulltext_index'
[solr@scentos solr-7.7.3]$
The three commands above are identical except for the collection names vertex_index, edge_index, and fulltext_index; change the -d argument to the conf/solr path under your own Atlas root directory.
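Since the three commands differ only in the collection name, they can also be generated with a loop. This sketch prints the commands rather than running them; SOLR_CONF is the conf/solr path used above, so adjust it to your own layout:

```shell
# Print the three "bin/solr create" commands instead of executing them;
# pipe the output to "bash" (as the solr user) to actually run them.
SOLR_CONF=/home/szc/atlas-1.2.0/distro/target/apache-atlas-1.2.0-server/apache-atlas-1.2.0/conf/solr/
for c in vertex_index edge_index fulltext_index; do
    echo bin/solr create -c "$c" -d "$SOLR_CONF" -shards 3 -replicationFactor 2
done
```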
Integrating Kafka
In conf/atlas-application.properties, modify the following properties:
######### Notification Configs #########
atlas.notification.embedded=false # do not use Atlas's embedded Kafka
atlas.kafka.data=/home/szc/kafka_2.11-0.11.0.0/data # Kafka data directory
atlas.kafka.zookeeper.connect=scentos:2181 # ZooKeeper URL
atlas.kafka.bootstrap.servers=scentos:9092 # Kafka broker URL
The Kafka data directory and broker URL come from Kafka's configuration file kafkaHome/config/server.properties, where they correspond to the log.dirs and listeners properties.
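For reference, the matching lines in kafkaHome/config/server.properties for this setup would look like the following; the values are this machine's, so adjust them to your own:

```properties
log.dirs=/home/szc/kafka_2.11-0.11.0.0/data
listeners=PLAINTEXT://scentos:9092
```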
Server configuration
In conf/atlas-application.properties, modify the following properties:
## Server port configuration
atlas.server.http.port=21000 # uncomment this line
....
atlas.rest.address=http://scentos:21000 # replace scentos with your CentOS IP or hostname
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false # uncomment this line
######### Entity Audit Configs #########
....
atlas.audit.hbase.zookeeper.quorum=scentos:2181 # the ZooKeeper URL; replace scentos with your CentOS IP or hostname
In conf/atlas-log4j.xml, uncomment the following block to record performance metrics:
<appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
<param name="file" value="${atlas.log.dir}/atlas_perf.log" />
<param name="datePattern" value="'.'yyyy-MM-dd" />
<param name="append" value="true" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d|%t|%m%n" />
</layout>
</appender>
<logger name="org.apache.atlas.perf" additivity="false">
<level value="debug" />
<appender-ref ref="perf_appender" />
</logger>
Integrating Hive
In conf/atlas-application.properties, add the following configuration:
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
In conf/hive-site.xml under the Hive root directory, add the following property:
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
And append the following lines to conf/hive-env.sh:
export ATLAS_HIVE_HOOK_JARS=""
export ATLAS_HIVE_HOOK_HOME=/home/szc/apache-atlas-sources-1.2.0/distro/target/apache-atlas-1.2.0-hive-hook/apache-atlas-hive-hook-1.2.0/hook/hive
for jar in `ls $ATLAS_HIVE_HOOK_HOME | grep jar`; do
export ATLAS_HIVE_HOOK_JARS=$ATLAS_HIVE_HOOK_JARS:$ATLAS_HIVE_HOOK_HOME/$jar
done
for jar in `ls $ATLAS_HIVE_HOOK_HOME/atlas-hive-plugin-impl`; do
export ATLAS_HIVE_HOOK_JARS=$ATLAS_HIVE_HOOK_JARS:$ATLAS_HIVE_HOOK_HOME/atlas-hive-plugin-impl/$jar
done
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH$ATLAS_HIVE_HOOK_JARS
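The two loops above simply turn every jar under the hook directory into a colon-separated classpath appended to HIVE_AUX_JARS_PATH. Here is a self-contained sketch of the same idea, using a throwaway directory and made-up jar names instead of the real hook path:

```shell
# Build a colon-separated classpath from all .jar files in a directory.
# demo_dir and the jar names are stand-ins for $ATLAS_HIVE_HOOK_HOME.
demo_dir=$(mktemp -d)
touch "$demo_dir/a-plugin.jar" "$demo_dir/b-shim.jar"

HOOK_JARS=""
for jar in $(ls "$demo_dir" | grep jar); do
    HOOK_JARS=$HOOK_JARS:$demo_dir/$jar
done

echo "$HOOK_JARS"    # one colon-prefixed entry per jar
rm -rf "$demo_dir"
```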
Finally, back in the atlas-server root directory, copy Atlas's atlas-application.properties into HiveHome/conf:
[root@scentos apache-atlas-1.2.0]# cp conf/atlas-application.properties /home/szc/apache-hive-2.3.7/conf/
Download the following three files:
jackson-module-jaxb-annotations-2.9.9.jar: https://repo1.maven.org/maven2/com/fasterxml/jackson/module/jackson-module-jaxb-annotations/2.9.9/jackson-module-jaxb-annotations-2.9.9.jar
jackson-jaxrs-base-2.9.9.jar: https://repo1.maven.org/maven2/com/fasterxml/jackson/jaxrs/jackson-jaxrs-base/2.9.9/jackson-jaxrs-base-2.9.9.jar
jackson-jaxrs-json-provider-2.9.9.jar: https://repo1.maven.org/maven2/com/fasterxml/jackson/jaxrs/jackson-jaxrs-json-provider/2.9.9/jackson-jaxrs-json-provider-2.9.9.jar
Upload these three jars to /home/szc/atlas-1.2.0/distro/target/apache-atlas-1.2.0-hive-hook/apache-atlas-hive-hook-1.2.0/hook/hive/atlas-hive-plugin-impl/, then restart Hive by running the following script:
pid=`ps -ef | grep hive.metastore | grep -v grep | awk '{print $2}'`
kill ${pid}
/home/szc/apache-hive-2.3.7/bin/hive --service metastore &
Start the Atlas server
In the atlas-server root directory, run:
[root@scentos apache-atlas-1.2.0]# bin/atlas_start.py
Wait a minute or two; once the startup message appears, the server is up.
Then open CentOS_ip:21000 in a browser to reach the login page.
The default username and password are both admin; if the main page loads after logging in, Atlas is configured and running successfully.
If anything goes wrong, check logs/*.out or application.log under the atlas-server root directory for the error messages.
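Besides the web UI, the server can be checked from the shell; Atlas exposes an admin REST API, so a quick sanity check with the default admin/admin credentials looks like this (replace scentos with your own host):

```shell
# Query the Atlas version endpoint; a JSON response means the server is up.
curl -u admin:admin http://scentos:21000/api/atlas/admin/version
```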
Conclusion
In the next post, I will use Hive as an example to walk through Atlas's web UI.