Compilation
1. Environment
- JDK 1.8, download: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
- Maven 3.6.x, download: http://maven.apache.org/download.cgi
Maven needs no installation, which is nice: just download it and extract.
You do need to add environment variables in /etc/profile, though:
export MAVEN_HOME=/opt/maven3.6.3
export PATH=$PATH:$MAVEN_HOME/bin
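Reload the profile so the current shell picks up the new variables:
source /etc/profile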
Then edit $MAVEN_HOME/conf/settings.xml and add the Aliyun mirror inside the <mirrors> section:
<mirror>
    <id>nexus-aliyun</id>
    <mirrorOf>*,!cloudera</mirrorOf>
    <name>Nexus aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
Verify with the following command:
[root@hadoop3 conf]# mvn -v
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /opt/maven3.6.3
Java version: 1.8.0_144, vendor: Oracle Corporation, runtime: /usr/java/jdk1.8/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-1127.13.1.el7.x86_64", arch: "amd64", family: "unix"
- Python 2.7
2. Download Atlas
http://atlas.apache.org/#/Downloads
I downloaded version 2.1.0.
3. Build preparation
Extract the Atlas source under /opt and rename the directory to atlas2.1.
vim /opt/atlas2.1/pom.xml
Find the following version properties and change them to match your cluster:
<hadoop.version>3.0.0</hadoop.version>
<hbase.version>2.1.0</hbase.version>
<kafka.version>2.2.1</kafka.version>
<zookeeper.version>3.4.5</zookeeper.version>
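If you would rather script these edits, here is a minimal sed sketch (assumption: each property appears exactly once in the top-level pom.xml):
cd /opt/atlas2.1
sed -i 's#<hadoop.version>.*</hadoop.version>#<hadoop.version>3.0.0</hadoop.version>#' pom.xml
sed -i 's#<hbase.version>.*</hbase.version>#<hbase.version>2.1.0</hbase.version>#' pom.xml
sed -i 's#<kafka.version>.*</kafka.version>#<kafka.version>2.2.1</kafka.version>#' pom.xml
sed -i 's#<zookeeper.version>.*</zookeeper.version>#<zookeeper.version>3.4.5</zookeeper.version>#' pom.xml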
vim /opt/atlas2.1/distro/src/conf/atlas-log4j.xml
Find the following block and uncomment it (it ships commented out):
<appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="file" value="${atlas.log.dir}/atlas_perf.log" />
    <param name="datePattern" value="'.'yyyy-MM-dd" />
    <param name="append" value="true" />
    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d|%t|%m%n" />
    </layout>
</appender>

<logger name="org.apache.atlas.perf" additivity="false">
    <level value="debug" />
    <appender-ref ref="perf_appender" />
</logger>
4. Build
cd /opt/atlas2.1
mvn clean -DskipTests package -Denforcer.skip=true -Drat.skip=true -Pdist -X
I hit enforcer and rat plugin errors partway through, which is why the command above skips them (-Denforcer.skip=true -Drat.skip=true).
Could not resolve dependencies for project org.apache.atlas:sqoop-bridge-shim:jar:2.1.0: Could not find artifact org.apache.sqoop:sqoop:jar:1.4.6.2.3.99.0-195 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public) -> [Help 1]
This error means the Sqoop dependency is missing; download it from:
https://mvnrepository.com/artifact/org.apache.sqoop/sqoop/1.4.6.2.3.99.0-195
After downloading, remember: upload only the pom and the jar to /root/.m2/repository/org/apache/sqoop/sqoop/1.4.6.2.3.99.0-195 on the build host. Do not upload any other files, and if the directory already contains anything, empty it first.
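A sketch of the layout (file names follow the standard Maven artifactId-version convention):
mkdir -p /root/.m2/repository/org/apache/sqoop/sqoop/1.4.6.2.3.99.0-195
cd /root/.m2/repository/org/apache/sqoop/sqoop/1.4.6.2.3.99.0-195
rm -f ./*    # clear any stale or partially-downloaded files first
# then upload exactly these two files into this directory:
#   sqoop-1.4.6.2.3.99.0-195.jar
#   sqoop-1.4.6.2.3.99.0-195.pom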
The build takes quite a while; be patient.
Integration
After the build finishes, /opt/atlas2.1/distro/target contains the build output; the artifact we want is apache-atlas-2.1.0-bin.tar.gz.
Pick a directory and extract it there; I extracted it under /usr/local/src/atlas.
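For example, assuming the paths above:
mkdir -p /usr/local/src/atlas
tar -zxvf /opt/atlas2.1/distro/target/apache-atlas-2.1.0-bin.tar.gz -C /usr/local/src/atlas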
1. Integrate with HBase
1. Create a symlink so Atlas can read the HBase configuration files:
ln -s /etc/hbase/conf/ /usr/local/src/atlas/apache-atlas-2.1.0/conf/hbase/
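A quick check that the link resolves (it should list hbase-site.xml and the other files from /etc/hbase/conf):
ls -l /usr/local/src/atlas/apache-atlas-2.1.0/conf/hbase/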
2.vim conf/atlas-application.properties
atlas.graph.storage.hostname=hadoop1:2181,hadoop2:2181,hadoop3:2181
#change this address so the web UI is reachable from outside (Atlas runs on hadoop3 here)
atlas.rest.address=http://hadoop3:21000
#ZooKeeper quorum for the HBase audit store
atlas.audit.hbase.zookeeper.quorum=hadoop1:2181,hadoop2:2181,hadoop3:2181
3.vim conf/atlas-env.sh
# point at the symlink created above
export HBASE_CONF_DIR=/usr/local/src/atlas/apache-atlas-2.1.0/conf/hbase
export MANAGE_LOCAL_HBASE=false
# Solr is external too (the CDH SolrCloud configured below), so do not let Atlas manage a local one
export MANAGE_LOCAL_SOLR=false
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0
-XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=dumps/atlas_server.hprof
-Xloggc:logs/gc-worker.log -verbose:gc
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC
-XX:+PrintGCTimeStamps"
#JDK 1.8 tuning (the settings below assume a 16 GB host)
export ATLAS_SERVER_HEAP="-Xms1536m -Xmx1536m
-XX:MaxNewSize=512m -XX:MetaspaceSize=50M
-XX:MaxMetaspaceSize=128m"
2. Integrate with Solr
Solr is mandatory for Atlas, even though I have no other use for it, so I installed a single-node Solr.
1.vim conf/atlas-application.properties
atlas.graph.index.search.solr.mode=cloud
#pitfall: list only ONE ZooKeeper address here; adding more actually breaks it
atlas.graph.index.search.solr.zookeeper-url=hadoop3:2181/solr
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true
2. Create the data Solr needs
#Sync the solr folder under Atlas's conf directory into the Solr installation, rename it, then push it to every node (so the Solr cluster can read Atlas's Solr configuration)
cp -r /usr/local/src/atlas/apache-atlas-2.1.0/conf/solr /opt/cloudera/parcels/CDH/lib/solr/
cd /opt/cloudera/parcels/CDH/lib/solr/
mv solr atlas_solr
scp -r /opt/cloudera/parcels/CDH/lib/solr/atlas_solr hadoop2:/opt/cloudera/parcels/CDH/lib/solr/
scp -r /opt/cloudera/parcels/CDH/lib/solr/atlas_solr hadoop1:/opt/cloudera/parcels/CDH/lib/solr/
#Create the collections where Atlas keeps its indexes
/opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c vertex_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas_solr -force -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c edge_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas_solr -force -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c fulltext_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas_solr -force -shards 3 -replicationFactor 2
#To delete a collection, use the following commands
/opt/cloudera/parcels/CDH/lib/solr/bin/solr delete -c vertex_index
/opt/cloudera/parcels/CDH/lib/solr/bin/solr delete -c edge_index
/opt/cloudera/parcels/CDH/lib/solr/bin/solr delete -c fulltext_index
On the Solr admin page you can see the shard and replica layout: http://hadoop3:8983/solr/#/~cloud
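You can also list the collections from the command line via the standard Collections API:
curl "http://hadoop3:8983/solr/admin/collections?action=LIST"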
3. Integrate with Kafka
1.vim conf/atlas-application.properties
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
#same pitfall as with Solr: list only one ZooKeeper address here
atlas.kafka.zookeeper.connect=hadoop3:2181
atlas.kafka.bootstrap.servers=hadoop1:9092,hadoop2:9092,hadoop3:9092
atlas.kafka.zookeeper.session.timeout.ms=4000
atlas.kafka.zookeeper.connection.timeout.ms=2000
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
#commit offsets automatically
atlas.kafka.enable.auto.commit=true
2. Create the topics
kafka-topics --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --create --replication-factor 2 --partitions 3 --topic ATLAS_HOOK
kafka-topics --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --create --replication-factor 2 --partitions 3 --topic ATLAS_ENTITIES
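Verify the topics were created, using the same ZooKeeper quorum:
kafka-topics --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --list
kafka-topics --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --describe --topic ATLAS_HOOK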
4. Integrate with Hive
- In Cloudera Manager, search the Hive service configuration for hive-site
1. In the first two advanced configuration snippet (safety valve) boxes, add:
<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook,org.apache.hadoop.hive.ql.hooks.LineageLogger</value>
</property>
2. In the remaining snippet box, add:
<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook,org.apache.hadoop.hive.ql.hooks.LineageLogger</value>
</property>
<property>
    <name>hive.reloadable.aux.jars.path</name>
    <value>/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive</value>
</property>
- In the Hive environment advanced configuration snippet (safety valve), set:
HIVE_AUX_JARS_PATH=/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive
- Hive Auxiliary JARs Directory:
/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive
Restart Hive.
Then distribute /usr/local/src/atlas/apache-atlas-2.1.0/hook/hive to every Hive node
and add the environment variables:
vi /etc/profile
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export HIVE_CONF_DIR=/etc/hive/conf
export PATH=$HIVE_HOME/bin:$PATH
source /etc/profile
Finally, import the existing Hive metadata.
Run ./bin/import-hive.sh; it prompts for an Atlas account and password, both admin. (The script calls the Atlas REST API, so the Atlas server must be running; see the startup step below.)
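To confirm the Hive hook itself is publishing events, one rough check is to run any DDL in Hive and watch the ATLAS_HOOK topic with the stock console consumer (the database name here is just an example):
hive -e "create database if not exists hook_smoke_test;"
kafka-console-consumer --bootstrap-server hadoop1:9092,hadoop2:9092,hadoop3:9092 --topic ATLAS_HOOK --from-beginning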
Startup
./bin/atlas_start.py
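Startup takes a few minutes. Once it is up, a quick sanity check against the REST API (credentials admin/admin, as above):
curl -u admin:admin http://hadoop3:21000/api/atlas/admin/version
After that the web UI is available at http://hadoop3:21000.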