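Building Atlas from source is not covered here; plenty of guides are available online. The build produces a set of installation packages, and this guide installs apache-atlas-2.0.0-bin.tar.gz. Note: build the package without the bundled Solr and HBase (with Maven, use the -Pdist profile rather than -Pdist,embedded-hbase-solr).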
This guide installs in production mode with Atlas HA, Solr HA and HBase HA.
Prepare three servers. HBase and ZooKeeper are assumed to be installed and running already:
hadoop101
hadoop102
hadoop103
References:
https://www.cnblogs.com/shengyang17/p/12272916.html
https://blog.csdn.net/MuQianHuanHuoZhe/article/details/82048755
Install Solr
1) Atlas 2.0 works with Solr 7.5.0; download the Solr package from the official site.
2) On hadoop101, extract solr-7.5.0.tgz into /opt/:
[kris@hadoop101 opt]$ tar -zxvf solr-7.5.0.tgz -C /opt/
3) Rename solr-7.5.0 to solr:
[kris@hadoop101 opt]$ mv solr-7.5.0/ solr
4) In the solr directory, edit bin/solr.in.sh:
[kris@hadoop101 solr]$ vim bin/solr.in.sh
# Add the following settings
ZK_HOST="hadoop101:2181,hadoop102:2181,hadoop103:2181"
SOLR_HOST="hadoop101"
# The port can be changed
SOLR_PORT=8983
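The same ZooKeeper connection string appears in ZK_HOST here and again in several Atlas properties later on. A minimal sketch of a helper that builds it from a host list, so the hostnames are typed once (the function name zk_quorum and the ZK_PORT override are hypothetical, not part of Solr or Atlas):

```shell
#!/usr/bin/env bash
# Build a ZooKeeper connection string such as
# "hadoop101:2181,hadoop102:2181,hadoop103:2181" from a list of hostnames.
# ZK_PORT is a hypothetical override; 2181 is the port used throughout this guide.
zk_quorum() {
  local port="${ZK_PORT:-2181}" out="" host
  for host in "$@"; do
    # Append "host:port", with a comma before every entry except the first
    out="${out:+$out,}${host}:${port}"
  done
  printf '%s\n' "$out"
}

# Example: print the ZK_HOST line for solr.in.sh
# echo "ZK_HOST=\"$(zk_quorum hadoop101 hadoop102 hadoop103)\""
```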
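5) Distribute Solr to hadoop102 and hadoop103 for the Cloud-mode deployment.
After copying, edit /opt/solr/bin/solr.in.sh on hadoop102 and hadoop103 and set SOLR_HOST to each node's own hostname.
6) Start Solr on all three nodes; this starts it in Cloud mode: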
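Step 5 can be scripted. The sketch below assumes passwordless SSH from hadoop101, write access to /opt on the other nodes, and bash as the remote login shell; set_solr_host and distribute_solr are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Set SOLR_HOST in a solr.in.sh file to the given hostname.
# Replaces an existing uncommented SOLR_HOST= line, or appends one if none exists.
set_solr_host() {
  local file="$1" host="$2"
  if grep -q '^SOLR_HOST=' "$file"; then
    sed -i "s/^SOLR_HOST=.*/SOLR_HOST=\"${host}\"/" "$file"
  else
    printf 'SOLR_HOST="%s"\n' "$host" >> "$file"
  fi
}

# Copy /opt/solr from this node to each target node, then fix SOLR_HOST there.
distribute_solr() {
  local host
  for host in "$@"; do
    scp -rq /opt/solr "${host}:/opt/" || return 1
    # declare -f sends the function definition along to the remote shell
    ssh "$host" "$(declare -f set_solr_host); set_solr_host /opt/solr/bin/solr.in.sh $host" || return 1
  done
}

# Usage (run on hadoop101): distribute_solr hadoop102 hadoop103
```

set_solr_host only rewrites the SOLR_HOST line, so ZK_HOST and SOLR_PORT are copied to the other nodes unchanged.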
[kris@hadoop101 solr]$ bin/solr start -force
[kris@hadoop102 solr]$ bin/solr start -force
[kris@hadoop103 solr]$ bin/solr start -force
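Tip: ZooKeeper must be running before Solr is started.
7) Open port 8983 in a browser on any of the three nodes, e.g. http://hadoop101:8983/solr/#/
Tip: SolrCloud is deployed successfully only once the Cloud menu appears in the UI.
8) Write a Solr start/stop script (optional)
(1) Create the script in /home/kris/bin on hadoop101:
[kris@hadoop101 bin]$ vim s.sh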
#!/bin/bash
# Start or stop Solr on all three nodes over SSH
case "$1" in
"start"){
for i in hadoop101 hadoop102 hadoop103
do
ssh $i "/opt/solr/bin/solr start"
done
};;
"stop"){
for i in hadoop101 hadoop102 hadoop103
do
ssh $i "/opt/solr/bin/solr stop"
done
};;
*)
echo "Usage: $0 {start|stop}"
;;
esac
(2) Make the script executable:
[kris@hadoop101 bin]$ chmod +x s.sh
(3) Start the Solr cluster:
[kris@hadoop101 module]$ s.sh start
(4) Stop the Solr cluster:
[kris@hadoop101 module]$ s.sh stop
Install Atlas
1) Extract apache-atlas-2.0.0-bin.tar.gz into /opt/:
[kris@hadoop101 opt]$ tar -zxvf apache-atlas-2.0.0-bin.tar.gz -C /opt/
2) Rename apache-atlas-2.0.0 to atlas:
[kris@hadoop101 opt]$ mv apache-atlas-2.0.0/ atlas
3) Copy the extracted directory to hadoop102.
Integrating Atlas with external components
Integrating Atlas with HBase
1) In /opt/atlas/conf/, edit the configuration file:
[kris@hadoop101 conf]$ vim atlas-application.properties
# Hosts where Atlas stores its data: the ZooKeeper quorum
atlas.graph.storage.hostname=hadoop101:2181,hadoop102:2181,hadoop103:2181
2) In /opt/atlas/conf/hbase, link the HBase cluster's configuration directory into Atlas:
[kris@hadoop101 hbase]$ ln -s /opt/hbase/conf/ /opt/atlas/conf/hbase/
3) Add HBASE_CONF_DIR to /opt/atlas/conf/atlas-env.sh:
[kris@hadoop101 conf]$ vim atlas-env.sh
# Path to the HBase configuration files
export HBASE_CONF_DIR=/opt/atlas/conf/hbase/conf
export JAVA_HOME=/opt/app/jdk1.8.0_191/
# false = use an external ZooKeeper and HBase
export MANAGE_LOCAL_HBASE=false
# false = use an external Solr
export MANAGE_LOCAL_SOLR=false
Integrating Atlas with Solr
1) In /opt/atlas/conf, edit the configuration file:
[kris@hadoop101 conf]$ vim atlas-application.properties
# Change the following setting: the ZooKeeper address
atlas.graph.index.search.solr.zookeeper-url=hadoop101:2181,hadoop102:2181,hadoop103:2181
2) Copy the solr folder shipped with Atlas to every node of the external Solr cluster (all three Solr machines):
[kris@hadoop101 conf]$ cp -r /opt/atlas/conf/solr /opt/solr/
3) In /opt/solr, rename the copied solr folder to atlas_conf:
[kris@hadoop101 solr]$ mv solr atlas_conf
4) With Solr running in Cloud mode (ZooKeeper must already be running), create the collections. This only needs to be run on one of the machines:
[kris@hadoop101 solr]$ bin/solr create -c vertex_index -d /opt/solr/atlas_conf -shards 3 -replicationFactor 2 -force
[kris@hadoop101 solr]$ bin/solr create -c edge_index -d /opt/solr/atlas_conf -shards 3 -replicationFactor 2 -force
[kris@hadoop101 solr]$ bin/solr create -c fulltext_index -d /opt/solr/atlas_conf -shards 3 -replicationFactor 2 -force
-shards 3: the collection has 3 shards
-replicationFactor 2: each shard has 2 replicas
vertex_index, edge_index, fulltext_index: the collection names
Note: to delete the vertex_index, edge_index or fulltext_index collections, run:
[kris@hadoop101 solr]$ bin/solr delete -c ${collection_name}
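The three create commands differ only in the collection name, so they can be run in a loop. With 3 shards and 2 replicas per shard, each collection has 3 × 2 = 6 cores, so the three collections add up to 18 cores, or 6 per node if Solr spreads them evenly over the three nodes. A sketch, where create_atlas_collections, SOLR_BIN, ATLAS_SOLR_CONF and DRY_RUN are hypothetical names (DRY_RUN only prints the commands):

```shell
#!/usr/bin/env bash
# Create the three collections Atlas needs, with the same options as above.
# With DRY_RUN set, the commands are printed instead of executed.
create_atlas_collections() {
  local solr_bin="${SOLR_BIN:-bin/solr}"
  local conf_dir="${ATLAS_SOLR_CONF:-/opt/solr/atlas_conf}"
  local c
  for c in vertex_index edge_index fulltext_index; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "$solr_bin create -c $c -d $conf_dir -shards 3 -replicationFactor 2 -force"
    else
      "$solr_bin" create -c "$c" -d "$conf_dir" -shards 3 -replicationFactor 2 -force || return 1
    fi
  done
}

# Usage (from /opt/solr on one node):
#   DRY_RUN=1 create_atlas_collections   # review the commands first
#   create_atlas_collections
```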
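5) Verify that the collections were created:
Open the Solr web console at http://hadoop101:8983/solr/#/~cloud; the vertex_index, edge_index and fulltext_index collections should be listed.
Integrating Atlas with Kafka
1) In /opt/atlas/conf/, edit atlas-application.properties:
[kris@hadoop101 conf]$ vim atlas-application.properties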
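As an alternative to the web console, the Solr Collections API (action=LIST) returns the collection names as JSON. A sketch that checks that response for the three Atlas collections; missing_atlas_collections is a hypothetical name, and it does a plain text match on the quoted names rather than real JSON parsing:

```shell
#!/usr/bin/env bash
# Read a Collections API LIST response on stdin and print every Atlas
# collection that does not appear in it. Returns non-zero if any is missing.
missing_atlas_collections() {
  local json c missing=0
  json="$(cat)"
  for c in vertex_index edge_index fulltext_index; do
    # Match the name including its quotes to avoid partial matches
    if ! grep -q "\"$c\"" <<<"$json"; then
      echo "$c"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# curl -s "http://hadoop101:8983/solr/admin/collections?action=LIST&wt=json" | missing_atlas_collections
```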
######### Notification Configs #########
# Set to false to use an external Kafka
atlas.notification.embedded=false
# The embedded Kafka starts a ZooKeeper instance on this port
#atlas.kafka.zookeeper.connect=localhost:9026   # with an external Kafka, use the external ZooKeeper address
#atlas.kafka.bootstrap.servers=localhost:9027   # with an external Kafka, use the external broker addresses
atlas.kafka.zookeeper.connect=hadoop101:2181,hadoop102:2181,hadoop103:2181
atlas.kafka.bootstrap.servers=hadoop101:9092,hadoop102:9092,hadoop103:9092
atlas.kafka.zookeeper.session.timeout.ms=4000
atlas.kafka.zookeeper.connection.timeout.ms=2000
atlas.kafka.enable.auto.commit=true
2) Start the Kafka cluster and create the topics (not needed with the embedded Kafka):
[kris@hadoop101 kafka]$ bin/kafka-topics.sh --zookeeper hadoop101:2181,hadoop102:2181,hadoop103:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
[kris@hadoop101 kafka]$ bin/kafka-topics.sh --zookeeper hadoop101:2181,hadoop102:2181,hadoop103:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
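To confirm that both topics exist, the output of kafka-topics.sh --list (one topic name per line) can be checked. A sketch with a hypothetical helper name, missing_atlas_topics:

```shell
#!/usr/bin/env bash
# Read topic names (one per line) on stdin and print which of the two
# Atlas topics are missing. Returns non-zero if any is missing.
missing_atlas_topics() {
  local topics t missing=0
  topics="$(cat)"
  for t in ATLAS_HOOK ATLAS_ENTITIES; do
    # -x: the whole line must equal the topic name
    if ! grep -qx "$t" <<<"$topics"; then
      echo "$t"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (from the Kafka directory):
# bin/kafka-topics.sh --zookeeper hadoop101:2181,hadoop102:2181,hadoop103:2181 --list | missing_atlas_topics
```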
Atlas Web Server HA
Edit atlas-application.properties:
hive.atlas.hook=true
hive.exec.post.hooks=org.apache.atlas.hive.hook.HiveHook
atlas.rest.address=http://hadoop101:21000
atlas.server.ha.enabled=true
atlas.server.ids=id1,id2
atlas.server.address.id1=hadoop101:21000
atlas.server.address.id2=hadoop102:21000
atlas.server.ha.zookeeper.connect=hadoop101:2181,hadoop102:2181,hadoop103:2181
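An easy mistake in this block is an id in atlas.server.ids without a matching atlas.server.address.<id> entry. A sketch of a check, assuming simple key=value lines without line continuations (check_ha_ids is a hypothetical name):

```shell
#!/usr/bin/env bash
# Print every id from atlas.server.ids that has no atlas.server.address.<id>
# entry in the given properties file. Returns non-zero if any is missing.
check_ha_ids() {
  local file="$1" ids id missing=0
  # Take the last atlas.server.ids line and strip spaces from its value
  ids="$(grep -E '^atlas\.server\.ids=' "$file" | tail -n1 | cut -d= -f2- | tr -d ' ')"
  [ -n "$ids" ] || { echo "atlas.server.ids not set"; return 1; }
  for id in ${ids//,/ }; do
    if ! grep -qE "^atlas\.server\.address\.${id}=[^[:space:]]+" "$file"; then
      echo "$id"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_ha_ids /opt/atlas/conf/atlas-application.properties
```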
Other Atlas settings
1) In /opt/atlas/conf/, edit atlas-application.properties:
[kris@hadoop101 conf]$ vim atlas-application.properties
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.zookeeper.quorum=hadoop101:2181,hadoop102:2181,hadoop103:2181
2) To record performance metrics, edit atlas-log4j.xml in /opt/atlas/conf/:
[kris@hadoop101 conf]$ vim atlas-log4j.xml
Uncomment the following block:
<appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
<param name="file" value="${atlas.log.dir}/atlas_perf.log" />
<param name="datePattern" value="'.'yyyy-MM-dd" />
<param name="append" value="true" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d|%t|%m%n" />
</layout>
</appender>
<logger name="org.apache.atlas.perf" additivity="false">
<level value="debug" />
<appender-ref ref="perf_appender" />
</logger>
Integrating Atlas with Hive
1) In /opt/atlas/conf/, edit atlas-application.properties:
[kris@hadoop101 conf]$ vim atlas-application.properties
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
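2) Pack atlas-application.properties into atlas-plugin-classloader-2.0.0.jar. This only needs to be done on hadoop101; the Atlas installation on hadoop101 is then copied to every server that runs Hive.
[kris@hadoop101 hive]$ cd /opt/atlas/conf
[kris@hadoop101 conf]$ zip -u /opt/atlas/hook/hive/atlas-plugin-classloader-2.0.0.jar atlas-application.properties
[kris@hadoop101 conf]$ cp /opt/atlas/conf/atlas-application.properties /opt/hive/conf/
Run zip from /opt/atlas/conf so that the file is stored at the root of the jar; given an absolute path, zip stores it as opt/atlas/conf/atlas-application.properties inside the jar, where it is not found at the classpath root.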
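Why: the official instructions alone did not work here. Following them (copying the file into Hive's conf directory), the hook could never read atlas-application.properties. The source code shows that the file is loaded from the classpath, so it is also packed into the jar.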
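To confirm that the file really sits at the root of the jar, which is where a classpath lookup for atlas-application.properties looks, list the jar's entries. A sketch using unzip -Z1 (one entry name per line); jar_has_root_entry is a hypothetical name:

```shell
#!/usr/bin/env bash
# Succeed if the jar/zip contains the given entry at its root, e.g.
# "atlas-application.properties" rather than "opt/atlas/conf/atlas-application.properties".
jar_has_root_entry() {
  unzip -Z1 "$1" | grep -qx "$2"
}

# Usage:
# jar_has_root_entry /opt/atlas/hook/hive/atlas-plugin-classloader-2.0.0.jar atlas-application.properties \
#   && echo "found at jar root" || echo "not at jar root"
```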
3) Set the Atlas hook in /opt/hive/conf/hive-site.xml:
[kris@hadoop101 conf]$ vim hive-site.xml
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
Note: if hive.exec.post.hooks has several values, separate them with commas and keep org.apache.atlas.hive.hook.HiveHook as one of them.
Also add the Atlas hook jars to HIVE_AUX_JARS_PATH in hive-env.sh:
[kris@hadoop101 conf]$ vim hive-env.sh
export HIVE_AUX_JARS_PATH=/opt/atlas/hook/hive/atlas-plugin-classloader-2.0.0.jar,/opt/atlas/hook/hive/hive-bridge-shim-2.0.0.jar
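After editing the configuration files, copy them from hadoop101 to hadoop102, change the host in atlas.rest.address (http://hadoop101:21000) to hadoop102, and start Atlas on both nodes.
Check the Atlas HA status:
$ATLAS_HOME/bin/atlas_admin.py -status
ACTIVE: this instance is active and can respond to user requests.
PASSIVE: this instance is passive; it redirects any user request it receives to the current active instance.
BECOMING_ACTIVE: printed while the server is transitioning to ACTIVE. In this state the server cannot serve any metadata user requests.
BECOMING_PASSIVE: printed while the server is transitioning to PASSIVE. In this state the server cannot serve any metadata user requests.
Under normal operation, exactly one of the instances should print ACTIVE in response to the script, and the others should print PASSIVE.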
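The copy-and-edit step can be sketched as follows. It assumes passwordless SSH, rsync installed on both nodes (rsync -a keeps the conf/hbase/conf symbolic link as a link), and Atlas under /opt/atlas on hadoop102; set_rest_address and push_atlas_conf are hypothetical names. Atlas itself is started with bin/atlas_start.py.

```shell
#!/usr/bin/env bash
# Point atlas.rest.address in an atlas-application.properties file at the given host.
# The port defaults to 21000, as used in this guide.
set_rest_address() {
  local file="$1" host="$2" port="${3:-21000}"
  sed -i "s|^atlas\.rest\.address=.*|atlas.rest.address=http://${host}:${port}|" "$file"
}

# Copy /opt/atlas/conf from this node to the target node, then fix atlas.rest.address there.
push_atlas_conf() {
  local host="$1" conf_dir=/opt/atlas/conf
  # -a preserves symbolic links such as conf/hbase/conf
  rsync -a "${conf_dir}/" "${host}:${conf_dir}/" || return 1
  ssh "$host" "$(declare -f set_rest_address); set_rest_address ${conf_dir}/atlas-application.properties $host"
}

# Usage (on hadoop101):
#   push_atlas_conf hadoop102
#   /opt/atlas/bin/atlas_start.py               # start Atlas on hadoop101
#   ssh hadoop102 /opt/atlas/bin/atlas_start.py # start Atlas on hadoop102
```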
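The "exactly one ACTIVE" rule can be checked across nodes with a small script. A sketch that reads one status per line; check_ha_states is a hypothetical name, and the usage lines assume atlas_admin.py -status prints the state on a line of its own (if it asks for credentials, run it interactively on each node instead):

```shell
#!/usr/bin/env bash
# Read one Atlas HA state per line (ACTIVE, PASSIVE, BECOMING_ACTIVE, BECOMING_PASSIVE)
# and succeed only if exactly one instance is ACTIVE and all others are PASSIVE.
check_ha_states() {
  local line active=0 other=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue   # ignore blank lines
    case "$line" in
      ACTIVE)  active=$((active + 1)) ;;
      PASSIVE) ;;
      *)       other=$((other + 1)) ;;  # transitional or unexpected state
    esac
  done
  echo "active=$active other=$other"
  [ "$active" -eq 1 ] && [ "$other" -eq 0 ]
}

# Usage (on hadoop101):
# for h in hadoop101 hadoop102; do
#   ssh "$h" /opt/atlas/bin/atlas_admin.py -status
# done | grep -xE 'ACTIVE|PASSIVE|BECOMING_ACTIVE|BECOMING_PASSIVE' | check_ha_states
```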
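The complete atlas-application.properties after these changes (this copy comes from another cluster, so its hostnames are dataMiddle-* instead of hadoop101 to hadoop103):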
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
######### Graph Database Configs #########
# Graph Database
#Configures the graph database to use. Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase
# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various storage backends.
#
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus
#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=dataMiddle-79:2181,dataMiddle-80:2181,dataMiddle-90:2181
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=
# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true
# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1
# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository
# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1
# Graph Search Index
atlas.graph.index.search.backend=solr
#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=dataMiddle-79:2181,dataMiddle-80:2181,dataMiddle-90:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true
#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr
# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: http://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=true
# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150
######### Notification Configs #########
atlas.notification.embedded=true
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=localhost:9026
atlas.kafka.bootstrap.servers=localhost:9027
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000
atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab
## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443
######### Security Properties #########
# SSL config
atlas.enableTLS=false
#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks
#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks
# Authentication config
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true
#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none
#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties
### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true
######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>
######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=<password>
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=<default role>
######### JAAS Configuration ########
#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM
######### Server Properties #########
atlas.rest.address=http://dataMiddle-97:21000
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=dataMiddle-79:2181,dataMiddle-80:2181,dataMiddle-90:2181
######### High Availability Configuration ########
atlas.server.ha.enabled=true
#### Enabled the configs below as per need if HA is enabled #####
atlas.server.ids=id1,id2
atlas.server.address.id1=dataMiddle-97:21000
atlas.server.address.id2=dataMiddle-98:21000
atlas.server.ha.zookeeper.connect=dataMiddle-79:2181,dataMiddle-80:2181,dataMiddle-90:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>
######### Hive Hook Configs #######
hive.atlas.hook=true
hive.exec.post.hooks=org.apache.atlas.hive.hook.HiveHook
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json
######### Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=
######### Performance Configs #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000
######### CSRF Configs #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER
############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=
############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=
######### Compiled Query Cache Configuration #########
# The size of the compiled query cache. Older queries will be evicted from the cache
# when we reach the capacity.
#atlas.CompiledQueryCache.capacity=1000
# Allows notifications when items are evicted from the compiled query
# cache because it has become full. A warning will be issued when
# the specified number of evictions have occurred. If the eviction
# warning threshold <= 0, no eviction warnings will be issued.
#atlas.CompiledQueryCache.evictionWarningThrottle=0
######### Full Text Search Configuration #########
#Set to false to disable full text search.
#atlas.search.fulltext.enable=true
######### Gremlin Search Configuration #########
#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false
########## Add http headers ###########
#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.<headerName>=<headerValue>