Atlas Development Environment Deployment (Local Build) and Production Deployment


1. Atlas Usage Demo

1.1 Overview

What Atlas does: it provides data management and governance capabilities, used to build a catalog of data assets, classify and manage those assets, and form a data dictionary.
Key features:
1. Powerful indexing that allows fast search across tables from different sources and different databases.
2. Lineage graphs that clearly show where a table comes from and where it goes; lineage is tracked down to the column level.
3. Classifications and glossary terms for further grouping and organizing metadata.
4. Very detailed information pages for individual tables.
5. After the initial metadata import, metadata changes are synced into Atlas incrementally and automatically, with no further manual steps.

1.2 Usage Demo

1.3 Architecture Overview

[Atlas architecture diagram]

Hive, Sqoop, HBase, etc. are the metadata sources; Hive, HBase, Sqoop, Storm and Kafka are currently supported.

Kafka is the messaging middleware used to incrementally sync metadata from the sources into Atlas.

The Atlas core has three major modules:
Ingest/Export: imports and exports metadata. Import means bringing metadata from other data sources into Atlas; export means writing Atlas metadata out to HBase (Atlas itself does not store the data). Imports are incremental and rely on Kafka as the message queue. For Hive --> Atlas, a hive-hook is deployed so that Hive's hook program pushes its metadata into Atlas.
Type System: the typing/classification module, which categorizes metadata: different sources, and the databases, tables, operations, paths, etc. within a source.
Graph Engine: the graph engine, mainly responsible for table-to-table and column-to-column lineage graphs. This data goes through the graph database JanusGraph, is converted into key-value form, and is finally stored in HBase.

HBase is Atlas's storage target and holds the metadata.
Solr, similar to Elasticsearch, builds indexes and provides global search; it serves as the index store.

Atlas exposes HTTP/REST APIs, which makes secondary development straightforward.
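For example, once the server is up (see the deployment sections below), the REST API can be exercised with plain HTTP calls. A minimal sketch, assuming the default port 21000 and the default admin/admin account:

# Query the Atlas server version over the REST API
curl -s -u admin:admin http://localhost:21000/api/atlas/admin/version

# List the type definitions registered in Atlas
curl -s -u admin:admin http://localhost:21000/api/atlas/v2/types/typedefs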

1.4 Roles of the Source Code Modules

The roles of the main modules:

Module 2: addons
       Source code for the add-on components, mainly the bridge code that connects Atlas to the various Hadoop metadata sources; it corresponds to the data-source part of the Atlas architecture diagram.
    For example, hive-bridge:
        the Hive bridge module. The import-hive.sh script under its bin directory imports Hive metadata into Atlas; the script calls the bridge class HiveMetaStoreBridge.
Module 3: authorization
   The Atlas authorization module, supporting Simple authorization and Ranger authorization. See the official documentation for details and usage: Apache Atlas – Data Governance and Metadata framework for Hadoop.

Module 5: client
Client API code.
client-v2: the V2 client API. When calling the Atlas APIs, you can call the wrapper methods here directly, which saves development effort.

Module 7: dashboardv3
The Atlas admin UI front-end application, corresponding to the Admin UI in the architecture diagram.


Module 9: distro
Configuration files related to packaging and distributing Atlas.
Its pom.xml provides default settings for the graph storage backend (HBase) and the graph index/search component (Solr).

The bin directory contains the Python scripts for basic installation and operation, e.g. starting and stopping the Atlas service.
The conf directory contains the Atlas configuration files.

The main ones are: the application configuration atlas-application.properties, the environment configuration atlas-env.sh, the logging configuration atlas-log4j.xml, the authorization policy atlas-simple-authz-policy.json, and the user credentials file users-credentials.properties.
The main/assemblies directory holds the assembly descriptors used for packaging.

Module 19: webapp
The Atlas web application module.
The Atlas class in this module is the driver class that starts the standalone Atlas server.

2. Atlas Local Build and Deployment

1. Download the source code

https://atlas.apache.org/#/Downloads (choose a version and download the source package; 2.2.0, the latest stable release at the time, is used here)

2. Build the source code

1. Set the Maven memory options via the MAVEN_OPTS environment variable; otherwise the webapp module can fail with an out-of-memory error during the build:
export MAVEN_OPTS="-Xms512m -Xmx1024m"   (on Windows: set MAVEN_OPTS=-Xms512m -Xmx1024m)

2. In the root pom (apache-atlas): if the enforcer plugin complains here, pin the versions explicitly:
 <requireMavenVersion>
<version>3.8.4</version>  <!-- specify your actual Maven version -->
 </requireMavenVersion>
<requireJavaVersion>
<level>ERROR</level>
<version>1.8.0_45</version>  <!-- specify your actual JDK version -->
</requireJavaVersion>
<requireJavaVersion>
<level>WARN</level>
<version>1.8.0_45</version>   <!-- specify your actual JDK version -->

<dependency>
   <groupId>org.apache.atlas</groupId>
   <artifactId>atlas-buildtools</artifactId>
   <version>2.2.0</version>  <!-- change to 2.2.0 -->
</dependency>

<!-- <configuration>-->
<!--  <deployAtEnd>true</deployAtEnd>-->
<!--  </configuration>-->
                    

3. The storm-bridge and storm-bridge-shim modules may fail to build because some of their external dependencies cannot be downloaded. They are data-source modules and can simply be skipped without affecting the rest of the service. The places that reference them also need to be commented out:
In the distro module's pom.xml:
<!--    <dependencies>-->
<!--        <dependency>-->
<!--            <groupId>org.apache.atlas</groupId>-->
<!--            <artifactId>storm-bridge</artifactId>-->   <!-- comment out -->
<!--        </dependency>-->
<!--    </dependencies>-->
and, under the maven-assembly-plugin:
<artifactId>maven-assembly-plugin</artifactId>
                        <executions>
                            <execution>
                                <goals>
                                    <goal>single</goal>
                                </goals>
                                <phase>package</phase>
                                <configuration>
                                    <skipAssembly>false</skipAssembly>
                                    <descriptors>

<!--<descriptor>src/main/assemblies/atlas-storm-hook-package.xml</descriptor>--> <!-- comment out -->
In the root pom.xml (apache-atlas):
<!--        <module>addons/storm-bridge-shim</module>-->         <!-- comment out -->
<!--        <module>addons/storm-bridge</module>-->
and
<!--            <dependency>-->                                  <!-- comment out -->
<!--                <groupId>org.apache.atlas</groupId>-->
<!--                <artifactId>storm-bridge</artifactId>-->
<!--                <version>${project.version}</version>-->
<!--            </dependency>-->
and
<!--            <dependency>-->                                   <!-- comment out -->
<!--                <groupId>org.apache.atlas</groupId>-->
<!--                <artifactId>storm-bridge-shim</artifactId>-->
<!--                <version>${project.version}</version>-->
<!--            </dependency>-->

Finally, run the build.
External mode (HBase and Solr are provided externally; this is the mode used here):
mvn clean -DskipTests package -Pdist
After a successful build:
[INFO] Apache Atlas Server Build Tools .................... SUCCESS [  1.066 s]
[INFO] apache-atlas ....................................... SUCCESS [  4.884 s]
[INFO] Apache Atlas Integration ........................... SUCCESS [  9.549 s]
[INFO] Apache Atlas Test Utility Tools .................... SUCCESS [  3.712 s]
[INFO] Apache Atlas Common ................................ SUCCESS [  4.260 s]
[INFO] Apache Atlas Client ................................ SUCCESS [  0.483 s]
[INFO] atlas-client-common ................................ SUCCESS [  1.448 s]
[INFO] atlas-client-v1 .................................... SUCCESS [  2.249 s]
[INFO] Apache Atlas Server API ............................ SUCCESS [  2.715 s]
[INFO] Apache Atlas Notification .......................... SUCCESS [  5.100 s]
[INFO] atlas-client-v2 .................................... SUCCESS [  1.530 s]
[INFO] Apache Atlas Graph Database Projects ............... SUCCESS [  0.470 s]
[INFO] Apache Atlas Graph Database API .................... SUCCESS [  1.720 s]
[INFO] Graph Database Common Code ......................... SUCCESS [  1.989 s]
[INFO] Apache Atlas JanusGraph-HBase2 Module .............. SUCCESS [  1.608 s]
[INFO] Apache Atlas JanusGraph DB Impl .................... SUCCESS [  7.949 s]
[INFO] Apache Atlas Graph DB Dependencies ................. SUCCESS [  1.929 s]
[INFO] Apache Atlas Authorization ......................... SUCCESS [  3.133 s]
[INFO] Apache Atlas Repository ............................ SUCCESS [ 20.009 s]
[INFO] Apache Atlas UI .................................... SUCCESS [01:53 min]
[INFO] Apache Atlas New UI ................................ SUCCESS [01:48 min]
[INFO] Apache Atlas Web Application ....................... SUCCESS [04:45 min]
[INFO] Apache Atlas Documentation ......................... SUCCESS [  4.515 s]
[INFO] Apache Atlas FileSystem Model ...................... SUCCESS [  5.743 s]
[INFO] Apache Atlas Plugin Classloader .................... SUCCESS [  3.340 s]
[INFO] Apache Atlas Hive Bridge Shim ...................... SUCCESS [  5.995 s]
[INFO] Apache Atlas Hive Bridge ........................... SUCCESS [ 20.976 s]
[INFO] Apache Atlas Falcon Bridge Shim .................... SUCCESS [  2.760 s]
[INFO] Apache Atlas Falcon Bridge ......................... SUCCESS [  9.204 s]
[INFO] Apache Atlas Sqoop Bridge Shim ..................... SUCCESS [  0.859 s]
[INFO] Apache Atlas Sqoop Bridge .......................... SUCCESS [ 15.889 s]
[INFO] Apache Atlas Hbase Bridge Shim ..................... SUCCESS [  5.268 s]
[INFO] Apache Atlas Hbase Bridge .......................... SUCCESS [ 16.576 s]
[INFO] Apache HBase - Testing Util ........................ SUCCESS [ 11.497 s]
[INFO] Apache Atlas Kafka Bridge .......................... SUCCESS [  7.014 s]
[INFO] Apache Atlas classification updater ................ SUCCESS [  3.903 s]
[INFO] Apache Atlas index repair tool ..................... SUCCESS [  5.946 s]
[INFO] Apache Atlas Impala Hook API ....................... SUCCESS [  0.860 s]
[INFO] Apache Atlas Impala Bridge Shim .................... SUCCESS [  1.138 s]
[INFO] Apache Atlas Impala Bridge ......................... SUCCESS [ 14.205 s]
[INFO] Apache Atlas Distribution .......................... SUCCESS [01:23 min]
[INFO] atlas-examples ..................................... SUCCESS [  0.452 s]
[INFO] sample-app ......................................... SUCCESS [  1.933 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  13:25 min
[INFO] Finished at: 2022-06-10T11:05:05+08:00
[INFO] ------------------------------------------------------------------------



To build with embedded HBase and Solr instead: mvn clean -DskipTests package -Pdist,embedded-hbase-solr
Notes: 1. The embedded profile downloads HBase and Solr from the Apache site; if the network is slow, you can place the matching packages into the corresponding directory yourself beforehand.
       2. In embedded mode on Windows, Hadoop may fail to start, which prevents HBase from starting and ultimately makes the service fail to come up.

3. Atlas configuration for the development environment

After a successful build, the generated files are placed under \apache-atlas-sources-2.2.0\distro.
Create a directory anywhere, e.g. D:\ext.wangwentao5\all_paojects\atlas-deploy2.
Inside it, create the following directory structure:
atlas-deploy2:
conf
data
hbase
logs
models
solr
webapp

Then:
Copy everything under \apache-atlas-sources-2.2.0\distro\target\conf into atlas-deploy2\conf.

Copy the files under \apache-atlas-sources-2.2.0\webapp\target\web-app2.2.0 into \atlas-deploy2\webapp\atlas.

Configure Atlas:
Note: because installing big-data components such as Hadoop and HBase on Windows has many limitations, external HBase and Solr are used; those components run on the Linux hosts.

Make the same changes both in the source tree (apache-atlas-sources-2.2.0\distro\src\conf) and in the new directory (\atlas-deploy2\conf):
Edit atlas-application.properties
(Note: 192.168.182.10:2181,192.168.182.20:2181,192.168.182.30:2181 is the ZooKeeper quorum running on the external virtual machines.)
-1-
atlas.graph.storage.hostname=192.168.182.10:2181,192.168.182.20:2181,192.168.182.30:2181
-2-
atlas.graph.index.search.solr.zookeeper-url=192.168.182.10:2181,192.168.182.20:2181,192.168.182.30:2181
-3-
atlas.kafka.data=/opt/apps/kafka_2.11-2.4.1/data  ## Kafka data directory on the Linux hosts
atlas.kafka.zookeeper.connect=192.168.182.10:2181,192.168.182.20:2181,192.168.182.30:2181/kafka
atlas.kafka.bootstrap.servers=192.168.182.10:9092,192.168.182.20:9092,192.168.182.30:9092
-4- HTTP port 21000
atlas.server.http.port=21000
-5- REST address
atlas.rest.address=http://localhost:21000
-6- do not run the setup steps on every start
atlas.server.run.setup.on.start=false
-7- ZooKeeper quorum backing the HBase audit store
atlas.audit.hbase.zookeeper.quorum=192.168.182.10:2181,192.168.182.20:2181,192.168.182.30:2181


Edit atlas-env.sh:
export MANAGE_LOCAL_HBASE=false  ## do not start an embedded HBase
export MANAGE_LOCAL_SOLR=false   ## do not start an embedded Solr
export MANAGE_EMBEDDED_CASSANDRA=false  ## whether Cassandra is the embedded backend for Atlas
export MANAGE_LOCAL_ELASTICSEARCH=false ## whether a local Elasticsearch instance should be started for Atlas
export HBASE_CONF_DIR=/opt/apps/hbase-2.0.5/conf  ## path of the HBase conf directory on the Linux hosts

The resulting atlas-application.properties file:

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#########  Graph Database Configs  #########

# Graph Database

#Configures the graph database to use.  Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with  -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various  storage backends.
#
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus

#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=192.168.182.10:2181,192.168.182.20:2181,192.168.182.30:2181
atlas.graph.storage.hbase.regions-per-server=1

# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true

# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1

# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1


# Graph Search Index
atlas.graph.index.search.backend=solr

#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=192.168.182.10:2181,192.168.182.20:2181,192.168.182.30:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=false

#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

#########  Import Configs  #########
#atlas.import.temp.directory=/temp/import

#########  Notification Configs  #########
atlas.notification.embedded=false
atlas.kafka.data=/opt/apps/kafka_2.11-2.4.1/data
atlas.kafka.zookeeper.connect=192.168.182.10:2181,192.168.182.20:2181,192.168.182.30:2181/kafka
atlas.kafka.bootstrap.servers=192.168.182.10:9092,192.168.182.20:9092,192.168.182.30:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas

atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config

atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true

######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>


######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=<password>
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=<default role>

#########  JAAS Configuration ########

#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM

#########  Server Properties  #########
atlas.rest.address=http://192.168.182.10:21000
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false

#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=192.168.182.10:2181,192.168.182.20:2181,192.168.182.30:2181

#########  High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>



######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

#########  Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=

#########  Performance Configs  #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000

#########  CSRF Configs  #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=

############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=

#########  Compiled Query Cache Configuration  #########

# The size of the compiled query cache.  Older queries will be evicted from the cache
# when we reach the capacity.

#atlas.CompiledQueryCache.capacity=1000

# Allows notifications when items are evicted from the compiled query
# cache because it has become full.  A warning will be issued when
# the specified number of evictions have occurred.  If the eviction
# warning threshold <= 0, no eviction warnings will be issued.

#atlas.CompiledQueryCache.evictionWarningThrottle=0


#########  Full Text Search Configuration  #########

#Set to false to disable full text search.
#atlas.search.fulltext.enable=true

#########  Gremlin Search Configuration  #########

#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false


########## Add http headers ###########

#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.<headerName>=<headerValue>


#########  UI Configuration ########

atlas.ui.default.version=v1

The resulting atlas-env.sh file:

#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
#export JAVA_HOME=

# any additional java opts you want to set. This will apply to both client and server operations
#export ATLAS_OPTS=

# any additional java opts that you want to set for client only
#export ATLAS_CLIENT_OPTS=

# java heap size we want to set for the client. Default is 1024MB
#export ATLAS_CLIENT_HEAP=

# any additional opts you want to set for atlas service.
#export ATLAS_SERVER_OPTS=

# indicative values for large number of metadata entities (equal or more than 10,000s)
#export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps"

# java heap size we want to set for the atlas server. Default is 1024MB
#export ATLAS_SERVER_HEAP=

# indicative values for large number of metadata entities (equal or more than 10,000s) for JDK 8
#export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"

# What is is considered as atlas home dir. Default is the base locaion of the installed software
#export ATLAS_HOME_DIR=

# Where log files are stored. Defatult is logs directory under the base install location
#export ATLAS_LOG_DIR=

# Where pid files are stored. Defatult is logs directory under the base install location
#export ATLAS_PID_DIR=

# where the atlas titan db data is stored. Defatult is logs/data directory under the base install location
#export ATLAS_DATA_DIR=

# Where do you want to expand the war file. By Default it is in /server/webapp dir under the base install dir.
#export ATLAS_EXPANDED_WEBAPP_DIR=

# indicates whether or not a local instance of HBase should be started for Atlas
export MANAGE_LOCAL_HBASE=false

# indicates whether or not a local instance of Solr should be started for Atlas
export MANAGE_LOCAL_SOLR=false

# indicates whether or not cassandra is the embedded backend for Atlas
export MANAGE_EMBEDDED_CASSANDRA=false

# indicates whether or not a local instance of Elasticsearch should be started for Atlas
export MANAGE_LOCAL_ELASTICSEARCH=false

4. Start the components Atlas depends on

Note: the required components must already be installed and configured on the machines: CentOS, JDK, Hadoop, MySQL, ZooKeeper, Kafka, HBase, Solr, etc.

Start Hadoop (HDFS)
Start YARN
Start ZooKeeper
Start Kafka
Start HBase
Start Solr (typical start commands are sketched below)
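The exact commands depend on how each component was installed; the following is only a sketch assuming standard tarball installs, with the Hadoop/ZooKeeper/HBase scripts on the PATH and the Kafka and Solr install paths matching the configuration above (all paths here are assumptions):

start-dfs.sh                                        # HDFS
start-yarn.sh                                       # YARN
zkServer.sh start                                   # ZooKeeper, on every ZK node
kafka-server-start.sh -daemon /opt/apps/kafka_2.11-2.4.1/config/server.properties   # Kafka, on every broker
start-hbase.sh                                      # HBase
sudo -i -u solr /opt/apps/solr-7.7.3/bin/solr start # Solr, on every Solr node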
Check the processes on each node to verify that everything is up:
=============== linux01 ===============
27842 NameNode
30035 HRegionServer
28646 NodeManager
29115 QuorumPeerMain
29531 Kafka
29851 HMaster
28510 ResourceManager
30591 jar
=============== linux02 ===============
12321 HRegionServer
12165 Kafka
11766 QuorumPeerMain
12679 jar
11080 SecondaryNameNode
11003 DataNode
11435 NodeManager
=============== linux03 ===============
5008 NodeManager
4833 DataNode
5601 Kafka
5766 HRegionServer
5209 QuorumPeerMain
6124 jar

5. Start Atlas locally from source

Driver class: org.apache.atlas.Atlas in the webapp module.
VM options:
-Datlas.home=D:\ext.wangwentao5\all_paojects\atlas-deploy2
-Datlas.conf=D:\ext.wangwentao5\all_paojects\atlas-deploy2\conf
-Datlas.data=D:\ext.wangwentao5\all_paojects\atlas-deploy2\data
-Datlas.log.dir=D:\ext.wangwentao5\all_paojects\atlas-deploy2\logs
-Dlog4j.configuration=atlas-log4j.xml
-Djava.net.preferIPv4Stack=true
Program arguments:
--port 22000
--app
D:\ext.wangwentao5\all_paojects\atlas-deploy2\webapp\atlas

Run the driver class to start the service...

Open the web UI at the port passed via --port above, i.e. http://localhost:22000 here (with the default atlas.server.http.port=21000 it would be http://localhost:21000).
The username and password are both admin.

This completes the development-environment deployment.


6. API call demo
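A couple of typical read-only calls against the locally started server, as a minimal sketch (the default admin/admin credentials are carried over from the configuration above; adjust the port to whatever the server listens on, e.g. 22000 if it was started with --port 22000 as above, and the hive_table type only returns results once Hive metadata has been imported, see section 4):

# Basic search for entities of a given type
curl -s -u admin:admin "http://localhost:21000/api/atlas/v2/search/basic?typeName=hive_table&limit=10"

# Fetch a single entity by GUID (replace <guid> with a GUID taken from the search result)
curl -s -u admin:admin "http://localhost:21000/api/atlas/v2/entity/guid/<guid>"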


3. Production Deployment

Prerequisites: the following are already installed on the cluster: JDK, Hadoop, MySQL, ZooKeeper, Kafka, HBase.

1. Install Solr

Run Solr as a dedicated user.


For safety, create a new user named solr and install and start Solr as that user.
Create the solr user: sudo useradd solr
Set the solr user's password to solr: echo solr | sudo passwd --stdin solr
Upload the Solr package and extract it.
Change the owner of the Solr installation to solr:
sudo chown -R solr:solr /opt/apps/solr-7.7.3/

Cluster install: edit the configuration, then distribute it to all nodes.

bin/solr.in.sh:
ZK_HOST="linux01:2181,linux02:2181,linux03:2181"
After distributing, fix the ownership on each node: sudo chown -R solr:solr /opt/apps/solr-7.7.3/

Start Solr:
Start it as the solr user, on every node:
sudo -i -u solr /opt/apps/solr-7.7.3/bin/solr start
Verify the startup:
You should see:  Started Solr server on port 8983 (pid=20045). Happy searching!
The web UI listens on port 8983.

2. Install Atlas

Atlas does not ship binary packages, only source, so you must build it yourself and install from the packages produced by the build.

The build produces many packages; the ones whose names contain "hook" are the connectors for integrating with other components.
Find the server package under distro/target, extract it, and install it.

3. Configure Atlas

3.1 Atlas and HBase

Atlas's data is stored in HBase.
Edit conf/atlas-application.properties.
The graph metadata is stored in HBase, and HBase's own metadata lives in ZooKeeper, so point Atlas at the ZooKeeper quorum:
atlas.graph.storage.hostname=linux01:2181,linux02:2181,linux03:2181
vi conf/atlas-env.sh
export HBASE_CONF_DIR=/opt/apps/hbase-2.0.5/conf

3.2 Atlas and Solr

Solr holds the search indexes over Atlas's graph data, e.g. for lineage queries.
Edit conf/atlas-application.properties so Atlas can locate Solr through ZooKeeper:
atlas.graph.index.search.solr.zookeeper-url=linux01:2181,linux02:2181,linux03:2181

Atlas writes its index data into Solr.
Create three collections in Solr:
Create vertex_index (3 shards, 2 replicas) - the vertex index:
sudo -i -u solr /opt/apps/solr-7.7.3/bin/solr create -c vertex_index -d /opt/apps/atlas-2.1.0/conf/solr -shards 3 -replicationFactor 2
Create edge_index - the edge index:
sudo -i -u solr /opt/apps/solr-7.7.3/bin/solr create -c edge_index -d /opt/apps/atlas-2.1.0/conf/solr -shards 3 -replicationFactor 2
Create fulltext_index - the full-text index:
sudo -i -u solr /opt/apps/solr-7.7.3/bin/solr create -c fulltext_index -d /opt/apps/atlas-2.1.0/conf/solr -shards 3 -replicationFactor 2

3.3 Atlas and Kafka

Incremental metadata sync uses Kafka as the messaging middleware.
Edit conf/atlas-application.properties:
atlas.notification.embedded=false
# Kafka data directory
atlas.kafka.data=/opt/apps/kafka_2.11-2.4.1/data
# ZooKeeper connect string for Kafka
atlas.kafka.zookeeper.connect=linux01:2181,linux02:2181,linux03:2181/kafka
# Kafka broker addresses
atlas.kafka.bootstrap.servers=linux01:9092,linux02:9092,linux03:9092

3.4 Atlas server

Edit conf/atlas-application.properties.
Set the Atlas REST address:
atlas.rest.address=http://linux01:21000
Do not run the setup steps on every start:
atlas.server.run.setup.on.start=false
Set the ZooKeeper quorum for the entity audit store:
atlas.audit.hbase.zookeeper.quorum=linux01:2181,linux02:2181,linux03:2181

4. Start Atlas

Start Hadoop

Start ZooKeeper

Start Kafka

Start HBase

Start Solr
On every node, as the solr user:
sudo -i -u solr /opt/apps/solr-7.7.3/bin/solr start


Once everything above is running, check the processes:
=============== linux01 ===============
27842 NameNode
30035 HRegionServer
28646 NodeManager
29115 QuorumPeerMain
29531 Kafka
29851 HMaster
28510 ResourceManager
30591 jar
=============== linux02 ===============
12321 HRegionServer
12165 Kafka
11766 QuorumPeerMain
12679 jar
11080 SecondaryNameNode
11003 DataNode
11435 NodeManager
=============== linux03 ===============
5008 NodeManager
4833 DataNode
5601 Kafka
5766 HRegionServer
5209 QuorumPeerMain
6124 jar

Only then can Atlas be started:
/opt/apps/atlas-2.1.0/bin/atlas_start.py
Startup is slow and keeps printing dots..........
When you see "Apache Atlas Server started!!!" Atlas has started successfully.
Even after that message, wait a little longer before the web UI becomes reachable.

Startup errors are logged under atlas/logs, in particular in application.log.
To stop Atlas: atlas_stop.py

Open the Atlas web UI:
linux01:21000
The initial username and password are both admin.
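To confirm the server is actually ready before opening the UI, you can watch the log and poll the admin status endpoint. A rough sketch, assuming the install path used above; the status endpoint is expected to report ACTIVE once startup has completed:

# Watch the startup log
tail -f /opt/apps/atlas-2.1.0/logs/application.log

# Poll the admin status endpoint until it answers
curl -s -u admin:admin http://linux01:21000/api/atlas/admin/status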

4. Demo: Syncing Hive Metadata into Atlas

4.1 Configure the Hive Hook

Prerequisites: Atlas is up and running; Hive is installed.

The source build produces hook packages for the various data sources under distro/target; extract and configure the hive-hook package.

Edit /opt/apps/atlas-2.1.0/conf/atlas-application.properties
and add:
#########  Hive Hook Configs  ########
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary

Edit /opt/apps/hive-3.1.2/conf/hive-site.xml to register the Hive Hook,
adding:
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>


Install the Hive hook:
Extract it under /opt/apps/software:
tar -zxvf apache-atlas-2.1.0-hive-hook.tar.gz
Go into the extracted apache-atlas-2.1.0-hive-hook directory and copy the two directories, hook and hook-bin, into the Atlas installation directory:
scp -r ./* /opt/apps/atlas-2.1.0/

Edit /opt/apps/hive-3.1.2/conf/hive-env.sh so that Hive can find the Atlas hook jars:
export HIVE_AUX_JARS_PATH=/opt/apps/atlas-2.1.0/hook/hive

Copy /opt/apps/atlas-2.1.0/conf/atlas-application.properties into /opt/apps/hive-3.1.2/conf/:
scp /opt/apps/atlas-2.1.0/conf/atlas-application.properties  /opt/apps/hive-3.1.2/conf/

Done.

4.2 Initial import of Hive metadata

Run the metadata import script from hook-bin:
/opt/apps/atlas-2.1.0/hook-bin/import-hive.sh
Enter the Atlas username and password when prompted.
When you see "Hive Meta Data imported successfully!!!" the import has succeeded.

From then on, Hive metadata changes are synced into Atlas incrementally and automatically; the initial import never needs to be run again.
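Two quick ways to sanity-check the pipeline, as a sketch (the hostnames, port and paths follow the cluster layout above): consume the ATLAS_HOOK topic to watch incremental hook messages arrive, and query the REST API for the imported Hive entities.

# Watch incremental hook notifications arriving on Kafka
/opt/apps/kafka_2.11-2.4.1/bin/kafka-console-consumer.sh --bootstrap-server linux01:9092 --topic ATLAS_HOOK

# Confirm Hive tables were imported into Atlas
curl -s -u admin:admin "http://linux01:21000/api/atlas/v2/search/basic?typeName=hive_table&limit=10"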

6. Source Code Analysis

6.1 Architecture Overview

[Atlas architecture diagram]

The architecture and the per-module descriptions are the same as in sections 1.3 and 1.4 above and are not repeated here.

6.2 Startup sequence

// Atlas.java: the driver class that starts the standalone Atlas server

 public static void main(String[] args) throws Exception {
        // parse the command-line arguments
        CommandLine cmd = parseArgs(args);
        // load this module's build-info properties file
        PropertiesConfiguration buildConfiguration = new PropertiesConfiguration("atlas-buildinfo.properties");
        String appPath = "webapp/target/atlas-webapp-" + getProjectVersion(buildConfiguration);

        if (cmd.hasOption(APP_PATH)) {
            appPath = cmd.getOptionValue(APP_PATH);
        }

        setApplicationHome();
        // load atlas-application.properties, Atlas's main configuration file
        Configuration configuration = ApplicationProperties.get();

        final String enableTLSFlag = configuration.getString(SecurityProperties.TLS_ENABLED);
        final String appHost = configuration.getString(SecurityProperties.BIND_ADDRESS, EmbeddedServer.ATLAS_DEFAULT_BIND_ADDRESS);

        if (!isLocalAddress(InetAddress.getByName(appHost))) {
            String msg =
                "Failed to start Atlas server. Address " + appHost
                    + " does not belong to this host. Correct configuration parameter: "
                    + SecurityProperties.BIND_ADDRESS;
            LOG.error(msg);
            throw new IOException(msg);
        }

        // determine the final application port; TLS defaults to false (disabled)
        final int appPort = getApplicationPort(cmd, enableTLSFlag, configuration);
        System.setProperty(AtlasConstants.SYSTEM_PROPERTY_APP_PORT, String.valueOf(appPort));
        final boolean enableTLS = isTLSEnabled(enableTLSFlag, appPort);
        configuration.setProperty(SecurityProperties.TLS_ENABLED, String.valueOf(enableTLS));

        showStartupInfo(buildConfiguration, enableTLS, appPort);

        // create the embedded Jetty server from the resolved parameters and finally start it with server.start()
        server = EmbeddedServer.newServer(appHost, appPort, appPath, enableTLS);
        installLogBridge();
        server.start();
    }

6.3 How source metadata is imported into Atlas

Using Hive metadata as the example:

Configure the Hive Hook, then run the initial import script import-hive.sh.
The main class invoked by the script:
${JAVA_BIN}" ${JAVA_PROPERTIES} -cp "${CP}" org.apache.atlas.hive.bridge.HiveMetaStoreBridge 
The main call chain (importHiveMetadata drives the whole import):
importHiveMetadata
  └── importDatabases
      └── importTables
// import all tables of a database
private int importTables(AtlasEntity dbEntity, String databaseName, String tblName, final boolean failOnError) throws Exception {
        int tablesImported = 0;

        final List<String> tableNames;

        if (StringUtils.isEmpty(tblName)) {
            tableNames = hiveClient.getAllTables(databaseName);
        } else {
            tableNames = hiveClient.getTablesByPattern(databaseName, tblName);
        }

        if(!CollectionUtils.isEmpty(tableNames)) {
            LOG.info("Found {} tables to import in database {}", tableNames.size(), databaseName);

            try {
                for (String tableName : tableNames) {
                    int imported = importTable(dbEntity, databaseName, tableName, failOnError);
                  
//
// importTable(dbEntity, databaseName, tableName, failOnError) is called for each table.
// Inside importTable the table is registered as an entity instance, i.e. the metadata is typed and classified:
//   registerInstances(createTableProcess);
///
AtlasEntityWithExtInfo  ret  = null;
        EntityMutationResponse  response  = atlasClientV2.createEntity(entity);
        List<AtlasEntityHeader> createdEntities = response.getEntitiesByOperation(EntityMutations.EntityOperation.CREATE);
        if (CollectionUtils.isNotEmpty(createdEntities)) {
            for (AtlasEntityHeader createdEntity : createdEntities) {
                if (ret == null) {
                    ret = atlasClientV2.getEntityByGuid(createdEntity.getGuid());
LOG.info("Created {} entity: name={}, guid={}", ret.getEntity().getTypeName(), ret.getEntity().getAttribute(ATTRIBUTE_QUALIFIED_NAME), ret.getEntity().getGuid());

The client module defines many HTTP API endpoints, for example:
new API_V2(TYPEDEF_BY_NAME, HttpMethod.GET, Response.Status.OK);
new API_V2(TYPEDEF_BY_GUID, HttpMethod.GET, Response.Status.OK);
new API_V2(TYPEDEFS_API, HttpMethod.GET, Response.Status.OK);
new API_V2(TYPEDEFS_API + "headers", HttpMethod.GET, Response.Status.OK);
new API_V2(TYPEDEFS_API, HttpMethod.POST, Response.Status.OK);
new API_V2(TYPEDEFS_API, HttpMethod.PUT, Response.Status.OK);
new API_V2(TYPEDEFS_API, HttpMethod.DELETE, Response.Status.NO_CONTENT);
new API_V2(TYPEDEF_BY_NAME, HttpMethod.DELETE, Response.Status.NO_CONTENT);
                  
The database and table entities are pushed and updated into Atlas through HTTP POST requests.
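The same thing can be done by hand against the entity REST endpoint. The sketch below registers a bare hive_db entity as a minimal illustration only: the demo_db/primary values are made up, the required attributes depend on the hive_db type definition, and the hook normally sends far richer payloads:

# Create (or update) an entity via the v2 REST API
curl -s -u admin:admin -X POST -H "Content-Type: application/json" \
  http://linux01:21000/api/atlas/v2/entity \
  -d '{
        "entity": {
          "typeName": "hive_db",
          "attributes": {
            "qualifiedName": "demo_db@primary",
            "name": "demo_db",
            "clusterName": "primary"
          }
        }
      }'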
Incremental sync is done through Kafka.
For every Hive operation the HiveHook's run method is invoked:
public void run(HookContext hookContext) throws Exception {
        if (LOG.isDebugEnabled()) {
            LOG.debug("==> HiveHook.run({})", hookContext.getOperationName());
        }

        try {
            // process the hook context information
            HiveOperation        oper    = OPERATION_MAP.get(hookContext.getOperationName());
            AtlasHiveHookContext context = new AtlasHiveHookContext(this, oper, hookContext, getKnownObjects(), isSkipTempTables());
            BaseHiveEvent        event   = null;

            // dispatch to a different event handler depending on the operation type
            switch (oper) {
                case CREATEDATABASE:
                    event = new CreateDatabase(context);
                break;

                case DROPDATABASE:
                    event = new DropDatabase(context);

7. Classification Propagation (category propagate)

Classifications can propagate along lineage.
For example, given a + b => q:
1. When an upstream (parent) entity is tagged, the downstream (child) entities related to it through lineage are automatically tagged as well.
   Such inherited tags are marked as propagated classifications.
   e.g. a-1, b-1  =>  q gets 1
   e.g. a-1, b-2  =>  q gets 1 and 2
2. When an upstream classification is removed, the propagated tag remains on the downstream entity as long as at least one lineage path still carries it; only once the tag has been removed from all upstream entities is it cleared downstream automatically.
   e.g. a-1, b-1 => q-1:   remove a-1 => q keeps 1; remove b-1 => q keeps 1; remove both a-1 and b-1 => q loses the tag
   e.g. a-1, b-2 => q-1,2: remove a-1 => q keeps 2; remove b-2 => q keeps 1; remove both => q loses both tags
3. When the lineage relationship between the downstream and upstream entities is broken, the propagated classifications on the downstream entity are cleared as well.
A sketch of the REST call that attaches a propagating classification follows these rules.
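Propagation is controlled by the propagate flag when a classification is attached over the REST API. A rough sketch (the GUID and the PII classification name are placeholders, and the classification type must already exist):

# Attach a classification with propagation enabled to the entity identified by <guid>
curl -s -u admin:admin -X POST -H "Content-Type: application/json" \
  http://linux01:21000/api/atlas/v2/entity/guid/<guid>/classifications \
  -d '[ { "typeName": "PII", "propagate": true } ]'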

8. Access Control

8.1 Login authentication

1. File based:
Configure in atlas-application.properties:
atlas.authentication.method.file=true
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties
Then define the usernames and passwords in users-credentials.properties (the file format is sketched after the LDAP settings below).

2. LDAP based:
Configure in atlas-application.properties:
atlas.authentication.method.ldap=true
atlas.authentication.method.ldap.type=ldap
and then set these properties:
atlas.authentication.method.ldap.url=ldap://<Ldap server ip>:389
atlas.authentication.method.ldap.userDNpattern=uid={0},ou=users,dc=example,dc=com
atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
atlas.authentication.method.ldap.groupSearchFilter=(member=cn={0},ou=users,dc=example,dc=com)
atlas.authentication.method.ldap.groupRoleAttribute=cn
atlas.authentication.method.ldap.base.dn=dc=example,dc=com
atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
atlas.authentication.method.ldap.bind.password=<password>
atlas.authentication.method.ldap.referral=ignore
atlas.authentication.method.ldap.user.searchfilter=(uid={0})
atlas.authentication.method.ldap.default.role=ROLE_USER
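As referenced under the file-based option above, each line of users-credentials.properties has the form username=GROUP::sha256(password). A small sketch of creating an entry; the user name, group and password here are examples only:

# Compute the SHA-256 hash of the chosen password
echo -n "atlas123" | sha256sum

# Then add a line like the following to conf/users-credentials.properties:
#   dataops=DATA_STEWARD::<sha256-hash-from-above>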

8.2 Authorization

1. Simple mode
Configure in atlas-application.properties:
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=/atlas/conf/atlas-simple-authz-policy.json
Authorization is then driven by the policy file atlas-simple-authz-policy.json:
{
  "roles": {
    "ROLE_ADMIN": {
      "adminPermissions": [
        {
          "privileges": [ ".*" ]
        }
      ],
      "typePermissions": [
        {
          "privileges":     [ ".*" ],
          "typeCategories": [ ".*" ],
          "typeNames":      [ ".*" ]
        }
      ],
      "entityPermissions": [
        {
          "privileges":            [ ".*" ],
          "entityTypes":           [ ".*" ],
          "entityIds":             [ ".*" ],
          "entityClassifications": [ ".*" ],
          "labels":                [ ".*" ],
          "businessMetadata":      [ ".*" ],
          "attributes":            [ ".*" ],
          "classifications":       [ ".*" ]
        }
      ],
      "relationshipPermissions": [
        {
          "privileges":     [ ".*" ],
          "relationshipTypes": [ ".*" ],
          "end1EntityType":           [ ".*" ],
          "end1EntityId":             [ ".*" ],
          "end1EntityClassification": [ ".*" ],
          "end2EntityType":           [ ".*" ],
          "end2EntityId":             [ ".*" ],
          "end2EntityClassification": [ ".*" ]
        }
      ]
    },

    "DATA_SCIENTIST": {
      "entityPermissions": [
        {
          "privileges":            [ "entity-read", "entity-read-classification" ],
          "entityTypes":           [ ".*" ],
          "entityIds":             [ ".*" ],
          "entityClassifications": [ ".*" ],
          "labels":                [ ".*" ],
          "businessMetadata":      [ ".*" ],
          "attributes":            [ ".*" ]
        }
      ]
    },

    "DATA_STEWARD": {
      "entityPermissions": [
        {
          "privileges":            [ "entity-read", "entity-create", "entity-update", "entity-read-classification", "entity-add-classification", "entity-update-classification", "entity-remove-classification" ],
          "entityTypes":           [ ".*" ],
          "entityIds":             [ ".*" ],
          "entityClassifications": [ ".*" ],
          "labels":                [ ".*" ],
          "businessMetadata":      [ ".*" ],
          "attributes":            [ ".*" ],
          "classifications":       [ ".*" ]
        }
      ],
      "relationshipPermissions": [
        {
          "privileges":               [ "add-relationship", "update-relationship", "remove-relationship" ],
          "relationshipTypes":        [ ".*" ],
          "end1EntityType":           [ ".*" ],
          "end1EntityId":             [ ".*" ],
          "end1EntityClassification": [ ".*" ],
          "end2EntityType":           [ ".*" ],
          "end2EntityId":             [ ".*" ],
          "end2EntityClassification": [ ".*" ]
        }
      ]
    }
  },

  "userRoles": {
    "admin": [ "ROLE_ADMIN" ],
    "rangertagsync": [ "DATA_SCIENTIST" ]
  },

  "groupRoles": {
    "ROLE_ADMIN":      [ "ROLE_ADMIN" ],
    "hadoop":          [ "DATA_STEWARD" ],
    "DATA_STEWARD":    [ "DATA_STEWARD" ],
    "RANGER_TAG_SYNC": [ "DATA_SCIENTIST" ]
  }
}

2. Ranger mode
This mode requires installing Ranger and integrating it with Atlas.
See the official documentation for details: https://atlas.apache.org/index.html#/AtlasRangerAuthorizer
For a step-by-step walkthrough see: https://blog.csdn.net/weixin_41907245/article/details/125163861#:~:text=Atlas%E6%9C%AC%E8%BA%AB%E6%9C%89,.json%E6%96%87%E4%BB%B6%E4%B8%AD


The following files need to be configured:
ranger-atlas-audit.xml
ranger-atlas-security.xml
ranger-policymgr-ssl.xml
ranger-security.xml

Apache Ranger's authorization policy model for Apache Atlas supports three resource hierarchies to control access to: types, entities, and admin operations.
See the official documentation for the details.