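1. Background
Apache Atlas is an important data-governance component. It can automatically collect and process data, and it generates metadata and lineage from it. Under the hood, Apache Atlas is a service built on top of the JanusGraph graph database. In production, JanusGraph usually depends on an external storage backend (HBase/Cassandra/Bigtable/BerkeleyDB) and an external index backend (Elasticsearch/Solr/Lucene) for stability and performance.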
JanusGraph is configured differently from most services. Its options fall into five types:
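Config type | Scope |
---|---|
FIXED | Fixed. Once an instance has started, the value is persisted in the storage backend and can never be changed |
GLOBAL | Global. Once an instance has started, the value is persisted in the storage backend. It can be changed through the management API, and the change applies to all instances |
GLOBAL_OFFLINE | Like GLOBAL, but changing it requires keeping exactly one instance running and stopping all others. After the change, restart all instances |
MASKABLE | Maskable. A local value overrides the global one. If no local value is set, the global value is used |
LOCAL | Applies to a single instance only. It is read from the configuration file every time the instance starts |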
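The resolution rules in the table can be condensed into a small model. This is a simplified sketch derived only from the table, not JanusGraph's actual code. `ConfigType` and `effective_value` are hypothetical names. The model shows why editing the local file does not change a GLOBAL_OFFLINE option once a value has already been stored.

```python
from enum import Enum
from typing import Optional


class ConfigType(Enum):
    FIXED = "FIXED"
    GLOBAL = "GLOBAL"
    GLOBAL_OFFLINE = "GLOBAL_OFFLINE"
    MASKABLE = "MASKABLE"
    LOCAL = "LOCAL"


def effective_value(kind: ConfigType,
                    local: Optional[str],
                    stored: Optional[str]) -> Optional[str]:
    """Hypothetical, simplified model of which value an instance ends up using.

    local:  value in this instance's properties file, or None
    stored: value already persisted in the storage backend, or None
    """
    if kind is ConfigType.LOCAL:
        # Always taken from the local file, re-read on every start.
        return local
    if kind is ConfigType.MASKABLE:
        # A local value masks the global one; otherwise fall back to it.
        return local if local is not None else stored
    # FIXED / GLOBAL / GLOBAL_OFFLINE: the first start persists the local
    # value; afterwards the stored value wins and local edits are ignored.
    return stored if stored is not None else local
```

For example, `effective_value(ConfigType.GLOBAL_OFFLINE, "elasticsearch", "solr")` returns `"solr"`. This matches the Ambari situation described below: the Atlas properties file says one thing, but the stored value still wins.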
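When Atlas is installed through Ambari, Solr is the default index backend, and the index backend setting is of type GLOBAL_OFFLINE. Once the Atlas service has started, switching to a different index backend is therefore not easy.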
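The method in this article only suits scenarios where metadata accuracy is not critical, because it requires dropping and rebuilding the database. The versions used are:
- Apache Atlas: 2.1.0
- JanusGraph: 0.5.1
- Elasticsearch: 6.8.4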
2. Attempting to change the configuration (failed)
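At first I planned to follow the official JanusGraph documentation and change the setting through the management API. However, Atlas runs JanusGraph in embedded mode. Embedded JanusGraph does not provide a standalone server interface, so the GLOBAL_OFFLINE option that selects the index backend, index.search.backend, cannot be changed through the management API.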
Reference: How to connect to embedded inmemory Janusgraph via Gremlin Console
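The index backend option index.search.backend cannot be changed through the management API. Other GLOBAL_OFFLINE options, however, can still be changed in another way. Presumably the options are handled differently internally, which would explain why some of them can be changed.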
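- Download and unpack the native JanusGraph-0.5.1 distribution
Download janusgraph-0.5.1.zip from the official website, upload it to the Atlas server and unpack it. In this article it is unpacked into the /data/Install/JanusGraphInstall directory: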
/data/Install/JanusGraphInstall/janusgraph-0.5.1/
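- Set environment variables
# JanusGraph is built on the Gremlin engine, so DEBUG logging is enabled through the GREMLIN_LOG_LEVEL environment variable
export GREMLIN_LOG_LEVEL=DEBUG
# When enabled, the classpath is printed, which shows how the jars are loaded
export SCRIPT_DEBUG=y
# Set the Hadoop installation directory explicitly to avoid errors in the log
export HADOOP_HOME=/usr/hdp/3.1.0.0-78/hadoop
# Key step! Set the Hadoop and HBase config directories explicitly. JanusGraph reads the HBase configuration from the classpath; without it, connecting to HBase fails because of incorrect settings
export CLASSPATH=/etc/hadoop/conf:/etc/hbase/conf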
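Because the HBase connection depends on finding the site files through CLASSPATH, a quick pre-flight check can catch a bad classpath before a connection attempt fails. The following is a minimal sketch, and its function name is hypothetical. It assumes that core-site.xml lives in the Hadoop config directory and hbase-site.xml in the HBase config directory, as in a typical HDP layout.

```python
import os
from typing import Dict, List, Sequence

# Site files the Hadoop/HBase client libraries load from the classpath.
REQUIRED: Sequence[str] = ("core-site.xml", "hbase-site.xml")


def find_site_files(classpath: str,
                    required: Sequence[str] = REQUIRED) -> Dict[str, List[str]]:
    """Map each required file name to the classpath directories containing it."""
    found: Dict[str, List[str]] = {name: [] for name in required}
    for entry in classpath.split(os.pathsep):
        # Skip empty entries, jar files and directories that do not exist.
        if not entry or not os.path.isdir(entry):
            continue
        for name in required:
            if os.path.isfile(os.path.join(entry, name)):
                found[name].append(entry)
    return found


if __name__ == "__main__":
    # Report where each file was found in the current CLASSPATH.
    for name, dirs in find_site_files(os.environ.get("CLASSPATH", "")).items():
        print(f"{name}: {', '.join(dirs) if dirs else 'NOT FOUND'}")
```

Run it in the same shell after the `export` lines. A `NOT FOUND` entry means JanusGraph will not see that configuration.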
- Configure the backends
The conf directory of JanusGraph contains a configuration file, janusgraph-hbase-solr.properties, with the most basic options. Change them to match the backends that Atlas uses:
# HBase settings
storage.hbase.table=atlas_janus
storage.hostname=hdtest01.dev.com,hdtest02.dev.com,hdtest03.dev.com
# Solr settings
index.search.solr.mode=cloud
index.search.solr.zookeeper-url=hdtest01.dev.com:2181/infra-solr,hdtest02.dev.com:2181/infra-solr,hdtest03.dev.com:2181/infra-solr
Set the global log level in logback.xml:
<root level="DEBUG">
<appender-ref ref="STDOUT" />
</root>
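The values in janusgraph-hbase-solr.properties should match what Atlas itself passes to JanusGraph. The following is a minimal sketch. It assumes that Atlas forwards the `atlas.graph.*` entries of atlas-application.properties to JanusGraph with the `atlas.graph.` prefix removed, which is how Atlas's graph module appears to work. The parser is deliberately simplified: it handles key=value lines only, with no line continuations or escapes. The function names are hypothetical.

```python
from typing import Dict

# Assumption: Atlas hands "atlas.graph.*" settings to JanusGraph without the prefix.
ATLAS_GRAPH_PREFIX = "atlas.graph."


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal .properties parser: key=value lines, '#' or '!' comments.
    Line continuations, escapes and ':' separators are not handled."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def to_janusgraph(atlas_props: Dict[str, str]) -> Dict[str, str]:
    """Keep only atlas.graph.* entries and strip the prefix."""
    return {
        key[len(ATLAS_GRAPH_PREFIX):]: value
        for key, value in atlas_props.items()
        if key.startswith(ATLAS_GRAPH_PREFIX)
    }
```

Calling `to_janusgraph(parse_properties(open(path).read()))` on the Atlas properties file lists the keys to copy, for example `storage.hostname` and `index.search.solr.zookeeper-url`.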
- Start the client
# Client start command; it prints logs while starting
$ bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports
gremlin>
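- Connect to the backends
Atlas does not expose JanusGraph's server interface. The Gremlin client, however, can connect directly to the storage and index backends and query or modify the data.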
gremlin> graph = JanusGraphFactory.open("/data/Install/JanusGraphInstall/janusgraph-0.5.1/conf/janusgraph-hbase-solr.properties")
- Open the management API
# Open the management API
gremlin> mgmt = graph.openManagement()
# Read the index backend setting
gremlin> mgmt.get('index.search.backend')
# Read the cache duration setting
gremlin> mgmt.get('cache.db-cache-time')
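- Change the configuration
First, force-close every inactive instance other than the one currently running. Embedded JanusGraph does not remove its instance registration when it shuts down, which is arguably a bug. Because only one instance may remain open while a GLOBAL_OFFLINE option is changed, all the other instances have to be force-closed.
# Reopen the management API so that it does not time out and close
gremlin> mgmt = graph.openManagement()
# List all JanusGraph (Atlas) instances
gremlin> open = mgmt.getOpenInstances()
# Close every instance except the one marked "(current)", i.e. this console
gremlin> for (String inst : open) { if (!inst.endsWith("(current)")) mgmt.forceCloseInstance(inst) }
# Key step! Changes only take effect after they are committed
gremlin> mgmt.commit()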
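Change the GLOBAL_OFFLINE options
# Reopen the management API, because commit closes it automatically
gremlin> mgmt = graph.openManagement()
# Change the cache time
gremlin> mgmt.set('cache.db-cache-time', 1800000)
# Try to change the index backend (this does not actually take effect)
gremlin> mgmt.set('index.search.backend', 'elasticsearch')
# Key step! Changes only take effect after they are committed
gremlin> mgmt.commit()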
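Reading the configuration again showed that only cache.db-cache-time had changed; index.search.backend was still Solr. According to the official documentation, when you are connected to a JanusGraph instance, the single remaining instance should restart automatically at this point. Atlas did not restart, probably because it runs JanusGraph in embedded mode.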
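3. Drop and rebuild
Changing the configuration does not work for now, so the only option left was to drop and rebuild. This was acceptable here because the production Atlas was not yet widely used and served only as a metadata query service, so dropping and rebuilding had little impact on the live service. In other scenarios, evaluate the impact carefully before doing this.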
3.1 Drop the HBase table
Proceed with caution! Do not drop the wrong table!
# Log in to Kerberos again
$ kinit -kt your.keytab username
# Open the HBase shell
$ hbase shell
hbase(main)> disable 'atlas_janus'
hbase(main)> drop 'atlas_janus'
3.2 Update the Atlas configuration (atlas-application.properties)
# Graph Search Index
atlas.graph.index.search.backend=elasticsearch
# ElasticSearch support (Tech Preview)
atlas.graph.index.search.hostname=your_es_host:9200
atlas.graph.index.search.elasticsearch.client-only=true
# Authentication type to be used for HTTP(S) access.
atlas.graph.index.search.elasticsearch.http.auth.type=basic
# Username for HTTP(S) authentication.
atlas.graph.index.search.elasticsearch.http.auth.basic.username=your_name
# Password for HTTP(S) authentication.
atlas.graph.index.search.elasticsearch.http.auth.basic.password=your_pwd
3.3 Replace the Elasticsearch client jars used by Atlas
Replace this jar:
$ATLAS_HOME_DIR/server/webapp/atlas/WEB-INF/lib/elasticsearch-rest-client-5.6.4.jar
->
$ATLAS_HOME_DIR/server/webapp/atlas/WEB-INF/lib/elasticsearch-rest-client-7.6.1.jar
Delete this jar:
$ATLAS_HOME_DIR/server/webapp/atlas/WEB-INF/lib/elasticsearch-rest-high-level-client-5.6.4.jar
Reason:
Atlas bundles the ES 5.6.4 client, but JanusGraph uses ES 7.6.1. The mismatch causes an error when the ES client is initialized:
java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.es.ElasticSearchIndex
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:64)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:440)
at org.janusgraph.diskstorage.Backend.getIndexes(Backend.java:427)
at org.janusgraph.diskstorage.Backend.<init>(Backend.java:150)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1359)
at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:146)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:161)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:132)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:112)
at org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase.initJanusGraph(AtlasJanusGraphDatabase.java:182)
at org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase.getGraphInstance(AtlasJanusGraphDatabase.java:169)
at org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase.getGraph(AtlasJanusGraphDatabase.java:276)
at org.apache.atlas.repository.graph.AtlasGraphProvider.getGraphInstance(AtlasGraphProvider.java:52)
at org.apache.atlas.repository.graph.AtlasGraphProvider.retry(AtlasGraphProvider.java:114)
at org.apache.atlas.repository.graph.AtlasGraphProvider.get(AtlasGraphProvider.java:102)
at org.apache.atlas.repository.graph.AtlasGraphProvider$$EnhancerBySpringCGLIB$$7ed677f6.CGLIB$get$0(<generated>)
at org.apache.atlas.repository.graph.AtlasGraphProvider$$EnhancerBySpringCGLIB$$7ed677f6$$FastClassBySpringCGLIB$$f588a8aa.invoke(<generated>)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:58)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.janusgraph.diskstorage.es.rest.RestElasticSearchClient
at org.janusgraph.diskstorage.es.rest.RestClientSetup.getElasticSearchClient(RestClientSetup.java:107)
at org.janusgraph.diskstorage.es.rest.RestClientSetup.connect(RestClientSetup.java:75)
at org.janusgraph.diskstorage.es.ElasticSearchSetup$1.connect(ElasticSearchSetup.java:51)
at org.janusgraph.diskstorage.es.ElasticSearchIndex.interfaceConfiguration(ElasticSearchIndex.java:437)
at org.janusgraph.diskstorage.es.ElasticSearchIndex.<init>(ElasticSearchIndex.java:324)
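The root cause is that Atlas connects to ES through JanusGraph's ES client, RestElasticSearchClient, which creates a Request object. In es-5.6.4 the constructor of this class takes 4 parameters, whereas JanusGraph (built against es-7.6.1) calls it with only 2. The call sits in a static field initializer, so the mismatch breaks class initialization. That is why the stack trace reports "NoClassDefFoundError: Could not initialize class" rather than a missing-constructor error: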
private static final Request INFO_REQUEST = new Request(REQUEST_TYPE_GET, REQUEST_SEPARATOR);
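Before restarting, it can help to confirm that the lib directory really contains only the intended client jar. The following is a minimal sketch with hypothetical function names. The expected version 7.6.1 and the jar names come from this section, and the regex assumes the usual artifact-version.jar naming.

```python
import re
from typing import Dict, Iterable, List

# Matches names such as elasticsearch-rest-client-5.6.4.jar and
# elasticsearch-rest-high-level-client-5.6.4.jar (artifact-version.jar).
JAR_RE = re.compile(r"^(elasticsearch(?:-[a-z-]+)?)-(\d+(?:\.\d+)*)\.jar$")


def es_jars(file_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group Elasticsearch jar versions by artifact name."""
    jars: Dict[str, List[str]] = {}
    for name in sorted(file_names):
        match = JAR_RE.match(name)
        if match:
            jars.setdefault(match.group(1), []).append(match.group(2))
    return jars


def problems(jars: Dict[str, List[str]], expected: str = "7.6.1") -> List[str]:
    """List the remaining steps from section 3.3 (empty list = done)."""
    steps: List[str] = []
    for version in jars.get("elasticsearch-rest-high-level-client", []):
        steps.append(f"delete elasticsearch-rest-high-level-client-{version}.jar")
    client_versions = jars.get("elasticsearch-rest-client", [])
    for version in client_versions:
        if version != expected:
            steps.append(f"delete elasticsearch-rest-client-{version}.jar")
    if expected not in client_versions:
        steps.append(f"add elasticsearch-rest-client-{expected}.jar")
    return steps
```

Run it with `problems(es_jars(os.listdir(lib_dir)))`, where `lib_dir` is `$ATLAS_HOME_DIR/server/webapp/atlas/WEB-INF/lib`. Keeping every version in a list also catches the case where 7.6.1 was copied in but 5.6.4 was never removed.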
3.4 Restart Atlas
bin/atlas_stop.py
bin/atlas_start.py
4. Grant ES index permissions
If the log shows a 401 Unauthorized error when connecting to ES:
org.elasticsearch.client.ResponseException: method [GET], host [http://mpes.sit.com:9200], URI [/], status line [HTTP/1.1 401 Unauthorized]
ask the ES administrator to grant access to the following APIs:
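- Root endpoint /: used to get the ES version
- /_cluster/health: used to check ES availability
- /_scripts/janusgraph-add, /_scripts/janusgraph-del: the scripts used for automatic add/delete operations
- /janusgraph_edge_index, /janusgraph_vertex_index, /janusgraph_fulltext_index: the indices
- /_bulk: the bulk API
- /_aliases: the alias API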
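To find out which of these endpoints the account cannot reach, you can probe them with the same Basic credentials configured in `atlas.graph.index.search.elasticsearch.http.auth.basic.*`. The following is a minimal sketch, not an official tool, and the function names are hypothetical. It sends read-only GET requests, treats 401/403 as missing permission and treats 404 as an index or script that JanusGraph has probably not created yet. It leaves out /_bulk because that API only accepts POST/PUT.

```python
import base64
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Read-only GET probes for the endpoints listed above (/_bulk omitted).
PROBE_PATHS: List[str] = [
    "/",
    "/_cluster/health",
    "/_scripts/janusgraph-add",
    "/_scripts/janusgraph-del",
    "/janusgraph_vertex_index",
    "/janusgraph_edge_index",
    "/janusgraph_fulltext_index",
    "/_aliases",
]


def basic_auth(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def http_get_status(url: str, auth: str) -> int:
    """Send a GET and return the HTTP status code, including error codes."""
    req = urllib.request.Request(url, headers={"Authorization": auth})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def probe(base_url: str, auth: str,
          get_status: Callable[[str, str], int] = http_get_status) -> Dict[str, str]:
    """Classify each endpoint by the status code it returns."""
    report: Dict[str, str] = {}
    for path in PROBE_PATHS:
        status = get_status(base_url.rstrip("/") + path, auth)
        if status in (401, 403):
            report[path] = f"{status}: ask the ES admin for access"
        elif status == 404:
            report[path] = "404: not created yet (access probably OK)"
        else:
            report[path] = str(status)
    return report
```

A typical call is `probe("http://your_es_host:9200", basic_auth("your_name", "your_pwd"))`. The `get_status` parameter lets you swap in a different HTTP client, or a stub for testing.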
5. Update the ES index mapping
curl --location --request PUT 'your_es_host:9200/janusgraph_vertex_index/_mapping/vertex_index' \
--header 'Authorization: Basic your_basic_auth' \
--header 'Content-Type: application/json' \
--data-raw '{
"properties": {
"Referenceable": {
"properties": {
"qualifiedName": {
"type": "text",
"fielddata": true,
"copy_to": [
"all"
]
}
}
}
}
}'
Reason:
When the Atlas REST API is used to search Hive table names with the results sorted by qualifiedName, ES returns an error:
Caused by: org.elasticsearch.client.ResponseException: method [POST], host [http://mpes.sit.com:9200], URI [/janusgraph_vertex_index/_search], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [Referenceable.qualifiedName] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"janusgraph_vertex_index","node":"5Hmgs997TE-38add2XDa0g","reason":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [Referenceable.qualifiedName] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [Referenceable.qualifiedName] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.","caused_by":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [Referenceable.qualifiedName] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}},"status":400}
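The reason is that ES disables fielddata on text fields by default. A text field is analyzed into individual terms, so it has no single per-document value that can be sorted or aggregated on. Fielddata builds those values in heap memory by uninverting the inverted index, which, as the error message notes, can use significant memory.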
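The mapping update in section 5 can also be scripted, for example when the index has to be rebuilt again. The following is a minimal sketch that uses only the Python standard library, and the function name is hypothetical. It builds the same request as the curl command, using the ES 6.x typed mapping URL, but does not send it.

```python
import base64
import json
import urllib.request

# Same body as the curl command: enable fielddata on
# Referenceable.qualifiedName and keep copying it into the "all" field.
MAPPING = {
    "properties": {
        "Referenceable": {
            "properties": {
                "qualifiedName": {
                    "type": "text",
                    "fielddata": True,
                    "copy_to": ["all"],
                }
            }
        }
    }
}


def build_mapping_request(es_url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the PUT _mapping request for janusgraph_vertex_index."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/janusgraph_vertex_index/_mapping/vertex_index",
        data=json.dumps(MAPPING).encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the result to `urllib.request.urlopen(...)` sends it. ES should reply with `{"acknowledged":true}` when the mapping is accepted.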