Contents
Ways to create DataHub domains
- Create in the UI
- Create with code
- GraphQL
- Curl
- Python
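Creating DataHub domains
1. UI
Open the Domains page: in the top-right corner choose Govern -> Domains, then click New Domain to create a business domain.
Fill in the name, the parent domain, the id, and so on.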
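2. Creating with code
2.1 GraphQL
Run the mutation in DataHub's GraphQL interface.
The code below creates a single entity per run. Note that this sample is the createTag mutation; DataHub's GraphQL API exposes a createDomain mutation for domains that is called in the same way. The interface does not seem to support running several operations at once, so I dropped this approach for bulk import.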
mutation createTag {
  createTag(input:
    {
      name: "Deprecated",
      id: "deprecated",
      description: "Having this tag means this column or table is deprecated."
    })
}
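Because the interface runs one operation at a time, a script can send the mutation once per domain instead. The following is a minimal sketch, not a tested integration: the `createDomain` mutation and the `id`, `name`, `description`, and `parentDomain` input fields reflect my understanding of DataHub's GraphQL schema and should be checked against your version; the endpoint URL, the optional token, and the helper names are assumptions made for illustration.

```python
import json
import urllib.request

# Assumed shape of the mutation; verify against your DataHub GraphQL schema.
CREATE_DOMAIN_MUTATION = """
mutation createDomain($input: CreateDomainInput!) {
  createDomain(input: $input)
}
"""


def build_create_domain_payload(domain_id, name, description="", parent_urn=None):
    """Return the JSON-serialisable request body for one createDomain call."""
    domain_input = {"id": domain_id, "name": name, "description": description}
    if parent_urn is not None:
        # Only send parentDomain for nested domains.
        domain_input["parentDomain"] = parent_urn
    return {"query": CREATE_DOMAIN_MUTATION, "variables": {"input": domain_input}}


def post_graphql(payload, url="http://localhost:8080/api/graphql", token=None):
    """POST one payload to the (assumed) GraphQL endpoint and return the parsed reply."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `post_graphql(build_create_domain_payload(...))` in a loop over the spreadsheet rows would then give bulk creation through GraphQL; I have not run this against a live instance.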
2.2 Curl
Curl: not tried.
2.3 Python
Python reads the Excel file that holds the business domains to import. Its format is as follows:
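Judging only from the column names that the scripts in this section read, the sheet presumably carries one id column ("<level>编码") and one name column per hierarchy level. The sketch below spells out that layout and checks a header against it. The sample codes and the lower-level names are invented for illustration, and `expected_columns` and `missing_columns` are hypothetical helpers, not part of any library.

```python
# Hierarchy levels used by the scripts, from top to bottom.
LEVELS = ["业务域", "分域", "业务子域", "关键业务"]


def expected_columns(levels=LEVELS):
    """Each level contributes an id column ("<level>编码") and a name column."""
    columns = []
    for level in levels:
        columns.extend([level + "编码", level])
    return columns


def missing_columns(header, levels=LEVELS):
    """Return the expected columns that are absent from a sheet header."""
    return [column for column in expected_columns(levels) if column not in header]


# Hypothetical sample row: only the top-level name comes from the post,
# every code and lower-level name is made up.
sample_row = {
    "业务域编码": "D01", "业务域": "电子商务与客户关系管理",
    "分域编码": "D0101", "分域": "示例分域",
    "业务子域编码": "D010101", "业务子域": "示例业务子域",
    "关键业务编码": "D01010101", "关键业务": "示例关键业务",
}
```

With pandas, `missing_columns(list(data.columns))` run right after `pd.read_excel` would flag a mismatched sheet before anything is sent to DataHub.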
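- Two-level creation
With the parent business domain "电子商务与客户关系管理" (E-commerce and Customer Relationship Management) already created in the UI, using the id from the spreadsheet as its custom id, this script creates the sub-domains under it. (This step only explores whether bulk creation is feasible, which is why it relies on the parent having been created in the UI; the final script does not need that.)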
import logging

import pandas as pd
from datahub.emitter.mce_builder import make_domain_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, DomainPropertiesClass

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Create the emitter once instead of once per row.
rest_emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

data = pd.read_excel('domains.xlsx', sheet_name='2.业务域')
# Keep the rows under the parent domain and de-duplicate the (parent, sub-domain) pairs.
df = data[data.业务域 == '电子商务与客户关系管理'][['业务域编码', '业务域', '分域编码', '分域']].drop_duplicates().reset_index(drop=True)

for i in range(len(df)):
    pid = str(df.loc[i, '业务域编码'])  # parent domain id
    sid = str(df.loc[i, '分域编码'])  # sub-domain id
    sname = df.loc[i, '分域']  # sub-domain name
    purn = make_domain_urn(pid)
    domain_urn = make_domain_urn(sid)
    domain_properties_aspect = DomainPropertiesClass(
        name=sname, description="", parentDomain=purn
    )
    event = MetadataChangeProposalWrapper(
        entityType="domain",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=domain_urn,
        aspect=domain_properties_aspect,
    )
    rest_emitter.emit(event)
    log.info(f"Created domain {domain_urn}")
- Multi-level bulk creation
import logging

import pandas as pd
from datahub.emitter.mce_builder import make_domain_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, DomainPropertiesClass

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Create the emitter once and reuse it for every domain.
rest_emitter = DatahubRestEmitter(gms_server="http://localhost:8080")


def emit_domain(sid, sname, parent_urn=None):
    """Upsert one domain; parent_urn stays None for a top-level domain."""
    domain_urn = make_domain_urn(str(sid))
    domain_properties_aspect = DomainPropertiesClass(
        name=sname, description="", parentDomain=parent_urn
    )
    event = MetadataChangeProposalWrapper(
        entityType="domain",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=domain_urn,
        aspect=domain_properties_aspect,
    )
    rest_emitter.emit(event)
    log.info(f"Created domain {domain_urn}")


data = pd.read_excel('domains.xlsx', sheet_name='2.业务域')
data = data[data.业务域.isin(['电子商务与客户关系管理'])].reset_index(drop=True)

# Hierarchy levels from top to bottom; each level has a matching "<level>编码" id column.
name = ['业务域', '分域', '业务子域', '关键业务']

for j in range(len(name)):
    cna = name[j]
    cno = name[j] + '编码'
    if j > 0:
        # Lower levels: attach each domain to the level directly above it.
        pna = name[j - 1]
        pno = name[j - 1] + '编码'
        df = data[[pno, pna, cno, cna]].drop_duplicates().reset_index(drop=True)
        for i in range(len(df)):
            pid = df.loc[i, pno]
            sid = df.loc[i, cno]
            # Skip rows whose id equals the parent id (a domain cannot be its own parent).
            if pid != sid:
                emit_domain(sid, df.loc[i, cna], make_domain_urn(str(pid)))
    else:
        # Top level: domains without a parent.
        df = data[[cno, cna]].drop_duplicates().reset_index(drop=True)
        for i in range(len(df)):
            emit_domain(df.loc[i, cno], df.loc[i, cna])
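Upload this .py file to a Linux host and run it to create the business domains in bulk. Once created, the domains form a multi-level hierarchy that you can drill down into.
Later posts will cover bulk creation of tags and glossary terms; follow along if you are interested.
References: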
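The level-by-level traversal in the multi-level script can be tried out without DataHub or pandas. The sketch below is a simplified stand-in, not the script itself: `domain_edges` is a hypothetical helper, the rows are plain dicts with invented codes, and duplicates are dropped per (id, parent id) pair rather than per full row as `drop_duplicates` does.

```python
def domain_edges(rows, levels):
    """Yield (domain_id, name, parent_id) once per distinct domain, top level first."""
    seen = set()
    for depth, level in enumerate(levels):
        id_col = level + "编码"
        # The parent id lives in the id column of the level directly above.
        parent_col = levels[depth - 1] + "编码" if depth > 0 else None
        for row in rows:
            domain_id = row[id_col]
            parent_id = row[parent_col] if parent_col else None
            if domain_id == parent_id:
                continue  # same rule as the script: never parent a domain to itself
            key = (domain_id, parent_id)
            if key in seen:
                continue
            seen.add(key)
            yield domain_id, row[level], parent_id


# Invented sample rows covering two levels.
rows = [
    {"业务域编码": "D1", "业务域": "A", "分域编码": "D1-1", "分域": "B"},
    {"业务域编码": "D1", "业务域": "A", "分域编码": "D1-2", "分域": "C"},
]
edges = list(domain_edges(rows, ["业务域", "分域"]))
```

Because the levels are walked from the top down, a parent is always listed before its children, so each tuple could be handed to the emitter in order.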