Bulk Import into a Graph Database (HugeGraph-Loader)




Preface

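HugeGraph-Loader is the data import component of HugeGraph. It converts data from a variety of sources into graph vertices and edges and bulk-loads them into the graph database.
Supported data sources currently include: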

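  1. Local disk files or directories in TEXT, CSV, or JSON format, including compressed files
  2. HDFS files or directories, including compressed files
  3. Mainstream relational databases such as MySQL, PostgreSQL, Oracle, and SQL Server

Imports from local disk files and HDFS files can resume from where an interrupted load stopped (breakpoint resume).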

I. Loader Workflow

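The basic workflow for HugeGraph-Loader consists of the following steps:

  1. Write the graph model (schema)
  2. Prepare the data files
  3. Write the input source mapping file
  4. Run the import command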
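Step 1 is not shown elsewhere in this article. One way to supply the graph model is a Groovy schema file passed to the loader with the optional -s option. Because the CSV mapping in the next section takes the vertex id from a column ("id": "姓名"), the vertex label has to use a customized id strategy rather than primary keys. The Java sketch below writes such a file for the "ry" label; the class name SchemaScriptWriter, the file name schema.groovy, and the choice of the text type for "nl" and "xb" are assumptions, not taken from the original setup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: writes a HugeGraph schema script for one vertex label.
// All properties are assumed to be text; adjust the types to the real data.
class SchemaScriptWriter {

    static String buildSchema(String label, List<String> properties) {
        StringBuilder sb = new StringBuilder();
        // One property key per mapped property, created only if it does not exist yet
        for (String p : properties) {
            sb.append("schema.propertyKey(\"").append(p)
              .append("\").asText().ifNotExist().create();\n");
        }
        // The mapping sets "id" to a column, so the id strategy must be customized;
        // properties are nullable because "null_values" can leave them empty
        String props = "\"" + String.join("\", \"", properties) + "\"";
        sb.append("schema.vertexLabel(\"").append(label)
          .append("\").useCustomizeStringId()")
          .append(".properties(").append(props).append(")")
          .append(".nullableKeys(").append(props).append(")")
          .append(".ifNotExist().create();\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String script = buildSchema("ry", List.of("nl", "xb"));
        Path out = Paths.get("schema.groovy");
        Files.write(out, script.getBytes(StandardCharsets.UTF_8));
        System.out.print(script);
    }
}
```

The generated file would then be passed as -s schema.groovy together with the -g and -f options used later in this article.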

II. Importing CSV Files

1. Data mapping file

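The mapping file is shown below. If the CSV file has a header row, leave "header" under "input" unset (null); if you set it, the first row of the file is parsed as data.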

 {
	"version": "2.0",
	"structs": [{
		"id": "1",
		"skip": false,
		"input": {
			"type": "FILE",
			"path": "/mnt/parastor/aimind/kg-resources/Oakcsys1/d2r/job-63c3b6727701166100cd7426/file-mapping-7f19ceeea95a417495bc33bd54fa1bf9/人员列表1.csv",
			"file_filter": {
				"extensions": ["*"]
			},
			"format": "CSV",
			"delimiter": ",",
			"date_format": "yyyy-MM-dd HH:mm:ss",
			"time_zone": "GMT+8",
			"skipped_line": {
				"regex": "(^#|^//).*|"
			},
			"compression": "NONE",
			"batch_size": 500,
			"header": null,
			"charset": "GBK",
			"list_format": {
				"start_symbol": "",
				"end_symbol": "",
				"elem_delimiter": "|",
				"ignored_elems": [""]
			}
		},
		"vertices": [{
			"label": "ry",
			"skip": false,
			"id": "姓名",
			"unfold": true,
			"field_mapping": {
				"年龄": "nl",
				"性别": "xb"
			},
			"value_mapping": {},
			"selected": ["姓名", "年龄", "性别"],
			"ignored": [],
			"null_values": ["Null"],
			"update_strategies": {},
			"field_formats": []
		}],
		"edges": []
	}]
}
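To make these options concrete, here is a simplified Java sketch of how the mapping above turns CSV lines into vertex records: lines matching "skipped_line" are dropped, the first remaining line becomes the header because "header" is null, only "selected" columns are kept, "field_mapping" renames columns to property names, the "id" column supplies the vertex id, and values listed in "null_values" are treated as missing. It only mimics these documented semantics and is not the loader's parser (no quoting, charset, or date handling). The class name, the "~id" key, and the ASCII column names name/age/sex standing in for 姓名/年龄/性别 are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Simplified illustration of the CSV mapping semantics; not HugeGraph-Loader code.
// "~id" is a hypothetical key used here to hold the vertex id.
class CsvVertexSketch {

    // Same pattern as "skipped_line.regex": comment lines and empty lines
    static final Pattern SKIPPED = Pattern.compile("(^#|^//).*|");

    static List<Map<String, String>> toVertices(List<String> lines, String idColumn,
            Set<String> selected, Map<String, String> fieldMapping, Set<String> nullValues) {
        List<Map<String, String>> vertices = new ArrayList<>();
        String[] header = null; // "header": null, so the first data line is the header
        for (String line : lines) {
            if (SKIPPED.matcher(line).matches()) {
                continue;
            }
            String[] cells = line.split(",", -1); // "delimiter": ","
            if (header == null) {
                header = cells;
                continue;
            }
            Map<String, String> vertex = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < cells.length; i++) {
                String column = header[i];
                String value = cells[i];
                if (!selected.contains(column) || nullValues.contains(value)) {
                    continue; // column not selected, or value listed in null_values
                }
                if (column.equals(idColumn)) {
                    vertex.put("~id", value); // "id": this column supplies the vertex id
                } else {
                    // field_mapping renames a column; unmapped columns keep their name
                    vertex.put(fieldMapping.getOrDefault(column, column), value);
                }
            }
            vertices.add(vertex);
        }
        return vertices;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# exported list", "name,age,sex", "Zhang San,30,M", "Li Si,Null,F");
        // prints [{~id=Zhang San, nl=30, xb=M}, {~id=Li Si, xb=F}]
        System.out.println(toVertices(lines, "name", Set.of("name", "age", "sex"),
                Map.of("age", "nl", "sex", "xb"), Set.of("Null")));
    }
}
```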

III. Importing JSON Files

1. Data mapping file

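A JSON file has no header row, so the field names are listed explicitly in "header". The data file must contain one JSON object per line, for example:
{"name": "marko", "sex": "男", "age": "12", "weight": "0.4"}
{"name": "josh", "sex": "女", "age": "16", "weight": "0.4"}

The mapping file is as follows: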

  {
	"version": "2.0",
	"structs": [{
		"id": "1",
		"skip": false,
		"input": {
			"type": "FILE",
			"path": "C:\\Users\\kmliu\\Desktop\\上传文件2\\t_user3.json",
			"file_filter": {
				"extensions": ["*"]
			},
			"format": "JSON",
			"delimiter": ",",
			"date_format": "yyyy-MM-dd HH:mm:ss",
			"time_zone": "GMT+8",
			"skipped_line": {
				"regex": "(^#|^//).*|"
			},
			"compression": "NONE",
			"batch_size": 500,
			"header": ["sex", "name", "weight", "age"],
			"charset": "UTF-8",
			"list_format": {
				"start_symbol": "",
				"end_symbol": "",
				"elem_delimiter": "|",
				"ignored_elems": [""]
			}
		},
		"vertices": [{
			"label": "ry2",
			"skip": false,
			"id": "name",
			"unfold": true,
			"field_mapping": {
				"sex": "sex",
				"age": "age"
			},
			"value_mapping": {},
			"selected": ["sex", "name", "age"],
			"ignored": [],
			"null_values": ["Null"],
			"update_strategies": {},
			"field_formats": []
		}],
		"edges": []
	}]
}
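Step 2 of the workflow, preparing the data file, can be done from Java when the records already live in an application. Below is a minimal sketch that writes records as one JSON object per line, with straight ASCII quotes and JSON string escaping, in UTF-8 to match "charset". The class JsonLinesWriter is hypothetical, and it handles only string values.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical writer for one-object-per-line JSON files (string values only).
class JsonLinesWriter {

    // Escapes the characters JSON requires inside a string literal
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control chars
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Serializes one record as a single-line JSON object, keeping key order
    static String toLine(Map<String, String> record) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append(quote(e.getKey())).append(": ").append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    // Writes all records, one JSON object per line, encoded as UTF-8
    static void write(String path, List<Map<String, String>> records) throws IOException {
        try (Writer w = Files.newBufferedWriter(Paths.get(path), StandardCharsets.UTF_8)) {
            for (Map<String, String> r : records) {
                w.write(toLine(r));
                w.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> marko = new LinkedHashMap<>();
        marko.put("name", "marko");
        marko.put("age", "12");
        marko.put("weight", "0.4");
        write("t_user3.json", List.of(marko));
    }
}
```

Numbers and nested values would call for a real JSON library; writing every value as a string, as in the sample data above, keeps the sketch dependency-free.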

IV. Importing Data from MySQL

1. Data mapping file

The mapping file is as follows:

{
	"version": "2.0",
	"structs": [{
		"id": "1",
		"skip": false,
		"input": {
			"type": "JDBC",
			"vendor": "MYSQL",
			"header": ["id", "name", "age", "sex"],
			"charset": "UTF-8",
			"list_format": {
				"start_symbol": "",
				"end_symbol": "",
				"elem_delimiter": "|",
				"ignored_elems": [""]
			},
			"driver": "com.mysql.cj.jdbc.Driver",
			"url": "jdbc:mysql://xxx.xxx.xxx.xxx:3306",
			"database": "baseName",
			"schema": null,
			"table": "user3",
			"username": "root",
			"password": "root",
			"batch_size": 500,
			"primary_key": "name"
		},
		"vertices": [{
			"label": "ry",
			"skip": false,
			"id": "name",
			"unfold": true,
			"field_mapping": {
				"age": "nl",
				"sex": "xb"
			},
			"value_mapping": {},
			"selected": ["sex", "name", "age"],
			"ignored": [],
			"null_values": ["Null"],
			"update_strategies": {},
			"field_formats": []
		}],
		"edges": []
	}]
}
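For a JDBC input, the loader documentation describes "batch_size" as the number of rows fetched per page when reading the table. The sketch below illustrates page-by-page reading with keyset pagination on the "primary_key" column ("name"), 500 rows per page. This is an assumed strategy for illustration only; the SQL the loader actually issues may differ. JdbcPageReader is a hypothetical class, and the key column is assumed to be unique and string-typed.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative keyset pagination over a MySQL table; not the loader's own query logic.
class JdbcPageReader {

    // First page has no lower bound; later pages start after the last key seen
    static String pageSql(String table, String key, boolean first, int batchSize) {
        String where = first ? "" : " WHERE `" + key + "` > ?";
        return "SELECT * FROM `" + table + "`" + where
                + " ORDER BY `" + key + "` LIMIT " + batchSize;
    }

    // Reads the whole table page by page and returns the number of rows seen.
    // Assumes the key column is unique; duplicates at a page boundary would be skipped.
    static long readAll(Connection conn, String table, String key, int batchSize)
            throws SQLException {
        long total = 0;
        String lastKey = null;
        while (true) {
            String sql = pageSql(table, key, lastKey == null, batchSize);
            int rows = 0;
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                if (lastKey != null) {
                    ps.setString(1, lastKey);
                }
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastKey = rs.getString(key); // remember the page boundary
                        rows++;
                        // a real importer would convert the row to a vertex here
                    }
                }
            }
            total += rows;
            if (rows < batchSize) {
                return total; // a short (or empty) page means the table is exhausted
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(pageSql("user3", "name", true, 500));
        System.out.println(pageSql("user3", "name", false, 500));
    }
}
```

With the MySQL driver on the classpath, readAll(DriverManager.getConnection(url, "root", "root"), "user3", "name", 500) would walk the whole table. Keyset pagination avoids the growing cost of large OFFSET values, which is why a unique, ordered key matters for paging.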

V. Running HugeGraph-Loader

1. Entry point

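The argument "Oakcsys1" is the name of the target graph in the graph database, "json1.json" is the mapping file whose format is described above, and "-h xxx.xx.xx.xx" and "-p 18081" give the IP address and port of the graph database server.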

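import com.baidu.hugegraph.loader.HugeGraphLoader;
import com.baidu.hugegraph.loader.util.Printer;

// Package names match HugeGraph-Loader 0.x; Apache HugeGraph 1.x uses the
// org.apache.hugegraph.loader prefix instead.
public class LoaderMain {

    public static void main(String[] args) {
        // Usage: -g {GRAPH_NAME} -f {INPUT_DESC_FILE} [-s {SCHEMA_FILE}] -h {HOST} -p {PORT}
        if (args.length == 0) {
            // Default arguments when started without command-line options (e.g. from an IDE)
            args = new String[]{"-g", "Oakcsys1",
                    "-f", "C:\\Users\\kmliu\\Desktop\\上传文件2\\json1.json",
                    "-h", "xxx.xx.xx.xx", "-p", "18081"
            };
        }
        HugeGraphLoader loader;
        try {
            // Parses the command-line options
            loader = new HugeGraphLoader(args);
        } catch (Throwable e) {
            Printer.printError("Failed to start loading", e);
            return;
        }
        // Runs the import and prints the metrics shown in the next section
        loader.load();
    }
}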
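When the loader is started from your own code, as above, it helps to build the argument array in one place and fail early on missing values. Below is a hypothetical helper (LoaderArgs.build is not part of HugeGraph-Loader) that produces the same options as the main method above, adding -s only when a schema file is given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical helper that assembles HugeGraph-Loader command-line options.
class LoaderArgs {

    static String[] build(String graph, String mappingFile, String schemaFile,
                          String host, int port) {
        Objects.requireNonNull(graph, "graph name (-g) is required");
        Objects.requireNonNull(mappingFile, "mapping file (-f) is required");
        Objects.requireNonNull(host, "host (-h) is required");
        if (port <= 0 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        List<String> args = new ArrayList<>(List.of("-g", graph, "-f", mappingFile));
        if (schemaFile != null) {
            args.add("-s"); // optional: schema file to apply before loading
            args.add(schemaFile);
        }
        args.addAll(List.of("-h", host, "-p", String.valueOf(port)));
        return args.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Replace "localhost" with the address of the graph server
        String[] loaderArgs = build("Oakcsys1", "json1.json", null, "localhost", 18081);
        System.out.println(String.join(" ", loaderArgs));
    }
}
```

The resulting array can be passed to new HugeGraphLoader(...) in place of the hard-coded defaults.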

VI. Import Log

1. Sample log output

: -----Mapping task running - log output-----
: --------------------------------------------------
: detail metrics
: input-struct '1'
:     read success                  : 4                   
:     read failure                  : 0                   
: vertex 'ry'
:     parse success                 : 4                   
:     parse failure                 : 0                   
:     insert success                : 4                   
:     insert failure                : 0                   
: --------------------------------------------------
: count metrics
:     input read success            : 4                   
:     input read failure            : 0                   
:     vertex parse success          : 4                   
:     vertex parse failure          : 0                   
:     vertex insert success         : 4                   
:     vertex insert failure         : 0                   
:     edge parse success            : 0                   
:     edge parse failure            : 0                   
:     edge insert success           : 0                   
:     edge insert failure           : 0                   
: --------------------------------------------------
: meter metrics
:     total time                    : 5.549s              
:     vertex load rate(vertices/s)  : 0                   
:     edge load rate(edges/s)       : 0                   
: -----Mapping task running - end of log output-----
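Since the loader is driven from an application here, the printed counters can also be checked programmatically rather than by eye. Below is a minimal sketch, assuming the "name : value" layout shown above (the exact text may vary between loader versions), that collects the integer counters and reports whether any failure counter is non-zero. LoadMetrics is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker for the loader's printed metrics; the text layout is an assumption.
class LoadMetrics {

    // "<metric name> : <integer>", e.g. "vertex insert failure : 0"
    private static final Pattern METRIC = Pattern.compile("^(.+?)\\s*:\\s*(\\d+)\\s*$");

    static Map<String, Long> parse(String log) {
        Map<String, Long> metrics = new LinkedHashMap<>();
        for (String raw : log.split("\\R")) {
            // Drop the ": " prefix that the logger adds in the output above
            String line = raw.replaceFirst("^:\\s*", "").trim();
            Matcher m = METRIC.matcher(line);
            if (m.matches()) {
                // The same name can appear once per struct or label; add them up
                metrics.merge(m.group(1), Long.parseLong(m.group(2)), Long::sum);
            }
        }
        return metrics;
    }

    static boolean hasFailures(Map<String, Long> metrics) {
        return metrics.entrySet().stream()
                .anyMatch(e -> e.getKey().contains("failure") && e.getValue() > 0);
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                ": input-struct '1'",
                ":     read success                  : 4",
                ":     vertex insert success         : 4",
                ":     vertex insert failure         : 0",
                ":     total time                    : 5.549s");
        Map<String, Long> metrics = parse(log);
        System.out.println(metrics + " failures=" + hasFailures(metrics));
    }
}
```

Lines such as "total time" are skipped because their values are not plain integers.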

Summary

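That covers the basic steps for using HugeGraph-Loader, whose main job is importing datasets into a graph database. It supports CSV, JSON, TXT, MySQL, Hive, and other sources, and loads data quickly in batches.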
