Bulk-importing data into RedisGraph with redisgraph-bulk-loader

那一年天空很高风很清澈

Category: redisgraph. Tags: nosql, backend, big data

Article link: https://blog.csdn.net/u011572265/article/details/112198706

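Preface

This walkthrough uses the Docker distribution of RedisGraph.

I. Introduction to redisgraph-bulk-loader

redisgraph-bulk-loader is a Python utility for building RedisGraph databases from CSV input.

II. Usage steps

1. Requirements

The bulk loader utility requires a Python 3 interpreter.

2. Create a Docker network

The command is as follows (example):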

docker network create data-import

3. Create the RedisGraph container

docker run -p 6379:6379 -d  --network data-import --network-alias redisgraph --name redisgraph redislabs/redisgraph

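4. Install the Python 3 interpreter

docker run -it --network data-import --network-alias python --name python python

1) The command above creates a container named python on the data-import network and opens an interactive session.
2) Exit the session; the container stops.
3) Restart the container: docker start python
4) Open a shell in the running container: docker exec -it python bash
5) Install the loader: pip install redisgraph-bulk-loader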

5. Generate the CSV files

Assume the data has already been collected into a list:

List<DeviceFolderCacheDto> deviceFolderCacheDtoList = new ArrayList<>();
CSVUtils.assetsToCsvFile(topic.getFolderId() + "", deviceFolderCacheDtoList);

    /**
     * Writes the asset nodes to <mainFilePath><graphName>/Assets.csv, using the
     * pipe-separated layout that redisgraph-bulk-loader reads.
     *
     * Requires java.io.*, java.nio.charset.StandardCharsets and java.util.List.
     *
     * @param graphName  graph name, used as the output sub-directory
     * @param pointsList nodes to export; may be null or empty
     */
    public static void assetsToCsvFile(String graphName, List<DeviceFolderCacheDto> pointsList) {
        log.info("assetsToCsvFile-inCnt:{}", pointsList == null ? 0 : pointsList.size());
        // Header row, in the format redisgraph-bulk-loader expects with --enforce-schema
        String[] headArr = new String[]{":ID(Assets)","system_folderId:STRING","system_folderName:STRING","system_tenantId:STRING","system_folderMark:STRING","system_nodeType:STRING","system_code:STRING","system_parentId:STRING","system_folderDesc:STRING"};
        // CSV directory and file name
        String filePath = mainFilePath + graphName;
        String fileName = "Assets.csv";
        File csvFile = new File(filePath + File.separator + fileName);
        log.info("assetsToCsvFile-csvFile:{}", csvFile.getAbsolutePath());

        // Create the target directory if it does not exist yet
        File parent = csvFile.getParentFile();
        if (parent != null && !parent.exists()) {
            boolean flag = parent.mkdirs();
            log.info("assetsToCsvFile-parent.mkdirs():{}", flag);
        }

        // try-with-resources closes the writer even when writing fails
        try (BufferedWriter csvWriter = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(csvFile), StandardCharsets.UTF_8), 1024)) {
            // Write the header row
            csvWriter.write(String.join("|", headArr));
            csvWriter.newLine();

            // Write one row per node, flushing every 1000 rows
            int totalCnt = 0;
            if (pointsList != null) {
                for (DeviceFolderCacheDto points : pointsList) {
                    csvWriter.write(points.toRow());
                    csvWriter.newLine();
                    totalCnt++;
                    if (totalCnt % 1000 == 0) {
                        csvWriter.flush();
                    }
                }
            }
            csvWriter.flush();
            log.info("assetsToCsvFile-outCnt:{}", totalCnt);
        } catch (IOException e) {
            log.error("assetsToCsvFile failed", e);
        }
    }
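The toRow() method called above is not shown in the article. The following is a minimal sketch of what such a method might do, assuming it simply joins the nine column values with the same | separator as the header. The class name AssetRowSketch, the helper signature and the sample values are hypothetical, not the author's code.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class AssetRowSketch {
    // Hypothetical stand-in for DeviceFolderCacheDto.toRow():
    // joins the column values with '|' and writes null as an empty cell.
    static String toRow(String... fields) {
        return Arrays.stream(fields)
                .map(f -> f == null ? "" : f)
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        // Nine values, matching the nine header columns of Assets.csv
        System.out.println(toRow("1001", "1001", "Plant A", "t-01", null,
                "folder", "A-1", "0", "main plant"));
    }
}
```

The sketch does not escape values that themselves contain |; real data may need quoting or a different separator.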

python3 redisgraph_bulk_loader/bulk_insert.py GRAPHNAME [OPTIONS]

Flags	Extended flags	Parameter
-h	--host TEXT	Redis server host (default: 127.0.0.1). With the Docker setup, use the RedisGraph container name as the host.
-p	--port INTEGER	Redis server port (default: 6379)
-a	--password TEXT	Redis server password (default: none)
-n	--nodes TEXT	Path to Node CSV file with the filename as the Node Label. For several node files, repeat the flag, e.g. --nodes xxx.csv --nodes xxx2.csv; each file name becomes the node label.
-N	--nodes-with-label TEXT	Node Label followed by path to Node CSV file
-r	--relations TEXT	Path to Relationship CSV file with the filename as the Relationship Type. Used the same way as --nodes.
-R	--relations-with-type TEXT	Relationship Type followed by path to relationship CSV file
-o	--separator CHAR	Field token separator in CSV files (default: comma). The author recommends one or more | characters, because commas are error-prone.
-d	--enforce-schema	Requires each cell to adhere to the schema defined in the CSV header. Values are loaded with the types declared in the header; if the flag is not given, types are inferred.
-s	--skip-invalid-nodes	Skip nodes that reuse previously defined IDs instead of exiting with an error
-e	--skip-invalid-edges	Skip edges that use invalid IDs for endpoints instead of exiting with an error
-q	--quote INT	The quoting format used in the CSV file. QUOTE_MINIMAL=0,QUOTE_ALL=1,QUOTE_NONNUMERIC=2,QUOTE_NONE=3
-t	--max-token-count INT	(Debug argument) Max number of tokens sent in each Redis query (default 1024)
-b	--max-buffer-size INT	(Debug argument) Max batch size (MBs) of each Redis query (default 4096)
-c	--max-token-size INT	(Debug argument) Max size (MBs) of each token sent to Redis (default 500)
-i	--index Label:Property	After bulk import, create an Index on provided Label:Property pair (optional)
-f	--full-text-index Label:Property	After bulk import, create a full-text index on provided Label:Property pair (optional)

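Entity properties

Property types do not need to be declared explicitly.
The values of a property are not required to all be of the same type.
The bulk loader currently supports these types:
- boolean: true or false (case-insensitive, unquoted).
- integer: an unquoted value that can be read as an integer type.
- double: an unquoted value that can be read as a floating-point type.
- string: any field that is quoted or cannot be cast to a numeric or boolean type.
- array: a bracketed array of elements of any type. Strings inside the array must be explicitly quoted. Array properties require a non-comma CSV separator (-o).
Cypher does not allow NULL values to be assigned to properties.
The default behaviour is to infer each property's type, attempting to cast it to integer, float, boolean, or string, in that order.
If type inference is not desired, use the --enforce-schema flag together with an input schema.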
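The inference order described above (integer, then float, then boolean, then string) can be illustrated with a small sketch. This is not the loader's actual parser, only an approximation written with Java's own parsing rules; the class and method names are hypothetical, and arrays are left out.

```java
class TypeInferenceSketch {
    // Tries integer, then floating point, then boolean; falls back to string.
    static String inferType(String cell) {
        String v = cell.trim();
        try {
            Long.parseLong(v);
            return "integer";
        } catch (NumberFormatException ignored) {
            // not an integer, keep trying
        }
        try {
            Double.parseDouble(v);
            return "double";
        } catch (NumberFormatException ignored) {
            // not a floating-point value, keep trying
        }
        if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) {
            return "boolean";
        }
        return "string";
    }

    public static void main(String[] args) {
        for (String cell : new String[] {"42", "3.14", "TRUE", "\"42\"", "hello"}) {
            System.out.println(cell + " -> " + inferType(cell));
        }
    }
}
```

Note that a quoted value such as "42" falls through to string, which matches the rule that any quoted field is treated as a string.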

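Label file format:

Each row must have the same number of fields.
Leading and trailing whitespace is ignored.
If no input schema is used, the first field of a label file is the node identifier, as described under Node Identifiers in the loader's README.
All fields are property keys that will be associated with each node.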

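Relationship files

Each row must have the same number of fields.
Leading and trailing whitespace is ignored.
If no input schema is used, the first two fields of each row are the source node identifier and the destination node identifier. The names of these fields in the header do not matter.
If the file has more than two fields, all subsequent fields are relationship properties, which follow the same rules as node properties.
Every relationship described is treated as directed (source -> destination).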
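Because both label files and relationship files require the same number of fields in every row, it can be useful to check a generated file before handing it to the loader. The following is a rough sketch of such a check; the class name, method name and sample rows are hypothetical, and the split is naive (it ignores quoting, so a quoted value containing the separator would be miscounted).

```java
import java.util.List;
import java.util.regex.Pattern;

class RowShapeCheck {
    // Returns true when every line has the same number of separator-delimited fields.
    static boolean sameFieldCount(List<String> lines, String separator) {
        int expected = -1;
        for (String line : lines) {
            // The -1 limit keeps trailing empty fields, so "a|b|" counts as three fields
            int count = line.split(Pattern.quote(separator), -1).length;
            if (expected < 0) {
                expected = count;
            } else if (count != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Source ID, destination ID, then one relationship property
        List<String> relations = List.of("src|dst|since", "1|2|2020", "2|3|2021");
        System.out.println(sameFieldCount(relations, "|"));
    }
}
```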

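Input schema

If the --enforce-schema flag is specified, every input CSV is expected to declare the data type of each column in its header.

This format lifts some restrictions of the default CSV format, such as the requirement that the ID field be the first column.

Most header fields should be a colon-separated pair of property name and data type, for example Name:STRING. Some data types do not require a name string, as listed below.

The accepted data types are: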

Type String	Description	Requires name string
ID	Label files only - Unique identifier for a node	Optional
START_ID	Relation files only - The ID field of this relation's source	No
END_ID	Relation files only - The ID field of this relation's destination	No
IGNORE	This column will not be added to the graph	Optional
DOUBLE / FLOAT	A signed 64-bit floating-point value	Yes
INT / INTEGER / LONG	A signed 64-bit integer value	Yes
BOOLEAN	A boolean value indicated by the string 'true' or 'false'	Yes
STRING	A string value	Yes
ARRAY	An array value
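To make the header convention concrete, here is a small sketch that splits a header field such as system_folderId:STRING or :ID(Assets) into its name, type and optional namespace. It only illustrates the notation; the class and method names are hypothetical and this is not how the loader itself parses headers.

```java
class HeaderFieldSketch {
    // Splits "name:TYPE" or ":TYPE(namespace)" into {name, type, namespace}.
    // Missing parts are returned as empty strings.
    static String[] parse(String field) {
        int colon = field.lastIndexOf(':');
        String name = colon < 0 ? field : field.substring(0, colon);
        String type = colon < 0 ? "" : field.substring(colon + 1);
        String namespace = "";
        int open = type.indexOf('(');
        if (open >= 0 && type.endsWith(")")) {
            namespace = type.substring(open + 1, type.length() - 1);
            type = type.substring(0, open);
        }
        return new String[] {name, type, namespace};
    }

    public static void main(String[] args) {
        for (String field : new String[] {":ID(Assets)", "system_folderId:STRING", ":START_ID"}) {
            String[] p = parse(field);
            System.out.println(field + " -> name='" + p[0] + "' type='" + p[1]
                    + "' namespace='" + p[2] + "'");
        }
    }
}
```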

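ID namespaces

Normally, node identifiers must be unique across all input CSVs. When an input schema is used, ID namespaces can optionally be created, and identifiers then only need to be unique within their namespace. This is particularly useful when the primary keys of the input CSVs overlap.

To introduce a namespace, follow the :ID type string with the namespace name in parentheses, for example :ID(User). The same namespace should be specified in the :START_ID or :END_ID field of the relationship files, as in :START_ID(User).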
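As an illustration of namespaces, the sketch below prints two label files whose IDs overlap (both contain ID 1) and a relationship file that disambiguates them through :START_ID(Assets) and :END_ID(Point). The property name name:STRING and the row values are made up for the example; only the :ID, :START_ID and :END_ID notation comes from the text above.

```java
import java.util.List;
import java.util.stream.Collectors;

class NamespaceSketch {
    // Renders rows as pipe-separated CSV text, one row per line.
    static String csv(List<String[]> rows) {
        return rows.stream()
                .map(r -> String.join("|", r))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Both label files use ID 1; the namespaces keep them apart
        String assets = csv(List.of(
                new String[] {":ID(Assets)", "name:STRING"},
                new String[] {"1", "Plant A"}));
        String points = csv(List.of(
                new String[] {":ID(Point)", "name:STRING"},
                new String[] {"1", "Temperature"}));
        // The relationship connects Assets 1 to Point 1
        String assetsPoint = csv(List.of(
                new String[] {":START_ID(Assets)", ":END_ID(Point)"},
                new String[] {"1", "1"}));

        System.out.println("Assets.csv\n" + assets + "\n");
        System.out.println("Point.csv\n" + points + "\n");
        System.out.println("assets_point.csv\n" + assetsPoint);
    }
}
```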

Reference: https://github.com/redisgraph/redisgraph-bulk-loader

Example of the final command:

python3 redisgraph-bulk-loader 3661091848455168 --host redisgraph --port 6379 --enforce-schema --separator '|' --skip-invalid-edges --nodes /data/redisgraph-data/3661091848455168/Point.csv --nodes /data/redisgraph-data/3661091848455168/Assets.csv  --relations /data/redisgraph-data/3661091848455168/assets_assets.csv --relations /data/redisgraph-data/3661091848455168/assets_point.csv

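Run this command from /usr/local/bin inside the python container to import the data.

Note: a graph can only be built in a single run; if a graph with the same name already exists, the loader reports an error.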
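Since the CSV files are produced from Java, the loader command can be assembled in the same place. The sketch below only builds the argument list for the command shown above; the class name, method name and parameters are hypothetical, and the flags are limited to those used in the example command.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderCommandSketch {
    // Builds the argument list for the bulk loader invocation shown above.
    static List<String> buildCommand(String graph, String host, int port,
                                     List<String> nodeFiles, List<String> relationFiles) {
        List<String> cmd = new ArrayList<>(List.of(
                "python3", "redisgraph-bulk-loader", graph,
                "--host", host, "--port", String.valueOf(port),
                "--enforce-schema", "--separator", "|", "--skip-invalid-edges"));
        // One --nodes flag per label file
        for (String file : nodeFiles) {
            cmd.add("--nodes");
            cmd.add(file);
        }
        // One --relations flag per relationship file
        for (String file : relationFiles) {
            cmd.add("--relations");
            cmd.add(file);
        }
        return cmd;
    }

    public static void main(String[] args) {
        String dir = "/data/redisgraph-data/3661091848455168/";
        List<String> cmd = buildCommand("3661091848455168", "redisgraph", 6379,
                List.of(dir + "Point.csv", dir + "Assets.csv"),
                List.of(dir + "assets_assets.csv", dir + "assets_point.csv"));
        System.out.println(String.join(" ", cmd));
    }
}
```

When such a list is passed to a process launcher as separate arguments, the | separator does not need shell quoting; when the printed line is pasted into a shell, it must be quoted as '|', as in the command above.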
