MYSQL数据以csv文件到入neo4j:反欺诈简单案例

       该种方式导入只适合小数据量的方式,并且节点的id创建困难的情况下。在实测的情况下,200万条数据,每条数据创建9个节点,8条边的情况下需要7~8分钟。

1.测试数据准备

    数据规则,一个订单有多个角色,一个客户可以属于多个订单,一个客户可谓不同角色,具体表结构如下:

字段类型注释
order_idvarchar(50)订单id
order_dtvarchar(50)订单日期
cust_namevarchar(50)客户名称
id_novarchar(50)证件号
role_typevarchar(50)角色
mobile_phonevarchar(50)手机号
home_areavarchar(50)家庭区域编码
home_detailvarchar(50)家庭详细地址
company_areavarchar(50)公司区域编码
company_detailvarchar(50)公司详细地址

  导出的CVS文件anti_fraud.csv,字符集为UTF-8,内容如下:

order_id,order_dt,cust_name,id_no,role_type,mobile_phone,home_area,home_detail,company_area,company_detail
"201908010001","2019-08-01","张三","1001","01","10000010001","111101","家庭地址1","111101","公司地址1"
"201908010001","2019-08-01","赵四","1002","02","10000010002","111101","家庭地址2","111101","公司地址2"
"201908020002","2019-08-02","王一","1003","01","10000010003","111102","家庭地址3","111102","公司地址3"
"201908020002","2019-08-02","李二","1004","02","10000010004","111102","家庭地址4","111102","公司地址4"
"201908020003","2019-08-02","黄五","1005","01","10000010001","111102","家庭地址5","111103","公司地址5"
"201908020003","2019-08-02","马六","1006","02","10000010006","111102","家庭地址6","111102","公司地址6"
"201908020004","2019-08-02","张三","1001","02","10000010001","111102","家庭地址1","111103","公司地址5"
"201908020004","2019-08-02","伟奇","1007","01","10000010007","111102","家庭地址7","111102","公司地址7"

2.neo4j关系设计

2.1 node设计

     将不同的信息放入不同的实体中,不同的实体中属性名称可以重复,但不推荐,具体实体如下:

订单实体          (Order{order_id,order_dt})
人实体             (Person{cust_name,id_no,role_type})
手机号码实体  (Mobilephone{mobile_phone})
家庭地址实体  (HomeAddress{home_area,home_detail})
公司地址实体  (CompanyAddress{company_area,company_detail})

2.2 label设计

    边可以添加属性,用与区分实体之间的关系,边的信息如下:

人与订单关系          (Person)-[person_appl{order_id}]->(Order)
手机号与人关系      (Mobilephone)-[phone_appl{order_id}]->(Person)
家庭地址与人关系  (HomeAddress)-[homeAddr{order_id}]->(Person)
公司地址与人关系  (companyAddr)-[companyAddr{order_id}]->(Person)

3.csv导入neo4j

    首先将csv文件放到neo4j根目录下的import文件夹下,

3.1创建node

     唯一的节点创建从数据表中取出并不困难,将每一个节点的数据保存为一个文件,并且去重去空,5个节点分别保存5个文件。例如取出order节点的csv的sql语句为:

select distinct order_id,order_dt from anti_fraud where order_dt is not null and order_id is not null;

   将节点导入neo4j数据库,在创建接节点时需要首先创建索引(注:实测过程中先创建索引后创建节点索引才生效),提高边的效率,创建语句如下:

create index on :Order(order_id,order_dt) ;
create index on :Person(cust_name,id_no,role_type) ;
create index on :Mobilephone(mobile_phone);
create index on :HomeAddress(home_area,home_detail);
create index on :CompanyAddress(company_area,company_detail);

LOAD CSV WITH HEADERS  FROM "file:///order.csv" AS line
CREATE (appl:Order{order_id:line.order_id,order_dt:line.order_dt});
  
LOAD CSV WITH HEADERS  FROM "file:///person.csv" AS line
CREATE (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type});

LOAD CSV WITH HEADERS  FROM "file:///mobilephone.csv" AS line
CREATE (m:Mobilephone{mobile_phone:line.mobile_phone});

LOAD CSV WITH HEADERS  FROM "file:///homeAddress.csv" AS line   
CREATE (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail});

LOAD CSV WITH HEADERS  FROM "file:///companyAddress.csv" AS line
CREATE (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail});

 

3.2 创建label

    在创建边时关系以两个实体创建方案最优,如创建实体时多个match命令为交集,若一个实体未查询到,则关系不创建,创建边的关系如下:

LOAD CSV WITH HEADERS  FROM "file:///anti_fraud.csv" AS line
MATCH (appl:Order{order_id:line.order_id,order_dt:line.order_dt})   
MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
MERGE (person)-[r:person_appl{order_id:line.order_id}]->(appl);

LOAD CSV WITH HEADERS  FROM "file:///anti_fraud.csv" AS line
MATCH (m:Mobilephone{mobile_phone:line.mobile_phone})   
MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
MERGE (m)-[r:phone_appl{order_id:line.order_id}]->(person);

LOAD CSV WITH HEADERS  FROM "file:///anti_fraud.csv" AS line
MATCH (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail})   
MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
MERGE (h)-[r:homeAddr{order_id:line.order_id}]->(person);

LOAD CSV WITH HEADERS  FROM "file:///anti_fraud.csv" AS line
MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})   
MATCH (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail})
MERGE (c)-[r:companyAddr{order_id:line.order_id}]->(person);

3.3 执行命令

    以shell方式批量执行,到neo4j下home 执行./bin/cypher-shell -u neo4j -p neo4j,执行以上命令便可以创建,创建内容如下;

 

4.将执行命令包装为shell脚本

4.1 创建neo4j命令文件

       若在少量情况下可以用命令语句删除,删除语句如下:

MATCH data=(n)-[r1]->(person:Person)-[r2]->(appl:Appl) delete n,r1,person,r2,appl;

     在实际情况中,由于服务器资源有限,大量的查询删除会出现内存不足的情况。因此在全量删除数据时可以将整个数据库删除重启,具体步骤如下:

1.停止neo4j服务
./bin/neo4j stop

2.删除neo4j数据库文件,最好最备份
rm -rf $neo4jhome/data/databases/graph.db

3.启动neo4j服务
./bin/neo4j start

   创建import_neo4j.cql文件,将执行的neo4j命令写入一个文件,前两行为批量删除,以全量导入的方式实现,若是增量导入,去掉前两行删除命令,文件内容如下:

create index on :Order(order_id,order_dt) ;
create index on :Person(cust_name,id_no,role_type) ;
create index on :Mobilephone(mobile_phone);
create index on :HomeAddress(home_area,home_detail);
create index on :CompanyAddress(company_area,company_detail);

LOAD CSV WITH HEADERS  FROM "file:///order.csv" AS line
CREATE (appl:Order{order_id:line.order_id,order_dt:line.order_dt});
  
LOAD CSV WITH HEADERS  FROM "file:///person.csv" AS line
CREATE (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type});

LOAD CSV WITH HEADERS  FROM "file:///mobilephone.csv" AS line
CREATE (m:Mobilephone{mobile_phone:line.mobile_phone});

LOAD CSV WITH HEADERS  FROM "file:///homeAddress.csv" AS line   
CREATE (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail});

LOAD CSV WITH HEADERS  FROM "file:///companyAddress.csv" AS line
CREATE (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail});


LOAD CSV WITH HEADERS  FROM "file:///anti_fraud.csv" AS line
MATCH (appl:Order{order_id:line.order_id,order_dt:line.order_dt})   
MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
MERGE (person)-[r:person_appl{order_id:line.order_id}]->(appl);

LOAD CSV WITH HEADERS  FROM "file:///anti_fraud.csv" AS line
MATCH (m:Mobilephone{mobile_phone:line.mobile_phone})   
MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
MERGE (m)-[r:phone_appl{order_id:line.order_id}]->(person);

LOAD CSV WITH HEADERS  FROM "file:///anti_fraud.csv" AS line
MATCH (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail})   
MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
MERGE (h)-[r:homeAddr{order_id:line.order_id}]->(person);

LOAD CSV WITH HEADERS  FROM "file:///anti_fraud.csv" AS line
MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})   
MATCH (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail})
MERGE (c)-[r:companyAddr{order_id:line.order_id}]->(person);

4.2 创建调用cql文件

    创建import_csv_neo4j.sh文件,删除读取出import_neo4j.cql文件内容,创建实体与边。

#!/bin/bash
#--------------------------------------
# filename:    import_neo4j.sh
# description: 将csv导入neo4j
# author:      test-abc
# date:        2019.08.05
#--------------------------------------
neo4jhome='/home/neo4j/neo4j-community-3.5.7'
username=neo4j
passwd=neo4j
cd $neo4jhome

source /etc/profile
#stop neo4j service
echo 'stop neo4j service'
./bin/neo4j stop

#delete neo4j database
echo 'delete neo4j database'
rm -rf $neo4jhome/data/databases/graph.db

#start neo4j service
echo 'start neo4j service'
./bin/neo4j start

sleep 30

echo "start importing neo4j"
cat $neo4jhome/import/import_neo4j.cql | ./bin/cypher-shell -u $username -p $passwd > $neo4jhome/import/import_csv_neo4j.log
echo "end of import neo4j"

4.3 调用shell脚本

     给import_csv_neo4j.sh文件添加执行权限,执行该shell脚本

chmod +x import_csv_neo4j.sh
./import_csv_neo4j.sh

 

 

  • 1
    点赞
  • 6
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值