首先看一下整个数据集的组织结构
这里是数据集的下载地址
https://drive.google.com/drive/folders/1NIAo4KptskBytbXoOqmF3Sto4hTX3JIH
我选择的数据集是sf0.01,是最小的一个,选择的是其中snapshot里面的数据。
首先我是先打算导入节点信息,即 Loan,Account,Medium,Company,Person
顶点导入
首先是Loan
LOAD CSV WITH HEADERS FROM "file:///snapshot/Loan.csv" AS row
FIELDTERMINATOR '|'
MERGE (loan:Loan {Id: row.loanId})
ON CREATE SET loan.loanAmount = toFloat(row.loanAmount),
loan.balance = toFloat(row.balance),
loan.usage = row.loanUsage,
loan.interestRate = toFloat(row.interestRate)
首先第一个问,因为文件中是用‘|’作为分隔符,所以要指定‘|’作为分隔符,即命令FIELDTERMINATOR '|',
后面的几个顶点都是相同的操作
Person:
LOAD CSV WITH HEADERS FROM "file:///snapshot/Person.csv" AS row
FIELDTERMINATOR '|'
MERGE (person:Person {Id: row.personId})
ON CREATE SET person.name= row.personName,
person.isBlocked= toBoolean(row.isBlocked),
person.createTime= row.createTime,
person.gender= row.gender,
person.birthday=row.birthday,
person.country= row.country,
person.city= row.city
因为csv文件中,关于时间的标准,貌似并不是neo4j支持的ISO 8601标准,所以我并没有用datetime()函数去处理,而是直接默认他是string。
Medium
LOAD CSV WITH HEADERS FROM "file:///snapshot/Medium.csv" AS row
FIELDTERMINATOR '|'
MERGE (medium:Medium {Id: row.mediumId})
ON CREATE SET medium.type=row.mediumType,
medium.isBlocked= toBoolean(row.isBlocked),
medium.createTime= row.createTime,
medium.lastLoginTime=row.lastLoginTime,
medium.riskLevel=row.riskLevel
时间问题仍然是使用string类型代替
Account
LOAD CSV WITH HEADERS FROM "file:///snapshot/Account.csv" AS row
FIELDTERMINATOR '|'
MERGE (p:Account {Id: row.accountId})
ON CREATE SET p.createTime=row.createTime,
p.isBlocked= toBoolean(row.isBlocked),
p.type=row.accountType,
p.nickname=row.nickname,
p.phoneNumber=row.phonenum,
p.email=row.email,
p.freqLoginType=row.freqLoginType,
p.lastLoginTime=row.lastLoginTime,
p.accountLevel=row.accountLevel
Conpany
LOAD CSV WITH HEADERS FROM "file:///snapshot/Company.csv" AS row
FIELDTERMINATOR '|'
MERGE (p:Company {Id: row.companyId})
ON CREATE SET p.name=row.companyId,
p.isBlocked= toBoolean(row.isBlocked),
p.createTime=row.createTime,
p.country=row.country,
p.city=row.city,
p.business=row.business,
p.description=row.description,
p.url=row.url
关系导入
repay关系 (Account---->Loan)
LOAD CSV WITH HEADERS FROM "file:///snapshot/AccountRepayLoan.csv" AS row
FIELDTERMINATOR '|'
MATCH (account:Account {Id: row.accountId})
MATCH (loan:Loan {Id: row.loanId})
CREATE (account)-[:repay {amount: toFloat(row.amount), createTime: row.createTime}]->(loan)
transfer关系(account----->account)
LOAD CSV WITH HEADERS FROM "file:///snapshot/AccountTransferAccount.csv" AS row
FIELDTERMINATOR '|'
MATCH (account1:Account {Id: row.fromId})
MATCH (account2:Account {Id: row.toId})
CREATE (account1)-[:transfer {amount: toFloat(row.amount), createTime: row.createTime,orderNum:row.orderNum,comment:row.comment,payType:row.payType,goodsType:row.goodsType}]->(account2)
withdraw关系(account----->account)
LOAD CSV WITH HEADERS FROM "file:///snapshot/AccountTransferAccount.csv" AS row
FIELDTERMINATOR '|'
MATCH (account1:Account {Id: row.fromId})
MATCH (account2:Account {Id: row.toId})
CREATE (account1)-[:withraw {amount: toFloat(row.amount), createTime: row.createTime}]->(account2)
apply关系 (company----->loan)
LOAD CSV WITH HEADERS FROM "file:///snapshot/CompanyApplyLoan.csv" AS row
FIELDTERMINATOR '|'
MATCH (company:Company {Id: row.companyId})
MATCH (loan:Loan {Id: row.loanId})
CREATE (company)-[:apply { createTime: row.createTime,org:row.org}]->(loan)
guarantee关系 (company-------->company)
LOAD CSV WITH HEADERS FROM "file:///snapshot/CompanyGuaranteeCompany.csv" AS row
FIELDTERMINATOR '|'
MATCH (p1:Company {Id: row.fromId})
MATCH (p2:Company {Id: row.toId})
CREATE (p1)-[:guarantee { createTime: row.createTime,relation:row.relation}]->(p2)
invest关系 (company----->company)
LOAD CSV WITH HEADERS FROM "file:///snapshot/CompanyInvestCompany.csv" AS row
FIELDTERMINATOR '|'
MATCH (p1:Company {Id: row.investorId})
MATCH (p2:Company {Id: row.companyId})
CREATE (p1)-[:invest { ratio:tofloat(row.ratio),createTime: row.createTime}]->(p2)
own关系 (company------>account)
LOAD CSV WITH HEADERS FROM "file:///snapshot/CompanyOwnAccount.csv" AS row
FIELDTERMINATOR '|'
MATCH (p1:Company {Id: row.companyId})
MATCH (p2:Account {Id: row.accountId})
CREATE (p1)-[:own {createTime: row.createTime}]->(p2)
deposit关系(loan------>account)
LOAD CSV WITH HEADERS FROM "file:///snapshot/LoanDepositAccount.csv" AS row
FIELDTERMINATOR '|'
MATCH (p1:Loan {Id: row.loanId})
MATCH (p2:Account {Id: row.accountId})
CREATE (p1)-[:deposit {amount:tofloat(row.amount),createTime: row.createTime}]->(p2)
signln关系 (medium------>account)
LOAD CSV WITH HEADERS FROM "file:///snapshot/MediumSignInAccount.csv" AS row
FIELDTERMINATOR '|'
MATCH (p1:Medium{Id: row.mediumId})
MATCH (p2:Account {Id: row.accountId})
CREATE (p1)-[:signin {createTime: row.createTime,location:row.location}]->(p2)
apply关系(person------>loan)
LOAD CSV WITH HEADERS FROM "file:///snapshot/PersonApplyLoan.csv" AS row
FIELDTERMINATOR '|'
MATCH (p1:Person{Id: row.personId})
MATCH (p2:Loan{Id: row.loanId})
CREATE (p1)-[:apply {createTime: row.createTime,org:row.org}]->(p2)
guarantee关系 (person----->person)
LOAD CSV WITH HEADERS FROM "file:///snapshot/PersonGuaranteePerson.csv" AS row
FIELDTERMINATOR '|'
MATCH (p1:Person{Id: row.fromId})
MATCH (p2:Person{Id: row.toId})
CREATE (p1)-[:guarantee {createTime: row.createTime,relation:row.relation}]->(p2)
invest关系 (person------>company)
LOAD CSV WITH HEADERS FROM "file:///snapshot/PersonInvestCompany.csv" AS row
FIELDTERMINATOR '|'
MATCH (p1:Person {Id: row.investorId})
MATCH (p2:Company {Id: row.companyId})
CREATE (p1)-[:invest { ratio:tofloat(row.ratio),createTime: row.createTime}]->(p2)
own关系 (person----->account)
LOAD CSV WITH HEADERS FROM "file:///snapshot/PersonOwnAccount.csv" AS row
FIELDTERMINATOR '|'
MATCH (p1:Person {Id: row.personId})
MATCH (p2:Account {Id: row.accountId})
CREATE (p1)-[:own {createTime: row.createTime}]->(p2)
至此,所有关系的导入也结束了。
部分关系的属性我可能没有给他设置数据类型,如果没有特别设置,他默认貌似是string,你们在写的时候,如果注意到某些属性应该设置成什么类型欢迎评论告诉我。