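1. Ways to build the graph database
① Using CREATE: the advantage is that data is inserted in real time, but it is slow; it can be combined with batched (chunked) processing.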
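# Assumes `session` is an open neo4j driver session and `cur` is a database
# cursor whose SELECT has already been executed.
tx = session.begin_transaction()
while True:
    rows = cur.fetchmany(1000)  # pull the source rows in batches of 1000
    if not rows:
        break
    for row in rows:
        obj = {
            'label': 'product',
            'inventory_item_id': str(row[0]),
            'ITEM_ID': row[1],
            'PRODUCT_CODE': row[2],
            'enabled_flag': row[3],
            'DESCRIPTION': row[4],
            'PRODUCT_TYPE': row[5],
            'PRODUCTLINE': row[6]
        }
        # A label cannot be passed as a query parameter, so it is concatenated in;
        # every property value is passed as a parameter.
        tx.run('create (product:' + obj['label'] + '{inventory_item_id:$inventory_item_id,ITEM_ID:$ITEM_ID,'
               'PRODUCT_CODE:$PRODUCT_CODE,enabled_flag:$enabled_flag,'
               'DESCRIPTION:$DESCRIPTION,PRODUCT_TYPE:$PRODUCT_TYPE,'
               'PRODUCTLINE:$PRODUCTLINE})', **obj)
tx.commit()  # nothing is persisted until the explicit transaction is committed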
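The "batched processing" mentioned in ① could look roughly like the sketch below. It is a minimal sketch, not the original script: `PRODUCT_FIELDS`, `row_to_props`, `chunked` and `create_in_batches` are hypothetical names, and the column order is assumed to match the rows fetched above. Instead of one CREATE per row, each batch is sent as a single `UNWIND $rows` statement, so there is one round trip per batch.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List, Sequence

# Assumed column order of the source query (hypothetical, mirrors the snippet above).
PRODUCT_FIELDS = (
    "inventory_item_id", "ITEM_ID", "PRODUCT_CODE", "enabled_flag",
    "DESCRIPTION", "PRODUCT_TYPE", "PRODUCTLINE",
)

# One statement per batch: UNWIND expands the $rows list into individual rows.
BATCH_CREATE = "UNWIND $rows AS row CREATE (p:product) SET p = row"


def row_to_props(row: Sequence[Any]) -> Dict[str, Any]:
    """Map one database row to a property dict (inventory_item_id as str)."""
    props = dict(zip(PRODUCT_FIELDS, row))
    props["inventory_item_id"] = str(props["inventory_item_id"])
    return props


def chunked(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def create_in_batches(run: Callable[..., Any], rows: Iterable[Sequence[Any]],
                      batch_size: int = 1000) -> int:
    """Call run(BATCH_CREATE, rows=[...]) once per batch; return the row count."""
    total = 0
    for batch in chunked(rows, batch_size):
        run(BATCH_CREATE, rows=[row_to_props(r) for r in batch])
        total += len(batch)
    return total
```

With the neo4j Python driver, `run` could be `tx.run` or `session.run`; any callable that accepts `(query, **params)` works, which also makes the helper easy to try without a database.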
② If using the CSV approach, first export the data from the database into a CSV file.
# Assumes `connection` is an open database connection created elsewhere.
import csv
import logging

import paramiko

ip = "xxx"
port = 22
user = "xxx"
password = "xxx"
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(ip, port, user, password)
# file_name = ssh.exec_command("/opt/neo4j/import/relation.csv")
file_name = 'E:\\MySelfcode\\neo4j\\relation.csv'
with open(file_name, 'w', encoding='utf-8', newline='\n') as f:
    cur = connection.cursor()
    write = csv.writer(f)
    cur.execute("""SELECT id,
                          parent_id
                   FROM cms.CRM_T_COMPANY
                   WHERE dw_status='A'""")
    # cur.description has one entry per column; item 0 is the column name
    head = tuple(col[0] for col in cur.description)
    write.writerow(head)
    while True:
        rows = cur.fetchmany(10000)  # stream the result in batches of 10000
        if not rows:
            break
        for row in rows:
            write.writerow(row)
logging.info("CSV file created")
github:https://github.com/Joseph025/joseph/blob/master/pyspark/05_pythonAPI%E8%BF%9E%E6%8E%A5linux_neo4j_load_csv.py
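The export step in ② can be tried without the real database. The sketch below is only an illustration under stated assumptions: it swaps in an in-memory `sqlite3` table named `company` for `cms.CRM_T_COMPANY`, writes to a `StringIO` buffer instead of a file, and `export_query_to_csv` is a hypothetical helper name. The pattern itself (header from `cursor.description`, then `fetchmany` in a loop) is the one used above.

```python
import csv
import io
import sqlite3
from typing import Any, TextIO


def export_query_to_csv(cur: Any, out: TextIO, batch_size: int = 10000) -> int:
    """Write an already-executed DB-API cursor to `out`; return the data-row count."""
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds one sequence per column; item 0 is the column name.
    writer.writerow([col[0] for col in cur.description])
    written = 0
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)
        written += len(rows)
    return written


# Stand-in database so the sketch runs anywhere (table and rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (id INTEGER, parent_id INTEGER, dw_status TEXT)")
conn.executemany("INSERT INTO company VALUES (?, ?, ?)",
                 [(1, None, "A"), (2, 1, "A"), (3, 1, "D")])
cur = conn.cursor()
cur.execute("SELECT id, parent_id FROM company WHERE dw_status = 'A' ORDER BY id")

buffer = io.StringIO()
count = export_query_to_csv(cur, buffer, batch_size=1)
print(buffer.getvalue())
```

Note that a NULL `parent_id` is written as an empty field, which is worth keeping in mind when the file is later matched on `parent_id`.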
③ The LOAD CSV approach: when loading instead of using CREATE, file:///relation.csv resolves by default to %NEO4J_HOME%/import/.
USING PERIODIC COMMIT 300 LOAD CSV WITH HEADERS FROM "file:///relation.csv" AS line
MATCH (ind1:industry {id: line.industry_id}), (ind2:industry {id: line.parent_id})
MERGE (ind1)-[r:行业从属 {type: "行业从属"}]->(ind2)
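/*
The default folder for file:///relation.csv is %NEO4J_HOME%/import/ on the machine where Neo4j is installed.
This is a safeguard: paths outside that folder, such as G://neo4j//relation.csv, cannot be read. To load external data, edit
%NEO4J_HOME%/conf/neo4j.conf and set dbms.directories.import to empty, which removes this protection.
1. USING PERIODIC COMMIT 300
Commits automatically after every 300 rows, which keeps the transaction small and prevents running out of memory.
2. WITH HEADERS
Reads the first line of the file as the field names. Only with this option can a field be referenced by name, e.g. line._id;
without it, fields can only be referenced by position, e.g. line[0].
3. AS line
Binds each row of the file to the variable line.
4. MERGE
MERGE works as match-or-create: if the pattern already exists it is reused (and can then be updated), otherwise it is created.
This prevents the duplicate data that repeated CREATE statements would produce.
*/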
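The difference between WITH HEADERS and positional access, and the match-or-create behaviour of MERGE, can be mimicked in plain Python. This is only an analogy, not how Neo4j implements either feature: `CSV_TEXT` is made-up data, `merge_edge` is a hypothetical helper, and a Python set stands in for the relationships in the graph.

```python
import csv
import io

# Hypothetical file content; the last data row repeats the first one.
CSV_TEXT = "industry_id,parent_id\n2,1\n3,1\n2,1\n"

# Without WITH HEADERS: each row is a list and fields are addressed by position.
rows_by_index = list(csv.reader(io.StringIO(CSV_TEXT)))
first_data_row = rows_by_index[1]  # rows_by_index[0] is the header line itself

# With WITH HEADERS: the first line supplies the keys and fields are addressed by name.
rows_by_name = list(csv.DictReader(io.StringIO(CSV_TEXT)))


def merge_edge(edges: set, start: str, end: str) -> bool:
    """Add (start, end) only if absent, like MERGE; return True if it was created."""
    if (start, end) in edges:
        return False
    edges.add((start, end))
    return True


edges: set = set()
created = [merge_edge(edges, line["industry_id"], line["parent_id"])
           for line in rows_by_name]
print(created, sorted(edges))
```

The repeated row does not add a second edge, which is the reason for using MERGE rather than CREATE when the same file may be loaded more than once. As with LOAD CSV, every field read here arrives as a string.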