This is mainly a record of my own course assignment; if it also helps some readers, even better!
The tools used are Python + Neo4j.
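1. Data source
The data comes from the IncoPat patent database. The search expression (IPC=(A01C15/00)) AND (PD=[20220101 TO 20230101]) filters for 2022 patents related to fertilizing machinery and returns 1,901 patent records.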
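Before loading anything into Neo4j, it can help to confirm that the exported CSV has the expected columns and row count. The following is a minimal sketch: the column names are the ones the import script in this post reads, while `check_export` and the two sample rows are hypothetical and only stand in for the real export.

```python
import csv
import io

# Column headers read by the import script in this post.
REQUIRED_COLUMNS = [
    "公开(公告)号", "标题 (中文)", "申请人",
    "公开(公告)日", "专利类型", "公开国别",
]

def check_export(csvfile):
    """Return (row_count, missing_columns) for an open CSV file object."""
    reader = csv.DictReader(csvfile)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    row_count = sum(1 for _ in reader)
    return row_count, missing

# Made-up sample rows; the real file would be opened with open(...) instead.
sample = io.StringIO(
    "公开(公告)号,标题 (中文),申请人,公开(公告)日,专利类型,公开国别\n"
    "X0000001A,示例标题一,示例申请人甲,2022-01-04,发明申请,中国\n"
    "X0000002U,示例标题二,示例申请人乙,2022-02-11,实用新型,中国\n"
)
count, missing = check_export(sample)
print(count, missing)
```

On the real export, the row count should match the 1,901 records reported above and the list of missing columns should be empty.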
2. Ontology design
The ontology comprises 5 entity types and 4 semantic relations.
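Entity | Description |
patentid | Patent number, the unique identifier of a patent; carries the attribute title: the patent's Chinese title |
claimer | The patent applicant; may be an individual, an institution, or a company |
publicdate | The patent's publication date |
patenttype | The patent type, one of three: invention application (发明申请), granted invention (发明授权), utility model (实用新型) |
country | The country of publication |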
Relation | Description |
claim | (claimer, claim, patentid) |
dateis | (patentid, dateis, publicdate) |
typeis | (patentid, typeis, patenttype) |
countryis | (patentid, countryis, country) |
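To make the ontology concrete, here is a small sketch that writes the four relations down as data and turns one CSV row into (head, relation, tail) triples. The entity-to-column mapping follows the import script in this post; `SCHEMA`, `COLUMN_OF`, `row_to_triples`, and the sample row are hypothetical names and values used only for illustration.

```python
# (head entity type, relation, tail entity type), as in the relation table.
SCHEMA = [
    ("claimer", "claim", "patentid"),
    ("patentid", "dateis", "publicdate"),
    ("patentid", "typeis", "patenttype"),
    ("patentid", "countryis", "country"),
]

# Which CSV column supplies the value of each entity type.
COLUMN_OF = {
    "patentid": "公开(公告)号",
    "claimer": "申请人",
    "publicdate": "公开(公告)日",
    "patenttype": "专利类型",
    "country": "公开国别",
}

def row_to_triples(row):
    """Map one CSV row (a dict) to a list of (head, relation, tail) triples."""
    return [
        (row[COLUMN_OF[head]], relation, row[COLUMN_OF[tail]])
        for head, relation, tail in SCHEMA
    ]

# Made-up row, not taken from the real data set.
sample_row = {
    "公开(公告)号": "X0000001A",
    "标题 (中文)": "示例标题一",
    "申请人": "示例申请人甲",
    "公开(公告)日": "2022-01-04",
    "专利类型": "发明申请",
    "公开国别": "中国",
}
triples = row_to_triples(sample_row)
for triple in triples:
    print(triple)
```

Each row therefore contributes exactly four triples, one per relation; the title is not a triple because it is stored as an attribute of the patentid node.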
3. Building entities and relations
4. Data storage
The complete code is as follows:
from py2neo import Graph, Node, Relationship, Subgraph
import csv

# Connect to the local Neo4j instance
patentgraph = Graph("http://localhost:7474", auth=("neo4j", "neo4j"))

# newline='' is what the csv module recommends for files passed to a reader;
# add encoding=... here if the export does not use the platform default encoding
with open('fertilizermachine.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # create entities
        # each patent gets its own node, keyed by publication number
        patentid = Node("patentid", name=row["公开(公告)号"])
        patentid['title'] = row["标题 (中文)"]  # create attributes
        patentgraph.create(patentid)

        # for shared entities, reuse an existing node and create one only if none matches
        claimer = patentgraph.nodes.match("claimer", name=row["申请人"]).first()
        if not claimer:
            claimer = Node("claimer", name=row["申请人"])
            patentgraph.create(claimer)

        publicdate = patentgraph.nodes.match("publicdate", name=row["公开(公告)日"]).first()
        if not publicdate:
            publicdate = Node("publicdate", name=row["公开(公告)日"])
            patentgraph.create(publicdate)

        patenttype = patentgraph.nodes.match("patenttype", name=row["专利类型"]).first()
        if not patenttype:
            patenttype = Node("patenttype", name=row["专利类型"])
            patentgraph.create(patenttype)

        country = patentgraph.nodes.match("country", name=row["公开国别"]).first()
        if not country:
            country = Node("country", name=row["公开国别"])
            patentgraph.create(country)

        # create relationships
        claim = Relationship(claimer, "claim", patentid)
        dateis = Relationship(patentid, "dateis", publicdate)
        typeis = Relationship(patentid, "typeis", patenttype)
        countryis = Relationship(patentid, "countryis", country)

        # write all four relationships of this row in one call
        A = Subgraph(relationships=[claim, dateis, typeis, countryis])
        patentgraph.create(A)
        print(row)  # progress output
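One thing to note about the script above: the patentid node is created unconditionally, so running the import twice would duplicate every patent. A possible alternative, which is not what the script does, is to let Cypher's MERGE clause handle both nodes and relationships. The sketch below only builds the query and its parameters in plain Python; `MERGE_QUERY` and `row_to_params` are hypothetical names, and the sample row is made up.

```python
# Cypher statement that creates each node/relationship only if it is missing.
MERGE_QUERY = """
MERGE (p:patentid {name: $patentid})
SET p.title = $title
MERGE (c:claimer {name: $claimer})
MERGE (d:publicdate {name: $publicdate})
MERGE (t:patenttype {name: $patenttype})
MERGE (n:country {name: $country})
MERGE (c)-[:claim]->(p)
MERGE (p)-[:dateis]->(d)
MERGE (p)-[:typeis]->(t)
MERGE (p)-[:countryis]->(n)
"""

def row_to_params(row):
    """Map one CSV row (a dict) to the parameters used by MERGE_QUERY."""
    return {
        "patentid": row["公开(公告)号"],
        "title": row["标题 (中文)"],
        "claimer": row["申请人"],
        "publicdate": row["公开(公告)日"],
        "patenttype": row["专利类型"],
        "country": row["公开国别"],
    }

# Made-up row, not taken from the real data set.
sample_row = {
    "公开(公告)号": "X0000001A",
    "标题 (中文)": "示例标题一",
    "申请人": "示例申请人甲",
    "公开(公告)日": "2022-01-04",
    "专利类型": "发明申请",
    "公开国别": "中国",
}
params = row_to_params(sample_row)
print(params)
```

With a py2neo connection such as `patentgraph`, the statement could presumably be executed once per row with `patentgraph.run(MERGE_QUERY, params)`; this variant has not been run against the original data.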
5. Graph display
done√