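This post builds a co-authorship social network, in which researchers are linked whenever they have written a paper together.
The steps are as follows:
- Extract the authors of each paper
- Remove illegal characters
- Build the graph's nodes and weights (the weight is the number of co-authored papers; when both nodes of a pair are the same author, the weight is the total number of papers that author has published)
- Save the result to a file
- Convert the file into a format Gephi can read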
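
To make the weight definition concrete, here is a small worked example on made-up data. It is only a sketch: the names `papers` and `weights` and the three sample papers are invented for illustration, and it uses sorted tuples as keys, which is one possible way to keep a single entry per undirected pair.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical sample: each inner list holds the authors of one paper.
papers = [
    ['Alice', 'Bob'],
    ['Alice', 'Carol', 'Bob'],
    ['Carol'],
]

weights = Counter()
for authors in papers:
    # Sorting makes (A, B) and (B, A) map to the same undirected key.
    unique_authors = sorted(set(authors))
    # "With replacement" also yields (A, A), which counts A's own papers.
    for a, b in combinations_with_replacement(unique_authors, 2):
        weights[(a, b)] += 1

for pair, weight in sorted(weights.items()):
    print(pair, weight)
```

Here the pair `('Alice', 'Bob')` gets weight 2 because the two share two papers, and `('Carol', 'Carol')` gets weight 2 because Carol appears on two papers in total.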
import csv

import pandas as pd

row_count = 0
# Co-authorship pairs and their weights (number of co-authored papers).
# Keys have the form 'author_a, author_b'.
authors_graph = {}

with open(r'./data/author.csv', 'r', newline='') as f:
    authors_reader = csv.reader(f)
    for row in authors_reader:
        row_count += 1
        print(f'Read row {row_count}')
        authors_row = []
        for author in row:
            # Strip surrounding whitespace first, then skip empty fields
            author = author.strip()
            if not author:
                continue
            authors_row.append(author)
        authors_row_num = len(authors_row)
        for i in range(authors_row_num):
            # j starts at i, so the pair (author, author) counts that author's papers
            for j in range(i, authors_row_num):
                # The graph is undirected, so each pair is stored under one key only
                key = f'{authors_row[i]}, {authors_row[j]}'
                reverse_key = f'{authors_row[j]}, {authors_row[i]}'
                if key in authors_graph:
                    authors_graph[key] += 1
                elif reverse_key in authors_graph:
                    authors_graph[reverse_key] += 1
                else:
                    authors_graph[key] = 1

# Write the result to Excel; the column headers mean "node" and "weight"
result_excel = pd.DataFrame({
    '结点': list(authors_graph.keys()),
    '权重': list(authors_graph.values()),
})
result_excel.to_excel(r'./data/author_graph.xlsx', index=False)
print('success!!!')
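
The script only strips whitespace from each name, so the "remove illegal characters" step could look something like the sketch below. The post does not say which characters count as illegal, so the rules here (NFKC normalisation, collapsing whitespace, dropping control and format characters) are assumptions, and `clean_author` is a hypothetical helper rather than part of the original script.

```python
import unicodedata


def clean_author(name):
    """Return a tidied author name (hypothetical helper, rules are assumptions)."""
    # Fold compatibility forms such as full-width letters into plain ones.
    name = unicodedata.normalize('NFKC', name)
    # Collapse runs of whitespace (tabs, newlines, spaces) into single spaces.
    name = ' '.join(name.split())
    # Drop control and format characters, e.g. a BOM or a zero-width space.
    name = ''.join(
        ch for ch in name if not unicodedata.category(ch).startswith('C')
    )
    return name.strip()


print(repr(clean_author('\ufeff Alice\tSmith \u200b')))
```

A name that comes back empty from such a helper can then be skipped in the same way the script skips empty fields.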
Raw data
Saved data
Final data
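
The conversion to a Gephi-readable format is not shown in the script, so here is a minimal sketch of one way to do it. It assumes an `authors_graph`-style dictionary with `'author_a, author_b'` keys and writes an edge list with the `Source`, `Target`, `Weight` and `Type` columns that Gephi's spreadsheet import recognises. The function name `write_gephi_edges`, the `keep_self_loops` flag and the sample data are made up for illustration, and author names that themselves contain `', '` would need a different key format.

```python
import csv
import io


def write_gephi_edges(authors_graph, out_file, keep_self_loops=False):
    """Write pair counts as an undirected edge list; return the edge count."""
    writer = csv.writer(out_file)
    writer.writerow(['Source', 'Target', 'Weight', 'Type'])
    written = 0
    for key, weight in authors_graph.items():
        # Keys look like 'A, B'; split on the first ', ' only.
        source, _, target = key.partition(', ')
        # Self-pairs hold an author's paper count, not a collaboration.
        if source == target and not keep_self_loops:
            continue
        writer.writerow([source, target, weight, 'Undirected'])
        written += 1
    return written


# Hypothetical sample data; an in-memory buffer stands in for a real file.
sample = {'Alice, Alice': 2, 'Alice, Bob': 1, 'Bob, Bob': 1}
buffer = io.StringIO()
count = write_gephi_edges(sample, buffer)
print(buffer.getvalue())
```

To write a real file, an object opened with `open(path, 'w', newline='')` can be passed in place of the buffer.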
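Known issues:
- The dataset is very large by Gephi's standards, about 210,000 records, so Gephi is rather slow to process it. (Keep at it, everyone!)
- If you need the data, send me a private message. It is free!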
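
On the speed issue, one option (not something tried in this post) is to shrink the edge list before loading it into Gephi, for example by keeping only pairs that co-authored at least a few papers. The sketch below assumes the same `'author_a, author_b'` key format as the script; `prune_edges` and the `min_weight` threshold are hypothetical, and whether dropping weak ties is acceptable depends on the analysis.

```python
def prune_edges(authors_graph, min_weight=2):
    """Keep co-authorship pairs with weight >= min_weight; drop self-pairs."""
    pruned = {}
    for key, weight in authors_graph.items():
        first, _, second = key.partition(', ')
        if first != second and weight >= min_weight:
            pruned[key] = weight
    return pruned


# Hypothetical sample data.
sample_graph = {
    'Alice, Alice': 3,
    'Alice, Bob': 2,
    'Alice, Carol': 1,
    'Bob, Bob': 2,
}
print(prune_edges(sample_graph))
```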