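Diffing CSV data is a common need in mathematical modeling competitions.
After searching through a lot of material online, I couldn't find a mature post that covers comparing two data files, extracting the differences, and plotting them. I recently handled exactly this kind of data processing in a modeling competition, so I'm publishing this post for others to reuse.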
```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D


def create_graph_from_csv(csv_path, skiprows=1):
    """Read a two-column CSV (start node, end node) and build a directed graph."""
    df = pd.read_csv(csv_path, header=None, skiprows=skiprows,
                     names=['start', 'end'])
    # Each row becomes one directed edge start -> end
    return nx.from_pandas_edgelist(df, source='start', target='end',
                                   create_using=nx.DiGraph)


# Read both CSV files and build the graphs, skipping the first row (the header)
G1 = create_graph_from_csv('3.csv', skiprows=1)  # old version
G2 = create_graph_from_csv('4.csv', skiprows=1)  # new version

# Edge sets of the two graphs
edges1 = set(G1.edges())
edges2 = set(G2.edges())

# Added edges exist only in the new file, removed edges only in the old one
added_edges = edges2 - edges1
removed_edges = edges1 - edges2
common_edges = edges1 & edges2

# Union of both graphs: it contains every node and every edge, so a single
# shared layout reflects the real structure and all edge groups can be drawn on it
G_union = nx.compose(G1, G2)
pos = nx.spring_layout(G_union, k=0.3, seed=42)  # fixed seed: reproducible layout

NODE_SIZE = 500  # also passed to the edge calls so arrowheads stop at node borders
plt.figure(figsize=(12, 12))

# Common edges
nx.draw_networkx_edges(G_union, pos, edgelist=list(common_edges), width=1,
                       edge_color='gray', node_size=NODE_SIZE)
# Added edges
nx.draw_networkx_edges(G_union, pos, edgelist=list(added_edges), width=2,
                       edge_color='green', alpha=0.5, node_size=NODE_SIZE)
# Removed edges (dashed)
nx.draw_networkx_edges(G_union, pos, edgelist=list(removed_edges), width=2,
                       style='dashed', edge_color='red', alpha=0.5,
                       node_size=NODE_SIZE)
# All nodes and their labels
nx.draw_networkx_nodes(G_union, pos, node_size=NODE_SIZE, node_color='lightblue')
nx.draw_networkx_labels(G_union, pos, font_size=10)

# Legend
legend_elements = [
    Line2D([0], [0], color='gray', lw=1,
           label=f'Common Edges ({len(common_edges)})'),
    Line2D([0], [0], color='green', lw=2,
           label=f'Added Edges ({len(added_edges)})'),
    Line2D([0], [0], color='red', lw=2, linestyle='dashed',
           label=f'Removed Edges ({len(removed_edges)})'),
]
plt.legend(handles=legend_elements)

# Show the figure
plt.axis('off')
plt.show()
```
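Running the code draws the following directed graph:
[Figure: directed graph of the differences between 3.csv and 4.csv]
Red dashed edges are the paths removed when comparing the two tables (present in 3.csv but not in 4.csv), green solid edges are newly added paths (present only in 4.csv), and gray edges appear in both files.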
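The color coding boils down to putting every edge into one of three groups. A minimal sketch of that classification, assuming you'd rather store the result on the graph than keep three loose sets: the helper `diff_graphs`, the `status` attribute name and the toy graphs are all hypothetical, not part of the original post.

```python
import networkx as nx

# Colors match the figure: gray = common, green = added, red = removed
STATUS_COLOR = {'common': 'gray', 'added': 'green', 'removed': 'red'}


def diff_graphs(g_old, g_new):
    """Hypothetical helper: return a DiGraph with every edge of both graphs,
    each tagged with a 'status' attribute ('common', 'added' or 'removed')."""
    old_edges, new_edges = set(g_old.edges()), set(g_new.edges())
    g_diff = nx.DiGraph()
    g_diff.add_nodes_from(g_old.nodes())
    g_diff.add_nodes_from(g_new.nodes())
    for u, v in old_edges | new_edges:
        if (u, v) in old_edges and (u, v) in new_edges:
            status = 'common'
        elif (u, v) in new_edges:
            status = 'added'    # only in the new file
        else:
            status = 'removed'  # only in the old file
        g_diff.add_edge(u, v, status=status)
    return g_diff


# Hypothetical toy graphs, not the author's data
g_old = nx.DiGraph([(1, 2), (2, 3)])
g_new = nx.DiGraph([(1, 2), (3, 4)])
g_diff = diff_graphs(g_old, g_new)
for u, v, status in sorted(g_diff.edges(data='status')):
    print(u, v, status, STATUS_COLOR[status])
# 1 2 common gray
# 2 3 removed red
# 3 4 added green
```

With the status stored on the edges, each drawing call can pick its group with a comprehension such as `[(u, v) for u, v, s in g_diff.edges(data='status') if s == 'removed']`, and the graph itself doubles as the layout source because it already holds every node and edge.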
Part of the CSV data (left: 3.csv, right: 4.csv):
[Figure: screenshots of part of 3.csv and 4.csv]
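Judging from the read settings in `create_graph_from_csv` (`skiprows=1`, `names=['start', 'end']`), each file is assumed to contain one header row followed by two columns: start node and end node. The short check below feeds a hypothetical in-memory file with that assumed layout through the same settings; the sample contents are made up and are not the author's 3.csv.

```python
import io

import networkx as nx
import pandas as pd

# Hypothetical sample in the assumed layout: one header row, then one
# "start,end" pair per line (not the author's actual data)
SAMPLE_CSV = """start,end
A,B
B,C
A,C
"""

# Same read settings as create_graph_from_csv: skip the header, name the columns
df = pd.read_csv(io.StringIO(SAMPLE_CSV), header=None, skiprows=1,
                 names=['start', 'end'])
G = nx.from_pandas_edgelist(df, source='start', target='end',
                            create_using=nx.DiGraph)
print(sorted(G.edges()))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

If one of your files has no header row, pass `skiprows=0` for it; otherwise its first edge is silently dropped and will later show up as a spurious "added" or "removed" edge.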
Fellow Ctrl+C/Ctrl+V engineers, if you found the code useful, don't forget to leave a like~