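Requirement 1
Extract the same column from several different CSV files and merge the results into a single CSV.
Suppose the target CSV has columns A, B, C, and D, whose data comes from a designated column of a.csv, b.csv, c.csv, and d.csv respectively.
The items in the Title column are shared by all of the source CSVs.
Approach:
Build a dict column by column, convert it to a DataFrame, then save it as a CSV file.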
merge_dict = {'Title': ['Item 1', 'Item 2', ...],
              'A': [xxx, yyy, zzz, ...],
              'B': [xx, '', yy, '', zz, ...],
              'C': [cc, aa, '', bb, ...]}
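To make the dict-to-DataFrame idea concrete, here is a minimal sketch with made-up values (the real lists would come from the CSV files). It assumes every list has the same length, which is what `pd.DataFrame` requires for a dict of lists.

```python
import pandas as pd

# Illustrative values only; the real lists would come from the CSV files.
merge_dict = {
    'Title': ['Item 1', 'Item 2', 'Item 3'],
    'A': [1.0, 2.0, 3.0],
    'B': [0.5, '', 1.5],  # '' marks an item that has no value in b.csv
}

# Each key becomes a column name and each list becomes that column's values.
# All lists must have the same length, otherwise pandas raises a ValueError.
df = pd.DataFrame(merge_dict)
print(df)
```

If a source file lacks some items, its list has to be padded (for example with '') before it goes into the dict.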
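Steps
- Step 1. Define an empty dict: merge_dict = {}
- Step 2. Store the titles as one key-value pair, {'Title': ['Item 1', 'Item 2', 'Item 3', ...]}, and add it to merge_dict
- Step 3. Extract the designated column from one CSV and add it to merge_dict with key = 'A', value = [xx, yy, zz, ...]
- Step 4. Loop over the other CSVs, extracting the designated column from each and adding it to merge_dict
- Step 5. When the loop finishes, merge_dict has the shape shown above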
Pseudocode
import pandas as pd

merge_dict = {}
# source files; each one contributes one column
csv_list = ['a.csv', 'b.csv', 'c.csv', 'd.csv']
# open one CSV to read the shared Title items
df = pd.read_csv('a.csv')
title_items = df['Title'].to_list()
merge_dict['Title'] = title_items
# extract the designated column from each CSV
for csv_path in csv_list:
    df = pd.read_csv(csv_path)
    common_column_data = df['common_column'].to_list()
    # use the file name without its extension as the column name, e.g. 'a'
    column_name = csv_path.split('.')[0]
    merge_dict[column_name] = common_column_data
# convert the dict to a DataFrame
df = pd.DataFrame(merge_dict)
# save the merged CSV file
df.to_csv("merged.csv", index=False)
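The dict approach matches values by row position, so it only works when every file lists the same items in the same order. As a possible variation (not the method described above), the sketch below aligns the columns by Title instead. The file contents are hypothetical and kept in memory so the example runs on its own; it also assumes the Title values are unique within each file.

```python
import io
import pandas as pd

# Hypothetical file contents, kept in memory so the sketch runs on its own.
sources = {
    'A': io.StringIO("Title,common_column\nItem 1,1.0\nItem 2,2.0\nItem 3,3.0\n"),
    'B': io.StringIO("Title,common_column\nItem 1,0.5\nItem 3,1.5\n"),
}

columns = []
for name, handle in sources.items():
    df = pd.read_csv(handle)
    # Index by Title so values are matched by item rather than by row position.
    columns.append(df.set_index('Title')['common_column'].rename(name))

# axis=1 places the Series side by side; items missing from a file become NaN.
merged = pd.concat(columns, axis=1)
merged.index.name = 'Title'
merged_csv = merged.to_csv()  # to_csv() with no path returns the CSV text
print(merged_csv)
```

Here 'Item 2' is absent from the second source, so its B cell is written as an empty field, which is close to the '' placeholders in the dict layout.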
Requirement 2
Build a CSV file row by row, filling in the columns of each row in turn.
Index | Title 1 | Title 2 | Title 3 |
---|---|---|---|
1 | 1.0 | 2.0 | aa |
2 | 1.2 | 2.2 | bb |
3 | 1.3 | 2.1 | cc |
Approach
- Create a dict
- dict[index] = list of row elements, e.g. dict[1] = [1.0, 2.0, 'aa']
- Convert the dict to a DataFrame
Pseudocode
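import pandas as pd

row_num = 3  # number of rows to generate (example value)
data_dict = {}
for i in range(row_num):
    # one row: the index followed by the three column values
    row_data = [i, 1.0, 2.1, 'aa']
    data_dict[i] = row_data
# convert the dict to a DataFrame, one dict entry per row
df = pd.DataFrame.from_dict(data_dict, orient='index')
# write the rows only, without a header line or the DataFrame index
df.to_csv('demo.csv', header=False, index=False)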
In pd.DataFrame.from_dict(data_dict, orient='index'), the orient='index' argument means, in the words of the pandas documentation:
Specify `orient='index'` to create the DataFrame using dictionary keys as rows:
>>> data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data, orient='index')
       0  1  2  3
row_1  3  2  1  0
row_2  a  b  c  d
With orient='index', the column names can also be set explicitly, as shown below:
>>> pd.DataFrame.from_dict(data, orient='index',
... columns=['A', 'B', 'C', 'D'])
       A  B  C  D
row_1  3  2  1  0
row_2  a  b  c  d
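Applied to the table from Requirement 2, the `columns` argument could be used roughly as follows. This is a sketch with the placeholder values from that table; the `io.StringIO` buffer stands in for a real file such as 'demo.csv' so the example runs without touching the disk.

```python
import io
import pandas as pd

# Rows keyed by index; the values are placeholders taken from the table above.
data_dict = {
    1: [1.0, 2.0, 'aa'],
    2: [1.2, 2.2, 'bb'],
    3: [1.3, 2.1, 'cc'],
}

df = pd.DataFrame.from_dict(
    data_dict, orient='index', columns=['Title 1', 'Title 2', 'Title 3']
)
# The dict keys become the DataFrame index; naming it gives the first CSV column a title.
df.index.name = 'Index'

buffer = io.StringIO()  # stands in for a file such as 'demo.csv'
df.to_csv(buffer)       # header and index are both written by default
print(buffer.getvalue())
```

Because the titles live in the DataFrame itself, the default `to_csv` settings write them as the header line and the numeric columns keep their numeric types.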
Alternatively, the column titles can be added directly to data_dict as its first row, for example:
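import pandas as pd

row_num = 4  # example value; range(1, row_num) generates rows 1 to 3
data_dict = {}
# the first row holds the column titles
data_dict[0] = ['Index', 'Title 1', 'Title 2', 'Title 3']
for i in range(1, row_num):
    # one row: the index followed by the three column values
    row_data = [i, 1.0, 2.1, 'aa']
    data_dict[i] = row_data
# convert the dict to a DataFrame, one dict entry per row
df = pd.DataFrame.from_dict(data_dict, orient='index')
# header=False because the titles are already stored as the first row
df.to_csv('demo.csv', header=False, index=False)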
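One way to check the result of this variant is to read the CSV text back. The sketch below does so with an in-memory buffer in place of 'demo.csv' (an assumption made so it runs anywhere). Since the titles are stored as an ordinary data row, each in-memory column mixes text and numbers; after the round trip, pandas parses the first line as the header again.

```python
import io
import pandas as pd

# The titles go in as row 0, followed by three data rows.
data_dict = {0: ['Index', 'Title 1', 'Title 2', 'Title 3']}
for i in range(1, 4):
    data_dict[i] = [i, 1.0, 2.1, 'aa']

df = pd.DataFrame.from_dict(data_dict, orient='index')

buffer = io.StringIO()  # stands in for 'demo.csv'
df.to_csv(buffer, header=False, index=False)

# Read the text back: the first line is now interpreted as the header.
buffer.seek(0)
check = pd.read_csv(buffer)
print(check.columns.to_list())
print(check)
```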