1. Converting a nested dict (a dict of dicts) to a DataFrame
import pandas as pd
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame = pd.DataFrame(pop)
frame
Out[3]:
      Nevada  Ohio
2001     2.4   1.7
2002     2.9   3.6
2000     NaN   1.5
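The outer keys become the columns and the inner keys become the row index. As with a NumPy array, the DataFrame can be transposed: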
frame.T
Out[4]:
        2001  2002  2000
Nevada   2.4   2.9   NaN
Ohio     1.7   3.6   1.5
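If the outer keys should become the rows directly, `pd.DataFrame.from_dict` with `orient='index'` is one way to skip the explicit transpose. The following is a minimal sketch that reuses the `pop` dict from above; the variable name `by_state` is only illustrative.

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# Outer keys become the row index, inner keys become the columns
by_state = pd.DataFrame.from_dict(pop, orient='index')

# Missing inner keys (Nevada has no value for 2000) are filled with NaN
print(by_state.loc['Ohio', 2000])              # 1.5
print(pd.isna(by_state.loc['Nevada', 2000]))   # True
```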
1.2 Building a dict and converting it to a DataFrame
ans_weight = {'class1': 10.23, 'class3': 18.38}
difference = ans_weight['class3'] - ans_weight['class1']
ans_weight['total'] = difference
print(ans_weight)
{'class1': 10.23, 'class3': 18.38, 'total': 8.149999999999999}
ans_weight.items()
Out[15]: dict_items([('class1', 10.23), ('class3', 18.38), ('total', 8.149999999999999)])
list(ans_weight.items())
Out[16]: [('class1', 10.23), ('class3', 18.38), ('total', 8.149999999999999)]
Convert to a DataFrame:
# Convert the ans_weight dictionary to a pandas DataFrame
df_ans_weight = pd.DataFrame(list(ans_weight.items()), columns=['Class', 'Value'])
df_ans_weight
Out[17]:
    Class  Value
0  class1  10.23
1  class3  18.38
2   total   8.15
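The dict above stores 8.149999999999999 because of floating-point subtraction, while pandas shows 8.15 only because of its display precision. A possible variant, sketched below, rounds the value before storing it and builds the same two-column frame through a Series; `df_alt` is a name introduced here for illustration.

```python
import pandas as pd

ans_weight = {'class1': 10.23, 'class3': 18.38}
# Round the difference to avoid artifacts such as 8.149999999999999
ans_weight['total'] = round(ans_weight['class3'] - ans_weight['class1'], 2)

# A Series built from a dict uses the keys as its index;
# rename_axis names that index, and reset_index turns it into a column
df_alt = (pd.Series(ans_weight, name='Value')
            .rename_axis('Class')
            .reset_index())
print(df_alt)
```

Whether to round depends on how the value is used later; for display only, the unrounded value is usually harmless.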
2. Building a DataFrame from a dict of equal-length lists or NumPy arrays
Note: all the lists or arrays must have the same length.
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
'year': [2000, 2001, 2002, 2001, 2002, 2003],
'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)
frame
Out[8]:
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2
Specify the column order with the columns argument:
pd.DataFrame(data, columns=['year', 'state', 'pop'])
Out[9]:
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
5  2003  Nevada  3.2
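The equal-length requirement in this section can be checked quickly. The sketch below uses a made-up dict, `uneven`, whose lists differ in length: the plain constructor raises a `ValueError`, while wrapping each list in a `pd.Series` is one possible workaround that pads the shorter column with NaN.

```python
import pandas as pd

# Illustrative data: 'year' is one value short
uneven = {'state': ['Ohio', 'Ohio', 'Nevada'],
          'year': [2000, 2001]}

try:
    pd.DataFrame(uneven)
except ValueError as err:
    print('ValueError:', err)

# Wrapping each list in a Series lets pandas align on the index
# and pad the shorter column with NaN
padded = pd.DataFrame({key: pd.Series(values) for key, values in uneven.items()})
print(padded)
```

Padding changes the dtype of the shorter column to float, so it is worth checking whether the missing values are expected before relying on this.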
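3. Converting a pd.Series to a dict
# C is a list of labels; this example list is built to match the counts shown below
C = [0] * 6367 + [-1] * 1103 + [1] * 18
pd.Series(C).value_counts().to_dict()
Output:
{0: 6367, -1: 1103, 1: 18}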
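To see what `value_counts().to_dict()` returns, here is a small self-contained sketch with a made-up list named `labels`. It also compares the result with `collections.Counter` from the standard library, which yields the same mapping without pandas.

```python
from collections import Counter

import pandas as pd

labels = [0, 0, -1, 0, 1, -1]   # made-up labels for illustration

# value_counts sorts by count, descending; to_dict maps each value to its count
counts = pd.Series(labels).value_counts().to_dict()
print(counts)   # {0: 3, -1: 2, 1: 1}

# Counter produces the same value-to-count mapping
same_as_counter = counts == dict(Counter(labels))
print(same_as_counter)   # True
```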
4. Quickly renaming DataFrame columns with columns=dict(zip(list1, list2))
df_10min = df_10min.rename(columns=dict(zip(df_10min.columns, wind_profile_clustering.height)))
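The one-liner above depends on objects that are not shown here, so the sketch below uses hypothetical stand-ins: `df_demo` in place of `df_10min`, and a plain list `heights` in place of `wind_profile_clustering.height`. It shows how `zip` pairs the old names with the new ones by position.

```python
import pandas as pd

# Hypothetical stand-ins for df_10min and wind_profile_clustering.height
df_demo = pd.DataFrame([[5.1, 6.3, 7.0]], columns=['col_a', 'col_b', 'col_c'])
heights = [10, 50, 100]

# zip pairs old and new names by position; dict turns the pairs into a mapping
mapping = dict(zip(df_demo.columns, heights))
df_demo = df_demo.rename(columns=mapping)
print(list(df_demo.columns))   # [10, 50, 100]
```

Because `zip` stops at the shorter input, any columns beyond the length of the second list keep their original names.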
5. Building a DataFrame from lists
Combine two lists (height_ws and height_wd), which may differ in length, into one DataFrame:
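import re
from itertools import zip_longest

# new_columns is assumed to be an existing list of column names (defined elsewhere)
# Collect the unique integer heights parsed from the 'ws' column names, sorted ascending
height_ws = sorted(
    {int(re.split('[_-]', hh)[1]) for hh in new_columns if hh != 'time' and 'ws' in hh})
# Same for the 'wd' column names
height_wd = sorted(
    {int(re.split('[_-]', hh)[1]) for hh in new_columns if hh != 'time' and 'wd' in hh})
# zip_longest pads the shorter list with None, so no value is dropped
zipped = zip_longest(height_ws, height_wd, fillvalue=None)
df2 = pd.DataFrame(zipped, columns=['ws', 'wd'])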
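Since `new_columns` is not defined in this section, the sketch below assumes column names of the form `ws_100` or `wd_50`; that naming scheme and the helper `heights_for` are hypothetical. It runs the same parsing and `zip_longest` steps end to end.

```python
import re
from itertools import zip_longest

import pandas as pd

# Hypothetical column names; the real new_columns is not shown in the text
new_columns = ['time', 'ws_100', 'ws_50', 'ws_10', 'wd_50', 'wd_10']

def heights_for(prefix, columns):
    """Collect the unique integer heights of columns containing `prefix`."""
    return sorted({int(re.split('[_-]', name)[1])
                   for name in columns if name != 'time' and prefix in name})

height_ws = heights_for('ws', new_columns)   # [10, 50, 100]
height_wd = heights_for('wd', new_columns)   # [10, 50]

# zip_longest pads the shorter list, so the 100 m entry is not dropped
df2 = pd.DataFrame(list(zip_longest(height_ws, height_wd, fillvalue=None)),
                   columns=['ws', 'wd'])
print(df2)
```

With plain `zip`, the result would be truncated to the length of the shorter list, which is why `zip_longest` is used here.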