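Data structures in pandas:
- Series: a one-dimensional labeled array; its size is immutable;
- DataFrame: a two-dimensional labeled table;
- Panel: a three-dimensional container (deprecated in pandas 0.20 and removed from later releases).
All three are built on top of NumPy arrays. Series is size-immutable, while the other two are size-mutable.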
Example of creating a Series:
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 3, 5, np.nan, 6])
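To make the Series example a little more concrete, here is a minimal sketch of what the `np.nan` entry does to the data type and to aggregations. The names `n_missing`, `mean_value`, and `labeled` are illustrative only and do not come from the original notes.

```python
import numpy as np
import pandas as pd

# np.nan marks a missing value; its presence makes the Series float64.
s = pd.Series([1, 3, 5, np.nan, 6])

n_missing = int(s.isna().sum())  # number of missing entries
mean_value = s.mean()            # NaN is skipped by default (skipna=True)

# A Series can also carry explicit labels instead of the default 0..n-1 index.
labeled = pd.Series([1, 3, 5], index=["a", "b", "c"])

print(s.dtype, n_missing, mean_value, labeled["b"])
```

The default integer index and the explicit string index behave the same way for lookups; only the labels differ.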
Example of creating a DataFrame:
>>> dates = pd.date_range('20170101', periods=7)
>>> df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))
>>> df
                   A         B         C         D
2017-01-01  0.026631 -1.123369 -0.593639 -0.450369
2017-01-02  0.381743  1.069795 -0.521409 -1.688327
2017-01-03  0.173431  1.858795  0.702816  0.338846
2017-01-04  0.497656 -1.271299 -0.698368  0.106706
2017-01-05  1.521118 -0.622020  0.407636  1.247326
2017-01-06 -2.562246 -1.194964 -1.659602 -0.038506
2017-01-07  1.098590  0.422019  0.416406  0.088594
>>>
>>> pd.DataFrame([1, 2, 3, 4, 5])  # produces a DataFrame with 5 rows and 1 column
   0
0  1
1  2
2  3
3  4
4  5
>>>
>>> pd.DataFrame([['Alex', 10], ['Bob', 12], ['Clarke', 13]])  # 3 rows, 2 columns
        0   1
0    Alex  10
1     Bob  12
2  Clarke  13
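In other words, the outermost ' [ ] ' holds the list of rows and each inner ' [ ] ' is one row: the number of members in the outer list is the number of rows, and the length of each inner list is the number of columns.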
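The row-oriented reading of a nested list can be checked with a short sketch. The column names `name` and `age` are assumptions added for illustration; the original example leaves the columns unnamed. A dict of lists is shown next to it because there each key is a column instead.

```python
import pandas as pd

# List of lists: each inner list is one ROW.
rows = [["Alex", 10], ["Bob", 12], ["Clarke", 13]]
df_rows = pd.DataFrame(rows, columns=["name", "age"])  # column names are illustrative

# Dict of lists: each key is one COLUMN.
df_cols = pd.DataFrame({"name": ["Alex", "Bob", "Clarke"], "age": [10, 12, 13]})

print(df_rows.shape)  # (number of rows, number of columns)
print(df_rows.values.tolist() == df_cols.values.tolist())
```

Both constructions describe the same table; which one is more convenient usually depends on whether the data arrives record by record or field by field.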
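Adding and removing DataFrame rows/columns:
>>> # add a new column
>>> df['new'] = ...
>>> # delete a column
>>> del df['one']
>>> df.pop('two')
>>> # append new rows (DataFrame.append was removed in pandas 2.0; there, use pd.concat([df, df_2], ignore_index=True))
>>> df = df.append(df_2, ignore_index=True)
>>> # delete a row
>>> df = df.drop(0) # 0 is the index label of the row to drop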
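The operations above can be tried end to end with a small self-contained sketch. The frame `df_demo`, its contents, and the helper frame `extra` are made up for illustration, and `pd.concat` stands in for `append()` so that the sketch also runs on pandas 2.0 and later.

```python
import pandas as pd

df_demo = pd.DataFrame({"one": [1, 2, 3], "two": [10, 20, 30]})

# Add a new column computed from existing ones.
df_demo["new"] = df_demo["one"] + df_demo["two"]

# Delete columns: `del` removes in place, `pop` removes and returns the column.
del df_demo["one"]
popped = df_demo.pop("two")

# Attach a new row; ignore_index=True renumbers the index 0..n-1.
extra = pd.DataFrame({"new": [99]})
df_demo = pd.concat([df_demo, extra], ignore_index=True)

# Drop the row whose index label is 0; drop() returns a new DataFrame.
df_demo = df_demo.drop(0)

print(df_demo)
print(popped.tolist())
```

Note that `drop(0)` removes by label, not by position, so the remaining rows keep their original labels.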
Pay attention to the Notes section of the append() docstring:
Notes
-----
Iteratively appending rows to a DataFrame can be more computationally
intensive than a single concatenate. A better solution is to append
those rows to a list and then concatenate the list with the original
DataFrame all at once.
In other words, collect the rows into a list first, then add them to the df in a single step.
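A minimal sketch of the pattern the note recommends: gather the rows in a plain Python list, then concatenate once. The frame `base`, the column names, and the sample records are hypothetical and only serve the illustration.

```python
import pandas as pd

base = pd.DataFrame({"name": ["Alex"], "age": [10]})

# Collect new rows in an ordinary list; list.append is cheap.
new_rows = []
for name, age in [("Bob", 12), ("Clarke", 13)]:
    new_rows.append({"name": name, "age": age})

# Build one DataFrame from the list and concatenate a single time.
result = pd.concat([base, pd.DataFrame(new_rows)], ignore_index=True)

print(result)
```

Growing a DataFrame row by row inside the loop would copy the existing data on every iteration, which is what the single concatenation avoids.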