Python Pandas: A Summary of DataFrame Usage

ckSpark

Last modified: 2022-08-23 17:29:38

Category: python学习 (Python learning). Tags: DataFrame, Python

First published: 2018-10-14 20:40:00

Article link: https://blog.csdn.net/MsSpark/article/details/83050572

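DataFrame: a table-like data structure

Studying it side by side with arrays and Series makes the usage and characteristics of DataFrame much clearer.

Array / Series / DataFrame: a comparative study
This article summarizes the characteristics and usage of DataFrame, the two-dimensional (multi-dimensional) data structure in the Pandas package.
See also: a quick tour of pandas usage
DataFrame in the Pandas package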

3.1 Adding data

3.1.1 Creating a DataFrame (Object Creation)

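numpy.random.randn(m,n): returns an array of m rows and n columns whose entries are samples drawn from the standard normal distribution;
numpy.random.rand(m,n): returns an array of m rows and n columns whose entries are samples drawn uniformly from [0,1).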
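
The difference between the two functions can be checked directly. The following is a minimal sketch; the variable names and the fixed seed are assumptions added here for illustration and are not part of the original article.

```python
import numpy as np

np.random.seed(0)  # fixed seed, chosen only to make the sketch repeatable

uniform = np.random.rand(7, 4)   # 7 rows, 4 columns, uniform on [0, 1)
normal = np.random.randn(7, 4)   # 7 rows, 4 columns, standard normal

# Both calls return arrays of the requested shape.
print(uniform.shape, normal.shape)

# Uniform samples always stay inside [0, 1); normal samples are unbounded.
print(uniform.min() >= 0.0 and uniform.max() < 1.0)
print(normal.min(), normal.max())
```

Either array can be passed to pd.DataFrame in the same way as in the example that follows.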

import pandas as pd
import numpy as np
# Create a DataFrame from a NumPy array
dates=pd.date_range('2018-09-01',periods=7)
dF1=pd.DataFrame(np.random.rand(7,4),index=dates) # 7x4 array of uniform random samples from [0,1)
>>>
		0		1		2		3
2018-09-01	0.445283	0.798458	0.818208	0.340356
2018-09-02	0.249172	0.535308	0.811825	0.224133
2018-09-03	0.466948	0.178802	0.997567	0.361670
2018-09-04	0.720670	0.407122	0.120310	0.180888
2018-09-05	0.545400	0.169919	0.171649	0.030347
2018-09-06	0.553405	0.013866	0.582740	0.030837
2018-09-07	0.185981	0.137448	0.817721	0.768875

# Create a DataFrame from a dict (scalar values are repeated for every row)
dataDict={
          'A':1.,
          'B':pd.Timestamp('20180901'),
          'C':pd.Series(1,index=range(4),dtype='float'),
          'D':np.array([3]*4,dtype='int'),
          'E':pd.Categorical(['test','train','test','train']),
          'F':'foo'
         }
dF2=pd.DataFrame(dataDict)
>>>
	A	B		C	D	E	F
0	1.0	2018-09-01	1.0	3	test	foo
1	1.0	2018-09-01	1.0	3	train	foo
2	1.0	2018-09-01	1.0	3	test	foo
3	1.0	2018-09-01	1.0	3	train	foo
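
When a DataFrame is built from a dict like the one above, each column keeps its own dtype and scalar values are repeated to match the length of the array-like entries. The sketch below inspects this; the names `data` and `frame` are assumptions introduced here, and the exact dtype strings printed can vary with the platform and the pandas version.

```python
import numpy as np
import pandas as pd

# Same kind of dict as in the article: scalars mixed with length-4 array-likes.
data = {
    'A': 1.0,
    'B': pd.Timestamp('20180901'),
    'C': pd.Series(1, index=range(4), dtype='float'),
    'D': np.array([3] * 4, dtype='int'),
    'E': pd.Categorical(['test', 'train', 'test', 'train']),
    'F': 'foo',
}
frame = pd.DataFrame(data)

print(frame.shape)   # number of rows and columns
print(frame.dtypes)  # one dtype per column
```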

3.1.2 Combining data

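Concat/Merge/Append
Concat: concatenates DataFrames along rows or along columns.
Merge: joins DataFrames on key columns, similar to JOIN in SQL.
Append: appends rows to a DataFrame. (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat is the replacement.)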
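
The Merge and Append entries above can be sketched as follows. This is an illustration only: the tables `left_tbl`, `right_tbl` and `extra_rows`, and their column names, are hypothetical and do not come from the original article. Row appending is written with pd.concat so that the sketch also runs on pandas 2.x.

```python
import pandas as pd

# Two small hypothetical tables sharing a 'key' column.
left_tbl = pd.DataFrame({'key': ['a', 'b', 'c'], 'lval': [1, 2, 3]})
right_tbl = pd.DataFrame({'key': ['a', 'b', 'd'], 'rval': [4, 5, 6]})

# Inner join: keep only keys present in both tables (like SQL INNER JOIN).
inner = pd.merge(left_tbl, right_tbl, on='key', how='inner')

# Outer join: keep every key from either table; missing values become NaN.
outer = pd.merge(left_tbl, right_tbl, on='key', how='outer')

# Appending rows: concatenate along rows and rebuild the index.
extra_rows = pd.DataFrame({'key': ['e'], 'lval': [9]})
appended = pd.concat([left_tbl, extra_rows], ignore_index=True)

print(inner)
print(outer)
print(appended)
```

The `how` argument also accepts 'left' and 'right', which correspond to SQL LEFT JOIN and RIGHT JOIN.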

# Concat: concatenate DataFrames (along rows or columns)
dF=pd.DataFrame(np.random.randn(10,4))
>>>

	0		1		2		3
0	-1.135930	-0.371505	0.349293	-2.788323
1	-0.505594	0.012753	0.539757	0.044460
2	1.208134	-0.436352	1.361564	-0.777053
3	-0.909025	0.929461	0.411863	0.866106
4	-0.300255	-0.023755	-1.382157	0.042096
5	0.335969	-0.176301	0.751841	-0.016906
6	0.545919	1.202155	0.705825	-2.305620
7	-0.820600	-2.588532	-0.475357	0.475708
8	-0.097844	0.141700	0.322873	0.586568
9	0.941772	0.789850	-1.017382	-0.762623

# Split the DataFrame into pieces, then concatenate them back together
pieces1=dF[:3]
>>>
	0		1		2		3
0	-1.135930	-0.371505	0.349293	-2.788323
1	-0.505594	0.012753	0.539757	0.044460
2	1.208134	-0.436352	1.361564	-0.777053

pieces2=dF[3:7]
pieces3=dF[7:] 
pd.concat([pieces1,pieces2,pieces3],axis=0) # concatenate along rows (axis=0)
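
As a quick check of the split-and-concatenate step, the sketch below rebuilds a frame from its row slices and also shows column-wise concatenation with axis=1. The names `frame`, `parts`, `rebuilt` and `side_by_side` are assumptions introduced here for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.randn(10, 4))

# Split by rows into three pieces, then stack them back together (axis=0).
parts = [frame[:3], frame[3:7], frame[7:]]
rebuilt = pd.concat(parts, axis=0)
print(rebuilt.equals(frame))

# Split by columns instead, then place the halves side by side (axis=1).
side_by_side = pd.concat([frame[[0, 1]], frame[[2, 3]]], axis=1)
print(side_by_side.shape)
```

With axis=0 the pieces are aligned on column labels, and with axis=1 they are aligned on the row index, so pieces cut from the same frame line up again without extra work.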

# Merge (similar to JOIN in SQL)
left=pd.</
