本文实例讲述了Python数据分析模块pandas用法。分享给大家供大家参考,具体如下:
一 介绍
pandas(Python Data Analysis Library)是基于numpy的数据分析模块,提供了大量标准数据模型和高效操作大型数据集所需要的工具,可以说pandas是使得Python能够成为高效且强大的数据分析环境的重要因素之一。
pandas主要提供了3种数据结构:
1)Series,带标签的一维数组。
2)DataFrame,带标签且大小可变的二维表格结构。
3)Panel,带标签且大小可变的三维数组。
二 代码
1、生成一维数组
>>>import pandas as pd
>>>import numpy as np
>>> x = pd.Series([1,3,5, np.nan])
>>>print(x)
01.0
13.0
25.0
3NaN
dtype: float64
2、生成二维数组
>>> dates = pd.date_range(start='20170101', end='20171231', freq='D')#间隔为天
>>>print(dates)
DatetimeIndex(['2017-01-01','2017-01-02','2017-01-03','2017-01-04',
'2017-01-05','2017-01-06','2017-01-07','2017-01-08',
'2017-01-09','2017-01-10',
...
'2017-12-22','2017-12-23','2017-12-24','2017-12-25',
'2017-12-26','2017-12-27','2017-12-28','2017-12-29',
'2017-12-30','2017-12-31'],
dtype='datetime64[ns]', length=365, freq='D')
>>> dates = pd.date_range(start='20170101', end='20171231', freq='M')#间隔为月
>>>print(dates)
DatetimeIndex(['2017-01-31','2017-02-28','2017-03-31','2017-04-30',
'2017-05-31','2017-06-30','2017-07-31','2017-08-31',
'2017-09-30','2017-10-31','2017-11-30','2017-12-31'],
dtype='datetime64[ns]', freq='M')
>>> df = pd.DataFrame(np.random.randn(12,4), index=dates, columns=list('ABCD'))
>>>print(df)
A B C D
2017-01-31-0.6825560.2441020.4508550.236475
2017-02-28-0.6300600.5906670.4824380.225697
2017-03-311.0669890.3193391.0949531.716053
2017-04-300.334944-0.053049-1.009493-1.039470
2017-05-31-0.380778-0.0444290.0756470.931243
2017-06-300.8675400.872197-0.738974-1.114596
2017-07-310.423371-1.0863860.183820-0.438921
2017-08-311.2851630.634134-0.4729731.281057
2017-09-30-1.002832-0.888122-1.316014-0.070637
2017-10-311.735617-0.2538150.5544031.536211
2017-11-302.0303840.6675561.0126980.239479
2017-12-312.059718-0.0890501.4205170.224578
>>> df = pd.DataFrame([[np.random.randint(1,100)for j in range(4)]for i in range(12)], index=dates, columns=list('ABCD'))
>>>print(df)
A B C D
2017-01-317532522
2017-02-2870997098
2017-03-3199477567
2017-04-3033701749
2017-05-3162886891
2017-06-3019751844
2017-07-3150856582
2017-08-315628776
2017-09-306173111
2017-10-318296692
2017-11-306359194
2017-12-3179586933
>>> df = pd.DataFrame({'A':[np.random.randint(1,100)for i in range(4)],
'B':pd.date_range(start='20130101', periods=4, freq='D'),
'C':pd.Series([1,2,3,4],index=list(range(4)),dtype='float32'),
'D':np.array([3]*4,dtype='int32'),
'E':pd.Categorical(["test","train","test","train"]),
'F':'foo'})
>>>print(df)
A B C D E F
0152013-01-011.03 test foo
1112013-01-022.03 train foo
2912013-01-033.03 test foo
3912013-01-044.03 train foo
>>> df = pd.DataFrame({'A':[np.random.randint(1,100)for i in range(