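pandas objects come with a set of common mathematical and statistical methods. Compared with the equivalent methods on NumPy arrays, they are built to handle missing data.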
In [71]: df = DataFrame([[1.4,np.nan],[7.1,-4.5],[np.nan,np.nan],[0.75,-1.3]],index=['a','b','c','d'],columns=['one','two'])
In [72]: df
Out[72]:
    one  two
a  1.40  NaN
b  7.10 -4.5
c   NaN  NaN
d  0.75 -1.3
In [73]: df.sum()
Out[73]:
one 9.25
two -5.80
dtype: float64
NA values are excluded automatically. This behavior can be disabled with the skipna parameter.
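As a minimal sketch of the difference, the snippet below rebuilds the same df and compares a plain NumPy sum with the pandas default, then passes skipna=False. The variable names (numpy_sum, col_sums_strict, and so on) are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

# NumPy propagates NaN: one missing value makes the whole sum NaN.
numpy_sum = df["one"].to_numpy().sum()

# pandas skips NaN by default (skipna=True).
pandas_sum = df["one"].sum()

# With skipna=False, any NaN in a column (or row) makes that result NaN.
col_sums_strict = df.sum(skipna=False)
row_means_strict = df.mean(axis=1, skipna=False)

print(numpy_sum)
print(pandas_sum)
print(col_sums_strict)
print(row_means_strict)
```

Here both columns contain at least one NaN, so every entry of col_sums_strict is NaN, while row 'b' is the only row with no missing value in column 'one' or 'two' besides 'd', and those two rows keep a numeric mean.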
Some methods compute cumulative statistics:
In [74]: df.cumsum()
Out[74]:
    one  two
a  1.40  NaN
b  8.50 -4.5
c   NaN  NaN
d  9.25 -5.8
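The output above shows that a NaN stays NaN in place while the running total carries on past it. The following sketch, assuming the same df, contrasts that default with skipna=False and also shows cummax as another cumulative method; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

# Default: the NaN at 'c' is left as NaN, and the total resumes at 'd'.
running = df["one"].cumsum()

# skipna=False: once a NaN is hit, every later entry is NaN as well.
strict = df["one"].cumsum(skipna=False)

# cummax tracks the running maximum in the same NaN-skipping way.
running_max = df["one"].cummax()

print(running)
print(strict)
print(running_max)
```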
idxmin and idxmax return indirect statistics, that is, the index label at which the minimum or maximum value is attained.
In [77]: df.idxmax()
Out[77]:
one b
two d
dtype: object
In [78]: df.idxmin()
Out[78]:
one d
two b
dtype: object
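To make the "indirect" part concrete, here is a small sketch, assuming the same df: the label returned by idxmax can be fed back into .loc to recover the value that max would return directly. The names label and value_at_label are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
    index=["a", "b", "c", "d"],
    columns=["one", "two"],
)

# idxmax gives the index label of the largest value in each column.
max_labels = df.idxmax()
min_labels = df.idxmin()

# Looking the label up again recovers the value itself.
label = df["one"].idxmax()
value_at_label = df.loc[label, "one"]

print(max_labels)
print(min_labels)
print(label, value_at_label, df["one"].max())
```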
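Correlation and covariance
Some summary statistics, such as correlation and covariance, are computed from pairs of arguments. Consider an example whose data are stock prices and trading volumes from Yahoo! Finance.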
pip install pandas_datareader
In [81]: from pandas_datareader import data as web
In [82]: all_data={}
In [83]: for ticker in ['AAPL','IBM','GOOG']:
    ...:     all_data[ticker] = web.get_data_yahoo(ticker,'1/1/2000','1/1/2010')
    ...:
In [84]: price = DataFrame({tic:data['Adj Close'] for tic,data in all_data.items()})
In [85]: volume = DataFrame({tic:data['Volume'] for tic,data in all_data.items()})
In [86]: returns = price.pct_change() # percent change between consecutive rows
In [87]: returns.tail()
Out[87]:
                AAPL      GOOG       IBM
Date
2009-12-24  0.449644 -0.585705  0.033312
2009-12-28  0.286843  0.977260  0.359968
2009-12-29 -0.309294 -0.160838 -0.278636
2009-12-30 -0.074395  0.028634 -0.075809
2009-12-31 -0.144809 -0.167703  0.092164
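Since the Yahoo! Finance download may not always be available, the sketch below illustrates what pct_change computes on a tiny made-up price table. The tickers AAA and BBB and all prices are invented for illustration and are not real market data.

```python
import pandas as pd

# Made-up prices, for illustration only (not real market data).
price = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0], "BBB": [50.0, 50.0, 55.0]},
    index=pd.to_datetime(["2009-12-29", "2009-12-30", "2009-12-31"]),
)

# Percent change relative to the previous row; the first row has no
# predecessor, so it is NaN.
returns = price.pct_change()

# The same quantity computed by hand: price[t] / price[t-1] - 1
manual = price / price.shift(1) - 1

print(returns)
print(manual)
```

A return of 0.10 therefore means the price rose 10% from the previous trading day.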
The downloaded dataset has the following format:
all_data
{'AAPL': Open High Low Close Adj Close Volume
Date
2000-01-03 3.745536 4.017857 3.631696 3.997768 3.610740 133949200
2000-01-04 3.866071 3.950893 3.613839 3.660714 3.306317 128094400
2000-01-05 3.705357 3.948661 3.678571 3.714286 3.354702 194580400
2000-01-06 3.790179 3.821429 3.392857 3.392857 3.064391 191993200
2000-01-07 3.446429 3.607143 3.410714 3.553571 3.209547 115183600
2000-01-10 3.642857 3.651786 3.383929 3.491071 3.153097 126266000
2000-01-11 3.426339 3.549107 3.232143 3.312500 2.991814 110387200
2000-01-12 3.392857 3.410714 3.089286 3.113839 2.812385 244017200
2000-01-13 3.374439 3.526786 3.303571 3.455357 3.120841 258171200
2000-01-14 3.571429 3.651786 3.549107 3.587054 3.239787 97594000
2000-01-18 3.607143 3.785714 3.587054 3.712054 3.352686 114794400
2000-01-19 3.772321 3.883929 3.691964 3.805804 3.437360 149410800
2000-01-20 4.125000 4.339286 4.053571 4.053571 3.661140 457783200
2000-01-21 4.080357 4.080357 3.935268 3.975446 3.590579 123981200
2000-01-24 3.872768 4.026786 3.754464 3.794643 3.427280 110219200
2000-01-25 3.750000 4.040179 3.656250 4.008929 3.620820 124286400
2000-01-26 3.928571 4.078125 3.919643 3.935268 3.554291 91789600
2000-01-27 3.886161 4.035714 3.821429 3.928571 3.548242 85036000
2000-01-28 3.863839 3.959821 3.593750 3.629464 3.278092 105837200
2000-01-31 3.607143 3.709821 3.375000 3.705357 3.346637 175420000
2000-02-01 3.714286 3.750000 3.571429 3.580357 3.233739 79508800
2000-02-02 3.598214 3.647321 3.464286 3.529018 3.187370 116048800
2000-02-03 3.582589 3.723214 3.580357 3.689732 3.332525 118798400
2000-02-04 3.712054 3.928571 3.700893 3.857143 3.483729 106330000
2000-02-07 3.857143 4.080357 3.783482 4.073661 3.679286 110266800
2000-02-08 4.071429 4.147321 3.973214 4.102679 3.705494 102160800
2000-02-09 4.075893 4.183036 4.015625 4.022