python便捷数据怎么获取_Python基本数据统计（二）---- 数据选择 & 简单统计与处理...

最新推荐文章于 2024-09-04 07:00:00 发布

weixin_39973410

最新推荐文章于 2024-09-04 07:00:00 发布

阅读量104

点赞数

文章标签： python便捷数据怎么获取

3. 数据显示

4. 数据选择

4.1 选择行

4.1.1 索引

In [189]: quotesdf.ix['2016-12-30']

Out[189]:

open7.364590e+01close7.376540e+01high7.402429e+01low7.352641e+01volume3.378800e+06Name:2016-12-30, dtype: float64

quotesdf.ix['index']

4.1.2 切片

In [190]: quotesdf['2016-12-30':'2017-01-02']

Out[190]:

open close high low volume2016-12-30 73.645904 73.765397 74.024287 73.52641 3378800.0

quotesdf['2016-12-30':'2017-01-02']

4.2 选择列

In [192]: djidf['code']

Out[192]:

0 AAPL1AXP2BA

...29XOM

Name: code, dtype: object

In [193]: djidf.code

Out[193]:

0 AAPL1AXP2BA

...29XOM

Name: code, dtype: object

djidf['code'] / djidf.code

4.3 行、列 - 标签label ( loc )

In [64]: djidf.loc[1:5,]

Out[64]:

code name lasttrade1 AXP American Express Company 76.200

2 BA The Boeing Company 159.530

3 CAT Caterpillar Inc. 94.580

4 CSCO 思科系?公司 30.100

5 CVX Chevron Corporation 115.600In [65]: djidf.loc[:,['code','lasttrade']]

Out[65]:

code lasttrade

0 AAPL120.000

1 AXP 76.200

2 BA 159.530

3 CAT 94.580...29 XOM 85.890

obj.loc[x : xx, ['y','yy'] ]

4.4 行和列的区域 - 标签label ( loc 和 at )

In [66]: djidf.loc[1:5,['code','lasttrade']]

Out[66]:

code lasttrade1 AXP 76.200

2 BA 159.530

3 CAT 94.580

4 CSCO 30.100

5 CVX 115.600In [67]: djidf.loc[1,'lasttrade']

Out[67]: '76.200'In [68]: djidf.at[1,'lasttrade']

Out[68]: '76.200'

obj.loc[x, 'y']

4.5 行、列和区域 ( iloc 和 iat )

In [69]: djidf.loc[1:5,['code','lasttrade']]

Out[69]:

code lasttrade1 AXP 76.200

2 BA 159.530

3 CAT 94.580

4 CSCO 30.100

5 CVX 115.600In [70]: djidf.iloc[1:6,[0,2]]

Out[70]:

code lasttrade1 AXP 76.200

2 BA 159.530

3 CAT 94.580

4 CSCO 30.100

5 CVX 115.600In [71]: djidf.loc[1,'lasttrade']

Out[71]: '76.200'In [72]: djidf.at[1,'lasttrade']

Out[72]: '76.200'In [73]: djidf.iloc[1,2]

Out[73]: '76.200'In [74]: djidf.iat[1,2]

Out[74]: '76.200'

obj.iloc[ a:b, [c,d] ]

4.5 条件筛选

In [77]: quotesdf[quotesdf.index >= '2016-12-20']

Out[77]:

open close high low volume2016-12-20 74.681487 74.741230 75.179363 74.213482 3244900.0...2017-01-20 75.989998 76.199997 76.910004 75.389999 8382000.0In [78]: quotesdf[(quotesdf.index >= '2016-12-20') & (quotesdf.close >=76)]

Out[78]:

open close high low volume2017-01-04 75.260002 76.260002 76.550003 75.059998 4635800.0...2017-01-20 75.989998 76.199997 76.910004 75.389999 8382000.0

quotesdf[(quotesdf.index >= '2016-12-20') & (quotesdf.close >=76)]

5. 简单统计与处理

5.1 平均值(high列的平均值)

In [199]: quotesdf.high.mean()

Out[199]: 64.246265703871586

quotesdf.high.mean()

# 待解决问题：通过正则表达式得到是数据是string类型，无法计算mean值

2017/01/22 解决：djidf.lasttrade = djidf.lasttrade.astype(float) # lasttrade列转为float类型，再赋给djidf

5.2 简单筛选(筛选出开盘价open大于75的数据的收盘价close)

In [204]: quotesdf[quotesdf.open>=75].close

Out[204]:2017-01-04 76.260002

2017-01-05 75.320000...2017-01-20 76.199997Name: close, dtype: float64

quotesdf[quotesdf.open>=75].close

5.3 简单统计(统计AXP公司近一年股票涨和跌的天数)

In [206]: len(quotesdf[quotesdf.close >quotesdf.open])

Out[206]: 135In [207]: len(quotesdf)-135Out[207]: 118

len(quotesdf[quotesdf.close > quotesdf.open])

5.4 简单统计(统计AXP公司近一年相邻两天收盘价的涨跌情况)

In [211]: importnumpy as np

In [212]: np.diff(quotesdf.close) #相邻两天，后一天减前一天的差

Out[212]:

array([-7.436398, -0.039243, 0.068674, -0.559202, -1.60893 , 0.608254,

...-0.030007, -0.259994, -0.020005, 0.89 , -0.799996, -0.490005])

In [213]: len(np.diff(quotesdf.close))

Out[213]: 252In [214]: len(quotesdf)

Out[214]: 253In [215]: status = np.sign(np.diff(quotesdf.close)) #提取差值array的正负符号

In [216]: status

Out[216]:

array([-1., -1., 1., -1., -1., 1., 1., -1., 1., 1., -1., -1., 1.,

...-1., -1., 1., -1., -1.])

In [218]: status[np.where(status==1.)].size #差值符号为正的个数，即涨的天数

Out[218]: 140In [219]: status[np.where(status==-1.)].size #差值符号为负的个数，即跌的天数

Out[219]: 112

np.sign(np.diff(quotesdf.close))

5.5 排序 (按最近一次成交价对30只股票进行排序，列出前三名的code)

In [313]: djidf.sort_values('lasttrade',axis=0,ascending=False)[:3].code

Out[313]:9GS17MMM11IBM

Name: code, dtype: object

In [314]: djidf.sort_values('lasttrade',axis=0,ascending=False)[:3]

Out[314]:

code name lasttrade9 GS The Goldman Sachs Group, Inc. 232.20

17 MMM 3M Company 178.49

11 IBM International Business Machines Corporation 170.55In [315]: djidf.sort_values('lasttrade',axis=0,ascending=False)

Out[315]:

code name lasttrade9 GS The Goldman Sachs Group, Inc. 232.20

17 MMM 3M Company 178.49

11 IBM International Business Machines Corporation 170.55

2 BA The Boeing Company 159.53...4 CSCO 思科系統公司 30.10

djidf.sort_values('lasttrade',axis=0,ascending=False)[:3].code

5.6 计数统计 (统计某月开盘天数)

In [319]: t = quotesdf[(quotesdf.index >= '2016-12-01') & (quotesdf.index < '2017-01-01')]

In [320]: t

Out[320]:

open close high low volume2016-12-01 72.142314 72.221976 72.460955 71.475161 4298200.0

2016-12-02 72.212016 71.554823 72.351429 71.126648 2959100.0...2016-12-30 73.645904 73.765397 74.024287 73.526410 3378800.0In [321]: len(t)

Out[321]: 21

quotesdf[(quotesdf.index >= '2016-12-01') & (quotesdf.index < '2017-01-01')]

5.7 计数统计 (统计近一年每个月的开盘天数)

In [322]: importtime

In [323]: listtemp =[]

In [324]: for i inrange(0,len(quotesdf)):

...: temp= time.strptime(quotesdf.index[i],"%Y-%m-%d") #取出每行索引，转为日期格式

...: listtemp.append(temp.tm_mon) #将日期的月份提取出来，追加到列表

...:

In [325]: printlisttemp

[1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

In [326]: tempdf = quotesdf.copy() #将quotesdf复制一份

In [327]: tempdf['month'] = listtemp #将日期list添加为df的一列month

In [328]: tempdf['month'].value_counts() #统计month列值的个数

8 23

6 22

3 22

12 21

11 21

10 21

9 21

5 21

4 21