python便捷数据怎么获取_Python基本数据统计(二)---- 数据选择 & 简单统计与处理...

3. 数据显示

4. 数据选择

4.1 选择行

4.1.1 索引

In [189]: quotesdf.ix['2016-12-30']

Out[189]:

open7.364590e+01close7.376540e+01high7.402429e+01low7.352641e+01volume3.378800e+06Name:2016-12-30, dtype: float64

quotesdf.ix['index']

4.1.2 切片

In [190]: quotesdf['2016-12-30':'2017-01-02']

Out[190]:

open close high low volume2016-12-30 73.645904 73.765397 74.024287 73.52641 3378800.0

quotesdf['2016-12-30':'2017-01-02']

4.2 选择列

In [192]: djidf['code']

Out[192]:

0 AAPL1AXP2BA

...29XOM

Name: code, dtype: object

In [193]: djidf.code

Out[193]:

0 AAPL1AXP2BA

...29XOM

Name: code, dtype: object

djidf['code'] / djidf.code

4.3 行、列  - 标签label ( loc )

In [64]: djidf.loc[1:5,]

Out[64]:

code name lasttrade1 AXP American Express Company 76.200

2 BA The Boeing Company 159.530

3 CAT Caterpillar Inc. 94.580

4 CSCO 思科系?公司 30.100

5 CVX Chevron Corporation 115.600In [65]: djidf.loc[:,['code','lasttrade']]

Out[65]:

code lasttrade

0 AAPL120.000

1 AXP 76.200

2 BA 159.530

3 CAT 94.580...29 XOM 85.890

obj.loc[x : xx, ['y','yy'] ]

4.4 行和列的区域  - 标签label ( loc 和 at )

In [66]: djidf.loc[1:5,['code','lasttrade']]

Out[66]:

code lasttrade1 AXP 76.200

2 BA 159.530

3 CAT 94.580

4 CSCO 30.100

5 CVX 115.600In [67]: djidf.loc[1,'lasttrade']

Out[67]: '76.200'In [68]: djidf.at[1,'lasttrade']

Out[68]: '76.200'

obj.loc[x, 'y']

4.5 行、列和区域 ( iloc 和 iat )

In [69]: djidf.loc[1:5,['code','lasttrade']]

Out[69]:

code lasttrade1 AXP 76.200

2 BA 159.530

3 CAT 94.580

4 CSCO 30.100

5 CVX 115.600In [70]: djidf.iloc[1:6,[0,2]]

Out[70]:

code lasttrade1 AXP 76.200

2 BA 159.530

3 CAT 94.580

4 CSCO 30.100

5 CVX 115.600In [71]: djidf.loc[1,'lasttrade']

Out[71]: '76.200'In [72]: djidf.at[1,'lasttrade']

Out[72]: '76.200'In [73]: djidf.iloc[1,2]

Out[73]: '76.200'In [74]: djidf.iat[1,2]

Out[74]: '76.200'

obj.iloc[ a:b, [c,d] ]

4.5 条件筛选

In [77]: quotesdf[quotesdf.index >= '2016-12-20']

Out[77]:

open close high low volume2016-12-20 74.681487 74.741230 75.179363 74.213482 3244900.0...2017-01-20 75.989998 76.199997 76.910004 75.389999 8382000.0In [78]: quotesdf[(quotesdf.index >= '2016-12-20') & (quotesdf.close >=76)]

Out[78]:

open close high low volume2017-01-04 75.260002 76.260002 76.550003 75.059998 4635800.0...2017-01-20 75.989998 76.199997 76.910004 75.389999 8382000.0

quotesdf[(quotesdf.index >= '2016-12-20') & (quotesdf.close >=76)]

5. 简单统计与处理

5.1 平均值(high列的平均值)

In [199]: quotesdf.high.mean()

Out[199]: 64.246265703871586

quotesdf.high.mean()

# 待解决问题:通过正则表达式得到是数据是string类型,无法计算mean值

2017/01/22 解决:djidf.lasttrade = djidf.lasttrade.astype(float)  # lasttrade列转为float类型,再赋给djidf

5.2 简单筛选(筛选出开盘价open大于75的数据的收盘价close)

In [204]: quotesdf[quotesdf.open>=75].close

Out[204]:2017-01-04 76.260002

2017-01-05 75.320000...2017-01-20 76.199997Name: close, dtype: float64

quotesdf[quotesdf.open>=75].close

5.3 简单统计(统计AXP公司近一年股票涨和跌的天数)

In [206]: len(quotesdf[quotesdf.close >quotesdf.open])

Out[206]: 135In [207]: len(quotesdf)-135Out[207]: 118

len(quotesdf[quotesdf.close > quotesdf.open])

5.4 简单统计(统计AXP公司近一年相邻两天收盘价的涨跌情况)

In [211]: importnumpy as np

In [212]: np.diff(quotesdf.close) #相邻两天,后一天减前一天的差

Out[212]:

array([-7.436398, -0.039243, 0.068674, -0.559202, -1.60893 , 0.608254,

...-0.030007, -0.259994, -0.020005, 0.89 , -0.799996, -0.490005])

In [213]: len(np.diff(quotesdf.close))

Out[213]: 252In [214]: len(quotesdf)

Out[214]: 253In [215]: status = np.sign(np.diff(quotesdf.close)) #提取差值array的正负符号

In [216]: status

Out[216]:

array([-1., -1., 1., -1., -1., 1., 1., -1., 1., 1., -1., -1., 1.,

...-1., -1., 1., -1., -1.])

In [218]: status[np.where(status==1.)].size #差值符号为正的个数,即涨的天数

Out[218]: 140In [219]: status[np.where(status==-1.)].size #差值符号为负的个数,即跌的天数

Out[219]: 112

np.sign(np.diff(quotesdf.close))

5.5 排序 (按最近一次成交价对30只股票进行排序,列出前三名的code)

In [313]: djidf.sort_values('lasttrade',axis=0,ascending=False)[:3].code

Out[313]:9GS17MMM11IBM

Name: code, dtype: object

In [314]: djidf.sort_values('lasttrade',axis=0,ascending=False)[:3]

Out[314]:

code name lasttrade9 GS The Goldman Sachs Group, Inc. 232.20

17 MMM 3M Company 178.49

11 IBM International Business Machines Corporation 170.55In [315]: djidf.sort_values('lasttrade',axis=0,ascending=False)

Out[315]:

code name lasttrade9 GS The Goldman Sachs Group, Inc. 232.20

17 MMM 3M Company 178.49

11 IBM International Business Machines Corporation 170.55

2 BA The Boeing Company 159.53...4 CSCO 思科系統公司 30.10

djidf.sort_values('lasttrade',axis=0,ascending=False)[:3].code

5.6 计数统计 (统计某月开盘天数)

In [319]: t = quotesdf[(quotesdf.index >= '2016-12-01') & (quotesdf.index < '2017-01-01')]

In [320]: t

Out[320]:

open close high low volume2016-12-01 72.142314 72.221976 72.460955 71.475161 4298200.0

2016-12-02 72.212016 71.554823 72.351429 71.126648 2959100.0...2016-12-30 73.645904 73.765397 74.024287 73.526410 3378800.0In [321]: len(t)

Out[321]: 21

quotesdf[(quotesdf.index >= '2016-12-01') & (quotesdf.index < '2017-01-01')]

5.7 计数统计 (统计近一年每个月的开盘天数)

In [322]: importtime

In [323]: listtemp =[]

In [324]: for i inrange(0,len(quotesdf)):

...: temp= time.strptime(quotesdf.index[i],"%Y-%m-%d") #取出每行索引,转为日期格式

...: listtemp.append(temp.tm_mon) #将日期的月份提取出来,追加到列表

...:

In [325]: printlisttemp

[1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

In [326]: tempdf = quotesdf.copy() #将quotesdf复制一份

In [327]: tempdf['month'] = listtemp #将日期list添加为df的一列month

In [328]: tempdf['month'].value_counts() #统计month列值的个数

8 23

6 22

3 22

12 21

11 21

10 21

9 21

5 21

4 21

7 20

2 20

1 20Name: month, dtype: int64

tempdf['month'].value_counts()

In [344]: printlisttemp

[1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

In [345]: adf =pd.DataFrame(listtemp)

In [346]: adf

Out[346]:

0

01

1 1

2 1

3 1

4 1

5 1

6 1

7 2

8 2

9 2

10 2

11 2

12 2

13 2

14 2

15 2

16 2

17 2

18 2

19 2

20 2

21 2

22 2

23 2

24 2

25 2

26 2

27 3

28 3

29 3.. ..223 12

224 12

225 12

226 12

227 12

228 12

229 12

230 12

231 12

232 12

233 12

234 12

235 12

236 12

237 12

238 12

239 12

240 1

241 1

242 1

243 1

244 1

245 1

246 1

247 1

248 1

249 1

250 1

251 1

252 1[253 rows x 1columns]

In [347]: adf[0].value_counts()

Out[347]:8 23

6 22

3 22

12 21

11 21

10 21

9 21

5 21

4 21

7 20

2 20

1 20Name: 0, dtype: int64

adf[0].value_counts()

6. Grouping

7. Merge

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值