3. 数据显示
4. 数据选择
4.1 选择行
4.1.1 索引
In [189]: quotesdf.ix['2016-12-30']
Out[189]:
open7.364590e+01close7.376540e+01high7.402429e+01low7.352641e+01volume3.378800e+06Name:2016-12-30, dtype: float64
quotesdf.ix['index']
4.1.2 切片
In [190]: quotesdf['2016-12-30':'2017-01-02']
Out[190]:
open close high low volume2016-12-30 73.645904 73.765397 74.024287 73.52641 3378800.0
quotesdf['2016-12-30':'2017-01-02']
4.2 选择列
In [192]: djidf['code']
Out[192]:
0 AAPL1AXP2BA
...29XOM
Name: code, dtype: object
In [193]: djidf.code
Out[193]:
0 AAPL1AXP2BA
...29XOM
Name: code, dtype: object
djidf['code'] / djidf.code
4.3 行、列 - 标签label ( loc )
In [64]: djidf.loc[1:5,]
Out[64]:
code name lasttrade1 AXP American Express Company 76.200
2 BA The Boeing Company 159.530
3 CAT Caterpillar Inc. 94.580
4 CSCO 思科系?公司 30.100
5 CVX Chevron Corporation 115.600In [65]: djidf.loc[:,['code','lasttrade']]
Out[65]:
code lasttrade
0 AAPL120.000
1 AXP 76.200
2 BA 159.530
3 CAT 94.580...29 XOM 85.890
obj.loc[x : xx, ['y','yy'] ]
4.4 行和列的区域 - 标签label ( loc 和 at )
In [66]: djidf.loc[1:5,['code','lasttrade']]
Out[66]:
code lasttrade1 AXP 76.200
2 BA 159.530
3 CAT 94.580
4 CSCO 30.100
5 CVX 115.600In [67]: djidf.loc[1,'lasttrade']
Out[67]: '76.200'In [68]: djidf.at[1,'lasttrade']
Out[68]: '76.200'
obj.loc[x, 'y']
4.5 行、列和区域 ( iloc 和 iat )
In [69]: djidf.loc[1:5,['code','lasttrade']]
Out[69]:
code lasttrade1 AXP 76.200
2 BA 159.530
3 CAT 94.580
4 CSCO 30.100
5 CVX 115.600In [70]: djidf.iloc[1:6,[0,2]]
Out[70]:
code lasttrade1 AXP 76.200
2 BA 159.530
3 CAT 94.580
4 CSCO 30.100
5 CVX 115.600In [71]: djidf.loc[1,'lasttrade']
Out[71]: '76.200'In [72]: djidf.at[1,'lasttrade']
Out[72]: '76.200'In [73]: djidf.iloc[1,2]
Out[73]: '76.200'In [74]: djidf.iat[1,2]
Out[74]: '76.200'
obj.iloc[ a:b, [c,d] ]
4.5 条件筛选
In [77]: quotesdf[quotesdf.index >= '2016-12-20']
Out[77]:
open close high low volume2016-12-20 74.681487 74.741230 75.179363 74.213482 3244900.0...2017-01-20 75.989998 76.199997 76.910004 75.389999 8382000.0In [78]: quotesdf[(quotesdf.index >= '2016-12-20') & (quotesdf.close >=76)]
Out[78]:
open close high low volume2017-01-04 75.260002 76.260002 76.550003 75.059998 4635800.0...2017-01-20 75.989998 76.199997 76.910004 75.389999 8382000.0
quotesdf[(quotesdf.index >= '2016-12-20') & (quotesdf.close >=76)]
5. 简单统计与处理
5.1 平均值(high列的平均值)
In [199]: quotesdf.high.mean()
Out[199]: 64.246265703871586
quotesdf.high.mean()
# 待解决问题:通过正则表达式得到是数据是string类型,无法计算mean值
2017/01/22 解决:djidf.lasttrade = djidf.lasttrade.astype(float) # lasttrade列转为float类型,再赋给djidf
5.2 简单筛选(筛选出开盘价open大于75的数据的收盘价close)
In [204]: quotesdf[quotesdf.open>=75].close
Out[204]:2017-01-04 76.260002
2017-01-05 75.320000...2017-01-20 76.199997Name: close, dtype: float64
quotesdf[quotesdf.open>=75].close
5.3 简单统计(统计AXP公司近一年股票涨和跌的天数)
In [206]: len(quotesdf[quotesdf.close >quotesdf.open])
Out[206]: 135In [207]: len(quotesdf)-135Out[207]: 118
len(quotesdf[quotesdf.close > quotesdf.open])
5.4 简单统计(统计AXP公司近一年相邻两天收盘价的涨跌情况)
In [211]: importnumpy as np
In [212]: np.diff(quotesdf.close) #相邻两天,后一天减前一天的差
Out[212]:
array([-7.436398, -0.039243, 0.068674, -0.559202, -1.60893 , 0.608254,
...-0.030007, -0.259994, -0.020005, 0.89 , -0.799996, -0.490005])
In [213]: len(np.diff(quotesdf.close))
Out[213]: 252In [214]: len(quotesdf)
Out[214]: 253In [215]: status = np.sign(np.diff(quotesdf.close)) #提取差值array的正负符号
In [216]: status
Out[216]:
array([-1., -1., 1., -1., -1., 1., 1., -1., 1., 1., -1., -1., 1.,
...-1., -1., 1., -1., -1.])
In [218]: status[np.where(status==1.)].size #差值符号为正的个数,即涨的天数
Out[218]: 140In [219]: status[np.where(status==-1.)].size #差值符号为负的个数,即跌的天数
Out[219]: 112
np.sign(np.diff(quotesdf.close))
5.5 排序 (按最近一次成交价对30只股票进行排序,列出前三名的code)
In [313]: djidf.sort_values('lasttrade',axis=0,ascending=False)[:3].code
Out[313]:9GS17MMM11IBM
Name: code, dtype: object
In [314]: djidf.sort_values('lasttrade',axis=0,ascending=False)[:3]
Out[314]:
code name lasttrade9 GS The Goldman Sachs Group, Inc. 232.20
17 MMM 3M Company 178.49
11 IBM International Business Machines Corporation 170.55In [315]: djidf.sort_values('lasttrade',axis=0,ascending=False)
Out[315]:
code name lasttrade9 GS The Goldman Sachs Group, Inc. 232.20
17 MMM 3M Company 178.49
11 IBM International Business Machines Corporation 170.55
2 BA The Boeing Company 159.53...4 CSCO 思科系統公司 30.10
djidf.sort_values('lasttrade',axis=0,ascending=False)[:3].code
5.6 计数统计 (统计某月开盘天数)
In [319]: t = quotesdf[(quotesdf.index >= '2016-12-01') & (quotesdf.index < '2017-01-01')]
In [320]: t
Out[320]:
open close high low volume2016-12-01 72.142314 72.221976 72.460955 71.475161 4298200.0
2016-12-02 72.212016 71.554823 72.351429 71.126648 2959100.0...2016-12-30 73.645904 73.765397 74.024287 73.526410 3378800.0In [321]: len(t)
Out[321]: 21
quotesdf[(quotesdf.index >= '2016-12-01') & (quotesdf.index < '2017-01-01')]
5.7 计数统计 (统计近一年每个月的开盘天数)
In [322]: importtime
In [323]: listtemp =[]
In [324]: for i inrange(0,len(quotesdf)):
...: temp= time.strptime(quotesdf.index[i],"%Y-%m-%d") #取出每行索引,转为日期格式
...: listtemp.append(temp.tm_mon) #将日期的月份提取出来,追加到列表
...:
In [325]: printlisttemp
[1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
In [326]: tempdf = quotesdf.copy() #将quotesdf复制一份
In [327]: tempdf['month'] = listtemp #将日期list添加为df的一列month
In [328]: tempdf['month'].value_counts() #统计month列值的个数
8 23
6 22
3 22
12 21
11 21
10 21
9 21
5 21
4 21
7 20
2 20
1 20Name: month, dtype: int64
tempdf['month'].value_counts()
In [344]: printlisttemp
[1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
In [345]: adf =pd.DataFrame(listtemp)
In [346]: adf
Out[346]:
0
01
1 1
2 1
3 1
4 1
5 1
6 1
7 2
8 2
9 2
10 2
11 2
12 2
13 2
14 2
15 2
16 2
17 2
18 2
19 2
20 2
21 2
22 2
23 2
24 2
25 2
26 2
27 3
28 3
29 3.. ..223 12
224 12
225 12
226 12
227 12
228 12
229 12
230 12
231 12
232 12
233 12
234 12
235 12
236 12
237 12
238 12
239 12
240 1
241 1
242 1
243 1
244 1
245 1
246 1
247 1
248 1
249 1
250 1
251 1
252 1[253 rows x 1columns]
In [347]: adf[0].value_counts()
Out[347]:8 23
6 22
3 22
12 21
11 21
10 21
9 21
5 21
4 21
7 20
2 20
1 20Name: 0, dtype: int64
adf[0].value_counts()
6. Grouping
7. Merge