数据分析第三篇——Pandas的基本数据操作(赋值、索引与排序)

3.2 基本数据操作

  • 3.2.1 索引操作
  • 3.2.2 赋值操作
  • 3.2.3 排序
    • DataFrame
    • Series
读取数据
import pandas as pd
data = pd.read_excel('stock.xls', header=0, index_col=0, encoding='utf-8')
data
closeopenhighlowpre_closechangepct_chgvolamount
trade_date
202003132887.42652804.23222910.88122799.98412923.4856-36.0591-1.2334366450436.03.930197e+08
202003122923.48562936.01632944.46512906.28382968.5174-45.0318-1.5170307778457.03.282092e+08
202003112968.51743001.76163010.02862968.51742996.7618-28.2444-0.9425352470970.03.787666e+08
202003102996.76182918.93473000.29632904.79892943.290753.47111.8167393296648.04.250172e+08
202003092943.29072987.18052989.20512940.71383034.5113-91.2206-3.0061414560736.04.381439e+08
..............................
19910719136.7000137.6600138.5400136.6600137.1700-0.4700-0.342610823.05.242826e+03
19910718137.1700137.1700137.1700135.8100135.81001.36001.0014847.04.644160e+02
19910717135.8100135.8100135.8100135.3900134.47001.34000.9965660.03.975240e+02
19910716134.4700134.3900134.4700133.1400133.14001.33000.99892796.01.328502e+03
19910715133.1400133.9000134.1000131.8700132.80000.34000.256011938.05.534900e+03

7002 rows × 9 columns

data.drop(labels=['pre_close'], axis=1, inplace=True)
data
closeopenhighlowchangepct_chgvolamount
trade_date
202003132887.42652804.23222910.88122799.9841-36.0591-1.2334366450436.03.930197e+08
202003122923.48562936.01632944.46512906.2838-45.0318-1.5170307778457.03.282092e+08
202003112968.51743001.76163010.02862968.5174-28.2444-0.9425352470970.03.787666e+08
202003102996.76182918.93473000.29632904.798953.47111.8167393296648.04.250172e+08
202003092943.29072987.18052989.20512940.7138-91.2206-3.0061414560736.04.381439e+08
...........................
19910719136.7000137.6600138.5400136.6600-0.4700-0.342610823.05.242826e+03
19910718137.1700137.1700137.1700135.81001.36001.0014847.04.644160e+02
19910717135.8100135.8100135.8100135.39001.34000.9965660.03.975240e+02
19910716134.4700134.3900134.4700133.14001.33000.99892796.01.328502e+03
19910715133.1400133.9000134.1000131.87000.34000.256011938.05.534900e+03

7002 rows × 8 columns

3.2.1 索引操作

  1. 直接索引(直接使用行列索引——先列后行)
    • 先列后行
  2. 按名字索引(按标签索引)
    • DataFrame.loc[]
  3. 按数字索引(按位置索引)
    • DataFrame.iloc[]
  4. 组合索引(标签和位置混用)——已经被弃用
    • DataFrame.ix[0:4, [‘open’, ‘close’, ‘high’, ‘low’]]
1. 直接索引(直接使用行列索引——先列后行)
data.head()
closeopenhighlowchangepct_chgvolamount
trade_date
202003132887.42652804.23222910.88122799.9841-36.0591-1.2334366450436.0393019665.2
202003122923.48562936.01632944.46512906.2838-45.0318-1.5170307778457.0328209202.4
202003112968.51743001.76163010.02862968.5174-28.2444-0.9425352470970.0378766619.0
202003102996.76182918.93473000.29632904.798953.47111.8167393296648.0425017184.8
202003092943.29072987.18052989.20512940.7138-91.2206-3.0061414560736.0438143854.6
# 直接索引——先列后行
data['open'][20200313] # 第一次索引得到一个Series,第二次从Series中索引数据
2804.2322
2. 按名字(标签)索引
data.loc[20200313]['open']
2804.2322
data.loc[20200313, 'open']
2804.2322
3. 按数字索引(按位置索引)
data.iloc[0, 1]
2804.2322
4. 组合索引
  • ix已经被废弃
  • 可以使用loc和iloc来达到相同的效果
    • DataFrame.columns.get_indexer():获得列标签索引对应的位置
    • DataFrame.index.get_indexer():获得行标签索引对应的位置
data.loc[data.index[0:4], ['open', 'close', 'high', 'low']]
openclosehighlow
trade_date
202003132804.23222887.42652910.88122799.9841
202003122936.01632923.48562944.46512906.2838
202003113001.76162968.51743010.02862968.5174
202003102918.93472996.76183000.29632904.7989
data.iloc[0:4, data.columns.get_indexer(['open', 'close', 'high', 'low'])]
openclosehighlow
trade_date
202003132804.23222887.42652910.88122799.9841
202003122936.01632923.48562944.46512906.2838
202003113001.76162968.51743010.02862968.5174
202003102918.93472996.76183000.29632904.7989
data.index.get_indexer([20200310])
array([3], dtype=int64)
data.columns.get_indexer(['open'])
array([1], dtype=int64)

3.2.2 赋值操作

data.head()
closeopenhighlowchangepct_chgvolamount
trade_date
202003132887.42652804.23222910.88122799.9841-36.0591-1.2334366450436.0393019665.2
202003122923.48562936.01632944.46512906.2838-45.0318-1.5170307778457.0328209202.4
202003112968.51743001.76163010.02862968.5174-28.2444-0.9425352470970.0378766619.0
202003102996.76182918.93473000.29632904.798953.47111.8167393296648.0425017184.8
202003092943.29072987.18052989.20512940.7138-91.2206-3.0061414560736.0438143854.6
data['open']
trade_date
20200313    2804.2322
20200312    2936.0163
20200311    3001.7616
20200310    2918.9347
20200309    2987.1805
              ...    
19910719     137.6600
19910718     137.1700
19910717     135.8100
19910716     134.3900
19910715     133.9000
Name: open, Length: 7002, dtype: float64
data.open
trade_date
20200313    2804.2322
20200312    2936.0163
20200311    3001.7616
20200310    2918.9347
20200309    2987.1805
              ...    
19910719     137.6600
19910718     137.1700
19910717     135.8100
19910716     134.3900
19910715     133.9000
Name: open, Length: 7002, dtype: float64
data.open = 100
data.head()
closeopenhighlowchangepct_chgvolamount
trade_date
202003132887.42651002910.88122799.9841-36.0591-1.2334366450436.0393019665.2
202003122923.48561002944.46512906.2838-45.0318-1.5170307778457.0328209202.4
202003112968.51741003010.02862968.5174-28.2444-0.9425352470970.0378766619.0
202003102996.76181003000.29632904.798953.47111.8167393296648.0425017184.8
202003092943.29071002989.20512940.7138-91.2206-3.0061414560736.0438143854.6
data.iloc[0, 1] = 222
data.head()
closeopenhighlowchangepct_chgvolamount
trade_date
202003132887.42652222910.88122799.9841-36.0591-1.2334366450436.0393019665.2
202003122923.48561002944.46512906.2838-45.0318-1.5170307778457.0328209202.4
202003112968.51741003010.02862968.5174-28.2444-0.9425352470970.0378766619.0
202003102996.76181003000.29632904.798953.47111.8167393296648.0425017184.8
202003092943.29071002989.20512940.7138-91.2206-3.0061414560736.0438143854.6

3.2.3 排序

排序有两种形式,一种对内容进行排序,一种对索引进行排序

  • 对内容排序
    • DataFrame
    • Series
  • 对索引排序
    • DataFrame
    • Series
data.head()
closeopenhighlowchangepct_chgvolamount
trade_date
202003132887.42652222910.88122799.9841-36.0591-1.2334366450436.0393019665.2
202003122923.48561002944.46512906.2838-45.0318-1.5170307778457.0328209202.4
202003112968.51741003010.02862968.5174-28.2444-0.9425352470970.0378766619.0
202003102996.76181003000.29632904.798953.47111.8167393296648.0425017184.8
202003092943.29071002989.20512940.7138-91.2206-3.0061414560736.0438143854.6
DataFrame
  • 使用df.sort_values(by=, ascending=True)对内容进行排序
    • 单个键或多个键进行排序,默认升序
    • ascending=False:降序
    • ascending=True:升序
按照单个字段排序
data.sort_values(by='high').head()
closeopenhighlowchangepct_chgvolamount
trade_date
19910715133.14100134.10131.870.340.256011938.05534.900
19910716134.47100134.47133.141.330.99892796.01328.502
19910717135.81100135.81135.391.340.9965660.0397.524
19910718137.17100137.17135.811.361.0014847.0464.416
19910722138.07100138.07136.701.371.00222764.01423.205
data.sort_values(by='high', ascending=False).head()
closeopenhighlowchangepct_chgvolamount
trade_date
200710166092.0571006124.0446040.71361.9711.027788152159.01.635477e+08
200710176036.2811006088.8925982.203-55.776-0.915671312588.01.337010e+08
200710185825.2821006055.4735804.973-210.999-3.495569805425.01.303470e+08
200710156030.0861006039.0425866.131126.8222.148398707442.01.934696e+08
200711015914.2851006005.1315912.502-40.480-0.679864885103.01.107685e+08
按照多个字段排序

先按照第一个字段排序,对第一个字段相同的,再按照第二个字段进行排序

data.sort_values(by=['high', 'pct_chg'], ascending=False).head()
closeopenhighlowchangepct_chgvolamount
trade_date
200710166092.0571006124.0446040.71361.9711.027788152159.01.635477e+08
200710176036.2811006088.8925982.203-55.776-0.915671312588.01.337010e+08
200710185825.2821006055.4735804.973-210.999-3.495569805425.01.303470e+08
200710156030.0861006039.0425866.131126.8222.148398707442.01.934696e+08
200711015914.2851006005.1315912.502-40.480-0.679864885103.01.107685e+08
对索引进行排序
data.head()
closeopenhighlowchangepct_chgvolamount
trade_date
202003132887.42652222910.88122799.9841-36.0591-1.2334366450436.0393019665.2
202003122923.48561002944.46512906.2838-45.0318-1.5170307778457.0328209202.4
202003112968.51741003010.02862968.5174-28.2444-0.9425352470970.0378766619.0
202003102996.76181003000.29632904.798953.47111.8167393296648.0425017184.8
202003092943.29071002989.20512940.7138-91.2206-3.0061414560736.0438143854.6
data.sort_index().head()
closeopenhighlowchangepct_chgvolamount
trade_date
19910715133.14100134.10131.870.340.256011938.05534.900
19910716134.47100134.47133.141.330.99892796.01328.502
19910717135.81100135.81135.391.340.9965660.0397.524
19910718137.17100137.17135.811.361.0014847.0464.416
19910719136.70100138.54136.66-0.47-0.342610823.05242.826
Series
  • 使用Series.sort_values(ascending=True)对内容进行排序

对Series排序时,只有一列,不需要参数

sr = data['pct_chg']
sr
trade_date
20200313   -1.2334
20200312   -1.5170
20200311   -0.9425
20200310    1.8167
20200309   -3.0061
             ...  
19910719   -0.3426
19910718    1.0014
19910717    0.9965
19910716    0.9989
19910715    0.2560
Name: pct_chg, Length: 7002, dtype: float64
sr.sort_values()
trade_date
19950523    -16.3937
19931220    -13.0764
19940809    -12.6749
19930322    -11.7507
19921027    -11.1769
              ...   
19940805     21.3711
19921125     29.9041
19950518     30.9870
19940801     33.4571
19920521    105.2691
Name: pct_chg, Length: 7002, dtype: float64
sr.sort_values(ascending=False)
trade_date
19920521    105.2691
19940801     33.4571
19950518     30.9870
19921125     29.9041
19940805     21.3711
              ...   
19921027    -11.1769
19930322    -11.7507
19940809    -12.6749
19931220    -13.0764
19950523    -16.3937
Name: pct_chg, Length: 7002, dtype: float64
sr.sort_index()
trade_date
19910715    0.2560
19910716    0.9989
19910717    0.9965
19910718    1.0014
19910719   -0.3426
             ...  
20200309   -3.0061
20200310    1.8167
20200311   -0.9425
20200312   -1.5170
20200313   -1.2334
Name: pct_chg, Length: 7002, dtype: float64
  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值