Python: DataFrame Data Selection Methods

Treating a DataFrame as a dictionary

1. Treat a DataFrame as a dictionary of Series objects
import numpy as np
import pandas as pd
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312, 'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193,
                 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135})
data = pd.DataFrame({'Area': area, 'Pop': pop})
data
              Area       Pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135
data.Pop
California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
Name: Pop, dtype: int64
data['Area']
California    423967
Texas         695662
New York      141297
Florida       170312
Illinois      149995
Name: Area, dtype: int64
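The two access styles above are not fully interchangeable. The sketch below rebuilds the same table and then renames `Pop` to a lowercase `pop` column (a hypothetical name, used only for illustration) to show that attribute access is shadowed when a column name collides with an existing DataFrame method, while dictionary-style access still works.

```python
import pandas as pd

data = pd.DataFrame(
    {"Area": [423967, 695662, 141297, 170312, 149995],
     "Pop": [38332521, 26448193, 19651127, 19552860, 12882135]},
    index=["California", "Texas", "New York", "Florida", "Illinois"],
)

# Attribute style and dictionary style return the same column values.
same_values = data.Pop.equals(data["Pop"])
print(same_values)

# Hypothetical lowercase column name: it collides with DataFrame.pop,
# so attribute access returns the method rather than the column.
lower = data.rename(columns={"Pop": "pop"})
print(callable(lower.pop))         # the bound method, not the data
print(lower["pop"].loc["Texas"])   # dictionary style still reaches the column
```

For this reason dictionary-style access is usually the safer default, especially when column names come from external data.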
# Add a new column
data['Density'] = data['Pop'] / data['Area']
data
              Area       Pop     Density
California  423967  38332521   90.413926
Texas       695662  26448193   38.018740
New York    141297  19651127  139.076746
Florida     170312  19552860  114.806121
Illinois    149995  12882135   85.883763
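Dictionary-style assignment modifies the DataFrame in place. As a minimal sketch of an alternative, `DataFrame.assign` returns a new DataFrame with the extra column and leaves the original untouched; the variable name `with_density` is an assumption made for this example.

```python
import pandas as pd

data = pd.DataFrame(
    {"Area": [423967, 695662, 141297, 170312, 149995],
     "Pop": [38332521, 26448193, 19651127, 19552860, 12882135]},
    index=["California", "Texas", "New York", "Florida", "Illinois"],
)

# assign() builds a new DataFrame; `data` itself keeps only Area and Pop.
with_density = data.assign(Density=data["Pop"] / data["Area"])

print(list(data.columns))
print(list(with_density.columns))
print(with_density["Density"].round(2))
```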
2. Treat a DataFrame as a two-dimensional array
# Use the values attribute to view the underlying array data
data.values
array([[4.23967000e+05, 3.83325210e+07, 9.04139261e+01],
       [6.95662000e+05, 2.64481930e+07, 3.80187404e+01],
       [1.41297000e+05, 1.96511270e+07, 1.39076746e+02],
       [1.70312000e+05, 1.95528600e+07, 1.14806121e+02],
       [1.49995000e+05, 1.28821350e+07, 8.58837628e+01]])
data.T
           California         Texas      New York       Florida      Illinois
Area     4.239670e+05  6.956620e+05  1.412970e+05  1.703120e+05  1.499950e+05
Pop      3.833252e+07  2.644819e+07  1.965113e+07  1.955286e+07  1.288214e+07
Density  9.041393e+01  3.801874e+01  1.390767e+02  1.148061e+02  8.588376e+01
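The integer columns appear in scientific notation above because a NumPy array holds a single dtype: once the float `Density` column is present, the whole array is upcast to floating point. The following sketch illustrates this under the same data; the names `all_columns` and `int_columns` are introduced here for the example.

```python
import pandas as pd

data = pd.DataFrame(
    {"Area": [423967, 695662, 141297, 170312, 149995],
     "Pop": [38332521, 26448193, 19651127, 19552860, 12882135]},
    index=["California", "Texas", "New York", "Florida", "Illinois"],
)
data["Density"] = data["Pop"] / data["Area"]

# Mixed int and float columns share one array, so everything becomes float.
all_columns = data.values
print(all_columns.dtype)

# Selecting only the integer columns keeps an integer array.
int_columns = data[["Area", "Pop"]].values
print(int_columns.dtype)

# Array-style row access and the shape of the transpose.
print(all_columns[0])
print(data.T.shape)
```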
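# Using the pandas indexers loc, iloc, and ix
print(data.iloc[:1,:1])
print(data.loc[:'Texas',:'Pop']) # label-based indexing; the end labels are included
print(data.ix[:2,:'Pop']) # mixed positional/label indexing with .ix is deprecated (and removed as of pandas 1.0)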
              Area
California  423967
              Area       Pop
California  423967  38332521
Texas       695662  26448193
              Area       Pop
California  423967  38332521
Texas       695662  26448193


d:\python3.6\lib\site-packages\ipykernel_launcher.py:4: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  after removing the cwd from sys.path.
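Since the warning recommends `.loc` or `.iloc`, here is one possible way to express the mixed selection `[:2, :'Pop']` without `.ix`. This is a sketch, not the only translation: either convert the column label to a position with `Index.get_loc`, or convert the row positions to labels through `data.index`.

```python
import pandas as pd

data = pd.DataFrame(
    {"Area": [423967, 695662, 141297, 170312, 149995],
     "Pop": [38332521, 26448193, 19651127, 19552860, 12882135]},
    index=["California", "Texas", "New York", "Florida", "Illinois"],
)
data["Density"] = data["Pop"] / data["Area"]

# Option 1: purely positional. iloc excludes the stop position, so add 1
# to include the 'Pop' column itself.
col_stop = data.columns.get_loc("Pop") + 1
by_position = data.iloc[:2, :col_stop]

# Option 2: purely label-based. Turn the first two positions into labels.
by_label = data.loc[data.index[:2], :"Pop"]

print(by_position)
print(by_position.equals(by_label))
```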
# Fancy indexing with loc: a boolean mask selects rows, a list of labels selects columns
data.loc[data.Density > 100 ,['Pop','Density']]
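To make the selection easier to follow, the sketch below splits it into two steps: first build the boolean mask, then pass it to `loc` together with the column list. The names `mask` and `subset` are assumptions made for this example.

```python
import pandas as pd

data = pd.DataFrame(
    {"Area": [423967, 695662, 141297, 170312, 149995],
     "Pop": [38332521, 26448193, 19651127, 19552860, 12882135]},
    index=["California", "Texas", "New York", "Florida", "Illinois"],
)
data["Density"] = data["Pop"] / data["Area"]

# Step 1: a boolean Series, one True/False per row.
mask = data["Density"] > 100
print(mask)

# Step 2: rows where the mask is True, restricted to two columns.
subset = data.loc[mask, ["Pop", "Density"]]
print(subset)
```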

# Modify data through an indexer
data.iloc[0,2] = 90
data
              Area       Pop     Density
California  423967  38332521   90.000000
Texas       695662  26448193   38.018740
New York    141297  19651127  139.076746
Florida     170312  19552860  114.806121
Illinois    149995  12882135   85.883763
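The same update can be written with labels instead of positions, which tends to be easier to read and does not depend on column order. The sketch below uses `loc` for that and `at` for a single scalar cell. Writing through a single indexer call, as here, is generally preferable to chained indexing such as `data['Density']['California'] = 90`, which may fail to update the original DataFrame depending on the pandas version.

```python
import pandas as pd

data = pd.DataFrame(
    {"Area": [423967, 695662, 141297, 170312, 149995],
     "Pop": [38332521, 26448193, 19651127, 19552860, 12882135]},
    index=["California", "Texas", "New York", "Florida", "Illinois"],
)
data["Density"] = data["Pop"] / data["Area"]

# Label-based equivalent of data.iloc[0, 2] = 90
data.loc["California", "Density"] = 90

# .at is a scalar accessor for exactly one cell.
data.at["Texas", "Density"] = 38

print(data["Density"])
```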
3. Other selection methods
# Filter rows directly with a boolean mask
data[data.Density > 100]
              Area       Pop     Density
New York    141297  19651127  139.076746
Florida     170312  19552860  114.806121
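Two further variations on direct selection, offered as a sketch: masks can be combined with `&` (each condition in parentheses), and a label slice inside plain `[]` selects rows rather than columns. The threshold of 150000 for `Area` is an arbitrary value chosen for this example.

```python
import pandas as pd

data = pd.DataFrame(
    {"Area": [423967, 695662, 141297, 170312, 149995],
     "Pop": [38332521, 26448193, 19651127, 19552860, 12882135]},
    index=["California", "Texas", "New York", "Florida", "Illinois"],
)
data["Density"] = data["Pop"] / data["Area"]

# Combine conditions element-wise; parentheses are required around each one.
dense_and_large = data[(data["Density"] > 100) & (data["Area"] > 150000)]
print(dense_and_large)

# A label slice in [] works on rows, and includes the end label.
tail_rows = data["Florida":"Illinois"]
print(tail_rows)
```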