Pandas - 2. Extracting Rows and Columns

陈天睡懒觉.

Last modified 2022-05-21 16:22:08

Category: Pandas. Tags: python, data mining, data analysis

First published 2022-05-21 14:45:05

Article link: https://blog.csdn.net/Aaron_ChenShenyu/article/details/124898145

import pandas as pd
df = pd.read_csv('data/gapminder.tsv',sep='\t')
print(df.head())

       country continent  year  lifeExp       pop   gdpPercap
0  Afghanistan      Asia  1952   28.801   8425333  779.445314
1  Afghanistan      Asia  1957   30.332   9240934  820.853030
2  Afghanistan      Asia  1962   31.997  10267083  853.100710
3  Afghanistan      Asia  1967   34.020  11537966  836.197138
4  Afghanistan      Asia  1972   36.088  13079460  739.981106
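
The sep='\t' argument tells read_csv that the fields are separated by tabs rather than commas. As a minimal sketch that runs without the data file, the snippet below reads a few tab-separated lines from an in-memory string. The names tsv_text and toy are made up for this illustration, and the rows are copied from the output above.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for data/gapminder.tsv (only three columns).
tsv_text = (
    "country\tcontinent\tyear\n"
    "Afghanistan\tAsia\t1952\n"
    "Afghanistan\tAsia\t1957\n"
)

# read_csv accepts any file-like object; sep="\t" splits each line on tabs.
toy = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(toy.head())
```

Without sep="\t", each line would be read as a single column, because the default separator is a comma.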

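Check the type of each column with df.dtypes or df.info()

Each line below reads: pandas dtype – Python type – meaning

object – str – string
int64 – int – integer
float64 – float – floating-point number
datetime64 – datetime – date and time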
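
To see this mapping in action, here is a small sketch on a hand-built DataFrame. The name toy and the dates in the updated column are invented for the illustration; the other values come from the first two gapminder rows. Depending on the pandas version, a text column may be reported as object or as a dedicated string dtype, and the datetime column may show a different resolution suffix.

```python
import pandas as pd

# Hypothetical toy data; the "updated" column is made up for this example.
toy = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan"],                 # text
    "year": [1952, 1957],                                      # whole numbers
    "lifeExp": [28.801, 30.332],                               # decimals
    "updated": pd.to_datetime(["2022-05-21", "2022-05-22"]),   # dates
})

# One dtype per column.
print(toy.dtypes)

# A single column's dtype can be checked on its own.
print(toy["year"].dtype)
```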

print(df.dtypes)

country       object
continent     object
year           int64
lifeExp      float64
pop            int64
gdpPercap    float64
dtype: object

Check the number of rows and columns

# df.shape is an attribute, not a method; calling df.shape() raises an error
print(df.shape)  # (number of rows, number of columns)

(1704, 6)
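
Because shape is a plain tuple, it can be unpacked directly. The sketch below uses a three-row DataFrame named toy (an illustrative stand-in, not the gapminder file) and also shows what happens if the parentheses are added by mistake.

```python
import pandas as pd

# Hypothetical toy data taken from the first three gapminder rows.
toy = pd.DataFrame({
    "year": [1952, 1957, 1962],
    "lifeExp": [28.801, 30.332, 31.997],
})

# shape is a tuple, so it unpacks into two integers.
n_rows, n_cols = toy.shape
print(n_rows, n_cols)

# len() of a DataFrame is its number of rows.
print(len(toy))

# Calling shape like a method fails, because a tuple is not callable.
try:
    toy.shape()
except TypeError as err:
    print("TypeError:", err)
```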

Get the column names and the row index

# df.columns holds the column names
print(df.columns)
# df.index holds the row index
print(df.index)
# Convert the index to a list and show its first 10 labels
print(list(df.index)[:10])

Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
RangeIndex(start=0, stop=1704, step=1)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
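
Both df.columns and df.index are Index objects, so they can be turned into ordinary Python lists or used for membership checks. A minimal sketch on a toy DataFrame (the names toy, col_names and row_labels are made up for this illustration):

```python
import pandas as pd

# Hypothetical toy data taken from the first two gapminder rows.
toy = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan"],
    "year": [1952, 1957],
})

# tolist() converts an Index to a plain Python list.
col_names = toy.columns.tolist()
row_labels = toy.index.tolist()
print(col_names)
print(row_labels)

# Membership tests work directly on the column Index.
print("year" in toy.columns)
print("pop" in toy.columns)
```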

Select a subset of columns

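# Single column
# Attribute access only works when the column name is a valid Python
# identifier and does not clash with a DataFrame attribute or method
continent = df.continent
# Bracket access works for any column name
continent = df['continent']
print(continent[:5])
# Multiple columns: pass a list of names
year_continent = df[['year','continent']]
print(year_continent[:5])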

0    Asia
1    Asia
2    Asia
3    Asia
4    Asia
Name: continent, dtype: object
   year continent
0  1952      Asia
1  1957      Asia
2  1962      Asia
3  1967      Asia
4  1972      Asia
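
Two details are easy to miss here: single brackets return a Series while a list of names returns a DataFrame, and attribute access breaks for a column such as pop, whose name is also a DataFrame method. The sketch below illustrates both on a toy DataFrame (toy, one_col and one_col_df are names invented for this example).

```python
import pandas as pd

# Hypothetical toy data taken from the first two gapminder rows.
toy = pd.DataFrame({
    "continent": ["Asia", "Asia"],
    "year": [1952, 1957],
    "pop": [8425333, 9240934],
})

one_col = toy["continent"]       # a single name returns a Series
one_col_df = toy[["continent"]]  # a list of names returns a DataFrame
print(type(one_col).__name__)
print(type(one_col_df).__name__)

# "pop" is also a DataFrame method, so toy.pop is the method, not the column.
print(callable(toy.pop))

# Bracket access always reaches the column.
print(toy["pop"].tolist())
```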

Select a subset of rows

By row label (loc)
By row position (iloc)

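# Select one row by label
sample = df.loc[0]  # selecting a single row returns a Series
print(sample)
# Select several rows by label
samples = df.loc[[0,100,200]]
print(samples)
# df.loc[-1] raises a KeyError because no row has the label -1

# Select one row by position
sample = df.iloc[0]  # selecting a single row returns a Series

# Select several rows by position
samples = df.iloc[[0,100,200]]

# iloc accepts negative positions: -1 is the last row
sample = df.iloc[-1]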

country      Afghanistan
continent           Asia
year                1952
lifeExp           28.801
pop              8425333
gdpPercap        779.445
Name: 0, dtype: object
          country continent  year  lifeExp       pop   gdpPercap
0     Afghanistan      Asia  1952   28.801   8425333  779.445314
100    Bangladesh      Asia  1972   45.252  70759295  630.233627
200  Burkina Faso    Africa  1992   50.260   8878303  931.752773
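
With the default index 0, 1, 2, ... labels and positions coincide, which hides the difference between loc and iloc. The sketch below gives a toy DataFrame the labels 0, 100 and 200 so the two can be told apart. The name toy is made up for this illustration; the rows are the three shown above.

```python
import pandas as pd

# Hypothetical toy data whose row labels differ from the row positions.
toy = pd.DataFrame(
    {
        "country": ["Afghanistan", "Bangladesh", "Burkina Faso"],
        "year": [1952, 1972, 1992],
    },
    index=[0, 100, 200],
)

by_label = toy.loc[100]    # the row whose label is 100
by_position = toy.iloc[1]  # the second row, whatever its label
print(by_label["country"], by_position["country"])

# Negative positions count from the end.
last = toy.iloc[-1]
print(last["country"])

# loc looks up labels, and there is no label -1.
try:
    toy.loc[-1]
except KeyError:
    print("KeyError: no row is labelled -1")

# iloc looks up positions, and there are only three rows.
try:
    toy.iloc[100]
except IndexError:
    print("IndexError: position 100 is out of bounds")
```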

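Combined: select a subset of rows and columns

In loc[rows, columns] and iloc[rows, columns], the part to the left of the comma selects rows and the part to the right selects columns.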

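# Select whole columns (":" on the left keeps every row)
subset = df.loc[:,['year','pop']]
subset = df.iloc[:,[1,3,-1]]  # columns at specific positions
subset = df.iloc[:,3:6]  # columns at positions 3, 4 and 5
subset = df.iloc[:,:3]  # the first three columns

# Multiple rows and multiple columns
subset = df.loc[[1,10,20],['year','pop']]
subset = df.iloc[[1,10,20],[1,-1]]
# Only the last assignment is printed
print(subset)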

   continent    gdpPercap
1       Asia   820.853030
10      Asia   726.734055
20    Europe  2497.437901
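
One more difference shows up when slicing: a loc slice includes its end label, while an iloc slice stops before its end position, like an ordinary Python slice. A minimal sketch on a five-row toy DataFrame (the names toy, by_label and by_position are invented for this example):

```python
import pandas as pd

# Hypothetical toy data taken from the first five gapminder rows.
toy = pd.DataFrame({
    "country": ["Afghanistan"] * 5,
    "continent": ["Asia"] * 5,
    "year": [1952, 1957, 1962, 1967, 1972],
    "lifeExp": [28.801, 30.332, 31.997, 34.020, 36.088],
})

# loc slices by label and includes both ends: rows 0, 1, 2.
by_label = toy.loc[0:2, "continent":"year"]

# iloc slices by position and excludes the end: rows 0, 1.
by_position = toy.iloc[0:2, 1:3]

print(by_label.shape)     # (3, 2)
print(by_position.shape)  # (2, 2)
```

Both selections pick the same two columns, continent and year; only the number of rows differs.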
