[Python] DataFrame Series, Part 2: Common Methods and Functions

J小白Y

Category: Python小白的进阶之路; tags: python

Article link: https://blog.csdn.net/Jarry_cm/article/details/105371335

This post in the DataFrame series covers the commonly used basic methods.

All examples use a small two-row DataFrame named data, with columns a, b, c and d.
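The example DataFrame itself is not shown in the text. Judging from the outputs later in the post, it can probably be rebuilt as in the sketch below; the values are inferred from those outputs, not copied from the author's original definition.

```python
import pandas as pd

# Values inferred from the head(), tail() and describe() outputs in this post.
data = pd.DataFrame({
    "a": [11, 12],
    "b": [21, 22],
    "c": [31, 32],
    "d": [41, 42],
})

print(data)
```

With this frame, the column names, the RangeIndex from 0 to 2 and the int64 dtypes match what the following sections report.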

1. List the columns

data.columns
Out[183]: Index(['a', 'b', 'c', 'd'], dtype='object')

2. View the index

data.index
Out[185]: RangeIndex(start=0, stop=2, step=1)

3. View the data type of each column

data.dtypes
Out[186]: 
a    int64
b    int64
c    int64
d    int64
dtype: object

4. View a summary of each column (non-null count, data type, memory usage)

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 4 columns):
a    2 non-null int64
b    2 non-null int64
c    2 non-null int64
d    2 non-null int64
dtypes: int64(4)
memory usage: 144.0 bytes

5. View the shape (number of rows, number of columns)

data.shape
Out[190]: (2, 4)

6. View the total number of cells

data.size
Out[191]: 8

7. View the number of rows

len(data)
Out[192]: 2
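The three size-related calls above are connected: shape gives (rows, columns), size is their product, and len() is the row count. A minimal sketch, assuming the same example values as the rest of the post:

```python
import pandas as pd

data = pd.DataFrame({"a": [11, 12], "b": [21, 22], "c": [31, 32], "d": [41, 42]})

n_rows, n_cols = data.shape   # shape is a (rows, columns) tuple
total_cells = data.size       # size counts every cell
row_count = len(data)         # len() counts rows only

print(n_rows, n_cols, total_cells, row_count)
```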

8. Return the first few rows

If no argument is given, the first 5 rows are returned by default.

data.head(1)
Out[196]: 
    a   b   c   d
0  11  21  31  41

9. Return the last few rows

If no argument is given, the last 5 rows are returned by default.

data.tail(1)
Out[195]: 
    a   b   c   d
1  12  22  32  42
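The example frame has only two rows, so the default of 5 is not visible there. The sketch below uses a hypothetical 8-row frame (demo is not part of the original post) to show the default behaviour of head() and tail().

```python
import pandas as pd

# Hypothetical 8-row frame, only used to make the default of 5 visible.
demo = pd.DataFrame({"x": range(8)})

first_default = demo.head()   # no argument: first 5 rows
last_default = demo.tail()    # no argument: last 5 rows
first_two = demo.head(2)      # explicit count

print(len(first_default), len(last_default), len(first_two))
```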

10. Rename columns

data.rename(columns={"a":"A"},inplace=True)
data
Out[198]: 
    A   b   c   d
0  11  21  31  41
1  12  22  32  42
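The call above uses inplace=True, which modifies data directly. Without it, rename returns a new DataFrame and leaves the original untouched. A small sketch of that variant, assuming the same example values:

```python
import pandas as pd

data = pd.DataFrame({"a": [11, 12], "b": [21, 22], "c": [31, 32], "d": [41, 42]})

# Without inplace=True, rename returns a renamed copy.
renamed = data.rename(columns={"a": "A"})

print(list(data.columns))      # original column names
print(list(renamed.columns))   # renamed copy
```

Assigning the result to a new name (or back to data) is often easier to follow than in-place modification, though either approach works.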

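11. Replace values

Note: replace does not modify the original data; it returns a new DataFrame. As the output shows, the 11 in data is still 11 afterwards.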

data.replace({11:88})
Out[199]: 
    A   b   c   d
0  88  21  31  41
1  12  22  32  42
data
Out[200]: 
    A   b   c   d
0  11  21  31  41
1  12  22  32  42

12. Replace values in a specific column

data['A'].replace(11,88)
Out[201]: 
0    88
1    12
Name: A, dtype: int64
data
Out[202]: 
    A   b   c   d
0  11  21  31  41
1  12  22  32  42
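Since replace returns a new object in both cases, one way to keep the change is to assign the result. The sketch below assumes the example frame after column a has been renamed to A; the name replaced is only illustrative.

```python
import pandas as pd

data = pd.DataFrame({"A": [11, 12], "b": [21, 22], "c": [31, 32], "d": [41, 42]})

# Whole-frame replacement, kept under a new name.
replaced = data.replace({11: 88})
still_original = data.loc[0, "A"]   # data itself is unchanged at this point

# Single-column replacement, assigned back so that data is updated.
data["A"] = data["A"].replace(11, 88)

print(replaced)
print(data)
```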

13. Count how often each value occurs in column A

data.A.value_counts()
Out[204]: 
11    1
12    1
Name: A, dtype: int64
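In the example every value occurs once, so the counts are all 1. The sketch below uses a hypothetical Series with repeated values to show that value_counts returns counts sorted from most to least frequent. The printed header may differ between pandas versions.

```python
import pandas as pd

# Hypothetical column with repeated values.
s = pd.Series([11, 12, 11, 11], name="A")

counts = s.value_counts()   # index: distinct values, values: occurrences

print(counts)
```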

14. Sort by specified columns

If no order is specified, the sort is ascending by default.

data.sort_values(by=['A','b'])
Out[205]: 
    A   b   c   d
0  11  21  31  41
1  12  22  32  42

The sort order can also be set separately for each column.

To sort A in descending order and b in ascending order:

data.sort_values(by=['A','b'],axis=0,ascending=[False,True])
Out[206]: 
    A   b   c   d
1  12  22  32  42
0  11  21  31  41
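With only two distinct rows, the second sort key never comes into play. The sketch below uses a hypothetical frame with ties in A, so the effect of sorting b in ascending order within each A group is visible.

```python
import pandas as pd

# Hypothetical frame with ties in column A.
demo = pd.DataFrame({"A": [11, 12, 12, 11], "b": [22, 21, 23, 20]})

# A descending first; ties in A are broken by b ascending.
result = demo.sort_values(by=["A", "b"], ascending=[False, True])

print(result)
```

The original index labels travel with their rows, so the result keeps the labels in their new order.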

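15. View descriptive statistics

There are two ways; the first is the more common one. With include='all', non-numeric columns are summarized as well. Every column in this example is numeric, so both calls give the same result.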

data.describe()
Out[207]: 
               A          b          c          d
count   2.000000   2.000000   2.000000   2.000000
mean   11.500000  21.500000  31.500000  41.500000
std     0.707107   0.707107   0.707107   0.707107
min    11.000000  21.000000  31.000000  41.000000
25%    11.250000  21.250000  31.250000  41.250000
50%    11.500000  21.500000  31.500000  41.500000
75%    11.750000  21.750000  31.750000  41.750000
max    12.000000  22.000000  32.000000  42.000000

data.describe(include='all')
Out[209]: 
               A          b          c          d
count   2.000000   2.000000   2.000000   2.000000
mean   11.500000  21.500000  31.500000  41.500000
std     0.707107   0.707107   0.707107   0.707107
min    11.000000  21.000000  31.000000  41.000000
25%    11.250000  21.250000  31.250000  41.250000
50%    11.500000  21.500000  31.500000  41.500000
75%    11.750000  21.750000  31.750000  41.750000
max    12.000000  22.000000  32.000000  42.000000
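The two calls only differ when the frame contains non-numeric columns. A minimal sketch with a hypothetical string column (name is not part of the original data):

```python
import pandas as pd

# Hypothetical frame mixing a numeric and a string column.
demo = pd.DataFrame({"A": [11, 12], "name": ["x", "y"]})

numeric_only = demo.describe()             # numeric columns only
everything = demo.describe(include="all")  # adds unique/top/freq rows

print(numeric_only)
print(everything)
```

In the second result, statistics that do not apply to a column (for example mean for name) appear as NaN.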

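16. Maximum, minimum, mean and similar functions

When axis is not specified, the operation is applied to each column by default (axis=0).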

data.max()
Out[210]: 
A    12
b    22
c    32
d    42
dtype: int64
data.min()
Out[211]: 
A    11
b    21
c    31
d    41
dtype: int64
data.sum()
Out[212]: 
A    23
b    43
c    63
d    83
dtype: int64
data.mean()
Out[213]: 
A    11.5
b    21.5
c    31.5
d    41.5
dtype: float64

With axis=1, the operation is applied to each row.

data.mean(axis=1)
Out[214]: 
0    26.0
1    27.0
dtype: float64
data.max(axis=1)
Out[215]: 
0    41
1    42
dtype: int64
data.sum(axis=1)
Out[216]: 
0    104
1    108
dtype: int64
data.min(axis=1)
Out[217]: 
0    11
1    12
dtype: int64
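The axis behaviour can be checked side by side. A short sketch, assuming the example frame after the rename to A:

```python
import pandas as pd

data = pd.DataFrame({"A": [11, 12], "b": [21, 22], "c": [31, 32], "d": [41, 42]})

col_means = data.mean()         # default axis=0: one value per column
row_means = data.mean(axis=1)   # axis=1: one value per row

print(col_means)
print(row_means)
```

The result for axis=0 is indexed by column name, while the result for axis=1 is indexed by the row labels.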

The same functions can also be called on a single column:

data['A'].sum()
Out[218]: 23
data['A'].min()
Out[219]: 11
data['A'].max()
Out[220]: 12
data['A'].mean()
Out[221]: 11.5
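If several statistics of one column are needed at once, Series.agg accepts a list of function names. This is an alternative to the separate calls above, not something the original post uses.

```python
import pandas as pd

data = pd.DataFrame({"A": [11, 12], "b": [21, 22], "c": [31, 32], "d": [41, 42]})

# One call, several aggregations; the result is a Series indexed by name.
stats = data["A"].agg(["sum", "min", "max", "mean"])

print(stats)
```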

Those are some of the commonly used methods and functions.
