pandas DataFrame Operations

酒桶在你野区

Article link: https://blog.csdn.net/s7777777777777/article/details/117440612

import pandas as pd

1 Create an empty DataFrame

df = pd.DataFrame(columns=('a', 'b', 'c'))
df

	a	b	c
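An empty DataFrame created only from column labels gives every column the generic `object` dtype. A minimal sketch, assuming you already know the column types (the `int64`/`float64` choices here are illustrative), that declares them up front with typed empty Series:

```python
import pandas as pd

# Column labels only: every column gets the generic object dtype
untyped = pd.DataFrame(columns=['a', 'b', 'c'])
print(untyped.dtypes.tolist())  # all object

# Typed empty columns; the chosen dtypes are assumptions about the data
typed = pd.DataFrame({
    'a': pd.Series(dtype='int64'),
    'b': pd.Series(dtype='int64'),
    'c': pd.Series(dtype='float64'),
})
print(typed.dtypes.tolist())
print(len(typed))  # 0
```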

2 Append a row of Series data

First create the Series:

s1 = pd.Series({'a': 1, 'b': 2, 'c': 3})
s1

a    1
b    2
c    3
dtype: int64

s2 = pd.Series({'a': 4, 'b': 5, 'c': 6}, name='new')
s2

a    4
b    5
c    6
Name: new, dtype: int64

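The result must be assigned back with =, because the operation returns a new DataFrame instead of modifying df in place. DataFrame.append was removed in pandas 2.0, so pd.concat is used here:

df = pd.concat([df, s1.to_frame().T], ignore_index=True)  # s1 has no name, so let pandas number the row (ignore_index=True)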
df

	a	b	c
0	1	2	3

df = pd.concat([df, s2.to_frame().T])  # s2.name ('new') becomes the row label
df

	a	b	c
0	1	2	3
new	4	5	6
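Because DataFrame.append was removed in pandas 2.0 and every pd.concat call copies the whole frame, adding rows one at a time in a loop gets slow. A common alternative, sketched here with made-up row values, is to collect the rows first and build the DataFrame once:

```python
import pandas as pd

# Hypothetical rows; in practice these would come from a loop or a file
rows = [
    {'a': 1, 'b': 2, 'c': 3},
    {'a': 4, 'b': 5, 'c': 6},
]
labels = [0, 'new']  # optional row labels, mirroring the example above

# Build the DataFrame in a single step instead of appending row by row
df = pd.DataFrame(rows, index=labels)
print(df)
```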

3 Get the index of a value in a column

In column b, the row whose value is 5 has the index label 'new':

df[(df['b'] == 5)].index[0]

'new'

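Count the matching rows:

len(df[(df['b'] == 5)].index)  # 1

If the value does not occur, the count is 0 (and .index[0] would raise an IndexError):

len(df[(df['b'] == 100)].index)  # 0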
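Taking `.index[0]` of an empty selection raises an `IndexError`, so when the value may be missing it helps to check first. A small sketch; the helper name `first_index_of` is hypothetical, and `df.index[mask]` returns every matching label at once:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4], 'b': [2, 5], 'c': [3, 6]}, index=[0, 'new'])

def first_index_of(frame, column, value):
    """Return the first index label where frame[column] == value, or None."""
    matches = frame.index[frame[column] == value]
    return matches[0] if len(matches) > 0 else None

print(df.index[df['b'] == 5].tolist())  # every matching label: ['new']
print(first_index_of(df, 'b', 5))       # new
print(first_index_of(df, 'b', 100))     # None
```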

4 Select certain columns (or rows)

output = df.loc[:, ['a', 'c']]  # loc selects by label; use df.loc[row_labels, :] to select rows instead
output

	a	c
0	1	3
new	4	6

output = df.iloc[:, 0:2]  # iloc selects by integer position; the end of the slice is excluded
output

	a	b
0	1	2
new	4	5

When only a single column (or row) is selected, the result is a Series.
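To keep a DataFrame even when selecting one column, pass a list of labels instead of a single label. A quick check on a small frame rebuilt for the example:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4], 'b': [2, 5], 'c': [3, 6]}, index=[0, 'new'])

col_series = df.loc[:, 'a']    # single label -> Series
col_frame = df.loc[:, ['a']]   # list with one label -> DataFrame
row_series = df.loc['new']     # single row label -> Series indexed by column names

print(type(col_series).__name__)  # Series
print(type(col_frame).__name__)   # DataFrame
print(row_series.tolist())        # [4, 5, 6]
```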

Convert the values of one column to a list:

df.loc[:, 'a'].tolist()  # equivalently df['a'].tolist()

[1, 4]

5 Apply a function to every value of a column

df['a'] = df['a'].apply(lambda x: x * 4)  # any other function can be used here
df

	a	b	c
0	4	2	3
new	16	5	6
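For plain arithmetic, a vectorized expression gives the same result as `apply` and is usually faster; `apply` is mainly useful for arbitrary Python functions. A sketch comparing the two on the column values from above:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4], 'b': [2, 5], 'c': [3, 6]}, index=[0, 'new'])

via_apply = df['a'].apply(lambda x: x * 4)  # calls the lambda once per element
vectorized = df['a'] * 4                    # one operation on the whole column

print(via_apply.equals(vectorized))  # True
df['a'] = vectorized
print(df['a'].tolist())              # [4, 16]
```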

6 Indexing

6.1 By integer position

df.iloc[0, 0]  # 4

df.iloc[0, 0] = df.iloc[1, 0] + 1
df

	a	b	c
0	17	2	3
new	16	5	6

6.2 By label

df.loc['new', 'a']  # 16

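6.3 Mixing position and label

In df.loc[0, 'a'] the 0 is the row label 0, which happens to exist in this index; it is not a position:

df.loc[0, 'a']  # 17

To combine a row position with a column label, convert the position to its label first:

df.loc[df.index[0], 'a']  # 17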
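For reading or writing a single cell, pandas also provides `at` (label-based) and `iat` (position-based), the scalar counterparts of `loc` and `iloc`. A short sketch on a frame shaped like the one above; `Index.get_loc` turns a column label into a position:

```python
import pandas as pd

df = pd.DataFrame({'a': [17, 16], 'b': [2, 5], 'c': [3, 6]}, index=[0, 'new'])

print(df.at['new', 'a'])  # 16, by row label and column label
print(df.iat[1, 0])       # 16, by row position and column position

# A row position combined with a column label
row_pos, col_label = 1, 'a'
print(df.iat[row_pos, df.columns.get_loc(col_label)])  # 16

df.at[0, 'b'] = 20         # scalar assignment by label
print(df.loc[0].tolist())  # [17, 20, 3]
```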

7 Modify the index

7.1 Set column labels

df.columns = ['a', 'c', 'b']  # relabels the columns by position; the data itself is not reordered
df

	a	c	b
0	17	2	3
new	16	5	6
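Assigning to `df.columns` only relabels columns by position. A sketch contrasting it with `rename` (change selected labels by name) and column selection (actually move the data):

```python
import pandas as pd

df = pd.DataFrame({'a': [17, 16], 'b': [2, 5], 'c': [3, 6]}, index=[0, 'new'])

# Rename only some labels, matched by name rather than position
renamed = df.rename(columns={'b': 'B'})
print(list(renamed.columns))      # ['a', 'B', 'c']

# Reorder the columns together with their data
reordered = df[['a', 'c', 'b']]
print(reordered.loc[0].tolist())  # [17, 3, 2]

# Relabel by position (what df.columns = [...] does): data is not moved
relabeled = df.copy()
relabeled.columns = ['a', 'c', 'b']
print(relabeled.loc[0].tolist())  # [17, 2, 3]
```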

7.2 Set the index

df.index = [2, 1]
df

	a	c	b
2	17	2	3
1	16	5	6

7.3 Reset the index (starting from 0)

df = df.reset_index(drop=True)  # drop=True discards the old index instead of keeping it as a column
df

	a	c	b
0	17	2	3
1	16	5	6
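With the default `drop=False`, `reset_index` keeps the old labels as a new column, named `index` when the old index had no name. A sketch using the values from the previous step:

```python
import pandas as pd

df = pd.DataFrame({'a': [17, 16], 'c': [2, 5], 'b': [3, 6]}, index=[2, 1])

kept = df.reset_index()              # old index becomes a column named 'index'
print(list(kept.columns))            # ['index', 'a', 'c', 'b']
print(kept['index'].tolist())        # [2, 1]

dropped = df.reset_index(drop=True)  # old index is discarded
print(dropped.index.tolist())        # [0, 1]
```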

7.4 Sort by the values of a column

df = df.sort_values(by='a')  # sort by column a in ascending order
df

	a	c	b
1	16	5	6
0	17	2	3
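`sort_values` also accepts `ascending=False` and a list of columns for breaking ties. A sketch with hypothetical data that contains a tie in column `a`:

```python
import pandas as pd

# Hypothetical data with a tie in column 'a'
df = pd.DataFrame({'a': [16, 17, 16], 'c': [5, 2, 1], 'b': [6, 3, 9]})

desc = df.sort_values(by='a', ascending=False)
print(desc['a'].tolist())    # [17, 16, 16]

# Sort by 'a' ascending, then break ties by 'c' ascending
multi = df.sort_values(by=['a', 'c'])
print(multi.index.tolist())  # [2, 0, 1]

# Optionally renumber the rows after sorting
print(multi.reset_index(drop=True).index.tolist())  # [0, 1, 2]
```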

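8 Rolling windows

Apply a rolling-window operation to one column. First, add one more row:

df = pd.concat([df, pd.Series({'a': 4, 'b': 7, 'c': 8}).to_frame().T], ignore_index=True)  # DataFrame.append was removed in pandas 2.0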
df

	a	c	b
0	16	5	6
1	17	2	3
2	4	8	7

window = 2  # 窗口大小为2
output = df['c'].rolling(window).mean()  # mean of each window; by default each result is labeled at the window's right end
output

0    NaN
1    3.5
2    5.0
Name: c, dtype: float64

window = 3
output = df['c'].rolling(window, center=True).mean()  # center is an argument of rolling(), not mean(); results are labeled at the window's middle
output

0    NaN
1    5.0
2    NaN
Name: c, dtype: float64
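With `window=3` and `center=True`, the value at position i is the mean of positions i-1, i and i+1, so the first and last rows lack a full window and become NaN. Setting `min_periods` lets pandas average the partial windows at the edges. A sketch using the column `c` values from above:

```python
import pandas as pd

c = pd.Series([5, 2, 8], name='c')

centered = c.rolling(3, center=True).mean()
print(centered.tolist())  # [nan, 5.0, nan]

# Accept windows with at least 1 observation at the edges
partial = c.rolling(3, center=True, min_periods=1).mean()
print(partial.tolist())   # [3.5, 5.0, 5.0]
```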

9 Compute the standard deviation

Sample standard deviation of each column (axis=0; ddof=1 by default):

df.std(axis=0)

a    7.234178
c    3.000000
b    2.081666
dtype: float64
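`DataFrame.std` divides by n - 1 by default (`ddof=1`, the sample standard deviation); `ddof=0` divides by n instead. A sketch that reproduces the value for column `a` by hand, using the data from the previous step:

```python
import math
import pandas as pd

df = pd.DataFrame({'a': [16, 17, 4], 'c': [5, 2, 8], 'b': [6, 3, 7]})

# Manual sample standard deviation for column 'a': divide by n - 1
values = df['a'].tolist()
mean = sum(values) / len(values)
manual = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
print(round(manual, 6))         # 7.234178
print(round(df['a'].std(), 6))  # 7.234178

# Population standard deviation (divide by n) via ddof=0
print(df.std(ddof=0).round(6).tolist())
```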

10 Get the number of rows and columns

df.shape[0]  # number of rows: 3

df.shape[1]  # number of columns: 3
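`df.shape` is a `(rows, columns)` tuple, so both numbers can be unpacked at once; `len` and `size` give related counts. A sketch on the final frame from the rolling-window section:

```python
import pandas as pd

df = pd.DataFrame({'a': [16, 17, 4], 'c': [5, 2, 8], 'b': [6, 3, 7]})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)      # 3 3
print(len(df))             # 3, same as df.shape[0]
print(len(df.columns))     # 3, same as df.shape[1]
print(df.size)             # 9, total number of cells
```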
