Getting Started with Pandas: Working with Data

dahehe_

Published 2021-09-14 17:11:42

Article link: https://blog.csdn.net/DAHWHG/article/details/120279645

1. Series

1.1 Vectorized operations

1.1.1 Arithmetic

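# Assumes: import numpy as np; import pandas as pd
s = pd.Series([0, 1, 2], index=['a', 'b', 'c'])  # values inferred from the output below
print(s * 2)  # same as s + s
print(np.exp(s))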

a    0
b    2
c    4
dtype: int64

a    1.000000
b    2.718282
c    7.389056
dtype: float64

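Operations between pandas objects align on labels automatically. A label that appears in only one of the operands yields NaN in the result.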

print(s[1:] + s[:-1])

a    NaN
b    2.0
c    NaN
dtype: float64
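
When NaN for unmatched labels is not what you want, the method form of the operator accepts a fill value. The following is a minimal sketch that assumes the same s as above; the names left, right, plain and filled are only illustrative.

```python
import pandas as pd

s = pd.Series([0, 1, 2], index=['a', 'b', 'c'])

left = s[1:]    # labels b, c
right = s[:-1]  # labels a, b

# Operator form: a label found on only one side becomes NaN.
plain = left + right

# Method form: the missing side is treated as 0 before adding.
filled = left.add(right, fill_value=0)
print(filled)
```

Whether 0 is a sensible substitute depends on the data; for multiplication, for example, 1 would be the neutral choice.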

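1.1.2 Floor division and modulo

divmod() performs floor division and the modulo operation in one call. It returns a tuple of two objects (quotient, remainder), each of the same type as the left-hand operand.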

1.1.2.1 With a scalar

s = pd.Series([1, 3, 5, 7], index=['a', 'b', 'c', 'd'])
div, rem = divmod(s, 3)
print(div, rem, sep='\n')

a    0
b    1
c    1
d    2
dtype: int64
a    1
b    0
c    2
d    1
dtype: int64

1.1.2.2 Element-wise

div, rem = divmod(s, [1, 2, 3, 4])
print(div, rem, sep='\n')

a    1
b    1
c    1
d    1
dtype: int64
a    0
b    1
c    2
d    3
dtype: int64
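
To make the relationship to the ordinary operators explicit, here is a small sketch showing that divmod() gives the same pair as // and %, and that the two parts rebuild the original values. The variable restored is introduced only for this illustration.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7], index=['a', 'b', 'c', 'd'])

div, rem = divmod(s, 3)

# divmod(s, 3) returns the same pair as (s // 3, s % 3).
same_quotient = div.equals(s // 3)
same_remainder = rem.equals(s % 3)

# Quotient and remainder together reconstruct the input.
restored = div * 3 + rem
print(same_quotient, same_remainder, restored.equals(s))
```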

1.1.3 Equality comparison

1.1.3.1 Against a scalar

s = pd.Series([20, 9, 6], index=['a', 'b', 'c'])
print(s == 20)

a     True
b    False
c    False
dtype: bool

1.1.3.2 Against an object of the same length

print(s == [20, 9, 6])

a    True
b    True
c    True
dtype: bool
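
The element-wise result is often reduced to a single boolean, and the "same length" requirement can be checked directly. A short sketch under those assumptions (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series([20, 9, 6], index=['a', 'b', 'c'])

# Collapse the element-wise result into one boolean.
all_equal = (s == [20, 9, 6]).all()
any_twenty = (s == 20).any()

# A list of a different length cannot be compared element by element.
try:
    s == [20, 9]
    mismatch_raises = False
except ValueError:
    mismatch_raises = True

print(all_equal, any_twenty, mismatch_raises)
```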

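1.2 Extracting the underlying array

Note that a Series is not a list; it only behaves like one in some respects. It is more accurate to think of it as a single column of a DataFrame.

1.2.1 Series.array

print(s.array)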

<PandasArray>
[20, 9, 6]
Length: 3, dtype: int64

1.2.2 Series.to_numpy()

print(s.to_numpy())

[20  9  6]
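
The difference between the two accessors is easiest to see from the types they return. This is a sketch only; the class name printed for .array may differ between pandas versions (more recent releases may show NumpyExtensionArray rather than PandasArray), so no output is listed here.

```python
import numpy as np
import pandas as pd

s = pd.Series([20, 9, 6], index=['a', 'b', 'c'])

arr = s.array      # pandas ExtensionArray that wraps the values
nd = s.to_numpy()  # plain numpy.ndarray

print(type(arr).__name__)  # class name depends on the pandas version
print(type(nd).__name__)

# to_numpy() can also convert the dtype on the way out.
as_float = s.to_numpy(dtype='float64')
```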

2. DataFrame

2.1 Vectorized operations

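df = pd.DataFrame({'one': [5, 1, 2], 'two': [4, 9, 3]}, index=['a', 'b', 'c'])  # values inferred from the outputs
print(1 / df)
df['flag'] = df['one'] > 1
print(df)

   one       two
a  0.2  0.250000
b  1.0  0.111111
c  0.5  0.333333

   one  two   flag
a    5    4   True
b    1    9  False
c    2    3   True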

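DataFrames are aligned automatically on both axes. The result carries the union of the row labels and the union of the column labels.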

df1 = pd.DataFrame(np.random.rand(4, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'B', 'C'])
print(df1 + df2)

          A         B         C   D
0  0.977221  0.873670  1.729117 NaN
1  1.192113  1.164575  1.756316 NaN
2  1.732756  0.926804  1.498495 NaN
3  1.526619  0.927637  0.876293 NaN
4       NaN       NaN       NaN NaN
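
Because the example above uses random numbers, the same alignment rule may be easier to follow with fixed values. A minimal sketch with made-up data:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})  # rows 0-1, columns A and B
df2 = pd.DataFrame({'A': [10, 20, 30]})         # rows 0-2, column A only

total = df1 + df2
print(total)
```

Only the cells that exist in both frames (column A, rows 0 and 1) hold a number; every other cell of the 3 x 2 result is NaN.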

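2.1.1 Arithmetic methods

add(): addition; sub(): subtraction; mul(): multiplication; div(): division; radd(): reversed addition (other + df); rsub(): reversed subtraction (other - df)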
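
The "r" variants only swap the operand order, which matters for subtraction and division. A small sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [5, 6]}, index=['a', 'b'])

forward = df.sub(10)   # df - 10
reverse = df.rsub(10)  # 10 - df

print(forward)
print(reverse)
```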

2.1.1.1 With a scalar

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]}, index=['a', 'b', 'c', 'd'])
print(df.sub(1, axis='columns'))

   one  two
a    0    4
b    1    5
c    2    6
d    3    7

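2.1.1.2 With a Series

With axis='columns', the index of the Series is matched against the column labels; with axis='index', it is matched against the row labels. Note: the Series index must agree with the labels on the chosen axis, otherwise the unmatched positions become NaN.

Aligned with the columns: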

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]}, index=['a', 'b', 'c', 'd'])
s = pd.Series([1, 1], index=['one', 'two'])
print(df.sub(s, axis='columns'))

Aligned with the rows:

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]}, index=['a', 'b', 'c', 'd'])
s = pd.Series([1, 1, 1, 1], index=['a', 'b', 'c', 'd'])
print(df.sub(s, axis='index'))

Result (the same in both cases):

   one  two
a    0    4
b    1    5
c    2    6
d    3    7
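
To see why the labels have to agree, consider a Series whose index only partly overlaps with the columns. This is a sketch with a deliberately mismatched, hypothetical label 'three':

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]}, index=['a', 'b', 'c', 'd'])

# 'three' is not a column of df, and 'two' is missing from the Series.
s = pd.Series([1, 1], index=['one', 'three'])

result = df.sub(s, axis='columns')
print(result)
```

Only column 'one' has a partner on both sides; 'two' and 'three' come out entirely as NaN.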

2.1.1.3 With a list

Aligned with the columns:

print(df.sub([1, 1], axis='columns'))

Aligned with the rows:

print(df.sub([1, 1, 1, 1], axis='index'))

Result (the same in both cases):

   one  two
a    0    4
b    1    5
c    2    6
d    3    7
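
Since every element of the lists above is 1, the two axis settings cannot be told apart from the output. A sketch with distinct values shows that a list carries no labels and is matched purely by position:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]}, index=['a', 'b', 'c', 'd'])

# One value per column: 1 is subtracted from 'one', 2 from 'two'.
by_column = df.sub([1, 2], axis='columns')

# One value per row: 10 from row a, 20 from row b, and so on.
by_row = df.sub([10, 20, 30, 40], axis='index')

print(by_column)
print(by_row)
```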

2.1.1.4 With a DataFrame

df = pd.DataFrame({'one': [1, np.nan], 'two': [0, 0], 'three': [3, 5]}, index=['a', 'b'])
df1 = pd.DataFrame({'one': [1, 5], 'two': [0, 0], 'three': [3, 5]}, index=['a', 'b'])
df2 = df.add(df1, fill_value=2)  # a value missing on one side is replaced by 2 before adding
print(df2)

   one  two  three
a  2.0    0      6
b  7.0    0     10
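
fill_value only helps when exactly one side is missing. The sketch below, using made-up frames named left and right, shows that a cell missing on both sides stays NaN:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'x': [1.0, np.nan, np.nan]})
right = pd.DataFrame({'x': [np.nan, 2.0, np.nan]})

total = left.add(right, fill_value=0)
print(total)
```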

2.1.2 Comparison

No.	Method	Meaning
1	eq	equal to
2	ne	not equal to
3	lt	less than
4	gt	greater than
5	le	less than or equal to
6	ge	greater than or equal to

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]}, index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'one': [8, 4, 2, 6], 'two': [3, 9, 5, 1]}, index=['a', 'b', 'c', 'd'])
print(df2.gt(df))

     one    two
a   True  False
b   True   True
c  False  False
d   True  False
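
The comparison methods mirror the usual operators, and the boolean result can be aggregated like any other DataFrame. A brief sketch using the same df and df2:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]}, index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'one': [8, 4, 2, 6], 'two': [3, 9, 5, 1]}, index=['a', 'b', 'c', 'd'])

# The method and the operator give the same boolean DataFrame.
same = df2.gt(df).equals(df2 > df)

# True counts as 1, so sum() counts the cells where df2 is larger.
larger_per_column = df2.gt(df).sum()

print(same)
print(larger_per_column)
```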

2.1.3 Equality of DataFrames

2.1.3.1 Without NaN handling

print(df + df == df * 2)

     one   two  three
a   True  True   True
b  False  True   True

2.1.3.2 With NaN handling

print((df + df).equals(df * 2))

True
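
The two results differ because NaN never compares equal to itself, while equals() treats NaN values in the same position as equal. A sketch that puts the two side by side, using the df from section 2.1.1.4:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1, np.nan], 'two': [0, 0], 'three': [3, 5]}, index=['a', 'b'])

nan_equals_itself = (np.nan == np.nan)  # False by definition

# Element-wise comparison: the NaN cell is reported as not equal.
elementwise = (df + df == df * 2)
all_cells_equal = bool(elementwise.all().all())

# equals(): NaN values in matching positions count as equal.
frames_equal = (df + df).equals(df * 2)

print(nan_equals_itself, all_cells_equal, frames_equal)
```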

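2.2 Adding columns

2.2.1 At a specified position

insert() modifies the DataFrame in place and returns None.

df.insert(2, 'bar', df[:2]['one'])  # only rows a and b are supplied, so row c gets NaN
print(df)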

   one  two  bar
a    5    4  5.0
b    1    9  1.0
c    2    3  NaN
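
The following sketch checks the return value and shows what happens when the column name is already taken. The column name 'zero' is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'one': [5, 1, 2], 'two': [4, 9, 3]}, index=['a', 'b', 'c'])

returned = df.insert(0, 'zero', [0, 0, 0])  # position 0 is the first column
print(returned)          # None: df itself was changed
print(list(df.columns))

# Inserting a name that already exists raises ValueError by default.
try:
    df.insert(1, 'zero', [9, 9, 9])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```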

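2.2.2 Without a specified position

(1) By direct assignment

df['three'] = df['one'] * df['two']

When a Series is assigned, its index must match the row labels of the DataFrame.

df['three'] = pd.Series([20, 9, 6], index=['a', 'b', 'c'])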

   one  two  three
a    5    4     20
b    1    9      9
c    2    3      6
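
What "must match" means in practice: rows without a matching label receive NaN, and Series labels that do not exist in the DataFrame are ignored. A sketch with a deliberately wrong, hypothetical label 'x':

```python
import pandas as pd

df = pd.DataFrame({'one': [5, 1, 2], 'two': [4, 9, 3]}, index=['a', 'b', 'c'])

# Label 'x' does not exist in df; labels 'b' and 'c' are missing from the Series.
df['three'] = pd.Series([20, 9], index=['a', 'x'])
print(df)
```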

(2) Using assign()

df1 = df.assign(three=df['one'] + df['two'])  # df['one'] is equivalent to df.one

df1 = df.assign(three=lambda x: x['one'] + x['two'])  # x['one'] is equivalent to x.one

df1 = df.assign(three=[9, 10, 5])

Again, the row labels must match.

df1 = df.assign(three=pd.Series([9, 10, 5], index=['a', 'b', 'c']))

   one  two  three
a    5    4      9
b    1    9     10
c    2    3      5
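
Unlike direct assignment, assign() returns a new DataFrame and leaves the original untouched. The sketch below also shows that several columns can be added in one call; the column name 'four' is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'one': [5, 1, 2], 'two': [4, 9, 3]}, index=['a', 'b', 'c'])

df1 = df.assign(three=lambda x: x['one'] + x['two'])

# Several columns in one call; a later lambda can use an earlier new column.
df2 = df.assign(
    three=lambda x: x['one'] + x['two'],
    four=lambda x: x['three'] * 2,
)

print(list(df.columns))   # the original still has only 'one' and 'two'
print(df2)
```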

2.3 Deleting columns

2.3.1 del

del df['two']

   one  three
a    5     20
b    1      9
c    2      6

2.3.2 drop()

df = pd.DataFrame({'one': [2, 6, 4], 'two': [5, 7, 10], 'three': [4, 8, 8]}, index=['a', 'b', 'c'])
print(df)
print(df.drop(['one'], axis=1))  # returns a new DataFrame; df itself is unchanged

   one  two  three
a    2    5      4
b    6    7      8
c    4   10      8

# After dropping column 'one':
   two  three
a    5      4
b    7      8
c   10      8
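
drop() also accepts the columns= and index= keywords, which some readers find clearer than axis=. A short sketch on the same df (the result names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'one': [2, 6, 4], 'two': [5, 7, 10], 'three': [4, 8, 8]}, index=['a', 'b', 'c'])

without_one = df.drop(columns=['one'])  # same as df.drop(['one'], axis=1)
without_b = df.drop(index=['b'])        # drops a row by its label

print(list(df.columns))  # df is unchanged by either call
```

To keep the change, assign the result back, for example df = df.drop(columns=['one']).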

2.4 Combining

df = pd.DataFrame({'one': [1, np.nan], 'two': [6, 0], 'three': [3, 5]}, index=['a', 'b'])
df1 = pd.DataFrame({'one': [1, 5, 7], 'two': [0, np.nan, 8], 'three': [3, 5, 4]}, index=['a', 'b', 'c'])
print(df)
print(df1)
print(df.combine_first(df1))  # values from df, with its NaN gaps and missing rows filled from df1

   one  two  three
a  1.0    6      3
b  NaN    0      5

   one  two  three
a    1  0.0      3
b    5  NaN      5
c    7  8.0      4

   one  two  three
a  1.0  6.0      3
b  5.0  0.0      5
c  7.0  8.0      4
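
The order of the two frames matters, because the caller always takes priority. A sketch comparing both directions on the same data (first and second are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1, np.nan], 'two': [6, 0], 'three': [3, 5]}, index=['a', 'b'])
df1 = pd.DataFrame({'one': [1, 5, 7], 'two': [0, np.nan, 8], 'three': [3, 5, 4]}, index=['a', 'b', 'c'])

first = df.combine_first(df1)   # df has priority
second = df1.combine_first(df)  # df1 has priority

print(first.loc['a', 'two'])    # taken from df
print(second.loc['a', 'two'])   # taken from df1
print(second.loc['b', 'two'])   # NaN in df1, so filled from df
```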
