Notes on Pandas [1]: Indexing and Selecting

IvoryPillar

Article link: https://blog.csdn.net/IvoryPillar/article/details/124419235

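This article covers data operations on DataFrames in the Python pandas library: locating rows and columns with .loc and .iloc, swapping column values correctly, and selecting data with slices and boolean arrays. It also covers random sampling, setting data, and index conversion.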

Indexing:

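Rows:

.loc['a'] : 'a' is an index label; a label can also be a number. # by label

.iloc[5] : integers only; selects the row at that 0-based position. # by position

Columns:

df['A']

df[['A', 'B']] # to select several columns, pass a list of names

df.A # attribute access. df.C = ... can only assign to an existing column C; a new column must be created with df['C'] = ...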
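The row and column accessors above can be tried on a small frame. This is a minimal sketch: the frame, its labels, and the variable names are made up for illustration and do not come from the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame with string row labels.
df = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=["a", "b", "c"],
    columns=["A", "B"],
)

row_by_label = df.loc["a"]      # Series for the row labelled 'a'
row_by_position = df.iloc[2]    # Series for the third row (position 2)
one_column = df["A"]            # a single name gives a Series
two_columns = df[["A", "B"]]    # a list of names gives a DataFrame

# Attribute access can overwrite an existing column...
df.A = [10, 20, 30]

# ...but a new column has to be created with bracket syntax.
df["C"] = df["A"] + df["B"]
```

Writing df.C = ... for a column that does not exist yet would only set a plain Python attribute on the object, which is why the bracket form is used for the new column.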

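When a DataFrame is assigned through .loc, pandas aligns all axes before setting values. The assignment below therefore does not modify df, because the columns are aligned by name before the values are assigned: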

In [9]: df[['A', 'B']]
Out[9]: 
                   A         B
2000-01-01 -0.282863  0.469112
2000-01-02 -0.173215  1.212112

In [10]: df.loc[:, ['B', 'A']] = df[['A', 'B']]

In [11]: df[['A', 'B']]
Out[11]: 
                   A         B
2000-01-01 -0.282863  0.469112
2000-01-02 -0.173215  1.212112

The correct way to swap column values is by using raw values:

In [12]: df.loc[:, ['B', 'A']] = df[['A', 'B']].to_numpy()

In [13]: df[['A', 'B']]
Out[13]: 
                   A         B
2000-01-01  0.469112 -0.282863
2000-01-02  1.212112 -0.173215
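The two assignments can be compared side by side on a self-contained frame. This is a sketch with made-up values; the names aligned and swapped are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [10.0, 20.0]})

# Assigning a DataFrame: columns are aligned by name first, so 'A' is
# written back into 'A' and 'B' into 'B'. Nothing changes.
aligned = df.copy()
aligned.loc[:, ["B", "A"]] = df[["A", "B"]]

# Assigning a bare NumPy array: there are no labels to align, so values
# are placed by position and the two columns swap.
swapped = df.copy()
swapped.loc[:, ["B", "A"]] = df[["A", "B"]].to_numpy()
```

Working on copies keeps the original df untouched, so the two outcomes can be inspected independently.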

Slicing:

df[::-1] --> a slice inside [] selects rows (a convenience shortcut); this one returns the rows in reverse order.

Slicing by label: loc

a single label: df.loc[5], df.loc['a']

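a list of labels: df.loc[['a', 'b', 'c']]

a slice object with labels: df.loc['a':'f'], df.loc['a':] (unlike Python slices, both endpoints are included)

a boolean array: df.loc[df['A'] > 0]

a callable: df.loc[lambda d: d['A'] > 0]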

df1.loc['d':, 'A':'C']

df1.loc[:, df1.loc['a'] > 0]
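# df1.loc['a'] > 0 returns a boolean Series over the columns, e.g. True, False, False, False, ...

df1.iloc[[1, 3, 5], [1, 3]]     -> rows at positions 1, 3, 5; columns at positions 1, 3

df.iloc[1, 1] --> row at position 1, column at position 1 (0-based, so the second row and second column); same value as df.iat[1, 1]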
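The selection forms listed above, collected into one runnable sketch. The frame df1 here is a stand-in built for illustration, with rows 'a' to 'f', columns 'A' to 'D' and values 0 to 23. It is not the df1 from the original session.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows 'a'..'f', columns 'A'..'D', values 0..23.
df1 = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=list("abcdef"),
    columns=list("ABCD"),
)

by_list = df1.loc[["a", "b", "c"]]            # list of labels
by_slice = df1.loc["d":, "A":"C"]             # label slices include both ends
by_mask = df1.loc[:, df1.loc["a"] > 1]        # boolean Series over the columns
by_callable = df1.loc[lambda d: d["A"] > 10]  # callable returning a row mask
by_position = df1.iloc[[1, 3, 5], [1, 3]]     # rows 1, 3, 5 and columns 1, 3
scalar = df1.iloc[1, 1]                       # second row, second column
```

For a single scalar, df1.iat[1, 1] returns the same value as df1.iloc[1, 1].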

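Selecting random samples: .sample()

1. replace --> sample with replacement (the same row may be drawn more than once)

s.sample(n=6, replace=True)

2. weights --> the sampling weight of each row; one weight per row is required, and n cannot exceed the number of rows unless replace=True

s.sample(n=3, weights=[0.2, 0.2, 0.2, 0.4])

3. sample columns --> pass axis=1

df.sample(n=2, axis=1)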
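A short sketch of the three sample() options. The Series, the DataFrame and the weights are made up for illustration, and random_state is added only to make the draws reproducible; it is not part of the original notes.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3])
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# With replacement, n may exceed len(s) and values may repeat.
with_replacement = s.sample(n=6, replace=True, random_state=0)

# One weight per row; a weight of 0 means that row is never drawn.
weighted = s.sample(n=2, weights=[0, 0, 0.5, 0.5], random_state=0)

# axis=1 samples columns instead of rows.
two_columns = df.sample(n=2, axis=1, random_state=0)
```

Weights that do not sum to 1 are normalised by pandas, so only their relative sizes matter.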

Setting

Setting with enlargement:

dfi = pd.DataFrame(np.arange(6).reshape(3, 2),
                   columns=['A', 'B'])

dfi
Out[136]: 
   A  B
0  0  1
1  2  3
2  4  5

dfi.loc[:, 'C'] = dfi.loc[:, 'A']

dfi
Out[138]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
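Enlargement through .loc also works along the row axis. The sketch below reuses the dfi frame from above and assumes the row label 3 is not yet present. The flag iloc_enlarged is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

dfi = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["A", "B"])

# Label 3 is not in the index yet, so .loc appends it as a new row.
dfi.loc[3] = 5

# .iloc is purely positional and cannot enlarge the frame.
try:
    dfi.iloc[4] = 5
    iloc_enlarged = True
except IndexError:
    iloc_enlarged = False
```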

set_index(), reset_index(), .index=

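set_index(): use one or more existing columns of the DataFrame as the index. Input: an existing column name, or a list of names.

reset_index(): the inverse of set_index(). It moves the current index back into a column and installs a default integer index (0, 1, 2, ...).

data.index = index: assign the index directly; the new index must have the same length as the DataFrame.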

data
Out[341]: 
     a    b  c    d
0  bar  one  z  1.0
1  bar  two  y  2.0
2  foo  one  x  3.0
3  foo  two  w  4.0

indexed1 = data.set_index('c')

indexed1
Out[343]: 
     a    b    d
c               
z  bar  one  1.0
y  bar  two  2.0
x  foo  one  3.0
w  foo  two  4.0

indexed2 = data.set_index(['a', 'b'])

indexed2
Out[345]: 
         c    d
a   b          
bar one  z  1.0
    two  y  2.0
foo one  x  3.0
    two  w  4.0
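The transcript above can be reproduced end to end, together with the reset_index() round trip and direct index assignment. This is a sketch: data is rebuilt from the values printed above, and the names restored and relabelled are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "a": ["bar", "bar", "foo", "foo"],
        "b": ["one", "two", "one", "two"],
        "c": ["z", "y", "x", "w"],
        "d": [1.0, 2.0, 3.0, 4.0],
    }
)

indexed1 = data.set_index("c")          # column 'c' becomes the index
restored = indexed1.reset_index()       # 'c' is a column again, index is 0..3
indexed2 = data.set_index(["a", "b"])   # two columns give a MultiIndex

relabelled = data.copy()
relabelled.index = ["w", "x", "y", "z"]  # must match the number of rows
```

set_index() and reset_index() return new objects by default, so data itself is left unchanged here.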