Indexing:
Rows:
.loc['a'] : 'a' is an index label; the label can also be a number. # by label
.iloc[5] : integers only; selects the row at that 0-based position. # by position
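A minimal sketch of the label/position distinction. The frame and its integer labels (10, 20, 30) are made up for illustration; they are chosen so that labels and positions differ.

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=[10, 20, 30])

by_label = df.loc[20]      # the row whose index label is 20
by_position = df.iloc[1]   # the row at position 1, i.e. the second row

# In this frame both expressions pick the same row.
same_row = by_label.equals(by_position)

# df.loc[1] would raise KeyError here, because no row carries the label 1.
```

When the index is the default 0, 1, 2, ... the two accessors happen to agree for single rows, which is why the difference is easy to miss.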
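Columns:
df['A']
df[['A', 'B']] # note: pass a list to select multiple columns
df.A # attribute access. Assigning through df.C works only if column C already exists; a new column must be created with df['C']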
The .loc assignment in the session below will not modify df,
because pandas aligns the columns by label before assigning the values.
In [9]: df[['A', 'B']]
Out[9]:
A B
2000-01-01 -0.282863 0.469112
2000-01-02 -0.173215 1.212112
In [10]: df.loc[:, ['B', 'A']] = df[['A', 'B']]
In [11]: df[['A', 'B']]
Out[11]:
A B
2000-01-01 -0.282863 0.469112
2000-01-02 -0.173215 1.212112
The correct way to swap column values is by using raw values:
In [12]: df.loc[:, ['B', 'A']] = df[['A', 'B']].to_numpy()
In [13]: df[['A', 'B']]
Out[13]:
A B
2000-01-01 0.469112 -0.282863
2000-01-02 1.212112 -0.173215
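The two sessions above can be reproduced with a small script. This is a sketch that rebuilds a two-row frame from the values shown; the variable names before, unchanged, and swapped are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [-0.282863, -0.173215], "B": [0.469112, 1.212112]},
    index=pd.date_range("2000-01-01", periods=2),
)
before = df.copy()

# The right-hand side is a DataFrame, so pandas aligns it on column labels:
# 'A' is written back to 'A' and 'B' to 'B', and nothing changes.
df.loc[:, ["B", "A"]] = df[["A", "B"]]
unchanged = df.equals(before)

# A raw NumPy array carries no labels, so values are assigned by position.
df.loc[:, ["B", "A"]] = df[["A", "B"]].to_numpy()
swapped = (
    np.array_equal(df["A"].to_numpy(), before["B"].to_numpy())
    and np.array_equal(df["B"].to_numpy(), before["A"].to_numpy())
)
```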
Slicing:
df[::-1] --> slices rows: a slice inside df[...] operates on rows, which pandas provides as a convenience.
Slicing by label: loc
a single label: df.loc[5], df.loc['a']
a list of labels: df.loc[['a', 'b', 'c']] # the labels must be wrapped in a list
a slice object with labels: df.loc['a':'f'], df.loc['a':] # both endpoints are included
a boolean array:
a callable:
df1.loc['d':, 'A':'C']
df1.loc[:, df1.loc['a'] > 0] # df1.loc['a'] > 0 returns a boolean array (True, False, False, False, ...) with one entry per column
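df1.iloc[[1, 3, 5], [1, 3]] --> rows at positions 1, 3, 5; columns at positions 1, 3
df.iloc[1, 1] --> row at position 1, column at position 1 (0-based, so the second row and second column); same as df.iat[1, 1]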
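The selection forms listed above can be tried on a small frame. This is a sketch; df1 here is a hypothetical 6x4 frame with row labels 'a'..'f' and columns 'A'..'D', not the frame the notes were taken from.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=list("abcdef"),
    columns=list("ABCD"),
)

reversed_rows = df1[::-1]                      # plain slice: operates on rows
label_list = df1.loc[["a", "b", "c"]]          # list of labels
label_slice = df1.loc["d":, "A":"C"]           # label slices include both ends
bool_columns = df1.loc[:, df1.loc["a"] > 0]    # boolean array over the columns
by_callable = df1.loc[lambda d: d["A"] > 10]   # callable returning a boolean mask
by_position = df1.iloc[[1, 3, 5], [1, 3]]      # positions, 0-based
scalar = df1.iloc[1, 1]                        # same value as df1.iat[1, 1]
```

Row 'a' holds 0, 1, 2, 3, so the boolean selection keeps columns B, C, and D.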
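Selecting random samples: .sample()
1. replace --> sample with replacement (the same row may be drawn more than once)
s.sample(n=6, replace=True)
2. weights --> sampling weight for each row
s.sample(n=6, weights=[0.2, 0.2, 0.2, 0.4], replace=True) # weights needs one entry per row of s; without replace=True, n cannot exceed len(s)
3. sample columns
df.sample(n=2, axis=1)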
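A short sketch of the three options. The Series s, the frame df, the weight values, and the random_state seeds are all chosen here for illustration; zero weights are used so the effect of weights is easy to see.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3])

# 1. With replacement, n may exceed len(s) and values may repeat.
with_replacement = s.sample(n=6, replace=True, random_state=0)

# 2. One weight per row; rows with weight 0 are never drawn.
weighted = s.sample(n=6, weights=[0, 0, 0.5, 0.5], replace=True, random_state=0)

# 3. axis=1 samples columns instead of rows.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
two_columns = df.sample(n=2, axis=1, random_state=0)
```

If the weights do not sum to 1, pandas normalizes them before sampling.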
Setting
Setting with enlargement:
dfi = pd.DataFrame(np.arange(6).reshape(3, 2),
columns=['A', 'B'])
dfi
Out[136]:
A B
0 0 1
1 2 3
2 4 5
dfi.loc[:, 'C'] = dfi.loc[:, 'A']
dfi
Out[138]:
A B C
0 0 1 0
1 2 3 2
2 4 5 4
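The same enlargement as a script, together with the attribute-assignment pitfall noted in the Columns list. This is a sketch; the column names 'D' and 'E' are added here for illustration only.

```python
import warnings

import numpy as np
import pandas as pd

dfi = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["A", "B"])

# Assigning to a column label that does not exist yet enlarges the frame.
dfi.loc[:, "C"] = dfi.loc[:, "A"]
dfi["D"] = dfi["A"] + dfi["B"]

# Attribute-style assignment to a new name does not create a column:
# pandas emits a UserWarning and sets an ordinary Python attribute instead.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    dfi.E = [0, 0, 0]

columns_after = list(dfi.columns)  # 'E' is not among them
```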
set_index(), reset_index(), .index=
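set_index(): use a column of the DataFrame as the index. Input: an existing column name (or a list of names for a MultiIndex)
reset_index(): inverse of set_index(). Moves the current index back into a column, then adds a default integer index (0, 1, 2, ...).
data.index = index: set the index directly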
data
Out[341]:
a b c d
0 bar one z 1.0
1 bar two y 2.0
2 foo one x 3.0
3 foo two w 4.0
indexed1 = data.set_index('c')
indexed1
Out[343]:
a b d
c
z bar one 1.0
y bar two 2.0
x foo one 3.0
w foo two 4.0
indexed2 = data.set_index(['a', 'b'])
indexed2
Out[345]:
c d
a b
bar one z 1.0
two y 2.0
foo one x 3.0
two w 4.0
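The three ways of changing the index, as a runnable sketch. The frame is rebuilt from the values shown above; the names restored and relabeled and the labels 'r0'..'r3' are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "a": ["bar", "bar", "foo", "foo"],
        "b": ["one", "two", "one", "two"],
        "c": ["z", "y", "x", "w"],
        "d": [1.0, 2.0, 3.0, 4.0],
    }
)

indexed1 = data.set_index("c")          # column 'c' becomes the index
indexed2 = data.set_index(["a", "b"])   # two columns give a MultiIndex

# reset_index() moves 'c' back to a column and restores a 0..n-1 index.
restored = indexed1.reset_index()

# Direct assignment replaces the index labels; the length must match.
relabeled = data.copy()
relabeled.index = ["r0", "r1", "r2", "r3"]
```

set_index() and reset_index() return new frames by default, so data itself keeps its original index here.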