A Simple Tutorial on pandas Indexes

Index

pd.Index, pd.MultiIndex
df.index, df.columns

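Note: assigning through more than one chained indexer (for example df[a][b] = value) may not modify the original data; prefer a single indexer such as df.loc[row, col] = value.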

Set index

df.set_index()

loc element indexer

df.loc[*] is df.loc[row].
df.loc[*, *] is df.loc[row, col].
loc selects by label. * can be:

  1. a single label
  2. a list of labels
  3. a slice of labels (both endpoints are included)
  4. a list of boolean values
  5. a function (or lambda) that takes df and returns one of the above
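
The five forms accepted by loc can be sketched as follows. The small DataFrame and its labels are made up for illustration only.

```python
import pandas as pd

# Toy data, assumed for illustration
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
    index=["w", "x", "y", "z"],
)

one_row = df.loc["x"]                         # single label, returns a Series
some_rows = df.loc[["w", "z"]]                # list of labels
row_slice = df.loc["x":"z"]                   # label slice, both ends included
masked = df.loc[[True, False, True, False]]   # list of boolean values
by_func = df.loc[lambda d: d["a"] > 2]        # function returning a boolean mask
cell = df.loc["y", "b"]                       # row and column together
```

Note that the label slice "x":"z" includes the row labelled "z", unlike an ordinary Python slice.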

iloc position indexer

df.iloc[*] is df.iloc[row].
df.iloc[*, *] is df.iloc[row, col].
iloc selects by integer position. * can be:

  1. a single int
  2. a list of ints
  3. a slice of ints (the end is excluded)
  4. a list of boolean values
  5. a function (or lambda) that takes df and returns one of the above
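
A matching sketch for iloc, using the same kind of made-up DataFrame. The only assumption is the toy data itself.

```python
import pandas as pd

# Toy data, assumed for illustration
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
    index=["w", "x", "y", "z"],
)

one_row = df.iloc[1]                           # single position, returns a Series
some_rows = df.iloc[[0, 3]]                    # list of positions
row_slice = df.iloc[1:3]                       # position slice, end excluded
masked = df.iloc[[True, False, True, False]]   # list of boolean values
by_func = df.iloc[lambda d: [0, 1]]            # function returning positions
cell = df.iloc[2, 1]                           # row and column positions together
```

Unlike loc, the slice 1:3 here stops before position 3, as in plain Python.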

query function

Similar to the eval function.
df.query(*), where * is a valid query expression string.
A valid expression can use:

  1. any column name
  2. '`column name` == [1, 2, 3]' (comparing with a list using '==' is equivalent to 'in'; backticks quote column names that are not valid Python identifiers)
  3. '@variable' (a variable defined outside the query)
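
A short sketch of the three kinds of expression. The column names "a" and "col b" and the variable threshold are hypothetical, chosen only to exercise each rule.

```python
import pandas as pd

# Toy data; "col b" contains a space so that it needs backticks
df = pd.DataFrame({"a": [1, 2, 3, 4], "col b": [10, 20, 30, 40]})

threshold = 2  # a variable defined outside the query

big = df.query("a > @threshold")         # '@' refers to the outer variable
in_list = df.query("a == [1, 3]")        # '==' with a list behaves like 'in'
backticked = df.query("`col b` >= 30")   # backticks quote an awkward column name
```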

Random sample

df.sample(n, replace, weights)

n, an int: the number of rows to draw.
replace, a bool: whether a drawn row is put back and can be drawn again.
weights, a list of weights: relative sampling probabilities, one per row.
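
A minimal sketch of the three arguments. In pandas the row count is passed as n; the toy data and the random_state seed are assumptions added here so the draws are repeatable.

```python
import pandas as pd

# Toy data, assumed for illustration
df = pd.DataFrame({"a": range(5)})

# Three distinct rows: nothing is put back after being drawn
without_replacement = df.sample(n=3, replace=False, random_state=0)

# More draws than rows is only possible when rows are put back
with_replacement = df.sample(n=8, replace=True, random_state=0)

# Rows with weight 0 are never drawn; the rest are drawn in proportion to weight
weighted = df.sample(n=4, replace=True, weights=[0, 0, 0, 1, 1], random_state=0)
```

The weights do not need to sum to 1, since pandas normalizes them.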

get_indexer

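Look up the integer positions of labels in df.index.

df.index.get_indexer(target, method)

target, a pd.Index or list-like of labels to look up.
method, one of ['pad', 'backfill', 'nearest'], controls how inexact matches are handled; the default None requires exact matches. Labels that are not found are reported as -1.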
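
The effect of each method can be sketched on a small sorted index. The values are made up for illustration.

```python
import pandas as pd

# A sorted toy index (pad and backfill need a monotonic index)
index = pd.Index([10, 20, 30])

exact = index.get_indexer([20, 25])                        # 25 is missing, so -1
padded = index.get_indexer([25], method="pad")             # previous label: 20
backfilled = index.get_indexer([25], method="backfill")    # next label: 30
nearest = index.get_indexer([29], method="nearest")        # closest label: 30
```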

MultiIndex

Each entry of a MultiIndex is a tuple.

Index values cannot be changed in place; build a new index instead.
get_level_values(n) returns the values of level n.
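
A brief sketch of these points. The level names "letter" and "number" are hypothetical.

```python
import pandas as pd

# Toy MultiIndex, assumed for illustration
mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["letter", "number"]
)

first = mi[0]                             # each entry is a tuple
letters = mi.get_level_values(0)          # select a level by position
numbers = mi.get_level_values("number")   # or by name

# Item assignment on an index is rejected
try:
    mi[0] = ("z", 9)
    mutated = True
except TypeError:
    mutated = False
```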

loc and iloc indexer

Before slicing with these indexers, sort the index (slicing an unsorted MultiIndex can raise an error).

df.sort_index(axis=0) # sort index
df.sort_index(axis=1) # sort column

In particular, * could be:

  1. (list_level_0, list_level_1, ...), the Cartesian product of the lists.
  2. [index_tuple_1, index_tuple_2, ...], a list of MultiIndex tuples.
  3. IndexSlice

IndexSlice

import pandas as pd
idx = pd.IndexSlice

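idx is a short alias for the pd.IndexSlice object, which is indexed rather than called.
idx[*, *, ...], where each * corresponds to one level of the MultiIndex.
What each * may contain is determined by the outer indexer (for example, labels for loc).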
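
The three MultiIndex selectors can be sketched with loc as follows. The data and level names are made up for illustration.

```python
import pandas as pd

# Toy frame with a two-level row index, sorted before slicing
index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2, 3]], names=["letter", "number"]
)
df = pd.DataFrame({"value": range(6)}, index=index).sort_index()

idx = pd.IndexSlice

product = df.loc[(["a", "b"], [1, 3]), :]   # tuple of lists: Cartesian product
tuples = df.loc[[("a", 1), ("b", 3)]]       # list of tuples: exact entries
sliced = df.loc[idx[:, 2:3], :]             # IndexSlice: every letter, numbers 2 to 3
```

The trailing ", :" selects all columns and makes it unambiguous that the tuple addresses row levels.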

Construct method

pd.MultiIndex.from_tuples(tuples, names)
pd.MultiIndex.from_arrays(arrays, names)
pd.MultiIndex.from_product(lists, names)

from_tuples builds from a list of tuples; each tuple is one entry of the MultiIndex.
from_arrays builds from a list of lists; each list holds the values of one level.
from_product uses the Cartesian product of the lists.
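
As a sketch, the three constructors below build the same index from differently shaped inputs. The level names are hypothetical.

```python
import pandas as pd

names = ["letter", "number"]

# One tuple per entry
from_tuples = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=names
)

# One list per level
from_arrays = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b"], [1, 2, 1, 2]], names=names
)

# Every combination of the per-level values
from_product = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=names)
```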

Change

In the methods below, axis is 0 or 1: 0 refers to the row index and 1 to the columns.

map

df.index.map(function)
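
A small sketch of map on a flat index and on a MultiIndex. The data is made up; note that map returns a new index, which has to be assigned back to take effect.

```python
import pandas as pd

# Flat index: the function receives each label
df = pd.DataFrame({"v": [1, 2]}, index=["a", "b"])
df.index = df.index.map(str.upper)

# MultiIndex: the function receives each entry as a tuple
mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)])
joined = mi.map(lambda t: f"{t[0]}_{t[1]}")
```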

Swap

df.swaplevel(level_1, level_2, axis), swaps two levels.
df.reorder_levels(level_list, axis), reorders all levels according to level_list.

Drop

df.droplevel(dropped, axis), where dropped could be an int or a list of ints (level names also work).
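
Swapping, reordering and dropping levels can be sketched on a three-level index. The level names l0, l1 and l2 are hypothetical.

```python
import pandas as pd

# Toy frame with three row levels
index = pd.MultiIndex.from_tuples(
    [("a", 1, "x"), ("b", 2, "y")], names=["l0", "l1", "l2"]
)
df = pd.DataFrame({"v": [10, 20]}, index=index)

swapped = df.swaplevel(0, 2, axis=0)                        # exchange first and last level
reordered = df.reorder_levels(["l1", "l2", "l0"], axis=0)   # give the full new order
dropped_one = df.droplevel(0, axis=0)                       # drop a single level
dropped_two = df.droplevel([0, 1], axis=0)                  # drop several levels
```

Each call returns a new DataFrame and leaves df unchanged.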

Change attributes
change name

df.rename_axis(index=change_index, columns=change_column),
change_index and change_column could be a dict {to_replace: value} or a function.

change value

df.rename(index=change_index, level=change_level),
df.rename(columns=change_column, level=change_level)
change_index and change_column could be a dict {to_replace: value} or a function.
change_level is the position (or name) of the level to change.
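
The difference between renaming level names and renaming label values can be sketched as follows. The data is made up; note that the keyword for the column axis is spelled columns in both methods.

```python
import pandas as pd

# Toy frame with a named two-level row index
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("b", 2)], names=["letter", "number"]
)
df = pd.DataFrame({"v": [10, 20]}, index=index)

# rename_axis changes the names of the levels
renamed_names = df.rename_axis(index={"letter": "char"})
renamed_names_func = df.rename_axis(index=str.upper)

# rename changes the labels themselves, optionally in one level only
renamed_values = df.rename(index={"a": "A"}, level=0)
renamed_columns = df.rename(columns={"v": "value"})
```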

set and reset

df.set_index(indexes, append)
indexes is a column name or a list of column names.
append indicates whether to keep the existing index; if True, the new keys are added as inner levels.

df.set_axis(index, axis), where index could be a pd.Index or pd.MultiIndex.

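df.reset_index(indexes, drop),
indexes selects the levels to reset (all levels by default).
drop indicates whether the removed levels are discarded (True) or added back as columns (False, the default).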

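df.reindex(indexes) conforms df to the new indexes, aligning rows on the existing index; labels missing from df are filled with NaN.
It is often used to reorder the index.
df_1.reindex_like(df_2) is similar, taking the new index and columns from df_2.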
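
A compact sketch of the set and reset methods above. The column names k and v and the template frame are assumptions made for illustration.

```python
import pandas as pd

# Toy data, assumed for illustration
df = pd.DataFrame({"k": ["x", "y", "z"], "v": [1, 2, 3]})

indexed = df.set_index("k")                   # column k becomes the index
appended = df.set_index("k", append=True)     # keep the old index, add k as inner level
restored = indexed.reset_index()              # index goes back to a column
dropped = indexed.reset_index(drop=True)      # index is discarded instead
relabeled = df.set_axis(["r0", "r1", "r2"], axis=0)   # replace the row labels

# reindex reorders existing labels; the unknown label "w" yields NaN
reordered = indexed.reindex(["z", "x", "w"])

# reindex_like takes index and columns from another object
template = pd.DataFrame(index=["y", "x"], columns=["v"])
like = indexed.reindex_like(template)
```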

Calculation between index

$$\mathrm{S_A.intersection(S_B)} = \mathrm{S_A} \cap \mathrm{S_B} \Leftrightarrow \{x \mid x \in \mathrm{S_A} \text{ and } x \in \mathrm{S_B}\}$$
$$\mathrm{S_A.union(S_B)} = \mathrm{S_A} \cup \mathrm{S_B} \Leftrightarrow \{x \mid x \in \mathrm{S_A} \text{ or } x \in \mathrm{S_B}\}$$
$$\mathrm{S_A.difference(S_B)} = \mathrm{S_A} - \mathrm{S_B} \Leftrightarrow \{x \mid x \in \mathrm{S_A} \text{ and } x \notin \mathrm{S_B}\}$$
$$\mathrm{S_A.symmetric\_difference(S_B)} = \mathrm{S_A} \triangle \mathrm{S_B} \Leftrightarrow \{x \mid x \in \mathrm{S_A} \cup \mathrm{S_B} - \mathrm{S_A} \cap \mathrm{S_B}\}$$

Before applying these operations, remove duplicates with the unique method.
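
The four set operations can be sketched on two small indexes. The values are made up, and duplicates are removed first with unique.

```python
import pandas as pd

# Toy indexes with duplicates removed before any set operation
s_a = pd.Index([1, 2, 2, 3]).unique()
s_b = pd.Index([3, 4, 4]).unique()

common = s_a.intersection(s_b)               # labels in both
either = s_a.union(s_b)                      # labels in at least one
only_a = s_a.difference(s_b)                 # labels in s_a but not in s_b
exclusive = s_a.symmetric_difference(s_b)    # labels in exactly one
```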
