A simple tutorial on the pandas Index

星火流明

Last modified 2023-06-03 12:03:00

First published 2023-05-27 13:03:28

Article link: https://blog.csdn.net/A2233776/article/details/130900076

Index

pd.Index, pd.MultiIndex
df.index, df.columns

Note: values can be assigned through an indexer even when it addresses more than one level (for example, both row and column).

Set index

df.set_index()

`loc` label indexer

df.loc[*] is df.loc[row].
df.loc[*, *] is df.loc[row, col].
* can be:

a single label
a list of labels
a label slice (both endpoints are included)
a list of bool values
a function (or lambda): takes df as input and returns one of the above
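
The forms listed above can be sketched as follows. The frame `df`, its labels, and its values are made up for illustration and are not from the original article.

```python
import pandas as pd

# Hypothetical example data, labelled by the strings "w".."z".
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
    index=["w", "x", "y", "z"],
)

one_row = df.loc["x"]                          # single label, returns a Series
some_rows = df.loc[["w", "z"]]                 # list of labels
row_slice = df.loc["x":"z"]                    # label slice, "z" is included
by_mask = df.loc[[True, False, True, False]]   # list of bools, one per row
by_func = df.loc[lambda d: d["a"] > 2]         # callable returning a bool mask
cell = df.loc["y", "b"]                        # row and column together
```

Note that the label slice keeps its right endpoint, unlike an ordinary Python slice.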

`iloc` position indexer

df.iloc[*] is df.iloc[row].
df.iloc[*, *] is df.iloc[row, col].
* can be:

a single int
a list of ints
an int slice (the end position is excluded)
a list of bool values
a function (or lambda): takes df as input and returns one of the above
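
A minimal sketch of the position-based forms, using an illustrative frame `df` that is not part of the original article:

```python
import pandas as pd

# Hypothetical example data; iloc ignores the labels and uses positions.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
    index=["w", "x", "y", "z"],
)

first_row = df.iloc[0]                          # single int, returns a Series
picked = df.iloc[[0, 3]]                        # list of ints
pos_slice = df.iloc[1:3]                        # int slice, position 3 is excluded
by_mask = df.iloc[[True, False, True, False]]   # list of bools, one per row
by_func = df.iloc[lambda d: [0, 1]]             # callable returning positions
cell = df.iloc[2, 1]                            # row position 2, column position 1
```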

`query` function

Similar to the eval function.
df.query(*), where * is a legal query expression.
A legal expression may contain:

column names
'`column name` == [1, 2, 3]' (with a list on the right, '==' is equivalent to 'in'; backticks quote a column name that is not a valid Python identifier)
'@variable' (a variable defined outside the query)
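
The three kinds of expression above might look like this in practice. The column names and the variable `threshold` are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical data; "col b" contains a space, so it needs backticks in a query.
df = pd.DataFrame({"a": [1, 2, 3, 4], "col b": [10, 20, 30, 40]})

threshold = 2                                 # variable defined outside the query
greater = df.query("a > @threshold")          # '@' refers to the outer variable
members = df.query("a == [1, 3]")             # '==' with a list acts like 'in'
quoted = df.query("`col b` >= 30")            # backticks quote the column name
```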

Random sample

df.sample(n, replace, weights)

n, an int: the number of rows to draw.
replace, a bool: whether a drawn row is put back and can be drawn again.
weights, a list of weights: the relative probability of drawing each row.
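
A short sketch of the three parameters. The data is made up, and `random_state` is an extra argument, not mentioned above, that is added here only to make the draws repeatable.

```python
import pandas as pd

# Hypothetical data with four rows.
df = pd.DataFrame({"a": [1, 2, 3, 4]})

# Two distinct rows, since rows are not put back by default.
without_replacement = df.sample(n=2, random_state=0)

# More draws than rows is only possible when rows are put back.
with_replacement = df.sample(n=6, replace=True, random_state=0)

# Weights are relative; a weight of 0 means the row is never drawn.
weighted = df.sample(n=3, replace=True, weights=[0, 0, 0, 1], random_state=0)
```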

`get_indexer`

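Find the integer positions of target labels in df.index. A label that is not found gets -1.

df.index.get_indexer(target, method)

target, a pd.Index or list-like of labels to look up.
method, one of ['pad', 'backfill', 'nearest'], how to match a label that has no exact match; default None (exact matches only).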
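
The effect of each method can be seen on a small sorted index. The frame and the labels looked up are illustrative assumptions.

```python
import pandas as pd

# Hypothetical frame whose index is sorted, as the inexact methods require.
df = pd.DataFrame({"v": [1, 2, 3]}, index=[10, 20, 30])

exact = df.index.get_indexer([20, 25, 30])                       # [1, -1, 2]
padded = df.index.get_indexer([20, 25, 5], method="pad")         # [1, 1, -1]
backfilled = df.index.get_indexer([25], method="backfill")       # [2]
nearest = df.index.get_indexer([12, 28], method="nearest")       # [0, 2]
```

With 'pad' a missing label falls back to the previous index value, with 'backfill' to the next one, and with 'nearest' to whichever is closer.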

`MultiIndex`

Each entry of a MultiIndex is a tuple.

An index is immutable: its values cannot be changed in place.
get_level_values(n) returns the values of level n.
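
A minimal sketch of these three points, assuming a two-level index with the made-up level names `letter` and `number`:

```python
import pandas as pd

# Hypothetical two-level index.
mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["letter", "number"]
)
df = pd.DataFrame({"v": [10, 20, 30]}, index=mi)

first_entry = df.index[0]                          # a tuple: ("a", 1)
letters = df.index.get_level_values(0)             # level selected by position
numbers = df.index.get_level_values("number")      # level selected by name

# Assigning to an entry of the index raises, because an index is immutable.
try:
    df.index[0] = ("z", 9)
    immutable = False
except TypeError:
    immutable = True
```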

`loc` and `iloc` indexer

Sort the index before using these indexers on a MultiIndex; slicing in particular needs a sorted index.

df.sort_index(axis=0) # sort the row index
df.sort_index(axis=1) # sort the columns

In particular, * could be:

(list_level_0, list_level_1, ...), the Cartesian product of the lists.
[index_tuple_1, index_tuple_2, ...], a list of index tuples.
an IndexSlice
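
The first two forms can be sketched like this. The frame, level names, and values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical unsorted two-level index, sorted before indexing.
mi = pd.MultiIndex.from_product([["b", "a"], [2, 1]], names=["letter", "number"])
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=mi).sort_index()
# After sorting the rows are ("a", 1), ("a", 2), ("b", 1), ("b", 2).

# A tuple of lists: every combination of letter in ["a", "b"] and number in [1].
by_product = df.loc[(["a", "b"], [1]), :]

# A list of tuples: exactly these index entries.
by_tuples = df.loc[[("a", 2), ("b", 1)]]
```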

`IndexSlice`

import pandas as pd
idx = pd.IndexSlice

idx is a short alias for the pd.IndexSlice object.
idx[*, *, ...], where each * corresponds to one level of the MultiIndex.
What each * may contain depends on the indexer that idx is used in.
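
As a sketch, here is idx used inside loc, so each position takes a label or a label slice. The data and level names are invented for the example.

```python
import pandas as pd

idx = pd.IndexSlice

# Hypothetical sorted two-level index with six rows, v = 0..5.
mi = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2, 3]], names=["letter", "number"]
)
df = pd.DataFrame({"v": range(6)}, index=mi).sort_index()

# Every letter, numbers 2 through 3 (label slices include both ends).
numbers_2_to_3 = df.loc[idx[:, 2:3], :]

# Letter "b", every number.
only_b = df.loc[idx["b", :], :]
```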

Construction methods

pd.MultiIndex.from_tuples(tuples, names)
pd.MultiIndex.from_arrays(arrays, names)
pd.MultiIndex.from_product(lists, names)

from_tuples builds from a list of tuples; each tuple is one index entry.
from_arrays builds from a list of lists; each list holds the values of one level.
from_product builds from the Cartesian product of the lists.
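
The three constructors can produce the same index from differently shaped input, which the following sketch illustrates with made-up values:

```python
import pandas as pd

names = ["letter", "number"]  # hypothetical level names

# One tuple per index entry.
from_tuples = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=names
)

# One list per level, all of the same length.
from_arrays = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b"], [1, 2, 1, 2]], names=names
)

# One list of unique values per level; entries are all combinations.
from_product = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=names)

all_equal = from_tuples.equals(from_arrays) and from_arrays.equals(from_product)
```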

Change

axis is 0 or 1: 0 is the row index, 1 is the columns.

`map`

df.index.map(function)
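
A small sketch of map on a flat index and on a MultiIndex. The data is illustrative. On a MultiIndex the function receives each entry as a tuple, and returning tuples yields a MultiIndex again.

```python
import pandas as pd

# Hypothetical flat index: map returns a new Index, which is assigned back.
df = pd.DataFrame({"v": [1, 2, 3]}, index=["a", "b", "c"])
df.index = df.index.map(str.upper)

# Hypothetical MultiIndex: the function sees one tuple per entry.
mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)])
mapped = mi.map(lambda entry: (entry[0].upper(), entry[1] * 10))
```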

Swap

df.swaplevel(level_1, level_2, axis), swaps two levels.
df.reorder_levels(level_list, axis), reorders all levels according to level_list.
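
Both calls can be sketched on a three-level index. The level names `l0`, `l1`, `l2` are assumptions for the example.

```python
import pandas as pd

# Hypothetical three-level index.
mi = pd.MultiIndex.from_tuples(
    [("a", 1, "x"), ("b", 2, "y")], names=["l0", "l1", "l2"]
)
df = pd.DataFrame({"v": [1, 2]}, index=mi)

# Swap the first and last levels; the middle level stays in place.
swapped = df.swaplevel(0, 2, axis=0)

# Give the complete new order, here by level name.
reordered = df.reorder_levels(["l1", "l2", "l0"], axis=0)
```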

Drop

df.droplevel(level, axis), level could be an int, a level name, or a list of them.
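
A minimal sketch of dropping one level and then several, with invented level names:

```python
import pandas as pd

# Hypothetical three-level index.
mi = pd.MultiIndex.from_tuples(
    [("a", 1, "x"), ("b", 2, "y")], names=["l0", "l1", "l2"]
)
df = pd.DataFrame({"v": [1, 2]}, index=mi)

dropped_one = df.droplevel(0, axis=0)               # drop by position
dropped_two = df.droplevel(["l0", "l2"], axis=0)    # drop several by name
```

When only one level remains, as in `dropped_two`, the result has an ordinary flat index.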

Change attributes

change name

df.rename_axis(index=change_index, columns=change_column),
change_index and change_column could be a dict {to_replace: value} or a function. This changes the names of the levels, not the labels inside them.
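
A sketch of renaming level names with a dict and with a function. The names `letter`, `number`, and `char` are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index with named levels.
mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["letter", "number"])
df = pd.DataFrame({"v": [1, 2]}, index=mi)

by_dict = df.rename_axis(index={"letter": "char"})   # rename one level name
by_func = df.rename_axis(index=str.upper)            # apply to every level name
```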

change value

df.rename(index=change_index, level=change_level),
df.rename(columns=change_column, level=change_level)
change_index and change_column could be a dict {to_replace: value} or a function.
change_level is the number or name of the level to change.
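
In contrast to rename_axis, rename changes the labels themselves. A sketch with invented data:

```python
import pandas as pd

# Hypothetical two-level index.
mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["letter", "number"])
df = pd.DataFrame({"v": [1, 2]}, index=mi)

# Dict applied to level 0 only.
renamed = df.rename(index={"a": "A"}, level=0)

# Function applied to the level named "number" only.
scaled = df.rename(index=lambda label: label * 10, level="number")

# Column labels are renamed the same way.
new_columns = df.rename(columns={"v": "value"})
```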

set and reset

df.set_index(keys, append)
keys is a column name or a list of column names.
append indicates whether to keep the existing index; if True, the new keys are appended to it as inner levels.

df.set_axis(index, axis), index could be a pd.Index or pd.MultiIndex.

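df.reset_index(level, drop),
level selects the index levels to reset (all of them by default).
drop indicates whether the removed levels are discarded (True) or added back as columns (False); default False.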
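
The set and reset calls above can be sketched as a round trip. The column names `k1`, `k2`, `v` and the labels `r0`, `r1` are assumptions for the example.

```python
import pandas as pd

# Hypothetical frame with two key columns and one value column.
df = pd.DataFrame({"k1": ["a", "b"], "k2": [1, 2], "v": [10, 20]})

single = df.set_index("k1")                        # k1 becomes the index
appended = single.set_index("k2", append=True)     # k2 added as an inner level

relabeled = df.set_axis(["r0", "r1"], axis=0)      # replace the row labels

restored = appended.reset_index()                  # all levels back to columns
discarded = appended.reset_index("k2", drop=True)  # level k2 is thrown away
```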

df.reindex(indexes), conforms df to the new indexes, aligning rows by label; labels missing from df get NaN.
It is often used to reorder the index.
df_1.reindex_like(df_2) is similar, taking the indexes from df_2.
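
A minimal sketch of both calls, with made-up labels. The label "d" does not exist in `df`, so its row is filled with NaN.

```python
import pandas as pd

# Hypothetical frame labelled "a", "b", "c".
df = pd.DataFrame({"v": [1, 2, 3]}, index=["a", "b", "c"])

# Reorder, drop "b", and request a label that is missing from df.
reordered = df.reindex(["c", "a", "d"])

# Take the row and column labels from another frame.
other = pd.DataFrame({"v": [0, 0]}, index=["b", "a"])
like_other = df.reindex_like(other)
```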

Set operations between indexes

$\rm S_A.intersection(S_B) = S_A \cap S_B = \{x \mid x\in S_A\ and\ x\in S_B\}$
$\rm S_A.union(S_B) = S_A \cup S_B = \{x \mid x\in S_A\ or\ x\in S_B\}$
$\rm S_A.difference(S_B) = S_A - S_B = \{x \mid x\in S_A\ and\ x\notin S_B\}$
$\rm S_A.symmetric\_difference(S_B) = S_A\triangle S_B = \{x \mid x\in (S_A\cup S_B) - (S_A\cap S_B)\}$
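
The four operations can be checked on two small indexes. The values are chosen only for illustration.

```python
import pandas as pd

# Hypothetical indexes sharing the labels 2 and 3.
s_a = pd.Index([1, 2, 3])
s_b = pd.Index([2, 3, 4])

common = s_a.intersection(s_b)               # in both
combined = s_a.union(s_b)                    # in either
only_a = s_a.difference(s_b)                 # in s_a but not in s_b
exclusive = s_a.symmetric_difference(s_b)    # in exactly one of the two
```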