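Resetting the index in pandas
There are three methods for changing the index of a DataFrame:
reset_index, set_index, and reindex
The first two are the most commonly used; reindex does not build an index from a column but conforms the data to a given set of labels.
reset_index: replaces the index with the default integer index [0, 1, 2, 3, 4, 5, 6, …]
set_index: makes a specific column the index
First, build a DataFrame: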
import pandas as pd
d={'gene':{'a':'gene1','b':'gene2','c':'gene3','d':'gene4'},'expression':{'a':'low:0','b':'mid:3','c':'mid:4','d':'high:9'},'description':{'a':'transposon element','b':'nuclear genes','c':'retrotransposon','d':'unknown'}}
df=pd.DataFrame(d)
At this point, the index of df is [a, b, c, d]:
print(df)
    gene expression         description
a  gene1      low:0  transposon element
b  gene2      mid:3       nuclear genes
c  gene3      mid:4     retrotransposon
d  gene4     high:9             unknown
print(df.index)
Index(['a', 'b', 'c', 'd'], dtype='object')
Resetting to the default integer index: reset_index
df1=df.reset_index()
print(df1)
  index   gene expression         description
0     a  gene1      low:0  transposon element
1     b  gene2      mid:3       nuclear genes
2     c  gene3      mid:4     retrotransposon
3     d  gene4     high:9             unknown
The previous index [a, b, c, d] has become a new column named index.
If you do not want to keep the previous index, pass drop=True:
df1=df.reset_index(drop=True)
print(df1)
    gene expression         description
0  gene1      low:0  transposon element
1  gene2      mid:3       nuclear genes
2  gene3      mid:4     retrotransposon
3  gene4     high:9             unknown
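When the old index is kept, the new column is called index by default. A minimal sketch of one way to give it a more meaningful name is to name the index before resetting it. The frame below is a reduced stand-in for df, and the name sample_id is a made-up label chosen for illustration, not something from the original data.

```python
import pandas as pd

# Reduced stand-in for the df built above (same index labels).
df = pd.DataFrame(
    {
        "gene": ["gene1", "gene2", "gene3", "gene4"],
        "expression": ["low:0", "mid:3", "mid:4", "high:9"],
    },
    index=["a", "b", "c", "d"],
)

# Name the index first; reset_index() then uses that name for the
# new column instead of the default "index".
# "sample_id" is a hypothetical name used only for this example.
named = df.rename_axis("sample_id").reset_index()
print(named.columns.tolist())  # ['sample_id', 'gene', 'expression']

# reset_index() returns a new DataFrame; the original is unchanged.
print(df.index.tolist())  # ['a', 'b', 'c', 'd']
```

Because the result is a new object, it has to be assigned to a variable (as with df1 above) for the change to be kept.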
Setting a specific column of df as the index: set_index
df1=df.set_index('gene')
print(df1)
      expression         description
gene
gene1      low:0  transposon element
gene2      mid:3       nuclear genes
gene3      mid:4     retrotransposon
gene4     high:9             unknown
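The previous index [a, b, c, d] is gone, and passing drop=False does not bring it back: in set_index, drop only controls whether the column used as the new index (here gene) is also kept as a regular column.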
df1=df.set_index('gene',drop=False)
print(df1)
        gene expression         description
gene
gene1  gene1      low:0  transposon element
gene2  gene2      mid:3       nuclear genes
gene3  gene3      mid:4     retrotransposon
gene4  gene4     high:9             unknown
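If the old labels [a, b, c, d] need to survive a set_index call, two approaches are sketched below under the assumption that the frame looks like df above (a reduced stand-in is rebuilt here so the snippet runs on its own). Either move the old index into a column first, or append the new column to the existing index.

```python
import pandas as pd

# Reduced stand-in for the df built above (same index labels).
df = pd.DataFrame(
    {
        "gene": ["gene1", "gene2", "gene3", "gene4"],
        "expression": ["low:0", "mid:3", "mid:4", "high:9"],
    },
    index=["a", "b", "c", "d"],
)

# Option 1: turn the old index into a column, then set the new index.
kept_as_column = df.reset_index().set_index("gene")
print(kept_as_column.columns.tolist())  # ['index', 'expression']

# Option 2: append "gene" to the existing index, which yields a
# two-level MultiIndex of (old label, gene).
kept_as_level = df.set_index("gene", append=True)
print(kept_as_level.index.nlevels)  # 2
print(kept_as_level.index[0])       # ('a', 'gene1')
```

Option 1 keeps a flat index and is usually easier to work with; option 2 is only worth it when both labels are needed to identify a row.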
Used appropriately, these two methods cover most index-resetting needs.
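For completeness, here is a short sketch of the third method, reindex, which is not demonstrated above. Unlike the other two, it selects and reorders rows by label. The frame is a reduced stand-in for df, and the label z is made up to show what happens when a requested label is missing.

```python
import pandas as pd

# Reduced stand-in for the df built above (same index labels).
df = pd.DataFrame(
    {"gene": ["gene1", "gene2", "gene3", "gene4"]},
    index=["a", "b", "c", "d"],
)

# reindex conforms the rows to the requested labels, in that order.
# "z" is a hypothetical label that does not exist in df, so its row
# is filled with NaN.
reordered = df.reindex(["d", "a", "z"])
print(reordered.index.tolist())    # ['d', 'a', 'z']
print(reordered["gene"].tolist())  # ['gene4', 'gene1', nan]
```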