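Usage of pandas.factorize
pandas.factorize encodes labels as integers (equivalent to label encoding). Codes are assigned in the order in which each distinct label first appears in the data.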
import pandas as pd
data = pd.DataFrame()
data['text'] = ['x1','x2','x3','xx']
data['label'] = ['花','草','树','木']  # flower, grass, tree, wood
print(data)
'''
  text label
0   x1     花
1   x2     草
2   x3     树
3   xx     木
'''
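# factorize returns a pair: the integer codes and the unique labels (in order of first appearance)
data['label_encoding'], lbl = pd.factorize(data['label'])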
print(data)
'''
  text label  label_encoding
0   x1     花               0
1   x2     草               1
2   x3     树               2
3   xx     木               3
'''
print(lbl)
'''
Index(['花', '草', '树', '木'], dtype='object')
'''
# Indexing lbl with a code recovers the original label
print(lbl[1])
# 草
print(lbl[0])
# 花
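
The example above has no repeated labels, so the "order of first appearance" rule is easy to miss. The sketch below uses a made-up Series (the values 'b', 'a', 'c' and the variable names are illustrative, not from the original data) to show how repeated labels share one code, how sort=True changes the assignment, and how a missing value is encoded as -1 by default.

```python
import pandas as pd

# Hypothetical data: repeated labels plus one missing value
labels = pd.Series(['b', 'a', 'b', None, 'c', 'a'])

# Default: codes follow the order of first appearance ('b' -> 0, 'a' -> 1, 'c' -> 2);
# the missing value gets the sentinel code -1 and is not included in uniques
codes, uniques = pd.factorize(labels)

# sort=True: uniques are sorted first, so 'a' -> 0, 'b' -> 1, 'c' -> 2
sorted_codes, sorted_uniques = pd.factorize(labels, sort=True)

# Decode back to labels, mapping the -1 sentinel to None explicitly
# (indexing uniques with -1 directly would return the last label instead)
decoded = [uniques[c] if c != -1 else None for c in codes]

print(codes.tolist(), list(uniques))
print(sorted_codes.tolist(), list(sorted_uniques))
print(decoded)
```

Checking for -1 before indexing matters when decoding, because a negative index into lbl would silently pick a label from the end.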