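Background: during a pipeline transformation, the one-hot output unexpectedly had a second dimension (axis=1) of size 1. In all my earlier checks, the second dimension had been 2, never 1. So where was the problem? It turned out that a size of 2 versus a size of 1 was simply the result of different inputs.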
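Case 1:
import numpy as np
# AIPrimitive: project-specific wrapper class (its import is not shown in these notes)
# The input column contains two distinct values: 0 and 1
y = np.array([1,0,1,1]).reshape(-1,1)
json_primitive = "sklearn.preprocessing.OneHotEncoder"
obj = AIPrimitive()
obj.load_from_json(json_primitive)
obj.fit(X_fit=y)
obj.produce(X_produce=y)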
sklearn.preprocessing.OneHotEncoder fit start
sklearn.preprocessing.OneHotEncoder fit over
sklearn.preprocessing.OneHotEncoder produce start
sklearn.preprocessing.OneHotEncoder produce over
array([[0., 1.],
       [1., 0.],
       [0., 1.],
       [0., 1.]])
Case 2:
# The input column contains a single distinct value: 1
y = np.array([1,1,1,1]).reshape(-1,1)
json_primitive = "sklearn.preprocessing.OneHotEncoder"
obj = AIPrimitive()
obj.load_from_json(json_primitive)
obj.fit(X_fit=y)
obj.produce(X_produce=y)
sklearn.preprocessing.OneHotEncoder fit start
sklearn.preprocessing.OneHotEncoder fit over
sklearn.preprocessing.OneHotEncoder produce start
sklearn.preprocessing.OneHotEncoder produce over
array([[1.],
       [1.],
       [1.],
       [1.]])
Case 3:
# The input column contains three distinct values: 0, 1 and 2
y = np.array([1,1,2,0]).reshape(-1,1)
json_primitive = "sklearn.preprocessing.OneHotEncoder"
obj = AIPrimitive()
obj.load_from_json(json_primitive)
obj.fit(X_fit=y)
obj.produce(X_produce=y)
sklearn.preprocessing.OneHotEncoder fit start
sklearn.preprocessing.OneHotEncoder fit over
sklearn.preprocessing.OneHotEncoder produce start
sklearn.preprocessing.OneHotEncoder produce over
array([[0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.]])
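These three cases show that the size of axis=1 is determined by the number of distinct categories in the input, which is exactly what a one-hot vector is supposed to be. My understanding had drifted away from that: I had rigidly assumed the second dimension would always be 2 and never any other value.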
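To check this outside the wrapper, here is a minimal sketch that calls scikit-learn's OneHotEncoder directly. It assumes AIPrimitive simply forwards to the encoder with default settings, which these notes do not confirm. The helper name one_hot_width is made up for illustration. The last call shows one possible way to keep the width at 2 even when only one class appears in the data, by passing the expected categories explicitly through the categories parameter.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder


def one_hot_width(values, categories="auto"):
    """Return the number of one-hot columns produced for a 1-D list of labels.

    Hypothetical helper for illustration; not part of the original pipeline.
    """
    y = np.array(values).reshape(-1, 1)
    encoder = OneHotEncoder(categories=categories)
    # The default output is a sparse matrix; convert it to a dense array.
    encoded = encoder.fit_transform(y).toarray()
    return encoded.shape[1]


# With categories="auto", the width equals the number of distinct values seen in fit.
widths = {
    "case1": one_hot_width([1, 0, 1, 1]),  # distinct values: 0, 1
    "case2": one_hot_width([1, 1, 1, 1]),  # distinct values: 1
    "case3": one_hot_width([1, 1, 2, 0]),  # distinct values: 0, 1, 2
}
print(widths)

# Declaring the categories up front fixes the width regardless of the data.
fixed_width = one_hot_width([1, 1, 1, 1], categories=[[0, 1]])
print(fixed_width)
```

If a downstream pipeline step expects a fixed number of columns, declaring the categories up front like this may be safer than relying on whatever values happen to be present in a given batch.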