xgb_enc_1 = OneHotEncoder()
xgb_enc_2 = OneHotEncoder()
xgb_enc_1.fit(model_1.apply(train_gb))
xgb_enc_2.fit(model_2.apply(train_gb))
# transform returns a sparse matrix; train_lr is a dense numpy array
temp_1 = xgb_enc_1.transform(model_1.apply(train_lr))
temp_2 = xgb_enc_2.transform(model_2.apply(train_lr))
temp_3 = train_lr
temp_1
Out[24]:
<256x1624 sparse matrix of type '<class 'numpy.float64'>'
with 217600 stored elements in Compressed Sparse Row format>
temp_2
Out[25]:
<256x1977 sparse matrix of type '<class 'numpy.float64'>'
with 217600 stored elements in Compressed Sparse Row format>
temp_3.shape
Out[31]: (256, 14)
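The objects above (`model_1`, `model_2`, `train_gb`, `train_lr`) come from earlier in the post. A self-contained sketch of the same leaf-encoding step, using scikit-learn's `GradientBoostingClassifier` and synthetic data as stand-ins for the original XGBoost models and dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the original data: 14 features, split in half
# so one part trains the GBDT and the other feeds the LR stage.
X, y = make_classification(n_samples=512, n_features=14, random_state=0)
X_gb, X_lr, y_gb, y_lr = train_test_split(X, y, test_size=0.5, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=50, random_state=0)
gbdt.fit(X_gb, y_gb)

# apply() returns the leaf index each sample lands in, per tree; for
# GradientBoostingClassifier the shape is (n_samples, n_estimators,
# n_classes), so we flatten the trailing dimensions to get a 2D array.
leaves_gb = gbdt.apply(X_gb).reshape(X_gb.shape[0], -1)
leaves_lr = gbdt.apply(X_lr).reshape(X_lr.shape[0], -1)

# One-hot encode the leaf indices; handle_unknown="ignore" guards
# against leaf values absent from the fitting split.
enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(leaves_gb)
leaf_features = enc.transform(leaves_lr)  # sparse CSR matrix
print(leaf_features.shape)
```

As in the transcript, `transform` yields a sparse matrix with one column per (tree, leaf) pair, which is why the column count grows into the thousands.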
Concatenating directly with np.hstack:
train_lr_ext_2 = np.hstack((temp_1,temp_3))
raises:
ValueError: all the input arrays must have same number of dimensions
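The error is easy to reproduce with toy data: `np.hstack` does not understand scipy sparse matrices, so it wraps the CSR matrix in a 0-d object array, and the dimension check against the 2-D dense array fails:

```python
import numpy as np
from scipy import sparse

sp = sparse.csr_matrix(np.eye(2))  # 2x2 sparse matrix
dense = np.ones((2, 2))            # 2x2 dense array

# np.hstack coerces the sparse matrix into an object array of
# different dimensionality, so concatenation raises ValueError.
try:
    np.hstack((sp, dense))
    raised = False
except ValueError as e:
    raised = True
    print("ValueError:", e)
```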
NumPy sees the sparse matrix and the dense matrix as having different numbers of dimensions. There are two ways to solve this:
- Use the todense() method
a = temp_1.todense()
train_lr_ext_2 = np.hstack((a,temp_3))
train_lr_ext_2.shape
Out[34]: (256, 1638)
- Use scipy.sparse's hstack() function for the concatenation
from scipy.sparse import hstack
b = hstack((temp_1,temp_3))
b.shape
Out[39]: (256, 1638)
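A minimal self-contained demo of mixing a sparse and a dense block with `scipy.sparse.hstack`; the result stays sparse, and `format="csr"` requests CSR instead of the default COO:

```python
import numpy as np
from scipy import sparse

sp = sparse.csr_matrix(np.eye(2))              # 2x2 sparse block
dense = np.arange(6, dtype=float).reshape(2, 3)  # 2x3 dense block

# scipy's hstack accepts a mix of sparse and dense blocks and
# returns a sparse result, never materializing a dense copy.
combined = sparse.hstack((sp, dense), format="csr")
print(combined.shape)  # (2, 5)
```

Unlike `todense()`, this avoids materializing the full dense matrix, which matters once the one-hot leaf features reach thousands of columns; scikit-learn's `LogisticRegression` accepts the sparse matrix directly.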