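In the example below, I use a Twitter dataset to perform sentiment analysis. I use an sklearn Pipeline to apply a series of transformations, add features, and add a classifier. The final step is to visualize the words with the highest predictive power. This works fine when I don't use feature selection. When I do use it, however, the results make no sense. I suspect that applying feature selection changes the order of the text features. Is there a way to fix this?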
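One likely cause of this symptom is that the classifier's coefficients are paired with the full vocabulary even though the selector has dropped some columns. The sketch below is a minimal illustration of that idea, not the code from the question: the toy corpus, the step names (`vect`, `select`, `clf`) and the choice of `SelectKBest` with `chi2` are all assumptions made for the example. It uses the selector's `get_support()` mask to keep only the surviving feature names before pairing them with the weights.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy corpus (hypothetical data, for illustration only).
texts = ["good great film", "great happy day", "bad awful film", "awful sad day"]
labels = [1, 1, 0, 0]

# Assumed pipeline: vectorize text, keep the 2 best columns, then classify.
pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("select", SelectKBest(chi2, k=2)),
    ("clf", LogisticRegression()),
])
pipe.fit(texts, labels)

vect = pipe.named_steps["vect"]
select = pipe.named_steps["select"]
clf = pipe.named_steps["clf"]

# vocabulary_ maps term -> column index; sorting by that index
# recovers the terms in the column order of the vectorizer output.
all_names = np.array(sorted(vect.vocabulary_, key=vect.vocabulary_.get))

# get_support() is a boolean mask over the input columns of the selector.
# Selected columns keep their relative order, so masking the names
# yields one name per column that the classifier actually saw.
kept_names = all_names[select.get_support()]

# coef_[0] holds one weight per selected column, aligned with kept_names.
word_weights = dict(zip(kept_names, clf.coef_[0]))
```

If the text features are combined with numeric columns through a `FeatureUnion`, the same approach should apply, provided the full name list is first built by concatenating each branch's names in the order the union lists its transformers, and only then masked with `get_support()`.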
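The code below has been updated to include the correct answer.
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split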
from sklearn.pipeline import Pipeline, FeatureUnion
features= [c for c in df.columns.values if c not in ['target']]
target = 'target'
# Train/test split, stratified on the target column
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.2, stratify=df[target], random_state=0)
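# Transformers that select specific columns from the DataFrame
class NumberSelector(BaseEstimator, TransformerMixin):
    """Select a single numeric column, returned as a one-column DataFrame."""
    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Double brackets keep the result two-dimensional
        return X[[self.key]]

class TextSelector(BaseEstimator, TransformerMixin):
    """Select a single text column, returned as a Series."""
    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Single brackets return a 1-D Series, as text vectorizers expect
        return X[self.key]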
class ColumnExtractor(T