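I have read many blog posts on this topic but have not found a clear solution. My situation is this: I have a list of text pairs, each labeled 1 or -1.
For each text pair, I want the feature vector to be the concatenation of the two TF-IDF vectors: f(t1, t2) = tfidf(t1) "concat" tfidf(t2)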
Any suggestions? I have the following code, but it raises an error:
count_vect = TfidfVectorizer(analyzer=u'char', ngram_range=ngram_range)
X0_train_counts = count_vect.fit_transform([x[0] for x in training_documents])
X1_train_counts = count_vect.fit_transform([x[1] for x in training_documents])
combined_features = FeatureUnion([("x0", X0_train_counts), ("x1", X1_train_counts)])
clf = LinearSVC().fit(combined_features, training_target)
average_training_accuracy += clf.score(combined_features, training_target)
The error I get is:
^{pr2}$
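One thing worth noting about the code above: FeatureUnion expects a list of (name, transformer) pairs, not matrices that have already been computed, so passing X0_train_counts and X1_train_counts to it is a likely source of the failure. Below is a minimal sketch of how FeatureUnion could be used for this, assuming each input sample is a (text1, text2) tuple. The toy data, the helper names first_texts, second_texts and side, and the ngram_range value (1, 3) are all hypothetical. Unlike a shared vectorizer, this variant learns a separate vocabulary for each side of the pair.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC

# Hypothetical toy data: (text1, text2) pairs with labels 1 or -1.
pairs = [
    ("the cat sat", "a cat sat down"),
    ("hello world", "goodbye moon"),
    ("red apple", "a red apple"),
    ("fast car", "slow boat"),
]
labels = [1, -1, 1, -1]


def first_texts(samples):
    """Pick the first text of every pair."""
    return [s[0] for s in samples]


def second_texts(samples):
    """Pick the second text of every pair."""
    return [s[1] for s in samples]


def side(selector):
    """Select one side of each pair, then apply its own TF-IDF vectorizer."""
    return make_pipeline(
        FunctionTransformer(selector),
        # ngram_range=(1, 3) is an assumed value, not taken from the question.
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
    )


# FeatureUnion receives transformers and stacks their outputs column-wise.
union = FeatureUnion([("x0", side(first_texts)), ("x1", side(second_texts))])
model = make_pipeline(union, LinearSVC())
model.fit(pairs, labels)

features = union.transform(pairs)
predictions = model.predict(pairs)
```

Because the whole thing is one pipeline, new pairs can be passed straight to model.predict without repeating the vectorization by hand.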
Update
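The solution is as follows:
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

count_vect = TfidfVectorizer(analyzer=u'char', ngram_range=ngram_range)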
training_docs_combined = [x[0] for x in training_documents] + [x[1] for x in training_documents]
X_train_counts = count_vect.fit_transform(training_docs_combined)
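# The first half of the rows holds the x[0] texts and the second half the x[1] texts.
# Integer division (//) keeps the slice indices integers on Python 3.
concat_features = hstack((X_train_counts[:len(training_docs_combined) // 2], X_train_counts[len(training_docs_combined) // 2:]))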
clf = LinearSVC().fit(concat_features, training_target)
average_training_accuracy += clf.score(concat_features, training_target)
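For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea: fit one shared vectorizer on all texts, then stack the two TF-IDF matrices side by side. The toy data, the helper pair_features, and the ngram_range value (1, 3) are assumptions made for illustration and are not from my real setup. It also shows how unseen pairs could be transformed with the already-fitted vectorizer.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy data: (text1, text2) pairs with labels 1 or -1.
training_documents = [
    ("the cat sat", "a cat sat down"),
    ("hello world", "goodbye moon"),
    ("red apple", "a red apple"),
    ("fast car", "slow boat"),
]
training_target = [1, -1, 1, -1]


def pair_features(vectorizer, pairs):
    """Return tfidf(t1) and tfidf(t2) stacked column-wise, one row per pair."""
    left = vectorizer.transform([p[0] for p in pairs])
    right = vectorizer.transform([p[1] for p in pairs])
    return hstack([left, right]).tocsr()


# ngram_range=(1, 3) is an assumed value; the original uses a variable.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 3))
# One shared vocabulary, learned from both sides of every pair.
vectorizer.fit([text for pair in training_documents for text in pair])

X_train = pair_features(vectorizer, training_documents)
clf = LinearSVC().fit(X_train, training_target)

# Unseen pairs reuse the fitted vectorizer, so the column layout matches.
test_pairs = [("green apple", "a green apple")]
X_test = pair_features(vectorizer, test_pairs)
prediction = clf.predict(X_test)
```

Each row has twice as many columns as the vocabulary has entries: the first block belongs to t1 and the second to t2.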