TextMatch
TextMatch is a semantic matching model library for QA & text search … It’s easy to train models and to export representation vectors.
The TextMatch/train_model module contains:
(1) train_bow.py: trains the BoW (bag-of-words) model
Example:
from textmatch.models.text_embedding.bow_sklearn import Bow
from textmatch.config.constant import Constant as const

if __name__ == '__main__':
    # Training corpus
    words_list = ["我去玉龙雪山并且喜欢玉龙雪山玉龙雪山", "我在玉龙雪山并且喜欢玉龙雪山", "我在九寨沟"]
    # Documents to match queries against
    words_list1 = ["我去玉龙雪山并且喜欢玉龙雪山玉龙雪山", "我在玉龙雪山并且喜欢玉龙雪山", "我在九寨沟", "哈哈哈哈"]

    # Train the BoW model
    bow = Bow(dic_path=const.BOW_DIC_PATH, bow_index_path=const.BOW_INDEX_PARH)
    bow.fit(words_list)

    # Query: create a fresh instance and initialize it with the documents
    bow = Bow(dic_path=const.BOW_DIC_PATH, bow_index_path=const.BOW_INDEX_PARH)
    bow.init(words_list1, update=False)
    testword = "我在九寨沟,很喜欢"
    # for word in jieba.cut(testword):
    #     print('>>>>', word)
    pre = bow.predict(testword)
    print('pre>>>>>', pre)
    pre = bow._predict(testword)[0]
    print('pre>>>>>', pre)
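(2) train_tfidf.py: trains the TF-IDF model
(3) train_ngram_tfidf.py: trains the n-gram TF-IDF model
(4) train_w2v.py: trains the word2vec model
(5) train_bert.py: trains the BERT model
(6) train_albert.py: trains the ALBERT model
(7) train_dssm.py: trains the DSSM model
(8) train_dnn.py: trains the DNN model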
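
For intuition about what a bag-of-words matcher such as the one in (1) computes, here is a minimal standard-library sketch. It is not TextMatch's implementation: the helper names (bow_vector, cosine, rank_docs) are hypothetical, and it counts single characters instead of jieba word tokens, so the scores will not match the output of Bow.predict.

```python
import math
from collections import Counter


def bow_vector(text):
    """Build a bag-of-words vector as a token -> count mapping.

    Assumption: each non-space character is one token. A real pipeline
    would normally use a word segmenter such as jieba instead.
    """
    return Counter(ch for ch in text if not ch.isspace())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_docs(query, docs):
    """Return (doc_index, score) pairs sorted from best to worst match."""
    query_vec = bow_vector(query)
    scores = [(i, cosine(query_vec, bow_vector(doc))) for i, doc in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = ["我去玉龙雪山并且喜欢玉龙雪山玉龙雪山", "我在玉龙雪山并且喜欢玉龙雪山", "我在九寨沟", "哈哈哈哈"]
    for index, score in rank_docs("我在九寨沟,很喜欢", docs):
        print(index, round(score, 3))
```

With the sample documents from (1), the document "我在九寨沟" ranks first for this query because it shares the most characters relative to its length, while "哈哈哈哈" shares none and scores 0.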