SVM Text Classification: Experimental Procedure

Most recently recommended article published on 2021-03-13 11:09:53

Data_Machine


Category column: Common Machine Learning Algorithms

Article link: https://blog.csdn.net/qq_21500275/article/details/81354627


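This post records the steps of an SVM text classification experiment: tokenizing and labeling the text; selecting features with the chi-square test or PCA; normalizing the data; and using the libSVM tool, which involves converting the labels and features into the format libSVM requires and tuning parameters with grid.py under cross-validation.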

Summary generated automatically by CSDN.

1. Tokenize the text and assign labels.
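The post gives no code for this step, so the following is only a minimal sketch of what tokenizing and labeling could look like. The function name `build_labeled_corpus`, the `texts_by_class` dictionary, and the sample sentences are all hypothetical. Whitespace splitting stands in for a real tokenizer; for Chinese text a word segmenter would be passed as `tokenize` instead.

```python
def build_labeled_corpus(texts_by_class, tokenize=str.split):
    """Tokenize every document and record its class label.

    texts_by_class: dict mapping a class label to a list of raw documents
                    (hypothetical input layout, not from the original post).
    tokenize:       callable turning one document into a list of tokens;
                    whitespace splitting is used here as a placeholder.
    """
    tokens, labels = [], []
    for label, texts in texts_by_class.items():
        for text in texts:
            tokens.append(tokenize(text))
            labels.append(label)
    return tokens, labels


# Illustrative data only.
texts_by_class = {
    1: ["the plot is good", "good acting"],
    2: ["the plot is bad"],
}
x_tokens, y = build_labeled_corpus(texts_by_class)
```

Keeping the tokens and labels in two parallel lists makes it straightforward to write one `label feature:value ...` line per document later, which is the shape libSVM expects.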

2. Feature selection: chi-square test
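As background for this step, here is a small sketch of the chi-square statistic commonly used to score a term against a class, computed from a 2x2 document-count table. It is not taken from the author's code: the function names `chi_square` and `chi_square_scores` and the sample documents are illustrative assumptions, and the formula is the standard one for a 2x2 contingency table.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table carries no information about the term.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_square_scores(docs, labels, target):
    """Score every term in the corpus against one target class.

    docs:   list of token lists (hypothetical input layout)
    labels: class label of each document, parallel to docs
    target: the class label to score terms against
    """
    vocab = {tok for doc in docs for tok in doc}
    scores = {}
    for term in vocab:
        a = b = c = d = 0
        for doc, label in zip(docs, labels):
            has_term = term in doc
            if label == target:
                if has_term:
                    a += 1
                else:
                    c += 1
            else:
                if has_term:
                    b += 1
                else:
                    d += 1
        scores[term] = chi_square(a, b, c, d)
    return scores


# Illustrative data only.
docs = [["good", "movie"], ["good", "plot"], ["bad", "movie"], ["bad", "plot"]]
labels = [1, 1, 0, 0]
scores = chi_square_scores(docs, labels, target=1)
```

Terms with the highest scores are the ones most strongly associated with the class, so feature selection keeps the top-scoring terms and discards the rest. In this toy corpus "good" appears only in class 1 and scores high, while "movie" is split evenly between classes and scores zero.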

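import numpy as np
# Assumption: TensorFlow 1.x, where tf.contrib.learn is still available.
from tensorflow.contrib import learn

def chi_select():
    # Build the stop-word list (one stop word per line in the file).
    stopwords = []
    with open("../hlt_stop_words.txt", "r") as stopword:
        for line in stopword:  # read the file line by line
            rs = line.replace('\n', '')
            stopwords.append(rs)

    # Load the text by concatenating the per-class example lists.
    # class_1_examples and class_2_examples stand in for the original
    # 1_examples and 2_examples, which are not valid Python identifiers;
    # the remaining classes are elided in the original.
    x_text = class_1_examples + class_2_examples  # + ...

    # Remove stop words. Each element of x_text is expected to be a list of tokens.
    x_stop = []
    for words in x_text:
        # words = words.split(" ")
        rs = []
        for w in words:
            if w not in stopwords:
                rs.append(w)
        x_stop.append(rs)

    # Rejoin each token list into a single space-separated string,
    # so that VocabularyProcessor can split it back into the same tokens.
    x_final = []
    for tokens in x_stop:
        x_final.append(" ".join(tokens))

    # Build the vocabulary and map each document to a sequence of word ids.
    max_document_length = max([len(x) for x in x_stop])
    # print(max_document_length)
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)
    x = np.array(list(vocab_processor.fit_transform(x_final)))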