[Beginner's Guide] Lesson 5 - Introduction to Deep Learning for NLP - Text Classification

Latest recommended article published on 2021-08-11 08:00:00

oudifuf

Article link: https://blog.csdn.net/oudifuf/article/details/103298436

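This article introduces the use of convolutional neural networks (CNNs) for text classification, a natural language processing (NLP) task. The CNN convolves the input sequence of word vectors into feature maps, applies max pooling to each feature map to obtain a fixed-length vector representation of the text, and feeds that vector into a softmax layer for classification. The article also mentions using convolution kernels with different window sizes to improve efficiency.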

Summary generated by CSDN using AI technology
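
The pipeline in the summary (word vectors, convolution, max pooling, softmax) can be sketched in plain Python. This is only an illustrative forward pass under stated assumptions, not the model built in the course: the weights are random, the ReLU activation is an assumption, and names such as `conv_feature_map` and `text_cnn_forward` are made up for this example.

```python
import math
import random


def conv_feature_map(embeddings, kernel, bias):
    """Slide a kernel of window size h over a sequence of n word vectors.

    kernel holds h rows of d weights; the result has n - h + 1 entries.
    """
    h = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - h + 1):
        total = bias
        for offset in range(h):
            for w, x in zip(kernel[offset], embeddings[start + offset]):
                total += w * x
        feature_map.append(max(0.0, total))  # ReLU (assumed activation)
    return feature_map


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_cnn_forward(embeddings, kernels, biases, fc_weights, fc_bias):
    """Convolution, then max-over-time pooling, then a softmax layer."""
    # One pooled value per kernel, so the vector length does not depend
    # on the number of tokens in the text
    pooled = [max(conv_feature_map(embeddings, k, b))
              for k, b in zip(kernels, biases)]
    scores = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(fc_weights, fc_bias)]
    return pooled, softmax(scores)


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, num_classes = 4, 3
window_sizes = [2, 3, 4]  # kernels with different window sizes
kernels = [random_matrix(rng, h, dim) for h in window_sizes]
biases = [0.0] * len(kernels)
fc_weights = random_matrix(rng, num_classes, len(kernels))
fc_bias = [0.0] * num_classes

# Two texts of different lengths (each at least as long as the widest window)
short_text = random_matrix(rng, 5, dim)
long_text = random_matrix(rng, 12, dim)
for text in (short_text, long_text):
    pooled, probs = text_cnn_forward(text, kernels, biases, fc_weights, fc_bias)
    print(len(pooled), [round(p, 3) for p in probs])
```

Both texts yield a pooled vector of length 3, one entry per kernel, which is what lets a fixed-size softmax layer sit on top of variable-length input.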

In[1]

# Build the data lists and the character dictionary

import ast
import os

data_root_path = '/home/aistudio/data/'


def create_data_list(data_root_path):
    # The dictionary file holds a single dict literal written by create_dict
    with open(os.path.join(data_root_path, 'dict_txt.txt'), 'r', encoding='utf-8') as f_data:
        dict_txt = ast.literal_eval(f_data.readlines()[0])

    with open(os.path.join(data_root_path, 'news_classify_data.txt'), 'r', encoding='utf-8') as f_data:
        lines = f_data.readlines()

    test_path = os.path.join(data_root_path, 'test_list.txt')
    train_path = os.path.join(data_root_path, 'train_list.txt')
    # Opening in 'w' mode clears any lists left over from a previous run
    with open(test_path, 'w', encoding='utf-8') as f_test, \
            open(train_path, 'w', encoding='utf-8') as f_train:
        for i, line in enumerate(lines):
            fields = line.split('_!_')
            title = fields[-1].replace('\n', '')
            label = fields[1]
            # One id per character, comma-separated, then a tab and the class label
            ids = ','.join(str(dict_txt[s]) for s in title)
            # Every 10th sample goes to the test list, the rest to the training list
            f_out = f_test if i % 10 == 0 else f_train
            f_out.write(ids + '\t' + label + '\n')
    print("Data lists generated!")


# Build a dictionary from the downloaded data
def create_dict(data_path, dict_path):
    dict_set = set()
    # Read the downloaded data
    with open(data_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    # Collect every distinct character of every title into a set
    for line in lines:
        title = line.split('_!_')[-1].replace('\n', '')
        for s in title:
            dict_set.add(s)
    # Turn the set into a dictionary that maps each character to a number
    dict_txt = {s: i for i, s in enumerate(dict_set)}
    # Add an entry for unknown characters
    dict_txt["<unk>"] = len(dict_txt)
    # Save the dictionary locally
    with open(dict_path, 'w', encoding='utf-8') as f:
        f.write(str(dict_txt))

    print("Data dictionary generated!")


# Get the size of the dictionary
def get_dict_len(dict_path):
    with open(dict_path, 'r', encoding='utf-8') as f:
        # The source is cut off after "line ="; the rest of this function is an
        # assumed completion that parses the saved dict and returns its length
        line = ast.literal_eval(f.readlines()[0])
    return len(line)
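
To make the file format produced by this cell concrete, here is a small in-memory sketch of the same encoding: each title becomes comma-separated character ids, followed by a tab and the label. The sample records, the four-field layout, the numeric labels, and the helper names (`build_char_dict`, `encode_line`, `decode_line`) are all hypothetical. Unlike the cell above, this sketch sorts the characters so the ids are reproducible, and it maps unseen characters to `<unk>` instead of raising a `KeyError`.

```python
def build_char_dict(titles):
    """Map each distinct character to an integer id; the last id is <unk>."""
    chars = sorted({ch for title in titles for ch in title})
    char_to_id = {ch: i for i, ch in enumerate(chars)}
    char_to_id["<unk>"] = len(char_to_id)
    return char_to_id


def encode_line(raw_line, char_to_id):
    """Turn one '_!_'-separated record into 'id,id,...<TAB>label'."""
    fields = raw_line.rstrip("\n").split("_!_")
    label, title = fields[1], fields[-1]
    unk = char_to_id["<unk>"]
    # Characters missing from the dictionary fall back to the <unk> id
    ids = [char_to_id.get(ch, unk) for ch in title]
    return ",".join(str(i) for i in ids) + "\t" + label


def decode_line(encoded_line):
    """Parse an encoded line back into (list of ids, integer label)."""
    ids_part, label = encoded_line.split("\t")
    return [int(i) for i in ids_part.split(",")], int(label)


# Hypothetical records: the label is the second field, the title is the last
samples = [
    "1001_!_0_!_culture_!_abcab",
    "1002_!_3_!_sports_!_cab",
]
char_to_id = build_char_dict(line.split("_!_")[-1] for line in samples)
encoded = [encode_line(line, char_to_id) for line in samples]

print(char_to_id)               # {'a': 0, 'b': 1, 'c': 2, '<unk>': 3}
print(encoded)                  # ['0,1,2,0,1\t0', '2,0,1\t3']
print(decode_line(encoded[0]))  # ([0, 1, 2, 0, 1], 0)
```

A reader that feeds the model would apply something like `decode_line` to every line of `train_list.txt` or `test_list.txt`.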
