Training a character-level language model with LSTM and GRU

misaka2019

Column: NLP   Tags: natural language processing, deep learning, machine learning

Article link: https://blog.csdn.net/Mikow/article/details/114647844

import torch
import torch.nn as nn
import torch.utils.data as Data
import torch.autograd as autograd
import torch.optim as optim
import numpy as np
from torch.autograd import Variable

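# Read the file: skip the first line, then join lines into one poem until a blank line
poetrys = []
poetry = ''
with open("poetryFromTang.txt", encoding='utf-8') as f:
    next(f)  # skip the first line of the file
    for line in f:
        line = line.strip()
        if line:
            poetry += line
        elif poetry:
            # a blank line ends the current poem
            poetrys.append(poetry)
            poetry = ''
if poetry:
    # keep the last poem even if the file has no trailing blank line
    poetrys.append(poetry)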

# Build the corpus: concatenate all poems and drop the full-width comma and period
all_word = ''.join(poetrys)
all_word = all_word.replace('，', '').replace('。', '')

# Count character frequencies
word_dict = {}

for word in all_word:
    if word not in word_dict:
        word_dict[word] = 1
    else:
        word_dict[word] += 1

# Sort characters by frequency, most frequent first
word_sort = sorted(word_dict.items(), key=lambda x: x[1], reverse=True)
words, _ = zip(*word_sort)

# Build the vocabulary: character -> id and id -> character (id 0 is the most frequent character)
word_to_token = {word: idx for idx, word in enumerate(words)}
token_to_word = dict(enumerate(words))

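# Convert a character sequence into a list of token ids
def transword(char_list):
    # Characters missing from the vocabulary fall back to the last id, len(words) - 1 (an assumed choice)
    ids = [word_to_token.get(char, len(words) - 1) for char in char_list]
    return ids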
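The title promises LSTM and GRU models, so here is a minimal PyTorch sketch of one. It assumes an embedding layer, a single recurrent layer (`nn.LSTM` or `nn.GRU`, chosen by a hypothetical `cell` argument), and a linear projection to vocabulary logits, trained with cross-entropy on next-character prediction. The class name `CharRNN`, the layer sizes, the learning rate, and the random dummy data are all assumptions for illustration, not the author's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CharRNN(nn.Module):
    """Character-level LM: embedding -> LSTM or GRU -> linear projection to vocabulary logits."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, cell='lstm'):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        rnn_cls = {'lstm': nn.LSTM, 'gru': nn.GRU}[cell]
        self.rnn = rnn_cls(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        # x: (batch, seq_len) token ids; state: (h, c) for LSTM, h for GRU, or None
        h, state = self.rnn(self.embed(x), state)
        return self.out(h), state  # logits: (batch, seq_len, vocab_size)


def train_step(model, optimizer, x, y):
    """One optimisation step of next-character prediction with cross-entropy loss."""
    model.train()
    logits, _ = model(x)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
vocab_size = 50
ids = torch.randint(0, vocab_size, (4, 21))  # 4 dummy sequences of 21 token ids
x, y = ids[:, :-1], ids[:, 1:]               # target = input shifted by one step

results = {}
for cell in ('lstm', 'gru'):
    model = CharRNN(vocab_size, cell=cell)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    results[cell] = [train_step(model, optimizer, x, y) for _ in range(20)]
    print(cell, round(results[cell][0], 3), round(results[cell][-1], 3))
```

Swapping `cell='lstm'` for `cell='gru'` is the only change needed to compare the two recurrent units. Plain tensors are used throughout: since PyTorch 0.4, `Variable` has been merged into `Tensor`, so wrapping inputs in `Variable` is no longer necessary.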