nlp_beginner Task 5 (a bad-poetry generator fed on Qianlong's poems) (poems scraped with a crawler from another poet, not Tang poetry)

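Article link: https://blog.csdn.net/m0_56110185/article/details/127657801

Training a character-level language model with LSTM and GRU (misaka2019's blog, CSDN)

This post borrows heavily from the article above and adds an acrostic-poem feature. It is purely a parody version, meant for entertainment.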

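import requests
from bs4 import BeautifulSoup

sentences = []
# Adjust the page count to match the poet you scrape. If you don't like Qianlong,
# search for another poet on the site below, paste the new URL here,
# and remember to update the page count.
for i in range(9):
    resp = requests.get("https://hanyu.baidu.com/s?wd=%E4%B9%BE%E9%9A%86%E8%AF%97&from=poem&pn={}".format(i))
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(resp.text, "html.parser")
    items = soup.find_all("div", class_="poem-list-item-body check-red")
    for item in items:
        # Strip newlines, spaces, and the "..." truncation marker from each poem preview
        sentence = item.text.replace("\n", "").replace(" ", "").replace("...", "")
        sentences.append(sentence)
print(sentences)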
poetrys = sentences
# Join all poems into one string and drop punctuation
all_word = ''
for poetry in poetrys:
    all_word += poetry
all_word = all_word.replace('，','').replace('。','').replace('、','').replace('》','').replace('《','')
# Count character frequencies
word_dict = {}

for word in all_word:
    if word not in word_dict:
        word_dict[word] = 1
    else:
        word_dict[word] += 1
# Sort characters by frequency, most frequent first
word_sort = sorted(word_dict.items(), key=lambda x: x[1], reverse=True)
words, _ = zip(*word_sort)

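# Build the vocabulary: character -> id and id -> character
word_to_token = {word: id for id, word in enumerate(words)}
token_to_word = dict(enumerate(words))

# Convert a character sequence into an id sequence
def transword(char_list):
    # Characters not in the vocabulary fall back to the last (least frequent) id
    ids = [word_to_token.get(char, len(words) - 1) for char in char_list]
    return ids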
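The vocabulary step above can also be sketched as a small self-contained function. This is only an illustration, not the referenced article's code: `build_vocab`, `encode`, and the `<unk>` placeholder are hypothetical names introduced here, and the two sample "poems" are made-up strings. Reserving a dedicated id for unseen characters is one possible alternative to mapping them onto a real character.

```python
from collections import Counter

UNK = "<unk>"            # hypothetical placeholder for characters never seen in training
PUNCTUATION = "，。、《》"  # the same punctuation the post strips out

def build_vocab(poems):
    """Return (char_to_id, id_to_char), with ids ordered by descending frequency."""
    text = "".join(poems)
    counts = Counter(ch for ch in text if ch not in PUNCTUATION)
    chars = [ch for ch, _ in counts.most_common()]
    chars.append(UNK)  # the last id is reserved for unknown characters
    char_to_id = {ch: i for i, ch in enumerate(chars)}
    id_to_char = dict(enumerate(chars))
    return char_to_id, id_to_char

def encode(text, char_to_id):
    """Map each character to its id, falling back to the <unk> id."""
    unk_id = char_to_id[UNK]
    return [char_to_id.get(ch, unk_id) for ch in text]

# Made-up sample data, only to show the shape of the result
poems = ["春风春雨，花开。", "春雨花落。"]
char_to_id, id_to_char = build_vocab(poems)
ids = encode("春花X", char_to_id)
print(ids)
```

If a vocabulary like this were used for training, the model's embedding and output layers would need `len(char_to_id)` entries so that the extra `<unk>` id is covered.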

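From here on, continue with the original author's code as usual. I also added an acrostic-poem feature, which you could use to write a poem for your girlfriend (maybe).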

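import torch

# `model` and `pick_top_n` come from the referenced article's code.
def generate_random(head):
    """Generate an acrostic poem: each character of `head` starts one line."""
    print(head)
    max_len = 7  # number of characters generated after each head character
    head_ids = transword(head)
    print(len(head_ids))
    for j in range(len(head_ids)):
        poetry = [head[j]]
        # Fresh hidden state for every line; the shape must match the trained model
        hidden = torch.zeros(2, 1, 256)
        inp = torch.LongTensor([head_ids[j]]).reshape(1, 1)
        for i in range(max_len):
            # Forward pass, then pick the next character from the highest-probability candidates
            proba, hidden = model(inp, hidden)
            top_index = pick_top_n(proba)
            char = token_to_word[top_index]
            inp = inp.new_tensor([top_index]).view(1, 1)
            poetry.append(char)
        print("".join(poetry))
        print()
generate_random("乾隆诗好")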
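The generation loop relies on `pick_top_n`, which lives in the referenced article and is not shown here. A rough stand-in might look like the sketch below. It is an assumption about the idea (sample among the few highest-scoring candidates), not the original author's implementation, and it works on a plain list of scores rather than a tensor so that it runs with the standard library alone. The `top_n` and `rng` parameters are hypothetical.

```python
import math
import random

def pick_top_n(scores, top_n=5, rng=random):
    """Sample one index among the top_n highest scores, weighted by softmax."""
    # Indices of the top_n largest scores, best first
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_n]
    best = scores[ranked[0]]
    # Softmax weights restricted to the candidates (shifted by the max for stability)
    weights = [math.exp(scores[i] - best) for i in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]

# Made-up scores for a six-character vocabulary
scores = [0.1, 2.0, -1.0, 1.5, 0.3, 0.0]
rng = random.Random(0)
picks = {pick_top_n(scores, top_n=3, rng=rng) for _ in range(200)}
print(sorted(picks))
```

With a PyTorch model, the output for the current step would first have to be flattened into such a list, for example with something like `proba.reshape(-1).tolist()`, assuming the model returns one row of scores per step. Sampling among several candidates instead of always taking the single best one keeps the generated lines from repeating the same characters.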

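By the time I wrote this post I had already moved on to playing with Xin Qiji's ci, so here is a generated piece in Xin's style praising Qianlong, to cleanse your eyes: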

乾情来，诉叶痴，似旧牛行行。
隆节可，有幄气，眉是拙业得。
诗社散，恨治候，歌媒闲去雪。
好处事，弟起醉，断寒襟梦路。

This ci has no real feeling in it, but I will keep exploring this bad-poetry generator and will release new versions, so stay tuned.
