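Training a character-level language model with LSTM and GRU (misaka2019's blog, CSDN)
This post borrows heavily from the article linked above. I added an acrostic-poem feature; it is a parody version made purely for fun.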
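import requests
from bs4 import BeautifulSoup

sentences = []
# Adjust the page count to match how many result pages the poet has. If you are
# not a fan of Qianlong, open the link below, search for another poet, paste the
# new URL here, and remember to update the page count.
for i in range(9):
    resp = requests.get("https://hanyu.baidu.com/s?wd=%E4%B9%BE%E9%9A%86%E8%AF%97&from=poem&pn={}".format(i))
    # Name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(resp.text, "html.parser")
    a = soup.find_all("div", class_="poem-list-item-body check-red")
    for sentence in a:
        # Drop line breaks, spaces, and the "..." truncation marker
        sentence = sentence.text.replace("\n", "").replace(" ", "").replace("...", "")
        sentences.append(sentence)
print(sentences)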
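The chained `replace` calls above only handle the newline, the ASCII space, and the three-dot marker. As a minimal sketch, assuming the scraped snippets may also contain tabs, full-width spaces, or the single-character ellipsis (none of which the post confirms), a small helper could do the cleaning in one place. `clean_snippet` is a hypothetical name, not part of the original code.

```python
import re

# Hypothetical helper, not part of the original post.
# For Python 3 str patterns, \s also matches the full-width space U+3000.
_WHITESPACE = re.compile(r"\s+")

def clean_snippet(text):
    """Remove all whitespace and truncation ellipses from one scraped snippet."""
    text = _WHITESPACE.sub("", text)
    # Assumption: truncated previews end in "..." or the single character "…"
    return text.replace("...", "").replace("…", "")
```

Inside the scraping loop this would be used as `sentences.append(clean_snippet(sentence.text))`.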
poetrys = sentences
all_word = ''
for poetry in poetrys:
    all_word += poetry
# Strip punctuation; both the full-width and the ASCII comma are removed
all_word = all_word.replace(',', '').replace(',', '').replace('。', '').replace('、', '').replace('》', '').replace('《', '')

# Count character frequencies
word_dict = {}
for word in all_word:
    if word not in word_dict:
        word_dict[word] = 1
    else:
        word_dict[word] += 1
# Sort characters from most to least frequent
word_sort = sorted(word_dict.items(), key=lambda x: x[1], reverse=True)
words, _ = zip(*word_sort)
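# Build the vocabulary: character -> id and id -> character
word_to_token = {word: id for id, word in enumerate(words)}
token_to_word = dict(enumerate(words))

# Convert a character sequence into an id sequence
def transword(char_list):
    # Characters missing from the vocabulary fall back to the last id,
    # i.e. the least frequent character
    ids = [word_to_token.get(char, len(words) - 1) for char in char_list]
    return ids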
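The vocabulary steps above (strip punctuation, count characters, sort by frequency, map characters to ids) can also be sketched more compactly with `collections.Counter`. This is only an illustrative variant, not the original author's code: `build_vocab`, `encode`, `PUNCTUATION`, and the `<unk>` placeholder are names assumed here. Unlike `transword`, it reserves a dedicated id for unseen characters instead of reusing the id of the rarest one.

```python
from collections import Counter

PUNCTUATION = ",,。、《》"  # the marks stripped before counting
UNK = "<unk>"               # hypothetical placeholder for unseen characters

def build_vocab(poems):
    """Return (char_to_id, id_to_char), most frequent characters first."""
    # str.translate removes every punctuation mark in a single pass
    text = "".join(poems).translate(str.maketrans("", "", PUNCTUATION))
    chars = [ch for ch, _ in Counter(text).most_common()]
    chars.append(UNK)  # reserve the last id for unknown characters
    char_to_id = {ch: i for i, ch in enumerate(chars)}
    id_to_char = dict(enumerate(chars))
    return char_to_id, id_to_char

def encode(text, char_to_id):
    """Map each character to its id, falling back to the UNK id."""
    return [char_to_id.get(ch, char_to_id[UNK]) for ch in text]
```

Note that this vocabulary is one entry larger than the one built above, so a model trained on it would need its embedding and output sizes to match.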
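After that, just continue with the original author's code as usual. I also added an acrostic-poem feature, which you can use to write something for your girlfriend (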
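import torch

# `model` and `pick_top_n` come from the original article's code
def generate_random(head):
    """Generate an acrostic poem: each line starts with one character of `head`."""
    print(head)
    max_len = 7  # number of characters generated after the head character of each line
    a = transword(head)
    print(len(a))
    for j in range(len(a)):
        poetry = [head[j]]
        # Fresh hidden state for every line; shape (2, 1, 256) is presumably
        # (layers, batch, hidden size) of the original model
        hidden = torch.zeros(2, 1, 256)
        inp = torch.LongTensor([a[j]]).reshape(1, 1)
        for i in range(max_len):
            # Forward pass, then choose the next character via pick_top_n
            proba, hidden = model(inp, hidden)
            top_index = pick_top_n(proba)
            char = token_to_word[top_index]
            inp = inp.new_tensor([top_index]).view(1, 1)
            poetry.append(char)
        for char in poetry:
            print(char, end='')
        print("\n")

generate_random("乾隆诗好")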
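`generate_random` relies on `pick_top_n` from the original article, which is not reproduced in this post. As a rough idea of what such a helper might do, here is a minimal standalone sketch of top-n sampling on a plain Python list. It is an assumption, not the original implementation: the real helper presumably takes the model's output tensor, and the name `pick_top_n_sketch` and the default `top_n=5` are made up for illustration.

```python
import random

def pick_top_n_sketch(probs, top_n=5, rng=random):
    """Sample an index from the top_n largest entries of probs.

    probs is a plain list of non-negative scores with at least one
    positive entry among the top_n. Assumption: the original helper
    works on a torch tensor instead.
    """
    # Indices of the top_n largest scores, best first
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_n]
    weights = [probs[i] for i in top]
    # random.choices normalises the weights internally
    return rng.choices(top, weights=weights, k=1)[0]
```

Sampling among the few most likely characters, rather than always taking the single best one, keeps the lines from repeating the same high-frequency characters every time; with `top_n=1` the sketch reduces to plain greedy decoding.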
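By the time I wrote this post I had already moved on to playing with Xin Qiji's ci, so here is a generated piece in Xin Qiji's style in praise of Qianlong, to cleanse your eyes: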
乾情来,诉叶痴,似旧牛行行。
隆节可,有幄气,眉是拙业得。
诗社散,恨治候,歌媒闲去雪。
好处事,弟起醉,断寒襟梦路。
The generated ci carries no real feeling, but I will keep exploring this lousy poem generator and release new versions. Stay tuned.