How to Develop a Word-Level Neural Language Model and Use It to Generate Text

贪心科技

Article link: https://blog.csdn.net/Mlooker/article/details/80298238

Published by: 贪心科技 (WeChat official account: 贪心科技)

Author: Artem Oppermann (translated and compiled by 贪心科技)

Word count: 2300

Reading time: 5 minutes

Introduction

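In this tutorial, you will discover how to develop a statistical language model using deep learning in Python. Neural networks are the preferred approach for building statistical language models for two reasons. First, they use a distributed representation, in which different words with similar meanings have similar representations. Second, they can draw on a large context of preceding words when making predictions.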
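The toy sketch below makes the idea of a distributed representation concrete. Each word maps to a dense vector, and cosine similarity measures how close two representations are. The three-dimensional vectors are invented purely for illustration; they are hypothetical values, not learned weights. In a real neural language model, these vectors would be rows of a learned embedding layer.

```python
import math

# Hypothetical, hand-picked 3-d word vectors, for illustration only.
# In a trained model these would be learned embedding weights.
embeddings = {
    "justice": [0.9, 0.1, 0.3],
    "fairness": [0.8, 0.2, 0.35],
    "ship": [0.1, 0.9, 0.0],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# With these made-up vectors, the related pair scores higher than the unrelated pair.
print(cosine_similarity(embeddings["justice"], embeddings["fairness"]))
print(cosine_similarity(embeddings["justice"], embeddings["ship"]))
```

Nothing in this sketch is learned. In the model built later, the embedding values come from training, so that words used in similar contexts end up close together.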

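After completing this tutorial, you will know:

How to prepare text for developing a word-based language model.

How to design and fit a neural language model with a learned embedding and an LSTM hidden layer.

How to use the learned language model to generate new text with statistical properties similar to those of the source text.

Let's get started!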

Tutorial Overview

This tutorial is divided into 4 parts; they are:

1. Plato's The Republic

2. Data Preparation

3. Train Language Model

4. Use Language Model

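Plato's The Republic

The Republic is the most famous work of the classical Greek philosopher Plato.

It is structured as a long dialogue on the topic of order and justice within a state.

The text begins as follows: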

BOOK I.

I went down yesterday to the Piraeus with Glaucon the son of Ariston,
…

It ends with:

…
And it shall be well with us both in this life and in the pilgrimage of a thousand years which we have been describing.

Save the cleaned version as 'republic_clean.txt' in your current working directory. The file should have 15,802 lines of text.
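It can help to confirm that the file loads correctly before going further. The sketch below assumes 'republic_clean.txt' is plain UTF-8 text in the working directory. The helper names load_doc and count_lines are hypothetical choices made for this illustration. You can compare the printed line count with the expected 15,802.

```python
from pathlib import Path


def load_doc(filename):
    """Read an entire text file into a single string (assumes UTF-8)."""
    with open(filename, "r", encoding="utf-8") as f:
        return f.read()


def count_lines(text):
    """Count lines, treating \\n, \\r\\n and other line boundaries as separators."""
    return len(text.splitlines())


if __name__ == "__main__":
    path = Path("republic_clean.txt")  # assumed file name from the text above
    if path.exists():
        doc = load_doc(path)
        print(f"{path}: {count_lines(doc)} lines")
    else:
        print(f"{path} not found; save the cleaned text first.")
```

Reading the whole file into one string is fine for a book-sized text like this one.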

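Now we can develop a language model from this text.

Data Preparation

We will start by preparing the data for modeling.

The first step is to look at the data.

Review the Text

Open the text in an editor and look over the text data.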
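You can also review the text from Python instead of an editor. The following sketch prints the first few lines and flags lines that look like book headings. The helper names head and find_headings are hypothetical. The heading pattern (the word BOOK followed by a Roman numeral) is an assumption based on the "BOOK I." heading in this text, so check it against the real file.

```python
import re

# Assumed heading pattern: "BOOK" + Roman numeral, with an optional trailing period.
HEADING = re.compile(r"^BOOK [IVXLC]+\.?$")


def head(text, n=10):
    """Return the first n lines of text, similar to the Unix `head` command."""
    return text.splitlines()[:n]


def find_headings(text):
    """Return (line_number, line) pairs for lines that look like book headings."""
    return [
        (i, line.strip())
        for i, line in enumerate(text.splitlines(), start=1)
        if HEADING.match(line.strip())
    ]


# A short excerpt of the opening, used here instead of the full file.
sample = """BOOK I.

I went down yesterday to the Piraeus with Glaucon the son of Ariston,
that I might offer up my prayers to the goddess (Bendis, the Thracian
Artemis.); and also because I wanted to see in what manner they would
celebrate the festival, which was a new thing."""

for line in head(sample, 3):
    print(line)
print(find_headings(sample))
```

To scan the real file, read it into a string first and pass that string to both helpers.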

For example, here is the first piece of dialogue:

BOOK I.

I went down yesterday to the Piraeus with Glaucon the son of Ariston,
that I might offer up my prayers to the goddess (Bendis, the Thracian
Artemis.); and also because I wanted to see in what manner they would
celebrate the festival, which was a new thing. I was delighted with the
procession of the inhabitants; but that of the Thracians was equally,
if not more, beautiful. When we had finished our prayers and viewed the
spectacle, we turned in the direction of the city; and at that instant
Polemarchus the son of Cephalus chanced to catch sight of us from a
distance as we were starting on our way home, and told his servant to
run and bid us wait for him. The servant took hold of me by the cloak
behind, and said: Polemarchus desires you to wait.

I turned round, and asked him where his master was.

There he is, said the youth, coming after you, if you will only wait.

Certainly we will, said Glaucon; and in a few minutes Polemarchus
appeared, and with him Adeimantus, Glaucon’s brother, Niceratus the son
of Nicias, and several others who had been at the procession.

Polemarchus said to me: I perceive, Socrates, that you and your
companion are already on your way to the city.

You are not far wrong, I said.

…

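What do you see that we will need to handle when preparing the data?

Here is what I see from a quick look:

Book/chapter headings (e.g. "BOOK I.").

British English spellings (e.g. "honoured").

Lots of punctuation (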