Chatbots: Creating Natural Language From Structured Data

Introduction

Let us first look at the basic translation that takes place within a chatbot.


The allure of a chatbot is being able to input unstructured data. We are so used to having to structure our input and data according to what the user interface dictates.

Chatbots allow us to enter our data in a conversational manner and, by implication, in an unstructured form.


Figure: The Continuous Chatbot Process: Structuring & Unstructuring Data

For user input, the chatbot must structure the data. This structuring can include the following activities:

  • Sentence Boundary Detection (helpful for longer input)

  • Language Detection (scenarios where users speak different languages)

  • Intent Detection

  • Determining Entities

  • and more…
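The structuring activities listed above can be sketched in a few lines of Python. This is only a rough illustration, assuming keyword matching in place of a trained NLU model; the intent names, keyword lists, city list and function names are all hypothetical.

```python
import re

# Hypothetical keyword lists per intent; a real chatbot would use a trained NLU model.
INTENT_KEYWORDS = {
    "get_weather": ["weather", "forecast", "temperature"],
    "goodbye": ["bye", "goodbye"],
}

# Hypothetical list of city entities the bot knows about.
KNOWN_CITIES = ["New York", "London", "Cape Town"]


def split_sentences(text):
    """Naive sentence boundary detection: split after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def detect_intent(sentence):
    """Return the first intent that has a keyword in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return "unknown"


def extract_entities(sentence):
    """Return the known cities mentioned in the sentence."""
    lowered = sentence.lower()
    return [city for city in KNOWN_CITIES if city.lower() in lowered]


def structure_input(text):
    """Turn free-form user input into one structured record per sentence."""
    return [
        {
            "text": sentence,
            "intent": detect_intent(sentence),
            "entities": extract_entities(sentence),
        }
        for sentence in split_sentences(text)
    ]
```

Calling structure_input("Hi there. What is the weather in New York?") yields two records, the second carrying the get_weather intent and the New York entity.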


Conversely, the data output to the user must be unstructured again into natural language.


Speaking To The User

After the chatbot has determined the appropriate response, the data that needs to be presented to the user is in a structured format.


In the case of a weather bot, the data you want to present to the user might look something like this:


{
  "id": 803,
  "main": "Clouds",
  "description": "broken clouds",
  "icon": "http://openweathermap.org/img/wn/04d@2x.png",
  "weather": "Clouds",
  "temp": 80,
  "high": 82,
  "low": 78,
  "city": "New York"
}

Under normal circumstances, presenting this to a user via a mobile app or website is standard procedure. With a conversational interface, it is a whole different matter.

We need to convert the data into conversation, and hence unstructure it. This brings us to the continuous process of structuring and unstructuring data.

This process of unstructuring data into conversation is referred to as Natural Language Generation (NLG).
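As a minimal sketch of this idea, the weather record shown earlier can be turned into a sentence with a simple template. The function name and the wording of the sentence are assumptions made for illustration; the field names come from the sample data.

```python
# Structured weather data, using the fields from the sample record above.
weather = {
    "id": 803,
    "main": "Clouds",
    "description": "broken clouds",
    "temp": 80,
    "high": 82,
    "low": 78,
    "city": "New York",
}


def render_weather(data):
    """Fill a fixed sentence template with values from the structured record."""
    return (
        f"In {data['city']} it is currently {data['temp']} degrees with "
        f"{data['description']}. Expect a high of {data['high']} "
        f"and a low of {data['low']}."
    )


print(render_weather(weather))
```

A fixed template like this is the simplest way to unstructure data; the sections that follow look at ways to make the output less rigid.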


Natural language generation


Natural language generation is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. Psycholinguists prefer the term language production when such formal representations are interpreted as models for mental representations.

Basics Of Natural Language Generation (NLG)

As with everything, NLG can be performed at various levels of complexity. The simplest approach is a one-to-one mapping of return codes to phrases.

If the API returns 0, then the bot responds with “Thank you, your request has been logged”.

If the API returns 1, the bot responds with “Sorry, something went wrong, try again later.”


You could see this as a very basic form of unstructuring data.
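A one-to-one mapping like this might look as follows in Python. The two return codes and their phrases follow the example above; the fallback message for unknown codes and the function name are assumptions added for illustration.

```python
# One phrase per API return code, as in the example above.
RESPONSES = {
    0: "Thank you, your request has been logged.",
    1: "Sorry, something went wrong, try again later.",
}

# Assumed fallback for codes the mapping does not cover.
FALLBACK = "Sorry, I could not interpret the result of that request."


def phrase_for(return_code):
    """Look up the phrase for a return code, falling back for unknown codes."""
    return RESPONSES.get(return_code, FALLBACK)
```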


The Illusion of Lifeness

Of course, you can take this one step further by creating an illusion of lifeness. Some development environments, such as IBM Watson Assistant, allow multiple responses to be defined per conversational node.

Figure: IBM Watson Assistant: Assistant Responses

These responses can then be set to random or sequential. In the example shown here, there is a list of goodbye messages. This list can be extensive, so that the user receives a different message each time they say goodbye to the chatbot.

This gives the user the impression of an unscripted and spontaneous agent.
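The random and sequential options can be imitated in plain Python. This sketch does not use the IBM Watson Assistant API; the class name, the mode strings and the goodbye messages are hypothetical.

```python
import itertools
import random

# Hypothetical goodbye messages defined for one conversational node.
GOODBYES = ["Goodbye!", "See you later.", "Thanks for chatting, bye for now."]


class ResponsePicker:
    """Return one of several responses, either in order or at random."""

    def __init__(self, responses, mode="sequential", seed=None):
        if mode not in ("sequential", "random"):
            raise ValueError("mode must be 'sequential' or 'random'")
        self._responses = list(responses)
        self._mode = mode
        # cycle() restarts from the first response once the list is exhausted.
        self._cycle = itertools.cycle(self._responses)
        self._rng = random.Random(seed)

    def pick(self):
        if self._mode == "sequential":
            return next(self._cycle)
        return self._rng.choice(self._responses)
```

In sequential mode the messages repeat in a fixed order, while in random mode the same message may occasionally appear twice in a row.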


Scripted Language Generation

Taking matters one step further, we can create a language generation script.


Microsoft’s Bot Framework Composer has a Bot Response option on the left, where you can define the bot responses.

Figure: Language Generation Script

In the marked example, a Language Generation script is defined called:


The purpose of this example is to take the response from the weather API and transform it into more natural-sounding language. If the API returns “Dust”, we want our chatbot dialog to return “There’s dust in the air”, and so on.

We can quickly and easily create multiple such scripts for different APIs and scenarios.
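The logic of such a script can be approximated in Python as a lookup from the API value to a phrase. This is not Bot Framework Composer's Language Generation syntax; apart from the “Dust” phrase taken from the example above, the conditions, phrases and function name are assumptions.

```python
# Phrase per weather condition; only the "Dust" entry comes from the text above.
WEATHER_PHRASES = {
    "Dust": "There's dust in the air",
    "Clouds": "It's cloudy",
    "Rain": "It's raining",
}


def describe_weather(weather):
    """Map the API's 'main' value to a more natural-sounding phrase."""
    main = weather.get("main", "")
    if main in WEATHER_PHRASES:
        return WEATHER_PHRASES[main]
    if main:
        # Assumed fallback: echo the raw API value in a generic sentence.
        return f"The weather is {main.lower()}"
    return "I could not read the weather"
```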


Figure: Calling Language Generation Script From Dialog

And within the Send a response element, we can reference the language script for user feedback:

- @{DescribeWeather(dialog.weather)} and the temp is @{dialog.weather.temp}°

This affords us a predictable and standardized way of crafting responses for the user. Consider a chatbot that supports multiple user languages: the language generator can be used to respond to each user in their own language.
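The multi-language idea can be sketched by keying the templates on a language code. The language codes, the Spanish phrases, the default language and the function name are assumptions for illustration and are not part of Bot Framework Composer.

```python
# Condition phrases per language (hypothetical content).
DESCRIPTIONS = {
    "en": {"Dust": "There's dust in the air", "Clouds": "It's cloudy"},
    "es": {"Dust": "Hay polvo en el aire", "Clouds": "Está nublado"},
}

# Sentence template per language, mirroring the response line above.
SENTENCES = {
    "en": "{description} and the temp is {temp}°",
    "es": "{description} y la temperatura es de {temp}°",
}

DEFAULT_LANGUAGE = "en"  # assumed fallback when a language is not supported


def respond(weather, language):
    """Build the weather response in the requested language."""
    lang = language if language in SENTENCES else DEFAULT_LANGUAGE
    # Fall back to the raw API value when no phrase is defined for it.
    description = DESCRIPTIONS[lang].get(weather["main"], weather["main"])
    return SENTENCES[lang].format(description=description, temp=weather["temp"])
```

Because the dialog only calls respond with a language code, adding a language means adding entries to the two dictionaries rather than changing the dialog flow.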

Ease Of Scaling

One issue chatbot projects often run into is scaling. Invariably there comes a stage where the environment and framework need to be reconsidered.

Segmenting chatbot elements as much as possible helps to a large degree.


In particular, separating the script from the dialog flow is prudent, and the Language Generator addresses exactly this.


But why not take it even a step further…


The Inverse of Natural Language Understanding

NLG is a software process where structured data is transformed into natural conversational language for output to the user. In other words, structured data is presented in an unstructured manner to the user. Think of NLG as the inverse of NLU.

With NLU we take the unstructured conversational input from the user (natural language) and structure it for our software process. With NLG, we take structured data from backend systems and state machines and turn it into unstructured data: conversational output in human language.

Commercial NLG is emerging, and forward-looking solution providers are looking at incorporating it into their solutions. At this stage you might be struggling to get your mind around the practicalities of this. Below are two practical examples which might help.

Fake Product Review Generator

For this example I took close to 580,000 product reviews and created a TensorFlow model from that.


Fake Product Review using Natural Language Generation

By providing key words or a phrase, a product review can be generated. This product review can be seen as natural language generation.

A fictitious review is generated from a corpus of review data, based on a key word.


Imagine if the chatbot had access to a corpus of response data and, based on keywords or values, generated a response that is, in a sense, unique.

Fake News Headline Generator

For the video below, I used a data set from kaggle.com with about 185,000 records.


Natural Language Generation with Google’s Colab Notebook in Python

Each of these records was a newspaper headline, which I used to create a TensorFlow model.


Based on this model, I could then enter one or two intents, and random “fake” (hence non-existing) headlines were generated.


A host of parameters can be used to tweak the generated output.
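The article does not list these parameters. One that is commonly exposed by generative text models is a sampling temperature, shown here purely as an assumed example and in plain Python rather than TensorFlow. The function name and the scores are hypothetical.

```python
import math
import random


def sample_with_temperature(scores, temperature=1.0, rng=None):
    """Pick the next word from model scores, scaled by a temperature.

    Lower temperatures favor the highest-scoring word; higher
    temperatures flatten the distribution and give more varied output.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    words = list(scores)
    scaled = [scores[word] / temperature for word in words]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(words, weights=weights, k=1)[0]
```

With hypothetical scores such as {"rain": 5.0, "sun": 1.0, "dust": 0.5}, a temperature near zero almost always returns "rain", while a temperature of 2.0 lets the other words appear more often.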


Conclusion

We have seen growth in the way input data is processed by chatbots. Multiple intents can be detected, with multiple entities. Relations and types of entities can also be identified. The flexibility is astounding in many cases.

Yet we have not seen the same degree of advancement and flexibility in the chatbot script. Users judge a chatbot by its script and by how appropriate and lifelike each response is. The script also informs the user of the current conversation state and how to proceed; hence its importance.

Translated from: https://medium.com/@CobusGreyling/chatbots-creating-natural-language-from-structured-data-bbc81ee6c78c
