Natural Language Processing: Chatbots

JQW_YNU

Article link: https://blog.csdn.net/qq_35394891/article/details/80795490

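This post covers rule-based chatbots and several ways to upgrade them.

First, let's look at the most basic rule-based bot.

It works at a grade-school level: it can only answer the exact questions it was given.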

    In [11]: 
  

import random

# Greetings the bot recognizes (and replies with)
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']

# Variants of the question "How are you?"
question = ['How are you?', 'How are you doing?']
# Possible answers, such as "I'm fine"
responses = ['Okay', "I'm fine"]

# Run the bot
while True:
    userInput = input(">>> ")
    if userInput in greetings:
        # Pick a random greeting on every turn, not once at startup
        print(random.choice(greetings))
    elif userInput in question:
        # Pick a random answer on every turn
        print(random.choice(responses))
    # Keep going until the user says "bye"
    elif userInput == 'bye':
        break
    else:
        print("I did not understand what you said")

>>> hi
hey
>>> how are u
I did not understand what you said
>>> how are you
I did not understand what you said
>>> how are you?
I did not understand what you said
>>> How are you?
I'm fine
>>> bye
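
The transcript above shows the bot rejecting "how are you" only because of letter case and a missing question mark. A minimal sketch of one possible fix is to normalize the input before comparing it. The helper names `normalize` and `reply` are assumptions introduced here for illustration; they are not part of the original code.

```python
import random
import string

GREETINGS = ['hola', 'hello', 'hi', 'hey']
QUESTIONS = ['how are you', 'how are you doing']
RESPONSES = ['Okay', "I'm fine"]

def normalize(text):
    """Lowercase the text, strip ASCII punctuation, and collapse whitespace."""
    no_punct = text.lower().translate(str.maketrans('', '', string.punctuation))
    return ' '.join(no_punct.split())

def reply(user_input):
    """Return a reply for one line of input (hypothetical helper)."""
    cleaned = normalize(user_input)
    if cleaned in GREETINGS:
        return random.choice(GREETINGS)
    if cleaned in QUESTIONS:
        return random.choice(RESPONSES)
    return "I did not understand what you said"

print(reply("how are you"))   # now matched despite the missing "?"
print(reply("How are you?"))  # still matched
print(reply("how are u"))     # still not understood: the wording differs
```

Normalization only removes surface differences. "how are u" still fails, which is what motivates the keyword approach below.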

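Upgrade I:

Clearly, rules like these are too crude. We need somewhat better matching.

For example, we can use keywords to determine the intent of a sentence (intents).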

    In [10]: 
  

from nltk import word_tokenize
import random

# word_tokenize needs NLTK's tokenizer data, e.g. via nltk.download('punkt')
# (the exact resource name may differ between NLTK versions)

# Greetings the bot recognizes (and replies with)
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']

# Keywords for the "holiday" topic
question = ['break', 'holiday', 'vacation', 'weekend']
# Replies for the holiday topic
responses = ['It was nice! I went to Paris', "Sadly, I just stayed at home"]

# Run the bot
while True:
    userInput = input(">>> ")
    # Split the input into tokens to see which words it contains
    cleaned_input = word_tokenize(userInput)
    # Compare the tokens with each keyword list to decide the intent
    if not set(cleaned_input).isdisjoint(greetings):
        # Pick a random greeting on every turn
        print(random.choice(greetings))
    elif not set(cleaned_input).isdisjoint(question):
        # Pick a random holiday reply on every turn
        print(random.choice(responses))
    # Keep going until the user says "bye"
    elif userInput == 'bye':
        break
    else:
        print("I did not understand what you said")

>>> hi
hey
>>> how was your holiday?
It was nice! I went to Paris
>>> wow, amazing!
I did not understand what you said
>>> bye

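As you may have noticed, this is still exact matching at the level of the text.

The mainstream research direction today is matching at the level of meaning.

For example, "I'm so hungry" and "it's mealtime" should both be understood as "time to eat".

At this level we need embedding methods such as word vectors, which later lessons will cover.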
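
To make the idea of semantic matching concrete, here is a toy sketch using cosine similarity. The three-dimensional vectors are hand-made for illustration only and are not real embeddings. The intent names and the helpers `cosine`, `sentence_vector`, and `closest_intent` are likewise assumptions; a real system would load trained word vectors instead.

```python
import math

# Hand-made toy vectors (NOT real embeddings); the dimensions loosely
# stand for (food, time, greeting).
TOY_VECTORS = {
    'hungry':    [0.9, 0.1, 0.0],
    'lunchtime': [0.7, 0.6, 0.0],
    'hello':     [0.0, 0.1, 0.9],
}

# One prototype word per intent (hypothetical intent names)
INTENT_PROTOTYPES = {'eat': 'hungry', 'greet': 'hello'}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def sentence_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    known = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]

def closest_intent(words):
    """Return the intent whose prototype is most similar to the sentence."""
    vec = sentence_vector(words)
    if vec is None:
        return None
    return max(INTENT_PROTOTYPES,
               key=lambda name: cosine(vec, TOY_VECTORS[INTENT_PROTOTYPES[name]]))

print(closest_intent(['lunchtime']))  # shares no keyword with 'hungry'
```

Here 'lunchtime' is matched to the 'eat' intent even though it shares no keyword with 'hungry', because the two toy vectors point in a similar direction.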

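Upgrade II:

Being able to chat is not enough. The bot needs a knowledge base before it can solve users' problems.

We can store that knowledge in any kind of database and then find answers by searching it.

The simplest example is a "map" represented as a graph, stored in a plain Python dict that maps each city to the cities reachable from it.

With this map we can find a route from one place to another

and return it to the user as the answer.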

    In [17]: 
  

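# Build a "database" for the target domain.
# Here it is a plain dict used as an adjacency list (a directed graph).
graph = {'上海': ['苏州', '常州'],
         '苏州': ['常州', '镇江'],
         '常州': ['镇江'],
         '镇江': ['常州'],
         '盐城': ['南通'],
         '南通': ['常州']}

# Depth-first search for a path from start to end
def find_path(start, end, path=None):
    # Avoid a mutable default argument; start from an empty path
    path = (path or []) + [start]
    if start == end:
        return path
    if start not in graph:
        return None
    for node in graph[start]:
        # Skip nodes already on the path to avoid cycles
        if node not in path:
            newpath = find_path(node, end, path)
            if newpath:
                return newpath
    return None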

    In [20]: 
  

print(find_path('上海', "镇江"))

['上海', '苏州', '常州', '镇江']
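
Note that `find_path` returns the first route the depth-first search happens to reach, which is not necessarily the shortest one. If the bot should answer with the route that has the fewest stops, one option is a breadth-first search. The sketch below uses the same graph; the function name `find_shortest_path` is an assumption introduced here.

```python
from collections import deque

graph = {'上海': ['苏州', '常州'],
         '苏州': ['常州', '镇江'],
         '常州': ['镇江'],
         '镇江': ['常州'],
         '盐城': ['南通'],
         '南通': ['常州']}

def find_shortest_path(graph, start, end):
    """Breadth-first search: return a path with the fewest edges, or None."""
    queue = deque([[start]])   # each queue entry is a whole path so far
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

print(find_shortest_path(graph, '上海', '镇江'))
```

On this graph the breadth-first version returns a two-hop route through 苏州 instead of the three-hop route found above.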

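The same knowledge-graph approach

also works with logic programming, for example Prolog, which anyone who studied AI in the last century would have learned,

or PyKE, a Prolog-inspired knowledge engine for Python.

These tools let you build a complex network of logical relations and extract information from it conveniently,

without hand-coding every piece of information yourself: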

son_of(bruce, thomas, norma)
son_of(fred_a, thomas, norma)
son_of(tim, thomas, norma)
daughter_of(vicki, thomas, norma)
daughter_of(jill, thomas, norma)
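
To illustrate what "not hand-coding every piece of information" means, the sketch below stores the facts above as tuples and derives sibling relations from them with one rule. This is plain Python, not PyKE or Prolog syntax, and the names `FACTS` and `siblings_of` are assumptions introduced for illustration.

```python
# Facts as (relation, child, father, mother), mirroring the listing above
FACTS = [
    ('son_of', 'bruce', 'thomas', 'norma'),
    ('son_of', 'fred_a', 'thomas', 'norma'),
    ('son_of', 'tim', 'thomas', 'norma'),
    ('daughter_of', 'vicki', 'thomas', 'norma'),
    ('daughter_of', 'jill', 'thomas', 'norma'),
]

def siblings_of(person):
    """Rule: two different people are siblings if they share both parents."""
    parents = {(father, mother)
               for _, child, father, mother in FACTS if child == person}
    return sorted(child
                  for _, child, father, mother in FACTS
                  if (father, mother) in parents and child != person)

print(siblings_of('bruce'))
```

No sibling fact was ever written down; the relation is inferred from the parent facts, which is the kind of work a logic engine does at a much larger scale.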

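Upgrade III:

Every industry has a front end and a back end,

and AI is no exception.

The algorithms discussed here all run on the back end.

To build a dependable product, many projects also need a front end that is simple and easy to use.

For example, we can use Google's APIs to write a voice assistant like Jarvis, the one that serves Tony in Iron Man.

Let's start with the simplest version, one that only speaks.

It uses gTTS (Google Text-to-Speech API) to convert text into audio.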

    In [4]: 
  

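from gtts import gTTS
import os

# Synthesize a Chinese greeting: "Hello, I am your personal assistant; my name is Pepper"
tts = gTTS(text='您好，我是您的私人助手，我叫小辣椒', lang='zh-tw')
tts.save("hello.mp3")
# Play the file; assumes the mpg321 command-line player is installed
os.system("mpg321 hello.mp3")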

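Likewise,

now that we have text-to-speech,

we can also use the Google API to recognize what the user says and have Jarvis read its reply aloud:

(Note: this requires several libraries installed on your machine: SpeechRecognition, PyAudio, and PySpeech.)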

    In [2]: 
  

import speech_recognition as sr
from time import ctime
import time
import os
from gtts import gTTS

# Speak the AI's reply aloud
def speak(audioString):
    print(audioString)
    tts = gTTS(text=audioString, lang='en')
    tts.save("audio.mp3")
    # Assumes the mpg321 command-line player is installed
    os.system("mpg321 audio.mp3")

# Record what you say
def recordAudio():
    # Capture speech from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)

    # Transcribe the audio with the Google API
    data = ""
    try:
        data = r.recognize_google(audio)
        print("You said: " + data)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))

    return data

# Built-in conversation skills (rules)
def jarvis():

    while True:

        data = recordAudio()

        if "how are you" in data:
            speak("I am fine")

        if "what time is it" in data:
            speak(ctime())

        if "where is" in data:
            # Assumes the phrase is "where is <place>", so the place is the third word;
            # keep data as a string so the "bye" check below still does a substring match
            words = data.split(" ")
            location = words[2]
            speak("Hold on Tony, I will show you where " + location + " is.")
            # "open -a Safari" is macOS-only; the trailing "&" runs it in the background
            os.system("open -a Safari https://www.google.com/maps/place/" + location + "/&")

        if "bye" in data:
            speak("bye bye")
            break

# Initialize
time.sleep(2)
speak("Hi Tony, what can I do for you?")

# Run
jarvis()

Hi Tony, what can I do for you?
You said: how are you
I am fine
You said: what time is it now
Fri Apr  7 18:16:54 2017
You said: where is London
Hold on Tony, I will show you where London is.
You said: ok bye bye
bye bye

Voice is not the only front end.

A chatbot can also be integrated into applications such as WeChat, Slack, and Facebook Messenger.

Later stages of the course will cover this as well.
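
One way to make the bot easier to plug into different front ends is to keep the rules in a function that takes text and returns text, so that voice, console, or a messaging app only handle input and output. The sketch below is a minimal illustration of that split, reusing some of the Jarvis rules; the names `respond` and `console_frontend` are assumptions, and no real messaging API is shown.

```python
from time import ctime

def respond(text):
    """Back end: map input text to a reply string, or None if no rule matches."""
    text = text.lower()
    if "how are you" in text:
        return "I am fine"
    if "what time is it" in text:
        return ctime()
    if "bye" in text:
        return "bye bye"
    return None

def console_frontend():
    """Front end: a text console; a voice or chat front end would call respond() the same way."""
    while True:
        answer = respond(input(">>> "))
        print(answer or "I did not understand what you said")
        if answer == "bye bye":
            break
```

Calling `console_frontend()` starts a text session. A voice front end could instead pass the recognized speech to `respond` and hand the result to a text-to-speech function.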
