[AI] How to Build a Low-Budget Knockoff Jarvis? - Spoken Human Robot Interaction

Author: 是土豆大叔啊！

Category: Artificial Intelligence. Tags: artificial intelligence, python, speech recognition, nlp, human-robot interaction

Article link: https://blog.csdn.net/potato_uncle/article/details/118498358

Spoken Human Robot Interaction

I. Interaction Structure Overview
II. Demonstration
III. Implementation Code

How to build a low-budget knockoff Jarvis?

I. Interaction Structure Overview


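Step 1: Capture speech through an audio input device. Here I call the computer's microphone with sr.Microphone() from the speech_recognition library (imported as sr), and sr.Recognizer().listen() records the audio coming in from the microphone.
Step 2: Use a speech recognition library to convert the recorded speech to text. sr.Recognizer().recognize_google() obtains the text through Google's speech recognition.
Step 3: Parse the text with en_core_web_sm (spaCy's small English model) and draw the dependency tree.
Step 4: Based on each word's part of speech or on specific keywords, have the computer reply automatically with a preset utterance (the eSpeak tool lets the computer read text aloud). Pieces of information in the dialogue, such as a time or a quantity, can be varied freely, and the computer will still pick them up.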
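
The four steps above can be sketched as a single conversational turn. This is a minimal illustration rather than the structure of the script itself: `run_turn` and `choose_reply` are hypothetical names, and the speaker, microphone, and recognizer are passed in as plain callables so that canned stand-ins can replace eSpeak, `r.listen()`, and `r.recognize_google()` when no audio hardware is available. Whitespace splitting stands in for spaCy parsing.

```python
from typing import Callable, List, Tuple


def choose_reply(tokens: List[str]) -> str:
    """Step 4: pick a preset reply from keywords (hypothetical rule)."""
    if "come" in tokens:
        return "Sir, How many friends will come?"
    return "Sir, can you please say it again?"


def run_turn(prompt: str,
             speak: Callable[[str], None],
             listen: Callable[[], bytes],
             recognize: Callable[[bytes], str]) -> Tuple[str, str]:
    """Run one question/answer turn and return (recognized text, reply)."""
    speak(prompt)                   # ask the question aloud
    audio = listen()                # step 1: capture audio
    text = recognize(audio)         # step 2: speech to text
    tokens = text.lower().split()   # step 3: stand-in for spaCy parsing
    reply = choose_reply(tokens)    # step 4: rule-based reply
    speak(reply)
    return text, reply


# Canned stand-ins so the sketch runs without a microphone or network.
spoken: List[str] = []
text, reply = run_turn(
    "What can I do for you sir?",
    speak=spoken.append,
    listen=lambda: b"fake-audio",
    recognize=lambda audio: "today my friends will come to my house",
)
print(reply)
```

In a real run, the three callables would presumably wrap the eSpeak command, `r.listen(source)`, and `r.recognize_google(audio)` respectively.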

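II. Demonstration

1. Background

The scenario I set up: my computer is a low-spec Jarvis (~~truly low-spec~~) that manages my house. Some friends are coming over, and I ask Jarvis to make a few preparations in advance.

2. Dialogue run results

For convenience, all the packages used here are English-language ones; readers are free to swap in another language pack, such as Chinese.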

Dialogue 1

Jarvis: What can I do for you sir?
Me:     today my friends will come to my house

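The number of people in the next exchange can be anything: Jarvis recognizes the number word in the utterance and repeats it later in the conversation.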
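
As a rough illustration of picking a number out of the reply, the sketch below uses a small hand-written word list instead of spaCy's part-of-speech tags, so it runs with the standard library alone. `find_number`, `NUMBER_WORDS`, and the `"some"` fallback are assumptions made for this example and are not part of the original script.

```python
# Small hand-written vocabulary; a stand-in for spaCy's NUM tag.
NUMBER_WORDS = {
    "one", "two", "three", "four", "five", "six",
    "seven", "eight", "nine", "ten", "eleven", "twelve",
}


def find_number(text: str, default: str = "some") -> str:
    """Return the last number-like token in text, or default if none."""
    found = default
    for raw in text.lower().split():
        token = raw.strip(".,!?")  # drop trailing punctuation
        if token.isdigit() or token in NUMBER_WORDS:
            found = token          # later matches overwrite earlier ones
    return found


count = find_number("maybe six people")
print("got it sir, I will prepare " + count + " cups of tea")
```

Keeping the last match mirrors a loop that overwrites the stored value on every number token; a fallback value avoids an undefined variable when no number is heard.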

Jarvis: Sir, How many friends will come?
Me:     maybe six people


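Dialogue 2

Here "isyncr" is really "I think"; Google's speech recognition is not especially accurate (~~it certainly can't be that my pronunciation is off!~~)

Here you can name any one of the three drinks, and Jarvis will repeat your choice back after recognizing it.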
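
The drink check can be sketched as a small helper. A plain substring test such as `'tea' in text` also fires on words like "instead", so this version matches whole words with a regular expression. `pick_drink` and `DRINKS` are hypothetical names chosen for the example.

```python
import re
from typing import Optional, Sequence

DRINKS = ("tea", "cola", "coffee")


def pick_drink(text: str, options: Sequence[str] = DRINKS) -> Optional[str]:
    """Return the first option that appears as a whole word, else None."""
    lowered = text.lower()
    for option in options:
        # \b anchors the match at word boundaries on both sides
        if re.search(r"\b" + re.escape(option) + r"\b", lowered):
            return option
    return None


print(pick_drink("isyncr coffee is the best"))
print(pick_drink("I would like some water instead"))
```

Returning `None` for an unrecognized reply lets the caller decide whether to ask again or fall back to a default.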

Jarvis: and what drinks do I need to prepare? cola, tea or coffee?
Me:     isyncr coffee is the best
Jarvis: got it sir, I will prepare six cups of coffee


Dialogue 3

Jarvis: By the way, sir, when they will come?
Me:     if I remember correctly live well, either 5 p.m.
Jarvis: Sir, please confirm, your six friends will come at 5. pm and I will prepare six cups of coffee for them
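
The exchange above shows a time being recovered from a noisy transcription. One possible way to do this without part-of-speech tags is a regular expression, sketched below; `parse_time` and `TIME_PATTERN` are hypothetical names, and the pattern only covers forms like "5 p.m.", "5pm", or "10:30 am".

```python
import re
from typing import Optional, Tuple

# hour, optional :minutes, then a/p, optional dot, m, and a word boundary
TIME_PATTERN = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*([ap])\.?\s*m\b")


def parse_time(text: str) -> Optional[Tuple[int, int, str]]:
    """Return (hour, minute, 'am' or 'pm') or None if no time is found."""
    match = TIME_PATTERN.search(text.lower())
    if match is None:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)  # minutes default to 0
    return hour, minute, match.group(3) + "m"


print(parse_time("if I remember correctly live well, either 5 p.m."))
```

The trailing word boundary keeps phrases such as "5 amazing friends" from being read as "5 am".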


Dialogue 4

Jarvis: And Sir, your house looks not clean, can I clean it now?
Me:     yes  of course please
Jarvis: As your wish, please wait a moment
Jarvis: Sir, your house is clean now!
Jarvis: Have a nice day sir!

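III. Implementation Code

I also added a downloaded Jarvis "Welcome home" voice clip at the start. The code and the voice pack are linked from the original post.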

import os
import winsound  # Windows-only standard-library module for playing .wav files

import en_core_web_sm
import speech_recognition as sr
from nltk import Tree
from spacy.symbols import NUM, VERB

# Load the spaCy English model once and reuse it for every utterance
nlp = en_core_web_sm.load()


# Build an nltk tree from a spaCy token and its dependents
def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_


# Print the dependency tree of every sentence in the text
def dependency_tree(text):
    nlp_doc = nlp(text)
    print('---------------')
    print('Dependency Graph')
    for sent in nlp_doc.sents:
        tree = to_nltk_tree(sent.root)
        # A one-word sentence comes back as a plain string, not a Tree
        if isinstance(tree, Tree):
            tree.pretty_print()
        else:
            print(tree)
    print('---------------')


# Play the "garbage cleared" clip, then announce that cleaning is done
def clean():
    winsound.PlaySound('Jarvis/Jarvis-System/Garbage cleared.wav', winsound.SND_FILENAME)
    reply = 'Sir, your house is clean now!'
    print(reply)
    os.system('espeak "{}"'.format(reply))


# Greet the user with the downloaded "welcome home" clip
winsound.PlaySound('Jarvis/Jarvis-System/Welcome Home Sir(No Song).wav', winsound.SND_FILENAME)

# Expected reply: "today my friends will come to my house"
r = sr.Recognizer()
with sr.Microphone() as source:
    l = "What can I do for you sir?"
    print(l)
    os.system('espeak "{}"'.format(l))
    audio = r.listen(source)

textT = ''  # stays empty if recognition fails
try:
    textT = r.recognize_google(audio)
except sr.UnknownValueError:
    print("Jarvis could not understand your audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

print(textT)
dependency_tree(textT)

# spaCy English model used for the token-level checks below
nlp = en_core_web_sm.load()
nlp_doc = nlp(textT)

friends_num = 'some'  # fallback if no number is recognized

for word in nlp_doc:

    # Only ask about the guests if the user said someone will "come"
    if word.pos == VERB and str(word) == 'come':

        l = 'Sir, How many friends will come?'
        print(l)
        os.system('espeak "{}"'.format(l))

        # Expected reply: "I remember six people"
        with sr.Microphone() as source:
            audio = r.listen(source)

        textT = ''
        try:
            textT = r.recognize_google(audio)
        except sr.UnknownValueError:
            print("Jarvis could not understand your audio")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))

        print(textT)
        dependency_tree(textT)

        # Keep the last token that spaCy tags as a number
        for token in nlp(textT):
            if token.pos == NUM:
                friends_num = str(token)
        break  # ask only once

# Expected reply: "I think tea is the best"
r = sr.Recognizer()
with sr.Microphone() as source:
    l = "and what drinks do I need to prepare? cola, tea or coffee?"
    print(l)
    os.system('espeak "{}"'.format(l))
    audio = r.listen(source)

textT = ''  # stays empty if recognition fails
try:
    textT = r.recognize_google(audio)
except sr.UnknownValueError:
    print("Jarvis could not understand your audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

print(textT)
dependency_tree(textT)

# Pick the first drink mentioned in the reply; 'drinks' is a generic fallback
drink = 'drinks'
for option in ('tea', 'cola', 'coffee'):
    if option in textT.lower():
        drink = option
        break

if drink != 'drinks':
    l = 'got it sir, I will prepare ' + friends_num + ' cups of ' + drink
else:
    # The script does not listen again here; it only asks and moves on
    l = 'Sir, can you please say it again?'

print(l)
os.system('espeak "{}"'.format(l))

# Expected reply: "If I remember correctly, they will come at 5 pm"
r = sr.Recognizer()
with sr.Microphone() as source:
    l = "By the way, sir, when they will come?"
    print(l)
    os.system('espeak "{}"'.format(l))
    audio = r.listen(source)

textT = ''  # stays empty if recognition fails
try:
    textT = r.recognize_google(audio)
except sr.UnknownValueError:
    print("Jarvis could not understand your audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

print(textT)
dependency_tree(textT)

nlp_doc = nlp(textT)

# Collect every number token, each followed by '.', e.g. '5.'
time_num = ''
for word in nlp_doc:
    if word.pos == NUM:
        time_num += str(word) + '.'

# Default to pm unless the reply explicitly says a.m.
m = 'am' if 'a.m.' in textT.lower() else 'pm'

l = 'Sir, please confirm, your ' + friends_num + ' friends will come at '\
    + time_num + ' ' + m + ' and I will prepare ' + friends_num + ' cups of ' + drink + ' for them'
print(l)
os.system('espeak "{}"'.format(l))


# Expected reply: "yes of course please"
r = sr.Recognizer()
with sr.Microphone() as source:
    l = "And Sir, your house looks not clean, can I clean it now?"
    print(l)
    os.system('espeak "{}"'.format(l))
    audio = r.listen(source)

textT = ''  # stays empty if recognition fails
try:
    textT = r.recognize_google(audio)
except sr.UnknownValueError:
    print("Jarvis could not understand your audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

print(textT)
dependency_tree(textT)

# Each accepted word needs its own membership test
reply_text = textT.lower()
if 'yes' in reply_text or 'yeah' in reply_text:
    l = 'As your wish, please wait a moment'
    print(l)
    os.system('espeak "{}"'.format(l))
    clean()
    l = 'Have a nice day sir!'
    print(l)
    os.system('espeak "{}"'.format(l))
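
One detail of the confirmation step is easy to get wrong in Python: an expression of the form `'yes' or 'yeah' in text` is parsed as `'yes' or ('yeah' in text)`, and since the non-empty string `'yes'` is truthy, the condition holds for any reply. The sketch below shows a word-level check instead; `is_affirmative` and `AFFIRMATIVE` are hypothetical names used only for this example.

```python
AFFIRMATIVE = {"yes", "yeah"}


def is_affirmative(text: str) -> bool:
    """True if any whole word in text is an accepted 'yes' word."""
    words = {word.strip(".,!?") for word in text.lower().split()}
    return bool(words & AFFIRMATIVE)


print(is_affirmative("yes  of course please"))   # True
print(is_affirmative("no thanks"))               # False
print(bool('yes' or 'yeah' in 'no thanks'))      # True: the pitfall
```

Matching on whole words also avoids accepting replies that merely contain "yes" inside a longer word.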

Of course, readers can build on this framework and add an RNN to make Jarvis's replies smarter!

Follow! Like! Comment! Bookmark! Thank you!
