Fixing AIML recognition of Chinese text that contains English (letters, special symbols)

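        aiml handles pure-English input without trouble, but a sentence that mixes Chinese with English letters is not recognized. The main cause is that a space gets inserted between every Chinese character and letter, so the text no longer matches the pattern stored in the corpus and no answer is found.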
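
As a rough illustration of the mismatch (a sketch only, not the aiml package's actual code; `space_out` and the sample strings are made up for this example), spacing out every character turns the input into something that no longer equals a pattern stored as written:

```python
def space_out(text):
    """Insert a space between every character (hypothetical preprocessing step)."""
    return ' '.join(text)


# Assumed corpus pattern, stored exactly as it was written in the corpus file.
stored_pattern = "打开WIFI"

user_input = "打开WIFI"
spaced = space_out(user_input)

print(spaced)                    # 打 开 W I F I
print(spaced == stored_pattern)  # False
```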

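        Many write-ups online fix this by editing the _check_contain_english method of the Kernel class inside the aiml package. That works, but it is inconvenient: every place the program is deployed needs its installed package patched, which is not a wise approach, and the problem comes back as soon as the environment is upgraded.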

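        Since we cannot change other people's code, we change our own. Create a class that subclasses aiml.Kernel; I override two methods, learn and respond, so that they no longer insert spaces. Adapt this to your own situation: my project only uses AIML to retrieve answers in an intelligent customer-service system, and its conversations never contain all-English input (semantic recognition, entity extraction and so on are handled by other NLP models). The code: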

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import sys
import os
import aiml
import time
import glob
import xml.sax
from aiml.Kernel import create_parser
from aiml import Utils


class myAiml(aiml.Kernel):

    def __init__(self):
        super(myAiml, self).__init__()

    def learn(self, filename):
        """Load and learn the contents of the specified AIML file.

        If filename includes wildcard characters, all matching files
        will be loaded and learned.

        """
        for f in glob.glob(filename):
            if self._verboseMode:
                print("Loading %s..." % f, end="")
            # time.clock() was removed in Python 3.8; use perf_counter().
            start = time.perf_counter()
            # Load and parse the AIML file.
            parser = create_parser()
            handler = parser.getContentHandler()
            handler.setEncoding(self._textEncoding)
            try:
                parser.parse(f)
            except xml.sax.SAXParseException as msg:
                err = "\nFATAL PARSE ERROR in file %s:\n%s\n" % (f, msg)
                sys.stderr.write(err)
                continue
            # store the pattern/template pairs in the PatternMgr.
            # Take the extension from the matched file f, not the glob pattern.
            em_ext = os.path.splitext(f)[1]
            for key, tem in handler.categories.items():
                new_key = key
                if key and key[0] and key[1] and key[2] and em_ext == '.aiml':
                    if self._check_contain_english(key[0]):
                        # Pattern contains English: upper-case it.
                        new_key = (key[0].upper(), key[1], key[2])
                    else:
                        # No English: joined with '' rather than ' ', so no
                        # spaces are inserted between the characters.
                        new_key = (''.join(key[0]), key[1], key[2])
                self._brain.add(new_key, tem)
            # Parsing was successful.
            if self._verboseMode:
                print("done (%.2f seconds)" % (time.perf_counter() - start))

    def respond(self, input_, sessionID=aiml.Kernel._globalSessionID):
        """Return the Kernel's response to the input string."""
        if len(input_) == 0:
            return u""
        # Decode the input (assumed to be an encoded string) into a unicode
        # string. Note that if encoding is False, this will be a no-op
        try:
            input_ = self._cod.dec(input_)
        except UnicodeError:
            pass
        except AttributeError:
            pass

        # prevent other threads from stomping all over us.
        self._respondLock.acquire()

        try:
            # Add the session, if it doesn't already exist
            self._addSession(sessionID)

            # split the input into discrete sentences
            sentences = Utils.sentences(input_)
            finalResponse = u""
            for index, s in enumerate(sentences):

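                # Joined with '' rather than ' ', so no spaces are inserted
                # between the characters of a sentence without English.
                if not self._check_contain_english(s):
                    s = ''.join(s)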
                # Add the input to the history list before fetching the
                # response, so that <input/> tags work properly.
                inputHistory = self.getPredicate(self._inputHistory, sessionID)
                inputHistory.append(s)
                while len(inputHistory) > self._maxHistorySize:
                    inputHistory.pop(0)
                self.setPredicate(self._inputHistory, inputHistory, sessionID)

                # Fetch the response
                response = self._respond(s, sessionID)

                # add the data from this exchange to the history lists
                outputHistory = self.getPredicate(self._outputHistory, sessionID)
                outputHistory.append(response)
                while len(outputHistory) > self._maxHistorySize:
                    outputHistory.pop(0)
                self.setPredicate(self._outputHistory, outputHistory, sessionID)

                # append this response to the final response.
                finalResponse += (response + u"  ")

            finalResponse = finalResponse.strip()
            # print( "@ASSERT", self.getPredicate(self._inputStack, sessionID))
            assert (len(self.getPredicate(self._inputStack, sessionID)) == 0)

            # and return, encoding the string into the I/O encoding
            return self._cod.enc(finalResponse)

        finally:
            # release the lock
            self._respondLock.release()

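The code above mainly removes the inserted spaces (the ''.join(...) calls in learn and respond, highlighted in red in the original post). After that, sentences that mix Chinese and English can be recognized.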
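
Both overrides depend on _check_contain_english, which this post takes from the aiml package's Kernel class. Its body is not shown here, so the following is only an assumed stand-in (the names `contains_english` and `normalize_pattern` are hypothetical, and "English" is assumed to mean "at least one ASCII letter") that shows the kind of decision learn makes for each pattern:

```python
import re

# Assumption: "English" means at least one ASCII letter.
_ASCII_LETTER = re.compile(r"[A-Za-z]")


def contains_english(text):
    """Return True if text has at least one ASCII letter (assumed rule)."""
    return _ASCII_LETTER.search(text) is not None


def normalize_pattern(pattern):
    """Mirror the branch in learn(): upper-case patterns that contain
    English, leave other patterns untouched, and never insert spaces."""
    if contains_english(pattern):
        return pattern.upper()
    return pattern


print(normalize_pattern("打开wifi"))  # 打开WIFI
print(normalize_pattern("你好"))      # 你好
```

If the installed aiml package does not provide _check_contain_english, a helper along these lines could be added to the subclass, though the exact rule the original method applies may differ.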

Usage:

from .myAiml import myAiml

self.__alice__ = myAiml()  # create the alice bot object
self.__alice__.learn('startup.xml')  # load startup.xml
self.__alice__.respond('这里是目录')  # load the corpus under the directory

It is called the same way as the normal Kernel.

Author: lianganton
