I Used 100 Lines of Python to Make Awkward Small Talk with My Crush on WeChat (Code Included)

Latest recommended article published 2024-08-04 19:58:19

weixin_39952182



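This article shows how to build an automatic chatbot in Python, mainly using the pickle, json, jieba and gensim libraries. It loads and trains on dialogue data and builds a TF-IDF model and a similarity index. Given an input question, it returns a predicted answer. It also provides utility functions for reading and writing files, so you can customize the dialogue content. The code is included and can be run directly.

(Summary generated automatically by CSDN.)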

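Many people in my WeChat Moments want to learn Python, and a big reason is that it is very beginner-friendly. For developing AI algorithms, Python has a real advantage over many other languages: the code is short, so developers can concentrate on the algorithms themselves.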

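This article introduces a small program, written in Python, that automatically makes awkward small talk with a pretty girl. Everything below is practical material. I wrote it in my spare time and refined it over several rounds, and now I'm sharing it with everyone. Let's get started!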

Preparation:

IDE: PyCharm

Python version: 3.6.0

Third-party packages: jieba and gensim (both installable with pip)

First, create a new .py file named ai_chat.py.

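PS: If you copy the code from the five steps below into a single .py file, it will run as-is. To make it easy to follow along, I've pasted all of the code, along with screenshots of how it is laid out in PyCharm.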

Step 1: Import the key packages

```python
# encoding:utf-8
import json
import jieba
import pickle
from gensim import corpora, models, similarities
from os.path import exists
from warnings import filterwarnings

filterwarnings('ignore')  # don't print warnings
```

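Here is a quick introduction to what these packages do. pickle serializes data to a file and deserializes it back. The file is not human-readable (open it in Notepad and you see gibberish), but the computer reads it very quickly. json, by contrast, is a text serialization format. It is human-readable and easy to edit: open it in Notepad and you can see and understand everything in it. gensim is a Python package for natural language processing. It is simple to use and a common first package for getting into NLP algorithms. jieba does Chinese word segmentation. Experienced algorithm developers consider its accuracy only average, but it is very fast and well suited to beginners.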
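To make the pickle/json contrast concrete, here is a small stdlib-only sketch that serializes one dialogue pair both ways. The sample record is only an illustration borrowed from the corpus used later.

```python
import json
import pickle

record = {"question": "在吗？", "answer": "亲，在的。"}

# pickle: binary bytes, fast to load but gibberish in a text editor
blob = pickle.dumps(record)
# json: plain text; ensure_ascii=False keeps the Chinese characters readable instead of \uXXXX escapes
text = json.dumps(record, ensure_ascii=False)

print(type(blob).__name__)  # bytes
print(text)                 # {"question": "在吗？", "answer": "亲，在的。"}

# Both formats round-trip back to an equal dict
assert pickle.loads(blob) == record
assert json.loads(text) == record
```

This is why the article saves the trained model with pickle but keeps the editable corpus in json.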

(Screenshot: the imports as laid out in PyCharm)

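None of these packages is the main point, so you can skip them at first. Once you understand the overall flow of the program, you can read each package's documentation as you need it.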

Step 2: Static configuration

Here, path is where the dialogue corpus (training data) is stored, and model_path is where the trained model is saved.

```python
class CONF:
    path = '对话语料.json'          # corpus (training data) path
    model_path = '对话模型.pk'      # saved-model path
```

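This is a personal coding habit: I put configuration such as file paths, model save paths and model parameters together in one class. In a real project these settings would live in a config file rather than being written directly into the code. Here they are kept together to keep the demonstration simple and so the script runs on its own.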
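As a minimal sketch of the config-file approach, the standard library's configparser can fill the same two attributes. The config text, the [chat] section name and the FileConf class are all hypothetical; they are not part of the original project.

```python
import configparser

# Hypothetical contents of a config.ini; in a real project this would be a separate file
CONFIG_TEXT = """
[chat]
path = 对话语料.json
model_path = 对话模型.pk
"""


class FileConf:
    """Hypothetical replacement for CONF: same attribute names, values read from a config file."""

    def __init__(self, parser):
        section = parser["chat"]
        self.path = section.get("path")
        self.model_path = section.get("model_path")


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)  # for a real file: parser.read("config.ini", encoding="utf-8")
conf = FileConf(parser)
print(conf.path, conf.model_path)
```

Because Model.initialize() only reads config.path and config.model_path, an instance like conf could be passed in place of the CONF class.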

(Screenshot: the config class as laid out in PyCharm)

Step 3: Write a class that combines data loading, model training and dialogue prediction

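On the first run, the class reads the training-data path from the static configuration, loads the data, trains the model, and saves the trained model to the configured model path. On later runs it loads the saved model directly, so there is no need to train again.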
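Stripped of the NLP details, the "train once, load afterwards" behaviour is a simple caching pattern. The sketch below is a generic illustration with hypothetical names (load_or_build, build); the real class does the same thing with the exists() check inside initialize().

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build):
    """Return (obj, built): load obj from cache_path if it exists, otherwise build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh), False  # loaded from the cache
    obj = build()
    with open(cache_path, "wb") as fh:
        pickle.dump(obj, fh)
    return obj, True  # freshly built (the "training" step ran)


calls = []


def build():
    calls.append(1)  # record how often the expensive step runs
    return {"answer": ["亲，在的。"]}


cache = os.path.join(tempfile.mkdtemp(), "model.pk")
first, built1 = load_or_build(cache, build)
second, built2 = load_or_build(cache, build)
print(built1, built2, len(calls))  # True False 1
```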

```python
class Model:
    def __init__(self, question, answer, dictionary, tfidf, index):
        self.dictionary = dictionary    # gensim dictionary (token -> id)
        self.tfidf = tfidf              # TF-IDF model built on the bag-of-words corpus
        self.index = index              # similarity index over the sparse TF-IDF matrix
        self.question = question        # corpus: list of questions
        self.answer = answer            # corpus: list of answers (one-to-one with the questions)

    @classmethod
    def initialize(cls, config):
        """Initialize the model: load it if it was saved, otherwise train and save it."""
        if exists(config.model_path):
            # Load the saved model
            question, answer, dictionary, tfidf, index = cls.__load_model(config.model_path)
        else:
            # Read the corpus (get_data() creates a sample corpus file if none exists yet)
            if exists(config.path):
                data = load_json(config.path)
            else:
                data = get_data(config.path)
            # Train the model
            question, answer, dictionary, tfidf, index = cls.__train_model(data)
            # Save the model
            cls.__save_model(config.model_path, question, answer, dictionary, tfidf, index)
        return cls(question, answer, dictionary, tfidf, index)

    @staticmethod
    def __train_model(data):
        """Train the model."""
        # Split the corpus into questions and answers
        question_list = [line['question'] for line in data]
        answer_list = [line['answer'] for line in data]
        # Segment each question into words with jieba (whitespace tokens are dropped by split())
        tall = [" ".join(jieba.cut(q)).split() for q in question_list]
        # Build the dictionary (token -> id) from the segmented questions
        dictionary = corpora.Dictionary(tall)
        # Bag-of-words vectors via gensim's doc2bow
        corpus = [dictionary.doc2bow(text) for text in tall]
        # Compute the IDF of every feature that appears in corpus
        tfidf = models.TfidfModel(corpus)
        # Number of features = vocabulary size
        num = len(dictionary.token2id)
        # Index the sparse TF-IDF matrix for similarity queries
        index = similarities.SparseMatrixSimilarity(tfidf[corpus], num_features=num)
        return question_list, answer_list, dictionary, tfidf, index

    @staticmethod
    def __save_model(model_path, question, answer, dictionary, tfidf, index):
        """Save the model."""
        model = {
            'question': question,
            'answer': answer,
            'dictionary': dictionary,
            'tfidf': tfidf,
            'index': index,
        }
        with open(model_path, "wb") as fh:
            pickle.dump(model, fh)

    @staticmethod
    def __load_model(model_path):
        """Load the model."""
        with open(model_path, "rb") as fh:
            model = pickle.load(fh)
        return (model['question'], model['answer'], model['dictionary'],
                model['tfidf'], model['index'])

    def get_answer(self, question, dialog_id=1):
        """Return the answer to a question."""
        # Segment the input question the same way as the training questions
        words = " ".join(jieba.cut(question)).split()
        # Bag-of-words -> TF-IDF -> similarity against every corpus question
        new_vec = self.dictionary.doc2bow(words)
        sim = self.index[self.tfidf[new_vec]]
        # Take the answer paired with the most similar question
        position = sim.argsort()[-1]
        answer = self.answer[position]
        return answer, dialog_id
```

Let's go through the Model class one piece at a time.

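initialize() and __init__() handle object initialization and instantiation. __init__() only assigns the five basic attributes. initialize() is a class method: it loads the saved model if one exists, otherwise it reads the corpus, trains the model and saves it, and finally it returns an object to the caller.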
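The split between a plain __init__() and a factory class method can be seen in isolation in the miniature below. TinyModel and its stand-in train/load steps are hypothetical and skip jieba and gensim entirely. The sketch also shows why cls.__load_model() works inside the class: names starting with a double underscore are mangled to _ClassName__name.

```python
class TinyModel:
    """Hypothetical miniature of Model: same initialize()/__init__ split, no jieba/gensim."""

    def __init__(self, answers):
        # __init__ only stores state that has already been prepared
        self.answers = answers

    @classmethod
    def initialize(cls, config):
        # initialize() decides where the state comes from, then calls cls(...)
        if "saved" in config:
            answers = cls.__load(config)
        else:
            answers = cls.__train(config)
        return cls(answers)

    @staticmethod
    def __train(config):
        # stand-in for real training
        return [a.strip() for a in config["raw"]]

    @staticmethod
    def __load(config):
        # stand-in for reading a pickled model
        return config["saved"]


trained = TinyModel.initialize({"raw": ["  亲，在的。 "]})
loaded = TinyModel.initialize({"saved": ["在想你呀！"]})
print(trained.answers, loaded.answers)
# Name mangling: the method is stored as _TinyModel__train, not __train
print(hasattr(TinyModel, "__train"), hasattr(TinyModel, "_TinyModel__train"))  # False True
```

Keeping __init__() this simple also means the class can be built directly from prepared data, for example in tests.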

(Screenshot: initialize() and __init__() as laid out in PyCharm)

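__train_model() segments the questions into words, uses gensim to build a bag-of-words model, computes the TF-IDF weight of each feature, and builds a sparse-matrix similarity index on top of those weights.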
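To show what the bag-of-words, TF-IDF and similarity-index steps compute, here is a pure-Python sketch of the same retrieval idea. It is not gensim's implementation. The weighting (term count times log2(N/df), then L2 normalization, compared by cosine similarity) is my understanding of gensim's TfidfModel defaults, so treat it as an approximation. The token lists are hand-segmented stand-ins for jieba output.

```python
import math
from collections import Counter


def to_tfidf(tokens, idf):
    """Sparse TF-IDF vector as a dict, L2-normalized; words not in the vocabulary are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items() if idf[t] > 0}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two L2-normalized sparse vectors, i.e. their cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def build_index(docs):
    """docs: list of token lists. Returns the idf table and one normalized vector per doc."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    idf = {tok: math.log2(n / d) for tok, d in df.items()}
    return idf, [to_tfidf(doc, idf) for doc in docs]


def best_answer(query_tokens, idf, vectors, answers):
    """Return the answer paired with the most similar question, plus that similarity."""
    q = to_tfidf(query_tokens, idf)
    sims = [cosine(q, v) for v in vectors]
    best = max(range(len(sims)), key=sims.__getitem__)
    return answers[best], sims[best]


# Hand-segmented questions (illustrative stand-ins for jieba output) and the article's answers
questions = [["在", "吗", "？"], ["在", "干嘛", "？"], ["我", "饿", "了"], ["我", "想", "看", "电影", "。"]]
answers = ["亲，在的。", "在想你呀！", "来我家，做饭给你吃~", "来我家，我家有30寸大电视。"]
idf, vectors = build_index(questions)
print(best_answer(["我", "肚子", "饿", "了"], idf, vectors, answers))
```

The unknown word 肚子 is simply ignored, which is why "我肚子饿了" still matches "我饿了". If none of the query's words are in the vocabulary, every similarity is 0 and an answer comes back regardless of relevance. The article's get_answer() has the same blind spot.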

(Screenshot: __train_model() as laid out in PyCharm)

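__save_model() and __load_model() come as a pair, and many projects have these two functions for saving and loading a model. In this project the model is stored in a file, whereas a production deployment would use a database.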
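As one possible illustration of database storage, the sketch below keeps the pickled model dict in SQLite using only the standard library. The models table, its columns and the function names are hypothetical. A real deployment might use a different database and schema.

```python
import pickle
import sqlite3


def save_model_db(conn, name, model):
    """Store a pickled model dict in a hypothetical models table, keyed by name."""
    conn.execute("CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, blob BLOB)")
    conn.execute("INSERT OR REPLACE INTO models (name, blob) VALUES (?, ?)",
                 (name, pickle.dumps(model)))
    conn.commit()


def load_model_db(conn, name):
    """Return the unpickled model, or None if nothing was saved under that name."""
    row = conn.execute("SELECT blob FROM models WHERE name = ?", (name,)).fetchone()
    return pickle.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")
save_model_db(conn, "chat", {"question": ["在吗？"], "answer": ["亲，在的。"]})
print(load_model_db(conn, "chat"))
print(load_model_db(conn, "missing"))  # None
```

Only the storage medium changes. The pickled dict has the same shape as the one __save_model() writes to disk.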

(Screenshot: __save_model() and __load_model() as laid out in PyCharm)

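get_answer() uses the trained model to analyze a question and returns the predicted answer to the user. It segments the question, converts it to a TF-IDF vector, finds the most similar question in the index, and returns the answer paired with that question.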

(Screenshot: get_answer() as laid out in PyCharm)

Step 4: Write three utility functions for reading and writing files

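One of them supplies the dialogue material. You can change that dialogue content freely and use it as the machine's training data. I've only provided a few simple dialogue pairs here. A real production project needs a large training corpus for the conversations to feel rich.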

```python
def load_json(filename, encoding='utf-8'):
    """Read JSON data."""
    with open(filename, encoding=encoding) as file_obj:
        rnt = json.load(file_obj)
    return rnt['data']


def save_json(filename, data, encoding='utf-8'):
    """Save JSON data."""
    with open(filename, 'w', encoding=encoding) as file_obj:
        json.dump({"data": data}, file_obj, ensure_ascii=False)


def get_data(filename):
    """Get the dialogue material and write it to filename as JSON."""
    # question_list and answer_list correspond one-to-one
    question_list = ["在吗？", "在干嘛？", "我饿了", "我想看电影。"]
    answer_list = ["亲，在的。", "在想你呀！", "来我家，做饭给你吃~", "来我家，我家有30寸大电视。"]
    data = []
    for question, answer in zip(question_list, answer_list):
        data.append({'question': question, 'answer': answer})
    save_json(filename, data)
    return data
```

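These three utility functions are relatively simple. I made up the data in get_data() myself. You can add your own dialogue pairs, and the trained model's replies will then come closer to the way you talk. Keep in mind that get_data() only runs when 对话语料.json does not exist yet, and initialize() skips training whenever 对话模型.pk exists. After changing the dialogue data, edit 对话语料.json (or delete it so get_data() regenerates it). Then delete 对话模型.pk so the model is retrained.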
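Here is a minimal sketch of adding your own dialogue pairs directly to the corpus file. It writes the same {"data": [...]} layout that load_json() reads and removes any stale model file. The write_corpus helper and the sample pairs are hypothetical illustrations, and temporary paths stand in for the real 对话语料.json and 对话模型.pk.

```python
import json
import os
import tempfile


def write_corpus(path, pairs, model_path=None):
    """Write (question, answer) pairs in the {"data": [...]} layout that load_json() reads.
    If model_path is given, delete the stale pickled model so the next run retrains."""
    data = [{"question": q, "answer": a} for q, a in pairs]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"data": data}, fh, ensure_ascii=False, indent=2)
    if model_path and os.path.exists(model_path):
        os.remove(model_path)
    return data


folder = tempfile.mkdtemp()
corpus_path = os.path.join(folder, "对话语料.json")
model_path = os.path.join(folder, "对话模型.pk")
open(model_path, "wb").close()  # pretend an old model was already saved

write_corpus(corpus_path, [("晚安", "晚安，好梦~"), ("在吗？", "一直在。")], model_path)
with open(corpus_path, encoding="utf-8") as fh:
    print(json.load(fh)["data"][0])  # {'question': '晚安', 'answer': '晚安，好梦~'}
print(os.path.exists(model_path))    # False
```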

(Screenshot: the three utility functions as laid out in PyCharm)

Step 5: Call the model to predict replies

```python
if __name__ == '__main__':
    model = Model.initialize(config=CONF)
    question_list = ["在吗？", "在干嘛？", "我饿了", "我肚子饿了", "我肚子好饿", "有好看电影介绍吗？我想看"]
    for line in question_list:
        rnt, dialog_id = model.get_answer(line)
        # \033[31m and \033[36m are ANSI codes for red and cyan text; \033[0m resets the color
        print("\033[31m女神：%s\033[0m" % line)
        print("\033[36m尬聊：%s\033[0m" % rnt)
```

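The if __name__ == '__main__': block serves as the main function. The whole program starts running there when you execute the file, and it controls all of the steps.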
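If you prefer an actual main() function, one option is to move the loop into it and call it from the if __name__ == '__main__': guard. That also makes the loop testable with a fake model. EchoModel and this main() signature are hypothetical; with the real code you would pass in Model.initialize(config=CONF).

```python
class EchoModel:
    """Hypothetical stand-in for Model, so this sketch runs without jieba or gensim."""

    def get_answer(self, question, dialog_id=1):
        return "你说：" + question, dialog_id


def main(model, questions):
    """Ask the model each question, print the colored exchange, and return the printed lines."""
    lines = []
    for q in questions:
        answer, _ = model.get_answer(q)
        lines.append("\033[31m女神：%s\033[0m" % q)       # red: the incoming message
        lines.append("\033[36m尬聊：%s\033[0m" % answer)  # cyan: the bot's reply
    for text in lines:
        print(text)
    return lines


if __name__ == "__main__":
    main(EchoModel(), ["在吗？"])
```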