Playing with Ruisi, Part 2: Generating Reply Messages

hhjhh76

Category: python. Tags: automatic post reply, python auto-reply

Article link: https://blog.csdn.net/hhjhh76/article/details/85616941

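Ruisi (the Xidian University BBS) is a great platform that gives Xidian students a lot of fun outside of research. While idling on Ruisi I wondered whether a program could reply to posts automatically, and that is how the "Playing with Ruisi" blog series came about. Automatic replying on Ruisi consists of three parts:
Playing with Ruisi, Part 1: Replying by simulating a browser
Playing with Ruisi, Part 2: Generating reply messages
Playing with Ruisi, Part 3: Replying automatically and selectively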

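In Part 1 the reply message was written by hand. Can it be generated automatically? This post presents two ways to generate reply messages automatically.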

I. Getting the reply message from the Tuling robot

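Tuling Robot (图灵机器人) provides a free chatbot API. A post's title is short and carries the post's main information, so we send the title to the robot and use the robot's answer as the reply to the post.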

1. Getting the post title

The title can be found in the source of the post page:

<title>这个怎么参考啊 - 西电睿思灌水专区 -  西电睿思BBS -  Powered by Discuz!</title>

The code to extract it is as follows:

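from bs4 import BeautifulSoup

def get_title(htmlText):
    # Parse the page and locate the <title> element
    soup = BeautifulSoup(htmlText, features="html5lib")
    line = soup.find('title')

    if not line:
        print('get title error')
        return None

    # The thread title is the part before the first '-';
    # note that a title which itself contains '-' gets truncated here
    title = line.get_text().split('-')[0].strip()
    return title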
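
Splitting on the first '-' cuts off titles that contain a hyphen of their own. One possible alternative, sketched below, splits from the right instead. It assumes the page title always ends with three ' - '-separated parts (board name, site name, "Powered by Discuz!"), as in the sample above; extract_thread_title and suffix_parts are hypothetical names, not part of the original code.

```python
def extract_thread_title(raw_title, suffix_parts=3):
    """Return the thread title from the text of a Discuz! <title> tag.

    Assumption: the site appends `suffix_parts` segments joined by ' - '.
    """
    # Split at most `suffix_parts` times from the right, so hyphens inside
    # the thread title itself are left alone.
    parts = raw_title.rsplit(' - ', suffix_parts)
    return parts[0].strip()


sample = '这个怎么参考啊 - 西电睿思灌水专区 -  西电睿思BBS -  Powered by Discuz!'
print(extract_thread_title(sample))
print(extract_thread_title('A-B test - board -  site -  Powered by Discuz!'))
```

If the board ever changes the number of suffix segments, suffix_parts would have to be adjusted, so this trades one assumption for another.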

2. Getting the robot's reply

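The Tuling Robot website recommends a newer API, but the old one still worked at the time of writing. The old API takes three fields: key, which identifies the robot; info, the message content; and userid, a value you define yourself. The key is generated when you register a robot.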

import requests

def get_automessage(message):
    tulingUrl = 'http://www.tuling123.com/openapi/api'
    data = {
        'key'   : '',                  # fill in your own robot key
        'info'  : message,             # the text sent to the robot
        'userid': 'ruisireply-robot'   # caller-defined user id
    }
    # POST the form data and decode the JSON answer
    getmessage = requests.post(tulingUrl, data=data).json()
    return getmessage

Next, use the robot's answer to reply to the post:

# Reply using the Tuling robot
def reply_tuling(message,formhash,fid,tid,cookie,userAgent):  
    replyMes = get_automessage(message)['text']
    rText = my_post(replyMes,formhash,fid,tid,cookie,userAgent)
    return rText
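
reply_tuling indexes the response with ['text'], which raises KeyError if that field is missing for any reason. A minimal defensive sketch follows; extract_reply_text and the fallback value are hypothetical, and the only thing taken from the original code is that the answer is expected under the 'text' key.

```python
def extract_reply_text(response, fallback=None):
    """Return the robot's answer from a decoded JSON response, or `fallback`."""
    # Anything that is not a dict cannot carry a 'text' field.
    if not isinstance(response, dict):
        return fallback
    text = response.get('text')
    # Only accept a non-empty string as a usable reply.
    if isinstance(text, str) and text.strip():
        return text.strip()
    return fallback


print(extract_reply_text({'text': ' hello '}))
print(extract_reply_text({'code': 0}, fallback='(no reply)'))
```

A caller could then skip posting when the function returns the fallback instead of crashing in the middle of a run.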

II. Generating a reply from the post's existing replies

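The robot API gives decent replies, but the robot's intelligence is limited and its replies still differ noticeably from a human's. Ruisi's board rules are strict, so the replies other users have already left in a post can be assumed to be acceptable. The idea is to pick one existing reply at random and then generate a sentence with a similar meaning to use as our reply.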

1. Getting the post's existing replies

The reply text of each floor is in the text of a td tag with class="t_f":

<td class="t_f" id="postmessage_25179073">
沙发是我的<img src="static/image/smiley/jgz/jgz088.png" smilieid="754" border="0" alt="" /></td>

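get_replyed_message returns messageList, the reply on each floor. If htmlText is the first page of the post, messageList[0] is floor 1, i.e. the original poster's message.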

def get_replyed_message(htmlText):
    soup = BeautifulSoup(htmlText,features="html5lib")
    lines = soup.find_all('td', class_='t_f')

    if not lines:
        print('get replyed message error')
        return None
        
    messageList = []
    for lin in lines:
        message = lin.get_text()
        messageList.append(message)

    return messageList
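
For readers without BeautifulSoup and html5lib installed, the same extraction can be approximated with the standard library's html.parser. This is only a sketch: ReplyTextParser and extract_replies are hypothetical names, and it assumes reply cells are well-formed <td class="t_f"> elements like the sample above.

```python
from html.parser import HTMLParser


class ReplyTextParser(HTMLParser):
    """Collect the text inside every <td class="t_f"> element."""

    def __init__(self):
        super().__init__()
        self.messages = []
        self._depth = 0    # > 0 while inside a reply cell (counts nested <td>)
        self._chunks = []  # text fragments of the current reply

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == 'td':
                self._depth += 1
        elif tag == 'td' and 't_f' in (dict(attrs).get('class') or '').split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if self._depth and tag == 'td':
            self._depth -= 1
            if self._depth == 0:
                self.messages.append(''.join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_replies(html_text):
    parser = ReplyTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.messages


page = ('<table><tr><td class="t_f" id="postmessage_1">\n'
        '沙发是我的<img src="a.png" alt="" /></td>'
        '<td class="plc">x</td>'
        '<td class="t_f" id="postmessage_2">second <b>reply</b></td></tr></table>')
print(extract_replies(page))
```

Unlike html5lib, this parser does not repair broken markup, so it is best treated as a fallback for simple pages.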

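The code above only collects the replies on the single page in htmlText. Ruisi user "shnmng" contributed a way to collect every reply in a post: first get the post's total page count, then build the URL of each page from the post's tid and the page number, fetch each page, and gather all the replies.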

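import re
from bs4 import BeautifulSoup

# Get the total number of pages in the post
def get_pages(htmlText):
    '''
    htmlText: content of the post's first page
    '''
    soup = BeautifulSoup(htmlText, features="html5lib")
    pgs = soup.find(id='pgt')
    if not pgs:
        return 1
    pgs = pgs.find('span')
    if not pgs:
        return 1
    pgs = pgs.get_text().strip()

    # Raw string so that \d is not treated as a string escape
    pgs = re.search(r"\d+", pgs)
    if not pgs:
        return 1
    pgs = int(pgs.group())
    return pgs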
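
The last step of get_pages, pulling the number out of the span's text, can be tried in isolation. The sample string ' / 12 页' is an assumed example of what that span might contain, not copied from a real page, and parse_page_count is a hypothetical helper.

```python
import re


def parse_page_count(span_text):
    """Return the first integer found in span_text, or 1 if there is none."""
    match = re.search(r'\d+', span_text or '')
    return int(match.group()) if match else 1


print(parse_page_count(' / 12 页'))
print(parse_page_count(''))
```

Falling back to 1 mirrors get_pages, which treats a missing pager as a single-page post.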

# Get all replies in the post
def get_all_replyed_message(tid,cookie,userAgent):
    url = 'http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid='+str(tid)
    htmlText = get_html(url,cookie,userAgent)
    pgs = get_pages(htmlText)
    messageList = []
    for pg in range(1,pgs+1):
        pg_url= 'http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid={TID}&page={PAGE}'.format(TID=tid,PAGE=pg)
        pg_htmlText = get_html(pg_url,cookie,userAgent)
        pg_messageList = get_replyed_message(pg_htmlText)
        # get_replyed_message returns None when a page yields no replies,
        # so test the list itself before extending
        if pg_messageList:
            messageList.extend(pg_messageList)
    return messageList
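
The per-page URLs can also be assembled with urllib.parse.urlencode rather than string formatting. The sketch below only builds the URLs and does no network access; build_page_urls and BASE_URL are hypothetical names, and the query parameters are the ones used in the code above.

```python
from urllib.parse import urlencode

BASE_URL = 'http://rs.xidian.edu.cn/forum.php'


def build_page_urls(tid, pages):
    """Return the viewthread URL of every page, from 1 to `pages` inclusive."""
    urls = []
    for page in range(1, pages + 1):
        query = urlencode({'mod': 'viewthread', 'tid': tid, 'page': page})
        urls.append(BASE_URL + '?' + query)
    return urls


for url in build_page_urls(982549, 2):
    print(url)
```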

2. Generating a similar sentence

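To generate a sentence close in meaning to mes: segment mes into words, look up near-synonyms for each word, find the candidate with the highest similarity score across all words, and substitute it for the original word. jieba provides Chinese word segmentation, and synonyms finds near-synonyms. As used below, synonyms.nearby(w) returns a list of near-synonyms of w together with a list of their similarity scores.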

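import jieba
import synonyms

def message_similar(mes):
    wordList = list(jieba.cut(mes))   # synonyms.seg could be used instead

    replace_index = None   # position of the word to replace (None: nothing found)
    replace_p = 0          # best similarity score seen so far
    replace_word = ''
    for i in range(len(wordList)):
        near_w,near_p = synonyms.nearby(wordList[i])
        if not near_p:
            continue
        for j in range(len(near_w)):
            # a score of 1.0 is the word itself, so skip it
            if near_p[j] == 1.0:
                continue
            if replace_p < near_p[j]:
                replace_p = near_p[j]
                replace_index = i
                replace_word = near_w[j]

    # compare with None so that a match at index 0 is still replaced
    if replace_index is not None:
        wordList[replace_index] = replace_word
    return ''.join(wordList)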
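
The selection step can be exercised without jieba or synonyms by passing in a stand-in lookup function. In the sketch below, replace_best_synonym, toy_nearby, and the words and scores in TOY_TABLE are all made up for illustration; only the (words, scores) return shape follows the way synonyms.nearby is used above.

```python
def replace_best_synonym(words, nearby):
    """Replace the one word whose best near-synonym has the highest score."""
    best_score, best_index, best_word = 0.0, None, ''
    for i, word in enumerate(words):
        candidates, scores = nearby(word)
        for cand, score in zip(candidates, scores):
            # Skip the word itself (reported with score 1.0).
            if cand == word or score >= 1.0:
                continue
            if score > best_score:
                best_score, best_index, best_word = score, i, cand
    if best_index is None:
        return ''.join(words)
    result = list(words)
    result[best_index] = best_word
    return ''.join(result)


# Made-up synonym table standing in for synonyms.nearby.
TOY_TABLE = {
    '沙发': (['沙发', '座椅'], [1.0, 0.8]),
    '我': (['我', '俺'], [1.0, 0.6]),
}


def toy_nearby(word):
    return TOY_TABLE.get(word, ([], []))


print(replace_best_synonym(['沙发', '是', '我', '的'], toy_nearby))
```

Tracking the index as None rather than 0 matters here, because the best candidate in this toy input belongs to the first word.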

3. Replying with the similar sentence

Pick a sentence at random from messageList and reply with a sentence of similar meaning:

def reply_random_similar(messageList,formhash,fid,tid,cookie,userAgent):
    if not messageList:
        print('messageList is error')
        return None

    message = random.choice(messageList).strip()  # pick one existing reply at random
    message = message_similar(message)
    
    rText = my_post(message,formhash,fid,tid,cookie,userAgent)
    return rText
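
Picking a reply can be made a little more careful by dropping blank entries and the original post before choosing. In this sketch pick_reply_candidate is a hypothetical helper; skipping floor 1 inside the helper is a design choice made here, and the rng parameter exists only so the choice can be made reproducible.

```python
import random


def pick_reply_candidate(messageList, rng=random):
    """Return a random non-empty reply, ignoring floor 1 (the original post)."""
    candidates = [m.strip() for m in messageList[1:] if m and m.strip()]
    if not candidates:
        return None
    return rng.choice(candidates)


msgs = ['original post', '  first reply ', '', 'second reply']
print(pick_reply_candidate(msgs, random.Random(0)))
```

Returning None when nothing usable is left lets the caller decide whether to skip the thread.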

III. Experiment

Put all the functions needed into AutoReply.py:

import requests
import re
import time
from bs4 import BeautifulSoup
import random
import jieba
import synonyms

def get_html(url,cookie,userAgent):
    pass
def get_info(info_type,htmlText):
    pass
def my_post(message,formhash,fid,tid,cookie,userAgent):
    pass
def get_title(htmlText):
    pass
def get_automessage(message):
    pass
def reply_tuling(message,formhash,fid,tid,cookie,userAgent):
    pass
def get_replyed_message(htmlText):
    pass
def message_similar(mes):
    pass
def reply_random_similar(messageList,formhash,fid,tid,cookie,userAgent):
    pass

<1> Replying with the Tuling robot

import AutoReply

cookie = ''
userAgent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'

tid = 982549
url = 'http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid={tid}'.format(tid = tid)

htmlText = AutoReply.get_html(url,cookie,userAgent)
fid = AutoReply.get_info('fid',htmlText)
formhash = AutoReply.get_info('formhash',htmlText)
title = AutoReply.get_title(htmlText)

AutoReply.reply_tuling(title,formhash,fid,tid,cookie,userAgent)

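The reply to post 982549 succeeded. The post's title was "【公告】喜迎2019全站Free活动" (an announcement of a site-wide Free event for 2019), and the Tuling robot's reply was "我读书少，不知道你在说什么。" ("I haven't read much, so I don't know what you're talking about.").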

<2> Generating a reply from the post's existing replies

import AutoReply

cookie = ''
userAgent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'

tid = 982549
url = 'http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid={tid}'.format(tid = tid)

htmlText = AutoReply.get_html(url,cookie,userAgent)
fid = AutoReply.get_info('fid',htmlText)
formhash = AutoReply.get_info('formhash',htmlText)
messageList = AutoReply.get_replyed_message(htmlText)

AutoReply.reply_random_similar(messageList[1:],formhash,fid,tid,cookie,userAgent)