Scraping jokes from Qiushibaike (糗事百科), printing one entry each time Enter is pressed

Article link: https://blog.csdn.net/chao_qing/article/details/77986887

This is a simple crawler that scrapes jokes from Qiushibaike and prints one entry each time you press Enter.
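The press-Enter-to-continue flow can be tried on its own, without any network access. What follows is a minimal sketch rather than the script's own code: `show_one_by_one`, `read`, and `write` are hypothetical names introduced here, and the input and output functions are passed in as parameters only so the loop can be driven by canned input.

```python
def show_one_by_one(items, read=input, write=print):
    """Print each item, then wait for a bare Enter before moving on.

    `read` and `write` default to the built-ins; they are parameters only so
    the loop can be exercised with canned input (an assumption of this sketch).
    Returns the number of items shown.
    """
    shown = 0
    for item in items:
        write(item)
        shown += 1
        # Ignore any non-empty line; only an empty line (bare Enter) continues.
        while read() != "":
            pass
    return shown
```

Calling `show_one_by_one(["joke 1", "joke 2"])` in a terminal prints the first string, waits for Enter, then prints the second.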

#!/usr/bin/env python
#-*- coding:utf-8 -*-

'''
author: superWang
date: 2017-09-14
re module: 2.2.1
requests module: 2.18.4
bs4 module: 4.6.0
Scrape jokes from Qiushibaike
'''

import requests
from bs4 import BeautifulSoup
import re  # imported by the original script; not used below

class GetQiuShiBaiKeInfo():

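    # Fetch the page at url and print the jokes it contains
    def getPageInfo(self,url,page):
        # The timeout keeps the request from hanging forever if the server does not respond
        res = requests.get(url, timeout=10)
        #print(res.text)
        # The "html5lib" parser requires the html5lib package to be installed
        soup = BeautifulSoup(res.text,"html5lib")
        articles = soup.select(".article")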
        for article in articles:
            # Author name
            author = article.select(".author")[0].select("h2")[0].text.strip()
            # Joke text
            content = article.select(".content span")[0].text.strip()
            # Number of "funny" votes
            stats_vote = article.select(".stats .stats-vote i")[0].text
            # Number of comments
            stats_comments = article.select(".stats .stats-comments a i")[0].text
            print(author + "\tContent:" + content + "\tFunny:" + stats_vote + "\tComments:" + stats_comments)
            # Wait here until the user presses Enter on an empty line
            while True:
                input1 = input()
                if input1 == '':
                    break
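        print("This page is finished. Continue to the next page? y: yes, n: no")
        while True:
            input2 = input()
            if input2 == 'y':
                page = page +1
                self.getInfo(page)
                # Stop once the nested call returns; otherwise every page
                # already viewed would need its own 'n' before the program exits
                break
            elif input2 == 'n':
                break
            else:
                # Any other input is ignored and the loop keeps waiting
                pass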

    # Build the URL for the given page number and show that page
    def getInfo(self,page):
        url = "https://www.qiushibaike.com/hot/page/"+str(page)+"/"
        self.getPageInfo(url,page)


if __name__ == '__main__':
    a = GetQiuShiBaiKeInfo()
    a.getInfo(1)
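
The CSS selectors used in `getPageInfo` can be checked offline against a small HTML fragment. The sketch below is illustrative only: `SAMPLE_HTML` is hand-written to mimic the structure the selectors expect (it is not markup captured from the site), and `first_text` and `parse_articles` are hypothetical helper names. It also swaps the `[0]` indexing, which raises `IndexError` when a selector matches nothing, for `select_one` with a default value.

```python
from bs4 import BeautifulSoup

# Hand-written fragment mimicking the structure the selectors expect.
# This is an assumption for illustration, not markup taken from the site.
SAMPLE_HTML = """
<div class="article">
  <div class="author"><h2> Alice </h2></div>
  <div class="content"><span> A short joke. </span></div>
  <div class="stats">
    <span class="stats-vote"><i>120</i></span>
    <span class="stats-comments"><a href="#"><i>7</i></a></span>
  </div>
</div>
<div class="article">
  <div class="content"><span>No author block here.</span></div>
</div>
"""


def first_text(node, selector, default=""):
    """Return the stripped text of the first match, or `default` if none."""
    found = node.select_one(selector)
    return found.get_text().strip() if found else default


def parse_articles(html):
    """Extract author, content, votes and comments from every .article node."""
    # "html.parser" ships with Python; the script itself uses "html5lib".
    soup = BeautifulSoup(html, "html.parser")
    return [
        {
            "author": first_text(article, ".author h2", "unknown"),
            "content": first_text(article, ".content span"),
            "votes": first_text(article, ".stats .stats-vote i", "0"),
            "comments": first_text(article, ".stats .stats-comments a i", "0"),
        }
        for article in soup.select(".article")
    ]
```

Returning a list of dictionaries instead of printing inside the loop keeps parsing separate from the interactive part, so the selectors can be verified without pressing Enter for every entry.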

Screenshot of the output: