Scraping Qiushibaike Jokes with a Python Crawler

Latest recommended article published on 2020-11-18 12:03:49

AceFreeze


Category: Python. Tags: python, crawler, regular expressions, Qiushibaike

Article link: https://blog.csdn.net/AceFreeze/article/details/78784176


Scraping Qiushibaike jokes with Python's urllib
import re
import urllib.request

def getcontent(url, page):
    # Pretend to be a browser by sending a Firefox User-Agent header
    headers = ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:57.0) Gecko/20100101 Firefox/57.0')
    opener = urllib.request.build_opener()
    opener.addheaders = [headers]
    # Install the opener globally so that urlopen() uses it
    urllib.request.install_opener(opener)
    data = urllib.request.urlopen(url).read().decode("utf-8")
    # Regular expression that captures the joke text
    pat_content = '<div class="content">.*?<span>(.*?)</span>'
    # List of extracted jokes
    contentlist = re.compile(pat_content, re.S).findall(data)
    for content in contentlist:
        content = content.replace('\n', '')
        print(content + '\n\n')


# range(1, 2) fetches only page 1; raise the upper bound to fetch more pages
for i in range(1, 2):
    url = 'http://www.qiushibaike.com/8hr/page/' + str(i)
    getcontent(url, i)
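
The extraction step can be tried offline, without any network request. The following is a minimal sketch that applies the same regular expression to a small HTML string. The helper name `extract_jokes` and the sample markup in `SAMPLE_HTML` are made up for illustration and are not taken from the real site.

```python
import re

# Same pattern as in the script above: a non-greedy match from the content
# div to the first <span>...</span> inside it. re.S lets "." match newlines.
PAT_CONTENT = re.compile(r'<div class="content">.*?<span>(.*?)</span>', re.S)


def extract_jokes(html):
    """Hypothetical helper: return the joke texts found in an HTML string."""
    return [text.replace('\n', '') for text in PAT_CONTENT.findall(html)]


# Made-up HTML fragment for illustration only; not real site markup.
SAMPLE_HTML = '''
<div class="content">
<span>
First joke,
spread over two lines.
</span>
</div>
<div class="content">
<span>Second joke.</span>
</div>
'''

jokes = extract_jokes(SAMPLE_HTML)
for joke in jokes:
    print(joke)
```

Because newlines are removed rather than replaced with spaces, text that spans several lines is joined directly, just as in the original script.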
