Scraping Qiushibaike hot posts with Python 3 and bs4

Latest recommended article published on 2020-11-17 17:38:34

出走半生归来仍是少年

Category: Python crawlers

import urllib.error
import urllib.request

from bs4 import BeautifulSoup

page = 1
url = 'http://www.qiushibaike.com/hot/page/' + str(page)
# Send a browser-like User-Agent header with the request.
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
try:
    req = urllib.request.Request(url=url, headers=headers)
    response = urllib.request.urlopen(req)
    content = response.read().decode('utf-8')
    bsObj = BeautifulSoup(content, "lxml")
    # Select every element whose class is "content".
    items = bsObj.find_all(class_="content")
    i = 0
    for item in items:
        # Earlier regex-based approach, kept for reference (needs `import re`):
        #pattern = re.compile('<div .*?content"><span>(.*?)</span></div>', re.S)
        #it = re.search(pattern, str(item))
        span = item.find('span')
        if span is None:
            continue  # skip "content" elements that have no <span>
        i = i + 1
        print(str(i) + '.' + span.text.strip('\n'))
except urllib.error.URLError as e:
    # HTTPError (a URLError subclass) has .code; every URLError has .reason.
    if hasattr(e, "code"):
        print(e.code)
    if hasattr(e, "reason"):
        print(e.reason)
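
The script above fetches only page 1, although the URL is built from a `page` variable. As a minimal sketch, the same URL prefix and User-Agent header could be wrapped in a small helper that builds requests for several pages. The helper name `build_request`, the constants, and the page range 1 to 3 are hypothetical, and it is an assumption that later pages follow the same `/hot/page/N` pattern. Nothing here touches the network.

```python
import urllib.request

# Same URL prefix and User-Agent string as the script above.
BASE_URL = 'http://www.qiushibaike.com/hot/page/'
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def build_request(page):
    """Return a Request for one listing page (hypothetical helper, no network I/O)."""
    url = BASE_URL + str(page)
    return urllib.request.Request(url=url, headers={'User-Agent': USER_AGENT})


# Assumption: pages 1..3 exist and follow the same URL pattern.
requests_to_send = [build_request(p) for p in range(1, 4)]
for req in requests_to_send:
    print(req.full_url)
```

Each `Request` can then be passed to `urllib.request.urlopen` inside the same `try`/`except urllib.error.URLError` block the script uses, so a failure on one page is reported the same way.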
