Scraping Baidu Tieba

Latest recommended article published on 2023-12-26 16:49:36

半日闲12138


Category: Crawlers | Tags: crawler learning

Article link: https://blog.csdn.net/feiYu12138/article/details/102616099


#coding: utf-8
import urllib.request
import urllib.parse
import time

# http://tieba.baidu.com/f?kw=python&ie=utf-8&pn=0
# http://tieba.baidu.com/f?kw=python&ie=utf-8&pn=50
# http://tieba.baidu.com/f?kw=python&ie=utf-8&pn=100
url = "http://tieba.baidu.com/f?"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36',
}

kw = input("Enter the keyword: ")
start = int(input("Enter the start page: "))
end = int(input("Enter the end page: "))

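for page in range(start, end+1):
    print("Start downloading page %d" % page)
    # pn is the offset of the first thread on the page; it advances by 50
    # per page, as in the sample URLs above
    pn = (page-1)*50
    data = {
        "kw": kw,
        "ie": "utf-8",
        "pn": pn
    }
    url_query = urllib.parse.urlencode(data)
    # print(url_query)
    # Build a fresh URL for every page. Reassigning url itself would append
    # each new query string to the previous page's URL.
    page_url = url + url_query

    request = urllib.request.Request(url=page_url, headers=headers)

    response = urllib.request.urlopen(request)
    content = response.read().decode("utf-8")
    # print(type(content))  # <class 'str'>

    fileFile = "%s_page_%d.html" % (kw, page)

    # content is a str, so the file is opened in text mode ("w"), not "wb"
    with open(fileFile, "w", encoding="utf-8") as fp:
        fp.write(content)

    print("Finished downloading page %d" % page)
    # Optional pause between requests (1 second is an arbitrary choice)
    time.sleep(1)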
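The URL construction can also be pulled into a small helper, which makes it easy to check that each page's address is built from the unchanged base URL. This is only a sketch: the names build_page_url, BASE_URL and PAGE_SIZE are hypothetical and do not appear in the original script, and the step of 50 is taken from the sample URLs in the comments above.

```python
import urllib.parse

BASE_URL = "http://tieba.baidu.com/f?"
PAGE_SIZE = 50  # assumption: pn grows by 50 per page, as in the sample URLs


def build_page_url(kw, page):
    """Return the list-page URL for a 1-based page number (hypothetical helper)."""
    query = urllib.parse.urlencode({
        "kw": kw,
        "ie": "utf-8",
        "pn": (page - 1) * PAGE_SIZE,
    })
    # BASE_URL is never modified, so queries cannot pile up across calls
    return BASE_URL + query


if __name__ == "__main__":
    # Print the URLs for the first three pages of the "python" forum
    for page in range(1, 4):
        print(build_page_url("python", page))
```

Because the helper does no network access, it can be tried on its own before it is wired into the download loop.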

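Error: TypeError: a bytes-like object is required, not 'str'
Fix:
content has already been decoded to a str, so it cannot be written to a file opened in binary mode. Change
    with open(fileFile, "wb") as fp:
        fp.write(content)
to text mode ("w") with an explicit encoding, as in the code above.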
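The rule behind this error is that the data type must match the file mode: a str goes to a file opened with "w", and bytes go to a file opened with "wb". The sketch below shows both valid pairings and reproduces the mismatch without any network access. The sample HTML string, the temporary directory and all variable names are made up for illustration.

```python
import os
import tempfile

# Stand-in for response.read(): raw bytes as they would arrive from the server
raw = "<html>贴吧</html>".encode("utf-8")
# Stand-in for content in the script: the same data decoded to str
text = raw.decode("utf-8")

out_dir = tempfile.mkdtemp()
text_path = os.path.join(out_dir, "text_mode.html")
bytes_path = os.path.join(out_dir, "binary_mode.html")
bad_path = os.path.join(out_dir, "mismatch.html")

# Option 1: decoded str written in text mode with an explicit encoding
with open(text_path, "w", encoding="utf-8") as fp:
    fp.write(text)

# Option 2: undecoded bytes written in binary mode
with open(bytes_path, "wb") as fp:
    fp.write(raw)

# Mismatch: writing a str to a binary-mode file raises TypeError
error_message = None
try:
    with open(bad_path, "wb") as fp:
        fp.write(text)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Both valid options produce the same file contents here. Skipping the decode step and writing the raw bytes with "wb" also avoids guessing the page encoding.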
