Simulating browser requests with Python urllib for web scraping

Latest recommended article published on 2023-07-03 21:50:36

aspiring123

Categories: Python web scraping, Python. Tags: urllib, python, web scraping

Article link: https://blog.csdn.net/qq_39198486/article/details/81502593

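This article describes how to use Python's urllib library to simulate browser requests, including setting HTTP headers, handling cookies and login authentication, and applying these in real scraping projects, to help readers learn the basics of collecting data from the web.

Summary generated automatically by CSDN.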
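As a rough sketch of the header-setting idea mentioned in the summary, the snippet below builds a `urllib.request.Request` that carries browser-like headers without sending anything over the network. The helper name `build_browser_request` and the `Accept` value are assumptions made for illustration; the URL and User-Agent string are the ones used in the article's code.

```python
import urllib.request


def build_browser_request(url, user_agent):
    """Return a Request carrying browser-like headers; nothing is sent yet."""
    # Hypothetical helper, not part of urllib itself.
    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",  # assumed value
    }
    return urllib.request.Request(url, headers=headers)


req = build_browser_request(
    "http://baike.baidu.com",
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
)

# Request normalizes header names with str.capitalize(), so the key
# is stored as "User-agent" rather than "User-Agent".
print(req.get_header("User-agent"))

# To actually send the request (requires network access):
# html = urllib.request.urlopen(req, timeout=10).read()
```

Building the `Request` separately from `urlopen` makes it easy to inspect or adjust the headers before any traffic is generated.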

import urllib.request
import random

url = "http://baike.baidu.com"


"""
Method 1
# Simulated request headers
headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
}