Taobao scraping code from Bilibili (Python)

Latest recommended article published 2024-05-01 21:42:26

Author: [account deactivated]


Category: Crawlers. Tags: python

Article link: https://blog.csdn.net/u013059141/article/details/107452453


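This article shows how to scrape the Taobao website with Python's requests and re libraries. Because Taobao has anti-scraping measures, the request headers must carry login information. The main functions are getHTMLText(), which fetches the page content; parsePage(), which parses the page; and printGoodsList(), which prints the product information in a formatted layout. In main(), the depth parameter deepth controls how many pages are scraped.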

Summary generated by CSDN using AI technology.
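The bodies of parsePage() and printGoodsList() are not visible on this page, so the following is only a minimal sketch of what the summary describes. It assumes that the search page embeds product data as JSON-like text with "raw_title" and "view_price" fields; those field names, the helper formatGoodsList(), and the sample string are assumptions for illustration, not the author's code.

```python
import re

# Assumed field names: the article's own patterns are not visible on this page.
PRICE_RE = re.compile(r'"view_price":"([\d.]*)"')
TITLE_RE = re.compile(r'"raw_title":"(.*?)"')


def parsePage(ilt, html):
    """Append a [price, title] pair to ilt for every product found in html."""
    prices = PRICE_RE.findall(html)
    titles = TITLE_RE.findall(html)
    # zip() stops at the shorter list, so a missing field cannot raise IndexError.
    for price, title in zip(prices, titles):
        ilt.append([price, title])


def formatGoodsList(ilt):
    """Return the table rows as strings (hypothetical helper, eases testing)."""
    tplt = "{:4}\t{:8}\t{:16}"
    lines = [tplt.format("No.", "Price", "Title")]
    for count, (price, title) in enumerate(ilt, start=1):
        lines.append(tplt.format(count, price, title))
    return lines


def printGoodsList(ilt):
    """Print the product list as a tab-separated table."""
    for line in formatGoodsList(ilt):
        print(line)


# Made-up sample text standing in for a downloaded search page.
sample_html = (
    '{"raw_title":"canvas backpack","view_price":"59.00"},'
    '{"raw_title":"school bag","view_price":"128.50"}'
)
goods = []
parsePage(goods, sample_html)
printGoodsList(goods)
```

Using capture groups in the regular expressions returns the price and title directly, which avoids splitting the matched text or calling eval() on it.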

A Taobao scraper written with the requests and re libraries.

import re
import requests
def getHTMLText(url):
    try:
        header = {
            'authority': 's.taobao.com',
            'cache-control': 'max-age=0',
            'upgrade-insecure-requests': '1',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'sec-fetch-site': 'same-origin',
            'sec-fetch-mode': 'navigate',
            'sec-fetch-user': '?1',
            'sec-fetch-dest': 'document',
            'referer': 'https://www.taobao.com/',
            'acce
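The listing breaks off inside the header dictionary, and the missing part is not reconstructed here. As a rough sketch of the two ideas the summary mentions (login information in the request headers, and a deepth parameter that limits the number of pages), the code below may help. The search URL path, the "s" offset parameter, the page size of 44, and the helpers buildHeaders(), buildPageUrls(), and crawl() are all assumptions for illustration; the cookie would have to be copied from a logged-in browser session.

```python
from urllib.parse import quote

# Assumptions, not taken from the article: search path, "s" offset, page size.
SEARCH_URL = "https://s.taobao.com/search?q="
PAGE_SIZE = 44


def buildHeaders(cookie):
    """Return request headers that carry the login cookie (hypothetical helper)."""
    return {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "referer": "https://www.taobao.com/",
        "cookie": cookie,
    }


def buildPageUrls(goods, deepth):
    """Build one search URL per result page, deepth pages in total."""
    base = SEARCH_URL + quote(goods)
    return [base + "&s=" + str(PAGE_SIZE * i) for i in range(deepth)]


def crawl(goods, deepth, fetch):
    """Call fetch(url) for each page and collect the non-empty responses."""
    pages = []
    for url in buildPageUrls(goods, deepth):
        html = fetch(url)
        if html:  # skip pages that failed to download
            pages.append(html)
    return pages
```

Passing the download function in as fetch keeps the paging logic testable without network access. In real use, fetch could wrap requests.get(url, headers=buildHeaders(cookie), timeout=30) and return the response text, or an empty string on failure.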
