Scraping Pulse Oximeter Listings with a Python Crawler

MSh_I

Last modified 2022-12-31 09:04:15

Category: Python · Tags: python, web crawler, programming language

First published 2022-12-30 17:14:47

Article link: https://blog.csdn.net/shouqw/article/details/128498006

Contents

Strategy
Source code
Results
References

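Omicron has turned out to be far fiercer than anyone expected. First it was a rush for fever medicine, then masks, then antigen tests, and now pulse oximeters. Honestly, I didn't manage to get any of it /(ㄒoㄒ)/~~; I just wasn't forward-thinking enough. I hope everyone gets through this safely.
Pulse oximeters are now fought over as well: even listings marked "in stock" count as fast if they ship within a week. Forget it, I'm not going to fight for one. Instead, let's use a bit of web scraping to see what is actually on offer.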

Strategy

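Searching the platform's home page for 血氧仪 (pulse oximeter) immediately returns a long list of products spread over many pages. By comparing the URLs of successive result pages you can work out the pagination pattern (in the script below, result page p uses page=2p-1 and s=(p-1)*60+1), so Python can request each page automatically. To extract the product information I again take the simple route: a regular expression followed by chains of string split() calls. Admittedly this is crude and not very efficient.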
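The pagination pattern can be isolated into a small helper so it is easy to check on its own. This is a minimal sketch: the formulas for page and s are taken from the script below, while the helper names page_params and build_page_url and the base URL https://example.com/Search are hypothetical placeholders (the real site address is hidden in the original). It also percent-encodes the Chinese keyword explicitly with urllib.parse.quote.

```python
from urllib.parse import quote


def page_params(p: int) -> dict:
    """Paging parameters for 1-based result page p, using the formulas from the script."""
    return {'page': 2 * p - 1, 's': (p - 1) * 60 + 1}


def build_page_url(base: str, keyword: str, p: int) -> str:
    """Assemble a search URL; `base` is a hypothetical placeholder, not the real site."""
    kw = quote(keyword)  # percent-encode the keyword as UTF-8
    params = page_params(p)
    return f"{base}?keyword={kw}&wq={kw}&page={params['page']}&s={params['s']}"


if __name__ == '__main__':
    for p in range(1, 5):
        print(p, page_params(p))
    print(build_page_url('https://example.com/Search', '血氧仪', 2))
```

Keeping the arithmetic in one function means that if the site changes its paging scheme, only page_params needs updating.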

Source code

Partial source code, with the parts that raise copyright issues hidden o(╥﹏╥)o

import re
import time

import requests

goods = '血氧仪'  # search keyword: "pulse oximeter"
pre_url = 'https://xxxxxxxxxx/Search?keyword=' + goods + '&qrst=1&wq=' + goods + '&stock=1&pvid=46acaecdac14432e93eb3cb00fe8abfd&cid3=12587&cid2=9197'
headers = {'User-Agent':
               'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0'}
fprt = "{:^5}\t{:^10}\t{:^20}\t{:^20}"  # print template; each {} is a centered replacement field

for p in range(1, 5):
    # Result page p maps to page=2p-1 and s=(p-1)*60+1 in the query string
    url = pre_url + '&page=' + str(2 * p - 1) + '&s=' + str((p - 1) * 60 + 1) + '&click=0'
    html = requests.get(url, headers=headers, timeout=10)  # fetch the search results page with GET
    strHtml = html.text

    try:
        # Price snippets: start with <em>￥</em><i data-price=, end with .<digit><digit></i>
        plt = re.findall(r'<em>￥</em><i data-price=.*?\.\d\d</i>', strHtml)
        for i in range(len(plt)):
            price = plt[i].split('>')[3].split('<')[0]  # text inside <i>...</i>
            goodId = plt[i].split('"')[1]  # quoted value after data-price=, used as the item ID
            goodUrl = 'https://xxxxxxxx/' + str(goodId) + '.html'
            goodHtml = requests.get(goodUrl, headers=headers, timeout=10)  # fetch the item detail page
            infoGood = goodHtml.text
            tlt = re.findall(r'<div class="sku-name">\n.*?</div>', infoGood, re.S)
            # Item title: second line of the sku-name block, up to the first '<'
            name = tlt[0].split('\n')[1].split('<')[0].strip() if tlt else ''
            print(fprt.format((p - 1) * 30 + i, price, name, goodUrl))
            time.sleep(2)  # pause between item requests
    except Exception as e:  # keep one bad page from stopping the whole run
        print("Parsing error:", e)
    time.sleep(2)  # pause between search pages
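The split() chains above work but break easily. One alternative, swapped in here and not the author's method, is to let regular expression capture groups return the item ID, price, and title directly. The sketch below runs offline against hypothetical HTML fragments shaped like the markup the original patterns expect; the tighter patterns (data-price="digits" immediately followed by the price) and the names parse_search and parse_name are assumptions, and real pages may need looser patterns.

```python
import re

# Hypothetical HTML fragments shaped like the markup the original regexes target;
# real pages will differ.
SEARCH_HTML = (
    '<li><em>￥</em><i data-price="100012345678">129.00</i></li>'
    '<li><em>￥</em><i data-price="100087654321">89.90</i></li>'
)
DETAIL_HTML = '<div class="sku-name">\n   Example Oximeter X1  <img src="x.png"></div>'

# Group 1 = item ID (quoted value after data-price=), group 2 = price with two decimals.
PRICE_RE = re.compile(r'<em>￥</em><i data-price="(\d+)">(\d+\.\d\d)</i>')
# Like the original: take the line after the sku-name tag, up to the first '<'.
NAME_RE = re.compile(r'<div class="sku-name">\n([^<\n]*)')


def parse_search(html: str) -> list:
    """Return (item_id, price) tuples found in a search result page."""
    return PRICE_RE.findall(html)


def parse_name(html: str) -> str:
    """Return the trimmed item title, or '' if the sku-name block is missing."""
    m = NAME_RE.search(html)
    return m.group(1).strip() if m else ''


if __name__ == '__main__':
    for item_id, price in parse_search(SEARCH_HTML):
        print(item_id, price)
    print(parse_name(DETAIL_HTML))
```

Because findall returns one tuple per match when a pattern has several groups, the index arithmetic on split() results disappears, and a missing title yields an empty string instead of an IndexError.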