Scraping Taobao Product Prices with Python

雪急飞绪

Article link: https://blog.csdn.net/qq_38689395/article/details/106314224

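Log in to Taobao, open a search results page, and press F12 to open the developer tools.
Select the Network tab, refresh the page, find the request near the top of the list whose name starts with search?, and right-click it.
Choose Copy, then Copy as cURL (bash).
Go to https://curl.trillworks.com/ and paste the copied command into the "curl command" box.
Copy the headers shown on the right, save them in a variable named headers in your program, and pass it to requests.get(url, headers=headers).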
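
To make the conversion step less of a black box, here is a minimal sketch of how the headers can be pulled out of a copied cURL command. It is not the converter site's actual implementation: the helper name headers_from_curl and the sample command are hypothetical, and only the -H/--header and -b/--cookie flags are handled. A real copied command may contain other flags that this sketch ignores.

```python
import shlex


def headers_from_curl(command: str) -> dict:
    """Collect -H/--header (and -b/--cookie) values from a bash-style cURL command."""
    # Join backslash-continued lines before tokenizing, otherwise the
    # escaped newlines would show up as separate tokens.
    tokens = shlex.split(command.replace("\\\n", " "))
    headers = {}
    # Walk over (flag, value) pairs of neighbouring tokens.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header") and ":" in value:
            name, _, content = value.partition(":")
            headers[name.strip()] = content.strip()
        elif flag in ("-b", "--cookie"):
            headers["cookie"] = value
    return headers


# Hypothetical sample command; a real one carries a long cookie string.
sample = """curl 'https://s.taobao.com/search?q=sofa' \\
  -H 'user-agent: Mozilla/5.0' \\
  -H 'cookie: placeholder=1' \\
  --compressed"""

print(headers_from_curl(sample))
```

The resulting dictionary has the same shape as the headers variable used in the program that follows.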

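import re

import requests

# Fill in the cookie copied from the logged-in browser session
# (see the steps above); it is left empty here on purpose.
headers = {
    'cookie': '',
    'User-Agent': 'Mozilla/5.0',
}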

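def getHtmlText(url):
    """Fetch a page and return its text, or an empty string on failure."""
    try:
        r = requests.get(url, timeout=30, headers=headers)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        print("Fetch failed\n")
        return ""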

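def parsePage(ilt, html):
    """Append [price, title] pairs found in html to ilt."""
    try:
        # The page embeds fields such as "raw_title":"..." and "view_price":"1780.00"
        plt = re.findall(r'"view_price":"([\d.]*)"', html)
        tlt = re.findall(r'"raw_title":"(.*?)"', html)
        # The capture groups return the bare values, so eval() is not needed.
        for price, title in zip(plt, tlt):
            ilt.append([price, title])
    except TypeError:
        # Raised when html is not a string (for example, a failed fetch).
        print("Parse failed\n")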

def printGoodsList(ilt):
    """Print the collected items as a numbered table."""
    tplt = "{:4}\t{:8}\t{:16}"
    print(tplt.format("No.", "Price", "Product name"))
    for count, g in enumerate(ilt, start=1):
        print(tplt.format(count, g[0], g[1]))

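def main():
    goods = '沙发'  # search keyword ("sofa")
    depth = 3  # number of result pages to fetch
    start_url = 'https://s.taobao.com/search?q=' + goods
    infoList = []
    for i in range(depth):
        # Pages are addressed by item offset through the s parameter, in steps of 44.
        url = start_url + '&s=' + str(44 * i)
        html = getHtmlText(url)
        parsePage(infoList, html)
    printGoodsList(infoList)


if __name__ == "__main__":
    main()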
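
The parsing and paging logic can be tried without touching the network. The sketch below is an illustration only: the helper names extract_items and page_offsets are hypothetical, and the sample string is made up, modeled on the "view_price":"1780.00" field noted in the code comment. The format Taobao actually returns may differ or may have changed.

```python
import re


def extract_items(html: str) -> list:
    """Return (price, title) pairs using the same patterns as parsePage."""
    prices = re.findall(r'"view_price":"([\d.]*)"', html)
    titles = re.findall(r'"raw_title":"(.*?)"', html)
    # zip stops at the shorter list, so a missing field cannot cause an IndexError.
    return list(zip(prices, titles))


def page_offsets(depth: int, page_size: int = 44) -> list:
    """Return the values of the s parameter for the first `depth` pages."""
    return [page_size * i for i in range(depth)]


# Made-up sample data, not a real Taobao response.
sample = (
    '{"raw_title":"Sample sofa A","view_price":"1780.00"},'
    '{"raw_title":"Sample sofa B","view_price":"299.50"}'
)

print(extract_items(sample))  # [('1780.00', 'Sample sofa A'), ('299.50', 'Sample sofa B')]
print(page_offsets(3))        # [0, 44, 88]
```

If extract_items returns an empty list for a real page, the likely causes are a missing or expired cookie, or a change in the page format.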
