The latest on the web! The simplest Python code for scraping Taobao product information (manual login required)

ZERWW

Published 2024-07-28 16:52:45


Categories: python, web scraping, data analysis. Tags: python, web scraping, data analysis.

Article link: https://blog.csdn.net/ZERWW/article/details/140752507


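I have recently been writing a course-design report on web scraping and data analysis. After looking through the code people have posted online for scraping Taobao product information, I found that much of it is outdated and no longer works with the current Taobao pages, so I rewrote my own version to share.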

The complete code is at the end of the article. Likes, bookmarks, and comments are much appreciated.

1. Open the web page

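This is the easiest step: it just drives Chrome through Selenium's ChromeDriver. The script opens the Taobao search page, which automatically redirects to the login page; I log in by scanning the QR code, which is the most convenient option. After staying on that page for 30 seconds, the script starts scrolling down to load the rest of the page (you can change the number of seconds to wait).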
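A fixed 30-second sleep either wastes time or cuts a slow login short. A minimal alternative, assuming you can express "login finished" as a condition (for example that the browser is back on the search URL, which is an assumption about Taobao's redirect), is to poll that condition with a timeout. `wait_until` below is a hypothetical helper, not part of Selenium:

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll `condition` every `interval` seconds until it returns True or `timeout` passes.

    Returns True if the condition was met, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the script it could replace `time.sleep(30)` as `wait_until(lambda: "s.taobao.com/search" in driver.current_url)`. Selenium also ships an equivalent: `WebDriverWait(driver, 120).until(EC.url_contains("s.taobao.com/search"))` from `selenium.webdriver.support`.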

To scrape more pages, change the page parameter in the URL and wrap the code in a for loop.
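A minimal sketch of that loop's URL side: build one search URL per page with the same query parameters used in this article's code. `urlencode` also percent-encodes Chinese keywords. The function and constant names are hypothetical:

```python
from urllib.parse import urlencode

SEARCH_URL = "https://s.taobao.com/search"


def build_search_urls(keyword: str, pages: int) -> list:
    """Return one Taobao search URL per results page (page=1..pages)."""
    urls = []
    for page in range(1, pages + 1):
        # Same parameters, in the same order, as the URL in the article
        params = {"commend": "all", "localImgKey": "", "page": page, "q": keyword, "tab": "all"}
        urls.append(f"{SEARCH_URL}?{urlencode(params)}")
    return urls
```

Each URL would then go through `driver.get(url)`, the scrolling step, and the parsing step in turn, appending to the same `data` list.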

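from urllib.parse import quote
import time
from selenium import webdriver

# Read the product name to search for
search_keyword = input("Enter the product name to search for: ")
driver = webdriver.Chrome()
driver.maximize_window()
# Percent-encode the keyword so non-ASCII (e.g. Chinese) search terms form a valid URL
driver.get(f'https://s.taobao.com/search?commend=all&localImgKey=&page=1&q={quote(search_keyword)}&tab=all')
# Taobao redirects to the login page; scan the QR code within these 30 seconds
time.sleep(30)

# Scroll down 10,000 px in 200 px steps so lazily loaded items get rendered
scroll_height = 0
while scroll_height < 10000:
    driver.execute_script("window.scrollBy(0, 200);")
    scroll_height += 200
    time.sleep(0.5)

# Grab the fully rendered HTML
html = driver.page_source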
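Scrolling a fixed 10,000 px may stop short on a long results page or keep scrolling past the end of a short one. A sketch of a common alternative, assuming `driver` is a Selenium WebDriver and that `document.body.scrollHeight` reflects the page height on Taobao: keep scrolling until the height stops growing. `scroll_until_stable` is a hypothetical name:

```python
import time


def scroll_until_stable(driver, step: int = 200, pause: float = 0.5, max_steps: int = 500) -> int:
    """Scroll down `step` pixels at a time until the page stops growing.

    `driver` only needs Selenium's execute_script(script, *args).
    Returns the last offset scrolled to.
    """
    offset = 0
    for _ in range(max_steps):  # hard cap so an endless feed cannot loop forever
        height = driver.execute_script("return document.body.scrollHeight")
        if offset >= height:
            break  # reached the bottom and no new content appeared
        offset = min(offset + step, height)
        driver.execute_script("window.scrollTo(0, arguments[0]);", offset)
        time.sleep(pause)  # give lazy-loaded product cards time to render
    return offset
```

It would replace the `while scroll_height < 10000` loop; the `pause` between steps plays the same role as the original `time.sleep(0.5)`.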

2. Scrape the page data

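First I used Chrome's developer tools to look up the selectors of the content I wanted (the code uses CSS class selectors, which is what pyquery expects, rather than XPath), then put the corresponding class names into the item.find() calls; this part was adapted from existing code. A for loop collects each product into a list of dicts for later export. (The price comes in two parts, integer and fractional, because the Taobao page renders them in separate elements, which is rather odd.)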

Note: the class names in Taobao's pages change over time, so be sure to check them yourself before running the code.
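One way to soften this, assuming the class names follow a `Module--name--hash` pattern in which only the trailing hash changes between deployments (an assumption, not something Taobao documents), is to match on the readable prefix with a CSS substring attribute selector. `stable_selector` is a hypothetical helper:

```python
def stable_selector(hashed_class: str) -> str:
    """Turn ".Title--title--jCOPvpf" into '[class*="Title--title--"]'.

    The returned selector ignores the trailing hash, which is assumed
    to be the only part that changes between deployments.
    """
    name = hashed_class.lstrip(".")
    prefix, sep, _hash = name.rpartition("--")
    if not sep:
        # No "--hash" suffix, so keep the plain class selector
        return "." + name
    return f'[class*="{prefix}--"]'
```

For example, `item.find(stable_selector('.Title--title--jCOPvpf')).text()`; pyquery passes the selector through cssselect, which supports `[attr*=value]`. A substring match can also hit a longer name that ends the same way (say, `SubTitle--title--...`), so spot-check the results.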

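from pyquery import PyQuery as pq

# Parse the rendered HTML; each matched div is one product card.
# These hashed class names change over time, so re-check them in DevTools.
doc = pq(html)
items = doc(
    'div.PageContent--contentWrap--mep7AEm > div.LeftLay--leftWrap--xBQipVc > div.LeftLay--leftContent--AMmPNfB > div.Content--content--sgSCZ12 > div > div').items()

data = []
for item in items:
    title = item.find('.Title--title--jCOPvpf').text()
    # The price is rendered as two elements: integer part and fractional part
    price_int = item.find('.Price--priceInt--ZlsSi_M').text()
    price_float = item.find('.Price--priceFloat--h2RR0RK').text()
    if price_int:
        # Assumes the fractional part already carries the decimal point (e.g. ".90");
        # an empty fractional part leaves the whole-number price intact
        price = float(f"{price_int}{price_float}")
    else:
        price = 0.0
    deal = item.find('.Price--realSales--FhTZc7U').text()
    shop = item.find('.ShopInfo--shopName--rg6mGmy').text()

    data.append({'title': title, 'price': price, 'deal': deal, 'shop': shop})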
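For the data-analysis part it helps to store numbers rather than raw text. A sketch of two hypothetical cleaning helpers: `parse_price` joins the two price parts whether or not the fractional part includes the dot, and `parse_sales` converts sales text into an integer. The sales format ("100+人付款", "2.5万+人收货", where 万 means 10,000) is an assumption about what the `deal` element contains:

```python
import re
from typing import Optional


def parse_price(int_part: str, frac_part: str) -> Optional[float]:
    """Combine the integer and fractional price texts into one number.

    Non-digits (currency sign, commas, the dot) are stripped, so ".90" and "90"
    both work as the fractional part. Returns None when there is no integer part.
    """
    int_digits = re.sub(r"\D", "", int_part or "")
    frac_digits = re.sub(r"\D", "", frac_part or "")
    if not int_digits:
        return None
    return float(f"{int_digits}.{frac_digits}" if frac_digits else int_digits)


def parse_sales(text: str) -> Optional[int]:
    """Turn text like "100+人付款" or "2.5万+人收货" into an integer count.

    Assumed format: a number, optionally followed by 万 (10,000).
    Returns None if the text contains no number.
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*(万)?", text or "")
    if not match:
        return None
    value = float(match.group(1))
    if match.group(2):
        value *= 10_000
    return round(value)
```

Inside the loop this would read `price = parse_price(price_int, price_float)` and, for example, an extra `'sales': parse_sales(deal)` key. Returning None instead of 0.0 keeps "price missing" distinguishable from a real price once pandas turns it into NaN.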

3. Export the data to an .xlsx file

There is not much to say about this part: convert the list of dicts into a pandas DataFrame, then export it as .xlsx.

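import pandas as pd

# Convert the list of dicts to a DataFrame and write it to Excel
# (.xlsx output needs openpyxl or XlsxWriter installed)
df = pd.DataFrame(data)
df.to_excel(f'{search_keyword}_products.xlsx', index=False)

driver.quit()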
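Two small export pitfalls: a keyword containing characters such as `/` or `:` produces an invalid file name, and a plain UTF-8 CSV often shows Chinese text garbled when opened in Excel. A stdlib-only sketch that swaps in CSV output (a different format from the article's .xlsx) and writes a UTF-8 BOM; the function names are hypothetical:

```python
import csv
import re
from pathlib import Path

FIELDS = ["title", "price", "deal", "shop"]  # same keys as the dicts in `data`


def safe_filename(keyword: str) -> str:
    """Replace characters that are not allowed in Windows file names."""
    return re.sub(r'[\\/:*?"<>|]+', "_", keyword).strip() or "products"


def export_csv(rows: list, keyword: str) -> Path:
    """Write rows to <keyword>_products.csv with a BOM so Excel detects UTF-8."""
    path = Path(f"{safe_filename(keyword)}_products.csv")
    with path.open("w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

With pandas already loaded, `df.to_csv(path, index=False, encoding="utf-8-sig")` does the same thing in one line; `safe_filename` applies equally to the .xlsx name.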

Appendix (complete code)

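from urllib.parse import quote
import time

import pandas as pd
from pyquery import PyQuery as pq
from selenium import webdriver

# Read the product name to search for
search_keyword = input("Enter the product name to search for: ")
driver = webdriver.Chrome()
driver.maximize_window()
try:
    # Percent-encode the keyword so non-ASCII (e.g. Chinese) search terms form a valid URL
    driver.get(f'https://s.taobao.com/search?commend=all&localImgKey=&page=1&q={quote(search_keyword)}&tab=all')
    # Taobao redirects to the login page; scan the QR code within these 30 seconds
    time.sleep(30)

    # Scroll down 10,000 px in 200 px steps so lazily loaded items get rendered
    scroll_height = 0
    while scroll_height < 10000:
        driver.execute_script("window.scrollBy(0, 200);")
        scroll_height += 200
        time.sleep(0.5)

    # Grab the fully rendered HTML
    html = driver.page_source
finally:
    # Close the browser even if a step above fails
    driver.quit()

# Each matched div is one product card.
# The hashed class names change over time; re-check them in DevTools before running.
doc = pq(html)
items = doc(
    'div.PageContent--contentWrap--mep7AEm > div.LeftLay--leftWrap--xBQipVc > div.LeftLay--leftContent--AMmPNfB > div.Content--content--sgSCZ12 > div > div').items()

data = []
for item in items:
    title = item.find('.Title--title--jCOPvpf').text()
    # The price is rendered as two elements: integer part and fractional part
    price_int = item.find('.Price--priceInt--ZlsSi_M').text()
    price_float = item.find('.Price--priceFloat--h2RR0RK').text()
    if price_int:
        # Assumes the fractional part already carries the decimal point (e.g. ".90");
        # an empty fractional part leaves the whole-number price intact
        price = float(f"{price_int}{price_float}")
    else:
        price = 0.0
    deal = item.find('.Price--realSales--FhTZc7U').text()
    shop = item.find('.ShopInfo--shopName--rg6mGmy').text()

    data.append({'title': title, 'price': price, 'deal': deal, 'shop': shop})

# Convert the list of dicts to a DataFrame and write it to Excel
# (.xlsx output needs openpyxl or XlsxWriter installed)
df = pd.DataFrame(data)
df.to_excel(f'{search_keyword}_products.xlsx', index=False)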

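Thanks to everyone who read this far; please leave a like. This is the first scraper I have written entirely on my own, and it was not easy.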
