2.4 Playwright in Practice: Scraping Taobao Product Information

Section 4: Hands-On E-commerce Scraping Project

Course Objectives

  • Learn how to scrape Taobao product information with Playwright

Course Content

Implementation
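import json
import time

import pandas as pd
from playwright.sync_api import sync_playwright
from tqdm import tqdm

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    # Load the saved login state (assumed to be in Playwright's storage_state format)
    with open("cookies.json", "r", encoding="utf-8") as f:
        storage_state = json.load(f)
    # Open the page in a context that carries the saved login state
    context = browser.new_context(storage_state=storage_state)
    page = context.new_page()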
    page.goto("https://uland.taobao.com/sem/tbsearch?localImgKey=&page=1&q=%E5%B7%A5%E8%A3%85%E8%A3%A4&tab=all")
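    # time.sleep(5)  # hard-coded wait
    # Wait until at least one product card has rendered
    page.wait_for_selector(".Card--doubleCardWrapper--L2XFE73")
    # page.mouse.wheel(0, 8000)  # jump straight to the bottom so the data loads
    # Scroll down in small steps so lazily loaded cards have time to render
    for _ in range(10):
        page.mouse.wheel(0, 1000)
        time.sleep(0.3)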
    boxes = page.locator(".Card--doubleCardWrapper--L2XFE73").all()
    # box = boxes[0]
    goods_infos = []
    for box in tqdm(boxes):
        # Extract the fields of one product card
        title_item = box.locator(".Title--title--jCOPvpf")  # title node
        title = title_item.inner_text()  # title text
        img_item = box.locator(".MainPic--mainPic--rcLNaCv")  # product image node
        img = img_item.get_attribute("src")  # product image URL
        # The price is rendered as two nodes: integer part and fractional part
        price_int_item = box.locator(".Price--priceInt--ZlsSi_M")
        price_float_item = box.locator(".Price--priceFloat--h2RR0RK")
        price_int = price_int_item.inner_text()
        price_float = price_float_item.inner_text()
        price = float(price_int + price_float)
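        city_items = box.locator(".Price--procity--_7Vt3mX").all()  # shipping location nodes
        if len(city_items) == 2:
            father_city = city_items[0].inner_text()  # province
            son_city = city_items[1].inner_text()  # city
        elif city_items:
            # Otherwise use the first node as the city and leave the province empty
            father_city = ""
            son_city = city_items[0].inner_text()
        else:
            # No location node on this card; avoid an IndexError on an empty list
            father_city = ""
            son_city = ""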
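        shop_name_item = box.locator(".ShopInfo--shopName--rg6mGmy")
        shop_name = shop_name_item.inner_text()
        # The keys become the column headers of the exported sheet
        goods_info = {
            "商店名称": shop_name,  # shop name
            "发货省": father_city,  # shipping province
            "发货市": son_city,  # shipping city
            "价格": price,  # price
            "商品图片": img,  # product image URL
            "商品标题": title,  # product title
        }
        goods_infos.append(goods_info)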
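    # print(goods_infos)
    df = pd.DataFrame(goods_infos)  # built from a list of dicts: [{}, {}, {}]
    # The file name means "Taobao - cargo pants"; to_excel needs an Excel engine such as openpyxl
    df.to_excel("淘宝-工装裤.xlsx", index=False)
    browser.close()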
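The search URL passed to page.goto() carries the keyword as a percent-encoded q parameter (here "工装裤", cargo pants) together with a page parameter. As a minimal sketch, the same URL can be built from a keyword and a page number with the standard library. The helper name build_search_url is hypothetical, and whether the site actually pages through results via the page parameter is an assumption that should be checked in the browser.

```python
from urllib.parse import urlencode

# Base address taken from the page.goto() call in the script
BASE_URL = "https://uland.taobao.com/sem/tbsearch"


def build_search_url(keyword: str, page: int = 1) -> str:
    """Hypothetical helper: build the search URL for a keyword and page number."""
    # Parameter names and order are copied from the URL used in the script
    params = {"localImgKey": "", "page": page, "q": keyword, "tab": "all"}
    # urlencode percent-encodes the keyword as UTF-8
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("工装裤", 1)
print(url)
```

With such a helper, the hard-coded string in page.goto() could be replaced by page.goto(build_search_url("工装裤", 1)), which makes it easier to try other keywords.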

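Testing and Debugging
  • Testing
    • Verify that each part of the program works as expected.
  • Debugging
    • Adjust and optimize the code based on the test results.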
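One way to make the testing step concrete is to move the text-handling logic into small pure functions that can be checked without opening a browser. The sketch below is only an illustration: parse_price and split_location are hypothetical names, the sample values are invented rather than scraped, and it assumes, as the script does, that the fractional price text already contains the decimal point.

```python
def parse_price(int_part: str, float_part: str) -> float:
    """Join the integer and fractional price texts and convert them to a float."""
    text = (int_part.strip() + float_part.strip()).replace(",", "")
    return float(text)


def split_location(parts: list[str]) -> tuple[str, str]:
    """Return (province, city): two texts map to both, one text to the city only."""
    if len(parts) == 2:
        return parts[0], parts[1]
    if parts:
        return "", parts[0]
    # Fallback for a card without any location text
    return "", ""


# Invented sample values, used only to exercise the helpers
assert parse_price("129", ".90") == 129.9
assert split_location(["广东", "广州"]) == ("广东", "广州")
assert split_location(["上海"]) == ("", "上海")
assert split_location([]) == ("", "")
```

If one of these assertions fails after the page layout changes, the problem is in the parsing; if they pass but the scraper still returns nothing, the class-name selectors are the more likely culprit.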