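Design outline:
1. Open the JD homepage;
2. Search for a keyword;
3. Go to the product listing page;
4. Scrape the product information on the current page;
5. Click "next page";
6. Repeat steps 4 and 5;
7. Stop crawling once the last page is reached.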
from selenium import webdriver
from selenium.webdriver.common.by import By
import time, random

driver = webdriver.Edge()
url = 'https://www.jd.com/'
driver.get(url)

# Type the keyword into the search box and submit the search
tb_input = driver.find_element(By.CSS_SELECTOR, '#key')
search_btn = driver.find_element(By.CSS_SELECTOR, '.button')
tb_input.send_keys('手机')  # search keyword: "mobile phone"
time.sleep(2)
search_btn.click()
time.sleep(2)

for page in range(5):
    # Scroll to the bottom so that lazily loaded items are rendered
    driver.execute_script('window.scrollTo(0,document.body.scrollHeight);')
    time.sleep(random.random()+1)
    # find_elements (plural) returns a list with every product card on the page
    ls = driver.find_elements(By.CSS_SELECTOR, '.gl-item')
    for info in ls:
        title = info.find_element(By.CSS_SELECTOR, '.p-name.p-name-type-2 a').text.strip()
        print('title:', title)
        price = info.find_element(By.CSS_SELECTOR, 'div.p-price > strong > i').text.strip()
        print('price:', price)
        shop = info.find_element(By.CSS_SELECTOR, 'span.J_im_icon > a').text.strip()
        print('shop:', shop)
        comments = info.find_element(By.CSS_SELECTOR, 'div.p-commit > strong > a').text.strip()
        print('comments:', comments)
        print('='*200)
        # Append one line per product; the trailing newline keeps records separate
        with open('./jd.txt',mode='a',encoding='utf-8') as fp:
            fp.write(f'title:{title},price:{price},shop:{shop},comments:{comments}\n')
    time.sleep(random.random()*2)
    # Move on to the next page of results
    btn_next = driver.find_element(By.CSS_SELECTOR, 'a.pn-next')
    btn_next.click()

# quit() closes every window and ends the driver session
driver.quit()
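
The script above stops after a fixed five pages, while the outline says to stop at the last page. One way to close that gap is sketched below. It is only a sketch: `crawl_pages`, `scrape_page`, `go_to_next_page` and `max_pages` are hypothetical names that do not appear in the original script, and the browser-specific work is passed in as two callbacks so that the loop itself stays independent of Selenium.

```python
from typing import Callable, Iterable, List


def crawl_pages(
    scrape_page: Callable[[], Iterable[dict]],
    go_to_next_page: Callable[[], bool],
    max_pages: int = 100,
) -> List[dict]:
    """Scrape the current page, then advance, until there is no next page.

    scrape_page:     returns the records found on the current page.
    go_to_next_page: moves to the next page and returns True, or returns
                     False when the current page is the last one.
    max_pages:       safety cap so a faulty "next" check cannot loop forever.
    """
    records: List[dict] = []
    for _ in range(max_pages):
        records.extend(scrape_page())
        if not go_to_next_page():
            break
    return records
```

With Selenium, `go_to_next_page` could look up `a.pn-next` with `driver.find_elements(...)`, return False if the list is empty, and otherwise click the first match and return True. How the site marks its last page is an assumption that would need checking against the live markup.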
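One caveat: you must log in to JD beforehand so that the browser has your account information cached; otherwise the script stalls on the login page and cannot continue.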
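
One possible way to carry a login across runs is to save the browser's cookies after logging in manually once and to load them again on later runs. The sketch below assumes this approach; it is not part of the original script, and `save_cookies`, `load_cookies` and the file path are hypothetical. It relies only on the driver's `get_cookies()` and `add_cookie()` methods. Whether JD accepts restored cookies as a valid login is not guaranteed.

```python
import json
from pathlib import Path


def save_cookies(driver, path: str) -> None:
    """Write the browser's current cookies to a JSON file."""
    Path(path).write_text(json.dumps(driver.get_cookies()), encoding="utf-8")


def load_cookies(driver, path: str) -> int:
    """Add previously saved cookies to the browser; return how many were added.

    The driver should already be on a page of the cookies' domain
    (for example after driver.get('https://www.jd.com/')), because
    Selenium only accepts cookies for the current domain.
    """
    cookie_file = Path(path)
    if not cookie_file.exists():
        return 0  # nothing saved yet, e.g. on the very first run
    cookies = json.loads(cookie_file.read_text(encoding="utf-8"))
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

A typical flow would be: open the homepage, call `load_cookies(driver, 'jd_cookies.json')`, refresh the page, and, if the login page still appears, log in by hand and call `save_cookies(driver, 'jd_cookies.json')` for the next run.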