Python Crawler -12- Splash (brief introduction, and scraping JD.com)

Documentation

  👉 Official documentation

Installation

docker pull scrapinghub/splash
docker run -d -p 8050:8050 --rm scrapinghub/splash

Usage

  1. Open http://<ip>:8050 in your browser — this is the Splash web console.

  2. You should see the Splash console page, with an input box for the URL to render.

  3. Enter http://localhost:8050/render.html?url=https://search.jd.com/Search?keyword=%E5%B0%8F%E7%B1%B310&enc=utf-8&suggest=1.def.0.V08--38s0&wq=%E5%B0%8F%E7%B1%B3&pvid=c18d37ab55764cc4ac71e124bc496035
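
The render.html URL above can also be built programmatically instead of pasted by hand. A minimal sketch, assuming Splash is listening on localhost:8050 as installed above; `urlencode` percent-encodes the keyword and the inner JD URL automatically, so nothing has to be encoded manually:

```python
from urllib.parse import urlencode

keyword = "小米"
# Build the JD search URL first; urlencode percent-encodes the keyword.
jd_url = "https://search.jd.com/Search?" + urlencode(
    {"keyword": keyword, "enc": "utf-8", "wq": keyword}
)
# Then wrap it as the `url` parameter of Splash's render.html endpoint.
# Encoding the whole inner URL keeps its own ? and & from confusing Splash.
render_url = "http://localhost:8050/render.html?" + urlencode({"url": jd_url})
print(render_url)
```

Splash decodes the `url` parameter once, so the double-encoded keyword arrives at JD correctly single-encoded.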


Using from the command line

  1. curl "http://codekiller.top:8050/render.html?url=https://search.jd.com/Search?keyword=%E5%B0%8F%E7%B1%B310&enc=utf-8&suggest=1.def.0.V08--38s0&wq=%E5%B0%8F%E7%B1%B3&pvid=c18d37ab55764cc4ac71e124bc496035" -o 小米.html

  2. Open the HTML file

  3. Parse it (extract all the prices)

    from lxml import etree

    # Read the rendered page that curl saved above
    with open('C:\\Users\\MyPC\\小米.html', "r", encoding="UTF-8") as file:
        text = file.read()

    selector = etree.HTML(text)
    # Each product card holds its price under <div class="p-price"><strong><i>
    prices = selector.xpath("//div[@class='p-price']/strong/i/text()")
    print(prices)
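
To see what that XPath actually matches without downloading anything, here is a self-contained sketch run against a tiny hand-written snippet imitating JD's markup (the HTML below is illustrative, not copied from the live page):

```python
from lxml import etree

# Hand-written imitation of JD's search-result price markup
sample = """
<ul>
  <li><div class="p-price"><strong><i>1099.00</i></strong></div></li>
  <li><div class="p-price"><strong><i>1999.00</i></strong></div></li>
</ul>
"""
selector = etree.HTML(sample)
# Same XPath as above: the text inside each price's <i> element
prices = selector.xpath("//div[@class='p-price']/strong/i/text()")
print(prices)  # → ['1099.00', '1999.00']
```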
    

Scraping JD.com

from urllib.parse import urlencode, quote
from lxml import etree
import requests

ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
headers = {"User-Agent": ua}

keyword = "小米"
params = dict(
    keyword=keyword,
    enc="utf-8",
    wq=keyword,
    pvid="57486a4adb40455dbba829de75133672"
)
# urlencode percent-encodes the keyword and joins the pairs with '&'
query_string = urlencode(params)
jd_url = "https://search.jd.com/Search?" + query_string
# quote() the whole JD URL so Splash sees it as a single `url` parameter
url = "http://codekiller.top:8050/render.html?url=" + quote(jd_url)

r = requests.get(url, headers=headers)

selector = etree.HTML(r.text)

price_list = selector.xpath("//div[@class='p-price']/strong/i/text()")
name_list = selector.xpath("//div[contains(@class,'p-name')]/a/em/text()")

# Pair names with prices in matching order
for name, price in zip(name_list, price_list):
    print(name, price)
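
Once the two lists are extracted, it can help to pair them into structured records and parse the prices into numbers. A small sketch over made-up sample values standing in for the XPath results above (real runs would feed in `name_list` and `price_list` from the scrape):

```python
# Made-up sample values standing in for the scraped XPath results
name_list = ["小米10 8GB+128GB", "小米10 Pro 12GB+256GB"]
price_list = ["3999.00", "4999.00"]

# Pair each product name with its price and convert the price to a float
items = [
    {"name": name.strip(), "price": float(price)}
    for name, price in zip(name_list, price_list)
]
for item in items:
    print(item["name"], item["price"])
```

Keeping the records as dicts makes it easy to dump them to JSON or CSV later.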
