Python: Tmall Data Crawler

Latest recommended article published on 2024-06-25 19:48:46

小果冻子呀


Article link: https://blog.csdn.net/hahaha_ilili/article/details/103993022


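This article shows how to write a web crawler in Python that collects product information from Tmall, including key data such as product names, prices, and reviews. It walks through the implementation with example code built on libraries such as requests and lxml.

Summary generated automatically by CSDN.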
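The first step of the crawl is turning a search keyword into a search URL. The sketch below shows one way to do that with the standard library so that non-ASCII keywords are percent-encoded explicitly. The helper name `build_search_url` and the constant `SEARCH_ENDPOINT` are illustrative, and the choice of UTF-8 encoding is an assumption rather than something the article states.

```python
from urllib.parse import urlencode

# Search endpoint taken from the article's listing.
SEARCH_ENDPOINT = "https://list.tmall.com/search_product.htm"


def build_search_url(keyword: str) -> str:
    """Return the search URL for one keyword (hypothetical helper).

    urlencode percent-encodes the keyword as UTF-8 by default; whether the
    site expects UTF-8 is an assumption here.
    """
    return SEARCH_ENDPOINT + "?" + urlencode({"q": keyword})


if __name__ == "__main__":
    print(build_search_url("麦斯威尔"))
```

Encoding the keyword up front makes the URL that gets printed or logged match the one that is actually sent.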

import gevent
import gevent.monkey
# Patch the standard library before requests is imported so its sockets become cooperative.
gevent.monkey.patch_all()
import requests
from lxml import html
import re
import json
import csv


etree = html.etree
# Search keywords; each one is queried on Tmall in turn.
word_lists = ["麦斯威尔"]
# key_word = "资生堂"
for word in word_lists:
    url = "https://list.tmall.com/search_product.htm?q={}".format(word)
    print(url)
    page_source = requests.get(url).text
    tree = etree.HTML(page_source)
    # Each search result sits in a div with class "product-iWrap".
    print(len(tree.xpath('//div[@class="product-iWrap"]')))
    print("++++++++++++++++++")
    # Links to the product detail pages, in page order.
    product_urls = tree.xpath('//div[@class="product-iWrap"]//*[contains(@class,"productTitle")]/a[1]/@href')
    # Only the first three results are processed (fewer if the page returns fewer).
    for i in range(min(3, len(product_urls))):
        product_url = product_urls[i]
        print(product_url)
        # The item id and seller id are query parameters of the detail URL;
        # [^&]+ also matches when the parameter is the last one in the URL.
        product_id = re.findall(r'[?&]id=([^&]+)', product_url)[0]
        print(product_id)
        sell_id = re.findall(r'user_id=([^&]+)', product_url)[0]
        url1 = "https://detail.tmall.com/item.htm?id&#
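
The listing above pulls the item id and the seller id out of each product link with regular expressions. As an alternative sketch, the same two values can be read with the standard library's URL parser, which does not depend on parameter order or on a trailing `&`. The function name `extract_ids` and the sample URL are hypothetical and only illustrate the shape of a detail-page link.

```python
from urllib.parse import parse_qs, urlsplit


def extract_ids(product_url: str):
    """Return (item id, seller id) from a detail-page URL (hypothetical helper).

    Either value is None when the corresponding query parameter is missing.
    """
    query = parse_qs(urlsplit(product_url).query)
    product_id = query.get("id", [None])[0]
    sell_id = query.get("user_id", [None])[0]
    return product_id, sell_id


if __name__ == "__main__":
    # Made-up, protocol-relative link in the style of a search-result href.
    sample = "//detail.tmall.com/item.htm?id=123456&skuId=1&user_id=987&cat_id=2"
    print(extract_ids(sample))
```

Returning `None` for a missing parameter lets the caller skip a malformed link instead of stopping the whole crawl with an `IndexError`.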
