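In the previous post we scraped only one bra brand; this time we'll scrape several brands.
1. Scraping the URLs for different brands
Getting the id
We could scrape the href directly, but I found that some hrefs include https and some don't, so I simply take the id and build the URL by concatenation.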
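The id-concatenation idea can be sketched roughly as follows. This is only an illustration, not the code used in this post: the helper names `item_url_from_id`, `normalize_href` and `id_from_href` are hypothetical, and the `.html` suffix on the product page URL is an assumption. The second helper shows the alternative of keeping the href and repairing the missing scheme with the standard library's `urljoin`.

```python
import re
from urllib.parse import urljoin

# Same base URL that the class in this post stores in self.url.
ITEM_BASE = 'https://item.jd.com/'


def item_url_from_id(sku_id):
    """Build a product page URL from an id (the '.html' suffix is an assumption)."""
    return f'{ITEM_BASE}{sku_id}.html'


def normalize_href(href):
    """Alternative: resolve a scheme-less href such as '//item.jd.com/1.html' against the base."""
    return urljoin(ITEM_BASE, href)


def id_from_href(href):
    """Pull the numeric id out of an href, with or without the https prefix."""
    match = re.search(r'/(\d+)\.html', href)
    return match.group(1) if match else None
```

Either route ends at the same URL; concatenating from the id just avoids having to care what form the href arrived in.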
import requests
import json
import threading
import time
import re
from lxml import etree


class cup:
    def __init__(self):
        self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}
        self.url = 'https://item.jd.com/'

    def vari_cpu(self):  # get the review JSON for the different kinds of bras
        url_list = []
        url = 'https://search.jd.com/Search?keyword=%E6%96%87%E8%83%B8&enc=utf-8&spm=2.1.1'
        html = requests.get(url, headers=self.headers).text
        html = etree.HTML(html)
        cpu_link = html.xpath('//div[@class="p-ico