```python
# Taobao shop details: scrape 4 images
import requests
from bs4 import BeautifulSoup
from lxml import etree

headers = {
    "accept": "text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01",
    "x-requested-with": "XMLHttpRequest",
    # 'br' (Brotli) responses are only decoded if the brotli package is installed
    "accept-encoding": "gzip, deflate, br"
}


def get_picture(shop_id):
    '''
    :param shop_id: a single shop ID
    :return: a list containing 4 product images from this shop
    '''
    # print('Shop ID:', shop_id)
    # for shop_id in shop_id_list:
    url = 'https://shop%s.taobao.com/asynSearch.htm?orderType=hotsell_desc&search=y&path=/search.htm' % shop_id
    response = requests.get(url, headers=headers)
    html = etree.HTML(response.text)
    html_data = html.xpath('//*[@class="J_TModule"]/div/div[2]/div/div[1]/ul')
    # pict_list = []
    for li in html_data:
        pic = li.xp
```
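The loop in `get_picture()` is cut off at `li.xp`, so the exact per-item XPath is not shown. The sketch below is a minimal, offline illustration of the same parsing pattern, assuming the page markup looks like the hypothetical `SAMPLE_HTML` string. It reuses the original container XPath, then runs a relative query on each `ul` and keeps at most four URLs. The per-item path `.//img/@src` and the helper `extract_pictures` are hypothetical, not the author's code, and the real Taobao markup may differ.

```python
from lxml import etree

# Hypothetical markup shaped to match the container XPath used in get_picture();
# the real Taobao response may look different.
SAMPLE_HTML = """
<html><body>
  <div class="J_TModule">
    <div>
      <div>header</div>
      <div>
        <div>
          <div>
            <ul>
              <li><img src="//img.example.com/1.jpg"></li>
              <li><img src="//img.example.com/2.jpg"></li>
            </ul>
            <ul>
              <li><img src="//img.example.com/3.jpg"></li>
              <li><img src="//img.example.com/4.jpg"></li>
              <li><img src="//img.example.com/5.jpg"></li>
            </ul>
          </div>
        </div>
      </div>
    </div>
  </div>
</body></html>
"""


def extract_pictures(html_text, limit=4):
    """Return up to `limit` image URLs found under the container XPath (hypothetical helper)."""
    html = etree.HTML(html_text)
    # Same container path as get_picture(); XPath positions such as div[2] are 1-based.
    uls = html.xpath('//*[@class="J_TModule"]/div/div[2]/div/div[1]/ul')
    pictures = []
    for ul in uls:
        # The leading '.' makes the query relative, so it only searches inside this <ul>.
        # './/img/@src' is an assumed per-item path, not taken from the original post.
        pictures.extend(ul.xpath('.//img/@src'))
    # Keep only the first `limit` URLs, in document order.
    return pictures[:limit]


print(extract_pictures(SAMPLE_HTML))
```

Taking the slice at the end keeps the loop simple; for a handful of items per shop the extra matches collected before slicing cost nothing noticeable.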
Scraping an HTML page with XPath
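One detail worth checking in the XPath used by `get_picture()`: `[@class="J_TModule"]` is an exact string comparison, so it stops matching as soon as the element carries an additional class. The sketch below uses made-up markup to contrast that with the common XPath 1.0 token-match idiom built from `contains`, `concat` and `normalize-space`. Whether Taobao's modules actually carry extra classes is not stated in the post; the markup here is an assumption for illustration only.

```python
from lxml import etree

# Made-up markup: "a" has an extra class, "b" matches exactly,
# and "c" only shares a prefix with the target class.
doc = etree.HTML(
    '<div class="J_TModule hot" id="a"></div>'
    '<div class="J_TModule" id="b"></div>'
    '<div class="J_TModuleX" id="c"></div>'
)

# Exact comparison: matches only elements whose class attribute is exactly "J_TModule".
exact = doc.xpath('//*[@class="J_TModule"]/@id')

# Token match: pad the whitespace-normalized class list with spaces so that
# "J_TModuleX" is not mistaken for "J_TModule".
token = doc.xpath(
    '//*[contains(concat(" ", normalize-space(@class), " "), " J_TModule ")]/@id'
)

print(exact)
print(token)
```

The exact query finds only `b`, while the token query finds both `a` and `b` and still skips `c`.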