Scraping Taobao product information and writing it to CSV

qq_42795281

Article link: https://blog.csdn.net/qq_42795281/article/details/103560539

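This scrapes basic product information from Taobao (price, product name, number of buyers, shop, and location) and finally writes it to a CSV file.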
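Product titles can themselves contain commas, so simply joining the fields with ',' can push the tail of a title into the next column. A minimal sketch of how Python's standard csv module avoids this by quoting such fields; the function name rows_to_csv_text and the sample row are hypothetical, chosen only for illustration.

```python
import csv
import io

# Column headers for the fields listed above
HEADER = ["No.", "Price", "Product name", "Buyers", "Shop", "Location"]


def rows_to_csv_text(rows):
    """Serialize rows to CSV text; csv.writer quotes any field containing a comma."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample row whose title contains a comma
sample = [[1, "1999.00", "Phone A, 8GB+128GB", "1000+ paid", "Shop X", "Guangdong Shenzhen"]]
print(rows_to_csv_text(sample))
```

The quoted title reads back as a single field with csv.reader, whereas a plain comma join would produce seven columns for a six-field row.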
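The scraping itself is nothing special. For lack of time I did not scrape the detailed reviews and ratings. Since Taobao updated its site, requests need a cookie copied from your browser; I'm not showing mine here because it is too long. While writing this post I drew on https://blog.csdn.net/holyjesus/article/details/100835712?utm_source=app.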
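Rather than pasting the long cookie into the source file, one option is to read it from an environment variable. A minimal sketch, assuming a variable named TAOBAO_COOKIE (a hypothetical name) that holds the Cookie header value copied from a logged-in browser session; build_headers is likewise an illustrative helper, not part of any library.

```python
import os

# Same User-Agent string as in the script below
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36')


def build_headers(env_var="TAOBAO_COOKIE"):
    """Build request headers, reading the browser cookie from an environment variable.

    TAOBAO_COOKIE is a hypothetical variable name; put your own browser's
    Cookie header value in it before running the scraper.
    """
    cookie = os.environ.get(env_var)
    if not cookie:
        raise RuntimeError(f"set {env_var} to your browser's cookie string first")
    return {"User-Agent": USER_AGENT, "cookie": cookie.strip()}
```

The returned dict can be passed straight to requests, e.g. requests.get(url, headers=build_headers(), timeout=30), and the cookie never appears in the code you share.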

Below is the complete code:

```python
import csv
import re

import requests


# Fetch a page
def getHTMLText(url):
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36',
        'cookie': ' 。。。。',  # paste your own browser cookie here (the author's is omitted)
    }
    r = requests.get(url, headers=header, timeout=30)
    r.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
    return r.text


# Parse the fetched page
def parsePage(ilt, html):
    # Capture each value directly instead of split(':') + eval():
    # eval() executes untrusted page content, and split(':') truncates titles containing ':'
    plt = re.findall(r'"view_price":"([\d.]*)"', html)   # price
    tlt = re.findall(r'"raw_title":"(.*?)"', html)       # product name
    slt = re.findall(r'"view_sales":"(.*?)"', html)      # number of buyers
    nlt = re.findall(r'"nick":"(.*?)"', html)            # shop
    llt = re.findall(r'"item_loc":"(.*?)"', html)        # location
    # zip stops at the shortest list, so a missing field cannot raise IndexError
    for price, title, people, nick, loc in zip(plt, tlt, slt, nlt, llt):
        count = len(ilt)  # serial number; row 0 is the header, so items start at 1
        ilt.append([count, price, title, people, nick, loc])


# Write the data to CSV
def writeAll(rows):
    # csv.writer quotes fields containing commas; utf-8-sig lets Excel show Chinese text
    with open('new.csv', 'w', newline='', encoding='utf-8-sig') as f:
        csv.writer(f).writerows(rows)


# Main function
def main():
    goods = '手机'        # search keyword ("mobile phone")
    priceMin = '1500'
    priceMax = '3500'
    deep = 1              # number of result pages to fetch
    start_url = 'https://s.taobao.com/search?q=' + goods + '&filter=reserve_price%5B' + priceMin + '%2C' + priceMax + '%5D'
    # Output rows, header first
    indolist = [["No.", "Price", "Product name", "Buyers", "Shop", "Location"]]
    for i in range(deep):
        try:
            # s is the offset of the first item; assumes 44 items per result page
            url = start_url + '&s=' + str(44 * i)
            print(url)
            html = getHTMLText(url)
            parsePage(indolist, html)
        except Exception as e:  # skip a failed page but report why
            print('page', i, 'failed:', e)
            continue

    writeAll(indolist)


if __name__ == '__main__':
    main()
```
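The parser relies on each item's fields appearing in the page source as "key":"value" pairs. The sketch below applies the same kind of regular expressions, using capture groups, to a small hypothetical fragment (not real Taobao output) so the extraction can be checked offline. The title deliberately contains ':', since splitting a match such as "raw_title":"Phone A: 8GB+128GB" on ':' would cut the title short. SAMPLE, FIELDS and extract are illustrative names.

```python
import re

# Hypothetical fragment imitating the JSON embedded in a search results page;
# the keys are the ones the scraper above looks for.
SAMPLE = ('{"raw_title":"Phone A: 8GB+128GB","view_price":"1999.00",'
          '"item_loc":"Guangdong Shenzhen","view_sales":"1000+ paid","nick":"Shop X"}')

# Column order: price, title, buyers, shop, location
FIELDS = ["view_price", "raw_title", "view_sales", "nick", "item_loc"]


def extract(html):
    """Return one [price, title, buyers, shop, location] row per item found."""
    columns = [re.findall('"%s":"(.*?)"' % key, html) for key in FIELDS]
    # zip(*columns) pairs the n-th match of every field and stops at the shortest list
    return [list(row) for row in zip(*columns)]


print(extract(SAMPLE))
# [['1999.00', 'Phone A: 8GB+128GB', '1000+ paid', 'Shop X', 'Guangdong Shenzhen']]
```

Saving a real page to a file once and running the extraction against it is a convenient way to tune the patterns without sending repeated requests.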

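The search URL is assembled by string concatenation, with the s parameter selecting the result page. A sketch of building the same URL with urllib.parse.urlencode, which also percent-encodes the Chinese keyword; the step of 44 items per page is an assumption about Taobao's page size, and build_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_search_url(goods, price_min, price_max, page, per_page=44):
    """Build a search URL for a keyword, price range and result page.

    per_page=44 is an assumed page size; s is the offset of the first item.
    """
    params = {
        "q": goods,
        # encoded as reserve_price%5B1500%2C3500%5D, the same filter as start_url
        "filter": f"reserve_price[{price_min},{price_max}]",
        "s": page * per_page,
    }
    return "https://s.taobao.com/search?" + urlencode(params)


for i in range(2):
    print(build_search_url("手机", 1500, 3500, i))  # keyword "mobile phone"
```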