The approach is the same as for the Sohu car library: first collect the link for each model, then iterate over those links to scrape the data we need. Crawling the NetEase car database, however, takes far less time than Sohu Auto, because NetEase's data is already present in the raw HTML and can be scraped without rendering, whereas Sohu's data only appears after the page has been rendered.
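Because the series ids sit directly in the static HTML, a plain regex is enough to pull them out. A self-contained sketch against a made-up snippet (the tag shape mirrors what step 1 matches; the id value here is illustrative):

```python
import re

# Hypothetical fragment of the brand-index HTML (invented for illustration);
# the series id appears in both the id attribute and the _seriseId attribute.
sample = '<a href="/series/16979" id="16979" _seriseId="16979">ix35</a>'

# Same pattern the step-1 code uses to extract series ids
pattern = re.compile('<a.*?id="(.*?)".*?_seriseId=.*?</a>')
print(re.findall(pattern, sample))  # expect ['16979']
```

No browser or JavaScript engine is involved, which is why this crawl is so much faster than the rendered Sohu pages.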
Step 1: Get the brand links
import requests
import re

url = 'http://product.auto.163.com/'

def getHtml(url):
    # Fetch a page; the site serves GBK-encoded HTML
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.9'
    }
    html = requests.get(url, headers=headers)
    html.encoding = 'GBK'
    return html.text

def cutstr(html):
    # Extract each series id from <a> tags carrying a _seriseId attribute
    pattern = re.compile('<a.*?id="(.*?)".*?_seriseId=.*?</a>')
    return re.findall(pattern, html)

def gotoFile():
    # Write one series URL per line for step 2 to consume
    html = getHtml(url)
    with open('wangyicar3.txt', 'w', encoding='utf-8') as f:
        for i in cutstr(html):
            link = 'http://product.auto.163.com/series/' + i + '.html#008B00'
            f.write(link + '\n')

gotoFile()
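Step 2 then consumes the file written above. A minimal sketch of reading the saved links back (`load_links` is a helper name of my own, not from the original code; the file name assumes the `wangyicar3.txt` written in step 1):

```python
def load_links(path='wangyicar3.txt'):
    # One series URL per line; skip any blank lines
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]
```

Iterating over `load_links()` gives exactly the per-series URLs that the step-2 crawler needs to visit.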
Step 2: Get the link for each model
import requests
import re
# url = 'http://product.auto.163.com/series/16979.html#008B00'
def getHtml(url):
    data = {'test': 'data'}
    headers = {'User-Agent'