Python Crawler: Simulating a Taobao Search to Get Item Information

Latest recommended article published on 2023-06-19 13:19:26

weixin_39581972

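Preface

Environment: PyCharm

Libraries used: re, requests

Process

Finding the URL

Type a keyword into the search box and you will see the URL change. We strip out the parameters that look unnecessary and check whether the page still comes back normally (don't ask me how I know which ones are needed and which are not).

After tidying up, the final URL has the form https://s.taobao.com/search?q=KEYWORD&s=OFFSET, which is the form the main function later in this post builds.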
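As a rough sketch of how that URL can be put together, the snippet below uses the standard library's urllib.parse.urlencode so that keywords containing spaces or non-ASCII characters get escaped. The helper name build_search_url and the default page_size of 44 are my own assumptions, taken from the offsets used in this post rather than from anything Taobao documents.

```python
from urllib.parse import urlencode

BASE_URL = "https://s.taobao.com/search"


def build_search_url(keyword, page, page_size=44):
    """Return the search URL for a zero-based page index.

    build_search_url is a hypothetical helper; page_size=44 is assumed
    from the "&s=" offsets used in this post.
    """
    # urlencode escapes the keyword and joins the parameters with "&".
    query = urlencode({"q": keyword, "s": page * page_size})
    return BASE_URL + "?" + query


if __name__ == "__main__":
    print(build_search_url("backpack", 0))
    print(build_search_url("red shoes", 2))
```

Building the query from a dict keeps the keyword and the offset separate, so a keyword that happens to contain "&" or "=" cannot break the URL.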

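Analyzing the page source

Here we open the page source and search for the name of any item on the page. It turns out the name sits in the raw_title field.

In the same way, we can find that the price is stored in view_price, so we can now get both the name and the price.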
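The same search can be done in code instead of in the browser. Here is a small sketch that returns the text around every occurrence of a field name, which helps confirm that the field really is in the source you fetched. The function name find_contexts and the sample string are made up for illustration.

```python
def find_contexts(source, key, width=30):
    """Return the text surrounding each occurrence of key in source."""
    contexts = []
    start = source.find(key)
    while start != -1:
        begin = max(0, start - width)
        end = start + len(key) + width
        contexts.append(source[begin:end])
        # Continue searching after the current match.
        start = source.find(key, start + len(key))
    return contexts


if __name__ == "__main__":
    # Hypothetical fragment imitating the data embedded in the page.
    sample = '{"raw_title":"Canvas backpack","view_price":"129.00"}'
    for snippet in find_contexts(sample, "raw_title", width=20):
        print(snippet)
```

If the list comes back empty for a page you fetched with requests, the response is probably not the same page the browser showed.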

Implementation

First, import the libraries we need

import re
import requests

Next, fetch the page source

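def getHTMLText(url):
    # Fetch the page and return its text, or '' if the request fails.
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # raise on 4xx/5xx status codes
        response.encoding = 'utf-8'
        return response.text
    except requests.RequestException:
        return ''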

Then parse the page and pull out each item's price and title

def parseHtml(html):
    # Print the title and price of every item found in the page source.
    try:
        re_title = re.compile(r'"raw_title":"(.*?)"', re.S)
        re_price = re.compile(r'"view_price":"(.*?)"', re.S)
        raw_title = re_title.findall(html)
        view_price = re_price.findall(html)
        # zip pairs titles and prices by position and stops at the shorter list.
        for title, price in zip(raw_title, view_price):
            print(title, price)
    except Exception:
        return ''
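Printing is fine for a quick look, but returning the data makes it easier to sort or save later. The following is only a sketch of that variation: parse_items is a hypothetical name, the sample string is invented, and it assumes view_price holds a plain decimal number.

```python
import re

TITLE_RE = re.compile(r'"raw_title":"(.*?)"', re.S)
PRICE_RE = re.compile(r'"view_price":"(.*?)"', re.S)


def parse_items(html):
    """Return a list of {"title", "price"} dicts found in html."""
    items = []
    for title, price in zip(TITLE_RE.findall(html), PRICE_RE.findall(html)):
        try:
            value = float(price)
        except ValueError:
            continue  # skip prices that are not plain numbers
        items.append({"title": title, "price": value})
    return items


if __name__ == "__main__":
    # Hypothetical fragment imitating the data embedded in the page.
    sample = ('{"raw_title":"Canvas backpack","view_price":"129.00"},'
              '{"raw_title":"Leather wallet","view_price":"59.90"}')
    for item in sorted(parse_items(sample), key=lambda it: it["price"]):
        print(item["title"], item["price"])
```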

That is most of the work done. Now we add a few small features, such as paging (written in the main function)

def main():
    url = 'https://s.taobao.com/search?q='
    goods = input('Item to search for: ')
    deeps = int(input('Number of pages: '))
    print('-' * 30)
    for i in range(deeps):
        # s is the result offset: 0, 44, 88, ... (one step per page)
        html = getHTMLText(url + goods + '&s=' + str(44 * i))
        parseHtml(html)
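One thing to watch in main is that int(input(...)) raises ValueError when the user types something that is not a whole number. Below is a minimal sketch of a more forgiving version, together with the offset calculation pulled out on its own. Both function names are hypothetical, and the step of 44 is simply the value used in this post.

```python
def parse_page_count(text, default=1):
    """Convert user input to a positive page count, else return default."""
    try:
        count = int(text.strip())
    except ValueError:
        return default
    return count if count > 0 else default


def page_offsets(pages, page_size=44):
    """Return the "s" offset for each page: 0, 44, 88, ..."""
    return [page_size * i for i in range(pages)]


if __name__ == "__main__":
    pages = parse_page_count("3")
    print(page_offsets(pages))
```

Each offset can then be appended to the search URL as the s parameter, exactly as the loop in main does.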

Nice. Here is the complete code

import requests
import re

# Fetch the page source
def getHTMLText(url):
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # raise on 4xx/5xx status codes
        response.encoding = 'utf-8'
        return response.text
    except requests.RequestException:
        return ''

# Parse the page and print each item's title and price
def parseHtml(html):
    try:
        re_title = re.compile(r'"raw_title":"(.*?)"', re.S)
        re_price = re.compile(r'"view_price":"(.*?)"', re.S)
        raw_title = re_title.findall(html)
        view_price = re_price.findall(html)
        # zip pairs titles and prices by position and stops at the shorter list.
        for title, price in zip(raw_title, view_price):
            print(title, price)
    except Exception:
        return ''

def main():
    url = 'https://s.taobao.com/search?q='
    goods = input('Item to search for: ')
    deeps = int(input('Number of pages: '))
    print('-' * 30)
    for i in range(deeps):
        # s is the result offset: 0, 44, 88, ... (one step per page)
        html = getHTMLText(url + goods + '&s=' + str(44 * i))
        parseHtml(html)

if __name__ == '__main__':
    main()

Done

A screenshot of the result is attached
