Python Web Scraping: Product Information

Latest recommended article published 2024-04-23 15:15:59

在努力的望舒7

Article tags: web crawler, python, pandas

Article link: https://blog.csdn.net/qq_53336761/article/details/129816773


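First, we need to identify the address of the page to scrape. In this example we scrape computer listings from JD (京东), so we need the URL of JD's search page (the keyword 电脑 means "computer"): https://search.jd.com/Search?keyword=电脑&enc=utf-8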
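If you want to search for a different keyword, you can build the search URL instead of hard-coding it. Below is a minimal sketch using only the standard library's `urllib.parse.urlencode`, which percent-encodes the keyword as UTF-8. The helper name `build_search_url` is hypothetical, and only the `keyword` and `enc` parameters come from the URL above:

```python
from urllib.parse import urlencode

# Base address of JD's search page (taken from the URL above)
SEARCH_BASE = 'https://search.jd.com/Search'


def build_search_url(keyword: str) -> str:
    """Build a JD search URL for `keyword` (hypothetical helper).

    urlencode percent-encodes non-ASCII text as UTF-8, matching enc=utf-8.
    """
    query = urlencode({'keyword': keyword, 'enc': 'utf-8'})
    return f'{SEARCH_BASE}?{query}'


if __name__ == '__main__':
    print(build_search_url('电脑'))
    # https://search.jd.com/Search?keyword=%E7%94%B5%E8%84%91&enc=utf-8
```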

Next, we send an HTTP request from Python to fetch the HTML of the search page. Here we use the requests module:

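```python
import requests

url = 'https://search.jd.com/Search?keyword=电脑&enc=utf-8'
# Without a browser-like User-Agent the site may not return the normal results page
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 403 or 500
html = response.text
```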
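If you send more than one request, one common requests pattern is to reuse a `requests.Session` that carries the User-Agent header and retries transient failures through urllib3's `Retry`. This is a sketch of a generic technique, not something the original code does. The helper name `make_session`, the retry count, the backoff factor and the list of status codes are all assumptions:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Browser-like User-Agent (same string as in the complete script)
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36')


def make_session(total_retries: int = 3) -> requests.Session:
    """Create a session that sends USER_AGENT and retries transient errors."""
    session = requests.Session()
    session.headers['User-Agent'] = USER_AGENT
    retry = Retry(
        total=total_retries,
        backoff_factor=1,                            # exponentially growing wait between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # retry on these status codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('https://', adapter)
    session.mount('http://', adapter)
    return session


# Usage (performs a real network request, so it is commented out here):
# session = make_session()
# response = session.get('https://search.jd.com/Search?keyword=电脑&enc=utf-8', timeout=10)
# response.raise_for_status()
# html = response.text
```

Keeping the header and retry policy on the session means every later `session.get(...)` call gets them automatically.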

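Then we parse the HTML with the Beautiful Soup module and extract the information we need. Use your browser's developer tools (Inspect Element) to find the tags and class names that hold each field. Here I only scraped the computer's name, price, and product link: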

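```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# html and url come from the previous step
soup = BeautifulSoup(html, 'html.parser')
items = soup.select('.gl-item')  # one element per product card

for item in items:
    name_tag = item.select_one('.p-name em')
    price_tag = item.select_one('.p-price i')
    link_tag = item.select_one('.p-img a[href]')
    if not (name_tag and price_tag and link_tag):
        continue  # skip cards missing a field instead of raising IndexError
    name = name_tag.get_text(strip=True)
    price = price_tag.get_text(strip=True)
    link = urljoin(url, link_tag['href'])  # href starts with '//', so add the scheme

    print('Product name:', name)
    print('Product price:', price)
    print('Product link:', link)
    print()
```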
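Because the page markup can change at any time, it helps to test the selectors offline before hitting the live site. The sketch below runs the same selectors against a small hand-written HTML fragment. The fragment is a hypothetical, simplified mock of a `.gl-item` card (not JD's real markup), and `parse_items` is a hypothetical helper name:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical, simplified mock of two product cards; real JD markup differs
SAMPLE_HTML = """
<ul>
  <li class="gl-item">
    <div class="p-img"><a href="//item.jd.com/1001.html"><img></a></div>
    <div class="p-price"><strong><i>4999.00</i></strong></div>
    <div class="p-name"><a><em> Laptop A </em></a></div>
  </li>
  <li class="gl-item">
    <div class="p-name"><a><em>Card without price</em></a></div>
  </li>
</ul>
"""


def parse_items(html: str, base_url: str = 'https://search.jd.com/') -> list:
    """Return [name, price, link] rows for cards that have all three fields."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for item in soup.select('.gl-item'):
        name_tag = item.select_one('.p-name em')
        price_tag = item.select_one('.p-price i')
        link_tag = item.select_one('.p-img a[href]')
        if not (name_tag and price_tag and link_tag):
            continue  # incomplete card: skip it
        rows.append([
            name_tag.get_text(strip=True),
            price_tag.get_text(strip=True),
            urljoin(base_url, link_tag['href']),  # '//item...' becomes 'https://item...'
        ])
    return rows


print(parse_items(SAMPLE_HTML))
# [['Laptop A', '4999.00', 'https://item.jd.com/1001.html']]
```

If the output is empty on the real page, the selectors (or the page returned to your script) no longer match what the browser shows.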

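Finally, we can process and store the data however we need. For example, we can save the scraped product information to an Excel file or a database, or analyze it with data visualization.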

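```python
import pandas as pd

# Store the data. data is a list of [name, price, link] rows; build it by calling
# data.append([name, price, link]) inside the loop above (as in the complete code).
# Column names: 名称 = name, 价格 = price, 链接 = link
df = pd.DataFrame(data, columns=['名称', '价格', '链接'])
df.to_excel('jd_computer.xlsx', index=False)  # requires the openpyxl package
```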
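The text also mentions saving to a database. As one possible option (SQLite is my choice here, not the author's), the same rows can be written with Python's built-in `sqlite3` module. The table name `products`, the file name `jd_computer.db` and the helper `save_to_sqlite` are hypothetical:

```python
import sqlite3


def save_to_sqlite(rows, db_path='jd_computer.db'):
    """Insert [name, price, link] rows into a `products` table; return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS products ('
        'name TEXT, price TEXT, link TEXT PRIMARY KEY)'
    )
    # INSERT OR REPLACE keeps one row per product link when the script is rerun
    conn.executemany('INSERT OR REPLACE INTO products VALUES (?, ?, ?)', rows)
    conn.commit()
    return conn


# Usage with a hypothetical row; ':memory:' avoids creating a file
conn = save_to_sqlite([['Laptop A', '4999.00', 'https://item.jd.com/1001.html']], ':memory:')
print(conn.execute('SELECT COUNT(*) FROM products').fetchone()[0])  # 1
```

Prices are stored as TEXT because that is how they come out of the page; convert them before doing arithmetic.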

Here is the complete code; I store the data in an Excel file:

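```python
from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Configuration
url = 'https://search.jd.com/Search?keyword=电脑&enc=utf-8'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}

# Send the HTTP request
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
html = response.text

# Parse the HTML
soup = BeautifulSoup(html, 'html.parser')
items = soup.select('.gl-item')  # one element per product card

data = []
for item in items:
    name_tag = item.select_one('.p-name em')
    price_tag = item.select_one('.p-price i')
    link_tag = item.select_one('.p-img a[href]')
    if not (name_tag and price_tag and link_tag):
        continue  # skip cards missing a field instead of raising IndexError
    name = name_tag.get_text(strip=True)
    price = price_tag.get_text(strip=True)
    link = urljoin(url, link_tag['href'])  # href starts with '//', so add the scheme
    data.append([name, price, link])

# Store the data (名称 = name, 价格 = price, 链接 = link); to_excel needs openpyxl
df = pd.DataFrame(data, columns=['名称', '价格', '链接'])
df.to_excel('jd_computer.xlsx', index=False)
```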
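The prices come out of the page as strings, so the data-processing step usually starts by turning them into numbers. A minimal pandas sketch follows; the rows are hypothetical sample values (not scraped data) in the same `[name, price, link]` shape, and the column names match the script above. `pd.to_numeric(..., errors='coerce')` turns anything unparseable into NaN instead of raising:

```python
import pandas as pd

# Hypothetical sample rows; columns match the script: 名称 = name, 价格 = price, 链接 = link
df = pd.DataFrame(
    [['Laptop A', '4999.00', 'https://item.jd.com/1001.html'],
     ['Laptop B', '3599.00', 'https://item.jd.com/1002.html'],
     ['Laptop C', '', 'https://item.jd.com/1003.html']],
    columns=['名称', '价格', '链接'],
)

# Convert price strings to floats; unparseable values (e.g. '') become NaN
df['价格'] = pd.to_numeric(df['价格'], errors='coerce')

# Drop rows without a usable price, then sort from cheapest to most expensive
cleaned = df.dropna(subset=['价格']).sort_values('价格').reset_index(drop=True)
print(cleaned[['名称', '价格']])
print('Mean price:', cleaned['价格'].mean())
```

The cleaned frame can then be passed to `to_excel` or a plotting library in place of the raw one.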
