Getting Started with Python Web Scraping: A Step-by-Step Guide to Scraping a News Site

Latest recommended article published on 2025-01-25 22:50:08

vvbgcc

Tags: python, web scraping

Article link: https://blog.csdn.net/vvbgcc/article/details/145191100

1. Preparation

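Install Python:
Make sure Python is installed on your computer; a Python 3.x release is recommended. You can download the installer from the official Python website.
Install the required libraries:
Use pip to install the following libraries: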
```
pip install requests beautifulsoup4
```
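
To confirm that both libraries installed correctly, a quick check like the one below can help. It is a minimal sketch that only imports the two packages and reads their version strings; the helper name `installed_versions` is hypothetical and not part of either library.

```python
import bs4
import requests


def installed_versions():
    """Return the version strings of the two libraries used in this tutorial."""
    return {
        "requests": requests.__version__,
        "beautifulsoup4": bs4.__version__,
    }


if __name__ == "__main__":
    # An ImportError above would mean the pip install step did not succeed
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```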

2. Understanding the Target Site

Pick a news site to work with, for example a simple news page such as https://news.ycombinator.com/. We will extract the story titles and links from it.
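
Before writing the scraper, it helps to see the shape of the markup being targeted. The snippet below is a sketch that parses a small hand-written HTML fragment modeled on the site's story rows. The fragment, the `SAMPLE_HTML` name, and the `extract_stories` helper are illustrative assumptions; the live page's markup may differ or change over time, so inspect it in your browser's developer tools before relying on these selectors.

```python
from bs4 import BeautifulSoup

# Hand-written fragment imitating two story rows
# (an assumption for illustration, not a copy of the live page)
SAMPLE_HTML = """
<table>
  <tr class="athing" id="1">
    <td class="title"><span class="titleline"><a href="https://example.com/a">First story</a></span></td>
  </tr>
  <tr class="athing" id="2">
    <td class="title"><span class="titleline"><a href="item?id=2">Second story</a></span></td>
  </tr>
</table>
"""


def extract_stories(html):
    """Return a list of (title, href) pairs, one per story row found in html."""
    soup = BeautifulSoup(html, "html.parser")
    stories = []
    for row in soup.find_all("tr", class_="athing"):
        # The title link is the <a> directly inside <span class="titleline">
        anchor = row.select_one("span.titleline > a")
        if anchor is not None:
            stories.append((anchor.get_text(strip=True), anchor["href"]))
    return stories


if __name__ == "__main__":
    for title, href in extract_stories(SAMPLE_HTML):
        print(title, "->", href)
```

Working against a saved or hand-written fragment like this lets you test selectors without sending repeated requests to the site.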

3. Writing the Scraper

Next, we will write a simple scraper that fetches the story titles and links.

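```
import requests
from bs4 import BeautifulSoup

# Target URL
url = 'https://news.ycombinator.com/'

# Send the request (the timeout keeps the script from hanging indefinitely)
response = requests.get(url, timeout=10)

# Check the response status code
if response.status_code == 200:
    # Parse the page content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all story rows
    items = soup.find_all('tr', class_='athing')

    # Iterate over each story row
    for item in items:
        # The title link sits inside <span class="titleline">; the older
        # 'storylink' class is not present in the site's current markup
        anchor = item.select_one('span.titleline > a')
        if anchor is None:
            continue  # skip rows that have no title link
        title = anchor.get_text(strip=True)  # story title
        link = anchor['href']  # story link
        print(f'Title: {title}')
        print(f'Link: {link}\n')
else:
    print("Failed to retrieve the webpage.")
```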
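
Some links on a listing page can be relative, for example a link of the form `item?id=...` that points back to the site itself. The following is a small sketch of how such links could be resolved against the page URL using `urljoin` from the standard library. The helper name `to_absolute` and the sample hrefs are hypothetical and used only for illustration.

```python
from urllib.parse import urljoin

# Page the links were scraped from
BASE_URL = "https://news.ycombinator.com/"


def to_absolute(href, base=BASE_URL):
    """Resolve href against base; already-absolute URLs come back unchanged."""
    return urljoin(base, href)


if __name__ == "__main__":
    for href in ["item?id=1", "https://example.com/post"]:
        print(to_absolute(href))
```

In the scraper loop, the extracted `link` value could be passed through a helper like this before printing, so that every printed link can be opened directly.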