Scraping the Chinese University Rankings Web Page

Latest recommended article published on 2024-10-11 17:30:36

江泱



Tags: python, data mining

Article link: https://blog.csdn.net/qq_35951239/article/details/134897366


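This article shows how to use Python's requests and BeautifulSoup libraries to fetch the content of a given URL (for example, https://www.shanghairanking.cn/rankings/bcur/2023.html) and extract every link on the page together with its link text.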



import requests
from bs4 import BeautifulSoup

url = 'https://www.shanghairanking.cn/rankings/bcur/2023.html'  # replace with the URL of the site you want to scrape

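# Fetch the page; the timeout keeps the request from hanging indefinitely
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status (4xx/5xx)
response.encoding = 'utf-8'  # decode the response body as UTF-8
soup = BeautifulSoup(response.text, 'html.parser')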



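# Find every <a> element on the page and print its text and link target
links = soup.find_all('a')
for link in links:
    href = link.get('href')  # None when the <a> tag has no href attribute
    title = link.get_text(strip=True)  # link text with surrounding whitespace removed
    print(title, href)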
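
The loop above prints each href exactly as it appears in the HTML, so the output can include relative paths, in-page anchors, missing values, and duplicates. The following is a minimal sketch of one way to clean such (title, href) pairs using only the standard library. The helper name `normalize_links` and the sample pairs are hypothetical and used purely for illustration; they are not taken from the actual page.

```python
from urllib.parse import urljoin, urlparse

# Same page URL as in the script above; used to resolve relative links.
BASE_URL = 'https://www.shanghairanking.cn/rankings/bcur/2023.html'


def normalize_links(pairs, base_url=BASE_URL):
    """Clean (title, href) pairs (hypothetical helper, not part of any library).

    Skips missing hrefs, in-page anchors and non-HTTP schemes, resolves
    relative URLs against base_url, and drops duplicate targets.
    """
    seen = set()
    cleaned = []
    for title, href in pairs:
        href = (href or '').strip()
        # Skip <a> tags without a usable target, or with an in-page anchor.
        if not href or href.startswith('#'):
            continue
        # Turn relative paths into absolute URLs.
        absolute = urljoin(base_url, href)
        # Keep only http(s) links (drops mailto:, javascript:, and so on).
        if urlparse(absolute).scheme not in ('http', 'https'):
            continue
        # Keep only the first occurrence of each target URL.
        if absolute in seen:
            continue
        seen.add(absolute)
        # Collapse runs of whitespace inside the link text.
        cleaned.append((' '.join((title or '').split()), absolute))
    return cleaned


# Made-up sample input, for illustration only.
sample = [
    ('  Home \n', '/'),
    ('No link', None),
    ('Top', '#top'),
    ('Previous year', '2022.html'),
    ('Previous year again', '2022.html'),
    ('Mail', 'mailto:someone@example.com'),
    ('External', 'https://example.com/a'),
]
for title, link_url in normalize_links(sample):
    print(title, link_url)
```

In the script above, the pairs could be collected with `[(a.get_text(), a.get('href')) for a in soup.find_all('a')]` and passed to this helper in place of printing them directly.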