Scraping paper information with BeautifulSoup's find, find_all, and select


温润如你


Tags: BeautifulSoup Python soup.find soup.find_all soup.select

Article link: https://blog.csdn.net/weixin_45671979/article/details/101150051



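find returns only the first element that matches the given conditions (or None if nothing matches), whereas find_all and select return every matching element as a list. find and find_all filter by tag name and attributes; select takes a CSS selector.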
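The difference between the three methods can be seen on a small HTML snippet. This is a minimal sketch, assuming bs4 is installed and the built-in html.parser is used; the markup, the `paper`/`author` class names and the hrefs are hypothetical and are not taken from the Wanfang page scraped below.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a paper-detail page
html = """
<div class="paper">
  <a class="author" href="/a/1">Author A</a>
  <a class="author" href="/a/2">Author B</a>
  <a class="author" href="/a/3">Author C</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find: only the first matching Tag is returned
first = soup.find("a", class_="author")
print(first.get_text())  # Author A

# find_all: every match, returned as a list-like ResultSet
all_links = soup.find_all("a", class_="author")
print([a.get_text() for a in all_links])  # ['Author A', 'Author B', 'Author C']

# select: every match of a CSS selector ("a.author directly inside div.paper")
selected = soup.select("div.paper > a.author")
print([a["href"] for a in selected])  # ['/a/1', '/a/2', '/a/3']

# With no match, find gives None while find_all and select give empty lists
print(soup.find("span"), len(soup.find_all("span")), len(soup.select("span")))  # None 0 0
```

Because find can return None, its result should be checked before calling methods such as get_text on it; an empty list from find_all or select can simply be iterated.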

Paper link
Using find and find_all:

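import requests
from bs4 import BeautifulSoup
from requests import RequestException


def get_html(url):
	"""Fetch a page and return its decoded HTML text, or None on failure."""
	try:
		# Send a browser-like User-Agent so the site does not reject the request
		headers = {
			'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'
		}
		# A timeout keeps the script from hanging on an unresponsive server
		response = requests.get(url, headers=headers, timeout=10)
		if response.status_code == 200:
			# Detect the encoding from the body so Chinese text decodes correctly
			response.encoding = response.apparent_encoding
			return response.text
		return None
	except RequestException as e:
		# Connection errors and timeouts are all subclasses of RequestException
		print(e)
		return None


if __name__ == "__main__":
	url = 'http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=zgszyx201807023'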
	h