[Python Web Scraping] class vs. class_

Fixing class attribute matching in BeautifulSoup

License: CC 4.0 BY-SA

Tags:

#python #web-scraping #BeautifulSoup #class_

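This post explains how to match the class attribute correctly when using the BeautifulSoup library, avoiding the error caused by a clash with a Python keyword. Worked examples show the correct way to match the attribute, along with other approaches that also work.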

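When locating elements with BeautifulSoup's find_all() method, writing class as the keyword argument name (for example class = 'iimg-box-meta') does not work: Python rejects the call with a SyntaxError before any code runs.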

Using class_ instead avoids the error:

soup.find_all('div', class_ = 'iimg-box-meta')
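To try the call without hitting a live site, here is a minimal sketch that runs against an inline HTML snippet. The markup, the extra class name featured, and the variable names are made up for illustration; only the iimg-box-meta class comes from the post. It also shows that class_ matches an element when the value is any one of its CSS classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; only "iimg-box-meta" comes from the post.
html = """
<div class="iimg-box-meta">first</div>
<div class="iimg-box-meta featured">second</div>
<div class="sidebar">third</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ filters on the HTML class attribute; an element with several
# classes matches if any one of them equals the given value.
matches = soup.find_all("div", class_="iimg-box-meta")
texts = [div.get_text() for div in matches]
print(texts)  # ['first', 'second']
```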

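Reason:

class is a reserved keyword in Python, and keywords cannot be used as identifiers, which includes variable names, function names, and keyword-argument names. That is why BeautifulSoup accepts class_ as the keyword argument for filtering on the class attribute.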
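The failure happens at parse time, so it can be observed without BeautifulSoup installed. The sketch below uses the built-in compile() to check whether each form of the call is valid Python syntax; the helper name compiles and the two source strings are illustrative, not part of any library.

```python
def compiles(source):
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


# The call is only parsed, never executed, so `soup` need not exist.
with_keyword = "soup.find_all('div', class='iimg-box-meta')"
with_underscore = "soup.find_all('div', class_='iimg-box-meta')"

print(compiles(with_keyword))     # False: `class` is a reserved keyword
print(compiles(with_underscore))  # True: `class_` is an ordinary identifier
```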

Python has 35 reserved keywords (as of Python 3.7 and later):

1	2	3	4	5
False	True	None	and	break
as	assert	async	await	class
continue	def	yield	del	elif
else	except	finally	for	from
global	if	import	in	is
lambda	nonlocal	not	or	pass
raise	return	try	while	with
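The keyword list can also be checked against the interpreter actually in use, since the set has changed between Python versions. A small sketch using the standard-library keyword module:

```python
import keyword

# The full list of reserved keywords for the running interpreter.
print(keyword.kwlist)
print(len(keyword.kwlist))

# `class` is reserved; `class_` is a normal identifier.
print(keyword.iskeyword("class"))   # True
print(keyword.iskeyword("class_"))  # False
```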

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent header with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'}
res = requests.get('http://www.cnplugins.com/', headers=headers)

# BeautifulSoup is a popular Python module that parses the page fetched by
# Requests into a soup document, which can then be filtered to extract data.
# Its documentation recommends lxml as the parser for speed; the built-in
# html.parser is used here, so nothing extra needs to be installed.
soup = BeautifulSoup(res.text, 'html.parser')
# print(soup.prettify())

# Three equivalent ways to find <div class="iimg-box-meta">
print(soup.find_all('div', "iimg-box-meta"))
print(soup.find_all('div', class_='iimg-box-meta'))
print(soup.find_all('div', attrs={"class": "iimg-box-meta"}))

# href is not a Python keyword, so it needs no trailing underscore
print(soup.find_all('a', href="/tool/save-as-mht.html"))   # works
print(soup.find_all('a', href_="/tool/save-as-mht.html"))  # finds nothing: looks for an attribute literally named href_
# Match several attributes at once with attrs
print(soup.find_all('a', attrs={"href": "/tool/save-as-mht.html", "target": "_blank"}))
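Because the script above depends on a live site whose markup may change, the same comparisons can be reproduced offline. The following is a sketch against a made-up HTML snippet (the markup and variable names are assumptions; the class name and href value are taken from the post). It checks that the three class lookups agree and that href_ matches nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<div class="iimg-box-meta">
  <a href="/tool/save-as-mht.html" target="_blank">Save as MHT</a>
</div>
<div class="other"><a href="/tool/other.html">Other</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Three ways to filter on the class attribute.
by_positional = soup.find_all("div", "iimg-box-meta")
by_keyword = soup.find_all("div", class_="iimg-box-meta")
by_attrs = soup.find_all("div", attrs={"class": "iimg-box-meta"})
print(by_positional == by_keyword == by_attrs)  # True

# href works as-is; href_ is treated as an attribute literally named
# "href_", which no tag has, so the result is empty.
links_ok = soup.find_all("a", href="/tool/save-as-mht.html")
links_bad = soup.find_all("a", href_="/tool/save-as-mht.html")
print(len(links_ok), len(links_bad))  # 1 0
```

The trailing underscore is a special case for class only; for any other attribute, or for attribute names that are not valid Python identifiers, the attrs dictionary is the general-purpose form.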