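I found that BeautifulSoup.find() splits the class attribute on whitespace.
Because of that, I can't use a regular expression on the class value, as the code below shows.
Can someone help me find the right way to get all the "tree children" elements?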
import re
from bs4 import BeautifulSoup
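# markup reconstructed from the class names printed below
r_html = "<div class='root'>" \
         "<div class='tree children1'></div>" \
         "<div class='tree children2'></div>" \
         "<div class='tree children3'></div>" \
         "</div>"
bs_tab = BeautifulSoup(r_html, "html.parser")
workspace_box_visible = bs_tab.find_all('div', {'class': 'tree children1'})
print(workspace_box_visible)  # result: a list containing the "tree children1" div
workspace_box_visible = bs_tab.find_all('div', {'class': re.compile(r'^tree children\d')})
print(workspace_box_visible)  # result (with the BeautifulSoup version I used): [], an empty list,
# because the class value was split on the whitespace character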
# print every class value that BeautifulSoup passes to the filter
def print_class(class_):
    print(class_)
    return False  # never match, so every element gets visited
workspace_box_visible = bs_tab.find('div', {'class': print_class})
# expected:
# root
# tree children1
# tree children2
# tree children3
# actual:
# root
# tree
# children1
# tree
# children2
# tree
# children3
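One workaround, sketched here under the assumption that you only need to match against the full class string, is to pass a function as the first argument of find_all. BeautifulSoup then calls it with each whole Tag rather than with individual class names, so the function can rebuild the space-joined class string and apply the regex to that. The markup mirrors the class names above, and is_tree_child is a hypothetical helper name.

```python
import re
from bs4 import BeautifulSoup

# Illustrative markup modeled on the class names printed above
html = (
    "<div class='root'>"
    "<div class='tree children1'>one</div>"
    "<div class='tree children2'>two</div>"
    "<div class='tree children3'>three</div>"
    "</div>"
)

TREE_CHILD = re.compile(r"^tree children\d")

def is_tree_child(tag):
    """Hypothetical helper: True for <div> tags whose joined class string matches."""
    # tag.get("class") is a list such as ["tree", "children1"]; join it back
    classes = " ".join(tag.get("class", []))
    return tag.name == "div" and TREE_CHILD.match(classes) is not None

soup = BeautifulSoup(html, "html.parser")
matches = soup.find_all(is_tree_child)
print([" ".join(t["class"]) for t in matches])
# ['tree children1', 'tree children2', 'tree children3']
```

Because the function sees the Tag, it can also check other attributes or the tag name in the same place.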
Thanks in advance,
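==== Comment ==========
Stack Overflow doesn't allow comments longer than 500 characters,
so I'm adding my comment here:
The example above shows how BeautifulSoup finds the classes I need.
However, suppose I have a DOM structure like this: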
r_html = "
"
"
"
"
"
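and I need to select the controls whose class attribute is "tree children" or "tree children first".
None of the approaches described in your (Padraic Cunningham's) answer work in that case.
I found a solution using a regular expression: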
controls = bs_tab.find_all('div')
for control in controls:
    # join the class list back into one string; tags without a class give ""
    if re.search("^tree children|^tree children first", " ".join(control.get('class', []))):
        print(control)
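The same loop can be folded into find_all by passing a function that receives each Tag. This is a sketch, not the code from the question: the original markup for this example was not preserved, so the HTML below is hypothetical, and wanted_control is an illustrative name. The pattern uses re.fullmatch so that only the exact strings "tree children" and "tree children first" match; the ^-anchored pattern above also accepts any class string that merely starts with "tree children".

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the original example's HTML was not preserved
html = (
    "<div class='root'>"
    "<div class='tree children'>a</div>"
    "<div class='tree children first'>b</div>"
    "<div class='tree children last'>c</div>"
    "<div class='tree'>d</div>"
    "</div>"
)

# Accept exactly "tree children" or "tree children first"
WANTED = re.compile(r"tree children( first)?")

def wanted_control(tag):
    # Rebuild the attribute value from the class list BeautifulSoup produced
    class_string = " ".join(tag.get("class", []))
    return tag.name == "div" and WANTED.fullmatch(class_string) is not None

soup = BeautifulSoup(html, "html.parser")
controls = soup.find_all(wanted_control)
for control in controls:
    print(control)
```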
Another solution:
bs_tab.find_all('div', class_='tree children') + bs_tab.find_all('div', class_='tree children first')
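If the real requirement is "every div that carries both the tree and children classes", a CSS selector states it directly through Tag.select, which ignores class order and extra classes (the two exact-string find_all calls above depend on both). Where the extra classes must be restricted, comparing class sets is one option. Both are sketches over hypothetical markup, not a claim about what the original page contained.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html = (
    "<div class='root'>"
    "<div class='tree children'>a</div>"
    "<div class='children tree first'>b</div>"
    "<div class='tree children last'>c</div>"
    "<div class='tree'>d</div>"
    "</div>"
)

soup = BeautifulSoup(html, "html.parser")

# Every <div> that has BOTH the "tree" and "children" classes,
# in any order and with any extra classes
both = soup.select("div.tree.children")
print([d.get_text() for d in both])  # ['a', 'b', 'c']

# Narrower variant: the class set must equal one of the allowed sets
allowed = [{"tree", "children"}, {"tree", "children", "first"}]
exact = [d for d in soup.find_all("div") if set(d.get("class", [])) in allowed]
print([d.get_text() for d in exact])  # ['a', 'b']
```

Note that element "b" is found even though its classes appear as "children tree first", which an exact class_ string would not match.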
I know these are not very nice solutions. I was hoping the BeautifulSoup module had a proper method for this.