Parsing data with BeautifulSoup in Python

Latest recommended article published on 2021-11-17 15:54:33

到处挖坑蒋玉成


Article link: https://blog.csdn.net/weixin_29688535/article/details/113502545


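I am trying to use BeautifulSoup to parse a DOM tree and extract the authors' names. Below is an HTML snippet that shows the structure of the markup I want to scrape.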

Authors:

Authors:

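My confusion is this: when I call soup.find, it returns only the first occurrence of the div tag I am searching for. After that, I search for all the 'a' link tags inside it. At this stage, how do I extract the author name from each link tag and print it? Is there a way to do this with BeautifulSoup, or do I need to use a regex? And how do I go on to iterate over the other div tags and extract their author names as well?

import re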

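import urllib2, sys
from BeautifulSoup import BeautifulSoup, NavigableString

# `address` must already hold the URL of the page to scrape
html = urllib2.urlopen(address).read()
soup = BeautifulSoup(html)

try:
    # find() returns only the first div with class "list-authors"
    authordiv = soup.find('div', attrs={'class': 'list-authors'})
    # search for links inside that div (the original used an undefined name, tds)
    links = authordiv.findAll('a')
    for link in links:
        # each link is a Tag; its text is in link.contents, not link[0]
        print ''.join(link.contents)
    # Iterate through entire page and print authors
except IOError:
    print 'IO error'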
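One possible way to cover every author div rather than only the first is sketched below. It is not the original poster's code: it assumes Python 3 and the bs4 package (the successor to the BeautifulSoup 3 module imported above), and because the page's HTML was not preserved here, SAMPLE_HTML, the author names in it, and the helper extract_authors are all hypothetical stand-ins. The only thing taken from the question is the div class list-authors and the fact that the names sit inside 'a' tags. No regex should be needed under those assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the original HTML was not preserved, so this only
# assumes div.list-authors blocks that contain <a> tags with author names.
SAMPLE_HTML = """
<div class="list-authors">
  <span>Authors:</span>
  <a href="/a/1">Author One</a>,
  <a href="/a/2">Author Two</a>
</div>
<div class="list-authors">
  <span>Authors:</span>
  <a href="/a/3">Author Three</a>
</div>
"""


def extract_authors(html):
    """Return one list of author names per div.list-authors block."""
    soup = BeautifulSoup(html, "html.parser")
    authors_per_div = []
    # find_all returns every matching div, whereas find stops at the first
    for div in soup.find_all("div", class_="list-authors"):
        # get_text returns the text inside each link, stripped of whitespace
        names = [a.get_text(strip=True) for a in div.find_all("a")]
        authors_per_div.append(names)
    return authors_per_div


if __name__ == "__main__":
    for names in extract_authors(SAMPLE_HTML):
        print(", ".join(names))
```

For a live page, the string returned by the HTTP request would be passed to extract_authors in place of SAMPLE_HTML. Keeping one list per div preserves which authors belong to which entry.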