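I wrote a crawler to scrape titles and content. When I used bs4 to get the href and text of the a tags, it returned nothing. The page source is as follows: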
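The exact page source isn't available here, so the following is only a guess at one common cause, not a diagnosis of this particular page. Some CMS list pages deliver each entry as a CDATA record inside a `<script type="text/xml">` element. The fragment below is made up to match the div id and link pattern used in the code further down. With html.parser, everything inside `<script>` becomes a single text node, so `find_all("a")` has nothing to return. This would also explain why a regex with single-quoted attributes can match `str(audit_div)`: bs4 re-serializes real tags with double-quoted attributes, but writes `<script>` text back out verbatim.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment (made up for illustration): the links sit
# inside a <script> block as CDATA records instead of as real markup.
html = """<div id="4009681">
<script type="text/xml"><datastore>
<record><![CDATA[<li><a href='/art/1.html' class='bt_link' title='First notice'>First notice</a></li>]]></record>
<record><![CDATA[<li><a href='/art/2.html' class='bt_link' title='Second notice'>Second notice</a></li>]]></record>
</datastore></script>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", attrs={"id": "4009681"})

# html.parser keeps everything inside <script> as plain text,
# so there are no <a> Tag objects to find.
print(len(div.find_all("a")))  # 0

# str(div) still contains the original text, single quotes included.
print("class='bt_link'" in str(div))  # True
```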
Solution: use regular expressions (the re module).
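import re
from bs4 import BeautifulSoup

# resp (the list page response) and domain (the site root) are assumed to be defined earlier (not shown)
main_page = BeautifulSoup(resp.text, "html.parser")
# audit_div = main_page.find("div", attrs={"class": "ny-list"})
audit_div = main_page.find("div", attrs={"id": "4009681"})
# Match each link's raw markup: single-quoted attributes in exactly this order
obj = re.compile(r"<a href='(?P<href>.*?)' class='bt_link' title='(?P<title>.*?)'>", re.S)
# Search the div's serialized HTML instead of its parsed tags
result = obj.finditer(str(audit_div))
child_href_list = []
for it in result:
    # Turn the relative href into an absolute URL
    child_href = domain + it.group('href')
    title = it.group('title')
    child_href_list.append(child_href)
    print(child_href, title)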
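To try the regex step on its own, here is a stdlib-only sketch run against a hypothetical fragment standing in for `str(audit_div)`; the fragment and the `base` URL are made up. It also swaps `domain + href` for `urllib.parse.urljoin`, which copes with a leading `/` or an already-absolute href. That swap is a suggestion, not what the code above does.

```python
import re
from urllib.parse import urljoin

# Hypothetical fragment standing in for str(audit_div).
fragment = (
    "<record><![CDATA[<li><a href='/art/1.html' class='bt_link' title='First notice'>First notice</a></li>]]></record>"
    "<record><![CDATA[<li><a href='/art/2.html' class='bt_link' title='Second notice'>Second notice</a></li>]]></record>"
)
base = "https://example.com/"  # hypothetical site root (plays the role of domain)

# Same pattern as above: each lazy group stops at the next single quote.
pattern = re.compile(
    r"<a href='(?P<href>.*?)' class='bt_link' title='(?P<title>.*?)'>", re.S
)

# Collect (absolute URL, title) pairs from the named groups.
links = [
    (urljoin(base, m.group("href")), m.group("title"))
    for m in pattern.finditer(fragment)
]
for href, title in links:
    print(href, title)
```

With `re.S`, `.` also matches newlines, so a tag broken across lines still matches; the lazy `.*?` stops at the first closing quote, which keeps each match to a single link.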
The result is as follows: