python get text, Python BeautifulSoup get_text() does not get all the text

长安的雨

Article tags: python get text

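This article looks at a problem encountered in BeautifulSoup 4.4.0 when extracting the text of an HTML tag. Using an example, it shows why the `get_text()` method returned only the first paragraph, and gives the fix: use the lxml parser instead of html.parser to get the full content.

(Summary generated automatically by CSDN.)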

I'm trying to get all the text from an HTML tag using BeautifulSoup's get_text() method. I use Python 2.7 and BeautifulSoup 4.4.0. It works most of the time, but sometimes the method only gets the first paragraph from a tag, and I can't figure out why. Please see the following example.

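from bs4 import BeautifulSoup
import urllib2  # Python 2 only; Python 3 uses urllib.request instead

job_url = "http://www.indeed.com/viewjob?jk=0f5592c8191a21af"
site = urllib2.urlopen(job_url).read()  # raw HTML of the job page

soup = BeautifulSoup(site, "html.parser")
# Text of <span class="summary">, expected to include all nested tags
text = soup.find("span", {"class": "summary"}).get_text()
print text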

I want to get the whole content of this Indeed job description, that is, all the text inside <span class="summary">. However, with the code above I only get "Please note that this is a 1 year contract assignment. Candidates cannot start an assignment until background check and drug test is completed". Why am I losing the rest of the text? How can I get all the text from this tag without specifying its sub-tags?

Thanks a lot.
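The question's code is Python 2 (`urllib2`, the `print` statement). Below is a minimal Python 3 sketch of the same steps, assuming `bs4` is installed. Fetching is split from parsing so the parsing can be tried on a saved copy of the page; `fetch_html`, `summary_text` and the sample markup are hypothetical names and data, not from the original post. It also shows the answer to the last question: `get_text()` already collects text from every nested tag, and `separator`/`strip` only control how the pieces are joined.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw bytes (Python 3 stand-in for urllib2.urlopen)."""
    with urlopen(url) as response:
        return response.read()


def summary_text(markup, parser="html.parser"):
    """Return all text under <span class="summary">, or None if the span is missing."""
    soup = BeautifulSoup(markup, parser)
    span = soup.find("span", {"class": "summary"})
    if span is None:
        return None
    # get_text() walks every descendant string, so text inside nested
    # <p>, <ul>, <li> ... is included without naming those sub-tags.
    # separator="\n" keeps the pieces apart; strip=True drops whitespace-only strings.
    return span.get_text(separator="\n", strip=True)


if __name__ == "__main__":
    # Hypothetical, well-formed markup standing in for the job page.
    sample = """
    <span class="summary">
      <p>First paragraph.</p>
      <ul><li>Bullet one</li><li>Bullet two</li></ul>
      <p>Last paragraph.</p>
    </span>
    """
    print(summary_text(sample))
```

Against the live site this would be `summary_text(fetch_html(job_url))`, although the 2015 Indeed URL may no longer return the same page.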

Solution

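Try a different parser, such as lxml, instead of html.parser. When a page's markup is not well-formed, each parser repairs it in its own way and can build a different tree, so html.parser most likely closed the summary span before the later paragraphs and get_text() never saw them. Note that lxml is a separate package (pip install lxml).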

Replace:

soup = BeautifulSoup(site, "html.parser")

with:

soup = BeautifulSoup(site, "lxml")
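To check whether parser choice is really the cause on a given page, one option is to parse the same markup with every installed parser and compare what each finds in the span. The sketch below assumes `bs4` is installed; `lxml` and `html5lib` are optional, and a parser that is missing makes `BeautifulSoup` raise `FeatureNotFound`, so it is skipped. The function name and sample markup are hypothetical. Beautiful Soup also ships `bs4.diagnose.diagnose()`, which prints how each installed parser handles a document.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# html.parser ships with Python; lxml and html5lib are optional third-party packages.
PARSERS = ("html.parser", "lxml", "html5lib")


def summary_text_by_parser(markup, parsers=PARSERS):
    """Map each installed parser to the text it finds under <span class="summary">."""
    results = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # This parser is not installed; skip it instead of failing.
            continue
        span = soup.find("span", {"class": "summary"})
        results[parser] = span.get_text(separator=" ", strip=True) if span else None
    return results


if __name__ == "__main__":
    page = '<span class="summary"><p>First paragraph.</p><p>Second paragraph.</p></span>'
    for name, text in summary_text_by_parser(page).items():
        print(f"{name:12} {text!r}")
```

If the entries differ for a real page (for example, one parser returns only the first paragraph), the markup is malformed and the parser that keeps the full text is the one to use.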
