I am trying to convert a block of HTML to text using Python.
Input:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
Consectetuer adipiscing elit. Some Link Aenean commodo ligula eget dolor. Aenean massa
Aenean massa.Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
Consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
Desired output:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
Consectetuer adipiscing elit. Some Link Aenean commodo ligula eget dolor. Aenean massa
Aenean massa.Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
Consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
I tried the html2text module, but without much success (I am very new to Python :)
Here is what I tried:
#!/usr/bin/env python
import urllib2
import html2text
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://example.com/page.html').read())
txt = soup.find('div', {'class' : 'body'})
print html2text.html2text(txt)
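The "txt" object produces the HTML block above. I would like to convert it to text and print it on the screen.
Any help with this code would be much appreciated.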
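For illustration, here is a minimal sketch of the kind of conversion I am after, written with only the Python 3 standard library rather than html2text or BeautifulSoup. The names `TextExtractor`, `html_to_text`, `BLOCK_TAGS`, and the `sample` markup are hypothetical and not part of my real page; the choice of which tags start a new line is also an assumption. One thing that may be worth checking in my own attempt is that `soup.find` returns a tag object rather than a string, so `html2text.html2text` may not be receiving the string of HTML it expects.

```python
from html.parser import HTMLParser

# Tags treated as block-level, i.e. they start a new line (an assumption).
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3"}


class TextExtractor(HTMLParser):
    """Collect text nodes and insert line breaks around block-level tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Inline tags such as <a> or <strong> contribute only their text.
        self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


# Hypothetical markup resembling the block described above.
sample = (
    '<div class="body">'
    '<p>Consectetuer adipiscing elit. <a href="http://example.com/">Some Link</a> '
    'Aenean commodo ligula eget dolor.</p>\n'
    '<p>Aenean massa</p>'
    '</div>'
)
print(html_to_text(sample))
```

The sketch keeps link text inline and puts each paragraph on its own line, which is the shape of output I want; it does not attempt Markdown formatting the way html2text does.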