Python: Cleaning data with nltk and BeautifulSoup (removing HTML tags and converting HTML entities)

from nltk import clean_html
from BeautifulSoup import BeautifulStoneSoup

content = '''Is anyone else having troubles with Bluetooth on a Moto X?
\u00a0It connects fine to my car when I make a call, but the bluetooth drops in
and out, and the phone prompts me to ask whether I want to use the speakerphone,
 the headset, or the bluetooth - but a few seconds later, it connects back to bl
uetooth. \u00a0And oddly, it only happens some of the time. \u00a0And other uses
 of Bluetooth from the phone - for example, playing an audiobook or music - demo
nstrate no similar behavior.<br /><br />It&#39;s a disastrous bug. Making me thi
nk about switching to another phone, even though I love this one. \u00a0And it s
eems to have been introduced only in the past month or so, as the phone worked f
ine with the car before that.<br /><br />And yes, I&#39;ve tried forgetting and
re-initiating the bluetooth connection.\ufeff'''

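# Targets Python 2 with NLTK 2.x and BeautifulSoup 3; nltk.clean_html
# is no longer implemented in NLTK 3.
# clean_html strips the HTML tags; BeautifulStoneSoup then converts
# HTML entities (e.g. &#39;) to the characters they represent.
def cleanHtml(html):
    if html == "": return ""
    return BeautifulStoneSoup(clean_html(html),
        convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0]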

print content
print 
print cleanHtml(content)

Output:

Is anyone else having troubles with Bluetooth on a Moto X?
\u00a0It connects fine to my car when I make a call, but the bluetooth drops in
and out, and the phone prompts me to ask whether I want to use the speakerphone,
 the headset, or the bluetooth - but a few seconds later, it connects back to bl
uetooth. \u00a0And oddly, it only happens some of the time. \u00a0And other uses
 of Bluetooth from the phone - for example, playing an audiobook or music - demo
nstrate no similar behavior.<br /><br />It&#39;s a disastrous bug. Making me thi
nk about switching to another phone, even though I love this one. \u00a0And it s
eems to have been introduced only in the past month or so, as the phone worked f
ine with the car before that.<br /><br />And yes, I&#39;ve tried forgetting and
re-initiating the bluetooth connection.\ufeff

Is anyone else having troubles with Bluetooth on a Moto X?
\u00a0It connects fine to my car when I make a call, but the bluetooth drops in
and out, and the phone prompts me to ask whether I want to use the speakerphone,
 the headset, or the bluetooth - but a few seconds later, it connects back to bl
uetooth. \u00a0And oddly, it only happens some of the time. \u00a0And other uses
 of Bluetooth from the phone - for example, playing an audiobook or music - demo
nstrate no similar behavior. It's a disastrous bug. Making me thi
nk about switching to another phone, even though I love this one. \u00a0And it s
eems to have been introduced only in the past month or so, as the phone worked f
ine with the car before that. And yes, I've tried forgetting and
re-initiating the bluetooth connection.\ufeff
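
The code above depends on Python 2, NLTK 2.x and BeautifulSoup 3. On Python 3, roughly the same two steps (strip tags, then decode entities) can be sketched with the standard library alone. This is a minimal sketch, not a drop-in replacement: the function name `clean_html_text` and the sample string are illustrative and not from the original post. The regex-based tag removal is an assumption that suits simple fragments like the one here; malformed or script-heavy HTML would need a real parser.

```python
import html
import re


def clean_html_text(markup):
    """Strip tags, then decode HTML entities (Python 3, standard library only)."""
    if not markup:
        return ""
    # Replace every tag with a space so adjacent words do not run together.
    text = re.sub(r"(?s)<[^>]*>", " ", markup)
    # Decode entities such as &#39; or &amp; into the characters they stand for.
    text = html.unescape(text)
    # Collapse runs of spaces and tabs left behind by removed tags; keep newlines.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


# Hypothetical sample, modelled on the post's input.
sample = "no similar behavior.<br /><br />It&#39;s a bug &amp; I&#39;ve tried."
print(clean_html_text(sample))
```

Tags are removed before entities are decoded, so an escaped tag such as `&lt;b&gt;` survives as the literal text `<b>` instead of being stripped.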

