Saving HTML files as TXT on a schedule: I want to save a parsed HTML file as a TXT file

Latest recommended article published 2022-01-01 23:11:49

电影魔鬼-教程王



Tags: saving HTML files as TXT on a schedule

I parsed a web page that displays an article. I want to save the parsed data to a text file, but my Python shell shows this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 107: ordinal not in range(128)
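The error can be reproduced without any files. U+2019 is the right single quotation mark (’), which has no ASCII representation, so encoding it with the `ascii` codec fails while `utf-8` succeeds. The sketch below is written in Python 3. The question's code is Python 2, where writing a `unicode` string to a file opened with plain `open()` triggers the same implicit ASCII encoding. The sample string and the `try_encode` helper are made up for illustration:

```python
# Reproduce the UnicodeEncodeError from the question (Python 3 sketch).
text = "It\u2019s a test"  # U+2019 RIGHT SINGLE QUOTATION MARK, as in the traceback


def try_encode(s, codec):
    """Return the encoded bytes, or None if the codec cannot represent s."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError as exc:
        print("{} failed: {}".format(codec, exc))
        return None


ascii_bytes = try_encode(text, "ascii")  # fails: U+2019 is outside range(128)
utf8_bytes = try_encode(text, "utf-8")   # succeeds: U+2019 becomes three bytes
print(utf8_bytes)
```

ASCII only covers code points 0 to 127, so any curly quote, dash, or accented letter in the article text produces this error unless a wider encoding such as UTF-8 is chosen.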

Here is part of my code:

import urllib
import xml.dom.minidom
from bs4 import BeautifulSoup

# url: the search URL, defined elsewhere in my code
search_result = urllib.urlopen(url)
f = search_result.read()

# XML parsing
parsedResult = xml.dom.minidom.parseString(f)
linklist = parsedResult.getElementsByTagName('link')  # extract the <link> elements
extractedURL = linklist[3].firstChild.nodeValue  # pick one link
page = urllib.urlopen(extractedURL).read()

# make an HTML file
g = open('yyyy.html', 'w')
g.write(page)
g.close()

# read the HTML file back and parse it to get the article's plain text
g = open('yyyy.html', 'r')
bs = BeautifulSoup(g, from_encoding="utf-8")
g.close()
article = bs.find(id="articleBody")
content = article.get_text()  # a unicode string

# save as a text file (this write raises the UnicodeEncodeError)
h = open('yyyy.txt', 'w')
h.write(content)
h.close()

What do I need to add to make this work?
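The usual fix is to choose the encoding explicitly when writing the text. In the Python 2 code above, that means opening the output with `io.open('yyyy.txt', 'w', encoding='utf-8')` instead of `open('yyyy.txt', 'w')`, or writing `content.encode('utf-8')`. The sketch below shows the same idea in Python 3, where `open()` accepts an `encoding` argument directly. To stay dependency-free it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`. `ArticleTextParser` and `save_article_text` are hypothetical names, and the depth counting assumes reasonably well-formed markup (BeautifulSoup is more forgiving of unclosed tags):

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth count.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class ArticleTextParser(HTMLParser):
    """Collect the text inside the element whose id matches target_id."""

    def __init__(self, target_id="articleBody"):
        super().__init__()  # convert_charrefs=True: entities arrive as plain text
        self.target_id = target_id
        self.depth = 0  # > 0 while inside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID_ELEMENTS:
                self.depth += 1  # entering an element nested in the article
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1  # entering the target element itself

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_ELEMENTS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def save_article_text(html, path, encoding="utf-8"):
    """Extract the article text from html, write it to path, and return it."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    content = "".join(parser.parts)
    # The actual fix: name the encoding instead of relying on a default codec
    # that may not be able to represent characters such as U+2019.
    with open(path, "w", encoding=encoding) as out:
        out.write(content)
    return content
```

For example, `save_article_text(page_html, 'yyyy.txt')` (with `page_html` a hypothetical string holding the downloaded page) returns the article text and leaves a UTF-8 encoded `yyyy.txt`. Any program that reads the file later also has to decode it as UTF-8.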
