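When writing Chinese text to an HTML file, the page itself has to be generated as UTF-8. That is what the encode('utf-8') calls in the following two lines do: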
fout.write("<td>%s</td>" % data['title'].encode('utf-8'))
fout.write("<td>%s</td>" % data['summary'].encode('utf-8'))
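The browser must also be told to read the content as UTF-8, otherwise the text shows up garbled. The following line takes care of that: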
fout.write("<head><meta charset='utf-8'></head>")
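To see why both steps matter, here is a small sketch (not part of the original code) that encodes a Chinese string to UTF-8 and then decodes the same bytes two ways. The sample string and the choice of latin-1 as the "wrong" charset are assumptions made for illustration; a browser that guesses the wrong charset misreads the bytes in much the same way.

```python
# Sample text; any non-ASCII string would do.
text = u"中文"

# These are the bytes that end up in the HTML file.
raw = text.encode("utf-8")

# Decoding with the same charset recovers the original text.
as_utf8 = raw.decode("utf-8")

# Decoding with a different charset (latin-1 here, as an example)
# does not fail, but yields unrelated characters: garbled output.
as_latin1 = raw.decode("latin-1")

print(as_utf8 == text)
print(as_latin1 == text)
```

The meta charset tag removes the guesswork by stating the charset explicitly.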
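The class below writes the collected data to an HTML file. Its collect_data method takes a data argument, which is a dictionary with the fields 'url', 'title' and 'summary'.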
# coding:utf-8
# Output the collected data to an HTML file
class HtmlOutputer(object):
    def __init__(self):
        self.datas = []

    def collect_data(self, data):
        # Skip empty results
        if data is None:
            return
        self.datas.append(data)

    def output_html(self):
        fout = open('output.html', 'w')
        fout.write("<html>")
        fout.write("<head><meta charset='utf-8'></head>")
        fout.write("<body>")
        fout.write("<table>")
        for data in self.datas:
            # One table row per collected item
            fout.write("<tr>")
            fout.write("<td>%s</td>" % data['url'])
            fout.write("<td>%s</td>" % data['title'].encode('utf-8'))
            fout.write("<td>%s</td>" % data['summary'].encode('utf-8'))
            fout.write("</tr>")
        # Close the tags in the reverse order they were opened
        fout.write("</table>")
        fout.write("</body>")
        fout.write("</html>")
        fout.close()
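The encode('utf-8') calls above fit Python 2, which is my reading of the code since the post does not state the version. In Python 3 the same formatting would write the bytes representation (b'...') into the page. A minimal sketch of a Python 3 variant follows, assuming the same dictionary fields; the class name HtmlOutputerPy3, the path parameter and the html.escape call are additions of mine, not part of the original.

```python
import html
import os
import tempfile


class HtmlOutputerPy3(object):
    """Hypothetical Python 3 counterpart of the class above."""

    def __init__(self):
        self.datas = []

    def collect_data(self, data):
        if data is None:
            return
        self.datas.append(data)

    def output_html(self, path='output.html'):
        # encoding='utf-8' lets the file object do the encoding,
        # so no per-field encode() call is needed.
        with open(path, 'w', encoding='utf-8') as fout:
            fout.write("<html>")
            fout.write("<head><meta charset='utf-8'></head>")
            fout.write("<body>")
            fout.write("<table>")
            for data in self.datas:
                fout.write("<tr>")
                for key in ('url', 'title', 'summary'):
                    # Escape <, > and & so field text cannot break the markup.
                    fout.write("<td>%s</td>" % html.escape(data[key]))
                fout.write("</tr>")
            fout.write("</table>")
            fout.write("</body>")
            fout.write("</html>")


# Usage: write one sample row to a temporary directory.
outputer = HtmlOutputerPy3()
outputer.collect_data({'url': 'http://example.com',
                       'title': u'标题',
                       'summary': u'摘要'})
demo_path = os.path.join(tempfile.mkdtemp(), 'output.html')
outputer.output_html(demo_path)
```

The with block also closes the file even if a write fails, which the explicit open call does not guarantee.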