Encoding errors when writing Python crawler results to HTML

Latest recommended article published 2024-07-29 02:32:41

QcQ____


Article link: https://blog.csdn.net/QcQ____/article/details/79114177


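HTML pages are usually encoded as UTF-8, but when no encoding is specified, Python's open() falls back to the system's locale encoding, which is GBK on a Chinese-language Windows system.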
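To see which default applies on a particular machine, a minimal sketch such as the following can help. It assumes Python 3, and the printed value depends on the operating system and locale, so no specific output is shown.

```python
import locale

# Encoding that open() falls back to when no encoding= argument is passed.
# On Chinese-language Windows this is typically cp936 (GBK);
# on most Linux and macOS systems it is UTF-8.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```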

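1. Passing encoding="utf-8" to open()

fout = open("output.html", mode="w", encoding="utf-8")

Result: the HTML file is generated, but the text is garbled when the file is opened in a browser. The file itself is written as UTF-8; the page simply does not declare its encoding, so the browser has to guess and may pick a different one.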

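2. Appending .encode("utf-8") to the Chinese string

fout.write("<td>%s</td>" % data["title"].encode("utf-8"))

Result: the output is still garbled. In Python 3, %s formats a bytes object as its repr, so the literal text b'\x...' is written into the page instead of the Chinese characters.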

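3. With the format method, a misplaced parenthesis additionally raises an error, because .encode() is applied to the whole formatted string and write() then receives bytes

fout.write("<td>{}</td>".format(data["title"]).encode("utf-8"))

TypeError: write() argument must be str, not bytes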
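The last two attempts can be reproduced in isolation. The sketch below assumes Python 3; the sample title and the temporary output path are hypothetical stand-ins for data["title"] and output.html.

```python
import os
import tempfile

title = "中文标题"  # hypothetical sample value standing in for data["title"]

# Attempt 2: %s formats the bytes object with its repr, so the literal
# text b'\xe4...' ends up in the page instead of the Chinese characters.
cell = "<td>%s</td>" % title.encode("utf-8")
print(cell)

# Attempt 3: .encode() is applied to the whole formatted string, so write()
# on a text-mode file receives bytes and raises TypeError.
path = os.path.join(tempfile.mkdtemp(), "output.html")
error_message = ""
with open(path, mode="w", encoding="utf-8") as fout:
    try:
        fout.write("<td>{}</td>".format(title).encode("utf-8"))
    except TypeError as exc:
        error_message = str(exc)
print(error_message)
```

In both cases the underlying problem is the same: a file opened in text mode with encoding="utf-8" already does the encoding, so the strings passed to write() should stay as str.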

Solution

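1. Write <meta charset="utf-8"> near the top of the HTML file, ideally inside <head>

(The browser also has to be told which encoding to use; this tag can be found at the very top of most web pages' source.)

2. Set the encoding parameter to "utf-8" and do not call .encode()

The code is as follows: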

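def output_html(self):
    # Open in text mode with an explicit encoding and pass str (not bytes) to write();
    # the with block closes the file even if an exception is raised.
    with open("output.html", mode="w", encoding="utf-8") as fout:
        fout.write("<html>")
        # Declare the encoding so the browser does not have to guess it
        fout.write("<head><meta charset='utf-8'></head>")
        fout.write("<body>")
        fout.write("<table>")

        for data in self.datas:
            fout.write("<tr>")
            # fout.write("<td>%s</td>" % data["title"]) works as well
            fout.write("<td>{}</td>".format(data["title"]))
            fout.write("</tr>")

        fout.write("</table>")
        fout.write("</body>")
        fout.write("</html>")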
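For trying the fix outside the crawler class, here is a standalone sketch of the same idea. The function name write_titles, the sample titles, and the temporary path are hypothetical; html.escape is an addition not present in the original code, included so that titles containing < or & do not break the markup.

```python
import html
import os
import tempfile


def write_titles(titles, path):
    """Write titles as a one-column HTML table, encoded as UTF-8."""
    with open(path, mode="w", encoding="utf-8") as fout:
        fout.write("<!DOCTYPE html>\n<html>\n<head><meta charset='utf-8'></head>\n")
        fout.write("<body>\n<table>\n")
        for title in titles:
            # Escape characters such as < and & before placing them in markup
            fout.write("<tr><td>{}</td></tr>\n".format(html.escape(title)))
        fout.write("</table>\n</body>\n</html>\n")


path = os.path.join(tempfile.mkdtemp(), "output.html")
titles = ["Python 爬虫", "编码 & 解码"]  # hypothetical sample data
write_titles(titles, path)

# Read the raw bytes back and decode them as UTF-8 to confirm the round trip
with open(path, "rb") as fin:
    page = fin.read().decode("utf-8")
print("Python 爬虫" in page)
```

Reading the file back in binary mode and decoding it explicitly checks that the bytes on disk really are UTF-8, independent of the system's default encoding.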