Fixing Garbled Chinese Text When Scraping the Baidu Page with Python

一只迷人的坩埚

Published 2022-06-16 17:01:36

Article link: https://blog.csdn.net/bashine/article/details/125306542

While learning web scraping, I ran into the following situation:

# coding:utf-8
# Crawler: a program that fetches resources from the internet
# Goal: simulate a browser, request a URL, and retrieve the resources or content it serves
from urllib.request import urlopen

url = "http://www.baidu.com"
resp = urlopen(url)

# print(resp.read().decode("utf-8"))
with open("mybaidu.html", mode="w") as f:
    f.write(resp.read().decode("utf-8"))
print("over!")

Running the code above produces a file in which the Chinese text is garbled:

<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta content="always" name="referrer"><meta name="theme-color" content="#ffffff"><meta name="description" content="ȫ��
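
A likely cause, assuming the script runs on a system whose default locale encoding is not UTF-8 (for example GBK on a Chinese-language Windows install): `open()` without an `encoding` argument writes the file in the locale encoding, while the page's `<meta>` tag tells the browser to read it as UTF-8. The sketch below passes `encoding="utf-8"` explicitly and also reproduces the mismatch offline. The helper names `save_text_utf8`, `fetch_and_save`, and `simulate_mojibake` are hypothetical and exist only for this illustration.

```python
from urllib.request import urlopen


def save_text_utf8(text, path):
    """Write text to path as UTF-8, regardless of the platform default."""
    # Without encoding=..., open() falls back to the locale's preferred
    # encoding, which is assumed here to be the source of the mismatch.
    with open(path, mode="w", encoding="utf-8") as f:
        f.write(text)


def fetch_and_save(url, path):
    """Download a page, decode the bytes as UTF-8, and save them as UTF-8."""
    with urlopen(url) as resp:
        save_text_utf8(resp.read().decode("utf-8"), path)


def simulate_mojibake(text, wrong_encoding="gbk"):
    """Return what a UTF-8 reader sees if text was saved in another encoding."""
    return text.encode(wrong_encoding).decode("utf-8", errors="replace")


# Offline demonstration of the mismatch; no network access is needed.
garbled = simulate_mojibake("全球")
print(ascii(garbled))  # printed as ASCII escapes so any console can show it

# To apply the fix to the original script (requires network access):
# fetch_and_save("http://www.baidu.com", "mybaidu.html")
```

If the saved file still looks wrong after this change, it is worth checking which encoding the viewer or editor uses to open `mybaidu.html`, since the same mismatch can occur on the reading side.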
