My first Python crawler: fixing an encoding error (garbled Chinese) and batch-crawling web pages to local files

glimmer_it

Article link: https://blog.csdn.net/qq_15783243/article/details/78042468

Today I took a first look at Python and tried writing a small web crawler.

Original code:

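from urllib import request
request.encoding = "utf-8"  # note: urllib never reads this attribute, so it has no effect
response = request.urlopen("http://www.baidu.com")  # open the site
html = str(response.read(), 'utf-8')
f = open('C:/Users/lenovo/Desktop/11.html', 'w+')  # no encoding given, so the platform default is used
page = f.write(html)
f.close()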

At first, saving the crawled page to a local file raised an error:

UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 0: illegal multibyte sequence

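The likely cause is that the file f is opened with GBK, the default text encoding on a Chinese-locale Windows system, while the page content is UTF-8. Specifying the encoding explicitly when opening the file fixes it.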

f = open('C:/Users/lenovo/Desktop/11.html', 'w+', encoding='utf-8')
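
To see why the write fails, it can help to reproduce the mismatch without touching the network or the disk. The sketch below is illustrative only: the helper name `try_encode` is made up for this example, and it simply tries to encode the character from the error message with both codecs. The last line prints the encoding that `open()` falls back to in text mode when none is given; on a Chinese-locale Windows machine this is typically GBK (cp936).

```python
import locale

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text.

    Hypothetical helper for illustration; not part of the original script.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

sample = "\xbb"  # the character named in the error message
print(try_encode(sample, "gbk"))    # None, GBK cannot represent this character
print(try_encode(sample, "utf-8"))  # b'\xc2\xbb'

# Encoding used by open() in text mode when no encoding argument is passed
print(locale.getpreferredencoding(False))
```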

Revised code:

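from urllib import request

# request.encoding = "utf-8" is dropped here: urllib never reads that attribute
response = request.urlopen("http://www.baidu.com")  # open the site
html = str(response.read(), 'utf-8')  # decode the response bytes as UTF-8
# Pass the encoding explicitly so the file is not written with the platform default (GBK)
with open('C:/Users/lenovo/Desktop/11.html', 'w+', encoding='utf-8') as f:
    f.write(html)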
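
The revised script assumes the page is UTF-8. As a possible refinement, not something the original code does, the charset declared in the response's Content-Type header can be used when present. In the sketch below, `decode_body` and `fetch_text` are hypothetical names; `get_content_charset()` is the standard method on the headers object returned by `urlopen`, and falling back to UTF-8 with `errors="replace"` is an assumption made for this example.

```python
from urllib import request

def decode_body(raw, declared_charset=None, fallback="utf-8"):
    """Decode response bytes using the declared charset, else the fallback.

    Undecodable bytes are replaced rather than raising (an assumption of this sketch).
    """
    charset = declared_charset or fallback
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The server declared a charset name that Python does not know
        return raw.decode(fallback, errors="replace")

def fetch_text(url):
    """Download a page and decode it with the charset from its Content-Type header."""
    with request.urlopen(url) as response:
        raw = response.read()
        declared = response.headers.get_content_charset()  # None if not declared
    return decode_body(raw, declared)
```

Calling `fetch_text("http://www.baidu.com")` would then return a `str` that can be written out with `open(..., encoding='utf-8')` in the same way as the revised code.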

Batch-crawling web pages and saving them locally

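from urllib import request

# request.encoding = "utf-8" is dropped here: urllib never reads that attribute
# Read the list of URLs, one per line
with open("C:/Users/lenovo/Desktop/url.txt", "r") as f:
    fr = f.readlines()
count = 0
print(fr)
for line0 in fr:
    line = line0.strip()     # drop the trailing newline and surrounding whitespace
    line = line.strip('\'')  # drop single quotes around the URL, if any
    if not line:
        continue             # skip blank lines
    print(line + "===========================")
    response = request.urlopen(line)
    html = str(response.read(), 'utf-8')  # assumes every page is UTF-8 encoded

    # Save the pages as 0.html, 1.html, ... with an explicit UTF-8 encoding
    with open("C:/Users/lenovo/Desktop/%d.html" % count, "w", encoding='utf-8') as fw:
        fw.write(html)
    count += 1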
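
In a batch run, one bad URL stops the whole loop with an exception. The sketch below is one way to make the loop more tolerant, under assumptions that are not in the original: `clean_urls` and `download_all` are hypothetical helper names, the 10-second timeout is arbitrary, and failed downloads are reported and skipped rather than retried. Pages are written as raw bytes in binary mode, which sidesteps the text-encoding issue entirely.

```python
import os
from urllib import request

def clean_urls(lines):
    """Strip whitespace and surrounding single quotes; drop blank lines."""
    urls = []
    for raw_line in lines:
        url = raw_line.strip().strip("'")
        if url:
            urls.append(url)
    return urls

def download_all(urls, out_dir):
    """Save each page as <index>.html in out_dir and return the saved paths.

    The index is the URL's position in the list, so skipped URLs leave gaps.
    """
    saved = []
    for count, url in enumerate(urls):
        try:
            with request.urlopen(url, timeout=10) as response:
                data = response.read()
        except (OSError, ValueError) as exc:
            # OSError covers urllib's URLError and socket timeouts;
            # ValueError covers malformed URLs such as a missing scheme
            print("skipped %s: %s" % (url, exc))
            continue
        path = os.path.join(out_dir, "%d.html" % count)
        with open(path, "wb") as fw:  # binary mode: bytes are written unchanged
            fw.write(data)
        saved.append(path)
    return saved
```

With the URL file from the script above, this could be used as `download_all(clean_urls(open("C:/Users/lenovo/Desktop/url.txt").readlines()), "C:/Users/lenovo/Desktop")`.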
