[5] Learning Python on My Own: Disguising Requests as a Browser

tangxiaoguodong

Column: python | Tags: python

Article link: https://blog.csdn.net/deepmountain/article/details/80472771

The code is as follows:

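# -*- coding: utf-8 -*-
import urllib.request

import requests


def save(data, filename, flag):
    """Save data to <Desktop>/<filename>.txt.

    flag='wb': data is bytes and is written in binary mode.
    flag='w':  data is str and is written as UTF-8 text.
    """
    path = r'C:\Users\admin\Desktop\{}.txt'.format(filename)
    if flag == 'wb':
        with open(path, mode='wb') as f:
            f.write(data)
    elif flag == 'w':
        # Without encoding=..., text mode uses the system default encoding
        # (GBK on Chinese Windows), which cannot encode characters such as '\xbb'.
        with open(path, mode='w', encoding='utf-8') as f:
            f.write(data)
    else:
        raise ValueError("flag must be 'wb' or 'w'")


url = 'http://www.baidu.com/'
# Send a desktop Chrome User-Agent instead of urllib's default "Python-urllib/3.x".
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
request = urllib.request.Request(url=url, headers=headers)  # build the request
with urllib.request.urlopen(request) as resp:  # send it and receive the response
    response = resp.read()  # read() returns raw, undecoded bytes
html0 = response
save(html0, 'html0', 'wb')
encoding0 = requests.get(url).encoding  # encoding requests takes from the response headers
print(encoding0)

html1 = response.decode('utf-8')  # bytes -> str
save(html1, 'html1', 'w')
print(html1)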

The output is as follows:

Summary:

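1. In my test, the request raised an error unless the URL ended with '/', and I don't yet know why. (For reference, Python 3's urllib normally sends '/' as the request path when the URL has none, so 'http://www.baidu.com' should also work; the error may have had another cause.)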
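2. Disguising as a browser: put a browser User-Agent in the request headers, build the request with urllib.request.Request(url, headers=...), then send it and receive the response with urllib.request.urlopen().
3-1. urlopen() fetches site data much like requests.get(url), but it returns a response object: you must call read() on it, and what you get is raw bytes that have not been decoded.
3-2. So pay attention to the mode when open() opens or creates the file: to write bytes, add 'b' for binary mode ('wb').
3-3. Use encode() to convert str to bytes and decode() to convert bytes to str. Because urlopen() returns undecoded bytes, the conversion needed here is decode().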
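4-1. After decoding with 'utf-8', writing the text to a file raises: UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 29392: illegal multibyte sequence
4-2. The error concerns how special characters are handled during writing. At the time I had not found a fix: I tried sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8') and also switching to gb18030 and gbk, and none of these worked. The cause is that open(path, mode='w') without an encoding argument encodes text with the system default encoding, which is GBK on Chinese-language Windows, and GBK has no mapping for '\xbb' (»). Rewrapping sys.stdout only affects print(), not file writes; passing encoding='utf-8' to open() avoids the error.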
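To check item 1, here is a minimal sketch that starts a throwaway HTTP server on 127.0.0.1 and reports which path urllib actually sends, with and without the trailing '/'. It assumes Python 3 and only the standard library; `EchoPathHandler` and `fetch_path` are hypothetical names made up for this test, and the empty `ProxyHandler({})` is there so that a proxy configured in the environment does not intercept localhost traffic.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Tiny local test server (hypothetical): replies with the request path it received."""

    def do_GET(self):
        body = self.path.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Ignore any HTTP proxy set in the environment; we only talk to localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_path(url):
    """Return the path the local server saw for a GET of `url`."""
    with opener.open(url, timeout=5) as resp:
        return resp.read().decode('utf-8')


server = HTTPServer(('127.0.0.1', 0), EchoPathHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:{}'.format(server.server_address[1])

selector_no_slash = urllib.request.Request(base).selector  # '' : the URL has no path
no_slash = fetch_path(base)          # the server still receives '/'
with_slash = fetch_path(base + '/')  # '/'
server.shutdown()
server.server_close()

print(repr(selector_no_slash), no_slash, with_slash)  # '' / /
```

If both URLs produce the same path, the missing '/' is not what made the original request fail.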
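Item 2 can be checked without touching the network. This sketch, assuming only the standard library, compares the User-Agent urllib sends by default with the one attached through `Request(headers=...)`. The constant name `BROWSER_UA` is illustrative; the string itself is the one used in the code above.

```python
import urllib.request

BROWSER_UA = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36')

# Without custom headers, urllib's opener adds "User-agent: Python-urllib/3.x".
default_ua = dict(urllib.request.build_opener().addheaders)['User-agent']

# Step 1: build the request and attach the browser User-Agent.
req = urllib.request.Request('http://www.baidu.com/', headers={'User-Agent': BROWSER_UA})

# Request stores header names via str.capitalize(), so the key is 'User-agent'.
# A header set on the request takes precedence over the opener's default.
sent_ua = req.get_header('User-agent')

print('default :', default_ua)
print('disguise:', sent_ua)

# Step 2 (needs network access): send it and read the response bytes.
# with urllib.request.urlopen(req, timeout=10) as resp:
#     body = resp.read()
```

The server only sees the header value, so "disguising" here means nothing more than replacing `Python-urllib/3.x` with a string a real browser would send.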
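Items 3-1 to 3-3 in one short sketch. To stay offline it opens a `data:` URL, which `urllib.request.urlopen()` also accepts, as a hypothetical stand-in for a real page; with `http://www.baidu.com/` the `read()`/`decode()` steps are the same. The charset is taken from the Content-Type header via `get_content_charset()`, and falling back to `'utf-8'` when the header has none is an assumption of this sketch.

```python
import urllib.request

# Offline stand-in for a web page: a data: URL whose body is the UTF-8 text "你好 »".
url = 'data:text/html;charset=utf-8,%E4%BD%A0%E5%A5%BD%20%C2%BB'

with urllib.request.urlopen(url) as resp:
    raw = resp.read()                                        # bytes, not yet decoded
    charset = resp.headers.get_content_charset() or 'utf-8'  # from Content-Type

text = raw.decode(charset)    # bytes -> str
back = text.encode('utf-8')   # str -> bytes

print(type(raw).__name__, charset)               # bytes utf-8
print(len(raw), 'bytes ->', len(text), 'chars')  # 9 bytes -> 4 chars
```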
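A sketch for items 3-2 and 4: bytes go to a file opened with `'wb'`, while decoded text goes to a file opened with `'w'` plus an explicit encoding. The page content is a made-up stand-in, and the files are written to a temporary directory rather than the desktop path used above. Encoding the same text with `'gbk'` reproduces the kind of error described in item 4-1.

```python
import os
import tempfile

# Stand-in for urlopen(...).read(): UTF-8 bytes containing '»' (U+00BB).
html_bytes = '<p>\u00bb 百度</p>'.encode('utf-8')
html_text = html_bytes.decode('utf-8')  # bytes -> str

# GBK has no mapping for '\xbb', so encoding the text as GBK fails; this is
# what open(path, 'w') does implicitly on Chinese-locale Windows.
try:
    html_text.encode('gbk')
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

with tempfile.TemporaryDirectory() as folder:
    bin_path = os.path.join(folder, 'html0.txt')
    txt_path = os.path.join(folder, 'html1.txt')

    with open(bin_path, 'wb') as f:  # bytes need binary mode
        f.write(html_bytes)
    with open(txt_path, 'w', encoding='utf-8') as f:  # str: text mode + explicit encoding
        f.write(html_text)

    # Read both files back to confirm nothing was lost.
    with open(bin_path, 'rb') as f:
        saved_bytes = f.read()
    with open(txt_path, 'r', encoding='utf-8') as f:
        saved_text = f.read()

print('GBK can encode it:', gbk_ok)                  # False
print('bytes round-trip:', saved_bytes == html_bytes)  # True
print('text round-trip:', saved_text == html_text)     # True
```

Choosing UTF-8 explicitly makes the output file independent of the machine's locale, which is why it is preferred over trying gbk or gb18030 one after another.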
