Basic Web Crawler Code

Latest recommended article published on 2022-09-16 11:45:13

生产的驴

Tags: web crawler

Original link: http://www.cnblogs.com/creative-work/p/8969562.html

Each commented-out line in the code below can be uncommented and run on its own:

import urllib.request
import urllib.parse

url = 'http://www.jianshu.com/'
# The first argument is the URL to open; the second, data, is used for POST requests.
response = urllib.request.urlopen(url=url)
# print(type(response))  # the return value is an http.client.HTTPResponse object
# print(response.read())  # reads the whole page, including newlines and tabs, as bytes
# print(response.read().decode('utf-8'))  # decode before printing. str -> bytes: encode(); bytes -> str: decode()
# print(response.readline())  # reads a single line
# print(response.readlines())  # reads everything and returns a list of lines
# print(response.getheaders())  # returns the response headers as a list of tuples
# urllib.request.urlretrieve(url=url, filename='baidu.html')  # downloads to a local file with the given name; works for pages, images, videos, etc.
# urllib.parse  # URL handling; urllib.parse.urlencode is covered later, together with POST requests
print(response.getheaders())
# Encoding: Chinese characters cannot appear in a request URL as-is, so they must be percent-encoded.
# Encode
# string = urllib.parse.quote('http://www.baidu.com?username=狗蛋&password=123')
# print(string)
# Decode
# string = urllib.parse.unquote('http%3A//www.baidu.com%3Fusername%3D%E7%8B%97%E8%9B%8B%26password%3D123')
# print(string)
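
The commented-out lines above touch on three things that can be tried without any network access: the str/bytes round trip, `quote`/`unquote`, and the `urlencode` function used for POST data. The sketch below is only an illustration of those calls. The login URL and the `User-Agent` value are placeholders I made up, not something from the original post, and the request is built but never sent.

```python
import urllib.parse
import urllib.request

# str -> bytes is encoding; bytes -> str is decoding.
text = '狗蛋'
raw = text.encode('utf-8')
decoded = raw.decode('utf-8')

# quote() percent-encodes every character except letters, digits,
# "_.-~" and (by default) "/", so ":", "?", "=" and "&" are escaped too.
full_url = 'http://www.baidu.com?username=狗蛋&password=123'
quoted = urllib.parse.quote(full_url)
restored = urllib.parse.unquote(quoted)

# urlencode() builds a query string from a dict and escapes each value,
# leaving the "=" and "&" separators intact.
params = {'username': '狗蛋', 'password': '123'}
query = urllib.parse.urlencode(params)

# Hypothetical POST request: the endpoint and User-Agent are placeholders.
# Supplying data (as bytes) switches the request method to POST.
# Nothing is sent until the object is passed to urllib.request.urlopen().
post_request = urllib.request.Request(
    url='http://www.example.com/login',
    data=query.encode('utf-8'),
    headers={'User-Agent': 'Mozilla/5.0'},
)

print(decoded)
print(quoted)
print(restored)
print(query)
print(post_request.get_method())
```

Because `quote` escapes the whole string it is given, it is usually applied to a single value rather than a full URL; for query parameters, `urlencode` tends to be the more convenient choice.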

Reposted from: https://www.cnblogs.com/creative-work/p/8969562.html
