Python Web Scraping Series, Part 1: urllib, the Basic Request Library

qq_42787271

Category: Python web scraping. Tags: python, web scraping, urllib

Article link: https://blog.csdn.net/qq_42787271/article/details/81559016

What urllib is:

Python's built-in HTTP request library

urllib.request – request module
urllib.error – exception handling module
urllib.parse – URL parsing module
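To make the division of labor between the three modules concrete, here is a minimal sketch that touches each one without sending anything over the network. The URLs are the ones that appear in this article; the variable names are only illustrative.

```python
import urllib.error
import urllib.parse
import urllib.request

# urllib.parse: split a URL into its components
parts = urllib.parse.urlparse(
    "https://blog.csdn.net/qq_42787271/article/details/81559016"
)
print(parts.scheme, parts.netloc, parts.path)

# urllib.request: build a request object (nothing is sent until urlopen is called)
req = urllib.request.Request("http://www.baidu.com")
print(req.get_method(), req.host)

# urllib.error: HTTPError is a subclass of URLError,
# so an "except URLError" clause catches both
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
```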

Basic usage of the urllib library

Common ways to fetch web pages

Sending data with POST and GET

Fetching a web page with urllib

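import urllib.request
# urlopen(): send a request to the target server
file = urllib.request.urlopen("http://www.baidu.com")
# read() returns the body as raw bytes, so printing it directly is hard to read
# print(file.read())
# decode() converts the bytes to a string using the given encoding
result = file.read().decode("utf-8")  # the charset is declared in the page's <head>
print(result)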

Output:
<html>
<head>

    <meta http-equiv="content-type" content="text/html;charset=utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <meta content="always" name="referrer">
    <meta name="theme-color" content="#2932e1">
    <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />
    <link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="百度搜索" />
    <link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu_85beaf5496f291521eb75ba38eacbd87.svg">
    ------
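The example above hardcodes "utf-8" after checking the page's head by hand. As a possible alternative, the charset can often be read from the response's Content-Type header instead. The sketch below assumes the server sends that header; `read_text` and its `fallback` parameter are hypothetical names, and a `data:` URL stands in for a real page so the snippet runs without network access.

```python
import urllib.request


def read_text(response, fallback="utf-8"):
    """Decode a response body using the charset from its Content-Type header.

    Falls back to `fallback` when the header does not declare a charset.
    """
    charset = response.headers.get_content_charset() or fallback
    return response.read().decode(charset, errors="replace")


# Offline stand-in for a real page: a data: URL that declares its own charset.
# The percent-encoded bytes are the UTF-8 encoding of "百度".
with urllib.request.urlopen(
    "data:text/plain;charset=utf-8,%E7%99%BE%E5%BA%A6"
) as response:
    text = read_text(response)
print(text)
```

With a real site the call would look the same: open the URL with `urlopen` and pass the response to `read_text`.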

urllib: POST requests

Passing a data argument to urlopen makes the request a POST.

Test URL: http://www.iqianyue.com/mypost/

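import urllib.request  # sends the request
import urllib.parse  # encodes the form data

# 1. Define the form data, URL-encode it, then convert it to bytes
# encode(): converts the string to bytes in the given encoding
data = urllib.parse.urlencode({
    "name": "xiao@163.com",
    "pass": "1234"
}).encode("utf-8")
# 2. Send the request with data (this is what makes it a POST)
response = urllib.request.urlopen("http://www.iqianyue.com/mypost/", data=data)
result = response.read()  # raw bytes
fl = open("1.html", "wb")  # save the response as an HTML file
fl.write(result)
fl.close()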
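The difference between POST and GET can be checked without sending anything, by building `Request` objects and inspecting them. This is a minimal sketch: the form fields and URL are taken from the example above, while the variable names are illustrative. Whether the test site also accepts these fields as a GET query string is not something the article states, so the GET half only shows how such a URL is built.

```python
import urllib.parse
import urllib.request

url = "http://www.iqianyue.com/mypost/"
form = {"name": "xiao@163.com", "pass": "1234"}

# POST: the URL-encoded form travels in the request body, as bytes
body = urllib.parse.urlencode(form).encode("utf-8")
post_req = urllib.request.Request(url, data=body)
print(post_req.get_method())

# GET: the URL-encoded form is appended to the URL as a query string
get_req = urllib.request.Request(url + "?" + urllib.parse.urlencode(form))
print(get_req.get_method())
print(get_req.full_url)
```

Either request object can then be passed to `urllib.request.urlopen` in place of a URL string.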

urllib: timeout setting

timeout=20 sets a timeout of 20 seconds. If the server has not responded within that time, an exception is raised.

import urllib.request

response = urllib.request.urlopen("http://www.ibeifeng.com/", timeout=20)
print(response.read())
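Since a timeout raises an exception, a scraper usually wants to catch it rather than crash. The sketch below is one way to do that with urllib.error; `fetch_or_none` is a hypothetical helper name, not part of urllib. It catches both `URLError` (connection failures, including timeouts while connecting) and a bare timeout, which can surface while waiting for or reading the response.

```python
import socket
import urllib.error
import urllib.request


def fetch_or_none(url, timeout=20):
    """Return the response body as bytes, or None if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as exc:
        # Connection problems and HTTP error statuses (HTTPError subclasses URLError)
        print("request failed:", exc.reason)
    except (socket.timeout, TimeoutError) as exc:
        # The server accepted the connection but did not answer in time
        print("timed out:", exc)
    return None


# Example call (needs network access, so it is left commented out):
# body = fetch_or_none("http://www.ibeifeng.com/", timeout=20)
```

Returning None keeps the sketch short; a real crawler might retry or log the failure instead.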
