Preparation
Install the required libraries
pip install urllib3 lxml requests beautifulsoup4
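Learning GET & POST
Method | Description |
---|---|
GET | Requests the specified page and returns the entity body. |
POST | Submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is carried in the request body. A POST request may create a new resource and/or modify an existing one. |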
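The difference between the two methods can be sketched without sending anything over the network. The snippet below only builds request objects with the standard library; the `example.com` URL and the form fields are placeholders chosen for illustration, not part of the original notes.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields, used only for illustration.
fields = {"q": "python", "page": "1"}

# GET: the data travels in the URL as a query string; there is no request body.
get_req = Request("https://example.com/search?" + urlencode(fields))

# POST: the same data is encoded into the request body instead.
post_req = Request("https://example.com/search",
                   data=urlencode(fields).encode("utf-8"))

# Nothing is sent here; we only inspect what would be sent.
print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

`urllib.request.Request` switches its method to POST as soon as `data` is supplied, which mirrors the table: the POST payload lives in the body, while the GET payload is visible in the URL.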
GET
- Using requests
import requests
url = 'https://www.baidu.com/'
response = requests.get(url)
print(response.status_code)
# Output
>>> 200  # the request succeeded
Status code reference: http://www.runoob.com/http/http-status-codes.html
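To read a status code without looking it up each time, a small helper can map it to its standard reason phrase and category. This is a minimal sketch using the standard library's `http.HTTPStatus`; the helper name `describe_status` is made up for this example.

```python
from http import HTTPStatus

# The first digit of a status code identifies its class.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    category = CATEGORIES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the registered values known to HTTPStatus.
        phrase = "Unknown"
    return f"{code} {phrase} ({category})"

print(describe_status(200))
print(describe_status(404))
```

With requests, the value to pass in would be `response.status_code`.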
- Using urllib
import urllib.request
url = 'https://www.baidu.com/'
response = urllib.request.urlopen(url)
print(response.read())
# Output
>>> b'<html>\r\n<head>\r\n\t<script>\r\n\t\tlocation.replace(location.href.replace("https://","http://"));\r\n\t</script>\r\n</head>\r\n<body>\r\n\t<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>\r\n</body>\r\n</html>'
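The `b'...'` prefix in the output shows that `response.read()` returns raw bytes rather than text. A minimal sketch of turning those bytes into a string follows; the helper name `decode_body` and the UTF-8 fallback are assumptions for illustration, and the sample bytes are a shortened copy of the output above.

```python
def decode_body(raw, charset=None):
    """Decode a raw response body, falling back to UTF-8 (an assumed default)."""
    # errors="replace" keeps decoding from failing on unexpected bytes.
    return raw.decode(charset or "utf-8", errors="replace")

# A shortened copy of the bytes printed above, so no network access is needed.
raw = b'<html>\r\n<head>\r\n\t<script>'
text = decode_body(raw)
print(type(text).__name__)
print(text)
```

With a live urllib response, the charset declared by the server (if any) can be passed in as `decode_body(response.read(), response.headers.get_content_charset())`.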
When the network is down
import requests
url = 'https://www.baidu.com/'
response = requests.get(url)
print(response.status_code)
# Output: an error is raised
>>> ConnectionError: HTTPSConnectionPool(host='www.baidu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000001ABE2CE0BA8>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))
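Rather than letting the script crash with a traceback like the one above, the exception can be caught. The following is a minimal sketch, assuming a 5-second timeout and a helper name (`fetch_status`) that are not part of the original notes.

```python
import requests

def fetch_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException as exc:
        # RequestException is the base class of ConnectionError, Timeout
        # and the other errors raised by requests.
        print(f"Request failed: {type(exc).__name__}")
        return None
    return response.status_code
```

Calling `fetch_status('https://www.baidu.com/')` should then return the status code when the network is available and `None` when it is not. Catching the base class is a deliberately broad choice; catching `requests.exceptions.ConnectionError` alone would handle only the offline case shown above.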