1. urllib basics: urlretrieve(), urlcleanup(), info(), getcode(), geturl(), etc.
1.1 urlretrieve() saves a web page directly to a local file
import urllib.request
url1='https://www.icourse163.org'
# Raw string so the backslashes in the Windows path are kept literally
urllib.request.urlretrieve(url1,filename=r'F:\jupyterpycodes\python_pachongfenxi\mooc.html')
('F:\\jupyterpycodes\\python_pachongfenxi\\mooc.html',
<http.client.HTTPMessage at 0x23374d0c240>)
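The call above returns a tuple of the saved filename and the response headers. As a minimal sketch, the same call can be wrapped so that the destination is built with pathlib instead of a hand-typed Windows path. The helper name save_page and its parameters are assumptions made for illustration; they are not part of urllib.

```python
import urllib.request
from pathlib import Path


def save_page(url, dest_dir, name="page.html"):
    """Download url into dest_dir/name and return the saved path.

    save_page is a hypothetical helper, not a urllib function.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target = dest / name
    # urlretrieve returns a (local_filename, headers) tuple
    saved, _headers = urllib.request.urlretrieve(url, filename=str(target))
    return Path(saved)
```

It could be called as save_page('https://www.icourse163.org', 'downloads', 'mooc.html'), where the folder name downloads is only an example.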
1.2 urlcleanup(): clears the temporary files that earlier urlretrieve() calls may have left behind
import urllib.request
urllib.request.urlcleanup()
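To see what urlcleanup() actually removes, here is a small sketch: when urlretrieve() is called without a filename, it writes to a temporary file, and urlcleanup() deletes that file. The function name temp_file_lifecycle is made up for this illustration.

```python
import os
import urllib.request


def temp_file_lifecycle(url):
    """Return (existed_before, exists_after) around a urlcleanup() call.

    temp_file_lifecycle is a hypothetical helper, not a urllib function.
    """
    # No filename given, so urlretrieve picks a temporary file itself
    temp_path, _headers = urllib.request.urlretrieve(url)
    existed_before = os.path.exists(temp_path)
    # Delete the temporary files created by earlier urlretrieve calls
    urllib.request.urlcleanup()
    exists_after = os.path.exists(temp_path)
    return existed_before, exists_after
```

For a quick offline trial, a data: URL such as 'data:text/plain,hello' can stand in for a real web address.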
1.3 info(): returns the response's meta-information, mainly the HTTP headers, as an http.client.HTTPMessage object
import urllib.request
url1='https://www.icourse163.org'
file=urllib.request.urlopen(url1)
file.info()
<http.client.HTTPMessage at 0x24ff0b6e710>
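The cell output above only shows the object's address, not its content. A minimal sketch for reading the headers is to turn them into a plain dict. The helper name header_dict is an assumption for illustration. Note that since Python 3.9 the documentation marks info() as deprecated in favor of the headers attribute, which gives the same object.

```python
import urllib.request


def header_dict(response):
    """Return the response headers as a dict with lowercase header names.

    header_dict is a hypothetical helper, not a urllib function.
    """
    message = response.info()  # an http.client.HTTPMessage for HTTP(S) responses
    return {name.lower(): value for name, value in message.items()}
```

With the file object from the example above, header_dict(file).get('content-type') would give the page's content type, if the server sends one.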
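1.4 getcode(): returns the HTTP status code of the fetched page. Common codes: 200 (OK) and 202 (Accepted) mean the request went through; 403 (Forbidden) means the server refused it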
import urllib.request
url1='https://www.icourse163.org'
file=urllib.request.urlopen(url1)
file.getcode()
200
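One detail worth knowing: with the default opener, urlopen() does not return a response for an error status such as 403. It raises urllib.error.HTTPError instead, and the code is available on the exception. The sketch below handles both cases. The name status_of and its opener parameter are assumptions for illustration; the parameter only exists so the function can be tried without a network connection.

```python
import urllib.error
import urllib.request


def status_of(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, including error codes.

    status_of and its opener parameter are hypothetical, not part of urllib.
    """
    try:
        with opener(url) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        # Error statuses such as 403 or 404 arrive as exceptions
        return err.code
```

Calling status_of('https://www.icourse163.org') should match the getcode() result shown above when the site is reachable.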
1.5 geturl(): returns the URL of the page that was actually fetched.
import urllib.request
url1='h