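There's no website I can't scrape, only ones I don't feel like scraping (okay, that might be a bit of a brag).
Python 2 is gradually being replaced by Python 3, so this post focuses on Python 3. Without further ado, here are the techniques.
Target site: url = 'https://www.baidu.com/'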
- The requests approach (requests is a third-party library: `pip install requests`)
```python
import requests

url = 'https://www.baidu.com/'
req = requests.get(url)              # send a GET request
obj = req.content.decode('utf-8')    # req.content is raw bytes; decode them to str
print(obj)
```
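`req.content` is the raw byte string, and `.decode('utf-8')` assumes the page is UTF-8. When a page declares a different charset (for example GBK, used by some older Chinese sites), that assumption can fail or produce garbled text. requests also offers `req.text`, which decodes with the encoding requests infers for the response. The stdlib-only sketch below shows the underlying idea without depending on requests: read the charset from a Content-Type header value and fall back to UTF-8. `decode_body` is a hypothetical helper name; with requests you would pass it `req.content` and `req.headers.get('Content-Type', '')`.

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, default: str = 'utf-8') -> str:
    """Decode raw response bytes using the charset named in a Content-Type value."""
    msg = Message()
    msg['Content-Type'] = content_type                 # let the email parser split out the parameters
    charset = msg.get_content_charset() or default     # lower-cased charset, or the fallback
    return body.decode(charset, errors='replace')      # replace undecodable bytes instead of raising


gbk_page = '百度'.encode('gbk')                          # bytes as a GBK-encoded page would send them
print(decode_body(gbk_page, 'text/html; charset=GBK'))  # 百度
print(decode_body(b'plain ascii', 'text/html'))         # plain ascii (no charset, so UTF-8 is used)
```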
- The urllib approach (standard library, no install needed)
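```python
import urllib.request

url = 'https://www.baidu.com/'
req = urllib.request.urlopen(url)   # send a GET request, get a response object
obj = req.read().decode('utf-8')    # read the raw bytes and decode them to str
print(obj)
```

A variant that sends a browser-like User-Agent header (some sites reject urllib's default `Python-urllib/3.x` agent):

```python
import urllib.request

url = 'https://www.baidu.com/'
header = {'User-Agent': 'Mozilla/5.0'}
req = urllib.request.Request(url, headers=header)  # bundle the URL and headers into a Request
obj = urllib.request.urlopen(req)
response = obj.read().decode('utf-8')
print(response)
```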
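To see what the User-Agent header actually does, the sketch below starts a tiny local HTTP server (a stand-in for a real site, so it runs offline) that echoes back the User-Agent it receives, then fetches it once with urllib's defaults and once with the `Mozilla/5.0` header. Two assumptions differ from the snippets above: it uses an opener built with `ProxyHandler({})` instead of plain `urlopen`, so any `*_proxy` environment variables don't redirect the local request, and it decodes with the charset the server declares rather than a hard-coded `'utf-8'`. `EchoUAHandler` and `fetch` are hypothetical names.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoUAHandler(BaseHTTPRequestHandler):
    """Local handler that replies with the User-Agent header it received."""

    def do_GET(self):
        body = self.headers.get('User-Agent', '').encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch(url, headers=None):
    """GET url with optional extra headers and return the body as text."""
    # ProxyHandler({}) ignores *_proxy environment variables so the request stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, headers=headers or {})
    with opener.open(req, timeout=5) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'  # charset from Content-Type
        return resp.read().decode(charset)


server = HTTPServer(('127.0.0.1', 0), EchoUAHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_address[1]}/'

default_ua = fetch(base)                               # urllib's built-in User-Agent
custom_ua = fetch(base, {'User-Agent': 'Mozilla/5.0'})  # our header replaces the default

server.shutdown()
server.server_close()

print(default_ua)  # Python-urllib/ followed by the Python version
print(custom_ua)   # Mozilla/5.0
```

A real site may check more than the User-Agent, so setting this header is a common first step rather than a guarantee.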
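- `request` imported from urllib (the same `urllib.request` module as above, just imported with `from urllib import request`)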
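```python
from urllib import request

url = 'https://www.baidu.com/'
req = request.urlopen(url)          # the same urlopen as above, reached via the shorter name
obj = req.read().decode('utf-8')    # read the raw bytes and decode them to str
print(obj)
```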
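Both import styles give you the identical module object, so the second and third approaches differ only in how you spell the call. The sketch below checks that and also uses the response as a context manager, which closes it automatically when the `with` block ends. To keep it runnable offline it fetches a `data:` URL (which `urlopen` also accepts) instead of the real site; the `<p>你好</p>` page content is made up for illustration.

```python
import urllib.request
from urllib import request
from urllib.parse import quote

# Both import styles bind the very same module object.
assert request is urllib.request

# A data: URL stands in for a real page so the example needs no network.
url = 'data:text/html;charset=utf-8,' + quote('<p>你好</p>')

# The with block closes the response when it exits.
with request.urlopen(url) as resp:
    charset = resp.headers.get_content_charset() or 'utf-8'  # 'utf-8' from the data: URL
    obj = resp.read().decode(charset)

print(obj)  # <p>你好</p>
```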
I usually go with the first approach: simple and blunt. If you know other ways to do this, please leave a comment.