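The first time I read crawler code, it vaguely reminded me of automated testing.
I remember that when I started out with selenium + Python, the very first lesson was to import webdriver from selenium and then launch a browser: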
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://www.baidu.com")
The first step of a crawler is much the same: open a web page, then scrape its data:
import urllib.request
response = urllib.request.urlopen("http://www.baidu.com")
html = response.read()
html = html.decode('utf-8')
print(html)
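The snippet above hardcodes 'utf-8' when decoding. As a minimal sketch of one alternative, the helper below reads the charset from the response's Content-Type header and only falls back to UTF-8 when none is declared. The name `fetch_text` and its parameters are my own, not part of the original notes.

```python
import urllib.request


def fetch_text(url, fallback_encoding="utf-8", timeout=10):
    """Fetch a URL and decode the body with the charset the server declares.

    fetch_text is a hypothetical helper name used for illustration only.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # get_content_charset() returns None when Content-Type names no charset
        charset = response.headers.get_content_charset() or fallback_encoding
    # errors="replace" keeps the call from raising on stray undecodable bytes
    return raw.decode(charset, errors="replace")


# Example call (needs network access):
#   print(fetch_text("http://www.baidu.com")[:200])
```

This only helps when the server actually sends a charset in the header. Pages that declare their encoding solely in an HTML meta tag would still be decoded with the fallback.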
Downloading a cat picture from a cat website:
import urllib.request
response = urllib.request.urlopen("http://placekitten.com/500/600")  # urlopen returns a response object
cat_img = response.read()
with open('cat_500_600.jpg', 'wb') as f:  # an image is binary data, so open the file in 'wb' mode
    f.write(cat_img)
print(response.geturl())
print(response.info())
print(response.getcode())
Result:
The image downloads successfully, and the script prints:
http://placekitten.com/500/600
Date: Fri, 13 Mar 2020 10:47:09 GMT
Content-Type: image/jpeg
Transfer-Encoding: chunked
Connection: close
Set-Cookie: __cfduid=d4f323f3acad3d96ed059f7d239bfe4a01584096429; expires=Sun, 12-Apr-20 10:47:09 GMT; path=/; domain=.placekitten.com; HttpOnly; SameSite=Lax
Cache-Control: public, max-age=86400
Expires: Thu, 31 Dec 2020 20:00:00 GMT
Vary: User-Agent, Accept-Encoding
Access-Control-Allow-Origin: *
CF-Cache-Status: HIT
Age: 14431
Server: cloudflare
CF-RAY: 57352cde1d7b9b9d-SJC
200
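The download steps above can also be wrapped into a small reusable function. The sketch below is one possible shape, assuming Python 3.9 or later: it streams the body to disk instead of reading it all into memory, and uses the `url`, `status`, and `headers` attributes, which the standard library documents as the replacements for `geturl()`, `getcode()`, and `info()`. The name `download` and its return value are illustrative choices of mine.

```python
import shutil
import urllib.request


def download(url, path, timeout=10):
    """Save the body of `url` to `path`; return (final_url, status, content_type).

    download is a hypothetical helper name used for illustration only.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(path, "wb") as f:  # binary mode, as for any image file
        # Copy in chunks rather than holding the whole body in memory
        shutil.copyfileobj(response, f)
        return (
            response.url,                          # same value as geturl()
            response.status,                       # same value as getcode()
            response.headers.get_content_type(),   # e.g. "image/jpeg"
        )


# Example call (needs network access):
#   print(download("http://placekitten.com/500/600", "cat_500_600.jpg"))
```

Using `with` for the response also closes the connection once the copy finishes, which the original script leaves to the interpreter.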