Python study notes: my first web crawler


Molly_DD


Article link: https://blog.csdn.net/MoLi_D/article/details/104846885


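The first time I read crawler code, it vaguely reminded me of automated testing.
When I started out with selenium + python, the first lesson was to import webdriver from selenium in Python and then launch a browser: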

from selenium import webdriver
driver = webdriver.Firefox()  # the class name is capitalized: Firefox, not firefox
driver.get("http://www.baidu.com")

The first step of a crawler is similar: open a web page, then fetch its data.

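import urllib.request
response = urllib.request.urlopen("http://www.baidu.com")  # open the page
html = response.read()         # the raw body, as bytes
html = html.decode('utf-8')    # decode the bytes to str, assuming the page is UTF-8
print(html)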
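
The snippet above assumes the page is encoded as UTF-8. A slightly more defensive sketch reads the charset from the response's Content-Type header and falls back to UTF-8 when none is declared. The names `decode_body` and `fetch_html` are illustrative and not part of the original notes, and the 10-second timeout is an arbitrary choice.

```python
import urllib.request


def decode_body(raw, charset=None):
    """Decode response bytes, falling back to UTF-8 when no charset is declared."""
    # errors="replace" substitutes U+FFFD for malformed bytes instead of raising
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(url, timeout=10):
    """Open url and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # response.headers behaves like an email.message.Message;
        # get_content_charset() returns the charset parameter of
        # Content-Type, or None when the server does not send one
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


# Usage (needs network access):
#     print(fetch_html("http://www.baidu.com")[:200])
```

Using `with` also closes the connection once the body has been read, which the original snippet leaves to the garbage collector.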

Next, download a cat picture from a cat website:

'''from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://www.baidu.com")'''
import urllib.request
response = urllib.request.urlopen("http://placekitten.com/500/600")  # urlopen returns a response object
cat_img = response.read()
with open('cat_500_600.jpg', 'wb') as f:  # an image is binary data, so open the file in 'wb' mode
    f.write(cat_img)

print(response.geturl())
print(response.info())
print(response.getcode())

Result:
The image downloads successfully, and the script prints:

http://placekitten.com/500/600

Date: Fri, 13 Mar 2020 10:47:09 GMT
Content-Type: image/jpeg
Transfer-Encoding: chunked
Connection: close
Set-Cookie: __cfduid=d4f323f3acad3d96ed059f7d239bfe4a01584096429; expires=Sun, 12-Apr-20 10:47:09 GMT; path=/; domain=.placekitten.com; HttpOnly; SameSite=Lax
Cache-Control: public, max-age=86400
Expires: Thu, 31 Dec 2020 20:00:00 GMT
Vary: User-Agent, Accept-Encoding
Access-Control-Allow-Origin: *
CF-Cache-Status: HIT
Age: 14431
Server: cloudflare
CF-RAY: 57352cde1d7b9b9d-SJC

200
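
The printed headers include `Content-Type: image/jpeg`, so the script could derive the file extension from the response rather than hard-coding `.jpg`. Below is a minimal sketch of that idea, assuming a small hand-written mapping of image types. `EXTENSIONS`, `pick_extension`, and `save_image` are hypothetical names, and the URL in the usage comment is the one from these notes, which may no longer be reachable.

```python
import urllib.request

# Assumed mapping for illustration only; extend it as needed
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/gif": ".gif"}


def pick_extension(content_type):
    """Return an extension for a Content-Type value, or None for unknown types."""
    # Drop parameters such as "; charset=..." and normalize case
    media_type = content_type.split(";")[0].strip().lower()
    return EXTENSIONS.get(media_type)


def save_image(url, stem, timeout=10):
    """Download url, write it to stem + extension, and return the file name."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        extension = pick_extension(content_type)
        if response.status != 200 or extension is None:
            raise ValueError(f"unexpected response: {response.status} {content_type!r}")
        filename = stem + extension
        with open(filename, "wb") as f:  # binary mode, as in the script above
            f.write(response.read())
    return filename


# Usage (needs network access):
#     print(save_image("http://placekitten.com/500/600", "cat_500_600"))
```

`response.status` and `response.headers` carry the same information as the `getcode()` and `info()` calls used above.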
