Python Study Notes

一颗正在努力的小星星

Category column: 小笔记

Article link: https://blog.csdn.net/weixin_38771884/article/details/89314938

Installing Python libraries

IDLE (run the command in a terminal or command prompt):
```
pip install <library name>
```
PyCharm:

[Image: screenshot, not preserved]
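
After installing, it can be useful to confirm that the interpreter you are running actually sees the library. A minimal sketch using only the standard library follows; the helper name `is_installed` is made up for illustration. Note that BeautifulSoup is installed from PyPI as `beautifulsoup4` but imported as `bs4`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be imported by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# The two third-party libraries used in these notes (import names).
for name in ("requests", "bs4"):
    print(name, "installed" if is_installed(name) else "missing")
```

If a library shows as missing even though pip reported success, pip probably installed it for a different Python interpreter than the one running the script.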

1. The Requests library
Usage: import requests (the module name is all lowercase)
Main call:

response = requests.get(url, params={'wd': 'python'}, headers={"xxx": "xxx"}, timeout=(connect timeout, read timeout))
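
To make the `params` argument concrete, the sketch below shows roughly what happens to the URL for a simple dictionary: the key/value pairs are encoded and appended as a query string. It uses the standard library's `urlencode` instead of Requests so it runs without any installed packages; the helper name `with_query` is hypothetical.

```python
from urllib.parse import urlencode


def with_query(url, params):
    """Append params to url as an encoded query string."""
    return url + "?" + urlencode(params)


final_url = with_query("https://www.baidu.com/s", {"wd": "python"})
print(final_url)

# With Requests installed, the same request would be sent as:
# requests.get("https://www.baidu.com/s", params={"wd": "python"},
#              headers={"User-Agent": "..."}, timeout=(30, 15))
# where timeout=(30, 15) means 30 s to connect and 15 s to wait for data.
```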

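The code below is a scraper I wrote myself to download images from the image site yande.re.
First, look at the pattern of its post addresses, e.g. https://yande.re/post/show/123123: each post is selected by the number at the end, so images can be downloaded by stepping through that number.
Parse the returned content with BeautifulSoup and call find(attrs={"class": "image"}) [returns the first tag with that attribute; find_all returns every match], then return xxx['src'] to get the image link.
Then request the image link and save the returned binary content with write:
with open(path to save the image [including file name and extension], 'wb' [mode: write as binary, overwriting the file if it already exists]) as f:
    f.write(content)
That completes the write.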
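
Two of the steps above can be tried without touching the network: building the post URL from its number, and writing bytes in `'wb'` mode. This is a small sketch; `post_url` and `save_bytes` are hypothetical helper names, and the bytes written are placeholders for a real image download.

```python
import os
import tempfile

BASE_URL = "https://yande.re/post/show/"


def post_url(post_id):
    """Build the address of one post from its number."""
    return BASE_URL + str(post_id)


def save_bytes(path, content):
    """Write binary content to path; 'wb' replaces any existing file."""
    with open(path, "wb") as f:
        f.write(content)


# Demo in a temporary directory, using placeholder bytes instead of an image.
demo_path = os.path.join(tempfile.mkdtemp(), "123123.jpg")
save_bytes(demo_path, b"first download")
save_bytes(demo_path, b"second download")  # overwrites the first file
with open(demo_path, "rb") as f:
    saved = f.read()
print(post_url(123123), saved)
```

Because `'wb'` truncates the file on open, a failed second download can leave an empty file behind; fetching the content before opening the file avoids that.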

import time

import requests
from bs4 import BeautifulSoup
# Note: this ConnectionError is the Requests exception and shadows the built-in one.
from requests.exceptions import (ConnectionError, ConnectTimeout, ReadTimeout,
                                 RequestException)
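

def openurl(url):
    """Fetch url and return the response on HTTP 200, otherwise None."""
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36'}
        # timeout=(connect timeout, read timeout), both in seconds
        response = requests.get(url, timeout=(30, 15), headers=headers)
        if response.status_code == 200:
            return response
        return None
    except RequestException:
        # Every Requests error (timeouts, connection failures, ...) ends up here,
        # so callers only ever see a response or None.
        return None


def searchYandeAndKonSrc(content):
    """Return the src attribute of the first tag whose class is "image"."""
    soup = BeautifulSoup(content, "html.parser")
    # find() returns a single tag, or None when nothing matches
    image_tag = soup.find(attrs={"class": "image"})
    return image_tag['src']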

# url="https://yande.re/post/show/"+q
# address="E:\CatchPic\test"
# r=openurl("https://yande.re/post/show/3453453")
# url1=searchYandeAndKonSrc(r.content)
# r1=openurl(url1)
# with open("E://CatchPic//test//"+str(q)+".jpg", 'wb') as f:
#     f.write(r1.content)
#     print("Saved successfully")


q = int(input("Enter the post number to start from (counting down): "))
while q > 0:
    n = q   # post number handled in this iteration
    q -= 1  # decrement up front so every code path moves on to the next post
    try:
        print("Post " + str(n) + ":")
        starttimeshow = time.strftime('%Y-%m-%d %H:%M:%S')
        starttime = time.perf_counter()
        print("Start time: " + starttimeshow)
        r = openurl("https://yande.re/post/show/" + str(n))
        src = "E://CatchPic//yande1//" + str(n) + ".jpg"
        url1 = searchYandeAndKonSrc(r.content)
        r1 = openurl(url1)
        with open(src, 'wb') as f:
            f.write(r1.content)
            print("Saved successfully")
        endTimeshow = time.strftime('%Y-%m-%d %H:%M:%S')
        endTime = time.perf_counter()
        print("End time: " + endTimeshow)
        print("Elapsed: " + str(endTime - starttime) + " s")
        print("-------------------------------")
    except AttributeError:
        # openurl returned None (request failed or status was not 200)
        print("Post " + str(n) + ": image does not exist")
    except ReadTimeout:
        print("Post " + str(n) + ": read timed out")
        time.sleep(5)
    except ConnectTimeout:
        # Must come before ConnectionError, which is its parent class
        print("Post " + str(n) + ": connection timed out")
        time.sleep(60)
    except ConnectionError:
        print("Post " + str(n) + ": connection failed")
    except TypeError:
        # soup.find returned None: the page has no tag with class "image"
        print("Post " + str(n) + ": no matching image at this link")
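Scraping Baidu search results:
Approach: first request https://www.baidu.com/s?wd=xxxx&pn=50, where xxxx is the search term, and walk through the result pages via pn (pn increases by 10 for each page of Baidu results). Each title has the form <h3 class="t ???"><a href="xxxx">title</a></h3>. To extract a title, find the h3 tag with soup.find and take its text; then get the a tag inside the h3 and read its href attribute to obtain the link.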
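
The approach above can be sketched as follows. To keep the example runnable without third-party packages it uses the standard library's `HTMLParser` in place of BeautifulSoup; with BeautifulSoup the same extraction would go through `soup.find_all("h3")` and then `h3.find("a")["href"]`. The names `result_page_urls` and `TitleLinkParser` are hypothetical, the sample markup is made up in the shape described above, and starting at pn=0 for the first page is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def result_page_urls(keyword, pages):
    """Build one search URL per result page; pn grows by 10 per page."""
    return ["https://www.baidu.com/s?" + urlencode({"wd": keyword, "pn": page * 10})
            for page in range(pages)]


class TitleLinkParser(HTMLParser):
    """Collect (title, href) pairs from <a> tags nested inside <h3> tags."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_h3 = False
        self._href = None  # href of the <a> currently open inside an <h3>
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag == "h3":
            self._in_h3 = False


# Made-up markup in the described shape, not a real Baidu response.
sample_html = '<h3 class="t"><a href="https://example.com/1">Python tutorial</a></h3>'
parser = TitleLinkParser()
parser.feed(sample_html)
urls = result_page_urls("python", 3)
print(urls)
print(parser.results)
```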
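Question 1:
How can the summary text under each title be extracted?
It comes in many formats: plain text, image plus text, video, translation, encyclopedia entries, and so on. The next step is to learn BeautifulSoup tag traversal and handle each of these types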
