1. The pathlib module
(1) The Path class
.exists() checks whether a path exists
.mkdir() creates a new directory
.glob() finds the directories and files under the current path that match a pattern
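A minimal sketch of these three Path methods; the directory and file names here are made up for illustration, and a temporary directory keeps the example self-contained:

```python
from pathlib import Path
import tempfile

# work inside a fresh temporary directory
base = Path(tempfile.mkdtemp())

new_dir = base / "reports"
print(new_dir.exists())   # False: nothing created yet
new_dir.mkdir()           # create the directory
print(new_dir.exists())   # True

# create a couple of files, then find them with glob
(new_dir / "a.txt").write_text("hello")
(new_dir / "b.csv").write_text("1,2,3")
txt_files = list(new_dir.glob("*.txt"))  # only .txt files match the pattern
print([p.name for p in txt_files])       # ['a.txt']
```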
2. Packages from pypi.org (example: openpyxl)
pypi.org hosts many packages you can use; openpyxl is the example below.
Install: type a command in the terminal window, e.g. pip3 install openpyxl
Adding a data file to the project: HelloWorld → right-click → Show in Explorer → paste the needed file into HelloWorld
Processing a single Excel file:
import openpyxl as xl
from openpyxl.chart import BarChart, Reference  # BarChart and Reference are classes

wb = xl.load_workbook("transaction.xlsx")  # load the .xlsx file; note it must sit in the project folder, e.g. HelloWorld
sheet = wb["Sheet1"]  # select the worksheet by name with square brackets
print(f"{sheet.max_row} rows in total")  # show how many rows the sheet has

for row in range(2, sheet.max_row + 1):  # range excludes the stop value, so add 1; row runs over [2, sheet.max_row]
    cell = sheet.cell(row, 3)  # the cell in row `row`, column 3
    corrected_price = cell.value * 0.9
    corrected_price_cell = sheet.cell(row, 4)  # fill a new column, here column 4
    corrected_price_cell.value = corrected_price  # assign the value to the new column

values = Reference(sheet, min_row=2, max_row=sheet.max_row, min_col=4, max_col=4)  # values covers all the numbers in column 4
chart = BarChart()  # create a bar chart
chart.add_data(values)  # the chart's data comes from values
sheet.add_chart(chart, "e2")  # add the chart to the sheet, anchored at cell E2
wb.save("transaction2.xlsx")  # save as a new file
Processing multiple files: wrap the code above in a function and use a filename parameter in place of the hard-coded "transaction.xlsx".
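One way to sketch that refactor, combining it with pathlib's glob from section 1. The function name `process_workbook` and the `transaction*.xlsx` pattern are my own choices, and here each workbook is saved back over itself rather than under a new name:

```python
from pathlib import Path

import openpyxl as xl
from openpyxl.chart import BarChart, Reference

def process_workbook(filename):
    """Apply the 10% discount and add a bar chart, as in the single-file version."""
    wb = xl.load_workbook(filename)
    sheet = wb["Sheet1"]
    for row in range(2, sheet.max_row + 1):
        cell = sheet.cell(row, 3)
        corrected_price_cell = sheet.cell(row, 4)
        corrected_price_cell.value = cell.value * 0.9
    values = Reference(sheet, min_row=2, max_row=sheet.max_row, min_col=4, max_col=4)
    chart = BarChart()
    chart.add_data(values)
    sheet.add_chart(chart, "e2")
    wb.save(filename)  # overwrite in place

# run it over every matching workbook in the current folder
for path in Path(".").glob("transaction*.xlsx"):
    process_workbook(path)
```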
3. Web scraping
Done with a package called selenium.
4. Libraries:
NumPy: multidimensional arrays
Pandas: data-analysis library, built around DataFrames
Matplotlib: 2-D plotting library, used to create graphs and plots
Scikit-Learn: provides algorithms such as decision trees and neural networks
5. The requests library
(1) Using get():
r = requests.get(url)
# get() builds a Request object that asks the server for a resource
# r is the Response object, holding everything the server sent back
r.encoding: if the header contains no charset, the encoding defaults to ISO-8859-1, which cannot decode Chinese
r.apparent_encoding: guesses the encoding by analysing the content;
when r.encoding does not decode correctly,
set r.encoding = r.apparent_encoding
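A small offline sketch of that gotcha. It builds a Response by hand via `requests.models.Response`, which is an implementation detail rather than public API, so treat it only as a demonstration:

```python
import requests

resp = requests.models.Response()
# simulate a response body containing Chinese text, sent as UTF-8
resp._content = "人生苦短，我用Python。这是一段用来测试编码的中文。".encode("utf-8")

resp.encoding = "ISO-8859-1"  # what requests assumes when the header has no charset
print(resp.text)              # mojibake: each UTF-8 byte decoded as a Latin-1 character

resp.encoding = resp.apparent_encoding  # guess the encoding from the bytes themselves
print(resp.text)              # with a correct guess, the Chinese text comes back intact
```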
requests.get(url, params=None, **kwargs)
# url: the URL of the page to fetch
# params: extra query parameters appended to the url
# **kwargs: 12 optional parameters that control the request
import requests
r = requests.get("https://www.baidu.com/?tn=18029102_3_dg")
r.status_code  # check the status code of the request
200  # 200 means the request succeeded; anything else means it failed
r.encoding = 'utf-8'
r.text
r.headers  # return the response headers
Fetching a page's source code:
import requests
r = requests.get("https://python123.io/ws/demo.html")
r.text
(2) Exceptions in the Requests library
r.raise_for_status()  # raises requests.HTTPError if the status code indicates failure
(3) A generic framework for fetching a web page
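The notes name the framework without spelling it out; a common shape for it, combining the pieces above (the function name `get_html_text` and the fallback string are my own), is:

```python
import requests

def get_html_text(url):
    """Fetch a page, handling common failures; returns the page text or an error marker."""
    try:
        r = requests.get(url, timeout=30)  # don't hang forever on a slow server
        r.raise_for_status()               # raise requests.HTTPError on 4xx/5xx status codes
        r.encoding = r.apparent_encoding   # fix the encoding guess (see section 5)
        return r.text
    except requests.RequestException:      # covers HTTPError, timeouts, connection errors
        return "fetch failed"
```

Usage: `print(get_html_text("https://python123.io/ws/demo.html"))` prints either the page source or the fallback string, never an unhandled traceback.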
6. The HTTP protocol
PATCH only needs to submit the changed part
PUT needs the full, updated resource
7. The post() method of the Requests library
and its put() method
8. requests.request()
(1) params — GET
Appends key-value pairs to the url, so the request carries extra query parameters
import requests
kv = {'key1': 'value1', 'key2': 'value2'}
r = requests.request('GET', 'https://search.bilibili.com/all?keyword=python%20mosh', params=kv)
print(r.url)
https://search.bilibili.com/all?keyword=python%20mosh&key1=value1&key2=value2&rt=V%2FymTlOu4ow%2Fy4xxNWPUZ9n3u3FeQUhkmTW3nVhxVWw%3D
(2) data — POST
Submits a resource to the server
body = 'aaaa'
s = requests.request('POST', 'https://editor.csdn.net/md?articleId=105171311', data=body)
The string is stored at the location the link points to
(3) headers
Custom request headers
hd = {'user-agent': 'Chrome/10'}
# pretend the request comes from a Chrome/10 browser
r = requests.request('POST', 'https://space.bilibili.com/271922391/favlist?fid=285042691&ftype=create', headers=hd)
9. HTML5 is information wrapped in a series of <>-delimited tags
10. BeautifulSoup: a library for parsing, traversing, and maintaining a tag tree
Converting a tag tree into a BeautifulSoup object:
from bs4 import BeautifulSoup  # BeautifulSoup is a class
soup = BeautifulSoup('<p>data</p>', 'html.parser')
soup = BeautifulSoup(open("D://demo.html"), 'html.parser')  # or parse from an opened file
(1) Checking that bs4 was installed successfully
import requests
r = requests.get("https://python123.io/ws/demo.html")
r.text  # the HTML source of the demo page, which introduces several Python courses
demo = r.text
from bs4 import BeautifulSoup
soup = BeautifulSoup(demo, "html.parser")  # brew the page into a pot of soup 😄
# parse demo as HTML
print(soup.prettify())  # prints the indented tag tree; if the page text appears, parsing succeeded
(2) Basic elements of the bs4 library
Tag
<p class="title"> .... </p> forms a tag pair  # p is the tag's name
(3) The 4 parsers: 'html.parser', 'lxml', 'xml', and 'html5lib'
(4) Accessing tags, attributes, and strings
import requests
r = requests.get("https://python123.io/ws/demo.html")
demo = r.text
from bs4 import BeautifulSoup
soup = BeautifulSoup(demo, "html.parser")
>>> soup.title  # the page's title
<title>This is a python demo page</title>
>>> tag = soup.a  # .a is the (first) link tag
>>> tag
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
>>> soup.tag  # nothing echoed: there is no <tag> tag in the page, so the attribute is None
>>> print(soup.tag)
None
>>> soup.a
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
>>> soup.a.name  # the name of the a tag
'a'
>>> soup.append.parent.name  # typo: soup.append is a method, not a tag
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    soup.append.parent.name
AttributeError: 'function' object has no attribute 'parent'
>>> soup.a.parent.name  # the name of a's parent
'p'
>>> soup.a.parent.parent.name  # the name of a's grandparent
'body'
>>> tag = soup.a
>>> tag.attrs  # the tag's attributes, as a dict
{'href': 'http://www.icourse163.org/course/BIT-268001', 'class': ['py1'], 'id': 'link1'}
>>> tag.attrs['class']  # the value of the class attribute
['py1']
>>> tag.attrs['href']  # the link itself
'http://www.icourse163.org/course/BIT-268001'
>>> type(tag.attrs)  # the type of the attribute dict
<class 'dict'>
>>> type(tag)  # the type of the tag
<class 'bs4.element.Tag'>
>>> soup.a.string  # the string between the tag's opening and closing markers
'Basic Python'
>>> soup.p
<p class="title"><b>The demo python introduces several python courses.</b></p>
>>> soup.p.string  # no <b> in the result: NavigableString can cross tag levels
'The demo python introduces several python courses.'
>>> type(soup.p.string)
<class 'bs4.element.NavigableString'>
>>> newsoup = BeautifulSoup("<b><!--This is a comment--></b><p>This is not a comment</p>", "html.parser")
>>> newsoup.b.string  # the comment's content comes out as an ordinary string
'This is a comment'
>>> print(newsoup.b.string)
This is a comment
>>> type(newsoup.b.string)  # the type of a comment
<class 'bs4.element.Comment'>
>>> type(newsoup.p.string)  # the type of p's string
<class 'bs4.element.NavigableString'>
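Since Comment is a subclass of NavigableString, printing alone cannot tell them apart; a small sketch of filtering comments out with an isinstance check (the tag list here matches the example above):

```python
from bs4 import BeautifulSoup, Comment

newsoup = BeautifulSoup(
    "<b><!--This is a comment--></b><p>This is not a comment</p>", "html.parser"
)

for tag in newsoup.find_all(["b", "p"]):
    s = tag.string
    if isinstance(s, Comment):           # comments masquerade as strings
        print(f"<{tag.name}> holds a comment, skipping it")
    else:
        print(f"<{tag.name}> holds real text: {s}")
```

This prints that `<b>` holds a comment and `<p>` holds real text, which is exactly the distinction the type() calls above reveal.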