Python Basics + Introduction to Data Science (9): The Standard Library

小明同学的杂货铺

Category: Python Basics (Introduction to Data Science)

Article link: https://blog.csdn.net/qq_38425288/article/details/113349176

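Note: this post is based on the 深度之眼 video course. If it infringes any rights, please contact the author and the post will be removed. Thanks! If the summary contains mistakes, please bear with me, and corrections are welcome.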

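The Python standard library

1 The time library

Python's standard library for working with time

Getting the current time
(1) time.localtime(): local time
(2) time.gmtime(): UTC (Coordinated Universal Time)
Beijing time is 8 hours ahead of UTC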

import time

local_time = time.localtime()
UTC_time = time.gmtime()
print("local_time = ", local_time)
print("UTC_time = ", UTC_time)
time.ctime()   # returns the local time as a string, which is easier to read: 'Thu Jan 28 16:56:01 2021'
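
To make the relation between the two calls concrete, here is a minimal sketch that reads the named fields of the returned struct_time and works out how far local time is from UTC. The variable names (now, local, utc, offset_hours) are my own, and calendar.timegm() comes from the standard calendar module rather than from time.

```python
import calendar
import time

now = time.time()                 # seconds since the epoch, as a float
local = time.localtime(now)       # struct_time in the local time zone
utc = time.gmtime(now)            # struct_time in UTC

# struct_time fields can be read by name
print(local.tm_year, local.tm_mon, local.tm_mday, local.tm_hour)

# calendar.timegm() turns a UTC struct_time back into epoch seconds
print(calendar.timegm(utc) == int(now))

# reading the local struct_time as if it were UTC exposes the offset
offset_hours = (calendar.timegm(local) - calendar.timegm(utc)) / 3600
print("local time is UTC{:+.1f} h".format(offset_hours))
```

On a machine set to Beijing time the last line should report +8.0 h.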

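Timestamps and timers
(1) time.time(): returns the number of seconds since the epoch; time spent in sleep is included
(2) time.perf_counter(): a high-resolution clock whose reference point is undefined, so only the difference between two calls is meaningful; time spent in sleep is included
(3) time.process_time(): the CPU time used by the current process, again with an undefined reference point; time spent in sleep is not included
perf_counter() has a higher resolution than time() and is the better choice for measuring short intervals; for rough timing, time() is usually good enough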

t1_start = time.time()
t2_start = time.perf_counter()
t3_start = time.process_time()

print(t1_start)
print(t2_start)
print(t3_start)

res = 0
for i in range(1000000):
    res += i
    
time.sleep(5)  # pause for 5 s
t1_end = time.time()
t2_end = time.perf_counter()
t3_end = time.process_time()

print("time: {:.3f} s".format(t1_end-t1_start))
print("perf_counter: {:.3f} s".format(t2_end-t2_start))
print("process_time: {:.3f} s".format(t3_end-t3_start))
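
The same comparison can be wrapped in a small reusable helper. This is only a sketch: measure() and sleepy_sum() are hypothetical names I made up for illustration, and the 0.2 s pause is arbitrary.

```python
import time

def measure(func, *args):
    """Run func(*args) once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu

def sleepy_sum(n, pause):
    total = sum(range(n))   # CPU-bound part
    time.sleep(pause)       # waiting part: no CPU time is consumed
    return total

result, wall, cpu = measure(sleepy_sum, 100000, 0.2)
print("wall: {:.3f} s, cpu: {:.3f} s".format(wall, cpu))
```

The wall-clock figure includes the pause while the CPU figure does not, which is the difference between perf_counter() and process_time() described above.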

Formatting
time.strftime: custom formatted output

lctime = time.localtime()
time.strftime("%Y-%m-%d %A %H:%M:%S", lctime)   #'2021-01-28 Thursday 17:22:49'
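
Going the other way, time.strptime() parses a string back into a struct_time using the same format codes. A small round-trip sketch (the format string and sample text are chosen here only for illustration):

```python
import time

fmt = "%Y-%m-%d %H:%M:%S"
text = "2021-01-28 17:22:49"

parsed = time.strptime(text, fmt)        # string -> struct_time
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)   # 2021 1 28
print(parsed.tm_wday)                    # 3 (Monday is 0, so 3 is Thursday)

round_trip = time.strftime(fmt, parsed)  # struct_time -> string
print(round_trip == text)                # True
```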

Sleeping
time.sleep()  # suspends the program for the given number of seconds

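2 The random library

Random numbers are very common in computing
Python's random library provides a variety of pseudo-random numbers
It is suitable for most engineering uses, but not for cryptography or other security-sensitive work

Random seed: seed(a=None)
(1) The same seed produces the same sequence of random numbers
(2) If no seed is set, the generator is seeded from an operating-system randomness source when one is available, otherwise from the current system time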

from random import *
seed(10)
print(random())  # 0.5714025946899135
seed(10)
print(random())  # 0.5714025946899135

from random import *
# no seed is set, so the result differs from run to run
print(random())  # 0.4288890546751146
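
seed() resets the module-wide generator, which every caller shares. When a reproducible stream is needed without touching that shared state, a private random.Random instance is one option. A minimal sketch (rng_a and rng_b are illustrative names):

```python
import random

rng_a = random.Random(10)   # private generator, seeded with 10
rng_b = random.Random(10)   # a second generator with the same seed

seq_a = [rng_a.random() for _ in range(3)]
seq_b = [rng_b.random() for _ in range(3)]
print(seq_a == seq_b)       # True: same seed, same sequence

# getstate()/setstate() rewind a generator to an earlier point
state = rng_a.getstate()
first = rng_a.randint(0, 100)
rng_a.setstate(state)
second = rng_a.randint(0, 100)
print(first == second)      # True
```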

Generating random integers
(1) randint(a, b): a random integer in [a, b], both ends included

numbers = [randint(0, 10) for i in range(10)]
numbers   #[9, 0, 3, 7, 7, 4, 10, 2, 0, 8]

(2) randrange(a): a random integer in [0, a)

numbers = [randrange(10) for i in range(10)]
numbers   #[7, 5, 1, 3, 5, 0, 6, 2, 9, 5]

(3) randrange(a, b, step): a random integer taken from [a, b) in steps of step

numbers = [randrange(1, 10, 2) for i in range(10)]
numbers   # every element is one of 1, 3, 5, 7, 9
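
The endpoint rules are easy to mix up, so here is a quick empirical check: draw many values and look at which ones ever appear. The seed is fixed only so that the run is repeatable.

```python
import random

random.seed(0)   # fixed seed only so the run is repeatable

draws_randint = {random.randint(0, 3) for _ in range(1000)}
draws_randrange = {random.randrange(3) for _ in range(1000)}
draws_step = {random.randrange(1, 10, 2) for _ in range(1000)}

print(sorted(draws_randint))    # [0, 1, 2, 3]   upper bound included
print(sorted(draws_randrange))  # [0, 1, 2]      upper bound excluded
print(sorted(draws_step))       # [1, 3, 5, 7, 9]
```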

Generating random floats
(1) random(): a random float in [0.0, 1.0)

numbers = [random() for i in range(10)]
numbers 
'''[0.8981962445391883,
 0.31436380495645067,
 0.5489821840124055,
 0.43603095762412225,
 0.06499417612685054,
 0.5845462257019302,
 0.8440678976619022,
 0.1564189183874064,
 0.2242989686860415,
 0.41287020771484073]'''

(2) uniform(a, b): a random float between a and b (whether the endpoint b can be returned depends on floating-point rounding)

numbers = [uniform(2.3, 4.5) for i in range(10)]
numbers 
'''[2.381234711928984,
 3.3925276969564058,
 4.09956230174519,
 3.7473600652700396,
 3.4736642184360984,
 4.18127662970192,
 2.629313896937083,
 3.5479180490988727,
 3.1231845990457954,
 3.6228700036500068]
'''

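Functions for sequences
(1) choice(seq): returns one random element from a sequence

choice(['a','b','c'])  # pick one element from a list at random, e.g. 'a'
choice("python") # pick one character from a string at random, e.g. 'p'

(2) choices(seq, weights=None, k): samples the sequence k times with replacement; weights are optional

choices(['a','b','c'], k = 5)  #['b', 'c', 'a', 'b', 'c']
choices(['a','b','c'],[4,4,2], k = 5)  #['b', 'b', 'a', 'b', 'c']  each element has a weight; the larger the weight, the more likely the element is to be chosen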
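
Five draws are too few to see the weights at work. A rough sketch that draws many times and tallies the results (the seed value and the sample size of 10,000 are arbitrary choices of mine):

```python
import random
from collections import Counter

rng = random.Random(1)   # seeded so the run is repeatable
draws = rng.choices(['a', 'b', 'c'], weights=[4, 4, 2], k=10000)
freq = Counter(draws)

# the observed shares should be close to 0.4, 0.4 and 0.2
for letter in 'abc':
    print(letter, freq[letter] / len(draws))
```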

(3) shuffle(seq): shuffles the sequence in place and returns None

numbers = ['a','b','c']
shuffle(numbers)
numbers  #['a', 'c', 'b']

(4) sample(pop, k): picks k distinct elements from the population pop and returns them as a list

numbers = ['a','b','c','d']
a = sample(numbers, k=3)
a  #['b', 'd', 'c']
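
Because shuffle() works in place, assigning its return value is a common slip. If the original order has to be kept, sample() with k equal to the full length gives a shuffled copy instead. A short sketch:

```python
import random

original = ['a', 'b', 'c', 'd']

result = random.shuffle(original[:])   # shuffles a throwaway copy in place
print(result)                          # None

shuffled_copy = random.sample(original, k=len(original))
print(original)                        # ['a', 'b', 'c', 'd']  (unchanged)
print(sorted(shuffled_copy) == original)   # True: same elements, new order
```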

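Probability distributions, using the Gaussian distribution as an example
gauss(mu, sigma): generates a random number drawn from a Gaussian (normal) distribution with mean mu and standard deviation sigma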

number = gauss(0, 1)
number  #-0.3351949808801888

# generate many values and plot a histogram
import matplotlib.pyplot as plt
res = [gauss(0, 1) for i in range(100000)]
plt.hist(res, bins = 1000)
plt.show()

Result:
[figure: histogram of the 100,000 samples]
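
If matplotlib is not at hand, the shape of the distribution can also be checked numerically. A minimal sketch using the standard statistics module (the seed is arbitrary and only makes the run repeatable):

```python
import random
import statistics

rng = random.Random(42)   # seeded so the run is repeatable
samples = [rng.gauss(0, 1) for _ in range(100000)]

sample_mean = statistics.mean(samples)
sample_stdev = statistics.stdev(samples)
print("mean  ~ {:.3f}".format(sample_mean))
print("stdev ~ {:.3f}".format(sample_stdev))

# fraction of samples within one standard deviation of the mean
within_one = sum(abs(x) <= 1 for x in samples) / len(samples)
print("within 1 sigma ~ {:.3f}".format(within_one))   # theory: about 0.683
```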
Example 1: a simple WeChat red-packet split using the random library

import random

def red_packet(total, num):   # total amount, number of packets
    for i in range(1, num):
        per = random.uniform(0.01, total/(num-i+1)*2)   # keeps each person's expected amount at roughly total/num
        total = total - per
        print("Packet {}: {:.2f} yuan".format(i, per))
    else:
        print("Packet {}: {:.2f} yuan".format(num, total))

red_packet(10, 5)
'''
Packet 1: 3.18 yuan
Packet 2: 0.37 yuan
Packet 3: 1.60 yuan
Packet 4: 3.94 yuan
Packet 5: 0.91 yuan'''

import random
import numpy as np

def red_packet(total, num):   # total amount, number of packets
    ls = []
    for i in range(1, num):
        per = random.uniform(0.01, total/(num-i+1)*2)   # keeps each person's expected amount at roughly total/num
        ls.append(per)
        total = total - per
    else:
        ls.append(total)

    return ls

# send the red packets 100,000 times and average each position (approximately the expected value)
res = []
for i in range(100000):
    ls = red_packet(10, 5)
    res.append(ls)

res = np.array(res)
np.mean(res, axis = 0)
# result: array([2.00938802, 2.00648529, 1.99881148, 1.99475799, 1.99055721])
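
With floats, the last packet can come out close to zero, and amounts rounded to two decimals need not add up exactly to the total. One possible variation, which is my own sketch and not the course's method, works in integer cents and guarantees every packet at least 1 cent. The name red_packet_cents and its rng parameter are hypothetical.

```python
import random

def red_packet_cents(total_cents, num, rng=random):
    """Split total_cents into num integer amounts, each at least 1 cent."""
    if total_cents < num:
        raise ValueError("need at least 1 cent per packet")
    amounts = []
    for remaining in range(num, 1, -1):          # packets still to hand out
        # cap at twice the current average, but leave 1 cent for everyone after
        upper = min(2 * total_cents // remaining, total_cents - (remaining - 1))
        per = rng.randint(1, upper)
        amounts.append(per)
        total_cents -= per
    amounts.append(total_cents)                  # the last packet takes the rest
    return amounts

packets = red_packet_cents(1000, 5)              # 10.00 yuan, 5 packets
print([p / 100 for p in packets])
print(sum(packets))                              # 1000
```

The cap of twice the current average mirrors the uniform() upper bound used above; integer arithmetic keeps the sum exact.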

Example 2: generate a 4-character verification code made of digits and English letters

import random
import string

print(string.digits)
print(string.ascii_letters)

s = string.digits + string.ascii_letters
v = random.sample(s, 4)
print(v)
print(''.join(v))  # join the characters into a single string
'''
0123456789
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
['X', 'b', '7', 'f']
Xb7f
'''
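
Two caveats about the snippet above: random.sample() never picks the same position twice, so a code such as 'aa1b' can never appear, and the random module is not meant for security-sensitive values. If either matters, the standard secrets module is one alternative. A minimal sketch (make_code is a hypothetical helper name):

```python
import secrets
import string

alphabet = string.digits + string.ascii_letters

def make_code(length=4):
    """Return a code of the given length; characters may repeat."""
    return ''.join(secrets.choice(alphabet) for _ in range(length))

code = make_code()
print(code)
```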

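3 The collections library: container data types

namedtuple: tuples with named fields
Take the coordinates of a point: from the raw data alone, it is hard to tell that (1, 2) represents a point
namedtuple builds a new tuple subclass
It is defined as follows, where typename is the name of the new tuple type and field_names are the field names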

#collections.namedtuple(typename, field_names, *, rename=False, defaults=None, module=None)
import collections
Point = collections.namedtuple("Point",["x","y"])
p = Point(1, y=2)
p  #Point(x=1, y=2)

Fields can be accessed as attributes

print(p.x)  #1
print(p.y)  #2

It behaves like a tuple

print(p[0])  #1
print(p[1])  #2
x, y = p
print(x)     #1
print(y)     #2

It really is a subclass of tuple

print(isinstance(p, tuple))  #True
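
A named tuple also carries a few helper methods of its own. The sketch below shows the defaults argument from the signature above (Python 3.7+) together with _fields, _asdict() and _replace(); the default value of 0 for y is my own choice for illustration.

```python
import collections

Point = collections.namedtuple("Point", ["x", "y"], defaults=[0])  # y defaults to 0
p = Point(1)
print(p)                   # Point(x=1, y=0)
print(Point._fields)       # ('x', 'y')
print(dict(p._asdict()))   # {'x': 1, 'y': 0}

q = p._replace(y=5)        # returns a new tuple; p itself is unchanged
print(q)                   # Point(x=1, y=5)

try:
    p.x = 10               # named tuples are immutable, like ordinary tuples
except AttributeError as exc:
    print("cannot assign:", type(exc).__name__)
```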

Example: modelling a deck of playing cards (without the jokers)

import collections
Card = collections.namedtuple("Card",["rank","suit"])
ranks = [str(n) for n in range(2, 11)] +list("JQKA")
suits = "spades diamonds clubs hearts".split()
print("ranks", ranks)
print("suits", suits)
cards = [Card(rank, suit) for rank in ranks for suit in suits]
cards
'''
Result
ranks ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
suits ['spades', 'diamonds', 'clubs', 'hearts']
[Card(rank='2', suit='spades'),
 Card(rank='2', suit='diamonds'),
 Card(rank='2', suit='clubs'),
 Card(rank='2', suit='hearts'),
 Card(rank='3', suit='spades'),
 Card(rank='3', suit='diamonds'),
 Card(rank='3', suit='clubs'),
 Card(rank='3', suit='hearts'),
 Card(rank='4', suit='spades'),
 Card(rank='4', suit='diamonds'),
 Card(rank='4', suit='clubs'),
 Card(rank='4', suit='hearts'),
 Card(rank='5', suit='spades'),
 Card(rank='5', suit='diamonds'),
 Card(rank='5', suit='clubs'),
 Card(rank='5', suit='hearts'),
 Card(rank='6', suit='spades'),
 Card(rank='6', suit='diamonds'),
 Card(rank='6', suit='clubs'),
 Card(rank='6', suit='hearts'),
 Card(rank='7', suit='spades'),
 Card(rank='7', suit='diamonds'),
 Card(rank='7', suit='clubs'),
 Card(rank='7', suit='hearts'),
 Card(rank='8', suit='spades'),
 Card(rank='8', suit='diamonds'),
 Card(rank='8', suit='clubs'),
 Card(rank='8', suit='hearts'),
 Card(rank='9', suit='spades'),
 Card(rank='9', suit='diamonds'),
 Card(rank='9', suit='clubs'),
 Card(rank='9', suit='hearts'),
 Card(rank='10', suit='spades'),
 Card(rank='10', suit='diamonds'),
 Card(rank='10', suit='clubs'),
 Card(rank='10', suit='hearts'),
 Card(rank='J', suit='spades'),
 Card(rank='J', suit='diamonds'),
 Card(rank='J', suit='clubs'),
 Card(rank='J', suit='hearts'),
 Card(rank='Q', suit='spades'),
 Card(rank='Q', suit='diamonds'),
 Card(rank='Q', suit='clubs'),
 Card(rank='Q', suit='hearts'),
 Card(rank='K', suit='spades'),
 Card(rank='K', suit='diamonds'),
 Card(rank='K', suit='clubs'),
 Card(rank='K', suit='hearts'),
 Card(rank='A', suit='spades'),
 Card(rank='A', suit='diamonds'),
 Card(rank='A', suit='clubs'),
 Card(rank='A', suit='hearts')]
'''
# shuffle and draw cards with random
from random import *
shuffle(cards)  # shuffle the deck
choice(cards)   # draw one card at random
k = sample(cards, k=5)  # draw 5 cards at random

Counter: a counting tool

from collections import Counter
s = "牛奶奶找刘奶奶买牛奶"
colors = ["red", "blue", "red", "green", "blue", "blue"]
cnt_str = Counter(s)
cnt_color = Counter(colors)
print(cnt_str)
print(cnt_color)
'''
Counter({'奶': 5, '牛': 2, '找': 1, '刘': 1, '买': 1})
Counter({'blue': 3, 'red': 2, 'green': 1})
'''

It is a subclass of dict

print(isinstance(Counter(), dict))   #True

Most common elements: most_common(n)
Returns the n most frequent elements together with their counts

cnt_color.most_common(2)  #[('blue', 3), ('red', 2)]

Expanding the elements: elements()

list(cnt_str.elements())  #['牛', '牛', '奶', '奶', '奶', '奶', '奶', '找', '刘', '买']

Addition, subtraction and other operations

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)
c+d  #Counter({'a': 4, 'b': 3})
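
Beyond addition, Counter supports subtraction, intersection and union, and update() adds counts from another iterable. A short sketch using the same c and d as above:

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

print(c - d)   # Counter({'a': 2})  (non-positive counts are dropped)
print(c & d)   # Counter({'a': 1, 'b': 1})  (minimum of each count)
print(c | d)   # Counter({'a': 3, 'b': 2})  (maximum of each count)

c.update("aab")        # add counts from another iterable
print(c['a'], c['b'])  # 5 2
print(c['z'])          # 0: a missing key counts as zero, no KeyError
```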

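Example: draw 10 cards from a deck; what fraction of them are ten-valued cards (10, J, Q, K)?

import collections
from random import sample

cards = collections.Counter(tens=16, low_card=36)
seen = sample(list(cards.elements()), k=10)
print(seen)
seen.count('tens')/10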

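deque: double-ended queue
Lists are very fast at accessing elements by index
But insertion and deletion are slow when they force the remaining elements to be shifted
This is especially true of insert(0, v) and pop(0), which insert and delete at the start of the list
A deque adds and removes elements efficiently at both ends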

from collections import deque

d = deque('cde')
d  #deque(['c', 'd', 'e'])

d.append('f')   # appends on the right by default
d.append('g')
d.appendleft('b')   # appends on the left
d.appendleft('a')
d #deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])

For other deque methods, see the official documentation
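
A few of those other methods, as a quick sketch: removal from either end, rotation, and a bounded deque created with maxlen (the variable name recent is just for illustration).

```python
from collections import deque

d = deque('abcdefg')
right = d.pop()        # removed from the right end
left = d.popleft()     # removed from the left end
print(right, left)     # g a
print(d)               # deque(['b', 'c', 'd', 'e', 'f'])

d.rotate(2)            # move the last two items to the front
print(d)               # deque(['e', 'f', 'b', 'c', 'd'])

recent = deque(maxlen=3)   # bounded deque: keeps only the newest 3 items
for n in range(6):
    recent.append(n)
print(recent)          # deque([3, 4, 5], maxlen=3)
```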

4 The itertools library: iterators

Combinatoric iterators
(1) product: Cartesian product

import itertools
for i in itertools.product('ABC', '01'):   # pair every element of 'ABC' with every element of '01'
    print(i)
'''
('A', '0')
('A', '1')
('B', '0')
('B', '1')
('C', '0')
('C', '1')
'''

for i in itertools.product('ABC', repeat=3):  # equivalent to product('ABC', 'ABC', 'ABC')
    print(i)

(2) permutations: ordered arrangements

for i in itertools.permutations('ABCD', 3):    # 3 is the length of each permutation
    print(i)

for i in itertools.permutations(range(3)):    # all permutations of 0, 1, 2
    print(i)
'''
('A', 'B', 'C')
('A', 'B', 'D')
('A', 'C', 'B')
('A', 'C', 'D')
('A', 'D', 'B')
('A', 'D', 'C')
('B', 'A', 'C')
('B', 'A', 'D')
('B', 'C', 'A')
('B', 'C', 'D')
('B', 'D', 'A')
('B', 'D', 'C')
('C', 'A', 'B')
('C', 'A', 'D')
('C', 'B', 'A')
('C', 'B', 'D')
('C', 'D', 'A')
('C', 'D', 'B')
('D', 'A', 'B')
('D', 'A', 'C')
('D', 'B', 'A')
('D', 'B', 'C')
('D', 'C', 'A')
('D', 'C', 'B')
(0, 1, 2)
(0, 2, 1)
(1, 0, 2)
(1, 2, 0)
(2, 0, 1)
(2, 1, 0)
'''

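(3) combinations: unordered selections

for i in itertools.combinations('ABCD', 2):   # 2 is the length of each combination; unlike permutations, order does not matter
    print(i)

for i in itertools.combinations(range(4), 3):   # 3 is the length of each combination
    print(i)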
'''
('A', 'B')
('A', 'C')
('A', 'D')
('B', 'C')
('B', 'D')
('C', 'D')
(0, 1, 2)
(0, 1, 3)
(0, 2, 3)
(1, 2, 3)
'''

(4) combinations_with_replacement: combinations in which elements may repeat

for i in itertools.combinations_with_replacement('ABC', 2):   
    print(i)
'''
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'B')
('B', 'C')
('C', 'C')
'''

for i in itertools.product('ABC', repeat=2):  # equivalent to product('ABC', 'ABC'); unlike the above, order matters, so ('A', 'B') and ('B', 'A') both appear
    print(i)
'''
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'A')
('B', 'B')
('B', 'C')
('C', 'A')
('C', 'B')
('C', 'C')
'''
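
The four iterators differ in whether order matters and whether elements may repeat, and that shows up in how many results each yields. The sketch below compares the counts with the usual closed-form formulas; math.perm and math.comb need Python 3.8 or later, and n = 4, r = 3 are arbitrary values picked for the demonstration.

```python
import itertools
import math

n, r = 4, 3
items = 'ABCD'

n_product = len(list(itertools.product(items, repeat=r)))
n_perm = len(list(itertools.permutations(items, r)))
n_comb = len(list(itertools.combinations(items, r)))
n_comb_rep = len(list(itertools.combinations_with_replacement(items, r)))

print(n_product, n ** r)                      # 64 64
print(n_perm, math.perm(n, r))                # 24 24
print(n_comb, math.comb(n, r))                # 4 4
print(n_comb_rep, math.comb(n + r - 1, r))    # 20 20
```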

Zipping
(1) zip: stops at the shortest input

for i in zip('ABC','012','xyz'):   # note that zip is a built-in, so no itertools prefix is needed
    print(i)
'''
('A', '0', 'x')
('B', '1', 'y')
('C', '2', 'z')
'''

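When the inputs differ in length, zip stops at the end of the shortest one
(2) zip_longest: runs to the longest input
When the inputs differ in length, it runs until the longest one is exhausted; missing values are filled with None or with a value you specify

import itertools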
for i in itertools.zip_longest('ABC','012345'):
    print(i)
'''
('A', '0')
('B', '1')
('C', '2')
(None, '3')
(None, '4')
(None, '5')
'''

import itertools
for i in itertools.zip_longest('ABC','012345',fillvalue='!'):
    print(i)
'''
('A', '0')
('B', '1')
('C', '2')
('!', '3')
('!', '4')
('!', '5')
'''

Infinite iterators
(1) count(start=0, step=1): counting
Creates an iterator that returns evenly spaced values starting from start

itertools.count(1)

(2) cycle(iterable): cycling
Creates an iterator that returns the elements of iterable over and over, without end

itertools.cycle('ABC')
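
Calling count() or cycle() only creates the iterator; looping over one directly never finishes. A common way to take a finite slice is itertools.islice(), sketched here (variable names are illustrative):

```python
import itertools

evens = itertools.count(0, 2)                       # 0, 2, 4, ... without end
first_evens = list(itertools.islice(evens, 5))
print(first_evens)                                  # [0, 2, 4, 6, 8]

letters = itertools.cycle('ABC')                    # A, B, C, A, B, C, ...
first_letters = list(itertools.islice(letters, 7))
print(first_letters)                                # ['A', 'B', 'C', 'A', 'B', 'C', 'A']

# zip stops at its shortest input, so it can also cap an infinite iterator
numbered = list(zip(itertools.count(1), 'xyz'))
print(numbered)                                     # [(1, 'x'), (2, 'y'), (3, 'z')]
```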

(3) repeat(object[, times]): repetition

for i in itertools.repeat(10,3):
    print(i)
'''
10
10
10
'''

Others
(1) chain(iterables): chaining
Joins several iterables end to end into one larger iterator

for i in itertools.chain('ABC',[1,2,3]):
    print(i)
'''
A
B
C
1
2
3
'''

(2) enumerate(iterable, start=0): enumeration (a Python built-in)
Yields tuples of the form (index, item), where index starts from start and item is taken from iterable

for i in enumerate('Python',start=1):
    print(i)
'''
(1, 'P')
(2, 'y')
(3, 't')
(4, 'h')
(5, 'o')
(6, 'n')
'''

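(3) groupby(iterable, key=None): grouping
Creates an iterator that returns consecutive keys and groups from iterable, as determined by key
In general the data should be sorted by the same key first, because only adjacent items are grouped together
With key=None, consecutive equal elements are grouped together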

for key,group in itertools.groupby('AAAAABBCCCDDDDDDAACCCC'):
    print(key,list(group))
'''
A ['A', 'A', 'A', 'A', 'A']
B ['B', 'B']
C ['C', 'C', 'C']
D ['D', 'D', 'D', 'D', 'D', 'D']
A ['A', 'A']
C ['C', 'C', 'C', 'C']
'''

animals = ["duck", "eagle", "rat", "giraffe", "bear", "bat", "dolphin", "shark", "lion"]
animals.sort(key=len)    # first sort by word length
print(animals)
for key,group in itertools.groupby(animals, key=len):   # then group by word length
    print(key, list(group))
'''
['rat', 'bat', 'duck', 'bear', 'lion', 'eagle', 'shark', 'giraffe', 'dolphin']
3 ['rat', 'bat']
4 ['duck', 'bear', 'lion']
5 ['eagle', 'shark']
7 ['giraffe', 'dolphin']
'''

animals = ["duck", "eagle", "rat", "giraffe", "bear", "bat", "dolphin", "shark", "lion"]
animals.sort(key=lambda x: x[0])    # first sort by the first letter
print(animals)
for key,group in itertools.groupby(animals, key=lambda x: x[0]):   # then group by the first letter
    print(key, list(group))
'''
['bear', 'bat', 'duck', 'dolphin', 'eagle', 'giraffe', 'lion', 'rat', 'shark']
b ['bear', 'bat']
d ['duck', 'dolphin']
e ['eagle']
g ['giraffe']
l ['lion']
r ['rat']
s ['shark']
'''
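
To see why sorting first matters, the sketch below runs groupby on unsorted data, then on sorted data, and finally shows an alternative that needs no sorting at all (collecting into a defaultdict, which is my own addition here). The shortened animal list is only for illustration.

```python
import itertools
from collections import defaultdict

animals = ["duck", "eagle", "rat", "giraffe", "bear", "bat"]

# without sorting, the same key shows up more than once
unsorted_keys = [key for key, _ in itertools.groupby(animals, key=len)]
print(unsorted_keys)   # [4, 5, 3, 7, 4, 3]

# sorted first: one group per key
by_len = {key: list(group)
          for key, group in itertools.groupby(sorted(animals, key=len), key=len)}
print(by_len)   # {3: ['rat', 'bat'], 4: ['duck', 'bear'], 5: ['eagle'], 7: ['giraffe']}

# alternative that needs no sorting: collect into a defaultdict
groups = defaultdict(list)
for name in animals:
    groups[len(name)].append(name)
print(dict(groups) == by_len)   # True
```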
