Python Hands-On Check-in: Day 9

Article link: https://blog.csdn.net/liang0502/article/details/125724942

First day of the month

from datetime import date
mydate = date.today()
month_first_day = date(mydate.year, mydate.month, 1)
print(f"First day of the month: {month_first_day}\n") # First day of the month: 2022-07-01

Last day of the month

from datetime import date
import calendar
mydate = date.today()
_,days = calendar.monthrange(mydate.year, mydate.month)
month_last_day = date(mydate.year, mydate.month, days)
print(f"Last day of the month: {month_last_day}\n") # Last day of the month: 2022-07-31
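
As a cross-check on the two snippets above, here is a minimal sketch that computes both bounds without `calendar`, by jumping to the first day of the next month and stepping back one day. The helper name `month_bounds` is an illustrative assumption, not part of the original post.

```python
from datetime import date, timedelta

def month_bounds(d):
    """Return (first_day, last_day) of the month that contains d."""
    first = d.replace(day=1)
    # Day 28 exists in every month, and 28 + 4 days always lands in the next month.
    next_month_first = (first.replace(day=28) + timedelta(days=4)).replace(day=1)
    return first, next_month_first - timedelta(days=1)

first, last = month_bounds(date(2024, 2, 10))
print(first, last)  # 2024-02-01 2024-02-29
```

Passing the date in as an argument, instead of calling `date.today()` inside the function, keeps the helper easy to test with fixed dates.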

Getting the current time

from datetime import date, datetime
from time import localtime,strftime
today_date = date.today()
print(today_date) # 2022-07-11
today_time = datetime.today()
print(today_time) # 2022-07-11 15:58:11.307528
local_time = localtime()
print(strftime("%Y-%m-%d %H:%M:%S", local_time)) # 2022-07-11 15:58:11
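
All three values above are naive local times with no time zone attached. If a time zone is needed, one option (a sketch, not something the original post covers) is to ask `datetime.now()` for an aware UTC timestamp:

```python
from datetime import datetime, timezone

now_utc = datetime.now(timezone.utc)  # aware datetime, tzinfo is UTC
print(now_utc.isoformat())            # ISO 8601 string ending in +00:00
print(now_utc.tzinfo)                 # UTC
```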

Converting a time string to a time

from time import strptime

struct_time = strptime('2019-12-22 10:10:08', "%Y-%m-%d %H:%M:%S")
print(struct_time) # time.struct_time(tm_year=2019, tm_mon=12, tm_mday=22, tm_hour=10, tm_min=10, tm_sec=8, tm_wday=6, tm_yday=356, tm_isdst=-1)
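
`time.strptime` returns a `struct_time`, which is awkward for date arithmetic. A minimal sketch of the same parse with `datetime.strptime`, which returns a `datetime` object that supports `timedelta` arithmetic (the variable name `dt` is only illustrative):

```python
from datetime import datetime, timedelta

dt = datetime.strptime('2019-12-22 10:10:08', '%Y-%m-%d %H:%M:%S')
print(dt)                       # 2019-12-22 10:10:08
print(dt + timedelta(days=10))  # 2020-01-01 10:10:08
print(dt.weekday())             # 6 (Monday is 0, so this is a Sunday)
```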

Converting a time to a time string

from time import strftime, strptime, localtime
print(localtime()) # time.struct_time(tm_year=2022, tm_mon=7, tm_mday=11, tm_hour=16, tm_min=1, tm_sec=28, tm_wday=0, tm_yday=192, tm_isdst=0)
print(strftime("%m-%d-%Y %H:%M:%S", localtime())) # convert to a custom format: 07-11-2022 16:01:40

The main thread starts by default

By default, a program runs in a single thread, which is called the main thread. The example below demonstrates this. First import the threading module:
```
import threading
```
The module-level function threading.current_thread() returns the current thread:
```
t = threading.current_thread()
print(t) # <_MainThread(MainThread, started 17128)>
```
This confirms that the program runs in MainThread by default.

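t.name gives the name of the thread. Other commonly used members: t.ident gives the thread id, and t.is_alive() tells whether the thread is still alive. The older spellings getName() and isAlive() should be avoided: getName() is deprecated, and isAlive() was removed in Python 3.9.
```
print(t.name) # MainThread
print(t.ident) # 17128
print(t.is_alive()) # True
```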
All of the above is only background for multithreading, because so far exactly one thread is actually in use: the main thread.

Creating a thread

# create a thread
my_thread = threading.Thread()
# create a thread named my_thread
my_thread = threading.Thread(name='my_thread')

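The point of creating a thread is to tell it what to do for us. The work is passed in through the target parameter, which must be a callable; a function is a callable: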

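def print_i(i):
    print('print i: %d' % (i,))
# args must be a tuple with one entry per parameter of print_i
my_thread1 = threading.Thread(target=print_i, args=(1,))

The thread my_thread1 is now fully equipped, but we still have to press the launch button: it only takes off once start() is called.

my_thread1.start()
'''
The output is shown below. args supplies the argument i that print_i needs, and its type is a tuple.

print i: 1
'''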
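
`start()` returns immediately, so the calling code usually also needs `join()` to wait for the thread to finish. The following is a minimal sketch under assumed names (`square`, `results`, and `workers` are illustrative and not from the original post) that passes both positional and keyword arguments and then waits for every thread:

```python
import threading

results = []

def square(i, label='value'):
    # Appending to a shared list is enough for this small demo.
    results.append((label, i * i))

workers = [
    threading.Thread(target=square, args=(n,), kwargs={'label': 'sq'})
    for n in range(3)
]
for w in workers:
    w.start()
for w in workers:
    w.join()  # block until this thread has finished

print(sorted(results))  # [('sq', 0), ('sq', 1), ('sq', 4)]
```

The results are sorted before printing because the order in which threads finish is not guaranteed.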

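That covers the core facts about multithreading. But knowing only this much is not enough; theory alone never is. Next, let's look at some of the most essential issues in applied multithreaded programming.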

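Taking turns on CPU time slices: to keep the explanation simple, assume the machine has a single core, although for CPython this assumption is somewhat redundant, because the GIL lets only one thread execute Python bytecode at a time. Create 3 threads and collect them in threads: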

import time
from datetime import datetime
import threading
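def print_time():
    for _ in range(5): # print 5 times in each thread
        time.sleep(0.1) # simulate the processing done before each print
        print('Current thread %s, finished printing at: %s'%
                (threading.current_thread().name,datetime.today()))
threads = [threading.Thread(name='t%d'%(i,),target=print_time) for i in range(3)]
# start the three threads
[t.start() for t in threads]
'''
[None, None, None]
Current thread t2, finished printing at: 2022-07-11 16:14:55.176615
Current thread t1, finished printing at: 2022-07-11 16:14:55.177615Current thread t0, finished printing at: 2022-07-11 16:14:55.178615

Current thread t0, finished printing at: 2022-07-11 16:14:55.287699
Current thread t1, finished printing at: 2022-07-11 16:14:55.287699
Current thread t2, finished printing at: 2022-07-11 16:14:55.287699
Current thread t2, finished printing at: 2022-07-11 16:14:55.396800Current thread t0, finished printing at: 2022-07-11 16:14:55.396800
Current thread t1, finished printing at: 2022-07-11 16:14:55.396800

Current thread t1, finished printing at: 2022-07-11 16:14:55.506866Current thread t0, finished printing at: 2022-07-11 16:14:55.506866

Current thread t2, finished printing at: 2022-07-11 16:14:55.506866
Current thread t2, finished printing at: 2022-07-11 16:14:55.617172Current thread t0, finished printing at: 2022-07-11 16:14:55.617172
Current thread t1, finished printing at: 2022-07-11 16:14:55.617172
'''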
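
The timestamps above show the three threads finishing each round at almost the same moment, because a sleeping thread hands the CPU to the others. A rough way to see the same effect is to time the whole run. This is only a sketch: the function name `worker` and the shorter delays are assumptions chosen to keep the demo quick, and the measured time depends on the machine.

```python
import threading
import time

def worker(n_steps=3, delay=0.05):
    for _ in range(n_steps):
        time.sleep(delay)  # a sleeping thread gives up the CPU to the others

threads = [threading.Thread(target=worker, name='t%d' % i) for i in range(3)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all three threads before stopping the clock
elapsed = time.perf_counter() - start

# Run one after another, the three workers would need about 3 * 3 * 0.05 = 0.45 s.
print('elapsed: %.2f s' % elapsed)
```

On an idle machine the elapsed time should be close to the 0.15 s that a single worker needs, well below the sequential total.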

Multiple threads competing for the same variable

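Multithreaded programs can run into several threads competing for the same variable. In the example below, 10 threads compete for the global variable a at the same time: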
```
import threading
a = 0
def add1():
    global a
    a += 1
    print('%s adds a to 1: %d'%(threading.current_thread().name,a))
threads = [threading.Thread(name='t%d'%(i,),target=add1) for i in range(10)]
[t.start() for t in threads]
'''
t0 adds a to 1: 1
t1 adds a to 1: 2
t2 adds a to 1: 3
t3 adds a to 1: 4
t4 adds a to 1: 5
t5 adds a to 1: 6
t6 adds a to 1: 7
t7 adds a to 1: 8
t8 adds a to 1: 9
t9 adds a to 1: 10
[None, None, None, None, None, None, None, None, None, None]
'''
```
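Everything looks normal: each thread runs once and adds 1 to a, and a ends up as 10. Running the code a dozen more times gives the same result. So can we conclude that this code is thread-safe? No! In multithreaded code, whenever threads read and modify a shared global variable at the same time without any further protection, the code is not thread-safe. Some races simply surface very rarely. In this example a problem appears only if, after thread 0 has modified a, some other thread still works with the value from before the modification. But a = a + 1 takes so little time that each thread, when its turn comes, almost always sees the latest value of a, so the chance of exposing the problem is tiny.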
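
One way to see why `a += 1` is not a single indivisible step is to look at the bytecode CPython generates for it. The sketch below uses the standard `dis` module; the exact opcode names differ between CPython versions, but the load of the global and the store back to it are separate instructions, and a thread switch can happen between them.

```python
import dis

a = 0

def add1():
    global a
    a += 1

# Collect the opcode names CPython generated for add1.
ops = [ins.opname for ins in dis.get_instructions(add1)]
print(ops)
```

The printed list should contain `LOAD_GLOBAL` somewhere before `STORE_GLOBAL`, with the addition in between.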
A small change to the code exposes the problem

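Once the reason the problem can surface is understood, making it appear is not hard. Think of a database write, which usually takes a noticeable amount of time. To simulate such a write, and to keep things simple, we only need to stretch out the time it takes to modify a, and the problem shows up easily: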
```
import threading
import time
a = 0
def add1():
    global a
    tmp = a + 1
    time.sleep(0.2) # sleep 0.2 s to simulate the time a write takes
    a = tmp
    print('%s adds a to 1: %d'%(threading.current_thread().name,a))
threads = [threading.Thread(name='t%d'%(i,),target=add1) for i in range(10)]
[t.start() for t in threads]
'''
[None, None, None, None, None, None, None, None, None, None]
t9 adds a to 1: 1t8 adds a to 1: 1
t6 adds a to 1: 1
t0 adds a to 1: 1t1 adds a to 1: 1
t4 adds a to 1: 1t7 adds a to 1: 1
t2 adds a to 1: 1

t3 adds a to 1: 1


t5 adds a to 1: 1
'''
```
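As the output shows, after all 10 threads have run, the value of a is only what a single thread would produce. Here is why: every thread sleeps for 0.2 seconds between reading a and writing it back. As soon as one thread goes to sleep, the CPU is handed to another thread. By the time every thread has had its turn, the 0.2 second sleep of the first one has still not elapsed, so every thread has read a as 0 and later writes back 1.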
Adding a lock to avoid the problem

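Once the cause is known, the fix is not complicated. Python provides a lock mechanism: when a piece of code must be executed by only one thread at a time, a thread acquires the lock and the other threads wait. After the lock is released, the waiting threads compete for it, run the code, release the lock, and so on. Create a lock named locka: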
```
import threading
import time
locka = threading.Lock()
```
Acquire the lock with locka.acquire() and release it with locka.release(). The code between the two calls can be executed by only one thread at a time.
```
a = 0
def add1():
    global a
    locka.acquire() # acquire the lock before entering try
    try:
        tmp = a + 1
        time.sleep(0.2) # sleep 0.2 s to simulate the time a write takes
        a = tmp
    finally:
        locka.release() # release the lock, even if an error occurred
    print('%s adds a to 1: %d'%(threading.current_thread().name,a))
threads = [threading.Thread(name='t%d'%(i,),target=add1) for i in range(10)]
[t.start() for t in threads]
'''
[None, None, None, None, None, None, None, None, None, None]
t0 adds a to 1: 1
t1 adds a to 1: 2
t2 adds a to 1: 3
t3 adds a to 1: 4
t4 adds a to 1: 5
t5 adds a to 1: 6
t6 adds a to 1: 7
t7 adds a to 1: 8
t8 adds a to 1: 9
t9 adds a to 1: 10
'''
```
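Everything works, but this is now effectively single-threaded, sequential execution. For this example, multithreading has lost its value and even wastes time on creating threads. With only one lock in the program, try...finally ensures the lock is always released, so no deadlock occurs. Once a program uses several locks, however, deadlocks become easy to run into. Paying attention to where locks are used and avoiding deadlock is something to keep in mind when writing multithreaded code.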
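
A common way to lower the deadlock risk with several locks is to make every thread acquire them in the same order. The sketch below illustrates that idea with two locks and a made-up transfer example; the names `lock_a`, `lock_b`, `balance`, and `transfer` are assumptions for illustration only. It also uses the `with` statement, which acquires the lock on entry and releases it on exit, even if an exception is raised.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
balance = {'a': 100, 'b': 100}

def transfer(src, dst, amount):
    # Always take lock_a before lock_b, whichever direction the money moves.
    with lock_a:
        with lock_b:
            balance[src] -= amount
            balance[dst] += amount

threads = [
    threading.Thread(target=transfer, args=('a', 'b', 10)),
    threading.Thread(target=transfer, args=('b', 'a', 30)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # {'a': 120, 'b': 80}
```

If one thread took `lock_a` first and the other took `lock_b` first, each could end up waiting forever for the lock the other one holds.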
Mastering the time module in one minute

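The time module provides time-related classes and functions. Remember one class, struct_time, a tuple made up of 9 integers, and the 5 most commonly used functions below. First import the time module: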
```
import time
seconds=time.time()
seconds # 1657527834.3223767
local_time = time.localtime(seconds)
local_time # time.struct_time(tm_year=2022, tm_mon=7, tm_mday=11, tm_hour=16, tm_min=23, tm_sec=54, tm_wday=0, tm_yday=192, tm_isdst=0)
str_time = time.asctime(local_time)
str_time # 'Mon Jul 11 16:23:54 2022'
format_time = time.strftime('%Y-%m-%d %H:%M:%S',local_time)
format_time # '2022-07-11 16:23:54'
str_to_struct = time.strptime(format_time,'%Y-%m-%d %H:%M:%S')
str_to_struct # time.struct_time(tm_year=2022, tm_mon=7, tm_mday=11, tm_hour=16, tm_min=23, tm_sec=54, tm_wday=0, tm_yday=192, tm_isdst=-1)
```
Finally, remember the common format codes. Note that %m is the month and %M is the minute:

%Y Year with century as a decimal number.
%m Month as a decimal number [01,12].
%d Day of the month as a decimal number [01,31].
%H Hour (24-hour clock) as a decimal number [00,23].
%M Minute as a decimal number [00,59].
%S Second as a decimal number [00,61].
%z Time zone offset from UTC.
%a Locale’s abbreviated weekday name.
%A Locale’s full weekday name.
%b Locale’s abbreviated month name.
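
To try the numeric codes from the list above on a fixed input, the sketch below formats the Unix epoch, obtained with `time.gmtime(0)`, so the output does not depend on the current time or the local time zone:

```python
import time

epoch = time.gmtime(0)  # struct_time for 1970-01-01 00:00:00 UTC
print(time.strftime('%Y-%m-%d %H:%M:%S', epoch))  # 1970-01-01 00:00:00
print(time.strftime('%m/%d/%Y', epoch))           # 01/01/1970
print(time.strftime('%H:%M', epoch))              # 00:00
```

The locale-dependent codes (%a, %A, %b) are left out here because their output varies with the system locale.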

Processing a 10 GB file with 4 GB of memory

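How do you process a 10 GB file on a single machine with 4 GB of memory? The discussion below assumes that each line can be processed on its own, with no dependencies between lines.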

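Method 1: use only Python's built-in modules and read the file into memory line by line. Using yield has the benefit of decoupling reading from processing: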

def python_read(filename):
    with open(filename,'r',encoding='utf-8') as f:
        while True:
            line = f.readline()
            if not line:
                return
            yield line

The generator above reads one line at a time, so the data is iterated and processed line by line:

if __name__ == '__main__':
    g = python_read(file_path)  # file_path is a placeholder for the path to your file
    for c in g:
        print(c)

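Method 1 has a drawback: reading line by line means frequent IO operations, which slows processing down. Is there a way to read several lines per IO operation? The pandas read_csv function has several dozen parameters and is very powerful. For processing large files on a single machine, its chunksize parameter does the job: setting it to 5 means 5 rows are read at a time.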

import pandas as pd

def pandas_read(filename,sep=',',chunksize=5):
    # with chunksize set, read_csv returns an iterator of DataFrames
    reader = pd.read_csv(filename,sep=sep,chunksize=chunksize)
    while True:
        try:
            yield reader.get_chunk()
        except StopIteration:
            print('---Done---')
            break

It is used the same way as Method 1:

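if __name__ == '__main__':
    # file_path is a placeholder for the path to your file;
    # a separator longer than one character makes pandas use its Python parsing engine
    g = pandas_read(file_path,sep="::")
    for c in g:
        print(c)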

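Those are two ways to process a large file on a single machine. Method 2 is recommended because it is more flexible. Besides being useful at work, this question also comes up in interviews from time to time.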
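
When pandas is not available, a middle ground between the two methods is to batch lines with the standard library. The following is a minimal sketch, not part of the original post: the function name `read_in_batches` and the default batch size are assumptions, and the demo writes a small temporary file only to show the batch sizes.

```python
import os
import tempfile
from itertools import islice

def read_in_batches(filename, batch_size=1000, encoding='utf-8'):
    """Yield lists of up to batch_size lines; only one batch is held in memory."""
    with open(filename, 'r', encoding=encoding) as f:
        while True:
            batch = list(islice(f, batch_size))  # take the next batch_size lines
            if not batch:
                return
            yield batch

# Demo on a 7-line temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(''.join('row %d\n' % i for i in range(7)))

sizes = [len(batch) for batch in read_in_batches(tmp.name, batch_size=3)]
os.remove(tmp.name)
print(sizes)  # [3, 3, 1]
```

Like Method 1, this keeps reading and processing decoupled, while each step hands the caller a whole batch of lines instead of a single one.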