Tod__

M1ss_He

Last modified 2023-02-16 17:24:53

Category: python. Tags: python

First published 2018-11-26 20:46:25

Article link: https://blog.csdn.net/Rubik_Wong/article/details/84556154


Damn, there is way too much I still don't know. https://zhuanlan.zhihu.com/p/52656990

How close() works under the hood in the with open syntax

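: Any object that implements __enter__() and __exit__() is a context manager and can be used with the with keyword. File objects implement both methods, and a file's __exit__() calls close(). ref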
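A minimal sketch of the protocol, to make the point concrete. ManagedResource is a made-up class for illustration, not anything from the standard library; it only records whether __exit__() has run.

```python
class ManagedResource:
    """Hypothetical resource that records whether it has been closed."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        # Whatever is returned here is bound to the name after `as`.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and also when an exception leaves the block.
        self.closed = True
        return False  # do not swallow exceptions


with ManagedResource() as res:
    inside = res.closed   # still open while the block runs
after = res.closed        # closed once the block has exited
```

A file object does the same thing, with close() in place of the flag.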

The GIL and multithreading

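My summary +++: the GIL guarantees that only one thread executes bytecode at any moment, so two threads never modify interpreter state at the same instant. But a single statement compiles to several bytecodes, and without a lock a thread can be switched out partway through them, which leaves the data unsafe. A Stack Overflow answer puts it this way: the GIL protects the interpreter's internal consistency, while consistency at a larger scale is our job and needs a lock. In an earlier interview the interviewer told me that my num+1/-1 example has little to do with the GIL, because the GIL only covers interpreter-level consistency; anything above that still needs a Lock. (For why x += 1 and x -= 1 go wrong, see the link marked *** below. What remains open for me is the reference-counting side of the GIL: is it a race condition or ref counting?) Thread switches happen in two ways. First, a thread waiting on io gives up the GIL, and another thread may then touch the variable. Second, when a thread's time slice ends it releases the GIL and the threads compete for it again. If the thread that wins the GIL needs a variable guarded by a Lock held elsewhere, it blocks and another thread runs, until the lock holder finishes and releases the Lock; only then can other threads operate on the variable.
~~~: The book with 91 suggestions for Python says that changing a ref count concurrently causes errors. I think this is much like the addition case in the Stack Overflow answer below. Combining what others say, my guess is that the GIL makes each bytecode instruction atomic: one instruction may involve two steps, decrementing the ref count and checking whether it reached 0 (0 means the object is freed), and the GIL ensures an instruction either runs completely or not at all. One level up, a Lock around Python statements protects a single variable from interruption and makes the whole sequence behave like an atomic operation. (Unconfirmed, 11/29)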
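A small sketch of the point above, assuming CPython. The names counter and add are made up for the example. dis shows that counter += 1 is more than one bytecode instruction, and the Lock turns the whole read-modify-write into one exclusive step.

```python
import dis
import threading

# One statement, several bytecode instructions (exact names vary by Python version).
ops = [ins.opname for ins in dis.get_instructions("counter += 1")]

counter = 0
lock = threading.Lock()


def add(times):
    global counter
    for _ in range(times):
        # Load, add and store happen as separate steps; holding the lock
        # keeps another thread from running between them.
        with lock:
            counter += 1


threads = [threading.Thread(target=add, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock the final count is always 4 * 10000. Without it the result depends on where thread switches fall, which is exactly the part the GIL does not protect.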

：An interpreter that uses GIL always allows exactly one thread to execute at a time, even if run on a multi-core processor.
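：Applications running on implementations with a GIL can be designed to use separate processes to achieve full parallelism, as each process has its own interpreter and in turn has its own GIL. Otherwise, the GIL can be a significant barrier to parallelism. So real parallelism needs multiple processes; each process/interpreter has its own gil.
: Using multiple threads for parallel computation only shortens the run time on a multi-core CPU. On a single core the total computation time is the same, because the CPU serves one thread at a time. (And per the quotes above, under the GIL CPython threads do not run Python bytecode in parallel even on several cores.)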

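After Python source is compiled, each statement becomes one or more bytecodes. Without the GIL, a thread switch in the middle of executing a bytecode could change the result and corrupt data (the reference count, for example). That is what thread-unsafe means: the operation is not atomic. (I am not fully sure about this sentence.)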

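On a single-core CPU, multithreading only helps responsiveness: it keeps the interface from freezing while io or the network blocks.
Why multithreaded code still needs locks *** read this first
! (https://stackoverflow.com/questions/40072873/why-do-we-need-locks-for-threads-if-we-have-gil/40072999#40072999)
Atomic operations in python
 Single-core and multi-core multithreading
 python gil
 The gil explained at the bytecode level
 Do threads still need locks when there is a gil?
 On python's GIL, multithreading and multiprocessing
 gil and lock
+++ The GIL only keeps the state of a single PyObject consistent. A program often has several PyObjects whose combined state and relationships must stay consistent, and that needs a dedicated lock.
+++ The GIL guarantees that a PyObject's ref count does not go wrong. It guarantees nothing at a larger scale (if several objects need to be synchronized, who but you knows how they should be synchronized?)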

~~~> A new understanding of ref counting
Why does CPython need a global lock?
Reference counting and garbage collection in Python
The article above says:

Python memory allocation is not thread safe (it should be easy to make it thread safe)
The reference counter of each object is protected by the GIL.
CPython has a lot of global C variables. Examples:
- interp is a structure which contains variables of the Python interpreter: modules, list of Python threads, builtins, etc.
- int singletons (-5…255)
- str singletons (Python 3: latin1 characters)
Some third party C libraries and even functions the C standard library are not thread safe: the GIL works around this limitation.
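~~~> My understanding: memory management is not thread safe. Without the gil, several cores could touch the same object at once, the ref count would go wrong, and memory could be freed incorrectly. (What I don't get: even with the gil, if one bytecode decrements the ref count, won't another thread that then uses the variable still fail?) That article says the ref count in python is the number of references (holders) at the bytecode level, which does not resolve my doubt. Still, as the list above says, cpython has many global variables that also need the gil so that they are not modified at random... (to be continued; I probably need to read an analysis of the python source)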

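Running a crawler with multiple threads
: IO multiplexing, coroutines

csrf: more important than the defenses is how it happens. When another site in the browser navigates to a site, is that site's cookie sent along? Can one site read another site's cookie?

Decorators vs shared helper functions (closures)
: A decorator makes it easy to do something unrelated to the decorated function before and after it, such as logging or recording start and end times. With a shared helper you would add those lines around every call, or write a new funcc alongside func.
: They are really the same thing with simpler syntax: the function name now stands for wrapper(function name), which returns a function object that is called as usual and can take arguments. If the decorator itself takes arguments, wrap one more layer; this works because a closure keeps the outer variables it references.
: ref ref2

My understanding: a decorator is syntactic sugar built on closures, which can carry references to outer variables.
It wraps the original function, which is still called by its original name; functions are objects too.
Its job is work unrelated to the business logic; the code gets shorter and the wrapper is reusable.
Typical uses are logging and timing a call.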

'''
def decrator(func):
    def fooo(number):
        def wrapper(args):
            print ('<**>',number,type(number))
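            ---> this prints number, which here is the foo function object,
            so the sugar really expands to the foo = decrator(args)(foo) form shown in the comments further down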
            for x in range(number):
                print ('befor call func  args-->',args)
                func(args)
                print ('after call func  args-->',args)
        return wrapper
    return fooo
'''

def decrator(number):
    def fooo(func):
        def wrapper(args):
            print ('<**>',number,type(number))
            for x in range(number):
                print ('befor call func  args-->',args)
                func(args)
                print ('after call func  args-->',args)
        return wrapper
    return fooo

@decrator(3)
def foo(name):
    print ('i am lilith',name)
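# foo = decrator(args)(foo)
# foo = decrator(foo)
# A decorator that takes its own arguments (arguments of decrator, not of the decorated function)
# needs one more closure layer around the ordinary decorator, i.e. the first expansion above,
# not the second. Keep this in mind.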
foo('rubik')
-------------------
import functools
def decrator(func):
        @functools.wraps(func)
        # Without the use of this decorator factory, the name of the example function would have been 'wrapper',
        # and the docstring of the original example() would have been lost.
        # In short: this keeps foo's __name__ and __doc__ from being lost
        def wrapper(args):
            print ('befor call func  args-->',args)
            func(args)
            print ('after call func  args-->',args)
        return wrapper

@decrator
def foo(num):
    """ foo's doc"""
    print ('i am lilith',num)

foo(3)
print (foo.__name__,foo.__doc__)
('befor call func  args-->', 3)
('i am lilith', 3)
('after call func  args-->', 3)
('foo', " foo's doc")

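Closures still need more study
: closure explanation
Singleton pattern with a decorator
In the function decorator, I did not understand why a new function can still keep that dict (_instance) (todo)
https://zhuanlan.zhihu.com/p/37534850 (the function implementation there)
I understood the class decorator in the ref: a class acts as a manager and receives cls. Because the manager exists, it can keep a dict keyed by the class and check whether an instance already exists: create it if not, return it if so. Overriding __call__ is all it takes.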
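A sketch of both variants, written from the description above rather than copied from the linked article. The names singleton, SingletonManager, Config and Registry are made up. The function version answers the todo: _instance is a local of singleton, and get_instance is a closure over it, so the dict stays alive as long as the returned function does.

```python
def singleton(cls):
    _instance = {}  # captured by the closure below, one dict per decorated class

    def get_instance(*args, **kwargs):
        if cls not in _instance:
            _instance[cls] = cls(*args, **kwargs)
        return _instance[cls]

    return get_instance


class SingletonManager:
    """Class-based variant: the manager object holds the dict and implements __call__."""

    def __init__(self, cls):
        self._cls = cls
        self._instance = {}

    def __call__(self, *args, **kwargs):
        if self._cls not in self._instance:
            self._instance[self._cls] = self._cls(*args, **kwargs)
        return self._instance[self._cls]


@singleton
class Config:
    def __init__(self, name="default"):
        self.name = name


@SingletonManager
class Registry:
    pass


a, b = Config("first"), Config("second")   # the second call returns the first instance
```

Neither variant is thread safe as written; two threads could both see an empty dict and create two instances.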

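deepcopy() copy()
: Slicing a list is a shallow copy; it produces a new object, like copy().
copy duplicates the outer object, but for compound objects inside it only the references are copied.
deepcopy duplicates the object and, recursively, every object inside it (and it handles recursive references during the copy; see the documentation).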
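A short sketch of the difference using the standard copy module; the variable names are arbitrary.

```python
import copy

original = [1, [2, 3]]
shallow = copy.copy(original)      # new outer list, same inner list object
deep = copy.deepcopy(original)     # new outer list and new inner list

original[1].append(4)              # mutate the nested list through the original

# A list that contains itself: deepcopy keeps track of what it has already copied.
loop = []
loop.append(loop)
loop_copy = copy.deepcopy(loop)
```

After the append, the shallow copy sees the change through the shared inner list and the deep copy does not.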

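How dict hashing achieves O(1)
I now realize the question at that earlier interview was imprecise. A dict does give O(1) lookup, insert and delete by key, provided the keys are meaningful; if the data is like the unkeyed values in a list, O(1) is hard to get. A dict key is usually part of the stored value and (most of the time) its unique identifier. The hash function maps the key to an address, and the slot stores the (key, data) pair, so on a collision the stored key can be compared to decide whether there is a real conflict to resolve (open addressing, or bucket chaining, where the slot holds a reference to a linked-list node and you follow next until the key matches). With open addressing a deleted slot may lie on the probe path of other colliding keys, so the deleted slot gets a special marker that tells a lookup to keep probing past it. Because the address is computed from the key and accessed directly, the time complexity is O(1).
For a dict, the more space it has, the more evenly the hashes spread and the fewer collisions occur, so lookups are faster: a classic space-for-time trade. When the load factor exceeds 2/3 the table grows and rehashes ahead of need, typically to 4 times the current number of elements ---> ref: the data structures and algorithms in python textbook by 裘宗燕 (Qiu Zongyan).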
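To make the tombstone idea concrete, here is a toy open-addressing table. It is a sketch under simplifying assumptions, not CPython's dict: TinyDict is a made-up name, probing is plain linear probing, and the 2/3 load factor and 4x growth are taken from the note above.

```python
_EMPTY = object()     # slot never used
_DELETED = object()   # tombstone: slot was used once, keep probing past it


class TinyDict:
    """Illustrative open-addressing hash table with linear probing."""

    def __init__(self, capacity=8):
        self._slots = [_EMPTY] * capacity
        self._used = 0  # live entries plus tombstones

    def _probe(self, key):
        n = len(self._slots)
        start = hash(key) % n
        for step in range(n):
            yield (start + step) % n

    def __setitem__(self, key, value):
        if (self._used + 1) * 3 > len(self._slots) * 2:   # load factor above 2/3
            self._resize()
        target = None
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is _EMPTY:
                if target is None:        # no tombstone seen earlier on the path
                    target = i
                    self._used += 1
                break
            if slot is _DELETED:
                if target is None:
                    target = i            # reuse the first tombstone on the path
            elif slot[0] == key:
                self._slots[i] = (key, value)
                return
        self._slots[target] = (key, value)

    def __getitem__(self, key):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is _EMPTY:
                break                     # the key cannot be further along
            if slot is not _DELETED and slot[0] == key:
                return slot[1]
        raise KeyError(key)

    def __delitem__(self, key):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is _EMPTY:
                break
            if slot is not _DELETED and slot[0] == key:
                self._slots[i] = _DELETED  # leave a marker instead of emptying the slot
                return
        raise KeyError(key)

    def _resize(self):
        live = [s for s in self._slots if s is not _EMPTY and s is not _DELETED]
        self._slots = [_EMPTY] * max(8, 4 * len(live))
        self._used = 0
        for key, value in live:
            self[key] = value


d = TinyDict()
d[0] = "a"
d[8] = "b"        # collides with key 0 in an 8-slot table, lands in the next slot
del d[0]          # slot 0 becomes a tombstone
found = d[8]      # lookup probes past the tombstone and still finds key 8
```

If the deleted slot were reset to empty instead, the lookup for key 8 would stop at slot 0 and wrongly report the key as missing.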

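Mutable and immutable objects
: Python distinguishes mutable from immutable objects. A mutable object can be changed after creation and its address stays the same, so the variable still points to the original object. An immutable object cannot be changed after creation; a change makes the variable point to a new object. ref
In Python dict and list are mutable; str, int, tuple and float are immutable.
bytes is an immutable sequence and bytearray is a mutable one. Most sequence methods work on both; slice assignment and other in-place edits work only on bytearray.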
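A quick check of these statements with id() and in-place edits; the names are arbitrary.

```python
nums = [1, 2]
list_id = id(nums)
nums.append(3)                       # mutated in place
same_list = id(nums) == list_id      # still the same object

text = "ab"
alias = text
text += "c"                          # builds a new str and rebinds the name text

buf = bytearray(b"abc")
buf[0:1] = b"X"                      # slice assignment works on bytearray

raw = b"abc"
bytes_is_immutable = False
try:
    raw[0] = 88                      # bytes does not support item assignment
except TypeError:
    bytes_is_immutable = True
```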

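set and tuple
: Immutable object: the value in the memory it points to cannot be changed. When such a variable is changed, the value is in effect copied and modified, which allocates a new address, and the variable then points to that new address.
Mutable object: the value in the memory it points to can be changed. When the variable (more precisely, the reference) is changed, the value itself changes; nothing is copied and no new address is allocated. Put simply, it changes in place.
In Python, numeric types (int and float), str and tuple are immutable; list, dict and set are mutable.

A set is an unordered collection of unique elements.
a = {x for x in 'abracadabra' if x not in 'abc'}  A set can be built with a comprehension.

A tuple is an immutable list: you cannot delete a single element, only del the whole tuple. Its items can still be mutable objects, but the tuple's own slots cannot be reassigned; such items are changed through a reference to the item itself.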
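A small sketch of the last two points; record and inner are made-up names.

```python
letters = {x for x in 'abracadabra' if x not in 'abc'}   # set comprehension

record = ("lilith", [1, 2])
inner = record[1]
inner.append(3)            # allowed: the list stored in the tuple is mutable

tuple_slot_is_fixed = False
try:
    record[0] = "rubik"    # not allowed: a tuple slot cannot be rebound
except TypeError:
    tuple_slot_is_fixed = True
```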

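Heap sort

Database table design

yield (generators)
docs ref
Changing the [ ] of a list comprehension to ( ) gives a generator expression. A generator can be advanced with next(generator) or iterated with for ... in. Execution returns at each yield and resumes on the next next().
A function that contains yield is a generator function.
An object that can be passed to next() and keeps returning the next value is an iterator: Iterator.
Collection types such as list, dict and str are Iterable but not Iterator; iter() returns an Iterator for them.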
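The statements above can be checked directly with the abstract base classes in collections.abc; the variable names are arbitrary.

```python
from collections.abc import Iterable, Iterator

squares = (n * n for n in range(3))   # parentheses instead of brackets: a generator
first = next(squares)                 # advance it by hand
rest = list(squares)                  # or let iteration consume what is left

items = [10, 20]
list_is_iterable = isinstance(items, Iterable)
list_is_iterator = isinstance(items, Iterator)

it = iter(items)                      # iter() turns an Iterable into an Iterator
head = next(it)
```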

When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of the generator function. The execution starts when one of the generator’s methods is called. At that time, the execution proceeds to the first yield expression, where it is suspended again, returning the value of expression_list to the generator’s caller. By suspended, we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, the internal evaluation stack, and the state of any exception handling. When the execution is resumed by calling one of the generator’s methods, the function can proceed exactly as if the yield expression were just another external call. The value of the yield expression after resuming depends on the method which resumed the execution. If next() is used (typically via either a for or the next() builtin) then the result is None. Otherwise, if send() is used, then the result will be the value passed in to that method.
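In short: execution starts when one of the generator's methods is called, runs to the first yield expression, is suspended there, and the value of expression_list is returned to the generator's caller.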

def func():
    a, n = 0, 0
    while 1:
        c = yield n
        print('before c:{} n:{}'.format(c, n))
        n += 1
        print('after c:{} n:{}'.format(c, n))

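At a yield, the caller receives the value of the expression after yield. When the generator is resumed with next() (or send(None)), the yield expression as a whole evaluates to None; when it is resumed with send(arg), it evaluates to arg, which is the value of c above. So send passes a value in, and the expression after yield, n here, is what goes back to the caller. The very first call has to be next() or send(None). This is probably most useful in coroutines.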
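A compact sketch of that two-way traffic. The generator counter below is a made-up variation on func above, without the prints, so the values are easy to follow.

```python
def counter():
    n = 0
    while True:
        received = yield n          # n goes out; received is what send() passed in
        n += 1 if received is None else received


gen = counter()
a = next(gen)       # runs to the first yield and hands back 0
b = next(gen)       # inside, the yield expression is None, so n becomes 1
c = gen.send(10)    # inside, the yield expression is 10, so n becomes 11
```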

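Static methods and class methods
: A static method uses nothing from cls or the instance, but it is closely tied to the class and is not meant to be exposed elsewhere. If many classes need the function, it belongs in a utils/tools module instead.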
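A sketch with made-up names (Temperature, LabReading) showing both kinds side by side. The class method is included for contrast, since the note above only describes the static one.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def _f_to_c(fahrenheit):
        # Needs neither self nor cls; it lives here because only this class uses it.
        return (fahrenheit - 32) * 5 / 9

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # Receives the class itself, so a subclass gets an instance of the subclass.
        return cls(cls._f_to_c(fahrenheit))


class LabReading(Temperature):
    pass


reading = LabReading.from_fahrenheit(212)
```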
Reversing a string
: slicing, a stack (append/pop), list.reverse, and so on
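The approaches listed above, side by side; all four give the same result.

```python
s = "rubik"

by_slice = s[::-1]                      # slicing with a negative step

stack = list(s)                         # use a list as a stack
popped = []
while stack:
    popped.append(stack.pop())          # pop takes from the end
by_stack = "".join(popped)

chars = list(s)
chars.reverse()                         # list.reverse() works in place
by_reverse = "".join(chars)

by_reversed = "".join(reversed(s))      # the reversed() builtin
```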

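WSGI
: The protocol between server and framework, described in PEP 333 and PEP 3333. Topics: implementing wsgi myself, and how tornado and django use it.
~~ My understanding: an http request arrives at the server. The server prepares the environment, adding the cgi and wsgi entries; this is a dict. The server then calls (creates) the app, which is the app/framework-side object defined by wsgi; it takes two arguments and returns an iterable whose items are the http response body. As the appcls below shows, before any body is returned the app must call start_response with the status code and the response headers. The server thus gets the response line and headers (it is the server that calls the app and passes environ and start_response in as arguments), then gets the returned body, assembles them, and hands the result to the layer above, nginx or whatever sits in front of the client.
~> The message arrives over a socket, which is the tcp layer. The web server handles the raw bytes, splits them, and writes the parsed values into environ. It then calls the app, and finally sends the return value back over the socket. (Without a socket here, the server would call an nginx interface and pass the str to nginx to handle all of this.)
Where the buffers live, in the kernel or in some pipe, I don't understand yet; to be filled in later.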

HELLO_WORLD = b"Hello world!\n"

class AppClass:
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield HELLO_WORLD

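The gateway or server calls this cls/func, passing in environ and start_response.
The gateway/server runs its own function and receives the app as an argument. The client's http request data comes in through the environ dict (a CGI gateway builds it from os.environ). The app is called once per request, and what it returns is an iterable stream.
If the iterable returned by the application has a close() method, the server/gateway must call it before finishing the request, whether the request completed normally or ended with an error. (This lets the application release resources. This protocol is intended to complement PEP 325’s generator support, and other common iterables with close() methods.)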

(Note: the application must invoke the start_response() callable before the iterable yields its first body bytestring, so that the server can send the headers before any body content. However, this invocation may be performed by the iterable’s first iteration, so servers must not assume that start_response() has been called before they begin iterating over the iterable.)
While __iter__ hands back the body bytes piece by piece, start_response must be called first, i.e. self.start(status, response_headers) above.
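To see the call order from the server's side, here is a sketch that plays the server for a single request without opening a socket. run_once and captured are made-up names; setup_testing_defaults comes from the standard wsgiref.util module and fills environ with plausible values. The app class is repeated so the snippet runs on its own.

```python
from wsgiref.util import setup_testing_defaults

HELLO_WORLD = b"Hello world!\n"


class AppClass:
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        self.start('200 OK', [('Content-type', 'text/plain')])
        yield HELLO_WORLD


def run_once(app):
    """Act as the server for one request; return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)       # REQUEST_METHOD, PATH_INFO, wsgi.* keys
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        # A real server stores these and sends them with the first body bytes.
        captured["status"] = status
        captured["headers"] = response_headers
        return lambda data: None          # the legacy write() callable, unused here

    result = app(environ, start_response)
    try:
        body = b"".join(result)           # start_response fires on the first iteration
    finally:
        if hasattr(result, "close"):      # required by the spec when close() exists
            result.close()
    return captured["status"], captured["headers"], body


status, headers, body = run_once(AppClass)
```

Note that start_response has not been called yet when app(environ, start_response) returns; it only runs once iteration begins, which is the case the PEP note above warns servers about.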

app func
First parameter: the environ dict holds the CGI environment variables defined in the Common Gateway Interface specification, plus some wsgi-specific variables.
Second parameter:

The second parameter passed to the application object is a callable of the form start_response(status, response_headers, exc_info=None). (As with all WSGI callables, the arguments must be supplied positionally, not by keyword.) The start_response callable is used to begin the HTTP response, and it must return a write(body_data) callable

Servers should check for errors in the headers at the time start_response is called, so that an error can be raised while the application is still running.
That is, raise the error before any body is returned.

However, the start_response callable must not actually transmit the response headers. Instead, it must store them for the server or gateway to transmit only after the first iteration of the application return value that yields a non-empty bytestring, or upon the application’s first invocation of the write() callable. In other words, response headers must not be sent until there is actual body data available, or until the application’s returned iterable is exhausted. (The only possible exception to this rule is if the response headers explicitly include a Content-Length of zero.)

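start_response does not transmit the response headers itself; it stores them for the server or gateway, which sends them only once the first iteration yields a non-empty bytestring. In other words, the response headers must not be sent until actual body data is available, or until the iterable returned by the application is exhausted.
Delaying the headers this way lets buffered and asynchronous applications replace the output they originally intended with error output, up to the last possible moment. For example, the application may change the status from '200 OK' to '500 Internal Error' if an error occurs while the body is still being generated in the application's buffer.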

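Differences between python 2 and 3
print changed from a statement to a function that takes its values as arguments.
1/2 (two integers) used to give 0; it now gives 0.5, true division. // is floor division.
In 3, range() returns a lazy range object (iterable, though not strictly an iterator) and does not build the whole list in memory.
There are only bytes and unicode str now, which removes most of the implicit-encoding problems.
In 2, True and False are global names that can be rebound to something else (True = False, for example); in 3 they are keywords and cannot be reassigned.
nonlocal makes it possible to rebind a variable of an enclosing function scope.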

def func():
    c = 1
    def foo():
        nonlocal c
        c = 12
    foo()
    print(c)
func()   # 12

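Synchronous vs asynchronous, blocking vs non-blocking
Reference link. These terms are layered: there is the level of communication between processes and threads, the system level, and above that the level the user perceives.
My understanding: synchronous is like linear execution that waits for the result. At an io statement, execution waits for the io to return and is blocked until it does. Asynchronous (this is the wiki explanation):

Asynchrony, in computer programming, refers to the occurrence of events independent of the main program flow and ways to deal with such events. These may be "outside" events such as the arrival of signals, or actions instigated by a program that take place concurrently with program execution, without the program blocking to wait for results. [1] Asynchronous input/output is an example of the latter cause of asynchrony, and lets programs issue commands to storage or network devices that service these requests while the processor continues executing the program. Doing so provides a degree of parallelism.

That is, other threads (or, in the project I worked on, other processes) are invoked without waiting for the result --> the handling (func) is independent of the main program, as the quote above says; it is external.
Polling here should mean non-blocking: checking repeatedly. (Unsure.)
Blocking suspends the process, i.e. it waits; non-blocking either polls or just carries on.
~> My understanding 2.0: synchronous is the same as before: sequential execution, and when an io step is reached it must wait for the result. Asynchronous means not executing in sequence: after the external program/thread finishes, it notifies the main program through a callback, a signal, or a similar mechanism.
Blocking means the thread is suspended, or waiting. While a process waits on the kernel, is it suspended or just waiting? To be continued...??
update:
Reference link
A single thread blocks because it runs serially, but multiple threads are not limited this way, so io work can be moved to another thread: just send a signal and let the other thread do the job.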
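A sketch of the last point using the standard concurrent.futures module. slow_io is a made-up stand-in for a blocking call, with time.sleep playing the part of the io wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_io(name):
    time.sleep(0.1)                  # stand-in for a blocking read or network call
    return "data from " + name


results = []

with ThreadPoolExecutor(max_workers=2) as pool:
    # Synchronous: the caller waits here until the result is back.
    sync_result = slow_io("sync")

    # Asynchronous: submit() returns at once; the callback fires when the work is done.
    future = pool.submit(slow_io, "async")
    future.add_done_callback(lambda f: results.append(f.result()))
    main_continued = True            # the main thread keeps going in the meantime
# Leaving the with block waits for the worker thread to finish.
```

The callback runs in the worker thread, so anything it touches that the main thread also uses needs the same care as any other shared state.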

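How search works

redis, memcached
redis is a key-value db with persistence: data can be saved to local disk and read back when the service starts
tips: a str value does not need double quotes, but a number that should be treated as a str does
Learn redis operations in one article (notes)
Python: operating redis
Introduction to Redis data types
Python and redis interaction
What problems do Redis's features solve?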

> set mykey somevalue
OK
> get mykey
"somevalue"
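## set key value stores a string. If the value is an integer, incr works on it: INCR parses the string as an integer, and if that fails the command returns an error
## Related commands: INCRBY, DECR and DECRBY
## INCR is atomic
## GETSET sets a new value for the key and returns the old one.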
127.0.0.1:6379> lrange llist
(error) ERR wrong number of arguments for 'lrange' command
127.0.0.1:6379> lrange llist 0 10
1) "mongodb"
2) "reids"
127.0.0.1:6379> rpush llist cjsb
(integer) 3
## the list type: l/r push, then l/r pop
127.0.0.1:6379> lrange llist 0 10
1) "mongodb"
2) "reids"
3) "cjsb"
127.0.0.1:6379> set lilith 222222
OK
127.0.0.1:6379> incr test
(integer) 2334
127.0.0.1:6379> set test fff
OK
127.0.0.1:6379> get test
"fff"
127.0.0.1:6379> incr test
(error) ERR value is not an integer or out of range
127.0.0.1:6379> set test 233
OK
127.0.0.1:6379> incrby test 23333
(integer) 23566
127.0.0.1:6379> getset test 555
"23566"
===========
## To cut round trips, several keys can be set or fetched at once with MSET and MGET:
## MGET returns an array of values.
> mset a 10 b 20 c 30
OK
> mget a b c
1) "10"
2) "20"
3) "30"
===========
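## Some commands are not tied to a particular type; they work on the key space as a whole, so they apply to keys of any type.
## EXISTS returns 1 or 0 to say whether the key exists. DEL deletes the key and its value and returns 1 if something was deleted (the key existed) or 0 if not (the key did not exist).
## In short: 1 means it existed, 0 means it did not; del deletes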
> set mykey hello
OK
> exists mykey
(integer) 1
> del mykey
(integer) 1
> exists mykey
(integer) 0
============
## type returns the type of the value stored at key, or none if the key does not exist
> set mykey x
OK
> type mykey
string
> del mykey
(integer) 1
> type mykey
none
============
##  Set an expiry, in seconds by default; the key is deleted when it expires
> expire key 5
(integer) 1
============
## ttl (time to live) reports the remaining time; persist clears the expiry
> set key 100 ex 10
OK
> ttl key
(integer) 9
============
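##  list is implemented as a linked list here, because a db more often needs to add or remove elements at either end of a large data set, which is O(1).
##  l/r push / pop; pop on an empty list returns nil (null)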
> rpush mylist A
(integer) 1
> rpush mylist B
(integer) 2
> lpush mylist first
(integer) 3
> lrange mylist 0 -1
1) "first"
2) "A"
3) "B"
##  lrange key  start_index end_index
##  l(ist)range includes the element at end_index: lrange list 0 10 covers 11 elements, unlike python's range.
##  An index of -1 is the last element, -2 the second to last.
##  push can add several values at once
> rpush mylist 1 2 3 4 5 "foo bar"
(integer) 9
##################
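Common use cases for List
As you can guess from the examples above, a list can be used to build a chat system. It can also serve as a queue for passing messages between processes. The key point is that you always read the data in the order it was added. This needs no SQL ORDER BY, is very fast, and scales easily to millions of elements.
For example, in a rating system such as the social news site reddit.com, you can add every newly submitted link to a list and paginate the results simply with LRANGE.
In a blog engine, you can keep one list per post and push the post's comments onto it, and so on.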
##################
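## LTRIM trims a list to the given range, counted from the left.
## Everything outside the range is cut off.
## Might be useful for dropping old pages of blog comments.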
> rpush mylist 1 2 3 4 5
(integer) 5
> ltrim mylist 0 2
OK
> lrange mylist 0 -1
1) "1"
2) "2"
3) "3"
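To pin down how the LRANGE indexes differ from python slicing, here is a pure-python imitation. lrange below is a made-up helper working on an ordinary list; it is not a redis client call and only mirrors the indexing rules described in the comments above.

```python
def lrange(items, start, stop):
    """Imitate LRANGE indexing on a python list: both ends inclusive,
    negative indexes count from the tail."""
    n = len(items)
    if start < 0:
        start = max(n + start, 0)
    if stop < 0:
        stop = n + stop
        if stop < 0:
            return []
    return items[start:stop + 1]       # +1 because the end index is included


mylist = ["first", "A", "B"]
everything = lrange(mylist, 0, -1)             # -1 is the last element
first_eleven = lrange(list(range(20)), 0, 10)  # 11 elements, not 10
trimmed = lrange([1, 2, 3, 4, 5], 0, 2)        # what ltrim mylist 0 2 keeps
```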

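For multithreading in python the high-level threading library is the usual choice. One way is the constructor, passing the arguments straight in; the other is subclassing and overriding the run func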

import time, threading

# Code run by the new thread:
def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print('thread %s ended.' % threading.current_thread().name)

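This is the very traditional way of creating a thread through the constructor; target is a callable, the func the thread will run
The other way is subclassing and overriding run():

Link: https://zhuanlan.zhihu.com/p/34150765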

import random
import threading
import time

class Producer(threading.Thread):

    # Generate random integers and append them to the shared list
    def __init__(self, lock, integer_list):
        threading.Thread.__init__(self)
        self.lock = lock
        self.integer_list = integer_list

    def run(self):
        while True: # keep trying to take the lock and add an integer
            random_integer = random.randint(0, 100)
            with self.lock:
                self.integer_list.append(random_integer)
                print('integer list add integer {}'.format(random_integer))
            time.sleep(1.2 * random.random()) # sleep a random time; the factor 1.2 slows production down

class Consumer(threading.Thread):

    def __init__(self, lock, integer_list):
        threading.Thread.__init__(self)
        self.lock = lock
        self.integer_list = integer_list

    def run(self):
        while True: # keep trying to consume integers
            with self.lock:
                if self.integer_list: # pop only when the list has elements
                    integer = self.integer_list.pop()
                    print('integer list lose integer {}'.format(integer))
                    time.sleep(random.random())
                else:
                    print('there is no integer in the list')

def main():
    integer_list = []
    lock = threading.Lock()
    th1 = Producer(lock, integer_list)
    th2 = Consumer(lock, integer_list)
    th1.start()
    th2.start()

if __name__ == '__main__':
    main()

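I know threads can be created in these two ways, but I wanted to know why, that is, how to trace the knowledge back to its source. I found the threading doc, and through the doc the .py file, which states it explicitly. Reading it, I hit one small question about subclassing:
the inheritance behind threading.Thread.__init__(self). To be continued.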

class Thread:
    """A class that represents a thread of control.
    This class can be safely subclassed in a limited fashion. There are two ways
    to specify the activity: by passing a callable object to the constructor, or
    by overriding the run() method in a subclass."""
    
    pass

This docstring in threading.py names the two ways to specify a thread's activity: subclass and override run(), or pass in a callable target func.

def run(self):
        """Method representing the thread's activity.
        You may override this method in a subclass. The standard run() method
        invokes the callable object passed to the object's constructor as the
        target argument, if any, with sequential and keyword arguments taken
        from the args and kwargs arguments, respectively.
        """
        try:
            if self._target:
                self._target(*self._args, **self._kwargs)
        finally:
            # Avoid a refcycle if the thread is running a function with
            # an argument that has a member that points to the thread.
            del self._target, self._args, self._kwargs

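That makes it clear: if run() is not overridden, it simply calls the target func. start() goes through _bootstrap --> _bootstrap_inner, which calls run().
threading.py is not a big file; it is worth reading carefully when there is time.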

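with mutex:   # with acquires the lock on entry and releases it on exit
    for i in range(1000000): # while the lock is held, other threads cannot do this work
        num+=1
        
# The block above is equivalent to the one below
if mutex.acquire(1): # waits until the lock is acquired; 1 means a blocking acquire
    try:
        for i in range(1000000): # while the lock is held, other threads cannot do this work
            num+=1
    finally:
        mutex.release() # with does this step automatically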

This is where the with lock form above comes from
All of the objects provided by this module that have acquire() and release() methods can be used as context managers for a with statement. The acquire() method will be called when the block is entered, and release() will be called when the block is exited.
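I found a more central explanation in the 3.7 documentation
The lock cls comes from _thread, used by threading.py (locktype in the pyi). The nature of lock will have to wait until I have time to study the python source (++++++ to be continued).
+++++++: My understanding after reading the condition part of the code: the waiting in wait() is itself built on a blocking lock object. wait() first creates a new lock and acquires it, appends it to a double-ended queue (deque) named waiters, releases the condition's own lock, and then acquires the new lock a second time, which blocks. wait() stops right there until some other place calls notify and releases this lock from waiters; then execution continues on this side.
Remember that wait() and notify() both require the lock to be acquired first. Keyword: _is_owned()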

finally:
    self._acquire_restore(saved_state)
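    # The thread was blocked before this point. Once woken it must take the condition's lock again,
    # hence this acquire. It is an ordinary blocking acquire, so the thread may wait a little longer
    # until the notifier releases the lock. +++ to be continued
    # If the wait timed out (gotit is False), the waiter lock is still in waiters and no longer
    # needed, so it is removed and left to the gc.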
    if not gotit:
        try:
            self._waiters.remove(waiter)
        except ValueError:
            pass

notify walks the waiters and releases their locks one by one, so the blocked wait() calls can proceed and each thread continues with the code that follows.
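The mechanism described above can be sketched in a few lines. MiniCondition is a made-up, heavily simplified class (no timeout, no RLock support, no ownership checks), not the code in threading.py; it only shows the trick of blocking on a second acquire of a private lock.

```python
import threading
from collections import deque


class MiniCondition:
    """Simplified sketch of wait/notify built on plain locks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = deque()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, *exc):
        self._lock.release()

    def wait(self):
        waiter = threading.Lock()
        waiter.acquire()                 # first acquire succeeds at once
        self._waiters.append(waiter)
        self._lock.release()             # let other threads in while we sleep
        try:
            waiter.acquire()             # second acquire blocks until notify() releases it
        finally:
            self._lock.acquire()         # take the main lock back before returning

    def notify(self, n=1):
        for _ in range(min(n, len(self._waiters))):
            self._waiters.popleft().release()


cond = MiniCondition()
box = []
seen = []


def consumer():
    with cond:
        while not box:                   # re-check the condition after every wake-up
            cond.wait()
        seen.append(box.pop())


t = threading.Thread(target=consumer)
t.start()
with cond:
    box.append(42)
    cond.notify()
t.join(timeout=5)
```

The example works in either start order: if the producer side runs first, the consumer finds the item and never waits.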

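The following comes from here

Think of Condition as an advanced lock. It offers more than Lock and RLock and lets us handle complex thread synchronization. threading.Condition keeps a lock object internally (an RLock by default); a lock object can also be passed in when the Condition is created. Condition provides acquire and release with the same meaning as the lock's own methods; they simply call the corresponding methods of the internal lock. Condition also provides wait, notify and notifyAll (spelled notify_all in current Python). Note in particular: these can be called only while the lock is held (after acquire); otherwise a RuntimeError is raised:

acquire()/release(): acquire/release the Lock

wait([timeout]): suspends the thread until it receives a notify or the timeout expires (optional, a float, in seconds); then it wakes and continues. wait() may be called only while holding the Lock, otherwise a RuntimeError is raised. Calling wait() releases the Lock, and the thread re-acquires it once it has been woken by notify(), notifyAll() or the timeout.

notify(n=1): notifies other threads; the suspended threads that receive the notification start running again. By default one thread waiting on the condition is notified, and at most n waiting threads are woken. notify() may be called only while holding the Lock, otherwise a RuntimeError is raised. notify() does not release the Lock by itself.

notifyAll(): when many threads are waiting, notifyAll notifies all of them (this is used less often)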

class Producer(threading.Thread):
    def __init__(self, condition, integer_list):
        threading.Thread.__init__(self)
        self.condition = condition
        self.integer_list = integer_list
    def run(self):
        while True:
            random_integer = random.randint(0, 100)
            with self.condition:
                self.integer_list.append(random_integer)
                print('integer list add integer {}'.format(random_integer))
                self.condition.notify()
            time.sleep(1.2 * random.random())

class Consumer(threading.Thread):
    def __init__(self, condition, integer_list):
        threading.Thread.__init__(self)
        self.condition = condition
        self.integer_list = integer_list
    def run(self):
        while True:
            with self.condition:
                if self.integer_list:
                    integer = self.integer_list.pop()
                    print('integer list lose integer {}'.format(integer))
                    time.sleep(random.random())
                else:
                    print('there is no integer in the list')
                    self.condition.wait()

def main():
    integer_list = []
    condition = threading.Condition()
    th1 = Producer(condition, integer_list)
    th2 = Consumer(condition, integer_list)
    th1.start()
    th2.start()

if __name__ == '__main__':
    main()

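This is the new code, now with a condition. The weakness of the earlier code is easy to see: producer and consumer are threads, and while True keeps polling, which wastes cpu, all the more because real work is done here and not just logging.
todo; https://www.cnblogs.com/chengd/articles/7770898.html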

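Another important problem here is deadlock. There are two cases:

Locking a lock that is already held (a non-RLock)
Two locks waiting on each other

Real situations can be more complex; these are only examples.
~: The code below is the second case. The first is simple: acquire, then acquire again.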
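The first case can be observed safely by giving the second acquire a timeout, so the demonstration returns instead of hanging. The variable names are arbitrary.

```python
import threading

plain = threading.Lock()
plain.acquire()
# A second blocking acquire() here would hang forever: that is the deadlock.
# With a timeout the call gives up and reports failure instead.
second_try = plain.acquire(timeout=0.1)
plain.release()

reentrant = threading.RLock()
reentrant.acquire()
again = reentrant.acquire(timeout=0.1)   # the owning thread may re-enter an RLock
reentrant.release()
reentrant.release()                      # one release per acquire
```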

lock1 = threading.Lock()
lock2 = threading.Lock()
class Producer(threading.Thread):
    def __init__(self, condition, integer_list):
        threading.Thread.__init__(self)
        self.condition = condition
        self.integer_list = integer_list

    def run(self):
        lock2.acquire()
        time.sleep(1)
        print ('thread2  get lock 2')
        lock1.acquire()
        print ('thread2 get lock1')
        lock1.release()
        lock2.release()

class Consumer(threading.Thread):
    def __init__(self, condition, integer_list):
        threading.Thread.__init__(self)
        self.condition = condition
        self.integer_list = integer_list

    def run(self):
        # while True:
        lock1.acquire()
        time.sleep(1)
        print ('thread1 get lock1')
        lock2.acquire()
        print ('threading get lock2')
        lock2.release()
        lock1.release()
def main():
    integer_list = []
    condition = threading.Condition()
    th1 = Consumer(condition, integer_list)
    th2 = Producer(condition, integer_list)
    # th2 = Consumer(condition, integer_list)
    th1.start()
    th2.start()

if __name__ == "__main__":
    main()

join(), a method of thread in threading

"""Wait until the thread terminates.
This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception or until the optional timeout occurs.
That is clear: it blocks the calling thread (the current one, usually the main thread, which started the other thread and now cannot continue) until the thread whose join() method was called terminates. The relative clause in the English sentence modifies that second thread.
When the timeout argument is present and not None, it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). As join() always returns None, you must call isAlive() after join() to decide whether a timeout happened – if the thread is still alive, the join() call timed out.

When the timeout argument is not present or None, the operation will block until the thread terminates. A thread can be join()ed many times.

join() raises a RuntimeError if an attempt is made to join the current thread as that would cause a deadlock. It is also an error to join() a thread before it has been started and attempts to do so raises the same exception.
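A sketch of the timeout behaviour described in the docstring. The quoted text says isAlive(); in current Python the method is spelled is_alive(), which is what this example uses. time.sleep stands in for real work.

```python
import threading
import time

worker = threading.Thread(target=time.sleep, args=(0.5,))
worker.start()

worker.join(timeout=0.05)        # returns None whether or not the thread finished
timed_out = worker.is_alive()    # still alive means the join timed out

worker.join()                    # no timeout: block until the thread terminates
done = not worker.is_alive()

unstarted = threading.Thread(target=time.sleep, args=(0,))
join_before_start_fails = False
try:
    unstarted.join()             # joining a thread that was never started
except RuntimeError:
    join_before_start_fails = True
```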

Also related to threads: coroutines, gevent, greenlet, gunicorn and so on, plus a few refs

assert expression
Continuing from threading above: I wanted to print threading.thread.name() and got an error saying str obj is not callable. The source shows why:

@property
def name(self):
    """A string used for identification purposes only.
    It has no semantics. Multiple threads may be given the same name. The
    initial name is set by the constructor.
    """
    assert self._initialized, "Thread.__init__() not called"
    return self._name

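It uses a decorator. A quick search shows this one is very common: property is a class, and after decorating this way the name is read directly as .name. The thread cls also implements @name.setter, so the attribute is readable and writable; adding a deleter would make it deletable as well. The main use is to put limits on arbitrary outside assignment to a class attribute, through @name.setter.
Without @name.setter it is a read-only attribute; @property handles that internally.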
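A sketch of the same pattern on a made-up class, Worker, with one validated read-write property and one read-only property. It also reproduces the error from above: calling .name() fails because .name is already the str.

```python
class Worker:
    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # The setter is the place to restrict what outside code may assign.
        if not isinstance(value, str):
            raise TypeError("name must be a str")
        self._name = value

    @property
    def ident(self):              # no setter defined, so this one is read-only
        return id(self)


w = Worker("loop-1")
w.name = "loop-2"                 # goes through the setter

errors = []
for action in (lambda: setattr(w, "name", 3),      # rejected by the setter
               lambda: setattr(w, "ident", 0),     # read-only property
               lambda: w.name()):                  # a str is not callable
    try:
        action()
    except (TypeError, AttributeError) as exc:
        errors.append(type(exc).__name__)
```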

The code above uses the keyword assert. Its usage, as defined here:
assert_stmt ::= "assert" expression ["," expression]  This is equivalent to:

if __debug__:
    if not expression1: raise AssertionError(expression2)

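So it is clear: threading.Thread is often subclassed to override the run func, so Thread.__init__() may never have been called. The assert checks for that and, if so, raises with the str that follows.
~~> The grammar notation in the docs is called Backus–Naur form; look it up for details.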
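The same guard on a made-up pair of classes, Base and Forgetful, so it can be run without threads. The flag defaults to False on the class and only __init__ sets it, in the spirit of the check in the name property above. Note that assert statements are skipped entirely when python runs with -O, because __debug__ is then False.

```python
class Base:
    _initialized = False                 # class-level default; __init__ flips it

    def __init__(self):
        self._initialized = True

    @property
    def label(self):
        assert self._initialized, "Base.__init__() not called"
        return "ok"


class Forgetful(Base):
    def __init__(self):
        pass                             # forgot to call super().__init__()


message = None
try:
    Forgetful().label
except AssertionError as exc:
    message = str(exc)                   # the second expression of the assert

fine = Base().label
```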

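poll select epoll
fd, file descriptor:

A file descriptor is formally a non-negative integer. It is an index into the table the kernel keeps, per process, of the files that process has open. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process. Low-level programming often revolves around file descriptors. The concept mostly applies to operating systems such as UNIX and Linux.

So it is an index that locates the file, and operations on the file go through it. ---> the unix idea: everything is a file.
What gets passed to select is usually network io, i.e. sockets. See these two articles: Introduction to the Select model
 A concise guide to network I/O models. Reading them, especially the select one, I wondered why accept is called afterwards: isn't a socket usable only after the tcp three-way handshake? My first guess was that the fd select returns is already usable when the connection request (the first handshake) arrives. That was wrong. After listen, an incoming connect goes straight through the three-way handshake, so the connection is ready, and accept just takes the first one from the queue of pending connections. As shown below, a direct socket.accept() also waits for an incoming connection until one arrives: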

import socket  
# This is the server; it waits in accept, i.e. on network io.
iport=('127.0.0.1',9999)
s = socket.socket()
s.bind(iport)
s.listen(5)

while True:
    print ('server wating')
    conn,addr = s.accept()
    client_data = conn.recv(1024)
    print ('client_data',client_data)
    conn.sendall(b'this is a res')
    conn.close()

import socket
# client
iport=('127.0.0.1',9999)
s = socket.socket()
s.connect(iport)

s.sendall(b'woshi lilith')
data=s.recv(1024)
print ('datatata',data)

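Python socket ~~> the code above is based on this
This is the simplest tcp exchange. By the time accept returns, the three-way handshake is complete. References: Does tcp connect block? At which stage of the three-way handshake does TCP accept happen? Order of the tcp three-way handshake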

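select, poll and epoll in Linux are all IO multiplexing mechanisms.
I/O multiplexing: a mechanism for watching several descriptors at once; when one becomes ready (usually readable or writable), the program is told so that it can read or write.

fd_set* readfds points to an fd_set structure holding the file descriptors to watch for reads, that is, whether data can now be read from them. If any descriptor in the set is readable, select returns a value greater than 0. If none is, the timeout parameter decides: once timeout has elapsed select returns 0, and on error it returns a negative value. NULL may be passed to ignore reads altogether.

---- So the C-level select and poll return only a number telling the process whether anything is ready, and the ready descriptors are found by inspecting the sets afterwards. python's select, which calls the underlying select, returns the ready fds directly.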
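A sketch of what python's select.select returns, using a connected socket pair from socket.socketpair() so that no server is needed. The names a and b are arbitrary.

```python
import select
import socket

a, b = socket.socketpair()

# Nothing has been sent yet, so with a zero timeout the readable list is empty.
readable_before, _, _ = select.select([a], [], [], 0)

b.sendall(b"ping")

# Now a has data waiting; select hands back the ready objects themselves.
readable_after, writable_after, _ = select.select([a], [a], [], 1.0)
data = a.recv(1024)

a.close()
b.close()
```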

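epoll, which first appeared in Linux 2.5.44, is the Linux kernel's scalable I/O event notification mechanism [1]. It was designed to replace the POSIX select(2) and poll(2) system calls so that programs handling many file descriptors perform better (for example, the older calls take O(n) time, while epoll operates in O(log n))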

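select
select first appeared in 4.2BSD in 1983. A select() system call watches an array of file descriptors; when it returns, the kernel has flagged the ready descriptors in that array so the process can pick them out and read or write.
select is supported on almost every platform; good portability is one of its advantages and, seen from today, one of the few it has left.

One drawback of select is the cap on how many file descriptors a single process can watch, usually 1024 on Linux, though it can be raised by changing the macro definition or even recompiling the kernel.

In addition, the data structure select() uses to hold the descriptors is copied on every call, and that cost grows linearly with the number of descriptors. And because network latency leaves many TCP connections idle, the linear scan select() makes over every socket wastes further work.

poll
poll was born in System V Release 3 in 1986. It is essentially the same as select, but without the limit on the number of file descriptors.

poll shares a drawback with select: the whole array of descriptors is copied between user space and the kernel's address space whether or not the descriptors are ready, so the cost grows linearly with their number.
Also, after select() or poll() reports ready descriptors to the process, if the process does no IO on them, the next call reports them again, so readiness notifications are generally not lost. This is called level triggered (Level Triggered).

epoll
Only with Linux 2.6 did a mechanism supported directly by the kernel appear: epoll. It has almost all the advantages mentioned so far and is widely regarded as the best-performing multiplexed I/O readiness notification method on Linux 2.6.

epoll supports both level triggered and edge triggered modes (Edge Triggered: the process is told only which descriptors have just become ready, and only once; if we take no action, it is not told again). In theory edge triggering performs better, but the code is considerably more complex.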

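epoll likewise reports only the ready descriptors. epoll_wait() returns a count of ready descriptors rather than the descriptors themselves; you then read that many entries from the array you handed to epoll. The source article credits memory mapping (mmap) for removing the copy of the descriptors on each system call; as far as I can tell mainline epoll does not use mmap, and the saving comes from registering descriptors once and copying back only the ready events.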

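Another fundamental improvement is that epoll uses event-based readiness notification. With select/poll the kernel scans all watched descriptors only when the process makes the call. With epoll a descriptor is registered in advance through epoll_ctl(); once it becomes ready, the kernel uses a callback-like mechanism to mark it promptly, and the process is notified when it calls epoll_wait().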
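The register-then-wait pattern can be tried from python through the standard selectors module, whose DefaultSelector picks epoll on Linux and falls back to kqueue, poll or select elsewhere. This is a sketch of the calling pattern, not of the kernel mechanism; on_readable and received are made-up names.

```python
import selectors
import socket

sel = selectors.DefaultSelector()        # epoll on Linux when available
a, b = socket.socketpair()
received = []


def on_readable(sock):
    received.append(sock.recv(1024))


# Register once, up front (the role epoll_ctl plays), attaching a callback as data.
sel.register(a, selectors.EVENT_READ, data=on_readable)

b.sendall(b"hello")

# Wait for readiness (the role epoll_wait plays); only ready entries come back.
for key, events in sel.select(timeout=1.0):
    key.data(key.fileobj)

sel.unregister(a)
sel.close()
a.close()
b.close()
```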

https://www.cnblogs.com/fanweibin/p/5081912.html

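I used to think a port was a pid. It is actually an address: the ip only locates the machine, and to reach a specific process you need the port. The wiki does not spell out much more precisely what a port is.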

In computer networking, a port is an endpoint of communication. At the software level, within an operating system, a port is a logical construct that identifies a specific process or a type of network service.

The software port is always associated with an IP address of a host and the protocol type of the communication. It completes the destination or origination network address of a message. Ports are identified for each protocol and address combination by 16-bit unsigned numbers, commonly known as the port number.

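So a port is an endpoint of communication, a logical construct that identifies a specific process or network service. The os keeps something like a table: when data reaches the tcp layer, the port is looked up in the table and the data is copied from the kernel to the matching socket (in the process).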
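A loopback sketch that makes the (ip, port) pairs visible. Binding to port 0 asks the os to choose a free port; the variable names are arbitrary. It also shows the point from the accept discussion earlier: connect completes before accept is called.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))            # port 0: let the os pick a free port
server.listen(1)
host, port = server.getsockname()        # the address clients must connect to

# The handshake completes here, before accept() has been called.
client = socket.create_connection((host, port))

conn, peer = server.accept()             # takes the finished connection off the queue
# peer is the client's own (ip, port); the os gave the client an ephemeral port.
same_endpoint = peer == client.getsockname()

for s in (conn, client, server):
    s.close()
```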
??? To be continued; my definition of a port is still not precise.

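Algorithms: permutations and combinations; git blame; mqtt message server and server; what __get__ and __set__ do. Database foreign keys: department -> id, departmentname \ employee -> id, name, age; select the department names for those older than 50. unicode encodings utf8 utf16

Distributed systems and clusters. A cluster is several nodes (servers) doing the same job, with a controller deciding which node gets each request. For example the backend cache, in memory, can be a redis cluster, so that when one redis dies the others are still available and requests do not fall through to the server.
A distributed system has different servers doing different jobs and talking to each other via rpc and the like. This separates stable business logic from fast-changing logic, and new services from old ones, especially where the logic is independent, and it makes it easier for different teams to maintain their own parts. The servers cooperate; microservices are usually deployed this way.
How to implement a simple RPC; What is the difference between distributed and cluster? These two are the refs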