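Implementing an HTTP request with non-blocking IO
This example does not make the request any faster: it merely replaces blocking IO with a while True loop that polls the socket state. Because that loop burns CPU while it waits, it is worse than simply blocking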
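import socket
from urllib.parse import urlparse


def get_url(url):
    # Split the URL into host and path
    url = urlparse(url)
    host = url.netloc
    path = url.path
    if path == "":
        path = "/"

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # setblocking is a method of the socket object, not of the socket module
    client.setblocking(False)
    try:
        client.connect((host, 80))
    except BlockingIOError:
        # Expected: a non-blocking connect returns before the handshake finishes
        pass

    # Keep retrying the send until the connection is established
    request = "GET {} HTTP/1.1\r\nHost:{}\r\nConnection:close\r\n\r\n".format(path, host)
    while True:
        try:
            client.send(request.encode("utf8"))
            break
        except OSError:
            pass

    # Keep polling recv until the server closes the connection
    data = b""
    while True:
        try:
            d = client.recv(1024)
        except BlockingIOError:
            continue
        if d:
            data += d
        else:
            break

    data = data.decode("utf8")
    # Everything after the first blank line is the response body
    html_data = data.split("\r\n\r\n", 1)[1]
    print(html_data)
    client.close()


if __name__ == "__main__":
    get_url("http://www.baidu.com")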
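To see why the polling version is no improvement, here is a minimal sketch that counts how often a non-blocking recv comes back empty-handed. It assumes a local socket.socketpair() as a stand-in for the remote server, so no network is needed; the threshold of 1000 polls and the names spins and received are arbitrary choices for the illustration.

```python
import socket

# A connected local socket pair stands in for the remote server (assumption).
reader, writer = socket.socketpair()
reader.setblocking(False)

spins = 0
received = b""
while not received:
    try:
        received = reader.recv(1024)
    except BlockingIOError:
        # No data yet: recv returned immediately instead of waiting.
        spins += 1
        if spins == 1000:
            # Simulate the peer answering only after many wasted polls.
            writer.sendall(b"pong")

print(spins, received)
reader.close()
writer.close()
```

Every one of those failed polls is CPU time spent doing nothing useful, which a blocking recv would have spent asleep.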
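Completing HTTP requests with select. So far everything is single-threaded; IO multiplexing can be used here, and also to implement a group chat (homework)
DefaultSelector wraps select and is easier to use; it automatically picks the most efficient mechanism the platform offers, such as epoll or poll
It also provides a registration mechanism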
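A minimal sketch of the registration mechanism, assuming a local socket.socketpair() instead of a real HTTP connection. The names on_readable, backend and results are made up for the illustration; register, select, unregister, key.data and key.fileobj come from the selectors module.

```python
import socket
from selectors import DefaultSelector, EVENT_READ

selector = DefaultSelector()
# The concrete class depends on the platform, e.g. EpollSelector on Linux.
backend = type(selector).__name__

a, b = socket.socketpair()
a.setblocking(False)


def on_readable(key):
    # key.fileobj is the registered object, key.data is this function
    return key.fileobj.recv(1024)


# register(fileobj, events, data): data is opaque to the selector,
# so it can carry the callback to run when the event fires
selector.register(a, EVENT_READ, on_readable)
b.sendall(b"hello")

results = []
for key, mask in selector.select(timeout=1):
    results.append(key.data(key))

selector.unregister(a)
selector.close()
a.close()
b.close()
print(backend, results)
```

Storing the callback in the data slot is a convention, not something the selector enforces: the selector only hands back whatever was registered.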
import socket
from selectors import DefaultSelector, EVENT_WRITE, EVENT_READ
from urllib.parse import urlparse

selector = DefaultSelector()
# Start empty: the loop only stops once every URL in this list has been fetched
urls = []
stop = False


class Fetcher:
    def connected(self, key):
        # The socket became writable, so the connection is established
        selector.unregister(key.fd)
        self.client.send("GET {} HTTP/1.1\r\nHost:{}\r\nConnection:close\r\n\r\n".format(self.path, self.host).encode("utf8"))
        selector.register(self.client.fileno(), EVENT_READ, self.readable)

    def readable(self, key):
        global stop
        d = self.client.recv(1024)
        if d:
            self.data += d
        else:
            # An empty read means the server closed the connection
            selector.unregister(key.fd)
            data = self.data.decode("utf8")
            html_data = data.split("\r\n\r\n", 1)[1]
            print(html_data)
            self.client.close()
            urls.remove(self.spider_url)
            if not urls:
                stop = True

    def get_url(self, url):
        self.spider_url = url
        url = urlparse(url)
        self.host = url.netloc
        self.path = url.path
        self.data = b""
        if self.path == "":
            self.path = "/"
        self.client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.client.setblocking(False)
        try:
            self.client.connect((self.host, 80))
        except BlockingIOError:
            pass
        # connected is called once the socket is writable, i.e. connected
        selector.register(self.client.fileno(), EVENT_WRITE, self.connected)
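The callbacks passed to register are executed by the loop function, which is why it is called an event loop:
it keeps querying the state of the sockets and calls the corresponding callback
Every IO multiplexing program has an event loop: callbacks + event loop + select (poll/epoll)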
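The three ingredients can be sketched end to end without a network, assuming a socket.socketpair() as a hypothetical stand-in for the remote server. All function names below are made up for the illustration, and stopping when nothing is left registered is a simplification of the stop flag used in these notes.

```python
import socket
from selectors import DefaultSelector, EVENT_READ, EVENT_WRITE

selector = DefaultSelector()
client, server = socket.socketpair()  # local stand-in for a real connection
client.setblocking(False)
server.setblocking(False)
replies = []


def server_readable(key):
    # Echo the request back in upper case, then stop watching the server side.
    selector.unregister(key.fileobj)
    key.fileobj.send(key.fileobj.recv(1024).upper())


def client_writable(key):
    # Step 1: the socket is writable, so send and start waiting for the reply.
    selector.unregister(key.fileobj)
    key.fileobj.send(b"ping")
    selector.register(key.fileobj, EVENT_READ, client_readable)


def client_readable(key):
    # Step 2: the reply arrived; nothing is left to watch afterwards.
    selector.unregister(key.fileobj)
    replies.append(key.fileobj.recv(1024))


def loop():
    # The event loop: wait for ready sockets, then run their callbacks.
    while selector.get_map():
        for key, mask in selector.select():
            key.data(key)


selector.register(client, EVENT_WRITE, client_writable)
selector.register(server, EVENT_READ, server_readable)
loop()
selector.close()
client.close()
server.close()
print(replies)
```

Note that no function calls the next step directly: each callback only registers what should happen next, and loop decides when it runs.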
def loop():
    # Event loop: wait for sockets to become ready and run their callbacks
    while not stop:
        ready = selector.select()
        for key, mask in ready:
            call_back = key.data
            call_back(key)


if __name__ == "__main__":
    import time
    start_time = time.time()
    for goods_id in range(20):
        url = "http://shop.projectsedu.com/goods/{}/".format(goods_id)
        urls.append(url)
        fetcher = Fetcher()
        fetcher.get_url(url)
    # Start the loop only after every request has been registered
    loop()
    print("Elapsed time: {}".format(time.time() - start_time))
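The pain of callbacks!
1. Fragmented code (hard to maintain): once connect has been issued, the socket is registered with the selector for EVENT_WRITE with connected as the callback; connected performs the send and then registers the socket for EVENT_READ with readable as the callback. On top of that, a loop event-loop function has to be written that keeps fetching the callbacks whose sockets are ready and runs them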
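Pain points:
#1. What if a callback fails? (The exception is raised from inside loop, which makes its origin hard to pin down. A real headache!)
#2. What if a callback has to nest another callback, and what if the nesting goes many levels deep? (get_url -> connected -> readable: callbacks nested in callbacks. A real headache!)
#3. If many levels are nested and one step fails, what are the consequences?
#4. What if some data has to be processed by every callback? (Store it in instance attributes and keep global variables to a minimum, but then the instance attributes pile up)
#5. How can the local variables of the current function be used in a later callback?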
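Pain point #1 can be made concrete with a small sketch. It assumes a plain list named callbacks as a hypothetical stand-in for the selector's ready list, and reuses the names get_url, readable and loop only for illustration. The traceback records that loop called readable, but the function that scheduled the callback has already returned and is missing from it.

```python
import traceback

callbacks = []  # hypothetical stand-in for the selector's ready list


def get_url():
    # Registers the callback and returns at once, like Fetcher.get_url.
    callbacks.append(readable)


def readable():
    raise ValueError("bad response")


def loop():
    for callback in callbacks:
        callback()


get_url()
try:
    loop()
except ValueError as exc:
    # Collect the function names recorded in the traceback.
    frames = [frame.name for frame in traceback.extract_tb(exc.__traceback__)]

print(frames)
```

With many Fetcher instances in flight, this is what makes a failure hard to attribute: the stack says which callback broke, not which request set it up.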
"""Summary:
1. Poor readability
2. Shared state is hard to manage
3. Exceptions are hard to handle
"""
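On the shared-state point, one common workaround (not something these notes use) is to bind the values a callback will need at registration time, for example with functools.partial. A minimal sketch, assuming a plain list named ready as a hypothetical callback queue and a made-up attempt counter:

```python
from functools import partial

ready = []  # hypothetical queue of callbacks waiting to run


def readable(url, attempt):
    return "{} (attempt {})".format(url, attempt)


def get_url(url):
    attempt = 1  # a local variable the callback will need later
    # partial freezes the current values into the callback object
    ready.append(partial(readable, url, attempt))


get_url("http://www.baidu.com")
results = [callback() for callback in ready]
print(results)
```

This keeps per-request values out of globals and instance attributes, but every extra value still has to be threaded through each registration by hand.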