Website
https://browser.engineering/
Part 1
Request start line: Method | Path | HTTP version, followed by one "header: value" pair per line
GET /index.html HTTP/1.0
Host: example.org
(The blank line after the Host line tells the server that you are done with headers.)
After the first line, each line contains a header, which has a name (like Host) and a value (like example.org)
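As a minimal sketch of the request format described above, the helper below assembles the start line, the Host header, and the terminating blank line. The name build_request is hypothetical (it is not part of these notes or the book); HTTP uses \r\n as its line ending.

```python
def build_request(host, path="/index.html", version="HTTP/1.0"):
    """Assemble a bare-bones GET request as text (hypothetical helper)."""
    lines = [
        "GET {} {}".format(path, version),  # method, path, HTTP version
        "Host: {}".format(host),            # header name and value
        "",                                 # blank line: end of headers
    ]
    return "\r\n".join(lines) + "\r\n"


request = build_request("example.org")
print(repr(request))
```

Printing the repr makes the \r\n line endings and the final blank line visible.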
server’s response
(start line)
HTTP version | Response code | Response description
HTTP/1.0 200 OK
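A small sketch of how the response start line could be taken apart. parse_status_line is a hypothetical helper; the split with a limit of 2 keeps multi-word descriptions such as "Not Found" in one piece.

```python
def parse_status_line(statusline):
    """Split 'HTTP/1.0 200 OK' into version, numeric code, description."""
    version, status, explanation = statusline.rstrip("\r\n").split(" ", 2)
    return version, int(status), explanation


print(parse_status_line("HTTP/1.0 200 OK\r\n"))
print(parse_status_line("HTTP/1.0 404 Not Found\r\n"))
```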
graphical user interface
First, create a window with tkinter:
import tkinter
window = tkinter.Tk()
tkinter.mainloop()
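# This loop is roughly equivalent to:
# while True:
#     for evt in pendingEvents():
#         handleEvent(evt)
#     drawScreen()
# It looks like an infinite loop that would hog the machine, but it does not:
# the OS keeps switching between threads and processes, so other programs still run.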
tkinter can also create a Canvas to draw on:
WIDTH, HEIGHT = 800, 600
window = tkinter.Tk()
canvas = tkinter.Canvas(window, width=WIDTH, height=HEIGHT)
canvas.pack()
Laying out text
First split the text into words with split(), then work out the position of each word:
if isinstance(tok, Text):
    for word in tok.text.split():
        self.word(word)
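In Tk, X increases from left to right and Y increases from top to bottom (points further right or further down on the screen have larger X and Y).
Define a fixed width and height per character, keep track of the current position, and lay the text out by passing those coordinates to self.canvas.create_text: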
HSTEP, VSTEP = 13, 18
cursor_x, cursor_y = HSTEP, VSTEP
for c in text:
    self.canvas.create_text(cursor_x, cursor_y, text=c)
    cursor_x += HSTEP
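The loop above never moves to a new line. As a sketch under assumptions, the same idea can be written as a pure function that also wraps at the right margin and returns (x, y, character) tuples instead of drawing. The function name layout, the width parameter, and the exact wrap rule are illustrative choices, not taken from the notes.

```python
WIDTH = 800
HSTEP, VSTEP = 13, 18


def layout(text, width=WIDTH):
    """Return a display list of (x, y, character) tuples in page coordinates."""
    display_list = []
    cursor_x, cursor_y = HSTEP, VSTEP
    for c in text:
        display_list.append((cursor_x, cursor_y, c))
        cursor_x += HSTEP
        if cursor_x >= width - HSTEP:  # reached the right margin: wrap
            cursor_y += VSTEP
            cursor_x = HSTEP
    return display_list


# A deliberately narrow "window" so that wrapping is visible.
for item in layout("abcdef", width=65):
    print(item)
```

Keeping layout separate from drawing means the positions can be checked without opening a window.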
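By default, the coordinates passed to self.canvas.create_text are the center of the text. Because the coordinates we compute are the top-left corner (which makes later calculations easier), pass anchor="nw".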
Drawing everything is slow, so skip characters that are off-screen:
for x, y, c in self.display_list:
    if y > self.scroll + HEIGHT: continue  # below the window
    if y + VSTEP < self.scroll: continue   # above the window
    self.canvas.create_text(x, y - self.scroll, text=c)
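The culling test can be tried out without tkinter. The sketch below is an illustration only: visible_items is a hypothetical helper that applies the same two checks and converts page coordinates to screen coordinates by subtracting the scroll offset.

```python
HEIGHT = 600
VSTEP = 18


def visible_items(display_list, scroll, height=HEIGHT):
    """Yield (x, screen_y, c) for items that intersect the window."""
    for x, y, c in display_list:
        if y > scroll + height:   # below the bottom edge
            continue
        if y + VSTEP < scroll:    # above the top edge
            continue
        yield x, y - scroll, c    # page coordinate -> screen coordinate


page = [(13, 18, "a"), (13, 500, "b"), (13, 1200, "c")]
print(list(visible_items(page, scroll=0)))
print(list(visible_items(page, scroll=700)))
```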
Scrolling text
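Distinguish two coordinate systems: screen coordinates and page coordinates.
Use window.bind to register a key handler that records how far the page has been scrolled down (or sideways):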
self.window.bind("<Down>", self.scrolldown)
On every redraw, clear the canvas with canvas.delete, then loop over the display list and draw each piece of text with canvas.create_text:
bi_times = tkinter.font.Font(
    family="Times",
    size=16,
    weight="bold",
    slant="italic",
)  # font settings
self.canvas.delete("all")
self.canvas.create_text(x, y - self.scroll, text=c, font=bi_times)
Different size of text
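Fonts are created with tkinter.font.Font:
bi_times = tkinter.font.Font(
    family="Times",
    size=16,
    weight="bold",
    slant="italic",
)
Measure the font's dimensions with bi_times.metrics() and the width of a string with bi_times.measure():
bi_times.measure("Hi!") # 31 (the exact value depends on the platform and installed fonts)
Use font.metrics("ascent") and font.metrics("descent") to measure how far a font rises above and drops below the baseline; this makes it easy to compute the right y coordinate when starting a new line: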
metrics = [font.metrics() for x, word, font in self.line]
max_ascent = max([metric["ascent"] for metric in metrics])
baseline = self.cursor_y + 1 * max_ascent
for x, word, font in self.line:
    # Top edge of this word, so that every word sits on the shared baseline.
    y = baseline - font.metrics("ascent")
    self.display_list.append((x, y, word, font))
max_descent = max([metric["descent"] for metric in metrics])
self.cursor_y = baseline + 1 * max_descent
self.cursor_x = HSTEP
self.line = []
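To see the baseline arithmetic with concrete numbers, here is a sketch that replaces real fonts with made-up ascent and descent values. place_line and the sample metrics are assumptions for illustration; in the browser the numbers come from font.metrics().

```python
def place_line(line, cursor_y):
    """Align words of different sizes on one shared baseline.

    `line` holds (x, word, ascent, descent) tuples; the ascent/descent
    numbers stand in for font.metrics("ascent") / font.metrics("descent").
    Returns the placed words and the y coordinate where the next line starts.
    """
    max_ascent = max(ascent for x, word, ascent, descent in line)
    max_descent = max(descent for x, word, ascent, descent in line)
    baseline = cursor_y + max_ascent
    placed = []
    for x, word, ascent, descent in line:
        y = baseline - ascent  # top edge of this word
        placed.append((x, y, word))
    return placed, baseline + max_descent


line = [(13, "small", 10, 3), (60, "BIG", 20, 6)]
placed, next_y = place_line(line, cursor_y=18)
print(placed)
print(next_y)
```

The tallest word starts at cursor_y, and shorter words are pushed down so that all of them rest on the same baseline.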
Caching (for Windows & Linux)
Creating a font is slow, so store every font that has been used in a dictionary; that way each font is created only once.
FONTS = {}  # cache: (size, weight, slant) -> (font, label)

def get_font(size, weight, slant):
    key = (size, weight, slant)
    if key not in FONTS:
        font = tkinter.font.Font(size=size, weight=weight,
            slant=slant)
        label = tkinter.Label(font=font)
        FONTS[key] = (font, label)
    return FONTS[key][0]
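The caching pattern itself does not depend on tkinter. In the sketch below, make_font is a hypothetical stand-in for the expensive tkinter.font.Font constructor, and the created list exists only to count how often it runs.

```python
FONTS = {}
created = []  # records every real construction, for demonstration only


def make_font(size, weight, slant):
    """Stand-in for the expensive tkinter.font.Font(...) constructor."""
    created.append((size, weight, slant))
    return {"size": size, "weight": weight, "slant": slant}


def get_font(size, weight, slant):
    key = (size, weight, slant)
    if key not in FONTS:
        FONTS[key] = make_font(size, weight, slant)
    return FONTS[key]


a = get_font(16, "bold", "italic")
b = get_font(16, "bold", "italic")   # served from the cache
c = get_font(12, "normal", "roman")  # new key, so one more construction
print(a is b, len(created))
```

Three lookups lead to only two constructions, because the second call reuses the cached object.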
test
Testing Code 1
python main.py https://browser.engineering/text.html
python main.py https://browser.engineering/examples/example3-sizes.html
something to notice
HTTP version
HTTP/1.1 compared with earlier versions:
keep-alive (one connection can be reused for several requests)
HTTP/2 compared with earlier versions:
intended for large and complex web applications
Encryption
Use Python's ssl module:
(s is a socket that has already been created for the connection)
import ssl
ctx = ssl.create_default_context()
s = ctx.wrap_socket(s, server_hostname=host) # wrap the socket s using the context ctx
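Put together, opening a plain or encrypted connection could look like the sketch below. open_connection and its use_tls parameter are hypothetical names; the wrap-then-connect order follows the code listing in these notes. The function is only defined here, not called, since calling it needs network access.

```python
import socket
import ssl


def open_connection(host, port, use_tls):
    """Return a connected socket, wrapped in TLS when use_tls is true."""
    s = socket.socket(
        family=socket.AF_INET,
        type=socket.SOCK_STREAM,
        proto=socket.IPPROTO_TCP,
    )
    if use_tls:
        ctx = ssl.create_default_context()
        s = ctx.wrap_socket(s, server_hostname=host)
    s.connect((host, port))
    return s


# The default context verifies the server certificate and host name.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

A call such as open_connection("example.org", 443, True) would perform the TLS handshake during connect.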
information
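Scrolling that updates at less than about 60 Hz is very noticeable
HTTP commands can be typed directly into Telnet
To use tkinter.font you must import tkinter.font; importing tkinter alone is not enough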
debug
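Python has a built-in HTTP server that can serve files from the local machine, which is handy for testing a web application's behavior and performance in a browser.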
python -m http.server 8000 -d ./
Keep it running and open http://localhost:8000/ in a browser; it shows a listing of the served directory (or index.html, if one exists).
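Response code:
100s - Informational:
These status codes mean the request has been received and the client should continue. For example, 100 Continue tells the client that the initial part of its request is fine and that it should send the rest.
200s - Success:
These status codes mean the request was successfully received, understood, and processed. For example, 200 OK is the most common success code and means the request succeeded. Another example is 201 Created, which means the request succeeded and a new resource was created as a result.
300s - Redirection:
These status codes tell the client that further action is needed to complete the request, usually following a redirect to another URL. For example, 301 Moved Permanently means the resource has moved to a new location for good, while 302 Found means the resource is temporarily at a different location.
400s - Client error:
These status codes mean the request is faulty and the server cannot process it. For example, 400 Bad Request means the server could not understand the request because it was malformed. 404 Not Found means the requested resource was not found on the server.
500s - Server error:
These status codes are returned when the server fails while processing the request. For example, 500 Internal Server Error means the server hit an unexpected condition that stopped it from completing the request. 503 Service Unavailable means the server cannot handle the request right now, usually because of overload or maintenance.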
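Since the class of a status code is decided by its hundreds digit, the grouping above can be sketched as a small lookup. STATUS_CLASSES and status_class are illustrative names, not part of any library.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map a numeric status code to its class by its hundreds digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


for code in (100, 200, 301, 404, 503):
    print(code, status_class(code))
```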
Sockets:
Address family: AF_INET | AF_BLUETOOTH
Type: SOCK_STREAM | SOCK_DGRAM
Protocol (depends on the address family): IPPROTO_TCP
HTTP & HTTPS:
Ports (80 & 443)
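The default ports come into play when a URL does not name a port itself. The sketch below is a standalone variation on the URL class from the code listing; split_url and DEFAULT_PORTS are hypothetical names.

```python
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_url(url):
    """Return (scheme, host, port, path) for an http or https URL."""
    scheme, rest = url.split("://", 1)
    if scheme not in DEFAULT_PORTS:
        raise ValueError("unsupported scheme: " + scheme)
    if "/" not in rest:
        rest += "/"
    host, path = rest.split("/", 1)
    port = DEFAULT_PORTS[scheme]
    if ":" in host:  # an explicit port overrides the default
        host, port_text = host.split(":", 1)
        port = int(port_text)
    return scheme, host, port, "/" + path


print(split_url("https://browser.engineering/text.html"))
print(split_url("http://localhost:8000"))
```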
Part 2
node tree
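Treat every tag as a node; the whole HTML file then forms a tree.
To build this tree, define element (tag) nodes and text nodes; text nodes hang under element nodes.
Tags that start with ! are ignored (this covers comments, <!-- comment text -->, and the doctype declaration, <!doctype html>).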
if tag.startswith("!"): return
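Parsing then maintains a stack of unfinished (still open) tags. Building on the Part 1 code, which can already tell tags and text apart, each piece of text is attached to the element at the top of the stack, and each new tag either pushes a node onto the stack (open tag) or pops one off and attaches it to its parent (close tag).
if tag.startswith("/"):  # close tag: pop the finished node
    if len(self.unfinished) == 1: return
    node = self.unfinished.pop()
    parent = self.unfinished[-1]
    parent.children.append(node)
else:  # open tag: push a new node
    parent = self.unfinished[-1] if self.unfinished else None
    node = Element(tag, attributes, parent)
    self.unfinished.append(node)

def add_text(self, text):  # attach a text node
    # Whitespace (such as a newline) can appear before the opening <html> tag,
    # when the stack is still empty, which would crash. Skipping
    # whitespace-only text is a simple workaround.
    if text.isspace(): return
    parent = self.unfinished[-1]
    node = Text(text, parent)
    parent.children.append(node)

def finish(self):  # close any tags that are still open at the end
    while len(self.unfinished) > 1:
        node = self.unfinished.pop()
        parent = self.unfinished[-1]
        parent.children.append(node)
    return self.unfinished.pop()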
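For experimenting, the fragments above can be assembled into one small class. This is a simplified sketch, not the full parser: MiniParser is a hypothetical name, it ignores attributes and self-closing tags, and like the fragments it assumes the document starts with a tag.

```python
class Text:
    def __init__(self, text, parent):
        self.text = text
        self.children = []
        self.parent = parent


class Element:
    def __init__(self, tag, parent):
        self.tag = tag
        self.children = []
        self.parent = parent


class MiniParser:
    """Tree builder without attributes or self-closing tags (a sketch)."""

    def __init__(self, body):
        self.body = body
        self.unfinished = []  # stack of open elements

    def parse(self):
        buffer = ""
        in_tag = False
        for c in self.body:
            if c == "<":
                in_tag = True
                if buffer: self.add_text(buffer)
                buffer = ""
            elif c == ">":
                in_tag = False
                self.add_tag(buffer)
                buffer = ""
            else:
                buffer += c
        if not in_tag and buffer:
            self.add_text(buffer)
        return self.finish()

    def add_text(self, text):
        if text.isspace(): return
        parent = self.unfinished[-1]
        parent.children.append(Text(text, parent))

    def add_tag(self, tag):
        if tag.startswith("!"): return  # doctype and comments
        if tag.startswith("/"):
            if len(self.unfinished) == 1: return  # keep the root for finish()
            node = self.unfinished.pop()
            self.unfinished[-1].children.append(node)
        else:
            parent = self.unfinished[-1] if self.unfinished else None
            self.unfinished.append(Element(tag, parent))

    def finish(self):
        while len(self.unfinished) > 1:
            node = self.unfinished.pop()
            self.unfinished[-1].children.append(node)
        return self.unfinished.pop()


html = "<!doctype html><html><body><p>Hi <b>there</b></p></body></html>"
root = MiniParser(html).parse()
body = root.children[0]
print(root.tag, body.tag, body.children[0].tag)
```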
layout tree
self-closing tags
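In HTML, some tags never contain content; their attributes alone carry the information, so they are not written as an open tag and a close tag surrounding content. HTML treats such a tag as closed as soon as it has been parsed, so our parser must close these tags automatically too.
These tags are:
SELF_CLOSING_TAGS = [
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
]
To handle them, first split the tag contents on whitespace to get the tag name and its attributes, store the attributes in a dictionary, and then check whether the tag name is in the list above; if it is, attach the node to its parent right away instead of pushing it onto the stack.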
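The attribute-splitting step could be sketched as follows. get_attributes is written here as a standalone function under simplifying assumptions: splitting on whitespace means a quoted value that itself contains spaces is not handled correctly.

```python
SELF_CLOSING_TAGS = [
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
]


def get_attributes(text):
    """Split tag contents into (tag_name, attributes_dict).

    Simplification: splitting on whitespace breaks quoted values that
    themselves contain spaces.
    """
    parts = text.split()
    tag = parts[0].casefold()
    attributes = {}
    for attrpair in parts[1:]:
        if "=" in attrpair:
            key, value = attrpair.split("=", 1)
            if len(value) > 2 and value[0] in ["'", '"']:
                value = value[1:-1]  # strip the surrounding quotes
            attributes[key.casefold()] = value
        else:
            attributes[attrpair.casefold()] = ""  # bare attribute, no value
    return tag, attributes


tag, attributes = get_attributes('img src="cat.png" alt=cat hidden')
print(tag, attributes, tag in SELF_CLOSING_TAGS)
```

Tag and attribute names are case-folded so that IMG and img are treated alike.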
something to notice
debug
Defining a __repr__ method gives each node a readable string representation, which makes debugging easier:
def __repr__(self):
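The body of __repr__ is not shown above. One possible shape, offered as an assumption and not as the author's code, is to have text nodes show their text and element nodes show their tag. The tree_lines helper is hypothetical and only serves to display a tree with indentation.

```python
class Text:
    def __init__(self, text, parent=None):
        self.text = text
        self.children = []
        self.parent = parent

    def __repr__(self):
        return repr(self.text)


class Element:
    def __init__(self, tag, attributes=None, parent=None):
        self.tag = tag
        self.attributes = attributes or {}
        self.children = []
        self.parent = parent

    def __repr__(self):
        return "<" + self.tag + ">"


def tree_lines(node, indent=0):
    """Return one indented line per node, using each node's __repr__."""
    lines = [" " * indent + repr(node)]
    for child in node.children:
        lines.extend(tree_lines(child, indent + 2))
    return lines


root = Element("p")
bold = Element("b", parent=root)
root.children = [Text("Hi ", root), bold]
bold.children = [Text("there", bold)]
print("\n".join(tree_lines(root)))
```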
code1
import socket
import ssl
import tkinter
import tkinter.font

WIDTH, HEIGHT = 800, 600
HSTEP, VSTEP = 12, 18
SCROLL_STEP = 100
FONTS = {}

def get_font(size, weight, slant):
    # Creating a font is slow, so build each one once and cache it.
    key = (size, weight, slant)
    if key not in FONTS:
        font = tkinter.font.Font(size=size, weight=weight,
            slant=slant)
        label = tkinter.Label(font=font)
        FONTS[key] = (font, label)
    return FONTS[key][0]

def lex(body):
    # Split the page source into Text and Tag tokens.
    out = []
    buffer = ""
    in_tag = False
    for c in body:
        if c == "<":
            in_tag = True
            if buffer: out.append(Text(buffer))
            buffer = ""
        elif c == ">":
            in_tag = False
            out.append(Tag(buffer))
            buffer = ""
        else:
            buffer += c
    if not in_tag and buffer:
        out.append(Text(buffer))
    return out

class Text:
    def __init__(self, text):
        self.text = text

class Tag:
    def __init__(self, tag):
        self.tag = tag
class Layout:
    def __init__(self, tokens):
        self.cursor_x = HSTEP
        self.cursor_y = VSTEP
        self.weight = "normal"
        self.style = "roman"
        self.size = 16
        self.line = []
        self.display_list = []
        for tok in tokens:
            self.token(tok)
        self.flush()

    def word(self, word):
        font = get_font(self.size, self.weight, self.style)
        w = font.measure(word)
        # Start a new line if this word (plus a space) would not fit.
        if self.cursor_x + w + font.measure(" ") > WIDTH:
            self.flush()
        self.line.append((self.cursor_x, word, font))
        self.cursor_x += w + font.measure(" ")

    def flush(self):
        # Place the buffered line on a shared baseline, then start a new line.
        if not self.line: return
        metrics = [font.metrics() for x, word, font in self.line]
        max_ascent = max([metric["ascent"] for metric in metrics])
        baseline = self.cursor_y + 1 * max_ascent
        for x, word, font in self.line:
            # Top edge of this word; drawn with anchor="nw" in Browser.draw.
            y = baseline - font.metrics("ascent")
            self.display_list.append((x, y, word, font))
        max_descent = max([metric["descent"] for metric in metrics])
        self.cursor_y = baseline + 1 * max_descent
        self.cursor_x = HSTEP
        self.line = []

    def token(self, tok):
        if isinstance(tok, Text):
            for word in tok.text.split():
                self.word(word)
        elif tok.tag == "i":
            self.style = "italic"
        elif tok.tag == "/i":
            self.style = "roman"
        elif tok.tag == "b":
            self.weight = "bold"
        elif tok.tag == "/b":
            self.weight = "normal"
        elif tok.tag == "small":
            self.size -= 4
        elif tok.tag == "/small":
            self.size += 4
        elif tok.tag == "big":
            self.size += 4
        elif tok.tag == "/big":
            self.size -= 4
        elif tok.tag == "br":
            self.flush()
        elif tok.tag == "/p":
            self.flush()
            self.cursor_y += VSTEP

class URL:
    def __init__(self, url):
        self.scheme, url = url.split("://", 1)
        assert self.scheme in ["http", "https"]
        if "/" not in url:
            url = url + "/"
        self.host, url = url.split("/", 1)
        self.path = "/" + url
        if self.scheme == "http":
            self.port = 80
        elif self.scheme == "https":
            self.port = 443
        # An explicit port in the URL overrides the default.
        if ":" in self.host:
            self.host, port = self.host.split(":", 1)
            self.port = int(port)

    def request(self):
        s = socket.socket(
            family=socket.AF_INET,
            type=socket.SOCK_STREAM,
            proto=socket.IPPROTO_TCP,
        )
        if self.scheme == "https":
            ctx = ssl.create_default_context()
            s = ctx.wrap_socket(s, server_hostname=self.host)
        s.connect((self.host, self.port))
        request = "GET {} HTTP/1.0\r\n".format(self.path)
        request += "Host: {}\r\n".format(self.host)
        request += "\r\n"
        s.send(request.encode("utf8"))
        response = s.makefile("r", encoding="utf8", newline="\r\n")
        statusline = response.readline()
        version, status, explanation = statusline.split(" ", 2)
        response_headers = {}
        while True:
            line = response.readline()
            if line == "\r\n": break
            header, value = line.split(":", 1)
            response_headers[header.casefold()] = value.strip()
        # Compressed or chunked bodies are not supported.
        assert "transfer-encoding" not in response_headers
        assert "content-encoding" not in response_headers
        content = response.read()
        s.close()
        return content

class Browser:
    def __init__(self):
        self.window = tkinter.Tk()
        self.canvas = tkinter.Canvas(
            self.window,
            width=WIDTH,
            height=HEIGHT
        )
        self.canvas.pack()
        self.scroll = 0
        self.window.bind("<Down>", self.scrolldown)

    def scrolldown(self, e):
        self.scroll += SCROLL_STEP
        self.draw()

    def draw(self):
        self.canvas.delete("all")
        for x, y, c, f in self.display_list:
            if y > self.scroll + HEIGHT: continue  # below the window
            if y + VSTEP < self.scroll: continue   # above the window
            self.canvas.create_text(x, y - self.scroll, text=c, font=f, anchor="nw")

    def load(self, url):
        body = url.request()
        tokens = lex(body)
        self.display_list = Layout(tokens).display_list
        self.draw()

if __name__ == "__main__":
    import sys
    Browser().load(URL(sys.argv[1]))
    tkinter.mainloop()