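Font-Based Anti-Scraping
I. How font-based anti-scraping works
1. Font embedded as a base64 string
1) View the page source.
2) Press Ctrl+F and search for font-face.
3) In the @font-face rule that turns up, take the string inside the parentheses that follows base64, and decode it. The decoded bytes are the font file.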
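The three steps above can be sketched in Python as follows. This is a minimal illustration, not the only way to do it: SAMPLE_HTML, FONT_B64_RE, and extract_font_bytes are hypothetical names, the sample payload is shortened to the first 16 base64 characters of a font, and a real page would be fetched with requests first.

```python
import base64
import re

# Hypothetical page source; a real one would come from requests.get(url).text.
# The payload is cut to 16 characters so that the example stays readable.
SAMPLE_HTML = """
<style>
@font-face {
  font-family: "customfont";
  src: url(data:application/font-ttf;charset=utf-8;base64,AAEAAAALAIAAAwAw) format("truetype");
}
</style>
"""

# Capture the run of base64 characters that follows "base64," inside @font-face.
FONT_B64_RE = re.compile(r"@font-face\s*\{[^}]*?base64,([A-Za-z0-9+/=]+)", re.S)


def extract_font_bytes(html):
    """Return the decoded bytes of the first embedded font, or None if absent."""
    match = FONT_B64_RE.search(html)
    if match is None:
        return None
    return base64.b64decode(match.group(1))


font_bytes = extract_font_bytes(SAMPLE_HTML)
# A TrueType file starts with the four bytes 00 01 00 00.
print(font_bytes[:4])
```

The returned bytes can be wrapped in io.BytesIO and handed to TTFont, or written to disk as a .ttf file.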
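2. Font referenced by URL
1) View the page source.
2) Press Ctrl+F and search for font-face.
3) In the @font-face rule that turns up, find the url(...) value and download the font file it points to.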
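A rough sketch of the URL case: collect every url(...) inside the @font-face rules, skip embedded data: URIs, and resolve the rest against the page address. The names SAMPLE_HTML, PAGE_URL, find_font_urls, and download_font are hypothetical, and the example.com addresses are placeholders. The download helper uses the standard library's urlopen so that the sketch has no dependencies; requests.get(font_url).content would work just as well.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical page address and source, for illustration only.
PAGE_URL = "https://www.example.com/list/page1"
SAMPLE_HTML = """
<style>
@font-face {
  font-family: "customfont";
  src: url("//static.example.com/fonts/abc123.woff") format("woff"),
       url('/fonts/abc123.ttf') format("truetype");
}
</style>
"""

FONT_FACE_RE = re.compile(r"@font-face\s*\{([^}]*)\}", re.S)
URL_RE = re.compile(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)")


def find_font_urls(html, page_url):
    """Return absolute URLs of all font files referenced in @font-face rules."""
    urls = []
    for block in FONT_FACE_RE.findall(html):
        for raw in URL_RE.findall(block):
            raw = raw.strip()
            if raw.startswith("data:"):
                continue  # embedded base64 font, not a downloadable URL
            urls.append(urljoin(page_url, raw))
    return urls


def download_font(font_url, path):
    """Save the font at font_url to path (performs a network request)."""
    with urlopen(font_url, timeout=10) as resp, open(path, "wb") as fh:
        fh.write(resp.read())


font_urls = find_font_urls(SAMPLE_HTML, PAGE_URL)
print(font_urls)
```

download_font is not called here because it needs network access; in practice it would be called once per entry of font_urls.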
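II. Analyzing the font
1. Analysis
1) Convert the font to an XML file, then look at the cmap and glyf tables. cmap stores the mapping from code (the character code used in the page) to name (the glyph name), and glyf stores the drawing rules (the outline) for each name.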
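2) So each name corresponds to a set of drawing rules, but the rules alone do not show what the glyph looks like. Opening the .ttf font file in FontCreator shows how the glyph for each name is rendered. (FontCreator: http://www.high-logic.com/FontCreatorSetup-x64.exe, 30-day trial.)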
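3) The meaning of each code is then worked out by comparing shapes: identify the character that each glyph draws and save the result in a dictionary. On later requests the page can be decoded in reverse: first get the glyph shape, then use the shape to look up the actual character that a code stands for (code (0xbef1) -> name (uni9ea3) -> glyph shape (shape1) -> character (1)).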
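The chain code -> name -> shape -> character can be sketched with the standard library alone, reading the XML form of the font. Everything below is illustrative: TTX_SAMPLE is a trimmed, made-up excerpt in the layout that fontTools writes (for example with TTFont("font.ttf").saveXML("font.xml")), and KNOWN_SHAPES stands in for the dictionary built by hand from FontCreator. The function names are hypothetical. Shapes are compared by exact coordinates, which assumes the site reuses identical outlines across its fonts; if the outlines vary slightly, a fuzzy comparison would be needed instead.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up excerpt of a font dumped to XML by fontTools.
TTX_SAMPLE = """<ttFont>
  <cmap>
    <cmap_format_4 platformID="0" platEncID="3" language="0">
      <map code="0xbef1" name="uni9EA3"/>
      <map code="0xbef2" name="uni9F64"/>
    </cmap_format_4>
  </cmap>
  <glyf>
    <TTGlyph name="uni9EA3">
      <contour>
        <pt x="0" y="0" on="1"/>
        <pt x="100" y="0" on="1"/>
        <pt x="100" y="700" on="1"/>
      </contour>
    </TTGlyph>
    <TTGlyph name="uni9F64">
      <contour>
        <pt x="0" y="700" on="1"/>
        <pt x="400" y="700" on="1"/>
        <pt x="150" y="0" on="1"/>
      </contour>
    </TTGlyph>
  </glyf>
</ttFont>"""

# Reference dictionary built once by hand: shape -> real character (made-up data).
KNOWN_SHAPES = {
    (((0, 0), (100, 0), (100, 700)),): "1",
    (((0, 700), (400, 700), (150, 0)),): "7",
}


def read_cmap(root):
    """Return {code point: glyph name} collected from every cmap subtable."""
    return {int(m.get("code"), 16): m.get("name")
            for m in root.iterfind("./cmap/*/map")}


def read_shapes(root):
    """Return {glyph name: shape}; a shape is a hashable tuple of contours."""
    shapes = {}
    for glyph in root.iterfind("./glyf/TTGlyph"):
        shapes[glyph.get("name")] = tuple(
            tuple((int(pt.get("x")), int(pt.get("y"))) for pt in contour.iterfind("pt"))
            for contour in glyph.iterfind("contour")
        )
    return shapes


def build_code_to_char(cmap, shapes, known_shapes):
    """Follow code -> name -> shape -> character for every recognizable code."""
    table = {}
    for code, name in cmap.items():
        char = known_shapes.get(shapes.get(name))
        if char is not None:
            table[code] = char
    return table


def decode_text(text, code_to_char):
    """Replace each obfuscated code point in text with its real character."""
    return "".join(code_to_char.get(ord(ch), ch) for ch in text)


root = ET.fromstring(TTX_SAMPLE)
cmap = read_cmap(root)
code_to_char = build_code_to_char(cmap, read_shapes(root), KNOWN_SHAPES)
decoded = decode_text("\ubef1\ubef2 km", code_to_char)
print(decoded)
```

Keying the reference dictionary on the shape rather than on the code or the name is what lets the same dictionary be reused when a later response ships a font with different codes for the same glyphs.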
4) Parsing the base64 string
import re
import requests
import base64
import io
from fontTools.ttLib import TTFont
# pip install fontTools
font_face = "AAEAAAALAIAAAwAwR1NVQiCLJXoAAAE4AAAAVE9TLzL4XQjtAAABjAAAAFZjbWFwq8R/YwAAAhAAAAIuZ2x5ZuWIN0cAAARYAAADdGhlYWQYvXGFAAAA4AAAADZoaGVhCtADIwAAALwAAAAkaG10eC7qAAAAAAHkAAAALGxvY2ED7gSyAAAEQAAAABhtYXhwARgANgAAARgAAAAgbmFtZTd6VP8AAAfMAAACanBvc3QFRAYqAAAKOAAAAEUAAQAABmb+ZgAABLEAAAAABGgAAQAAAAAAAAAAAAAAAAAAAAsAAQAAAAEAAOFitoRfDzz1AAsIAAAAAADamxM6AAAAANqbEzoAAP/mBGgGLgAAAAgAAgAAAAAAAAABAAAACwAqAAMAAAAAAAIAAAAKAAoAAAD/AAAAAAAAAAEAAAAKADAAPgACREZMVAAObGF0bgAaAAQAAAAAAAAAAQAAAAQAAAAAAAAAAQAAAAFsaWdhAAgAAAABAAAAAQAEAAQAAAABAAgAAQAGAAAAAQAAAAEERAGQAAUAAAUTBZkAAAEeBRMFmQAAA9cAZAIQAAACAAUDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFBmRWQAQJR2n6UGZv5mALgGZgGaAAAAAQAAAAAAAAAAAAAEsQAABLEAAASxAAAEsQAABLEAAASxAAAEsQAABLEAAASxAAAEsQAAAAAABQAAAAMAAAAsAAAABAAAAaYAAQAAAAAAoAADAAEAAAAsAAMACgAAAaYABAB0AAAAFAAQAAMABJR2lY+ZPJpLnjqeo59kn5Kfpf//AACUdpWPmTyaS546nqOfZJ+Sn6T//wAAAAAAAAAAAAAAAAAAAAAAAAABABQAFAAUABQAFAAUABQAFAAUAAAABwAEAAUABgAKAAMACAABAAIACQAAAQYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAAAAAAAiAAAAAAAAAAKAACUdgAAlHYAAAAHAACVjwAAlY8AAAAEAACZPAAAmTwAAAAFAACaSwAAmksAAAAGAACeOgAAnjoAAAAKAACeowAAn