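Young people these days love going to the cinema, but it is hard to know which films are worth watching. Maoyan is a well-known movie platform in China, and the ratings there tell you whether a film is any good. As programmers, though, we obviously can't just read them on the website. We have to scrape them, hahaha.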
- The page to scrape
- Finding the obfuscated font file
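As you can see, the glyphs in the font file and their order are the same every time. That makes things easy: on each request we only need to parse out the glyph name values from the font.
It's simple enough, so let's go straight to the code.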
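To make that observation concrete, here is a minimal sketch of why a fixed glyph order is enough. The glyph names below are hypothetical (made up for illustration, not taken from a real Maoyan font), and the assumption that the first two glyphs are non-digit placeholders comes from the `[2:]` slice in the script that follows.

```python
# Digit order taken from the script in this post.
DIGITS = [6, 3, 7, 9, 1, 8, 0, 4, 2, 5]


def glyph_to_digit(glyph_order, skip=2):
    """Map glyph names to digits by position, skipping the leading non-digit glyphs."""
    return dict(zip(glyph_order[skip:], DIGITS))


# Hypothetical glyph orders from two requests: the names differ, the positions do not.
first_request = ['glyph00000', 'x', 'uniF0A1', 'uniE2B3', 'uniE9C4', 'uniF1D5',
                 'uniE3E6', 'uniF4F7', 'uniE508', 'uniF619', 'uniE72A', 'uniF83B']
second_request = ['glyph00000', 'x', 'uniE111', 'uniF222', 'uniE333', 'uniF444',
                  'uniE555', 'uniF666', 'uniE777', 'uniF888', 'uniE999', 'uniFAAA']

first_map = glyph_to_digit(first_request)
second_map = glyph_to_digit(second_request)

# The third glyph stands for the same digit in both fonts.
print(first_map['uniF0A1'], second_map['uniE111'])  # prints: 6 6
```

Only the names have to be re-read on each request; the position-to-digit list stays the same.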
# -*- coding: utf-8 -*-
import re
from fontTools.ttLib import TTFont
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
}
url = 'https://maoyan.com/board/6'
ret = requests.get(url=url, headers=headers).text
# Cache the page locally so it can be re-parsed without another request
with open('maoyan.html', 'w', encoding='utf8') as f:
    f.write(ret)
with open('maoyan.html', 'r', encoding='utf8') as f:
    ret = f.read()
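# Pull the WOFF font URL out of the page's @font-face rule
woff_url = re.findall(r"url\('(.*?)'\) format\('woff'\);", ret)[0]
# The URL in the page has no scheme, so prepend one before downloading
response = requests.get(url='http:' + woff_url).content
with open('3.woff', 'wb') as f:
    f.write(response)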
font = TTFont('3.woff')
# font.saveXML('3.xml')
cmap = font.getBestCmap()
cmap_name = font.getGlyphOrder()
# Digits in the order their glyphs appear in the font (the same on every request)
num = [6, 3, 7, 9, 1, 8, 0, 4, 2, 5]
print(cmap)
dic = {}
print(cmap_name)
# Skip the first two glyphs, which are not digits, and map glyph name -> digit
for k, v in enumerate(cmap_name[2:]):
    dic[v] = num[k]
font_dict = {}
# cmap maps code point -> glyph name; build HTML entity -> digit
for i in cmap:
    if cmap[i] == 'x':
        continue
    font_dict['&#x' + hex(i)[2:] + ';'] = dic[cmap[i]]
print(font_dict)
# Replace every obfuscated entity in the page with its real digit
for entity, value in font_dict.items():
    ret = ret.replace(entity, str(value))
print(ret)
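The mapping and replacement steps of the script above can also be tried without any network access. The following is a rough sketch under stated assumptions: the function names `build_entity_map` and `decode`, the three-glyph font, and the sample HTML snippet are all hypothetical and exist only for illustration. The `cmap` and `glyph_order` arguments stand in for what `font.getBestCmap()` and `font.getGlyphOrder()` return.

```python
def build_entity_map(cmap, glyph_order, digits, skip=2):
    """Build {HTML entity: digit string} from a cmap and a glyph order."""
    name_to_digit = dict(zip(glyph_order[skip:], digits))
    return {
        f'&#x{code:x};': str(name_to_digit[name])
        for code, name in cmap.items()
        if name in name_to_digit  # ignores non-digit glyphs such as 'x'
    }


def decode(html, entity_map):
    """Replace every obfuscated entity in html with its digit."""
    for entity, digit in entity_map.items():
        html = html.replace(entity, digit)
    return html


# Hypothetical stand-ins for font.getBestCmap() and font.getGlyphOrder()
cmap = {0x78: 'x', 0xe137: 'uniE137', 0xf4a2: 'uniF4A2', 0xe5c1: 'uniE5C1'}
glyph_order = ['glyph00000', 'x', 'uniE137', 'uniF4A2', 'uniE5C1']
digits = [6, 3, 7]  # first three digits of the list used in the script

entity_map = build_entity_map(cmap, glyph_order, digits)
sample = '<span>&#xe5c1;.&#xe137;</span>'  # made-up snippet
print(decode(sample, entity_map))  # prints: <span>7.6</span>
```

Like `hex(i)[2:]` in the script, the `:x` format produces lowercase hex digits, so the replacement only matches if the page writes its entities in lowercase as well.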