Maoyan Leaderboard Font Anti-Scraping

爱笑的光头强

Article link: https://blog.csdn.net/shiguanggege/article/details/118359765

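These days young people love going to the cinema, but it's hard to know which films are worth seeing. Maoyan is a well-known Chinese movie platform, and its ratings tell you whether a film is any good. As programmers, of course we're not going to just browse the site by hand; we're going to scrape it, hahaha.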

The page being scraped
Finding the obfuscation font file
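The obfuscation font is declared in an `@font-face` rule inside the page, and judging by the scheme the code adds to it, its `src` is a protocol-relative `//host/path` URL. A minimal sketch of pulling the `.woff` URL out and making it absolute, run against a made-up fragment: the host `vfile.example.net`, the file names and the helper `find_woff_url` are hypothetical. The pattern is a slightly tightened version of the article's regex: `([^']*)` cannot run across quotes, so it still picks the right `url(...)` if the whole rule sits on one line, and it no longer requires the trailing `;`.

```python
import re
from urllib.parse import urljoin

# Hypothetical page fragment; Maoyan's real host and file names differ
PAGE = """
<style>
@font-face {
  font-family: stonefont;
  src: url('//vfile.example.net/colorstone/abc123.eot');
  src: url('//vfile.example.net/colorstone/abc123.eot?#iefix') format('embedded-opentype'),
       url('//vfile.example.net/colorstone/abc123.woff') format('woff');
}
</style>
"""


def find_woff_url(html: str, page_url: str) -> str:
    """Return the absolute URL of the first .woff font declared in html."""
    # The url(...) immediately followed by format('woff'); [^']* stops at quotes
    match = re.search(r"url\('([^']*)'\) format\('woff'\)", html)
    if match is None:
        raise ValueError("no woff font found in page")
    # urljoin gives a '//host/path' reference the scheme of page_url
    return urljoin(page_url, match.group(1))


print(find_woff_url(PAGE, "https://maoyan.com/board/6"))
# → https://vfile.example.net/colorstone/abc123.woff
```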

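Comparing the font files served on different requests, you can see that the characters they contain and the order of their glyphs are always the same. That makes this easy: on each request we only need to parse the glyph `name` values out of the new font and map each one to a digit by its position.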
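To make the mapping step concrete: fontTools' `getBestCmap()` returns a dict from code point to glyph name, and `getGlyphOrder()` returns the glyph names in font order. With the fixed digit sequence, every HTML entity can be mapped to the digit its glyph draws. A minimal sketch, assuming (as the article's code does) that the first two glyphs are not digits; the glyph names and code points below are made up, and the helper `build_entity_map` is hypothetical. Unlike the article's loop, it skips any glyph it has no digit for instead of only `x`.

```python
from typing import Dict, List

# Digits drawn by the glyphs, in glyph order (the article's list)
DIGITS_IN_GLYPH_ORDER = [6, 3, 7, 9, 1, 8, 0, 4, 2, 5]


def build_entity_map(cmap: Dict[int, str], glyph_order: List[str],
                     digits: List[int]) -> Dict[str, int]:
    """Map HTML entities like '&#xe1a2;' to the digits their glyphs draw."""
    # Skip the first two glyphs, which are not digits
    glyph_to_digit = dict(zip(glyph_order[2:], digits))
    entity_map = {}
    for code_point, glyph in cmap.items():
        if glyph not in glyph_to_digit:  # e.g. the 'x' glyph
            continue
        entity_map[f"&#x{code_point:x};"] = glyph_to_digit[glyph]
    return entity_map


# Hypothetical font data standing in for getBestCmap() / getGlyphOrder()
glyph_order = ["glyph00000", "x"] + [f"uniE{i:03X}" for i in range(10)]
cmap = {0x78: "x"}
cmap.update({0xE000 + i: f"uniE{i:03X}" for i in range(10)})

entities = build_entity_map(cmap, glyph_order, DIGITS_IN_GLYPH_ORDER)
print(entities["&#xe000;"], entities["&#xe009;"])  # → 6 5
```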
This part is simple, so straight to the code:

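# -*- coding: utf-8 -*-
import re
from urllib.parse import urljoin

import requests
from fontTools.ttLib import TTFont

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
}
url = 'https://maoyan.com/board/6'

# Fetch the leaderboard page and keep a local copy for inspection
ret = requests.get(url=url, headers=headers).text
with open('maoyan.html', 'w', encoding='utf8') as f:
    f.write(ret)

# The @font-face rule in the page references the .woff font
woff_url = re.findall(r"url\('(.*?)'\) format\('woff'\);", ret)[0]

# Download the font; urljoin adds the scheme to a protocol-relative '//host/path' URL
response = requests.get(url=urljoin(url, woff_url), headers=headers).content
with open('3.woff', 'wb') as f:
    f.write(response)

font = TTFont('3.woff')
# font.saveXML('3.xml')  # uncomment to dump the font as XML for inspection
cmap = font.getBestCmap()         # code point (int) -> glyph name
cmap_name = font.getGlyphOrder()  # glyph names in font order
print(cmap)
print(cmap_name)

# Digits drawn by the glyphs, in glyph order, skipping the first two
# non-digit glyphs; determined by inspecting the font
num = [6, 3, 7, 9, 1, 8, 0, 4, 2, 5]
dic = {}
for k, v in enumerate(cmap_name[2:]):
    dic[v] = num[k]

# Map each HTML entity such as '&#xe1a2;' to the digit it displays
font_dict = {}
for code, glyph in cmap.items():
    if glyph == 'x':  # not a digit glyph
        continue
    font_dict['&#x' + hex(code)[2:] + ';'] = dic[glyph]
print(font_dict)

# Replace the obfuscated entities in the page with plain digits
for entity, value in font_dict.items():
    ret = ret.replace(entity, str(value))
print(ret)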
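The replacement loop at the end runs one `str.replace` per entity and only matches entities spelled exactly as `'&#x' + hex(code)[2:] + ';'` builds them, so it would miss an entity written with uppercase hex digits or in decimal form, if the page ever used one. An alternative sketch, not the article's method: resolve every character reference with the standard library's `html.unescape`, then turn the private-use characters into digits with a single `str.translate`. The helper `decode_digits`, the mapping and the sample string are hypothetical. Because `html.unescape` also resolves `&amp;`, `&lt;` and the like, it is better applied to the extracted text of a score or box-office field than to the raw page.

```python
import html
from typing import Dict


def decode_digits(text: str, code_point_to_digit: Dict[int, int]) -> str:
    """Turn obfuscated digit entities in text into plain digits."""
    # Resolve every character reference (&#xe1a2;, &#XE1A2;, &#57762;, ...)
    unescaped = html.unescape(text)
    # One translation table: private-use code point -> digit string
    table = {cp: str(digit) for cp, digit in code_point_to_digit.items()}
    return unescaped.translate(table)


# Hypothetical mapping and page fragment
mapping = {0xE000: 6, 0xE001: 3, 0xE002: 7}
sample = "&#xe001;&#xe000;.&#xE002;万"
print(decode_digits(sample, mapping))  # → 36.7万
```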
