字体反爬

python爬虫 专栏收录该内容
24 篇文章 0 订阅

字体反爬

  1. 网页开发者自己创造一种字体,因为在字体中每个文字都有其代号,那么以后在网页中不会直接显示这个文字的最终的效果,而是显示他的代号,因此即使获取到了网页中的文本内容,也只是获取到文字的代号,而不是文字本身。
  2. 因为创造字体费时费力,并且如果把中国3000多常用汉字都实现,那么这个字体将达到几十兆,也会影响网页的加载。一般情况下为了反爬虫,仅会针对0-9以及少数汉字进行自己单独创建,其他的还是使用用户系统中自带的字体。

字体反爬解决方法-寻找字体

  1. 一般情况下为了考虑网页渲染性能,通常网页开发者会把字体编码成base64的方式,因此我们可以到网页中找到@font-face属性,然后获取里面的base64代码,再用Python代码进行解码,然后再保存本地。示例: view-source:https://www.shixiseng.com/intern/inn_a7xabqqr4f9u
  2. 如果没有使用base64,还有另外一种方式,就是直接把字体文件放到服务器上,然后前端通过@font-face中的url函数进行加载。示例: https://developer.mozilla.org/zh-CN/docs/Web/CSS/@font-face
  3. 分析字体需要将字体转换成xml文件,然后查看其中的cmap和glyf中的属性。其中cmap存储的是code和name的映射,而glyf下存储的是每个name下的字体绘制规则。
  4. 从第1步中我们知道了name对应的字体的绘制规则,但是还是不知道字体是长什么样子,那么可以通过一款叫做FontCreator的软件来打开.tff的字体文件,这样就可以看到每个name对应的字体最终的呈现效果。(FontCreator是一款制作字体的工具,下载地址:https://www.high-logic.com/FontCreatorSetup-x64.exe 这款软件有30天的试用期)。
    在这里插入图片描述

字体反爬解决方法-根据映射关系获取真实内容

  1. 在网页中,直接显示的是字体的code,而不是name。并且网页开发者为了增加爬虫的难度,有可能在多次请求之间code->name->最终字体的映射会发生改变。但是最终字体的形状是不会改变的,因此我们可以通过形状对比来进行判断。

  2. 我们可以通过分析字体,得出每个字体形状对应的文字,然后保存到一个字典中。以后再请求网页的时候,就进行反向解析,先获取字体的形状,再通过字体形状反向获取代号所对应的具体文字内容。

在这里插入图片描述
在这里插入图片描述

实例 58同城:

import requests
import re
from lxml import etree
import io
import base64
from fontTools.ttLib import TTFont

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
    'Referer': 'https://www.shixiseng.com/interns?k=python&p=1',
}

# font_face代表的是经过base64编码后的字符串,他本是是一个字体文件
font_face = 'd09GRgABAAAAACigAAsAAAAAO9wAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAABHU1VCAAABCAAAADMAAABCsP6z7U9TLzIAAAE8AAAARAAAAFZtBmV/Y21hcAAAAYAAAAPCAAAJzPmBWVZnbHlmAAAFRAAAHlgAACfUsfAWzGhlYWQAACOcAAAAMQAAADYYHJmBaGhlYQAAI9AAAAAgAAAAJBCpBlFobXR4AAAj8AAAALQAAAGQUfP/MmxvY2EAACSkAAAAygAAAMr1ieq+bWF4cAAAJXAAAAAdAAAAIAF4AF9uYW1lAAAlkAAAAVcAAAKFkAhoC3Bvc3QAACboAAABuAAAA4PWD99UeJxjYGRgYOBikGPQYWB0cfMJYeBgYGGAAJAMY05meiJQDMoDyrGAaQ4gZoOIAgCKIwNPAHicY2Bk/cg4gYGVgYNVmD2FgYGxCkKzCjK0MO1kYGBiYGVmwAoC0lxTGBwYKn5c5yj/+4LhM0c5kwRQmBEkBwDJygxNeJzV1stvVHUAxfFvbcEH1MeIWl+tVcGCWrWIrSKi9VGrVCxWa5VatSzZNSEFEuJ/0IaElA0LFoQNXTQhYUMoCzIxTViwIpCQTKft3JnpTDu307kzYQGe4fAHsGiizs2n6b2r+XXOOVNgHVArb0mdbhuoofprTE9r7j2v5bF7z+tqU7q/wl+s5wtGE/WJ04nJxMXZM7MzyVhya3JXsjd5JHl8bnJuau7K/PjC9oVzC/HUeOp8ajV1J6gJWoLB4GBwOJgIMul16aF0PH0z05c5kImyHdk92X3ZkexY9tpi5+LwYpCL5bpz+3OXcsv5Dfmr+VtLO5f2Lh1aPrEcLzQVugoThelwS7gtbAuPhsfCmXB+ZWRldOV2saG4ubijeHJ1anW6VF9qL10oRdGmqD86Fd2ICuWe8kD5bLlYaa60Vi5Xrt+9q9M86CnG1vgUjWt6iv/7q0apetArvsbX32t0VU8Ro4tH+YZX2MGnvES/ulPHL7zMNjpp5yteYBMfsYVn+ZEenuJXfudpPuQZNvM4bbzNkBrZzKs8zOs8yde8xw/spJGH6KWJn/iY/WznN1r4Vt1s4HN+poP3+YQ/6VZTN/IIb1LPXl7kHbbyHW/wPXvYxQZe4zne5Q/19zMG+ZIB+viA59mnPu/mCVoZ1jHW/5tB+I+8NlZ/1J2/f6e/CqP36S0m6k2fN4nTpk+exKQpAyQumtLA7BlTLpidMSWEZMyUFZJbTakhucuUH5K9Vt3o5BGr7nTyuCldzE2acsbclClxzF0xZY/5cVMKWdhuyiML50zJZCFuyiipMVNaSY2bckvqvCnBpFZNWSZ1x5RqghpTvglaTEknGDRlnuCgKf0Eh009IJgwNYIgY+oG6XWmlpAeMvWFdNzUHNI3TR0i02fV77fMAVOvyESmhpHtMHWN7B5T68juM/WP7IipiWTHTJ0ke83UThY7TT1lcdjUWBYDU3fJxUwtJtdt6jO5/aZmk7tk6ji5ZVPbyW8w9Z78VdMCkL9l2gKWdppWgaW9Vv1OXzpkWgqWT5g2g+W4aT0oNJp2hEKTaVEodJm2hcKEaWUoTJv2hnCLVf9HCLdZtT1hm2mNCI+adonwmGmhCGdMW0U4b1otVkZM+8XKqGnJWLlt2jSKDaZ1o7jZtHMUd5gWj+JJ0/axOmVaQVanTXtIqd60jJTaTRtJ6YJpLSlFpt0k2mRaUKJ+05YSnTKtKtEN074SFUxLS7nHtLmUB0zrS/msaYcpF02LTKXZtM1UWk0rTeWyUX123Rj+BwzUpPwAAHicbVoLXFTV9j5rn8cgIvLGBynD0wwJmQdESERIRIRcJDIlM0MlM0Uln6REhEQjISIRkhEB4SNCJUIuohIimu+8hKRmRohKXFNMnJmz+O9zZkDvvX/nd+bMHM7ss/d6fN+31pYBhhnsZfwZB4YwTIDK0WGCwySG/qPfBru5VwVvxpKxYxh7G0bpamvDuLvRj9InpescsAECvXhL34tiGzwFajyPR0gPFEOFsRvX4BuQA++Ln5KV5COGAWlQTiP4MiMY5nHQKG15jaetktMYs2EOlhBvmL2TK1pUW6pfb77Xiz7bWb4XQkCj9nJ3U2i0Kn8nRweiAEclN8YYB80NGwp0Zxq/v3vuzvYO/IK07oIjDRdTMzfXHNt87VAe3mrHrzlpPLqWsXQ8JV2Jk8pfq7FRumtUrrZq5dCgNrzjBHB0oAv0/uSLttJ/4pz318O7+PtXeUU/tt7GE1U/4E/6S5uAXf95JniVgcPg0gOzzlTg2Vc4u+OF5waZONlmdN7JwiiTzcDd1t1WqQGVrUppqw5wFxTAJbc2ib7km99bMY3jrKegCvJxKeRvZI1GV/LR9Nle/xBnmOerpOOMoT6R5msjrd5W6ai0pZPl6DSVbt57S3aWthw7OKeY6MXGV3zugR12onhxad/C/fB4uTVr/w1G8QOdv+HN0GF/zqE2cGLc6NzogIRV0dHs5EUz1LdKf2dwkK0jGVtg9Wszj/cTfulfx2/jrZ978S68Aa4Vc8XYrzall376cWY5HxGCZXjuX6i/+DteguUwk/r+9ylG+KyzsaB0X61kDpMvX6JrERhmBLiDUqPkXhLLfiShRj37b/64PoCf9w3DsPL8ouj8HJmJzOP0hx4atYfZM5ytA+fu5qGx8QR5ao4OzlqgZjBNFurwFnAnDtzGX2Em3tCfQoRoKMlYlpx/dwCN+FvVpg/K2Sffv7f75xOliB9z67DzyB/XamHSJlifvOa9N5oXLMKbqR0peZ+8/dvD2I+Q48/d5AE7W8FboGbSqBmVv2wsN2/7R4x1IL3g7B+ELPzrh0EGxvx6HTiswZ+/Wr68dPO6yopN75Wfj4JI8CWkGazau8ATt2EFzkR/DedS8v3Xq7/+pWnIVoou9g3JGvbU266KloFg9o0aRrYP3lT8IpQyLoynZB9PGls0uqSYUNo60LD18qbT0dAv9A805mzAy5uxpx/c2b8KoFn8m2sVO6HtbMwSS4cUdQqeLKTpalmI1RHBEEwwp6BgHJcOXjk52GlMuyJEx1jPJT36bn4cyTP6iKm4K9wL1CSSDdVfHfYrP5L61ZFhlO70YVJ8u48AFTg5q7QB9MyPvIuVVs48ccOSfsKi1z3wHz+eG+MOoX+gC3neIdjiBbGBtWRZsefJZ7TTiL1oWuevin5hC419L2ay9CBnhZc3xwsKb22ANkBeMVF4enmzjLRCUCkY+jeGp5d5X3A7nxEaj9biQHzC7Pnh1WXiQoWPMfBkMxuduLQZU9DaL4SEQFK7F8wmgcFqsdFYwMUaaiAA8Vpq1OwJXhYelR7Bu7oKChDbJs9dyrvgJLwaGQVecM0f/TtiZ8O4uAJTjOCgIlWwZxQyqlCz05m5g3+Awt2VeoEoBN7L2yNAay/5h3qoQWio0hdW8dZzdRO68W+8xj4ThtfUkWCt1rCYzi/GNMF+oHfNGtaPK79D7ogn912tDwkra8wgE/Sl/DzRI5kxx8EDxSXhS2YkM5ZmCgNKW5WMDNTt9k7OHDWTRooLWy11BLUcKSURe0iUWLdHbKgXLPDw2XkR2AWN1Ti/c75uwx72L1IjxuYZirhk8XByRFtBdvA6dkyeUb2Y7726ZFXlPtMzRUWXsJ0ZR7FD88gzbR7GnckbSpAnYusuRaAiQJ6B2RC2bA+ZXwQEEYMvWdk1JWbgHSz3jYBdYCGeMJaxTB665kFgnm9YSWdW/r4IPOkXCiQqhrsP5ZiYbziBfSQ+wbqADUI7PBkUDCuIE3QaxnHxhlK2ARNQdzKoqn/J4dabfaHhlXW7oMTsp38rCoStNJMDpJlTL4GSsEQpz9yVOsidyKmiNJ3cNaANkA1IcdvR3VZLF0GjjzH2c5OMvaxujNO4QWZWnbVDDrhYOzi0vyreCgyE268OMisxyC8QdiztEx22bMV+sCro709ZwVnjJazjP8/Lp2HZTnPutJAU5bRLD/Xi67g6Qg2vkFdxJP4zKAjGw+4i0e/pULI/D/rRKk/8RKWND8usNGMSzbfvaL5ZMDYMQ83vKsMQa6typReUbdAA8+AnnIJ5134Db3gR9+AtYRQuxK+xCGP4OYYECCPh4DGcuxJPjZRyV4YMOqCtik9uM95ua2Nt2kiDGCGMEl8nXw7dzz6g9/MSH0v3sg+MZ9rIW8IoffkQ7+UN854UATJLedPZyWQLeW1QbwBHbLx7/MD3TbiDPCceFEbd+a0L/7Bgx4mFFcXwhJkrUug4CjnGJPZUcin47HF8jpvOz9FX8HNoONKnMHhP0U1zwJ5mgIYJYkKYMCaCiWJiJBqmKGFKRtl3CjoMRSalRuVozlEFjVyFUlAQd6UcuxKq2CspvdpqWRlSKX7Sd54eoHR8nGiU40LCVrBjMsOjiFVJjt7IxkB1FlRniG4ZG7hOY7MHKbO0criDxVYTzhbkWFtYijGh+Hcn3snIAGu8Ix1UFvngBdgnv8uH6Cadj1tacheysgyF1jaBs1wxFMrUWr6xvt5429d3RdqO+nQ0RiVYrQtZCq6HwQsrx0AL+GaAr3i+oaEB/OrrZZPJ9thA7aGgUR7ORJqsJ62VldbJC97DCxVkXJKR9GGcy8JCttUjRCIoPKkl2EmcT5HIFYVyhZy1Hd7MO21t3Vi0y9KKiH0jXlyIfZND9HHTFVgnqrduwgGwKAAB9UuSLIQrbeFrLOxSQ2blpAk6kSNGstKYmaNTRNxB7BA4LinMpjq/Qdx8uDQoKjt3cbexxU0Jl3WQiWk6jHRxWegbLFZd4uKirRcSW2+PMehl5p3BDxQt/D0pdgOGGdBRGcQuEQWhSiTs0oFVE3ltaan+RMkQly8TFjCjmfFStGvcKY8TjQ3j6kwTnDXrCSdXrYZb1sKtHTw/ANTIC7mWH75NL6n9BibU1p+HJxDs4MUastM45cP9N44f+Pmnz812f6BYJ8yUkfhJRkWnZ6MQFLJ9x1IX0C/DtAzSLKk/NNKLxqLpBTTqYI2PX1FQ4IZJrkLZQG61xZjm5Fx9MXDsLDG0qJwQnA2VJVCJswlnmMsmiR3scuSaetqjgg83V0EZa/GgBZEjQvBCqyw+31gidqWzJ05k19Rkn8gWW8EK+4dyGFZQO7ASeqhsq1uEBQ+2S6qM8tg1YS/jyvgywXIeRUvxI+XIo+Hg/shLSYPHFDjmgyesNkBwFni6cjt7nh4KwtIh7GUGIBd4a7EqgEwtUs+1HhceEXw6qzWrTQdNvmIX22dcowNtHmh1hhiv5EWvEO4axkw4XjHfDUOyw8JytAtJtRhXvm5VfGYkuzejU0zeQ+LChBDQizqSKhaRZNGLTegELgSasQFaMBgikAnCi+INJ6eY0MBZhUpX8iqM007AazjPA/LI0gxoi8wvjtWZcRVZrp5bQkH0MaqqJCh0NOliqdrxZh8ReY6wsqWFy9Bf+Av/ff1fqXz1ByU15blbGkrERcKMQxfxZz324t468Nl04PaF/T+e/tI8/p8Uq76no5t0An1JIcLYOwsKc+wO8beUcF4UplvE2k1fU2ngkls+2ZdSfxgpEZPYMcaeMoMerhIvKuQ+I8GQdPcUpZ5QyPV987U8LNFhbJ7YRVzyoGZId79M/W0rY7yNNH8Z5+niXG25l38Y2FhFV7PxJrxL0iDmhwJxs7BADDiL08x438/FUXxlRhB3VhZ3rCTuqMJQAduPufBO10+2EzmY+vNvsAY/6zpjP56DUVyc+JP4C8xzCRJisZRMJCrMddPCCUZiDryh6BXeomN6U8x+mqFVCQ0iR6JwovrNw5OTVIKnF0sjz0b6aEsDzOFpoDZy08CQrnF/qGvou9aenODCdvQMwqIINW7H+3gb56nVsBlG9rxFfkUrrNGGwCTWJ/GSpYADjcYmzhL7suaFIxRm4ECxTpefzgoLOi5sJBVU91YcrQqbHbmopf6DqIi3K6/C06xVEObVhgUVgdV5TJ4dF3bW2DA3ft+SzLAVpC3PmJ3AZxSVpDZmGssHm8vhMRkP7ip6hNcpDk+iuRT6EIfBzl6CWsnxHp7U8bLHTfKIOoUovHgz82geCQkJMtgU9mqRGFtk3NG6GHsgcEOmzTiIh9k+H28FJ7wZE1u8JjU6b3GZK9lBgjEIWtGSy0ULeg4iwfoetkeMJTXCC2JcXJyYReXZx5nwlN3o8AiXWJ3LGNyXiQPhcbXzC+YY32SbsKo7oQsmZYFNVhbezsKOLGE+1pow9FdFE19BsUNhqkyk5D4rzHtQWqDIGNjAV+wx5FbJ971OsWQ8jTlaFTgrpTXby0tizQtilZPgfCoGwtVIsUzhYLgTyaZA+EAEf1p0W8XGj2RtiqC7qAjHGWcVspXG21Is4n2aP0qqPWk5Jo2mNVlJkOZgJ2l/lio3F/LneuM+9s0RIhEOG/uIwKYbtx8izXGBYurpHaJ68nJ4hjxVlAuJRUXn8HqRmNU+PxwUZK54Er/TmnilhzTzxYwDRQHB3XUsuGtUtgGyLyQMCCDNoe/gz01NP4A9/hmVEDZ1pDu8QDaWgWUI1peJ5e/M9TTzUzfXK9exdByzSvN3dvRylz/Sgbje3VQD78cVcBhevbKl8hiKFLj8/u7eFTobvoZEWAVtz5xLpMLtF7yJB5JNY/JS7T72v8ZUmgt3mcj5OT1UatbgAmiFNw3VDTgoYj34AuxNwxoydnEm7IQ3YRGcju14FyvxD7yMjeHw1V6ZA+4qLgj7KQa6yTwWQPNT6l/QpXubGMBpmqTtvbylmFVSlDeBF6+ydRjmOKnWBNnLVDhFF7FLTsTnW9mtS44XW7CLZIUt3JAW/SZxj10cU5T4Glk7kCisK+lZvMPSriAxHZOo09eJLSTYdOBdXm0oa+G4mAhLEo9cXvK6hJwlYT4ZubM2LNRlGsr+JoSLS7DJYGm1yrq0Y3n7kK51ovrRRtKpUk1qRjy5HHXCPrFhwki+rY23HUP874uryGFPRyoaRone4yeQUnHsUO0vuFFbj5S6T3LfS7axq/TZVPu7uzFJUAhz4DnIwSVYjXW44jxMvD4A4/HWX39iJ+mH1dCIMViOmzCCqr4leB2/gengAT6UnarMzxmliBOm0mr+KTp1aldqVAqGcsWk8JarqP8o67UBXp7mu9ihu+g1mLVq9ao3IrLfXL1yefbMWetz1xQKVd3p+yztqlal6zP3fPAhsUhNentlbPbyrA2bspKWpOnW7cl8l9sccvDouTP+ePdOy6nIHaXXT03H1xUeDxrbudBE66V8jsFJNP6ZF1723R9X1Pg7ih13opuar7TFPPjrPDMc54HUTu7UShNA5apR+4K3L8hdGyncZVnlLDXVOKWbl3f6831vQ0TEuqobBaDtvb5E1567P+1MVx124IPEe4HgHhLb82xy7MzUY+lHrgV2vrF68dxVC9/pzD7ZqfIa0i8WltS34019JHPD4ZF+g+lsYYkxN7DNkeMEJzx6HZ+5DMGOnMC7wIyjEGpjyfFjwE8Y9eAulxgdP/NFQ5kwyrB0Rmzwm1y+YUHQ7Gdf4T4dXh+7h+oCOY+l1s9QHssLo3nM7t64/zZegqfBomjJu59U/XRs5ycb/aNBfQ9s4CVtc8KN5kMdc02+Fu9zy+hYYxhaf3na0NpT40RZeFiLUoupNCpWOyxGvbll4p7TRQd2wVZug/6sHhy6L7/NtbRA9cbSfXvgsTp4TSx+6eAiqHr3HEz+m5aOL9YUYmh6/a1TNZ1nS4f4fw7lfys5F8yZQM3EzcF76E3H4uCIEa2Io3hLWGC4Awaxd7j3tZfOdSTjZIr/oZg39bukVqSwAPxBeEDD/TwOYEfbka+qamqqKg6Rx6hMnoZn8C7qsQkCgT906WfY2nFtyJ6JdNyJpo4alVRytGiHosUcLNQi3ssLdv+R9Fn90gd1m9emfT531cv5BzIetO5J//fbhaEzn4rasmjr3qkHY+M+jAqaXvBO0bfPDsWHEEg1iyk+hjSLSbJoA+yHzkIg1p28ZM1xrENfK9aevGzJW3Cj+77rsRRGEJsOLs5QzVr4B04OMf7NxRlPeoYGTGWnGn/0CPMJYjWmdYCihWKlrcTuZkCkHGEndQ9sIZB9LPC1/PnRKaxa70Ww3nc+ePDr52XGb1gTVoY2og51NERCSKhpzhR3TwoHpRzizf28hyP+P229jBJywqhjK0JmFidEZrJtxkzS3re0xtKuMilDbC0Vz0/jLMGh6L2NkRsyw4qwV3QwcmFzLTOJ1tDARQz1j34TaoV/UO+Ol/tHUttDa2+qV5TDz5OkqIN0jcCUQSY0qjnMlw/E7lUNlg71i8uMwA4abdjbsB03YFtI0Hlwa4d1pPZBD3ZwkfOsdkB0PqS3kXbMhDRZA9+lHL6WPl2yGuUJe1PXDkxdI7a8ru7SBAgnoSE+Yi0Nqg6x0ieIpIAfe9wwWZHlK97snh8HauhGfyyKTIDZhJjWQmO8mGLCaIrVE+WuiLQ/ICjkXj7nDnL7Qa0cakO0wbGT/Vs27jiIv13Fe3VbKvBS2+3Pd+OnwqgfvkxveZyz+7Gk7T4/F8dvfu8X8R2xa8v7YDnU/+imzxkhKzqzWme7xV1kstjexk7g54j6GnEOvUPqYysUncJuqpQmSn4dVoBm3W8n635z0W0WeRDO55Ya9pViP1srds18PSzu0uLKKVPhQjmpFaPZqIG5vN6wj4shjxuvJCdzOoj64kNfP3CFWL81SyEIWwuwrgCLMWmIAw/QmdgP9WWdZN0vwaOXxlbFH/jTeE/pIFi0GsjfWkvOspX7TuWvrjAcokA4L945rJR7fLjfpJLHcZFH8oUnhjo78maJs9m6Xu68qhcHX0p/Na8NOq4Bg1Whr0WJG3s/2l2yDarff01MFUZ1tmDTPH580lo2V7xcmrE2+2EP+RVTH2rEI1jOvyIeuyKevwJfOtLsHA87Jbym05sePGN6JHfg4b7CMZNPRkiyQyNvLRwzfsC+ZVjBrjae4RL5c/qre8N5532mOuyB4orwtamnD/9P+UXxwouXr8stIy8SBpYbyp98mu0UJ5FscQ0pE+eWfLW8MFDMhXHl5S+9UrJg+QR2PSyBwxOK83RYpsMCHVrwnf3t2jQoI8F+vWf0z3GrD5ZED+FqPcVjG8meno7mUux/6kzX4TLz7D3884+O5aYyE6/9s6SEFmi+F6kqxB78rg6eyKJl5rdnTn8xbA/SLbwn6Vdpd+cxMK9KozKBLOkOCK0HFsXmgwdP7Yj355se66ooM8ayNWXV+360l34+6C/ehy3CTMZCYj17tdbfyUFw87K3Ie4yY/mPBVVRxNuRLyyO+OXXU9EvxETcaOE71M9HJEc8r/+Qov5F971+sJILH/JPAsV7u/+sP+Xi2smRS/jhwfuVdJ3v9dU1kXfh5cNbxY+4uO/P3fjVnG8MJ/Vs5LU88ktnlpmasKKA/jD+9HMebJ77v6rFs1zcb/2jh545mXLAf8UTNxmjWnFeO1grBd5yLIyXUJ6i+83XQ+OfZ8cN2489SH9r2p80N4rZA8bF5APxADtXfJ+sf5ZdVhZqzDPj6OBmRRt/Wa6JRj6siqRzvaB7kJqm0A2ktvMd+kn85T0Groot22Xm148VjbxI48BB6mDRimaoRGLNJVI0/+JYQ9EGPC1c0uMT3AsvwXsDeXyu8eVn2BPOMCuXTcnLw07DWR3ng5FmrbJMcUn4RuZWcyH5sMfwEGukYxJbKgaTFvH2h99MDpPiWBMKR/YQo8jRi8Fs8kA0D1gMp283h0VBMLT55GdBFc4qRv9iJIXysxSKLmETYy3XYypBYQfmGsyVTfavGTlxgOyJDxWrxV/ENdo3YIC73d5sHIMtiXEQy75sbMP3osw8dF1xW/iKspCKCaSLkPPRnWpbqR1g3jQgrAyZtMww9SrNy/AUnDmp1QRKO3s7rq0k6kV2RMPUmwfBL1yNvSeO4dHJoRDRdPgfj0+eHN5jXJK/6B3UsrW4S8JKmC1wYjsmOEXF8GfRaXrjP4Wv1Hu36cTzWPavz6KjvrrY1RQc1fgr2MMvoQHqcHBF4yJ7e+773FwUcsXesFnhSXElVtacFl3wV/AZwsqP5L71aCnaqSsl4WHWIPxHl3eI67ddJjHt226OHMWPtO6RLCqMMkaRzIkvRXiJaWZ9RKTet5XURXwk4UhK36nXQ0Lm9rcRf1pGnHL5YQoUs78MY+AIE34+Dg+9zI0wHmInmNiJrBVG7UObveJ9M0Z4mu4HKaOUtMSl7EQ8t21D623b4A6fZJi+cyf3z51DvTED5e3t1EfeJub7/xhMViaP4mYKqcMP/jEvKKppdpn3FIguhSvoRlzFq6VilFgJDlXlUQk7FqZOINNB//POQH+IgRLfbR/CWfTToaATw0kjcFBEwv1unNQ/zy07uCPBvDdqYdL/Q9vpD+lNac50pa3w7Gh+nAPCMiR2LoLlIRgNx9W2CrUjnAPCkY37UnwMb3G5k95bctqooJzSuS78ExXn9uCuqZ8r0vV+K2O0q2nNjv8pw8Dj4ULlYpd+p6FP+rk7mKavqrayK9mRIwaLPqBF3aKVlWsyXFk3ztpwx9gMIc15PVjlwR4oNN4ZILFJNtmkA9sRTibs2264zL6cvTKeW1yI/X9DxCwswe5osw9eEWYM79tTeezBcuaKlL7ztPTyD4BHSlNhxgsL83DwdzyOy2iVOg+mwcd4c5B5vy4lMVR//dE6FZc664BmbhPEwVpokAvWHzFxmm7RePbi/5Ss5j5EOPXBKClGla5jpV0++cSHi3vEo22QBWvbyGpxGVnHfmNMxCehmd1vjtPjpt+NkETEQ2Q+juQ4jv4DZoAHa/UYzKVZMUVsJRq2Q4yD+ZMDybdm7mTbKZc4SAhnr1XJ3RmpePLS2EwByi0KsAbT/weYBscrOiCvmV959csmWAaRV4+VQ2T2opTE9EqOX4SJYuDuo9W0BlX5wugsPGyXNHtGUm324uHaRy33gp0l38u2dRSoodWMq50jrVcoFyk4tVHP11c0bM/HiXgXvKG2vwV0G9/d7QT39x5Z/e0i8ANrhCi8YQgqLK/WmfAB8mRetXqUV2kOffr8Wy9QOnU28ehb06frEyVyMtcz1031EuVP9RAN/heROkvdXOF6S0r6kuKWuylbKTGu+n3ZLuX6e/AOWV+7/USOmEM+gacPbxM/5uK+Pb4u+RqGmPH3lsIofEejPIChpYG9angjSFCYI9xbRl27AKohhjo+xFPloLDjFR6sabtU62mCA0fW215Chzv9OrDDvq+/sOQKynWnfrfr6cRVy166MBg4sfRJP/gS/eewR0ssPof9G7Dg87fRxzeQuLkKaNVRuCPgadbubEmtuHhhShB5AAW4JBPP2TtEh9o4YDDUjptQ6OQM0zbAuK7MWp1rfm4mGrPP1ojxrTknCI9PY09YGPiQ2DR/yltXJ5fosnCDGyRerfUa7vMvG+rzS/WwoHQjtv9Te9MQ8pZ2glIRzxnAoefqQqns3pldsm/3x0V7AVny/QXqc0LpIYqW3MEfHOg53XD+XMlwTySUagjboZ0mc4pQZWer4kKPcsu6cXUzkPQdR6vboJ0Uipvw24N55D0Ze24peoUK6u/nmReZGSZGlMmQNe1kSm6RN/clyKaO5x+5pJEuPgRnR6VMpiZ/yntMFKkV2oUth+tgQshksWFXY2kTsSjucglcHMpeycfCoKQWYxKxyN/l77s4DPvALtvVzcuvZUW+fzCkYFRQGCE3wQ2vpHFxYiSplw4SkJqz6nBsfHgn2SCmRicnRkSl2/nnhMUJwQajf2Nuet+sqLhUD9+c0ARozAGPbH/fyFCwg1l+2Wk5S5Ny8FIOekSTakqwV0kzXpPsJ97nVOZ9BxjiwGeAwoUJLVRoqRt8Qf2U5sOwlzHmIijH8IId+NKMUeAOmz1OJce5WOORORODWKknIMiav4tqDDtZY0gd+YhH94fN28McD0qtt9RZNhvPQfFoa1n6vwHso7vDnnLjflx0dAY7r7COs2zJx0G8yDJw2gLwTeIyO78k7W2YgT4KMlG89xnkRgbi51uWYnaIP3z5PIZ04KWtW8EDL0kHSU9dszgjfq/VKO6QTmeocosL8sZX4SP1NNKX31GQB05RR7YXXNxSgG49kTEwCrbhTLwfowYFXMfdFMn3jYWD1Ljg0RaXXTY7h2H+D2wQjMp4nGNgZGBgAGLptc8s4/ltvjJwczCAwI0Hy1/A6P9Gf79zcLHtBXI5GJhAogB2NA4uAAAAeJxjYGRg4Cj/+4LhM4fKf6P/Dzi4GIAiKCAFAKwWBwx4nONgAIIUBgaWjcRhDgYIZtVEsJExmwQQGwDVSgPxUyCehirPKgehmRShfBkIzfIWiPmwm8n0G6guCog/Q/WIAO1oheqbB6RVgDQzxGyWaCAtxsDAvAWoJhDTLFZWoBoHqFsbgXxLIM4Ein1DuIfpGBAr/zdieQc0B+gmFmMg/R6721gXAuW0gWrSgHqiIGLsF6HmA93I+hModwWIc4C4C+Jv9npEWIDsAPvDBkIDAC4MIFoAAAAAAAwANABKAHIApgDKAPQBMAFEAYoBxgHUAhwCSAKcAtgDEgNwA9QD+gQSBCIERgRaBOIFWAVwBZ4F9AYABoAGsAbsBwoHNAemCBIIJghQCIQIqAjSCQIJZgmICbwKHApUCowKsgrsCwgLNAtkC5gLvgv4DDIMXgySDKgM6A0MDT4NXA10DbQN5A4GDiwOSg5mDoQOnA64DuIPGA88D6IPxA/eD/YQDhBUEIYQ0BEWETIRUhGGEbIRzBH8EnQSphLEE0YTbhPqAAB4nGNgZGBgSGEIZuBiAAEmIOYCs/+D+QwAG9cB2AAAAHicZZG7bsJAFETHPPIAKUKJlCaKtE3SEMxDqVA6JCgjUdAbswYjv7RekEiXD8h35RPSpcsnpM9grhvHK++eOzN3fSUDuMY3HJyee74ndnDB6sQ1nONBuE79SbhBfhZuoo0X4TPqM+EWungVbuMGb7zBaVyyGuND2EEHn8I1XOFLuE79R7hB/hVu4tZpCp+h49wJt7BwusJtPDrvLaUmRntWr9TyoII0sT3fMybUhk7op8lRmuv1LvJMWZbnQps8TBM1dAelNNOJNuVt+X49sjZQgUljNaWroyhVmUm32rfuxtps3O8Hort+GnM8xTWBgYYHy33FeokD9wApEmo9+PQMV0jfSE9I9eiXqTm9NXaIimzVrdaL4qac+rFWGMLF4F9qxlRSJKuz5djzayOqlunjrIY9MWkqvZqTRGSFrPC2VHzqLjZFV8af3ecKKnm3mCH+A9idcsEAeJxtkUd31EAQhPUZjMk5m5wzChMksrSSyDln1rv2e1y48R4/H1StIzrUzFR3V7eqk4XEvuXk/9+cBdaxnkU2sMRGNrGZLWxlG9vZwU52sZs97GUf+znAQQ5xmGWOcJRjHOcEJznFac5wlnOc5wIXucRlrnCVa1wnJSOnwOEJREoqbnCTW9zmDne5R03DhJaOnvs84CGPeMwTnvKM57zgJa94zRve8o73fOAjn/jMF77yje/8YMoKM+asJvxZ/P3rZ5EKM2EuLIRO6IVBGIWlsFr6h65L0+H0XSrWl2L7rBYb2mY4y6bOdLZ5Lj4W9ZDt1MGp0s1Ur1m8U3ZqaiE3Fd92miUo6l1QXZgNr1h3/fCK4zRRypYRS5uh6SwzbWzycmJnZXpVXinbZpIDLiqSu1avFeGaVHovv7zyvHwJuge7KzesSVF8jONk/eiXnAgxNce63ByR/y7YHzrjVtVDEa89+Km57CfqokhQJEyFqojio/g4Hbc1bq1S9+iz3t6N9hV8IRed/s1Jy8lzN9b1UnNzOVo69R+dUJ6Xh0FMsG0ZYxV9sC0pHv3AVc40YzXu3CXJX4zu04I='
# 所以我们需要通过base64进行解码,还原回这个字体文件
font_bytes = io.BytesIO(base64.b64decode(font_face))
# 有这个字体bytes数据后,就可以使用TTFont来创建一个可以操作这个字体的对象
baseFont = TTFont(font_bytes)
# 获取所有字体的形状对象
baseGlyf = baseFont['glyf']

# 建立一个内容和字体形状的映射
baseFontMap = {
    0: baseGlyf['uni30'],
    1: baseGlyf['uni31'],
    2: baseGlyf['uni32'],
    3: baseGlyf['uni33'],
    4: baseGlyf['uni34'],
    5: baseGlyf['uni35'],
    6: baseGlyf['uni36'],
    7: baseGlyf['uni37'],
    8: baseGlyf['uni38'],
    9: baseGlyf['uni39']
}

# 去爬取网页
url = "https://www.shixiseng.com/intern/inn_a7xabqqr4f9u"
resp = requests.get(url,headers=headers)
text = resp.text
# 抓取出当前网页的字体文件
result = re.search(r'font-family:myFont; src: url\("data:application/octet-stream;base64,(.+?)"\)',text)
font_face = result.group(1)
b = base64.b64decode(font_face)
currentFont = TTFont(io.BytesIO(b))
# 获取当前网页的字体的所有字体的形状
currentGlyf = currentFont['glyf']
# 获取字体的code和name的映射
codeNameMap = currentFont.getBestCmap()
# 循环code和name
for code,name in codeNameMap.items():
    # 先获取到当前网页,某个name下的形状
    currentShape = currentGlyf[name]
    # currentShape.coordinates
    # 循环内容和形状的字典
    for number,shape in baseFontMap.items():
        # 看下循环后的shape是否和当前的shape相当
        # 如果是相等,那么就可以找到code与内容的映射
        if shape == currentShape:
            # 构建网页中的code
            webcode = str(hex(code)).replace("0","&#",1)
            # 把网页中的code值替换成数字
            text = re.sub(webcode,str(number),text)

print(text)




  • 0
    点赞
  • 0
    评论
  • 2
    收藏
  • 打赏
    打赏
  • 扫一扫,分享海报

<p> <span style="font-size:14px;color:#337FE5;">【为什么学爬虫?】</span> </p> <p> <span style="font-size:14px;">       1、爬虫入手容易,但是深入较难,如何写出高效率的爬虫,如何写出灵活性高可扩展的爬虫都是一项技术活。另外在爬虫过程中,经常容易遇到被反爬虫,比如字体反爬、IP识别、验证码等,如何层层攻克难点拿到想要的数据,这门课程,你都能学到!</span> </p> <p> <span style="font-size:14px;">       2、如果是作为一个其他行业的开发者,比如app开发,web开发,学习爬虫能让你加强对技术的认知,能够开发出更加安全的软件和网站</span> </p> <p> <br /> </p> <span style="font-size:14px;color:#337FE5;">【课程设计】</span> <p class="ql-long-10663260"> <span> </span> </p> <p class="ql-long-26664262" style="font-size:11pt;color:#494949;"> 一个完整的爬虫程序,无论大小,总体来说可以分成三个步骤,分别是: </p> <ol> <li class="" style="font-size:11pt;color:#494949;"> 网络请求:模拟浏览器的行为从网上抓取数据。 </li> <li class="" style="font-size:11pt;color:#494949;"> 数据解析:将请求下来的数据进行过滤,提取我们想要的数据。 </li> <li class="" style="font-size:11pt;color:#494949;"> 数据存储:将提取到的数据存储到硬盘或者内存中。比如用mysql数据库或者redis等。 </li> </ol> <p class="ql-long-26664262" style="font-size:11pt;color:#494949;"> 那么本课程也是按照这几个步骤循序渐进的进行讲解,带领学生完整的掌握每个步骤的技术。另外,因为爬虫的多样性,在爬取的过程中可能会发生被反爬、效率低下等。因此我们又增加了两个章节用来提高爬虫程序的灵活性,分别是: </p> <ol> <li class="" style="font-size:11pt;color:#494949;"> 爬虫进阶:包括IP代理,多线程爬虫,图形验证码识别、JS加密解密、动态网页爬虫、字体反爬识别等。 </li> <li class="" style="font-size:11pt;color:#494949;"> Scrapy和分布式爬虫:Scrapy框架、Scrapy-redis组件、分布式爬虫等。 </li> </ol> <p class="ql-long-26664262" style="font-size:11pt;color:#494949;"> 通过爬虫进阶的知识点我们能应付大量的反爬网站,而Scrapy框架作为一个专业的爬虫框架,使用他可以快速提高我们编写爬虫程序的效率和速度。另外如果一台机器不能满足你的需求,我们可以用分布式爬虫让多台机器帮助你快速爬取数据。 </p> <p style="font-size:11pt;color:#494949;">   </p> <p class="ql-long-26664262" style="font-size:11pt;color:#494949;"> 从基础爬虫到商业化应用爬虫,本套课程满足您的所有需求! </p> <p class="ql-long-26664262" style="font-size:11pt;color:#494949;"> <br /> </p> <p> <br /> </p> <p> <span style="font-size:14px;background-color:#FFFFFF;color:#337FE5;">【课程服务】</span> </p> <p> <span style="font-size:14px;">专属付费社群+定期答疑</span> </p> <p> <br /> </p> <p class="ql-long-24357476"> <span style="font-size:16px;"><br /> </span> </p> <p> <br /> </p> <p class="ql-long-24357476"> <span style="font-size:16px;"></span> </p>
参与评论 您还未登录,请先 登录 后发表或查看评论
©️2022 CSDN 皮肤主题:书香水墨 设计师:CSDN官方博客 返回首页

打赏作者

Garrick不想996

你的鼓励将是我创作的最大动力

¥2 ¥4 ¥6 ¥10 ¥20
输入1-500的整数
余额支付 (余额:-- )
扫码支付
扫码支付:¥2
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、C币套餐、付费专栏及课程。

余额充值