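1. Preface
I recently took a freelance job: cracking the CSS anti-scraping on 58.com (58同城). (The client bailed on me.) So I've decided to open-source it for everyone to study and reference.
2. Main topic
First, open the page. This part of the information is obfuscated with a custom font, as shown in the figure below:
The information covers gender, age, education, and work experience. It has to be converted before we get the data we actually want.
Every obfuscated field references a class named stonefont, so let's look at that class.
It turns out the class references a woff font file. We extract the base64-encoded part and save it as a .woff file.
Python example: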
# -*- coding: utf-8 -*-
import base64
font_face = 'd09GRgABAAAAACJgAAsAAAAALkQAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAABHU1VCAAABCAAAADMAAABCsP6z7U9TLzIAAAE8AAAARAAAAFZtBmY2Y21hcAAAAYAAAAHrAAAFTgf83VJnbHlmAAADbAAAG3cAACFEC30Q92hlYWQAAB7kAAAAMQAAADYZ7i/JaGhlYQAAHxgAAAAgAAAAJBFoBf1obXR4AAAfOAAAAD4AAAC8Ri7/tmxvY2EAAB94AAAAYAAAAGCyTrsKbWF4cAAAH9gAAAAfAAAAIAFCAJZuYW1lAAAf+AAAAXIAAALQd5CEoXBvc3QAACFsAAAA8gAAAe3GCLPEeJxjYGRgYOBikGPQYWB0cfMJYeBgYGGAAJAMY05meiJQDMoDyrGAaQ4gZoOIAgCKIwNPAHicY2Bk+8g4gYGVgYNVmD2FgYGxCkKzCjK0MO1kYGBiYGVmwAoC0lxTGBwYKn70cZT/fcHwmaOcSQIozAiSAwDEswwFeJzN1L1O23AYxeFfnJR+0YT0g7bpR2jatE2aMjB1rMSAhLgAsnRjhYEFcQsgLoAVMXAxIMEahiAnIZbtmMgxygQ95mXvFKmOnsS2HMf6n/MGeABkpSE57c6S0R7OjM5m7s5neXJ3Ppfp6PgPv/Wdj2y1T9v+xZq77x67551et9k96d1cLveL/YpX8Krekef7jWAxWAm2w3K4G7YGzmAjyker0dmVP6wND+JSvBkfjrKjZlJPlpL1ZCdpXVfHC+O921v9zqTvP7ktozWa3Cu9v6NMckphioc84rHyeco0z8hTYIYiz3nBS14xy2ve8JYS73jPB6VWZo5PVPjMF6p85RvfqVHnh3L+ybxuPjXRtfnHwv0n23T65vy6P9KqsHVPj9g+NUqBtm/SCbpYM+kUufsmnS732Cgt3HOj3Oj0jBKk2zTKku6JSaeud2OUL5fLJn26ftEoc/oVo/TxCob0s2rUCLwjo27g+UYtwW8Y9YVg0ag5BCtGHSLYNmoTYdmoV4S7Rg0jbBl1jYFj1DoGG0b9I8obNZFo1aiTRGdG7eTKN+opw5pRYxkeGHWXuGTUYuJNoz4THxo1m1HWqOOMmkZtJ6kb9Z5kyWgCSNaNZoFkx5Be2zKaD66rJv2nHC8YzQzjPcP8X0MmGqcAeJxVWQ1cVFXav88595yLiMgMzAwaKsMwM4hIBMwMIiGySoqIpOQSKYuGSERoSuQaESohESISGhoZkZqZoZm6rZqhua6x1hq5amZmSkZmrln5AXMP73MH7bevP2fmzjD3zjnPx//juRJIUt9VKVoKkIgkuWIMAcMDwiT8R/FzVX6Ve0uDJZNkliSTzhxtNARwxTAcDAGyBXQx0U5HrJnqnBYTHvmZ/wH/PP7bmhc2fiS+uyB+37tmszh3LAdeef1d8Srlgz55s+LISFn/r+Zjt1iOuG/V81+rT6ldnK9ZBt74e1ySxGdKJrdLA6WR0v1SjBQnjZMkf7PO5lAcZoPF6aBmpyuGK2aiEHOI3WY3BrucLjNh9lB/s9MOZqfJbmZmhxnMJmUkcdmtiukoj63x7XE3sRWkc+CA1oHedH2A3iY62zu8jOeg0Ojn4+6ZrI4sII0z1IlchgDiJJSpHeRz4sRn7UGIs8dNnLIsjvQQbnenUMozM8vzRUV67uNThGgUB7wmFAXuWlQjJn5Uvx6mDc+AdcHy8e5uKI78sN1G6TtaPLU4r1V+4sOkcIyzWW+gVuawmnShJuZ02Z1mZlKszF/BpYfiK3VamXEwhA6CQppHqcgWbvjuU3Xez+JDmfBsups0MCYGiTuck32ECB0kqHcoXQ3x8AYDWbSpO902B8y/BbcJacE9JHBg3o45sgxv4OXiS5NycRtis/gVLzfkviIZ6N01hnndxzukoZhvjL1TsYdamF4X40fNBrOecUWiBGxT6Jjx7m14Dvm4CLxZG6PPuqvqqDE/Wn3mWLMaE5YIC8i3G0x
DKIVBoBO/iv82u7/rm5sEnLyhvikanJ7fEl8oJ/h6rD0JZAZmh8uMgRgB48CuhcTsHAdOu4maMQ7mwVpO7UwxG/hgMJoUyQpuGAAxcKS3ZY6wUPp9+V77xEdEKiWMPBTgPSXrgdVP0A+4TNw6ZZ78NedlvfPmyHXv9RVU1c0V3iIYxpaugZWNt/2qGaGP/p3z5wIGRNz3iRaVm2P0U029XvWE3Gxu+Do7aWITeJ27m0NJUobyEZJVkqy6WJeFK2DH/za7xWjQ2VyKP3WZcH0xThe4TP7UpvCpg9nQAAExCWJooD6Ie+cGPUfpsjr4NFanxBrgi4G3OJ+d6iOTF3YtjOh9gjE5Lzzs+aLP3QoWIs0j73Dee3bphNUxcgjn7mnqJgpap4rzyh7eJAVJFkliWDIys9lDraFWvT/GEcNnwj4xmAnIFNtFNknKlGKxR7wsTqsvhMaTh2CY6Bad1YWCjMZyGr8rY4r8jvjLn8W3kLtafaJ4HpAesJSvO/TRpXdSspd99PJLEAZDG7HQfgvtOdmo/iK+2kYviDNnSp4FgwcrxD5lBw/C3o2WXFrX6geDS3FYmcECXAkFoi2vv54UXJ2WaCxwreRxnUa7YtYa3cpUthvrlL7m3imON/Fd8SHqEnF1tzrTmEw2bnSfY1h83z4GJ4xc9SZEgTx2HyGymCM2e/nROi3Z9RBxrLcMTPQpHiQitvnJcib+7z2VkZoFjaKhLiF7KcTGBS3E7+ZXzYoiQIPHxMhydV5CXLWo4T7iFGOSJON+rnoBnyINkeKlqdLDuCMMrPN+MBvsCq5YZ4vBnTldlCs2MGtI078zk4KVGmoxaaWgV7B0HRTLI0bnp7hkbGy7Jwwup85q0cWYL88VPVh75Gi025kKZe/NEBeqGfDY+FhaCMkuEGufwlaGW2/QSFlWf+Nx8OzszhplMK5dncX5omBoapiQCglnCsTV61WiMCoSqopT1KE3bpAEqIKj02KhkvPu4+4JnFdM/C8cK8hJIB9e95blRGMeBZJvC+uktPca/iZc30lSCvPd5YRQWXwuLmxNO9Fac2G7uNmSmth4rhaMHWqa5Km7S0oX9uwYKR0RQu+PcGw06ayKhmT4bNa5TDqnMgIU22DQWfHVrHdpmGbWGU2ylSuyNZQSjJlLjwE1Ear3xzIn1GbX+5OhzT6Nsn7mJvH7z/Dko+Ld+2Dh5GlvBp+mdMrLy+HNAYhbVdiekIUHW0sbQRDf0AAhYTMHtMUVh9HOJft2NS9aV7f3w53VLc27zyQm3th3/FmEnMDtjR0FVCaklhCIk+Va8ObY5Yhf8+wYB3GM0sVDxDdHLk0dJ6rCCPfNnkTkAMgQoi2Hl2eKU+I8REBarCgR28UJcSoPjMDF531SchgEwwTI1mDBC2NzW7nOP5eSpBQpTZohPSbNkfKlIoQNrBAbw+g4MESI7g6XyRA6ApCuXBghq8mgQZzVpD0pRoPiNCH+YwEhBGKH2CyKP7hk7HHtMmYTNXlaHbC28FQXwiZ+y6r9SWd22JkJxhLs07nq6V/IcLHicfiP+AtsLw2B12dHIqBkTGIMwveKYIwGRP6G9aXGHyTkyXlwlPOa0uUINoUY40Feyyh9OR62QkiSqHHfIGGcq2fYwCyxSbQFhUOB3CQkvMKI77jaQyQhYUnmgcCoEhBRIprSurAq7OTqds6rkWEOcH4Af3HB/WWMTSqhtHq+I5TzaK90WSYVEVjQSGV/w+WN9GeDwjmno9yLlyzMqy5n8VMXKz0/la+blZCWRaPns2zOV4oHD6QPv2pbJ7wOyHKlFnutLq8o/+arUTk8pKEPhtYayjCCSKCmwcAwqhh3Az50ARaPdIiJfhDjqYGlJ+7MYJaYBlQ6u6KT+gPsGkmbEV5gqahUe+KwM9rEKvIEpRsFtg39k/oSqVAZCVZXXys9wPW7lpa48+R0aqv2wqaGVbBeTcD4NcaGqUK0/3mM6FbPBKYQLwhR0xnMGjZkejun1HcphuV
Ps9OavAYqFORaGZhwX5Qn5OsLLi6QZ19jLNCJ9F0rjmVPT94OcaIEGbxRnDufML0NJO97nERH8WzEXuQkX1AczlBXjImSeZFfz34OAqekyvIoNjpykak9ZG1yZc8jHrxGkFvCM6RA5I8IjFis0xUJjthEiNbEHVeGg58vBEeCK8DoUXm2YO4fYIx2xtpC+Mst5fNKn/70146Fi/PLN5Z3i28vV7RvW1Wz433xy864xpPrX/lmDXxVdTFqzK6C4r0F8/cuyP8g/oGL4tYXJSWdtTVb3llZseNdkl6wdt2JhtUa1vbNV27zQ6hAbdKD0gTckiOUmYwa0GqgaeVaOgYA1jcCLSZKhwCrpRKcLq5wZtde7B40QTlokgDPMMiMJsMNmPoi9vfynTD6Vn4fyo1rYnVYHEmBPHcJiRfpnEMsOSKyoDbqpBoAv/f+VxQdueLDb4rW0NfePFxoO7uvad/BX9OnNQohwqAFhnNxXVSxT+sxaemTave3vLaueufWT9ZMSj4IHT2R0F2u57xeBMpyPTIL2TrUlpkUktDY8qvPgGoIga0JYoPo4i+8kipyVd/MSdOT4/I8msstj+cWKRh3HmuzhHDFkQgx0bKWB19iMBusWMiYGlci0Ic3VuceeSTt0PJOkF4/eHhFLCXuTYtkefLWXXtflBc/v3ruYwenZX7X/m7PiqqSwrQ2zh8Iqtr/flnFoXsalCmdnEl6rUv6FadL0w+gs+jsLE5pQoo/rC7ifBZ0gLNUdIiMvQ+rYzl7bDIK4JVjK8SNM63XwKu3kCMF3b2mOKsc4o3SKKw/FIig8yg2ncYAoSYF4d/fjprRLmkKj+likOYNNjOTlEOxmSJIXIAODYOdwyyzxKTIDCzvs4T0qb+kO8kYGOr+agtpb4Z5BJaIk+JZSOyOzYJnyEPubp4VKfI1QbABcXwVkNn7hSzCUiPhSzKgEXVngWjivJExdRtKNty3VSnhT6GoDUeF4kRWT5ImSZ4laZVm1js0xHVgSGwGRY+oakLidoSamdFF9bh4vQ4B16AgsGh12P9Aykfwdd2P8OIi8STlkXGaIk7h3KVf4PYixLsGg9nFMwgJL+x9EBaIKOxwWfQ4E8+zAz3nZJmFIvBFwS6R3v9we8EuTbzjIaVpYgMDOqeLkNdjnxPPXnE/jmeXwGIES7L4c1kZq35GabEsjw4WO9Qi0ZacBGEkiNwIm5vz71k9eyAQWlDi5b4985/V9dWyfPH4PayQJ/F46T58EyAhFDqMwU4/LDrFX2NrOyPIO9GyPEl8Kc4++TpM/Oe/P2p4Zgz52ECIUf3Bf4Bx1U0YQrwuiSvjj+V/AGGbB1E9pc/fj1v9bvJ3/HdPP/spP/Oz2M+aTsaqABfymEsTRzo/LZZDkOxQNnlaFqub+bs0GQCeZ3q6YWb6nF9JHKV3rohzUROuk1NLxqur6evNO9r2r4FrYmzvqqKmspJtsyB3Q0HGznhem15cgfC83CiMaoJoTZ4EPqSaVDFWfMcNq8meKAxPBRa2XByLZi0hFmqJV4RoEsmRUIvh6LslTij7eRmuNUFK1qrCroloBBsUd6ipUUnrPObDoG3DZtdZ9FjQim00uCg6X1z0ODAxI9Ukj78dN6N9y7Nl24CuEnEEJsnyhmrfoVAJ5bG7WyFRluNET1b90oUVi9ah9iptDSFbqR/nzy+oxbfje33Eh8gMMDaHEFU6P+t8TUGdOJDbVVKQcxOiYUnd3CN+PqmpQQmNgXpxbN5IQi5mlJUuqi90R8AS2POoOH5q+uGUAfK/8PQuVWwd6F8/PFG89XeNlda/lyIvyttWUdogOvJ3FS6de68mHuUhkg++IRK16BEaNASWH/1N/Gg+boWAH1SJvAdmb7ZFzFRHiZ9ARw7AK+9h3/bPCUxKLT+N+b5PcyP+GLkYNLI6C8pBi05TzE5/qxk/0SvYNDpUNcpJSmR5OxK99yZ1w3a1dfZDhPLB6gaUA/mc9qwiOah4WzlllgsnxdJ
I98IiNZB0F3H+Fzj4rYh3wsElYp8vzF/56JF7uPaIcpi/onGZSRNCZv1AMGv5k/w1J6RDPtSkgKKZDEOoXdFjLxuwvX1pi5pAjiCOqNdffC88GYZu2uRIgkPbcSUj1VvkHKWqDKHYS/mpFxijBe469araaQCxATpFHRRfP5yMEhyORTRUwdax4hUUPc+NENGcb0A5I358KWedLGv6pK8LY2yVjFIILhf7jdAYZ0y03qA1oaTzk8zRJrjHtUgB9PZfKz/9jbDiXz69Ln46fVX8CnMgeHOOmvHWyoqWV1+u3MQmJIpW8cV/RM9Xl8Q5eBpmwAa4NNoN688eaGzZtfteXFR5APdD3JOYGc0pbj6G3ZXkeh1F0WmXuftjzs1EFdsPiiSNt8Jx06cQiL/eSPHdJYqKbBcC17CSZVsGivUwSDxM6VvJkscn9T3g5ce/lCKlKeiS/qwh6l3vaVM0a4d+ABEUJS4aJjz4Q3w5MEdOlwXBVeHaJy6nXfMAuESbSfEQCFGMzKU3KURrQLuTBsOoByKbIkYRdfxmdc9jWZA2KQPNPlQV/f6CqMSjq1VLffS1BeWCz3/sthwdfzO9zC9o7syaqKCm7FThRrnpV0KSsGarQzfRivIDpcu3q4nLtzZVbWF7ahvCn3+2DjvkzneQRcj8QIGuvF6Wy8vUsitkeqZfOcluuHjzCfdzoiXS12f6zEAvnxutR9EUD+85JrYhJCtDjYY5IqnozM2cPuloFoRczbjdnv1hfw7+Rq+w21rmXZ4M+5ktSDvBulhzCIq1mGijwY95hmoaIOe+caxlv3hs2ZN+8Iy49FZ907+OXhcdWz8RX/acW3nyudcrwdaKuQnoK/571r83ixN/lvWfrvuiT5reP7MT/1CO82WSzoO9kmY9qV5DrX4DrqdgNEmKFlA7MrETjAi5dnZxU93tczWfHf9BREUmwu22fbLayL1FcGlywqIcsukceFVtam6u4DNEk3pa7O+TbpeL78WbRxPTuhGRIki26rc8QXSmRzo3Zpay4p7CSjbrpDiTBbEnPdr8R8SHZdibD3sYF52/SZHRGvKBgALO7ikJLBOnv8WjE+69mB2g18AEQRb7V+knahM67Xs47PTgMA9U5aFkKuxAW9ir/hRE60EX4J2a8mFmi6+xomGbXh/Qnat2x0WBO/1iociMiIXyueKGajs1U1yW5U1p5+oZuxqliZAuxibGTqxaUlpYghKPFzUFsZfrkTgcszeKdv1UVG8fdWdmET4zUy/zfVegHZmlJN0JRSRKRIgjyYnIukuahBdMggplGfqj6oMdnK/IefxwfXxEfEN1fm117w0DvfawqD8zs9+39P3KylBHDdAUGbPYqZlaIGYAoCO2cAX7teyb+1aoz639hqSfgl6x4crAQWygb/foQX+BySKLD+r5Fp4klSOmptjUMurJ/znlE74Uc58hzdTyr/enMUZ/qs1Y7Kx/gGcxUSeGkHvKwKGz3FM72HnI/k4LIwq516eYFITtccA8DtZpRyg0SYzsuIZiBPUJp9ok5Uux6TXYmR4r4ta8T0jhAV991eHU8KJJ12R1OalgLCHrv5lrA73115bC3EDfQPf0vT7GttxKvdFH2B40+o+A9Zx7iQUzOMgPwEwILxXHRTONpOPVdjIlBFEudtqEFzG7pr60JJhBzC+pX8cWzUqeUBIbXjMphzQk0/g9AgVKIjaw+PaZNvzqon3iMk/L8atshqHk0dPdjO15HTOsGySKOiBoobjsmeeRvp/7zsn57EfsF0RI0DkVE9IVhsSu1Z2EkfGnnNDyV//akVd5EDlpNGOHoOoFYYRbslxwjf3oBrxsK2OisXODbfFXxImaAx6HF/aLN/t9qVDu8OEe3an5Uk1YmLQJttmIIHy3zlmMLiDGjLHuz8pICNY9Sv3CvdSZPItzd4+vjXZirN0+TWpmRBLJ3K0GyFJrA+ir9X5eabMC/IaiP9kieuiuEyA1ezX2SSdys9eps0TthFQ
IJ7n16o2NpYv2Bg0/mncMVV9/3fXIT/Gx6CA1J4D0E4wchEzk76cdmYMXgB8QuCp+6rkq1GMwBmJFpzhE3Mgxm92XxRIxB2pgmfoqWUxe8lwPyWAOT8Z9WvpRTq/jdo4XdMRKMdEeggux+/8PwZ2saDzxPSH5v3zSJ0Hgtz+g7N0pTr/19NMtq5Zu2bzy+U0/p2IXRRJyGHxOdYFVrBWbxQwR7ZCDmv/29rNvf31QG8X2XRA/K8dQ1w+TwlDV9+u3aZhJGoNSDLNIXagx794tuFfN/n9IT7MLlD+G2pgJQE3AFFM/7XiemdPKMSfgFdkRGybmck5yEgc84+vDhShq8ParX1jr5eMlgs2DIkgy6axf2FRaCbm9UZR8absB+UnhouX8cdEQmgR1J8VWGod+Ki39Rl2msyFezKqdm9yY7M7tBiORu4R8Rz2rFdIMh/uUZ+B/XZzgqZn6becgBSa8QQb07IM8qIxCEGqrGaUMSRLTj2QkgAzFYp24EJsKAXBEDa5B/VFqEycvhENAa4ToOh8NUkBkckZiVIqkoYzUN8zLG/k6AiOVIxVIT0ul0vPSCuklzXXbFYPFYVVQ6yr+WgNgrToNnhmV08480yltbGU0URtilA553l9xxngaBgXVQNCABKUfIvQIMGijq/83atGEgd1mVYx2po3DLIoRrRRmgGmfIwD1y2abXeuBEx98Ui8uTCdk5HzUYYRztZnkYWge5HzLMLilDUBf8YfRtg8o5TzsjrHjT/JUWQ5u8sqZO20iRqBDhst4HkH753xxpThIZH590VYvv9by4l7BEGFL8ZOx4uRgBmoPLGKse/J4QuliUQOl1P0iLEIVJOrog9n1FbmlNG5JZWJDIdwp5ryiglK/KkpJV0TYrIq36QTON01ApogPiGasEu2f72lV5WQDicUfEGU7UUQQ0huAav4beApax6kO3/x5gxGmKpawJy7KSZm+RZ1O2D5fHoYCJsmY43pYlp2+kFw1ffrwuTzcnZmXk1+Yx9qTE/ITncjvfTfER0o7cuko6REpW8uZJ3roq9GLjNBQxeGRdpp+YkZtoGX3TAtdNsqcLkyT04486hnS2j2Jc7oCTIqJBvSrME2XYbGb9SYnOUNQXHH15o9gJiQywYmJCEY3fQH3so0TeZVwfwdpabFi/0VxR7SEJ8E89fwC99pXVx1IRzV9yR5RvDZhTcHyxU/JaGGPZSz08S1NmLnyJb5QXZbdeZX8Ujnu6OgGQpJBwoRtaZ716OrcznDfJLHw9vuzs5IrPzr6clJ6yYc3oLpxVDjsYOyxKOAcKlGHluG6NGu9zBqamzCZqRUnMdWZM32ySdTo+yFeHDl74a72zVN28AptruLyDI+0m264b/1gCPWICaxGbZaEhWdeS664q9GPLJHlJYt3aB77x5SH8OU14f092RWUKQO7s0SmOTGL4bMVEI7sJ152dODLzKgKzL+7FWvCWaDBIEPMv6z8wHei3ouTJnh0sefugfb7/RNfFDSYIIvpDyGj00biGHREJaU/a7w/a2hiPB2heDoF/4oi2eo5a0TnTHENcxGIK1vFCPl41otYVt2UptQ5ixJzGwqqhtKrRRsbczdUbrgI2VFO0XT2sqgKi4NF57eVbCFB4ZOq0p3LyfLeNAbmmp3tCcknSHDR2SNL22A35ENuhTfUvrWH0sNogaxD0h6Z59ydmZFT01sgR4i2IzOhTXTt3QuRcWJL+4Hc+Fnl7ecakjKWH2+HFLJlUWN8TVl8bVHTBHFChIeD1+7s7t8Kbnju8d5Q/sHb0GtapSjE6z9JkzVNiI7E4ghFi21XqMXfjkdGw/+aBSvTU6tZh8bNqKHT3ahgULTbvgZnDJa/A8+2wpP4RZCb1Jpp4tKMiZOaZJm9thQQJ7qXt/oEfDh3qXi3930ugw+tVitLvLDPX1LULtoYEbZujJM+eTuDEfaZe/h7Y0kPP/yTWAHj5ZPCsnN/EiINGne3TKOW4mHyW+LBm/KEbN+
lBI27mg37CkQO1oFPEDIWT3hkfMLjCypzQkTFTkJa3OHbGNtG7t5TH6Y08U+kUOkBrA702/+rzglFzumvlP6shyK62kye+0f9/Wl0QAxapMEgR6V2djXUbGk9IzrDUiHx1JnOsjNtNTnu3YyRsJS0mry4YhiwpjY7lJRsvFZ4ykffumiDiGzifErIKPl70SDKREdn4Y3LYufVuPRrqLoiwZgtuq6ABPnWp2rKUlpK4mS5gVP44M4G4cZdTc8NIE2sSH1NfQYbw+P9zEor9lcQcu9D2v0gwGXiZpwu5xCwsj8WjfkLtWLO/JXQwaBNA1xUY4DB4LJbTQ6tIbQZAeohzSZa8c+gJdqkQEtEVFN8XHlYMO0iB2/XtXkFHi6oU1eoTctAplmouQ5//faopxhsIwjevcPqsB2kLUVRqLu3bBWYDbhkrnUvrEITmytmz4jGb60X8sHuU6kJ7Ye3Qiv1YuzOEaF9MyHfp4o1YOW4m8VbHyMgIc4fWkGq9hiwqQqHM7aTsS3e1YS8v+MYpfgqPzBZTcMqovzVZ6T/A9KIhe8AeJxjYGRgYABi02dz9OL5bb4ycHMwgMDNzE9zYPT/B/92cUiz3QJyORiYQKIAadkN3gAAAHicY2BkYOAo//uC4TMHw/8H/59ySDMARVCAPgC5qwd3eJzjYACCFAYGVlYGBg4GVMx+EVMMhlk2QjCMjS4HYf//jiwH1pOGqYd1IUzs/1tUs/8/gJrzCZsbADb6EhsAAAAAAAAADABEALQBBgE8AZYB2gIiAooDFgOkBGIE4AT6BUAFwAX2BiAGcgb2BygHgAgACCAIYgiyCO4JHgmwCeYKMAq+CuYLfguuC/oMIgxeDPYN1g5sDqgPNg+2EBwQonicY2BkYGDQZ+hi4GQAASYg5gJCBob/YD4DABuIAdkAeJx9ks1Kw0AUhU9sVWxFQcGVyqxEUFN/du5E0W6K0EWh3aXpTI2kmTAZCz6H7+DT+Azik4gn6VWpQjPk8t1zz525AwNgC+8IMPv2+M84wCazGS9hFcfCNWzgQrhOvhJeRhP3wivUB8INHOFBuIltvHCHoL7G7BKvwgH28SG8xN5P4Rp2g3XhOvlQeBk7wY3wCvWBcAO9YCrcxEHw1lDq2unI65EaPitjM38SR84l2rHSSWJnC2u86kdtnXT1+CmN3I9aifNZT7sisZk6C0/nC3c60+77mGI6PvfeKOPsRN3yTJ2mVuXOPurYhw/e55etlhE9jO2EcyuuazhoRPCMI+ZDPDMaWGTUThCz5rgS1p30dJjFzCwK/oY+hT59bXoSdBnHeEJadf73/joX1XrVeQWpnEThDCFOF3bcMWZV19/bFJhyonOqnj3l7codJqRbuafmtClZIa9qj1Ri6iFfUdmV8920uMwff0gXd/oCX7iEuAAAeJxtz8lWhDAQhWH+dmjneW61nWdtCATCkibJu7hx5zk+vp5clmbznQTqVlU2ynQm2f9nxogFFllimTErrLLGOhtsssU2O+yyxz4HHHLEMSeccsaEcy64ZMoV19xwyx33PPDIE8+88Mob73wwI8/4GX9/fYbcFzLMk0Vtk2Xbyd5L3yerUCetbWUfknWUTaU6VyjXGZNs8yiN8ttO9e3Qd26U31fl4HBvlOcL1Xund+9VF3Ll/w2QjLnmiNZJn77HYb9YmFZ2qU8si042w71L88Uq173SntEa/WerRtZBOuVazRNr7Rddqb7OhSz7Bdvma/oAAA=='
b = base64.b64decode(font_face)
with open('zt01.woff', 'wb') as f:
    f.write(b)
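The base64 string above was copied out of the page by hand. When scraping many pages it is handier to pull it out of the HTML automatically. Below is a minimal sketch, assuming the font is embedded in the @font-face rule as a `base64,...` data URI; the regex, the function name `extract_font_bytes`, and the sample HTML are all illustrative and not taken from the real page.

```python
import base64
import re

# Assumed shape of the embedded font: "...;base64,<data>)". The real page may differ.
FONT_RE = re.compile(r"base64,([A-Za-z0-9+/=]+)")


def extract_font_bytes(html):
    """Return the decoded font bytes embedded in html, or None if none is found."""
    match = FONT_RE.search(html)
    if match is None:
        return None
    return base64.b64decode(match.group(1))


# Stand-in page: "d09GRg==" is only the base64 of the 4-byte WOFF signature,
# not a usable font.
sample_html = (
    "<style>@font-face{font-family:stonefont;"
    "src:url(data:application/font-woff;charset=utf-8;base64,d09GRg==)}</style>"
)
font_bytes = extract_font_bytes(sample_html)
print(font_bytes)
```

With a real page, the returned bytes can be written to a .woff file exactly as in the example above.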
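Then open the woff file at http://fontstore.baidu.com/static/editor/index.html, as shown in the figure below.
Each character has a corresponding code, and the last 4 characters of that code match the codes in the page source.
We can parse the woff file with the fontTools library: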
# -*- coding: utf-8 -*-
from fontTools.ttLib import TTFont
font1 = TTFont('zt01.woff')  # open the local font file zt01.woff
uni_list = font1.getGlyphOrder()[2:]  # skip the first two glyphs
print(uni_list)
# Output:
# ['uniE0D1', 'uniE0EB', 'uniE165', 'uniE39A', 'uniE3CD', 'uniE3DC', 'uniE4E6', 'uniE559', 'uniE5CE', 'uniE6FE', 'uniE74A', 'uniE811', 'uniE822', 'uniE90F', 'uniE925', 'uniE9A9', 'uniE9EB', 'uniEB2C', 'uniEC43', 'uniEC4C', 'uniEC7A', 'uniED1F', 'uniED8C', 'uniEDDB', 'uniEE02', 'uniEE6F', 'uniEF0E', 'uniEF58', 'uniEFD2', 'uniF0EB', 'uniF129', 'uniF1A3', 'uniF31A', 'uniF373', 'uniF3A5', 'uniF403', 'uniF459', 'uniF52A', 'uniF547', 'uniF56E', 'uniF58B', 'uniF5DB', 'uniF625', 'uniF832', 'uniF88E']
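This gives us the codes seen in the page. After that we just build the mapping by hand. There are 45 entries in total, so it doesn't take long. The first two glyphs shown in the editor don't count.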
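To make the hand-built mapping concrete, here is a small sketch of how glyph names such as uniE0D1 can be turned into the characters that appear in the page source and then swapped for real text. The helper names are made up for illustration, only three glyphs are used, and the pairing with 'B', '男', '王' should be treated as an example rather than a verified mapping.

```python
import html


def glyph_name_to_char(name):
    """'uniE0D1' -> the single character U+E0D1."""
    return chr(int(name[3:], 16))


def build_code_map(uni_list, words):
    """Pair each glyph name with the character it renders as (matched by position)."""
    return {glyph_name_to_char(n): w for n, w in zip(uni_list, words)}


def decode(text, code_map):
    """Replace obfuscated characters; leave everything else untouched."""
    return "".join(code_map.get(ch, ch) for ch in text)


# Illustrative subset only; the real list has 45 entries.
uni_list = ['uniE0D1', 'uniE0EB', 'uniE165']
words = ['B', '男', '王']
code_map = build_code_map(uni_list, words)

# Assumes the page source carries the codes as HTML entities like &#xe0eb;
raw = "&#xe0eb; / &#xe165;"
print(decode(html.unescape(raw), code_map))
```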
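Think that's the end? Too young, too simple! If that were all, it would be far too easy.
I later noticed (well, I'd noticed long before [muttering]) that on every refresh the woff file is different, and the codes in the page source change too. So was everything above wasted? Not quite.
Here's the key part:
Save the woff files from any two requests of the page, dump them to XML, and analyze them statically.
Code to save as XML: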
from fontTools.ttLib import TTFont
font= TTFont("zt01.woff")
font.saveXML('zt01.xml')
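The XML file looks like this:
Heads up, this is the important bit!
Open both woff files in the web editor and find the same character in each. Take gender as the example, i.e. the character “男”.
In the woff 1 file, its code is
In the woff 2 file, its code is
Open the XML file for each of the two woff files and use the code to find the description of this glyph.
“男” in woff 1:
“男” in woff 2:
Looking at the pt values in the figures, the x and y of each pt tag minus the x and y of the previous pt tag give a fixed value, and the rule still holds after switching to another woff file. For “男” in woff 1, subtracting the first two pt values gives
(1786-198, 1575-1575) => (1588, 0). For woff 2 the result is (1822-234, 1611-1611) => (1588, 0).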
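The arithmetic is easy to check in a few lines. This sketch only uses the four points quoted above; the helper name `delta` is made up for illustration. It also shows why the rule holds: the second font's points are the first font's points shifted by the same offset, so the offset cancels out in the subtraction.

```python
def delta(p1, p2):
    """Difference between two consecutive outline points."""
    return (p2[0] - p1[0], p2[1] - p1[1])


# First two pt values of the glyph in each font, as quoted above.
woff1 = [(198, 1575), (1786, 1575)]
woff2 = [(234, 1611), (1822, 1611)]

key1 = delta(*woff1)
key2 = delta(*woff2)
shift = delta(woff1[0], woff2[0])  # how far the whole glyph moved between fonts

print(key1, key2, shift)
```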
With this rule we can build the mapping dictionary: the computed result is the key and the corresponding character is the value.
# -*- coding: utf-8 -*-
from fontTools.ttLib import TTFont
zt1 = TTFont("zt01.woff")
# words: the characters typed out in the order they appear in the web font editor
words = ['B', '男', '王', '大', '专', 'M', '女', '吴', '硕', '赵', '黄', '李', '1', '8', '经', '2', '下', '本', '届', '5', '应', '科', '7', '中', '生', '6', 'E', '陈', '3', '以', '杨', 'A', '张', '4', '无', '0', '9', '验', '博', '技', '士', '校', '高', '刘', '周']
uni_list = zt1.getGlyphNames()[1:-1]
data_map = dict()
for index, i in enumerate(uni_list):
    # outline points of the glyph; the first two are enough to build the key
    temp = zt1["glyf"][i].coordinates
    x1, y1 = temp[0]
    x2, y2 = temp[1]
    new = (x2-x1, y2-y1)
    data_map[new] = words[index]
print(data_map)
OK, the dictionary is done.
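One thing worth checking: the key uses only the first two points, and a plain dict silently overwrites an entry if two glyphs happen to produce the same key. The post does not say whether that ever happens with this font, so the following is just a defensive sketch; `build_data_map` and the sample keys are hypothetical.

```python
def build_data_map(keys, words):
    """Build key -> character, failing loudly if two characters share a key."""
    data_map = {}
    for key, word in zip(keys, words):
        if key in data_map and data_map[key] != word:
            raise ValueError(
                f"key {key} is shared by {data_map[key]!r} and {word!r}"
            )
        data_map[key] = word
    return data_map


# Hypothetical keys, for illustration only.
sample = build_data_map([(1588, 0), (0, -120)], ['男', '王'])
print(sample)
```

If a collision does show up, a longer key (for example the differences across the first three points) would be one way around it.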
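While scraping, we only need to extract the woff file from each fetched page, compute the key for every glyph whose name starts with uni, and look that key up in data_map to get the character. That gives us the font dictionary for the current page.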
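The per-page step described above might look roughly like this. It is a sketch under assumptions: the function names are invented, `load_glyph_coords` uses the same fontTools calls as the earlier examples (plus `numberOfContours` to skip empty or composite glyphs), and the sample glyph name and data_map entry are hypothetical stand-ins rather than values read from a real font.

```python
def glyph_key(coords):
    """Difference between the first two outline points of a glyph."""
    (x1, y1), (x2, y2) = coords[0], coords[1]
    return (x2 - x1, y2 - y1)


def build_page_map(glyph_coords, data_map):
    """Map each obfuscated character of the current page's font to its real text."""
    page_map = {}
    for name, coords in glyph_coords.items():
        if not name.startswith("uni") or len(coords) < 2:
            continue
        word = data_map.get(glyph_key(coords))
        if word is not None:
            page_map[chr(int(name[3:], 16))] = word
    return page_map


def load_glyph_coords(woff_path):
    """Read outline points from a woff file (requires fontTools)."""
    from fontTools.ttLib import TTFont
    font = TTFont(woff_path)
    glyf = font["glyf"]
    return {
        name: list(glyf[name].coordinates)
        for name in font.getGlyphOrder()
        if glyf[name].numberOfContours > 0  # skip empty and composite glyphs
    }


# Hypothetical data so the sketch runs without a font file.
data_map = {(1588, 0): '男'}
glyph_coords = {
    '.notdef': [(0, 0), (10, 10)],
    'uniF1A3': [(234, 1611), (1822, 1611)],
}
page_map = build_page_map(glyph_coords, data_map)
print(page_map)
```

On a real page, `build_page_map(load_glyph_coords('page.woff'), data_map)` would take the place of the hard-coded sample, where 'page.woff' is whatever file the current page's font was saved to.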
Verification
Full code: link