Browser-Side Anti-Crawler Feature Collection: Font Detection

FserSuN

Article link: https://blog.csdn.net/Revivedsun/article/details/98261144

1 Background

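Today's anti-crawler systems mostly work by collecting a visitor's device features and behavior, then analyzing them on the back end to identify abnormal traffic and block crawlers.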

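Fonts are one of the important items collected, that is, which fonts are available in the visitor's browser. Front-end feature collection scripts often contain code like the following [1]. It applies different font styles to an element holding a piece of text, then compares the rendered width and height to decide whether a font exists.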

var Detector = function() {
    // A candidate font is compared against each of the three generic base fonts.
    // If its rendered size matches the baseline in all three cases, the font is treated as unavailable.
    var baseFonts = ['monospace', 'sans-serif', 'serif'];

    // We use "m" because it is among the widest characters, which amplifies width differences.
    // The trailing "lli" helps separate fonts that would otherwise measure the same.
    var testString = "mmmmmmmmmmlli";

    // We test at 72px. Any size should work, though a larger size presumably makes differences more visible.
    var testSize = '72px';

    var h = document.getElementsByTagName("body")[0];

    // create a SPAN in the document to get the width of the text we use to test
    var s = document.createElement("span");
    s.style.fontSize = testSize;
    s.innerHTML = testString;
    var defaultWidth = {};
    var defaultHeight = {};
    for (var index = 0; index < baseFonts.length; index++) {
        // record the rendered width and height for each of the three base fonts
        s.style.fontFamily = baseFonts[index];
        h.appendChild(s);
        defaultWidth[baseFonts[index]] = s.offsetWidth; // width for the base font
        defaultHeight[baseFonts[index]] = s.offsetHeight; // height for the base font
        h.removeChild(s);
    }

    function detect(font) {
        var detected = false;
        for (var index = 0; index < baseFonts.length; index++) {
            // candidate font first, with the base font as the fallback
            s.style.fontFamily = font + ',' + baseFonts[index];
            h.appendChild(s);
            // a size that differs from the baseline means the candidate font was actually used
            var matched = (s.offsetWidth != defaultWidth[baseFonts[index]] || s.offsetHeight != defaultHeight[baseFonts[index]]);
            h.removeChild(s);
            detected = detected || matched;
        }
        return detected;
    }

    this.detect = detect;
};
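
A minimal usage sketch of how a collection script might turn the detector into a feature: it probes a list of candidate font names and keeps the ones reported as present. The helper name collectAvailableFonts and the candidate list are assumptions made for illustration and are not part of the script in [1].

```javascript
// Hypothetical helper (not part of the script in [1]): filters a candidate
// list down to the fonts that the supplied detect function reports as present.
function collectAvailableFonts(candidates, detect) {
    var available = [];
    for (var i = 0; i < candidates.length; i++) {
        if (detect(candidates[i])) {
            available.push(candidates[i]);
        }
    }
    return available;
}

// Illustrative candidate list; a real script would likely probe many more names.
var candidateFonts = ['Arial', 'Courier New', 'Helvetica', 'Times New Roman'];

// In a browser, plug in the Detector defined above. Outside a browser
// (no document object) this branch is simply skipped.
if (typeof document !== 'undefined' && typeof Detector === 'function') {
    var detector = new Detector();
    console.log(collectAvailableFonts(candidateFonts, detector.detect));
}
```

Passing the detect function as a parameter keeps the filtering logic separate from the DOM measurement, so it can be exercised with a stub.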

2 How It Works

According to the research behind the script [1], the detection relies on the following two properties:

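CSS font priority: the font-family property accepts a list of fonts [2], ordered from highest to lowest priority. If a font does not exist, the browser tries the next font in the list.
Different metrics per font: at the same fontSize, text rendered with different fontFamily values ends up with a different width and height.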
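
The fallback rule can be sketched as a small pure function. This is a simplified model under stated assumptions, not how a browser is implemented: real browsers resolve fallback per character and consult the operating system's font list, whereas here the hypothetical Set named installed stands in for the fonts that are available.

```javascript
// Simplified model of font-family fallback: return the first family in the
// list that is installed. `installed` is a hypothetical Set standing in for
// the fonts actually available to the browser.
function resolveFontFamily(fontFamilyList, installed) {
    var families = fontFamilyList.split(',');
    for (var i = 0; i < families.length; i++) {
        // Strip whitespace and optional surrounding quotes.
        var name = families[i].trim().replace(/^["']|["']$/g, '');
        if (installed.has(name)) {
            return name;
        }
    }
    return null; // a real browser would fall back to its default font here
}

var installed = new Set(['Arial', 'serif']);
console.log(resolveFontFamily('Arial, serif', installed));      // Arial
console.log(resolveFontFamily('NoSuchFont, serif', installed)); // serif
```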

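As the code above shows, it first records the rendered size of the three base fonts, and the font to be tested is then passed in as an argument. During detection, the candidate font is placed first in the font list: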

    s.style.fontFamily = font + ',' + baseFonts[index];

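If the candidate font exists, the text is rendered with it, so its width or height differs from the same text rendered with the base font alone, and the font is reported as present. Conversely, if the candidate font does not exist, the browser falls back to the base font, the measured size equals the baseline, and the font is reported as absent. Three different generic base fonts are used so that a candidate whose metrics happen to coincide with one baseline can still be told apart by another.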
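
To illustrate why comparing against several baselines helps, the sketch below replaces the DOM measurement with a hypothetical metrics table. All numbers and the function names measure and detectWith are invented for illustration; only the comparison logic mirrors the script above.

```javascript
// Hypothetical metrics table: rendered size of the test string per font.
// The numbers are invented for illustration only.
var metrics = {
    'monospace': { width: 560, height: 82 },
    'sans-serif': { width: 612, height: 83 },
    'serif': { width: 598, height: 84 },
    'Times New Roman': { width: 598, height: 84 } // same size as the serif baseline
};

// Stand-in for measuring the span: the first family with known metrics wins,
// which mimics font-family fallback.
function measure(families) {
    for (var i = 0; i < families.length; i++) {
        if (Object.prototype.hasOwnProperty.call(metrics, families[i])) {
            return metrics[families[i]];
        }
    }
    throw new Error('no usable font in list');
}

// A font counts as detected if its size differs from at least one baseline.
function detectWith(font, baseFonts) {
    return baseFonts.some(function (base) {
        var probe = measure([font, base]);
        var baseline = measure([base]);
        return probe.width !== baseline.width || probe.height !== baseline.height;
    });
}

var allBases = ['monospace', 'sans-serif', 'serif'];
console.log(detectWith('Times New Roman', ['serif'])); // false: indistinguishable from serif
console.log(detectWith('Times New Roman', allBases));  // true: differs from monospace
console.log(detectWith('NoSuchFont', allBases));       // false: always falls back
```

With a single baseline, a font that renders exactly like that baseline would be missed; checking against all three makes such a miss less likely.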

3 References

[1] JavaScript font detection, https://gist.github.com/szepeviktor/d28dfcfc889fe61763f3
[2] Font priority in CSS (font-family), https://developer.mozilla.org/en-US/docs/Web/CSS/font-family
