Decoding Dianping's Font Obfuscation

Latest recommended article published on 2024-07-25 15:36:20

qq_36532060

Tags: web scraping, java

Article link: https://blog.csdn.net/qq_36532060/article/details/121625636

Contents

Preface
Investigation
Scraper
Summary

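Preface

I recently had a requirement to scrape review content from Dianping, so I ended up dealing with this old friend once again. This post records the process.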

Investigation

Open the page of any KOL on Dianping and look at the reviews.
[image]
Then inspect the page elements.

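It is the same old font obfuscation, but the obfuscated code points cannot be seen directly in the element inspector. A global search for “现在很难” (a phrase taken from the visible review text) shows that the content is delivered through an API response.

There we can see that e7dd corresponds to 餐 and ec90 corresponds to 厅. The next step is to find out how the font mapping is applied and how to reverse it.
[image]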
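To make the observation above concrete, here is a minimal sketch that lists the hex codes of the obfuscated characters in a piece of response text. It assumes the obfuscated characters arrive as raw Private Use Area characters (U+E000 to U+F8FF), which is consistent with e7dd and ec90 but is not confirmed for every code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Collects the hex codes of Private Use Area characters found in a string. */
final class ObfuscatedCodeFinder {

    // Assumption of this sketch: every obfuscated character lies in the
    // BMP Private Use Area, like the two codes observed above.
    static boolean isPrivateUse(int codePoint) {
        return codePoint >= 0xE000 && codePoint <= 0xF8FF;
    }

    /** Returns lowercase hex codes in order of appearance, duplicates included. */
    static List<String> findCodes(String text) {
        List<String> codes = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isPrivateUse(c)) {
                codes.add(Integer.toHexString(c));
            }
        }
        return codes;
    }
}
```

The lowercase hex form matches the keys that the decoder later in this post works with.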
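Selecting an obfuscated element shows that its font is set by a CSS stylesheet. Opening that stylesheet shows that it references an eot file and a woff file; either one will do. The address of the woff file is
//s3plus.meituan.net/v1/mss_73a511b8f91f43d0bdae92584ea6330b/font/8063a325.woff (this address changes frequently)
After downloading it, the file can be opened with any of the usual font tools; I used FontLab. Opening it shows
[image]
that 厅 is indeed at ec90 and 餐 is at e7dd, so this is the right file. The next step is to write the processing code.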
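Because the font address changes frequently, it may be more practical to read it out of the stylesheet at scrape time than to hard-code it. The following is a minimal sketch under the assumption that the stylesheet contains an ordinary `url(...)` entry ending in `.woff`; the class name, the regular expression, and the choice of `https:` for protocol-relative addresses are all assumptions, not something taken from the site.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Pulls the first .woff address out of a stylesheet. */
final class FontUrlExtractor {

    // Matches url("...woff"), url('...woff') or url(...woff),
    // optionally followed by a query string. ".woff2" is not matched.
    private static final Pattern WOFF_URL = Pattern.compile(
            "url\\(\\s*[\"']?([^\"')\\s]+?\\.woff)(?:\\?[^\"')\\s]*)?[\"']?\\s*\\)");

    /** Returns the first .woff URL in the CSS text, or null if there is none. */
    static String firstWoffUrl(String css) {
        Matcher m = WOFF_URL.matcher(css);
        if (!m.find()) {
            return null;
        }
        String url = m.group(1);
        // Protocol-relative addresses (starting with //) need a scheme
        // before they can be downloaded; https is assumed here.
        return url.startsWith("//") ? "https:" + url : url;
    }
}
```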

Scraper

The main dependency is Google's sfntly typography package. The Maven dependency is as follows

		<dependency>
			<groupId>com.google.typography.font</groupId>
			<artifactId>sfntly</artifactId>
			<version>0.0.2-SNAPSHOT</version>
		</dependency>

Next comes the code. First, the file has to be loaded as a Font (com.google.typography.font.sfntly.Font)

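private Font[] loadFont(String filename) {
        // Only .ttf and .woff inputs are handled.
        if (!filename.endsWith(".ttf") && !filename.endsWith(".woff")) {
            LOGGER.error("Unsupported font file: " + filename);
            throw new IllegalArgumentException("Unsupported font file: " + filename);
        }
        FontFactory fontFactory = FontFactory.getInstance();
        fontFactory.fingerprintFont(true);
        try {
            InputStream inputStream = new FileInputStream(new File(filename));
            if (filename.endsWith(".woff")) {
                // loadFonts is given sfnt (TTF) data, so a WOFF stream is converted first.
                inputStream = woff2ttf(inputStream);
            }
            try {
                return fontFactory.loadFonts(inputStream);
            } finally {
                inputStream.close();
            }
        } catch (Exception e) {
            LOGGER.error("Failed to load font file " + filename, e);
            // Fail here rather than returning null, because callers index into the result.
            throw new IllegalStateException("Failed to load font file " + filename, e);
        }
    }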
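The `woff2ttf` helper called in `loadFont` is not listed in this post. As an illustration only, here is a minimal sketch of one way such a conversion could work for WOFF 1.0 files: it copies the table directory into sfnt layout and inflates the zlib-compressed tables. The class name `WoffToSfnt` and its byte-array signature are hypothetical, and the sketch does no bounds validation and ignores WOFF metadata and private data blocks.

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Rebuilds sfnt (TTF/OTF) bytes from WOFF 1.0 bytes. */
final class WoffToSfnt {
    private static final int WOFF_SIGNATURE = 0x774F4646; // "wOFF"
    private static final int WOFF_HEADER = 44;  // bytes
    private static final int WOFF_ENTRY = 20;   // bytes per table directory entry

    static byte[] convert(byte[] woff) {
        ByteBuffer in = ByteBuffer.wrap(woff); // big-endian by default, as in WOFF
        if (woff.length < WOFF_HEADER || in.getInt(0) != WOFF_SIGNATURE) {
            throw new IllegalArgumentException("not a WOFF 1.0 file");
        }
        int flavor = in.getInt(4);
        int numTables = in.getShort(12) & 0xFFFF;
        if (numTables == 0) {
            throw new IllegalArgumentException("WOFF file has no tables");
        }

        // First pass: decide where each table goes in the output (4-byte aligned).
        int[] outOffsets = new int[numTables];
        int total = 12 + 16 * numTables;
        for (int i = 0; i < numTables; i++) {
            int origLength = in.getInt(WOFF_HEADER + WOFF_ENTRY * i + 12);
            outOffsets[i] = total;
            total += (origLength + 3) & ~3;
        }

        // sfnt offset table: version, numTables, searchRange, entrySelector, rangeShift.
        ByteBuffer out = ByteBuffer.allocate(total);
        int pow2 = Integer.highestOneBit(numTables);
        out.putInt(flavor);
        out.putShort((short) numTables);
        out.putShort((short) (pow2 * 16));
        out.putShort((short) Integer.numberOfTrailingZeros(pow2));
        out.putShort((short) (numTables * 16 - pow2 * 16));

        // Second pass: write one table record each and copy or inflate the data.
        for (int i = 0; i < numTables; i++) {
            int entry = WOFF_HEADER + WOFF_ENTRY * i;
            int tag = in.getInt(entry);
            int offset = in.getInt(entry + 4);
            int compLength = in.getInt(entry + 8);
            int origLength = in.getInt(entry + 12);
            int checksum = in.getInt(entry + 16); // copied as stored
            out.putInt(tag).putInt(checksum).putInt(outOffsets[i]).putInt(origLength);
            if (compLength < origLength) {
                inflate(woff, offset, compLength, out.array(), outOffsets[i], origLength);
            } else {
                System.arraycopy(woff, offset, out.array(), outOffsets[i], origLength);
            }
        }
        return out.array();
    }

    private static void inflate(byte[] src, int srcOff, int srcLen,
                                byte[] dst, int dstOff, int dstLen) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(src, srcOff, srcLen);
            int n = 0;
            while (n < dstLen && !inflater.finished()) {
                int r = inflater.inflate(dst, dstOff + n, dstLen - n);
                if (r == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break;
                }
                n += r;
            }
            if (n != dstLen) {
                throw new IllegalArgumentException("truncated table data");
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib stream in WOFF table", e);
        } finally {
            inflater.end();
        }
    }
}
```

To fit the stream-based call in `loadFont`, the file bytes could be read fully, passed through `WoffToSfnt.convert`, and wrapped in a `java.io.ByteArrayInputStream`.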

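Here the woff2ttf function converts the input stream of a woff file into the input stream of a ttf file (it can be implemented in various ways). Once the Font object has been obtained, it is processed as follows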

import com.google.typography.font.sfntly.Font;
import com.google.typography.font.sfntly.Tag;
import com.google.typography.font.sfntly.table.core.CMapTable;
import com.google.typography.font.sfntly.table.truetype.GlyphTable;
import com.google.typography.font.sfntly.table.truetype.LocaTable;
import com.google.typography.font.sfntly.table.truetype.SimpleGlyph;
import org.apache.commons.lang3.StringUtils; // assumed: Apache Commons Lang 3

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Not shown in this listing: loadFont (listed earlier), getDicSimpleGlyph,
// buildUnicode2glyphMapFortampered, compareGlyph and the MAX constant.
public class FontDecoder {
    // Standard (dictionary) font and its tables
    private Font standardFont;
    private CMapTable standardCmapTable;
    private LocaTable standardLocaTable;
    private GlyphTable standardGlyphTable;
    // feature key -> standard glyphs that share that key
    private Map<String, List<GlyphWithStr>> feature2glyphsMapForStandard = new HashMap<>();
    // Tampered (site) font and its tables
    private Font tamperedFont;
    private CMapTable tamperedCmapTable;
    private LocaTable tamperedLocaTable;
    private GlyphTable tamperedGlyphTable;
    // hex code in the tampered font -> glyph
    private Map<String, SimpleGlyph> unicode2glyphMapFortampered = new HashMap<>();

    public FontDecoder(String standardFontFilename, String tamperedFontFilename) {
        setStandardFont(standardFontFilename);
        setTamperedFont(tamperedFontFilename);
    }

    public void setStandardFont(String filename) {
        Font[] fonts = loadFont(filename);
        standardFont = fonts[0];
        standardCmapTable = standardFont.getTable(Tag.cmap);
        if (standardFont.hasTable(Tag.loca) && standardFont.hasTable(Tag.glyf)) {
            standardLocaTable = standardFont.getTable(Tag.loca);
            standardGlyphTable = standardFont.getTable(Tag.glyf);
        } else {
            throw new IllegalArgumentException("Invalid dictionary font file " + filename);
        }
        buildFeature2glyphsMapForStandard();
    }

    public void setTamperedFont(String filename) {
        tamperedFont = loadFont(filename)[0];
        tamperedCmapTable = tamperedFont.getTable(Tag.cmap);
        tamperedLocaTable = tamperedFont.getTable(Tag.loca);
        tamperedGlyphTable = tamperedFont.getTable(Tag.glyf);
        buildUnicode2glyphMapFortampered();
    }

    // Index every glyph of the standard font by its feature key.
    private void buildFeature2glyphsMapForStandard() {
        for (int i = 0; i <= MAX; i++) {
            SimpleGlyph glyph = getDicSimpleGlyph(i);
            if (glyph != null) {
                String key = generateFeature(glyph);
                GlyphWithStr glyphWithStr = new GlyphWithStr(glyph, Integer.toString(i, 16));
                feature2glyphsMapForStandard
                        .computeIfAbsent(key, k -> new ArrayList<>())
                        .add(glyphWithStr);
            }
        }
    }

    // Feature key: the number of points in each contour, joined by commas.
    private String generateFeature(SimpleGlyph glyph) {
        List<Integer> pointCounts = new ArrayList<>();
        for (int j = 0; j < glyph.numberOfContours(); j++) {
            pointCounts.add(glyph.numberOfPoints(j));
        }
        return StringUtils.join(pointCounts, ",");
    }

    // Maps a hex code in the tampered font to the hex code of the same glyph
    // in the standard font, or null if no match is found.
    public String decode(String unicode) {
        SimpleGlyph glyph = unicode2glyphMapFortampered.get(unicode);
        if (glyph != null) {
            String key = generateFeature(glyph);
            List<GlyphWithStr> glyphWithStrs = feature2glyphsMapForStandard.get(key);
            if (glyphWithStrs == null) {
                return null; // no standard glyph shares this feature key
            }
            for (GlyphWithStr glyphWithStr : glyphWithStrs) {
                if (compareGlyph(glyph, glyphWithStr.getGlyph())) {
                    return glyphWithStr.getStr();
                }
            }
        }
        return null;
    }
}
public class GlyphWithStr {

    private SimpleGlyph glyph;
    private String str;

    public GlyphWithStr(SimpleGlyph glyph, String str) {
        this.glyph = glyph;
        this.str = str;
    }

    public SimpleGlyph getGlyph() {
        return glyph;
    }

    public void setGlyph(SimpleGlyph glyph) {
        this.glyph = glyph;
    }

    public String getStr() {
        return str;
    }

    public void setStr(String str) {
        this.str = str;
    }
}
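The matching idea in `FontDecoder` (bucket glyphs by a cheap feature key, then compare outlines only within the matching bucket) can be illustrated without sfntly. The sketch below uses a plain stand-in type, `Outline`, in place of `SimpleGlyph`, and it assumes that two glyphs match when their contour coordinates are exactly equal. The post does not show `compareGlyph`, so that equality rule is an assumption, and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Feature-key bucketing followed by exact outline comparison. */
final class OutlineMatcher {

    /** Stand-in for a glyph: points[c] holds contour c as x0, y0, x1, y1, ... */
    static final class Outline {
        final String code;     // hex code point of this glyph in its font
        final int[][] points;

        Outline(String code, int[][] points) {
            this.code = code;
            this.points = points;
        }
    }

    private final Map<String, List<Outline>> buckets = new HashMap<>();

    /** Same idea as generateFeature: per-contour point counts joined by commas. */
    static String feature(Outline o) {
        StringBuilder sb = new StringBuilder();
        for (int c = 0; c < o.points.length; c++) {
            if (c > 0) {
                sb.append(',');
            }
            sb.append(o.points[c].length / 2);
        }
        return sb.toString();
    }

    /** Registers a glyph of the standard font. */
    void addStandard(Outline o) {
        buckets.computeIfAbsent(feature(o), k -> new ArrayList<>()).add(o);
    }

    /** Returns the standard code whose outline equals the given one, or null. */
    String match(Outline tampered) {
        List<Outline> candidates = buckets.get(feature(tampered));
        if (candidates == null) {
            return null;
        }
        for (Outline candidate : candidates) {
            // Assumed comparison rule: identical coordinates in identical order.
            if (Arrays.deepEquals(candidate.points, tampered.points)) {
                return candidate.code;
            }
        }
        return null;
    }
}
```

The feature key keeps the expensive comparison confined to a handful of candidates instead of every glyph in the standard font.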

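The core of the approach is locating three tables: CMapTable, LocaTable and GlyphTable. To use FontDecoder, first specify two woff/ttf files: the standard dictionary font and the font currently in use
[image]
As shown, standardFontFilename is the woff file of a standard dictionary font that I use myself; its contents are as follows

tamperedFontFilename is the font file currently used by Dianping; its contents are as follows

decode then converts ec90 (the code for 厅 in the image below) into f3b9 (the code for 厅 in the image above)
[image]
That completes the preliminary investigation for the whole scraping task.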
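Since decode returns a code in the standard font rather than a readable character, turning a whole review into plain text needs one more lookup. Here is a minimal sketch of that last step, assuming a table from standard-font hex codes to real characters is available (for example f3b9 to 厅) and that obfuscated characters lie in the Private Use Area. The class name, the `Function` parameter standing in for `FontDecoder.decode`, and the table itself are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

/** Replaces obfuscated characters in a string using a two-step lookup. */
final class ReviewTextDecoder {

    /**
     * @param text          raw review text containing Private Use Area characters
     * @param decoder       maps a tampered-font hex code to a standard-font hex code
     * @param standardChars maps a standard-font hex code to the real character
     */
    static String decodeText(String text,
                             Function<String, String> decoder,
                             Map<String, String> standardChars) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String replacement = null;
            if (c >= 0xE000 && c <= 0xF8FF) {
                String standardCode = decoder.apply(Integer.toHexString(c));
                if (standardCode != null) {
                    replacement = standardChars.get(standardCode);
                }
            }
            // Characters that cannot be resolved are left unchanged.
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With a `FontDecoder` instance named `fontDecoder`, the second argument could be passed as `fontDecoder::decode`.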