Torrent File Format Analysis and Parser Implementation

YASinger

Last modified on 2023-09-22 23:49:24

Tags: java, network protocols, data analysis

First published on 2023-09-22 23:37:10

Article link: https://blog.csdn.net/YASinger/article/details/133150096

1.Torrent Overview

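A BitTorrent torrent file stores the metadata for a set of files. The format is defined by the BitTorrent protocol, and the file extension is usually ".torrent".

A .torrent file essentially consists of two parts: tracker information and file information. The tracker information is mainly the addresses of the tracker servers used during a BT download, plus settings for those trackers. The file information is computed from the target files, and the result is encoded using the Bencode rules of the BitTorrent protocol. The basic idea is to divide the files offered for download into virtual pieces of equal size, where the piece size is normally a power of two, 2^k bytes (the split is virtual, so no piece files are created on disk), and to write each piece's index information and hash checksum into the torrent file. The torrent file is therefore an "index" of the files to be downloaded.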

2.Torrent File Encoding

Torrent files are stored in Bencoding. Bencoding has four data types: string, integer, list, and dictionary. Lists and dictionaries are composite types.

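1. string: encoded as <string length>:<string>, where the length is the number of bytes.

For example, "hello" is encoded as: 5:hello

2. integer: encoded as i<integer>e.

For example, 1234 is encoded as: i1234e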

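3. list: encoded as l[item1][item2][item3][…]e, where each [item] may be any of the four types.

For example, ["hello","world",1234]
1. "hello" is encoded as: 5:hello
2. "world" is encoded as: 5:world
3. 1234 is encoded as: i1234e
4. Final encoding: l5:hello5:worldi1234ee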

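4. dictionary: encoded as d[key1][value1][key2][value2][…]e, where each [key] must be a string, the keys must appear in sorted order (compared as raw byte strings), and each [value] may be any of the four types.

For example, {"name":"jisen","coin":"btc","balance":1000}
1. "name":"jisen" is encoded as: 4:name5:jisen
2. "coin":"btc" is encoded as: 4:coin3:btc
3. "balance":1000 is encoded as: 7:balancei1000e
4. Final encoding, with entries sorted by key: d7:balancei1000e4:coin3:btc4:name5:jisene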
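The four rules above can be checked with a small encoder. The following is only a minimal sketch: the class name BencodeEncodeSketch is hypothetical and not part of the project, it returns a String for readability (a real encoder would write bytes), and it assumes ASCII keys so that Java's natural String order matches raw byte order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only: encodes Java values as Bencode text.
final class BencodeEncodeSketch {

    static String encode(Object value) {
        if (value instanceof String) {
            String s = (String) value;
            // The length prefix counts bytes, not characters.
            return s.getBytes(StandardCharsets.UTF_8).length + ":" + s;
        }
        if (value instanceof Integer || value instanceof Long) {
            return "i" + value + "e";
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("l");
            for (Object item : (List<?>) value) {
                sb.append(encode(item));
            }
            return sb.append('e').toString();
        }
        if (value instanceof Map) {
            // Copy into a TreeMap so the keys are emitted in sorted order
            // (assumption: ASCII keys, where String order equals byte order).
            TreeMap<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sorted.put((String) e.getKey(), e.getValue());
            }
            StringBuilder sb = new StringBuilder("d");
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                sb.append(encode(e.getKey())).append(encode(e.getValue()));
            }
            return sb.append('e').toString();
        }
        throw new IllegalArgumentException("Unsupported type: " + value);
    }

    public static void main(String[] args) {
        // prints l5:hello5:worldi1234ee
        System.out.println(encode(Arrays.asList("hello", "world", 1234)));

        Map<String, Object> dict = new LinkedHashMap<>();
        dict.put("name", "jisen");
        dict.put("coin", "btc");
        dict.put("balance", 1000);
        // prints d7:balancei1000e4:coin3:btc4:name5:jisene
        System.out.println(encode(dict));
    }
}
```

The dictionary is deliberately filled in unsorted order to show that the sorting happens at encoding time.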

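3.Writing the Bencoding Decoder

Once Bencoding is understood, parsing a torrent file in code is straightforward: read the torrent byte stream, determine the type of each value, and convert it accordingly. In the snippets below, fileData[] is the byte array holding the torrent file and offset is the current read position.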

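String decoding

private String readString(){
    int length = 0;
    StringBuilder string = new StringBuilder();
    // Parse the decimal length prefix up to ':'.
    while (fileData[offset] != ':') {
        length = length * 10 + fileData[offset] - '0';
        offset++;
    }
    offset++;  // the loop stops on ':', so skip past it
    while (length > 0) {
        // Mask with 0xFF so that bytes >= 0x80 are not sign-extended when cast to char.
        string.append((char) (fileData[offset] & 0xFF));
        length--;
        offset++;
    }
    return string.toString();    // strings have no terminator, so no extra skip is needed
}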

Integer decoding

private String readInt(){
    // The digits are returned as a String; callers convert them with Long.parseLong.
    StringBuilder string = new StringBuilder();
    while (fileData[offset] != 'e') {
        string.append((char) fileData[offset]);
        offset++;
    }
    offset++;   // the loop stops on 'e', so skip past it
    return string.toString();
}

List decoding

private List<Object> readList(){
    List<Object> list = new ArrayList<>();
    while (fileData[offset] != 'e') {
        switch (fileData[offset]) {
            case 'l':
                offset++;
                list.add(readList());
                break;
            case 'd':
                offset++;
                list.add(readMap());
                break;
            case 'i':
                offset++;
                list.add(readInt());
                break;
            case '9':
            case '8':
            case '7':
            case '6':
            case '5':
            case '4':
            case '3':
            case '2':
            case '1':
            case '0':       // readString needs the length prefix, so the first digit is not skipped
                list.add(readString());
                break;
            default:        // malformed input: fail instead of looping forever
                throw new IllegalStateException("Unexpected byte at offset " + offset);
        }
    }
    offset++;   // skip the closing 'e'
    return list;
}

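Dictionary decoding

Because the dictionary encoding d[key1][value1][key2][value2][…]e requires every [key] to be a string, a string decoded inside a dictionary may be either a [key] or a [value]. The variable key therefore serves as both a flag and a cache: if key is null, the string is a key and is stored in key; if key is not null, the string is a value, and the pair [key][string] is added to the dictionary.

private Map<String, Object> readMap(){
    Map<String, Object> map = new HashMap<>();
    // When key is null the next string is a key; otherwise it is a value.
    String key = null;
    // Read until the 'e' that closes this dictionary (or the end of the data).
    while (offset < fileData.length && fileData[offset] != 'e') {
        switch (fileData[offset]) {
            case 'l' :
                offset++;
                map.put(key, readList());
                key = null;
                break;
            case 'd' :
                offset++;
                map.put(key, readMap());
                key = null;
                break;
            case 'i' :
                offset++;
                map.put(key, readInt());
                key = null;
                break;
            case '9':
            case '8':
            case '7':
            case '6':
            case '5':
            case '4':
            case '3':
            case '2':
            case '1':
            case '0':       // readString needs the length prefix, so the first digit is not skipped
                String data = readString();
                // When key is null the string is a key; otherwise it is a value.
                if (key == null) {
                    key = data;
                } else {
                    map.put(key, data);
                    key = null;
                }
                break;
            default:        // malformed input: fail instead of looping forever
                throw new IllegalStateException("Unexpected byte at offset " + offset);
        }
    }
    offset++;   // skip the closing 'e'
    return map;
}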
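To see the four readers working together, here is a compact, self-contained variant. It is a sketch under stated assumptions rather than the project's actual class: the name BencodeDecodeSketch and the readValue dispatcher are hypothetical, integers are returned as Long instead of String, and each dictionary key is read directly with readString, so the key flag described above is not needed. Strings are decoded as ISO-8859-1 so that every byte maps to exactly one char.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration only; a compact variant of the decoder above.
final class BencodeDecodeSketch {
    private final byte[] fileData;
    private int offset = 0;

    private BencodeDecodeSketch(byte[] fileData) {
        this.fileData = fileData;
    }

    static Object decode(byte[] data) {
        return new BencodeDecodeSketch(data).readValue();
    }

    // Dispatches on the first byte of the next value.
    private Object readValue() {
        if (offset >= fileData.length) {
            throw new IllegalArgumentException("Unexpected end of data");
        }
        byte b = fileData[offset];
        if (b == 'i') { offset++; return readInt(); }
        if (b == 'l') { offset++; return readList(); }
        if (b == 'd') { offset++; return readMap(); }
        if (b >= '0' && b <= '9') { return readString(); }
        throw new IllegalArgumentException("Unexpected byte at offset " + offset);
    }

    private Long readInt() {
        int start = offset;
        while (fileData[offset] != 'e') offset++;
        String digits = new String(fileData, start, offset - start, StandardCharsets.US_ASCII);
        offset++; // skip 'e'
        return Long.parseLong(digits);
    }

    private String readString() {
        int length = 0;
        while (fileData[offset] != ':') {
            length = length * 10 + (fileData[offset] - '0');
            offset++;
        }
        offset++; // skip ':'
        // ISO-8859-1 maps each byte to one char, so binary values such as "pieces" survive.
        String s = new String(fileData, offset, length, StandardCharsets.ISO_8859_1);
        offset += length;
        return s;
    }

    private List<Object> readList() {
        List<Object> list = new ArrayList<>();
        while (fileData[offset] != 'e') list.add(readValue());
        offset++; // skip 'e'
        return list;
    }

    private Map<String, Object> readMap() {
        Map<String, Object> map = new LinkedHashMap<>();
        while (fileData[offset] != 'e') {
            String key = readString(); // keys are always strings
            map.put(key, readValue());
        }
        offset++; // skip 'e'
        return map;
    }

    public static void main(String[] args) {
        byte[] data = "d7:balancei1000e4:coin3:btc4:name5:jisene"
                .getBytes(StandardCharsets.US_ASCII);
        // prints {balance=1000, coin=btc, name=jisen}
        System.out.println(decode(data));
    }
}
```

Truncated input makes this sketch fail with an index exception rather than a descriptive error; a production parser would check bounds before each read.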

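4.Torrent File Structure

A torrent file stores its information as a dictionary with the following structure:

{

"announce":[string] URL of the tracker server,

"announce-list" (optional):[list] list of backup tracker servers,

"creation date" (optional):[integer] creation time of the torrent in Unix time, that is, the number of seconds from 1970-01-01 00:00:00 UTC to the creation time,

"comment" (optional):[string] remarks,

"created by" (optional):[string] information about the person or program that created the torrent,

"info":[dictionary] the main information about the files; it takes one of two forms, single-file or multi-file

}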
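As a small illustration of the "creation date" field, the sketch below converts the decoded value into a java.time.Instant. It assumes the value arrives as a decimal string, as readInt returns it; the class and method names are hypothetical.

```java
import java.time.Instant;

// Hypothetical helper for illustration only.
final class CreationDateSketch {

    // Converts the decoded "creation date" (seconds since the Unix epoch) to an Instant.
    static Instant toInstant(String creationDate) {
        return Instant.ofEpochSecond(Long.parseLong(creationDate));
    }

    public static void main(String[] args) {
        // prints 1970-01-01T00:00:00Z
        System.out.println(toInstant("0"));
    }
}
```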

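The single-file info structure is:

"info":{

"name":[string] file name,

"length":[integer] file length in bytes,

"piece length":[integer] piece size in bytes,

"pieces":[string] the 20-byte SHA-1 hashes of all pieces, concatenated

}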

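The multi-file info structure is:

"info":{

"name":[string] directory name,

"files":[list] one dictionary per file, each of the form {

"length":[integer] file length in bytes,

"path":[list] file path plus file name, e.g. ["dir1","dir2","filename"]

},

"piece length":[integer] piece size in bytes,

"pieces":[string] the 20-byte SHA-1 hashes of all pieces, concatenated

}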
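The relationship between the total length, "piece length", and "pieces" can be made concrete with a short sketch. It assumes the raw bytes of "pieces" are available as a byte[] (if the value was decoded into a String one byte per char, getBytes(StandardCharsets.ISO_8859_1) should recover them). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class PiecesSketch {
    static final int SHA1_LENGTH = 20; // a SHA-1 digest is 20 bytes

    // Number of pieces needed to cover totalLength bytes (ceiling division).
    static long pieceCount(long totalLength, long pieceLength) {
        return (totalLength + pieceLength - 1) / pieceLength;
    }

    // Splits the raw "pieces" bytes into one lowercase hex string per piece.
    static List<String> splitPieces(byte[] pieces) {
        if (pieces.length % SHA1_LENGTH != 0) {
            throw new IllegalArgumentException("pieces length must be a multiple of 20");
        }
        List<String> hashes = new ArrayList<>();
        for (int i = 0; i < pieces.length; i += SHA1_LENGTH) {
            StringBuilder hex = new StringBuilder();
            for (int j = i; j < i + SHA1_LENGTH; j++) {
                hex.append(String.format("%02x", pieces[j] & 0xFF));
            }
            hashes.add(hex.toString());
        }
        return hashes;
    }

    public static void main(String[] args) {
        // A 1,000,000-byte file with 256 KiB pieces needs 4 pieces.
        System.out.println(pieceCount(1_000_000L, 262_144L)); // prints 4
        // 40 bytes of "pieces" therefore describe 2 pieces.
        System.out.println(splitPieces(new byte[40]).size()); // prints 2
    }
}
```

In a well-formed torrent, the number of hashes in "pieces" should equal pieceCount applied to the total length of all files.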

5.Parsing the Torrent File

After Bencoding decoding we end up with a single dictionary. We only need to look up keys such as "announce" and "info" and interpret their values according to the torrent structure.

    @SuppressWarnings("unchecked")
    private void analyze(Map<String, Object> torrentFile){
        announce = (String) torrentFile.get("announce");
        // "announce-list" is a list of tiers; keep the first tracker of each tier.
        announceList = new ArrayList<>();
        List<List<String>> announceListL = (List<List<String>>) torrentFile.get("announce-list");
        if (announceListL != null && !announceListL.isEmpty()) {
            announceListL.forEach(announceL->{
                announceList.add(announceL.get(0));
            });
        }

        comment = (String) torrentFile.get("comment");
        creatBy = (String) torrentFile.get("created by");   // the key is "created by"

        Map<String,Object> info = (Map<String, Object>) torrentFile.get("info");
        pieces = (String) info.get("pieces");
        name = (String) info.get("name");
        // readInt returns integers as strings, so they are parsed here.
        pieceLength = Long.parseLong((String) info.get("piece length"));

        // "files" is present only in the multi-file structure.
        List<Map<String,Object>> filesMap = (List<Map<String, Object>>) info.get("files");
        if (filesMap == null || filesMap.isEmpty()) {
            Length = Long.parseLong((String) info.get("length"));
            flag = "single-file";
            return;
        } else {
            flag = "multi-file";
        }
        // The total length is the sum of the lengths of all files.
        Length = 0L;
        files = new ArrayList<>();
        filesMap.forEach(fileMap->{
            Length += Long.parseLong((String) fileMap.get("length"));
            files.add(new File((List<String>)fileMap.get("path"), (String) fileMap.get("length")));
        });
    }

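6.Generating the Magnet Link

A typical magnet link looks like this:

magnet:?xt=urn:btih:9bf6814dcd6b7de3afd71deeb7369fd8d8eed6b7&tr=udp://open.demonii.com:1337/announce

Here "xt" is short for "exact topic"; "btih" (BitTorrent Info Hash) names the hash method; "tr" is short for tracker and gives the address of the tracker server. The 40 hexadecimal digits are the SHA-1 hash (in hex) of the bencoded dictionary stored under "info" in the torrent file.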

From the above we get:

magnet = 'magnet:?xt=urn:btih:'+Hex(Sha1(info))+'&tr='+encode(announce)

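How do we obtain info? While decoding the dictionary, record the start offset when entering the value of key="info" and the end offset when leaving it, then pass the bytes between the two offsets to the SHA-1 function. The hashed range must cover the complete bencoded dictionary, from its leading 'd' through its closing 'e'. Since the value of "info" is a dictionary, only the dictionary case (case 'd':) in the dictionary decoding function needs to change:

case 'd' :
    // Record the position of the leading 'd' before skipping it.
    if ("info".equals(key)) startInfo = offset;
    offset++;
    map.put(key, readMap());
    // readMap() has consumed the closing 'e', so endInfo is exclusive.
    if ("info".equals(key)) endInfo = offset;
    key = null;
    break;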
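The hashing step itself can be sketched as follows. This is an illustration under assumptions: startInfo is taken to be the index of the info dictionary's leading 'd' and endInfo the index just past its closing 'e'; the class and method names are hypothetical, and the sample input is a made-up torrent skeleton.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class InfoHashSketch {

    // Hashes fileData[startInfo, endInfo) with SHA-1 and returns 40 lowercase hex digits.
    static String infoHash(byte[] fileData, int startInfo, int endInfo) {
        byte[] info = Arrays.copyOfRange(fileData, startInfo, endInfo);
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(info);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on the Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "d8:announce3:abc4:infod4:name1:aee";
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        int startInfo = text.indexOf("4:info") + "4:info".length(); // index of the inner 'd'
        int endInfo = text.length() - 1; // just past the inner dictionary's closing 'e'
        // Hashes exactly the bytes "d4:name1:ae".
        System.out.println(infoHash(data, startInfo, endInfo));
    }
}
```

Hashing the original bytes, instead of re-encoding the decoded map, avoids any difference in key order or integer formatting changing the hash.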

A SHA-1 helper function:

 // Requires: import java.security.MessageDigest;
 public static String getSha1(byte[] data) {
        char[] hexDigits = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
        try {
            MessageDigest mdTemp = MessageDigest.getInstance("SHA-1");
            mdTemp.update(data);
            byte[] md = mdTemp.digest();
            char[] buf = new char[md.length * 2];
            int k = 0;
            for (byte b : md) {
                buf[k++] = hexDigits[(b >>> 4) & 0xf];   // high nibble
                buf[k++] = hexDigits[b & 0xf];           // low nibble
            }
            return new String(buf);
        } catch (Exception e) {
            // Not expected in practice: SHA-1 is available on every Java platform.
            return null;
        }
    }

Finally, simple string concatenation yields the magnet link:

public String getMagnet() {
        return "magnet:?xt=urn:btih:"+hash+"&tr="+announce;
    }
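The formula earlier in this section applies encode() to the tracker address. A minimal sketch of that variant is shown below, assuming the standard java.net.URLEncoder is an acceptable choice for the encoding step (it performs form encoding, so a space would become '+'); the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only.
final class MagnetSketch {

    // Builds a magnet link; the tracker parameter is omitted when announce is null.
    static String buildMagnet(String infoHashHex, String announce) {
        String magnet = "magnet:?xt=urn:btih:" + infoHashHex;
        if (announce == null) {
            return magnet;
        }
        try {
            // Percent-encodes characters such as ':' and '/' in the tracker URL.
            return magnet + "&tr=" + URLEncoder.encode(announce, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints magnet:?xt=urn:btih:9bf6814dcd6b7de3afd71deeb7369fd8d8eed6b7&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce
        System.out.println(buildMagnet(
                "9bf6814dcd6b7de3afd71deeb7369fd8d8eed6b7",
                "udp://open.demonii.com:1337/announce"));
    }
}
```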

7.Implementation

Project repository

https://github.com/YASinger/torrentAnalyser

JAR dependency

https://github.com/YASinger/torrentAnalyser/releases

Usage:

 public void test1() throws Exception {
        TorrentFile torrentFile = new TorrentFile("G:/文件/Hanamonogatari1080p.torrent");
        System.out.println(torrentFile.getHash());
        System.out.println(torrentFile.getMagnet());
        System.out.println(torrentFile.getName());
        System.out.println(torrentFile.getComment());
        System.out.println(torrentFile.getFlag());
        System.out.println(torrentFile.getCreatBy());
        System.out.println(torrentFile.getLength());
        System.out.println(torrentFile.getFiles());
        System.out.println(torrentFile.getAnnounce());
        System.out.println(torrentFile.getAnnounceList());
    }