P2P: how torrent, tracker, magnet, and DHT work

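1. Parsing the torrent file format

A torrent file contains the following data:

  • announce - the tracker URL
  • info - maps to a dictionary whose keys depend on whether one file or several files are shared:
    • name - suggested name for the saved file or directory
    • piece length - number of bytes in each piece, commonly 2^18 = 256 KiB = 262144 B
    • pieces - the concatenation of the SHA-1 hashes of all pieces. SHA-1 returns a 160-bit (20-byte) hash, so pieces is a byte string whose length is a multiple of 20 bytes. In addition, info holds either length (a single file is shared) or files (several files are shared):
    • length - size of the file in bytes
    • files - a list of dictionaries (one per file), each with the following keys:
      • path - a list of strings naming the subdirectories; the last item is the actual file name
      • length - size of the file in bytes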

d8:announce30:http://localhost:8000/announce10:created by14:uTorrent/3.5.513:creation datei1559010849e8:encoding5:UTF-84:infod6:lengthi5533498e4:name11:Charlie.mp312:piece lengthi16384e6:pieces6760:
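
The numbers in the sample metainfo above can be cross-checked with a little arithmetic. The sketch below copies length and piece length from that sample and derives how many pieces there are and how long the pieces string should be; the variable names are only illustrative.

```python
# Values copied from the sample metainfo above.
length = 5533498        # "length": file size in bytes
piece_length = 16384    # "piece length": bytes per piece
SHA1_SIZE = 20          # a SHA-1 digest is 160 bits = 20 bytes

# The last piece may be shorter than the others, so round up.
num_pieces = (length + piece_length - 1) // piece_length
pieces_size = num_pieces * SHA1_SIZE
last_piece_size = length - (num_pieces - 1) * piece_length

print(num_pieces, pieces_size, last_piece_size)  # 338 6760 12090
```

The result of 6760 bytes agrees with the "6:pieces6760:" prefix in the sample.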

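Under the BitTorrent protocol, the publisher generates a torrent file from the files to be shared. To download the content, a downloader first obtains the corresponding torrent file and then uses a BT client to download.

When downloading, the BT client first parses the torrent file to get the tracker address and then connects to the tracker server.

Each time the downloader obtains a piece, it computes the SHA-1 hash of that piece and compares it with the value stored in the torrent file. If the two match, the piece is correct; otherwise the piece has to be downloaded again. This rule ensures the integrity of the downloaded content.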
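
The piece check described above can be sketched as follows. This is a minimal illustration, assuming pieces is the raw byte string from the info dictionary; the function name piece_is_valid and the toy data are made up for the example.

```python
import hashlib

SHA1_SIZE = 20  # bytes per SHA-1 digest inside "pieces"

def piece_is_valid(index: int, data: bytes, pieces: bytes) -> bool:
    """Compare the SHA-1 of a downloaded piece with the digest stored at `index`."""
    expected = pieces[index * SHA1_SIZE:(index + 1) * SHA1_SIZE]
    # A short slice means `index` is out of range for this torrent.
    return len(expected) == SHA1_SIZE and hashlib.sha1(data).digest() == expected

# Toy data standing in for a two-piece torrent (hypothetical content).
piece0 = b"a" * 16
piece1 = b"b" * 16
pieces = hashlib.sha1(piece0).digest() + hashlib.sha1(piece1).digest()

print(piece_is_valid(0, piece0, pieces))  # True
print(piece_is_valid(1, piece0, pieces))  # False: wrong data for piece 1
```

A client would discard the data and request the piece again whenever this check returns False.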

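1.1 Bencoding and the torrent file format

Bencoding (B-encoding for short) is very common in the BitTorrent protocol. As the four figures above show, the KRPC protocol sends its messages in bencoding, and bencoding also appears in PEX and the BitTorrent extensions described later. More importantly, a BT torrent file is itself a bencoded dictionary, so learning BitTorrent starts with learning bencoding. Bencoding is a very compact data format that supports four types: byte strings, integers, lists, and dictionaries.

  • string: format is <length>:<string>, where the length is counted in bytes. E.g. hell is encoded as 4:hell
  • integer: format is i<integer>e. E.g. i1999e represents the number 1999
  • list: format is l[item1][item2][item3][…]e. E.g. l5:hello5:worldi101ee represents the list [hello, world, 101]
  • dictionary: format is d[key1][value1][key2][value2][…]e, where every key must be a string and the keys must appear in sorted order (compared as raw byte strings). E.g. d2:aai100e2:bb2:bb2:cci200ee represents the dictionary {aa:100, bb:bb, cc:200}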
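
The four rules above are enough to write a small decoder. The following is a minimal sketch, not a hardened parser: the names bdecode and _decode are made up here, strings are returned as Python bytes, and checks such as key ordering or leading zeros are left out.

```python
def bdecode(data: bytes):
    """Decode one bencoded value and require that it spans all of `data`."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value

def _decode(data: bytes, i: int):
    """Return (value, index just past the value) for the value starting at i."""
    lead = data[i:i + 1]
    if lead == b"i":                        # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":                        # list: l<items>e
        items, i = [], i + 1
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":                        # dictionary: d<key><value>...e
        result, i = {}, i + 1
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            if not isinstance(key, bytes):
                raise ValueError("dictionary keys must be byte strings")
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if lead.isdigit():                      # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        if end > len(data):
            raise ValueError("truncated byte string")
        return data[start:end], end
    raise ValueError(f"unexpected byte at offset {i}")

print(bdecode(b"l5:hello5:worldi101ee"))    # [b'hello', b'world', 101]
```

Returning bytes rather than str is a deliberate choice in this sketch, because values such as pieces are binary and cannot be decoded as text.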

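A BT torrent file as a whole is one dictionary. Its most important keys are announce (the tracker server address), announce-list (an optional list of further tracker addresses), creation date (when the file was created), created by (the program that created it), and info (the description of the shared files). The value of info differs between single-file and multi-file torrents. piece length (the length of each piece) and pieces (the SHA-1 checksums of all pieces) are common to both. A single-file torrent additionally has the keys name (the file name) and length (the file length). A multi-file torrent additionally has the keys name (the directory name) and files (the file list, where each entry is a dictionary describing one file).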
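
The single-file and multi-file layouts can be handled by one small helper. This is a sketch that assumes info has already been decoded into a Python dict with bytes keys; the function name list_files and the multi-file sample are hypothetical, while the single-file values come from the sample metainfo in section 1.

```python
def list_files(info: dict) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) pairs for a decoded info dictionary."""
    name = info[b"name"].decode("utf-8")
    if b"files" in info:
        # Multi-file torrent: name is the top-level directory.
        return [
            ("/".join([name] + [part.decode("utf-8") for part in f[b"path"]]),
             f[b"length"])
            for f in info[b"files"]
        ]
    # Single-file torrent: name is the file itself.
    return [(name, info[b"length"])]

single = {b"name": b"Charlie.mp3", b"length": 5533498, b"piece length": 16384}
multi = {  # hypothetical multi-file layout, for illustration only
    b"name": b"album",
    b"files": [
        {b"path": [b"cd1", b"track01.mp3"], b"length": 100},
        {b"path": [b"cover.jpg"], b"length": 20},
    ],
}

print(list_files(single))  # [('Charlie.mp3', 5533498)]
print(list_files(multi))   # [('album/cd1/track01.mp3', 100), ('album/cover.jpg', 20)]
```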

2. Magnet links

magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C

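A magnet link does not identify a document by its IP address or locator. Instead, the document is identified, searched for, and downloaded by its hash value in a distributed database. Because the download does not depend on one particular host being online, magnet links are especially suitable for peer-to-peer networks without a central server.

The most common use of a magnet link is to link to a specific file by the hash of its content, which yields a unique file identifier, similar to an ISBN. Unlike conventional identifiers, a content hash can be generated by anyone who holds the file, so no central authority is needed. This is why magnet links are often used as search keys in file sharing: anyone can distribute a magnet link and be sure that the resource it points to is the one intended, regardless of how that resource is obtained.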
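
To make the structure of the sample link in this section concrete, here is a small sketch that splits it into its parts with the standard urllib.parse module. The function name parse_magnet is made up, and only the xt parameter with a 160-bit hash (Base32 or hex text) is handled.

```python
import base64
from urllib.parse import parse_qs, urlparse

def parse_magnet(uri: str) -> tuple[str, bytes]:
    """Return (hash kind, raw digest) from the xt parameter of a magnet URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")
    xt = parse_qs(parsed.query)["xt"][0]        # e.g. "urn:sha1:<hash>"
    prefix, kind, value = xt.split(":", 2)
    if prefix != "urn":
        raise ValueError("xt must be a URN")
    if len(value) == 32:                        # Base32 text of a 160-bit hash
        digest = base64.b32decode(value.upper())
    elif len(value) == 40:                      # hex text of a 160-bit hash
        digest = bytes.fromhex(value)
    else:
        raise ValueError("unsupported hash length")
    return kind, digest

kind, digest = parse_magnet("magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C")
print(kind, len(digest))  # sha1 20
```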

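A magnet link can be generated from the info dictionary of a torrent file and is used together with the DHT protocol. The client first requests the torrent metadata from nodes in the DHT network, then parses that metadata and downloads the required files.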
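
Generating a link from info can be sketched as below. Two assumptions go beyond the text: BitTorrent clients conventionally write the info hash as urn:btih: (the sample link in this section uses urn:sha1: instead), and the toy info dictionary is invented for the example. The helper names bencode and magnet_from_info are also made up.

```python
import hashlib

def bencode(value) -> bytes:
    """Encode ints, bytes, lists, and dicts with bytes keys as bencoding."""
    if isinstance(value, bool):
        raise TypeError("bool is not a bencoding type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        out = b"d"
        for key in sorted(value):               # keys sorted as raw bytes
            out += bencode(key) + bencode(value[key])
        return out + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def magnet_from_info(info: dict) -> str:
    """Build a magnet link from the SHA-1 of the bencoded info dictionary."""
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    return "magnet:?xt=urn:btih:" + info_hash

# Hypothetical one-piece torrent, for illustration only.
info = {
    b"name": b"demo.bin",
    b"length": 16,
    b"piece length": 16,
    b"pieces": hashlib.sha1(b"a" * 16).digest(),
}
print(magnet_from_info(info))
```

When working from a real torrent file it is safer to hash the original bytes of the info value as they appear in the file, since re-encoding a dictionary that was not in canonical form would change the hash.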

 

3. Tracker protocol

 

3.1 The scrape request

Used to check whether a tracker is working.

request:

/scrape?info_hash=%a1%12%99%25%ff%be%ba%96U-%aeT%22%9d%09J%e7%20%ec%84

response:

files="[{a1129925ffbeba96552dae54229d094ae720ec84 0 0 0}]"
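
The info_hash in the request above is the same 20 bytes that the response prints in hex; the request simply percent-encodes every byte that is not URL-safe. A short sketch with the standard urllib.parse module shows the round trip (the variable names are illustrative):

```python
from urllib.parse import quote, unquote_to_bytes

# info_hash exactly as it appears in the scrape request above.
encoded = "%a1%12%99%25%ff%be%ba%96U-%aeT%22%9d%09J%e7%20%ec%84"

info_hash = unquote_to_bytes(encoded)
print(len(info_hash), info_hash.hex())
# 20 a1129925ffbeba96552dae54229d094ae720ec84

# Going the other way: percent-encode the raw bytes for a request path.
# quote() emits upper-case hex digits, which decodes to the same bytes.
scrape_path = "/scrape?info_hash=" + quote(info_hash, safe="")
print(scrape_path)
```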

 

3.2 The announce request

1.  create

request:

/announce?info_hash=%a1%12%99%25%ff%be%ba%96U-%aeT%22%9d%09J%e7%20%ec%84&peer_id=-UT355S-%af%b0%fa%c6%ef%fc%d6%aa%27%08%9d.&port=61362&uploaded=0&downloaded=0&left=0&corrupt=0&key=CFC639D5&event=started&numwant=200&compact=1&no_peer_id=1

response:

compact=true complete=1 interval=30m0s ipv4Peers="[2d5554333535532dafb0fac6effcd6aa27089d2e@[127.0.0.1]:61362]" ipv6Peers="[]" minInterval=15m0s

2. update

request:

/announce?info_hash=%a1%12%99%25%ff%be%ba%96U-%aeT%22%9d%09J%e7%20%ec%84&peer_id=-UT355S-%af%b0%fa%c6%ef%fc%d6%aa%27%08%9d.&port=61362&uploaded=0&downloaded=0&left=0&corrupt=0&key=CFC639D5&numwant=200&compact=1&no_peer_id=1

response:

compact=true complete=2 interval=30m0s ipv4Peers="[2d5554333535532dafb0fac6effcd6aa27089d2e@[127.0.0.1]:61362]" ipv6Peers="[]" minInterval=15m0s

3. download

request:

/announce?info_hash=%a1%12%99%25%ff%be%ba%96U-%aeT%22%9d%09J%e7%20%ec%84&peer_id=-UT355S-%af%b0%fa%c6%ef%fc%d6%aa%27%08%9d.&port=61362&uploaded=0&downloaded=0&left=25226791&corrupt=0&key=75FA86C7&event=started&numwant=200&compact=1&no_peer_id=1

response:

compact=true complete=0 interval=30m0s ipv4Peers="[2d5554333535532dafb0fac6effcd6aa27089d2e@[127.0.0.1]:61362]" ipv6Peers="[]" minInterval=15m0s

4. remove

request:

/announce?info_hash=u%98%3a%a9%3e%8e%a7%e0%dc%af%09%eef%27%d9%2c%8f3f%5e&peer_id=-UT355S-%af%b0%fa%c6%ef%fc%d6%aa%27%08%9d.&port=61362&uploaded=0&downloaded=0&left=0&corrupt=0&key=86AF3165&event=stopped&numwant=0&compact=1&no_peer_id=1

response:

compact=true complete=2 interval=30m0s ipv4Peers="[2d5554333535532dafb0fac6effcd6aa27089d2e@[127.0.0.1]:61362]" ipv6Peers="[]" minInterval=15m0s

Tracker Request Parameters

The parameters used in the client->tracker GET request are as follows:

  • info_hash: urlencoded 20-byte SHA1 hash of the value of the info key from the Metainfo file. Note that the value will be a bencoded dictionary, given the definition of the info key above.
  • peer_id: urlencoded 20-byte string used as a unique ID for the client, generated by the client at startup. This is allowed to be any value, and may be binary data. There are currently no guidelines for generating this peer ID. However, one may rightly presume that it must at least be unique for your local machine, thus should probably incorporate things like process ID and perhaps a timestamp recorded at startup. See peer_id below for common client encodings of this field.
  • port: The port number that the client is listening on. Ports reserved for BitTorrent are typically 6881-6889. Clients may choose to give up if they cannot establish a port within this range.
  • uploaded: The total amount uploaded (since the client sent the 'started' event to the tracker) in base ten ASCII. While not explicitly stated in the official specification, the consensus is that this should be the total number of bytes uploaded.
  • downloaded: The total amount downloaded (since the client sent the 'started' event to the tracker) in base ten ASCII. While not explicitly stated in the official specification, the consensus is that this should be the total number of bytes downloaded.
  • left: The number of bytes this client still has to download in base ten ASCII. Clarification: The number of bytes needed to download to be 100% complete and get all the included files in the torrent.
  • compact: Setting this to 1 indicates that the client accepts a compact response. The peers list is replaced by a peers string with 6 bytes per peer. The first four bytes are the host (in network byte order), the last two bytes are the port (again in network byte order). It should be noted that some trackers only support compact responses (for saving bandwidth) and either refuse requests without "compact=1" or simply send a compact response unless the request contains "compact=0" (in which case they will refuse the request.)
  • no_peer_id: Indicates that the tracker can omit peer id field in peers dictionary. This option is ignored if compact is enabled.
    GET /announce?info_hash=%fc~6%f2%d01d%8e%f3%cd%dd%a0%1f%f7%3a%9d%ffH%cd%e3&
    peer_id=-UT3480-P%a6%93%02%b4%40%27%9b%60%e9A%ed&
    port=20111&uploaded=0&
    downloaded=0&
    left=0&
    corrupt=0&
    key=10E0CE47&
    event=started&
    numwant=200&
    compact=1&
    no_peer_id=1&
    ip=192.168.43.188
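
A request like the one above can be assembled with urllib.parse.urlencode. The sketch below is only an illustration: build_announce_url is a made-up helper, the base URL is the announce value from the sample torrent in section 1, and the info_hash and peer_id bytes are taken from the announce examples in section 3.2. quote is passed as quote_via so that a 0x20 byte becomes %20 rather than "+".

```python
from urllib.parse import quote, unquote_to_bytes, urlencode

def build_announce_url(base: str, info_hash: bytes, peer_id: bytes, port: int,
                       uploaded: int = 0, downloaded: int = 0, left: int = 0,
                       event: str = "", numwant: int = 200) -> str:
    """Assemble the GET URL for an announce; binary fields are percent-encoded."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    params = {
        "info_hash": info_hash,
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
        "numwant": numwant,
        "compact": 1,
    }
    if event:                                   # started / stopped / completed
        params["event"] = event
    return base + "?" + urlencode(params, quote_via=quote, safe="")

info_hash = bytes.fromhex("a1129925ffbeba96552dae54229d094ae720ec84")
peer_id = unquote_to_bytes("-UT355S-%af%b0%fa%c6%ef%fc%d6%aa%27%08%9d.")
url = build_announce_url("http://localhost:8000/announce", info_hash, peer_id,
                         port=61362, event="started")
print(url)
```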

Tracker Response

The tracker responds with a "text/plain" document consisting of a bencoded dictionary with the following keys:

  • failure reason: If present, then no other keys may be present. The value is a human-readable error message as to why the request failed (string).
  • warning message: (new, optional) Similar to failure reason, but the response still gets processed normally. The warning message is shown just like an error.
  • interval: Interval in seconds that the client should wait between sending regular requests to the tracker
  • min interval: (optional) Minimum announce interval. If present clients must not reannounce more frequently than this.
  • tracker id: A string that the client should send back on its next announcements. If absent and a previous announce sent a tracker id, do not discard the old value; keep using it.
  • complete: number of peers with the entire file, i.e. seeders (integer)
  • incomplete: number of non-seeder peers, aka "leechers" (integer)
  • peers: (dictionary model) The value is a list of dictionaries, each with the following keys:
    • peer id: peer's self-selected ID, as described above for the tracker request (string)
    • ip: peer's IP address either IPv6 (hexed) or IPv4 (dotted quad) or DNS name (string)
    • port: peer's port number (integer)
  • peers: (binary model) Instead of using the dictionary model described above, the peers value may be a string consisting of multiples of 6 bytes. First 4 bytes are the IP address and last 2 bytes are the port number. All in network (big endian) notation.
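
The binary model of peers can be unpacked with the standard struct and ipaddress modules. This is a minimal sketch for IPv4 only; parse_compact_peers is a made-up name, and the sample bytes encode the address 127.0.0.1:61362 seen in the announce examples above.

```python
import ipaddress
import struct

def parse_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Split a compact peers string into (IPv4 address, port) pairs."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peers length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        # "!" selects network (big-endian) order: 4 raw bytes, then a uint16.
        raw_ip, port = struct.unpack_from("!4sH", blob, offset)
        peers.append((str(ipaddress.IPv4Address(raw_ip)), port))
    return peers

blob = bytes([127, 0, 0, 1]) + (61362).to_bytes(2, "big")
print(parse_compact_peers(blob))  # [('127.0.0.1', 61362)]
```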
  