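About the author
Wang Zhenwei (王振威) is a member of the CODING founding team with many years of experience in systems software development. He specializes in Linux, Golang, Java, Ruby, and Docker, and for the past two years has worked on system architecture and operations at CODING.
Preface
Google recently published an article describing an update to one of Git's transfer protocols, and it caused quite a stir in the Chinese tech community (for related coverage, search Baidu for "Git v2 性能提升", that is, "Git v2 performance improvement").
Many friends in tech circles have been resharing the news. But how large is the performance improvement, and what are the details? In fact, the change improves performance only in extreme cases; in the vast majority of cases users will not notice any difference. Much of the uncritical resharing is probably down to the Google brand effect :)
What is Git?
To explain why, let's start with a brief introduction to Git's protocols. If you are not yet familiar with Git and want to learn more, see the official website: http://git-scm.com/. You can also visit https://coding.net/help/doc/git to learn how to use a fast, high-quality Git hosting service in China.
Git transfer protocols
Git commonly uses three protocols: SSH, HTTP(S), and Git. The first two are the most widely used.
Let's look at examples of using the HTTP(S) and SSH protocols: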
git clone https://git.coding.net/wzw/coding-demo.git
Cloning into 'coding-demo'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
git clone git@git.coding.net:wzw/coding-demo.git
Cloning into 'coding-demo'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.
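As you can see, for a fresh clone the two go through essentially the same process.
In fact, underneath, Git handles the various application-layer protocols in the same way, whether the transport is HTTP(S), SSH, or the Git protocol.
Let's take a closer look at what Git does during the transfer.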
GIT_TRACE=1 GIT_TRACE_PACKET=1 git clone https://git.coding.net/wzw/coding-demo.git
17:48:21.767799 git.c:344 trace: built-in: git 'clone' 'https://git.coding.net/wzw/coding-demo.git'
Cloning into 'coding-demo'...
17:48:21.797959 run-command.c:626 trace: run_command: 'git-remote-https' 'origin' 'https://git.coding.net/wzw/coding-demo.git'
17:48:22.278880 pkt-line.c:80 packet: git< # service=git-upload-pack
17:48:22.279390 pkt-line.c:80 packet: git< 0000
17:48:22.279405 pkt-line.c:80 packet: git< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed no-done symref=HEAD:refs/heads/master agent=git/2.15.0
17:48:22.279419 pkt-line.c:80 packet: git< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.279431 pkt-line.c:80 packet: git< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.279442 pkt-line.c:80 packet: git< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.279453 pkt-line.c:80 packet: git< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.279472 pkt-line.c:80 packet: git< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
17:48:22.279483 pkt-line.c:80 packet: git< 0000
17:48:22.280959 pkt-line.c:80 packet: git> fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.280986 pkt-line.c:80 packet: git> fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.280999 pkt-line.c:80 packet: git> 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.281011 pkt-line.c:80 packet: git> 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.281023 pkt-line.c:80 packet: git> 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.281033 pkt-line.c:80 packet: git> 0000
17:48:22.281089 run-command.c:626 trace: run_command: 'fetch-pack' '--stateless-rpc' '--stdin' '--lock-pack' '--thin' '--check-self-contained-and-connected' '--cloning' 'https://git.coding.net/wzw/coding-demo.git/'
17:48:22.287860 git.c:344 trace: built-in: git 'fetch-pack' '--stateless-rpc' '--stdin' '--lock-pack' '--thin' '--check-self-contained-and-connected' '--cloning' 'https://git.coding.net/wzw/coding-demo.git/'
17:48:22.288761 pkt-line.c:80 packet: fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.288799 pkt-line.c:80 packet: fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.288824 pkt-line.c:80 packet: fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.288838 pkt-line.c:80 packet: fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.288851 pkt-line.c:80 packet: fetch-pack< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.288863 pkt-line.c:80 packet: fetch-pack< 0000
17:48:22.288876 pkt-line.c:80 packet: fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed no-done symref=HEAD:refs/heads/master agent=git/2.15.0
17:48:22.288901 pkt-line.c:80 packet: fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.288914 pkt-line.c:80 packet: fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.288927 pkt-line.c:80 packet: fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.288941 pkt-line.c:80 packet: fetch-pack< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.288955 pkt-line.c:80 packet: fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
17:48:22.288967 pkt-line.c:80 packet: fetch-pack< 0000
17:48:22.289909 pkt-line.c:80 packet: fetch-pack> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed no-done side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
17:48:22.289924 pkt-line.c:80 packet: fetch-pack> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:48:22.290081 pkt-line.c:80 packet: fetch-pack> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:48:22.290094 pkt-line.c:80 packet: fetch-pack> want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e
17:48:22.290103 pkt-line.c:80 packet: fetch-pack> 0000
17:48:22.290127 pkt-line.c:80 packet: fetch-pack> done
17:48:22.290257 pkt-line.c:80 packet: fetch-pack> 0000
17:48:22.290290 pkt-line.c:80 packet: git< 00a8want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed no-done side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)0032want 1536ad10fc0a188c50680932ca191c8da46938c40032want 1536ad10fc0a188c50680932ca191c8da46938c40032want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e00000009done
17:48:22.290375 pkt-line.c:80 packet: git< 0000
17:48:22.436811 pkt-line.c:80 packet: fetch-pack< NAK
17:48:22.436844 pkt-line.c:80 packet: fetch-pack> 0000
17:48:22.437152 pkt-line.c:80 packet: sideband< \2Counting objects: 7, done.
remote: Counting objects: 7, done.
17:48:22.437185 pkt-line.c:80 packet: sideband< \2Compressing objects: 25% (1/4) \15
17:48:22.437200 pkt-line.c:80 packet: sideband< \2Compressing objects: 50% (2/4) \15
17:48:22.437250 pkt-line.c:80 packet: sideband< \2Compressing objects: 75% (3/4) \15
17:48:22.437279 pkt-line.c:80 packet: sideband< \2Compressing objects: 100% (4/4) \15
17:48:22.437302 pkt-line.c:80 packet: sideband< \2Compressing objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
17:48:22.447214 pkt-line.c:80 packet: git< 0000
17:48:22.447201 pkt-line.c:80 packet: sideband< PACK ...
17:48:22.447316 pkt-line.c:80 packet: sideband< \2Total 7 (delta 0), reused 0 (delta 0)
remote: Total 7 (delta 0), reused 0 (delta 0)
17:48:22.447363 pkt-line.c:80 packet: sideband< 0000
17:48:22.447372 run-command.c:626 trace: run_command: 'unpack-objects' '--pack_header=2,7'
17:48:22.453090 git.c:344 trace: built-in: git 'unpack-objects' '--pack_header=2,7'
Unpacking objects: 100% (7/7), done.
17:48:22.460604 run-command.c:626 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'
17:48:22.464831 git.c:344 trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'
GIT_TRACE=1 GIT_TRACE_PACKET=1 git clone git@git.coding.net:wzw/coding-demo.git
17:49:18.654786 git.c:344 trace: built-in: git 'clone' 'git@git.coding.net:wzw/coding-demo.git'
Cloning into 'coding-demo'...
17:49:18.669187 run-command.c:626 trace: run_command: 'ssh' 'git@git.coding.net' 'git-upload-pack '\''wzw/coding-demo.git'\'''
17:49:19.768942 pkt-line.c:80 packet: clone< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/master agent=git/2.15.0
17:49:19.772436 pkt-line.c:80 packet: clone< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:49:19.772527 pkt-line.c:80 packet: clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:49:19.772549 pkt-line.c:80 packet: clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:49:19.772566 pkt-line.c:80 packet: clone< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:49:19.772863 pkt-line.c:80 packet: clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
17:49:19.772910 pkt-line.c:80 packet: clone< 0000
17:49:19.776185 pkt-line.c:80 packet: clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
17:49:19.776215 pkt-line.c:80 packet: clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1
17:49:19.776224 pkt-line.c:80 packet: clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776232 pkt-line.c:80 packet: clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776239 pkt-line.c:80 packet: clone> want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e
17:49:19.776246 pkt-line.c:80 packet: clone> 0000
17:49:19.776262 pkt-line.c:80 packet: clone> done
17:49:19.879841 pkt-line.c:80 packet: clone< NAK
17:49:19.880083 run-command.c:626 trace: run_command: 'index-pack' '--stdin' '-v' '--fix-thin' '--keep=fetch-pack 75332 on wangzheweideMBP' '--check-self-contained-and-connected'
17:49:19.885280 git.c:344 trace: built-in: git 'index-pack' '--stdin' '-v' '--fix-thin' '--keep=fetch-pack 75332 on wangzheweideMBP' '--check-self-contained-and-connected'
17:49:19.889021 pkt-line.c:80 packet: sideband< \2Counting objects: 7, done.
remote: Counting objects: 7, done.
17:49:19.895119 pkt-line.c:80 packet: sideband< \2Compressing objects: 25% (1/4) \15Compressing objects: 50% (2/4) \15Compressing objects: 75% (3/4) \15Compressing objects: 10
17:49:19.895170 pkt-line.c:80 packet: sideband< \20% (4/4) \15
17:49:19.897621 pkt-line.c:80 packet: sideband< \2Compressing objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
17:49:19.914866 pkt-line.c:80 packet: sideband< PACK ...
17:49:19.914916 pkt-line.c:80 packet: sideband< \2Total 7 (delta 0), reused 0 (delta 0)
remote: Total 7 (delta 0), reused 0 (delta 0)
17:49:19.914936 pkt-line.c:80 packet: sideband< 0000
Receiving objects: 100% (7/7), done.
17:49:20.088640 run-command.c:626 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'
17:49:20.093965 git.c:344 trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'
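I set the GIT_TRACE=1 and GIT_TRACE_PACKET=1 environment variables to make Git print more information during the clone, which helps with debugging. We can also see that Git invokes different underlying commands for HTTPS and SSH, yet the content exchanged is remarkably similar.
In short, the protocol exchange for a clone goes roughly as follows:
- The client tells the remote which operation it wants to perform: git-upload-pack (used for all read-type operations)
- The server returns the capabilities it supports along with its advertised list of refs
- The client declares the list of objects it wants to receive
- The server works out all the objects that need to be transferred, compresses them, and sends them to the client
- The client unpacks and verifies the objects
- The client updates its local refs (this step does not show up in the detailed trace above; see the ref update in the fetch trace at the end of this article)
To understand how this protocol works, you need a basic grasp of how Git stores data underneath, so here is a short primer.
Git is often described as a content-addressable system with history tracking. That sounds abstract but is actually easy to understand: underneath, Git stores everything under version control as objects and refs. Objects (files, commits, directories, and so on) hold the actual data; refs (branches, tags, and so on) are pointers.
Objects at a glance:
We can use git cat-file -p to view basic information about an object.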
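Every "packet:" line in the traces above is one pkt-line: four hexadecimal digits giving the total length (the four digits included), followed by the payload. The raw request body in the HTTPS trace (the line beginning with 00a8want) shows this framing directly. The following Python is a minimal sketch of that framing, not Git's own code; the function names are made up for illustration, and it only covers the flush packet (0000) and ordinary data packets seen in these traces.

```python
FLUSH_PKT = b"0000"  # special packet that marks the end of a section


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload: 4 hex digits of total length, then the payload."""
    length = len(payload) + 4  # the 4-byte header counts toward the length
    if length > 65520:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % length + payload


def read_pkt_lines(data: bytes):
    """Yield each payload in a byte stream; yield None for a flush packet."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            yield None  # flush packet
            pos += 4
        elif length < 4:
            raise ValueError("this sketch does not handle special packets")
        else:
            yield data[pos + 4:pos + length]
            pos += length


# Rebuild the tail of the request seen in the HTTPS trace.
request = (
    pkt_line(b"want 1536ad10fc0a188c50680932ca191c8da46938c4\n")
    + FLUSH_PKT
    + pkt_line(b"done\n")
)
print(request)
print(list(read_pkt_lines(request)))
```

The framed want line starts with 0032 and the done line with 0009, the same prefixes that appear in the trace.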
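"Content-addressable" means an object's ID is derived from its content. As a minimal sketch, assuming a repository that uses SHA-1 object IDs (as the 40-digit IDs in this article suggest), the ID is the SHA-1 of a short header plus the content. The function name git_object_id is made up for this example.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Hash '<type> <size>' + NUL + content, which is how Git names an object."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID in every SHA-1 Git repository.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the type and the bytes, two repositories that store the same file content end up with the same blob ID, which is what lets a client and server compare what they have by exchanging IDs alone.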
git cat-file -p fdacba1d541c75bd48f2cd742ee18f77ea3517a1
tree ae0532862af27ecd131a7f792c9156624783d562
parent 1536ad10fc0a188c50680932ca191c8da46938c4
author wzw <wangzhenwei@coding.net> 1526896089 +0800
committer wzw <wangzhenwei@coding.net> 1526896089 +0800
update README.md
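As we can see, fdacba1d541c75bd48f2cd742ee18f77ea3517a1 is a commit object. The output lists the parent commit it depends on, 1536ad10fc0a188c50680932ca191c8da46938c4, its tree object ae0532862af27ecd131a7f792c9156624783d562, and the author information and commit message.
We can follow the reference and look at its parent commit: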
git cat-file -p 1536ad10fc0a188c50680932ca191c8da46938c4
tree f7aa6821aa977f65dc987fe6d6838790371f3d90
author wzw <wangzhenwei@coding.net> 1526895383 +0800
committer wzw <wangzhenwei@coding.net> 1526895383 +0800
Initial commit
The parent commit in turn depends on the tree object f7aa6821aa977f65dc987fe6d6838790371f3d90.
Let's look at that tree object:
git cat-file -p f7aa6821aa977f65dc987fe6d6838790371f3d90
100644 blob 3aed7e951e0457a2784ff6cd009412e07a09e362 README.md
The directory contains one blob object whose ID is 3aed7e951e0457a2784ff6cd009412e07a09e362, so let's look at it:
git cat-file -p 3aed7e951e0457a2784ff6cd009412e07a09e362
#coding-demo
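We can see that this is the first version of the README.md file; that is, its content corresponds to the version at commit 1536ad10fc0a188c50680932ca191c8da46938c4.
Putting it together, Git's internal storage looks like this: refs point to commits, each commit points to its tree and its parent commits, and trees point to blobs (and to other trees for subdirectories).
Good, that completes the background. Have you noticed that the much-hyped blockchain resembles Git's storage at the technical level? :)
During a clone, the server first advertises a list of refs to the client. This is also where the Git v2 protocol claims its performance improvement, as explained later.
Like this: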
17:49:19.772436 pkt-line.c:80 packet: clone< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:49:19.772527 pkt-line.c:80 packet: clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:49:19.772549 pkt-line.c:80 packet: clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:49:19.772566 pkt-line.c:80 packet: clone< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:49:19.772863 pkt-line.c:80 packet: clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
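Clearly, each 40-digit hexadecimal number above is the ID of the object that the ref after it points to.
The client only needs to work out which objects it wants, based on the refs it is interested in and the objects already in its local object store. (For pull and fetch a local object store exists; for clone there is none yet, so the client needs every object it is interested in.)
Once the client has computed the list of objects it is interested in, it tells the remote server using want commands.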
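The advertisement format is simple enough to parse by hand. Below is a rough Python sketch, assuming the payloads have already been extracted from their pkt-lines; parse_advertisement is a made-up name, and the capability list in the sample data is shortened from the one in the trace for readability.

```python
def parse_advertisement(lines):
    """Split advertised lines into a {ref name: object id} dict and capabilities."""
    refs = {}
    capabilities = []
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        # Only the first line carries capabilities, after a NUL byte.
        if index == 0 and "\x00" in line:
            line, caps = line.split("\x00", 1)
            capabilities = caps.split()
        object_id, name = line.split(" ", 1)
        refs[name] = object_id
    return refs, capabilities


advertised = [
    "fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\x00multi_ack thin-pack "
    "side-band-64k ofs-delta symref=HEAD:refs/heads/master agent=git/2.15.0",
    "fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master",
    "1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc",
    "1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd",
    "30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0",
    "1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}",
]

refs, capabilities = parse_advertisement(advertised)
print(refs["refs/heads/master"])
print(capabilities)
```

The refs/tags/v1.0^{} entry is the "peeled" form of the annotated tag: it gives the commit that the tag object 30eb4b0 ultimately points to.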
17:49:19.776185 pkt-line.c:80 packet: clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
17:49:19.776215 pkt-line.c:80 packet: clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1
17:49:19.776224 pkt-line.c:80 packet: clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776232 pkt-line.c:80 packet: clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776239 pkt-line.c:80 packet: clone> want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e
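The want lines above can be reproduced with a small sketch. This is a simplification that assumes a full clone, where every advertised ref is interesting; real Git also applies refspecs and other rules that are not modelled here. The name build_wants and the shortened capability list are illustrative only.

```python
def build_wants(refs, local_objects, capabilities):
    """Return 'want' lines for advertised objects the client does not have yet."""
    wanted = []
    for name, object_id in refs.items():
        if name.endswith("^{}"):
            continue  # peeled tag entry: describes a tag's target, not a ref
        if object_id in local_objects:
            continue  # already in the local object store
        wanted.append(object_id)
    caps = " ".join(capabilities)
    lines = []
    for index, object_id in enumerate(wanted):
        # Only the first want line carries the capabilities the client selects.
        suffix = f" {caps}" if index == 0 else ""
        lines.append(f"want {object_id}{suffix}\n")
    return lines


refs = {
    "HEAD": "fdacba1d541c75bd48f2cd742ee18f77ea3517a1",
    "refs/heads/master": "fdacba1d541c75bd48f2cd742ee18f77ea3517a1",
    "refs/heads/test-abc": "1536ad10fc0a188c50680932ca191c8da46938c4",
    "refs/heads/test-bcd": "1536ad10fc0a188c50680932ca191c8da46938c4",
    "refs/tags/v1.0": "30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e",
    "refs/tags/v1.0^{}": "1536ad10fc0a188c50680932ca191c8da46938c4",
}

# A fresh clone has an empty object store, so everything is wanted.
want_lines = build_wants(refs, set(), ["multi_ack_detailed", "side-band-64k", "thin-pack"])
for line in want_lines:
    print(line, end="")
```

Like the client in the trace, the sketch emits one want per ref and does not merge duplicates, so 1536ad1 appears twice.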
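If the client is running pull or fetch, it also tells the remote which objects it already has (a dedicated section at the end of this article illustrates this).
Based on the objects the client wants and the objects it already has, the remote server consults its own object store and the dependencies between objects, gathers the objects the client needs, and sends them to the client in a compressed pack.
After receiving the pack, the client unpacks and verifies the objects and updates its refs to point to them.
What Google did in protocol version 2
The full version 2 protocol specification is here: https://www.kernel.org/pub/so...
Here we describe its main changes, of which there are three:
- Server-side ref filtering
- Easier extensibility for new features (for example, the ability to declare which refs you want)
- Simplified client handling of the HTTP protocol
What many clickbait headlines exaggerate is mainly the first point: server-side ref filtering.
Google's official blog describes it like this: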
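The server-side selection can be pictured as a set difference over the object graph: everything reachable from the wants, minus everything reachable from the haves. The sketch below models the demo repository with abbreviated IDs taken from this article; the entries named tree-2, blob-2, tree-3 and blob-3 are hypothetical placeholders for objects whose IDs or contents the article does not show. Real Git does this far more efficiently (packs, deltas, bitmap indexes), so treat this as an illustration of the idea only.

```python
# object id -> ids it references directly
OBJECTS = {
    "8dccad2": ["tree-3", "fdacba1"],   # newest commit (seen in the fetch trace)
    "tree-3": ["blob-3"],               # hypothetical placeholder
    "blob-3": [],                       # hypothetical placeholder
    "fdacba1": ["ae05328", "1536ad1"],  # commit "update README.md"
    "ae05328": ["blob-2"],              # its tree; contents not shown in the article
    "blob-2": [],                       # hypothetical placeholder
    "1536ad1": ["f7aa682"],             # commit "Initial commit"
    "f7aa682": ["3aed7e9"],             # its tree
    "3aed7e9": [],                      # first version of README.md
    "30eb4b0": ["1536ad1"],             # annotated tag v1.0
}


def reachable(starts):
    """Collect every object reachable from the given starting ids."""
    seen = set()
    stack = list(starts)
    while stack:
        object_id = stack.pop()
        if object_id in seen:
            continue
        seen.add(object_id)
        stack.extend(OBJECTS[object_id])
    return seen


def objects_to_send(wants, haves):
    """Objects the client asked for, minus those it can already reach."""
    return reachable(wants) - reachable(haves)


clone = objects_to_send(["fdacba1", "1536ad1", "30eb4b0"], [])
fetch = objects_to_send(["8dccad2"], ["fdacba1", "1536ad1"])
print(len(clone), sorted(clone))
print(len(fetch), sorted(fetch))
```

Under these assumptions the clone selects 7 objects and the later fetch selects 3, in line with the "Total 7" and "Total 3" lines in the traces.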
The main motivation for the new protocol was to enable server side filtering of references (branches and tags). Prior to protocol v2, servers responded to all fetch commands with an initial reference advertisement, listing all references in the repository. This complete listing is sent even when a client only cares about updating a single branch, e.g.: git fetch origin master. For repositories that contain 100s of thousands of references (the Chromium repository has over 500k branches and tags) the server could end up sending 10s of megabytes of data that get ignored. This typically dominates both time and bandwidth during a fetch, especially when you are updating a branch that's only a few commits behind the remote, or even when you are only checking if you are up-to-date, resulting in a no-op fetch.
We recently rolled out support for protocol version 2 at Google and have seen a performance improvement of 3x for no-op fetches of a single branch on repositories containing 500k references. Protocol v2 has also enabled a reduction of 8x of the overhead bytes (non-packfile) sent from googlesource.com servers. A majority of this improvement is due to filtering references advertised by the server to the refs the client has expressed interest in.
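In the spirit of sticking to the facts and helping readers, here is the passage restated in plain terms:
The main motivation for the new protocol was to enable server-side filtering of refs (branches and tags). Before v2, the server answered every fetch command with an initial ref advertisement listing all refs in the repository. The complete list is sent even when the client only cares about updating a single branch (for example, git fetch origin master). For repositories with hundreds of thousands of refs (the Chromium repository has more than 500,000 branches and tags), the server may send tens of megabytes that the client simply ignores. This wastes time and bandwidth, especially when updating a branch that is only a few commits behind the remote, or when the local branch is already up to date and the fetch merely checks for updates.
Google recently rolled out support for protocol v2, which made no-op fetches of a single branch on a repository with 500,000 refs three times faster. It also cut the non-packfile overhead bytes sent from googlesource.com servers by a factor of eight. Most of this improvement comes from the server filtering the advertised refs down to those the client has expressed interest in.
By now many readers will see that the original text is quite clear: the performance gain applies only to the first step of the client-server exchange, where the server no longer has to send the complete ref list. In some extreme scenarios (repositories with hundreds of thousands of branches and tags), this step becomes significantly faster.
In practice, most Git repositories do not have anywhere near that many refs. Taking the sample project git@git.coding.net:wzw/coding-demo.git as an example, this step runs very quickly: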
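To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It builds a synthetic repository of 500,000 made-up branch names, estimates the size of a complete ref advertisement, and compares it with an advertisement filtered by prefix, which is loosely how a v2 client narrows the listing (it sends ref-prefix arguments with the ls-refs command). The names, the all-zero object ID and both helper functions are assumptions for illustration, and the estimate ignores the capability list on the first line.

```python
def advertisement_bytes(refs):
    """Estimate bytes on the wire: one pkt-line per advertised ref."""
    # 4-byte length header + 40-hex id + space + ref name + newline
    return sum(4 + 40 + 1 + len(name.encode()) + 1 for name in refs)


def filter_refs(refs, prefixes):
    """Keep only refs whose name starts with one of the requested prefixes."""
    return {
        name: object_id
        for name, object_id in refs.items()
        if any(name.startswith(prefix) for prefix in prefixes)
    }


FAKE_ID = "0" * 40  # placeholder object id, not a real commit
refs = {f"refs/heads/feature/branch-{i:06d}": FAKE_ID for i in range(500_000)}
refs["refs/heads/master"] = FAKE_ID

full = advertisement_bytes(refs)
filtered = advertisement_bytes(filter_refs(refs, ["refs/heads/master"]))
print(f"full advertisement:     {full / 1e6:.1f} MB")
print(f"filtered advertisement: {filtered} bytes")
```

With these made-up names the full listing comes to about 39 MB while the filtered one is 63 bytes, which is consistent with the "10s of megabytes" in Google's description. It also shows why a repository with only a handful of refs has almost nothing to gain.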
time git ls-remote git@git.coding.net:wzw/coding-demo.git
fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD
fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
real 0m0.103s
user 0m0.020s
sys 0m0.004s
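It runs quickly, taking about 100 ms, and that includes establishing the SSH connection, authentication, and data transfer. Most of the time goes to the network connection and authentication; listing the refs is not the bottleneck.
Take Coding's own main development repository: it currently has 2,000+ tags, 500+ branches, and about 5,000 hidden refs created by merge requests. Coding runs gc on its repositories periodically, so a packed-refs file is present; even so, reading and sending the refs does start to get slower here, though it is still within an acceptable range.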
time git ls-remote git@e.coding.net:codingcorp/coding-dev.git
// several thousand lines omitted here
5708bacfe2c2510efd0bbb0b4be8268f2a171747 refs/tags/private-1.2
dec93b8774f90c4660bbe8b3759b6d59db30ee45 refs/tags/private-1.2^{}
5ddcbab95eedc1664ac131cddfc51a5d265446ce refs/tags/release/20160927.1
a91709b7bf08c00fb0b2319aedf999ca7e636109 refs/tags/release/20160927.1^{}
476075fd8442e76d02264f0b109bb2afcb6d39a1 refs/tags/repo-manager-20161118.1
ce29fb126a27f58de555badeb33838d6a3dde8eb refs/tags/repo-manager-20161118.1^{}
30739025962d6e788f1542841aa509422810853e refs/tags/test-tag-20180308.1
1a7b7474257badeca9fa0c15204bf5769f42b33a refs/tags/test-tag-20180308.1^{}
real 0m1.677s
user 0m0.032s
sys 0m0.052s
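A single fresh clone of CODING's main development repository transfers about 550 MB in total. Even with good bandwidth, say 5 MB/s for the clone, the transfer takes 110 seconds, so the 1.677 seconds spent up front is negligible by comparison.
By this reckoning, Google's change does optimize certain scenarios for some large repositories (especially those with a very large number of refs), but it is not the dramatic performance boost that some Chinese media made it out to be. Looking at the transfer process, Git's main work (computing object dependencies, the format for declaring objects, and the transfer itself) is unchanged. The non-pack data whose volume was reportedly cut by a factor of eight accounts for only a small share of the total transfer. Overall the change does save data, but it falls far short of a dramatic improvement.
Of course, the Googlers' contribution to open source still deserves our thanks and appreciation. I hope that after reading this article you will approach technology rigorously rather than parroting what others say. Talk is cheap, show me your code!
A few more words
PS: Speaking of major Git performance improvements, Google engineers working on JGit once contributed the idea of a bitmap index to Git. It lets Git spend a small amount of space to avoid a large amount of tree traversal when resolving object dependencies, and that was a genuinely large performance improvement. The bitmap index now ships by default with recent versions of Git; I will share how it works another time.
PS2: The Git protocol has many other features; to keep this article focused on its main point, they are not covered here.
PS3: How the Git transfer protocol declares objects the client already has (the have command):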
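The arithmetic behind that comparison, written out as a small sketch. The inputs are the figures quoted in this article; the 5 MB/s bandwidth is the article's assumed value, not a measurement.

```python
clone_size_mb = 550       # approximate data transferred by a fresh clone
bandwidth_mb_per_s = 5    # assumed download speed
ref_listing_s = 1.677     # wall time of the `git ls-remote` run above

transfer_s = clone_size_mb / bandwidth_mb_per_s
share = ref_listing_s / (ref_listing_s + transfer_s)

print(f"pack transfer: {transfer_s:.0f} s")
print(f"ref listing share of total time: {share:.1%}")
```

This gives 110 s for the pack transfer and a share of roughly 1.5% for the ref listing. Since the 1.677 s also covers SSH connection setup and authentication, the portion that ref filtering could actually remove is smaller still.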
GIT_TRACE=1 GIT_TRACE_PACKET=1 git fetch origin master
19:58:08.432172 git.c:344 trace: built-in: git 'fetch' 'origin' 'master'
19:58:08.438917 run-command.c:626 trace: run_command: 'ssh' 'git@git.coding.net' 'git-upload-pack '\''wzw/coding-demo.git'\'''
Warning: Permanently added the RSA host key for IP address '123.59.85.127' to the list of known hosts.
19:58:09.634163 pkt-line.c:80 packet: fetch< 8dccad22648e94c52335a7266c7cff5d947c9532 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/master agent=git/2.15.0
19:58:09.641777 pkt-line.c:80 packet: fetch< 8dccad22648e94c52335a7266c7cff5d947c9532 refs/heads/master
19:58:09.641846 pkt-line.c:80 packet: fetch< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
19:58:09.641872 pkt-line.c:80 packet: fetch< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
19:58:09.641891 pkt-line.c:80 packet: fetch< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
19:58:09.641903 pkt-line.c:80 packet: fetch< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
19:58:09.641913 pkt-line.c:80 packet: fetch< 0000
19:58:09.642105 run-command.c:626 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
19:58:09.655120 pkt-line.c:80 packet: fetch> want 8dccad22648e94c52335a7266c7cff5d947c9532 multi_ack_detailed side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
19:58:09.655157 pkt-line.c:80 packet: fetch> 0000
19:58:09.655190 pkt-line.c:80 packet: fetch> have fdacba1d541c75bd48f2cd742ee18f77ea3517a1
19:58:09.655207 pkt-line.c:80 packet: fetch> have 1536ad10fc0a188c50680932ca191c8da46938c4
19:58:09.655221 pkt-line.c:80 packet: fetch> done
19:58:09.975282 pkt-line.c:80 packet: fetch< ACK fdacba1d541c75bd48f2cd742ee18f77ea3517a1 common
19:58:09.975382 pkt-line.c:80 packet: fetch< ACK 1536ad10fc0a188c50680932ca191c8da46938c4 common
19:58:09.975404 pkt-line.c:80 packet: fetch< ACK 1536ad10fc0a188c50680932ca191c8da46938c4
19:58:09.975728 pkt-line.c:80 packet: sideband< \2Counting objects: 3, done.
remote: Counting objects: 3, done.
19:58:09.975763 pkt-line.c:80 packet: sideband< \2Compressing objects: 50% (1/2) \15Compressing objects: 100% (2/2) \15
19:58:09.975798 pkt-line.c:80 packet: sideband< \2Compressing objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
19:58:10.065650 pkt-line.c:80 packet: sideband< PACK ...
19:58:10.065707 pkt-line.c:80 packet: sideband< \2Total 3 (delta 0), reused 0 (delta 0)
remote: Total 3 (delta 0), reused 0 (delta 0)
19:58:10.065714 run-command.c:626 trace: run_command: 'unpack-objects' '--pack_header=2,3'
19:58:10.065741 pkt-line.c:80 packet: sideband< 0000
19:58:10.071004 git.c:344 trace: built-in: git 'unpack-objects' '--pack_header=2,3'
Unpacking objects: 100% (3/3), done.
19:58:10.317201 run-command.c:626 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
19:58:10.322159 git.c:344 trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
From git.coding.net:wzw/coding-demo
* branch master -> FETCH_HEAD
fdacba1..8dccad2 master -> origin/master
19:58:10.328515 run-command.c:1452 run_processes_parallel: preparing to run up to 1 tasks
19:58:10.328564 run-command.c:1484 run_processes_parallel: done
19:58:10.328621 run-command.c:626 trace: run_command: 'gc' '--auto'
19:58:10.333115 git.c:344 trace: built-in: git 'gc' '--auto'
References