How files are stored
- Create a 19M test file:
$ git checkout -b tmp
$ node -e 'console.log(Array(1024000).fill(0).map(e => Math.random()).join("\n"))' > x.txt
$ ls -lh x.txt
-rw-r--r-- 1 root root 19M Aug 9 15:22 x.txt
$ git add x.txt
$ git commit -m "big file test"
# Use ls-tree to see the file's entry (mode, type, blob ID)
$ git ls-tree HEAD x.txt
100644 blob bcd5a67a32f212cdd63ea1d698aed38a3c8f5f06 x.txt
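# Use the blob ID (not a commit ID) to find the object's storage path: .git/objects/<ID[0:2]>/<ID[2:]>, i.e. the first two hex characters name the directory
# Inspect that object: the 19M file shrinks to 9.2M because git compresses objects with zlib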
$ ls -lh .git/objects/bc/d5a67a32f212cdd63ea1d698aed38a3c8f5f06
-r--r--r-- 1 root root 9.2M Aug 9 15:23 .git/objects/bc/d5a67a32f212cdd63ea1d698aed38a3c8f5f06
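Note that newly written objects are stored as "loose" objects; git periodically packs them into a packfile and deletes the original loose files (see: packfile).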
- Modify the large file and inspect the new object after committing
# Append a line containing x to x.txt
$ echo 'x' >> x.txt
$ ls -lh x.txt
-rw-r--r-- 1 root root 19M Aug 9 15:25 x.txt
$ git add x.txt
$ git commit -m "add char to big file"
$ git ls-tree HEAD x.txt
100644 blob a74fdf2ac9536e63c0b30fe9d59e700bcc874845 x.txt
# Use the blob ID to find the object's storage path
$ ls -lh .git/objects/a7/4fdf2ac9536e63c0b30fe9d59e700bcc874845
-r--r--r-- 1 root root 9.2M Aug 9 15:25 .git/objects/a7/4fdf2ac9536e63c0b30fe9d59e700bcc874845
# Use ls -i to show the inode numbers: they are indeed different files
$ ls -lh -i .git/objects/bc/d5a67a32f212cdd63ea1d698aed38a3c8f5f06 .git/objects/a7/4fdf2ac9536e63c0b30fe9d59e700bcc874845
57451729 -r--r--r-- 1 root root 9.2M Aug 9 15:25 ../../.git/objects/a7/4fdf2ac9536e63c0b30fe9d59e700bcc874845
57451279 -r--r--r-- 1 root root 9.2M Aug 9 15:23 ../../.git/objects/bc/d5a67a32f212cdd63ea1d698aed38a3c8f5f06
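Conclusion: for a large file, changing even a single byte makes git write a completely new compressed object; it does not store a diff. Delta compression between similar objects only happens later, when objects are packed into a packfile.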
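The storage rules above can be reproduced outside git. The Python sketch below assumes the standard loose-object format of a default (SHA-1) repository: a `blob <size>` header and a NUL byte, followed by the content, hashed with SHA-1 and then zlib-compressed as a whole. The helper names `blob_id`, `loose_path` and `loose_bytes` are hypothetical. Appending `x\n` yields a different ID and a second, full-size compressed object. Compressed sizes will not match git's byte for byte, since git's zlib level is configurable.

```python
import hashlib
import random
import zlib


def blob_id(data: bytes) -> str:
    """Object ID of a blob: SHA-1 over the header 'blob <size>' + NUL + content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def loose_path(object_id: str) -> str:
    """First two hex characters name the directory, the remaining 38 the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_bytes(data: bytes) -> bytes:
    """What a loose object file holds: header + content, zlib-compressed together."""
    return zlib.compress(b"blob %d\0" % len(data) + data)


if __name__ == "__main__":
    # Smaller stand-in for x.txt: random numbers, one per line.
    rng = random.Random(0)
    original = "\n".join(str(rng.random()) for _ in range(100_000)).encode() + b"\n"
    modified = original + b"x\n"  # what `echo 'x' >> x.txt` appends
    for label, data in (("original", original), ("modified", modified)):
        oid = blob_id(data)
        # Both objects are stored in full: similar compressed sizes, different paths.
        print(label, loose_path(oid), len(data), "->", len(loose_bytes(data)), "bytes")
```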
Removing a committed large file from the repository
Suppose the branch is tmp. First find the commit ID that tmp points to, then use git cat-file -p to find the tree object of that commit:
$ git rev-parse tmp # resolve the commit ID of branch tmp
942f4aa0c32cf96829503ed1adac5bce1e1a742a
# Find the tree ID of that commit
$ git cat-file -p 942f4aa0c32cf96829503ed1adac5bce1e1a742a
tree 51bb7ef1160b652e9a290e86712a8ee15968a915
parent d28d0ab202c50ef89b7d3e2fef5c18301e5ad9e7
author xx <xx@yy.com> 1628246769 +0800
committer xx <xx@yy.com> 1628246769 +0800
WIP
Change-Id: Ie0108abfb9fd94dda1171db941eb879e2bc15c25
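The `tree` line is simply the first header of the commit object. A minimal Python parser for `git cat-file -p <commit>` text, assuming the usual layout (header lines, one blank line, then the message), could look like this. `parse_commit` and `tree_of` are hypothetical names. In a shell, `git rev-parse tmp^{tree}` prints the same tree ID directly.

```python
def parse_commit(raw: str) -> dict:
    """Split `git cat-file -p <commit>` output into headers and message."""
    headers = {}  # header name -> list of values ("parent" repeats in merges)
    last_key = None
    lines = raw.splitlines()
    i = 0
    while i < len(lines) and lines[i] != "":
        line = lines[i]
        if line.startswith(" ") and last_key is not None:
            # Continuation of a multi-line header such as gpgsig.
            headers[last_key][-1] += "\n" + line[1:]
        else:
            last_key, _, value = line.partition(" ")
            headers.setdefault(last_key, []).append(value)
        i += 1
    # Everything after the first blank line is the commit message.
    message = "\n".join(lines[i + 1:])
    return {"headers": headers, "message": message}


def tree_of(raw: str) -> str:
    """Root tree ID of the commit: the value of its single 'tree' header."""
    return parse_commit(raw)["headers"]["tree"][0]
```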
Next, use git ls-tree to list the files under that tree. Suppose codeql/database is the directory to delete:
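# --full-tree: do not limit the listing to the current directory; the result is the same as running from the repository root
# Argument codeql/database/: the path to list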
$ git ls-tree --full-tree 51bb7ef1160b652e9a290e86712a8ee15968a915 codeql/database/
100644 blob bbcd1e9926c300cd484951c4a1faa5563a07df4f codeql/database/codeql-database.yml
040000 tree 5328662ce21bbe1d4578b97bdcd6ff0d67098307 codeql/database/db-go
040000 tree ca7bc18e91d43f33eb08997b322da4a143dfe613 codeql/database/log
100644 blob a3116fe8ea3ed95ad7a77af4714779a0c32bc4ae codeql/database/src.zip
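# -r: recurse into subtrees, i.e. if an entry is a tree, list its contents. With -r the tree entries themselves are not printed, only the blobs (add -t to show the trees as well)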
$ git ls-tree --full-tree -r 51bb7ef1160b652e9a290e86712a8ee15968a915 codeql/database/
100644 blob bbcd1e9926c300cd484951c4a1faa5563a07df4f codeql/database/codeql-database.yml
100644 blob 69b34f08181c39a56f4e5d5539aeb2dc2b8e29ba codeql/database/db-go/default/array_length.rel
100644 blob 51828185b1b19b01acc8ba3dd1a6b5b4046567a6 codeql/database/log/build-tracer.log
100644 blob 2bc26689cac16a0b728dad578cefa59f29bee0d1 codeql/database/log/database-create-20210806.183800.641.log
100644 blob 1c3b02981dcc57ff0c8208be3fa3e412994100ef codeql/database/log/database-index-files-20210806.184202.441.log
100644 blob a3116fe8ea3ed95ad7a77af4714779a0c32bc4ae codeql/database/src.zip
# Show the size of each object file (only loose objects exist as separate files under .git/objects)
# awk substr(s, start, length): start is 1-based
$ git ls-tree --full-tree -r 51bb7ef1160b652e9a290e86712a8ee15968a915 codeql/database|awk '{printf ".git/objects/%s/%s\n",substr($3,1,2),substr($3,3)}'|xargs ls -lh
# Delete (same 1-based substr offsets as the listing above)
$ git ls-tree --full-tree -r 51bb7ef1160b652e9a290e86712a8ee15968a915 codeql/database|awk '{printf ".git/objects/%s/%s\n",substr($3,1,2),substr($3,3)}'|xargs rm
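Only do this when you are certain these objects are not needed at all: if any commit or tree you keep still references them, the repository is left with missing objects (git fsck will report them). This also removes only loose objects; objects already packed into a packfile are not affected.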
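The awk pipelines above can also be written as a small Python sketch, under stated assumptions: it takes `git ls-tree -r` output as text, keeps the blob IDs (field 3, like awk's `$3`), maps each ID to `.git/objects/<first 2>/<remaining 38>`, and deletes only when `dry_run=False`. The function names are hypothetical. Objects that are already packed have no loose file, so they are skipped rather than removed.

```python
import os
import re

FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def blob_ids(ls_tree_output: str) -> list:
    """IDs of blob entries in `git ls-tree -r` output ('<mode> <type> <id>\t<path>')."""
    ids = []
    for line in ls_tree_output.splitlines():
        fields = line.split(None, 3)  # mode, type, id, path (path may contain spaces)
        if len(fields) == 4 and fields[1] == "blob":
            ids.append(fields[2])
    return ids


def loose_path(git_dir: str, object_id: str) -> str:
    # awk's substr($3,1,2) and substr($3,3) are 1-based; Python slices are 0-based.
    return os.path.join(git_dir, "objects", object_id[:2], object_id[2:])


def remove_loose(git_dir: str, object_ids, dry_run: bool = True) -> list:
    """Return (path, size) for each loose object found; delete only if dry_run is False."""
    found = []
    for oid in object_ids:
        if not FULL_SHA1.fullmatch(oid):
            raise ValueError(f"expected a full 40-hex object ID, got {oid!r}")
        path = loose_path(git_dir, oid)
        if not os.path.isfile(path):
            continue  # already packed, or never written as a loose object
        found.append((path, os.path.getsize(path)))
        if not dry_run:
            os.remove(path)
    return found
```

A typical flow would be to save the `git ls-tree --full-tree -r <tree> codeql/database` output, review `remove_loose(".git", blob_ids(text))` as a dry run, and only then call it again with `dry_run=False`.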
Recovering a deleted branch
# Use git reflog to see where HEAD has been
$ git reflog
942f4aa (HEAD -> codeql, wenwen_dev) HEAD@{0}: checkout: moving from 942f4aa0c32cf96829503ed1adac5bce1e1a742a to codeql
942f4aa (HEAD -> codeql, wenwen_dev) HEAD@{1}: checkout: moving from master to 942f4aa
70399d2 (origin/master, origin/HEAD, master) HEAD@{2}: reset: moving to origin/master
324e2ef HEAD@{3}: checkout: moving from tmp to master
de14deb (tmp) HEAD@{4}: commit: big file test
324e2ef HEAD@{5}: checkout: moving from master to tmp
324e2ef HEAD@{6}: checkout: moving from tmp to master
e1b97fc HEAD@{7}: commit: add char to big file
ee4ac0e HEAD@{8}: commit: big file test
324e2ef HEAD@{9}: checkout: moving from master to tmp
324e2ef HEAD@{10}: checkout: moving from tmp to master
942f4aa (HEAD -> codeql, wenwen_dev) HEAD@{11}: checkout: moving from tmp2 to tmp
e740f0c HEAD@{12}: commit: big file test
942f4aa (HEAD -> codeql, wenwen_dev) HEAD@{13}: checkout: moving from tmp to tmp2
942f4aa (HEAD -> codeql, wenwen_dev) HEAD@{14}: checkout: moving from tmp2 to tmp
030ce71 HEAD@{15}: commit: fuck
942f4aa (HEAD -> codeql, wenwen_dev) HEAD@{16}: checkout: moving from tmp to tmp2
942f4aa (HEAD -> codeql, wenwen_dev) HEAD@{17}: checkout: moving from wenwen_dev to tmp
942f4aa (HEAD -> codeql, wenwen_dev) HEAD@{18}: commit: WIP
d28d0ab HEAD@{19}: reset: moving to origin/master
46edf9c (origin/wenwen_dev) HEAD@{20}: reset: moving to HEAD
46edf9c (origin/wenwen_dev) HEAD@{21}: checkout: moving from master to wenwen_dev
324e2ef HEAD@{22}: clone: from git@codex.org:webcast/user_pack.git
# Once you have found the commit, check it out
$ git checkout 942f4aa
$ git switch -c tmp # create a new branch at this commit
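Each reflog entry records the value HEAD moved *to*, so the commit a branch was on just before `checkout: moving from tmp to ...` is the hash on the next (older) line. In the listing above that gives `de14deb` for `tmp`. The sketch below automates that lookup over `git reflog` text. It is a heuristic built on that assumption (it only sees commits HEAD was actually on), and `tips_when_leaving` is a hypothetical helper.

```python
import re

# '<hash> [(decorations)] HEAD@{n}: <message>'
ENTRY = re.compile(r"^([0-9a-f]+)(?: \([^)]*\))? HEAD@\{\d+\}: (.*)$")
LEAVING = re.compile(r"^checkout: moving from (\S+) to \S+$")


def tips_when_leaving(reflog_text: str, branch: str) -> list:
    """Commits HEAD was on right before each checkout away from `branch`, newest first."""
    entries = []  # (hash, message), newest first, as `git reflog` prints them
    for line in reflog_text.splitlines():
        m = ENTRY.match(line)
        if m:
            entries.append((m.group(1), m.group(2)))
    tips = []
    for i in range(len(entries) - 1):
        leave = LEAVING.match(entries[i][1])
        if leave and leave.group(1) == branch:
            older_hash = entries[i + 1][0]  # HEAD's value before this checkout
            if older_hash not in tips:
                tips.append(older_hash)
    return tips
```

A candidate can then be restored in one step with `git switch -c tmp <commit>`, which creates the branch at that commit and switches to it.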
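Discarding changes to specific files
# Restore go.sum and go.mod to their origin/master versions (overwrites both the staged and the working-tree copies)
$ git checkout origin/master -- go.sum go.mod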
Getting the commit message
# %s = subject (the title line of the message); use %B for the full raw message
$ git log --format=%s -n 1 HEAD
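`%s` is only the subject, which matters for messages that carry a body or trailers, such as the `Change-Id:` line in the commit shown earlier. The Python function below approximates how the subject is derived (the first paragraph of the message, its lines joined with spaces). It is a simplified model for illustration, not git's implementation.

```python
def subject(message: str) -> str:
    """Approximate git's %s: first paragraph of a commit message, lines joined by spaces."""
    paragraph = []
    for line in message.lstrip("\n").splitlines():
        if not line.strip():
            break  # a blank line ends the subject paragraph
        paragraph.append(line.strip())
    return " ".join(paragraph)
```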
Alias
To make git commands shorter, you can define aliases:
$ git config --global alias.co checkout
$ git config --global alias.br branch
$ git config --global alias.ci commit
$ git config --global alias.st status
Reference: git aliases
An alias whose value starts with ! is run as a shell command, which lets it call other git commands and aliases:
git config --global alias.lm 'log --format=%s -n1'
git config --global alias.clm '!git commit --amend -m "$(git lm)"'
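Running git clm amends the previous commit, folding in any staged changes and reusing its subject as the message. Because %s is only the subject, a message body or trailers such as Change-Id are dropped; git commit --amend --no-edit keeps the full message.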
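The two kinds of alias behave differently: a plain alias is split into words and substituted for the alias name on a git command line, while a `!` alias is handed to the shell (git runs it from the repository's top-level directory). Below is a simplified Python model of that dispatch, using a hypothetical `expand_alias` helper and the aliases defined above. It is not git's own code.

```python
import shlex


def expand_alias(argv: list, aliases: dict) -> tuple:
    """Simplified model of resolving `git <name> args...` through alias.<name>.

    Returns ("git", words) for a plain alias or a non-alias command, and
    ("shell", [command, *args]) for a '!' alias.
    """
    name, args = argv[0], list(argv[1:])
    value = aliases.get(name)
    if value is None:
        return ("git", [name, *args])  # not an alias: run as-is
    if value.startswith("!"):
        # Shell alias: the text after '!' goes to the shell; extra args follow it.
        return ("shell", [value[1:], *args])
    # Plain alias: split like a shell command line, then append the extra args.
    return ("git", [*shlex.split(value), *args])


ALIASES = {
    "co": "checkout",
    "br": "branch",
    "ci": "commit",
    "st": "status",
    "lm": "log --format=%s -n1",
    "clm": '!git commit --amend -m "$(git lm)"',
}
```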
git grep
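git grep searches tracked files for a pattern. -n prints line numbers; -p (or --show-function) also prints the preceding line that contains the name of the enclosing function, as context for each match.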
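To make `-p` concrete: unless a diff driver defines its own pattern, git treats a line that begins with a letter, an underscore or a dollar sign as a function header. The Python sketch below is a rough, single-file model of `git grep -n -p` built on that rule. The `grep` function and its `N:` / `N=` output format are illustrative assumptions, not git's exact output.

```python
import re

# Default hunk-header rule when no diff driver overrides it:
# a line beginning with a letter, an underscore or a dollar sign.
FUNCNAME = re.compile(r"^[A-Za-z_$]")


def grep(text: str, pattern: str, show_function: bool = False) -> list:
    """Report matches as 'N:line'; with show_function, add the enclosing header as 'N=line'."""
    regex = re.compile(pattern)
    lines = text.splitlines()
    out = []
    shown = set()  # header line numbers already printed
    for lineno, line in enumerate(lines, start=1):
        if not regex.search(line):
            continue
        if show_function and not FUNCNAME.match(line):
            # Walk upward to the nearest function header above the match.
            for j in range(lineno - 1, 0, -1):
                if FUNCNAME.match(lines[j - 1]):
                    if j not in shown:
                        out.append(f"{j}={lines[j - 1]}")
                        shown.add(j)
                    break
        out.append(f"{lineno}:{line}")
    return out
```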