Git Principle Summary


Git is a content-addressable file system. At its core is a key-value data store: you can insert any content, and Git returns a key, a 40-character hexadecimal hash computed with the SHA-1 algorithm, which you can use later to retrieve that content.

1. The underlying commands

Git commands fall into two groups: plumbing commands (low-level) and porcelain commands (user-facing).
[Figure: plumbing and porcelain commands]
Here I will mainly discuss the plumbing commands.

hash-object: stores data in the Git database and returns its hash key.

$ echo 'test content' | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4

cat-file: retrieves data from the Git database

$ git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
test content
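
The key returned by hash-object can be reproduced outside Git. The sketch below assumes the standard object format, in which Git hashes a short header ("blob", the content size, and a NUL byte) followed by the raw content; the function name `git_blob_hash` is only an illustrative choice.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 key Git would assign to a blob with this content."""
    # Git hashes "blob <size>\0" followed by the raw bytes, not the bytes alone.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# echo appends a newline, so the stored content is 13 bytes long.
print(git_blob_hash(b"test content\n"))
```

Because the key depends only on the content, storing the same content twice always yields the same key.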

How modifications work in Git:
create a file and write it to the Git database

$ echo 'version 1' > test.txt
$ git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30

Update the file and write again:

$ echo 'version 2' > test.txt
$ git hash-object -w test.txt
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a

The database now stores both versions of this file:

$ find .git/objects -type f
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
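
The paths above follow directly from the hash. As a rough sketch (assuming the loose-object layout of header plus content, compressed with zlib, which the packing section mentions later), the code below writes and reads an object in a temporary directory that stands in for `.git/objects`. The helper names are hypothetical.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir: str, obj_type: str, content: bytes) -> str:
    """Store content in the loose-object layout and return its hash."""
    header = obj_type.encode("ascii") + b" " + str(len(content)).encode("ascii") + b"\0"
    store = header + content
    sha = hashlib.sha1(store).hexdigest()
    # The first two hex characters name the subdirectory, the other 38 the file.
    path = os.path.join(objects_dir, sha[:2], sha[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return sha


def read_loose_object(objects_dir: str, sha: str) -> bytes:
    """Decompress a loose object and return its content without the header."""
    with open(os.path.join(objects_dir, sha[:2], sha[2:]), "rb") as f:
        store = zlib.decompress(f.read())
    return store.split(b"\0", 1)[1]


# A throwaway directory plays the role of .git/objects here.
objects_dir = tempfile.mkdtemp()
sha = write_loose_object(objects_dir, "blob", b"version 1\n")
print(sha[:2] + "/" + sha[2:])
print(read_loose_object(objects_dir, sha))
```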

Roll back to version 1 (similar in effect to checkout or reset):

$ git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30 > test.txt
$ cat test.txt
version 1

The tree object: stores file names and groups files together.

In Git, all content is stored as blob (data) objects and tree objects. A tree object corresponds to a directory, and a blob object holds the content of a file. Each entry in a tree holds the SHA-1 pointer to a blob or subtree, together with its mode, type, and file name. Here is the content of the tree object of the master branch:

$ git cat-file -p master^{tree}
100644 blob a906cb2a4a904a152e80877d4088654daad0c859      README
100644 blob 8f94139338f9404f26296befa88755fc2598c289      Rakefile
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0      lib

The lib entry is a directory, so it points to another tree object:

$ git cat-file -p 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0
100644 blob 47c6340d6459e05787f644c2447d2595f5d3a54b      simplegit.rb

Tree structure:

[Figure: tree structure]

update-index: stages a specific file in the index (staging area), just like ‘git add’:

$ git update-index --add --cacheinfo 100644 \
  83baae61804e65cc73a7201a7252750c76066a30 test.txt

100644: normal file
100755: executable file
120000: symbolic link
--add: required when the file is not yet in the index (it is still untracked)
--cacheinfo: required when the content is already in the Git database rather than read from the current directory

write-tree: creates a new tree object from the current content of the index

$ git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git cat-file -p d8329fc1cc938780ffdd9f94e0d364e0ea74f579
100644 blob 83baae61804e65cc73a7201a7252750c76066a30      test.txt

Check the type of the object with this SHA-1:

$ git cat-file -t d8329fc1cc938780ffdd9f94e0d364e0ea74f579
tree
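
A tree's hash can be recomputed in the same way as a blob's. The sketch below assumes the usual tree encoding, in which each entry is the mode, a space, the file name, a NUL byte, and the 20 raw bytes of the entry's SHA-1. Sorting entries by plain name is a simplification that holds for flat trees like this one, and `git_tree_hash` is a hypothetical helper name.

```python
import hashlib


def git_tree_hash(entries):
    """entries: list of (mode, name, hex_sha) tuples. Returns the tree's SHA-1."""
    body = b""
    # Simplification: sort by name only (enough when the tree has no subdirectories).
    for mode, name, hex_sha in sorted(entries, key=lambda e: e[1]):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_sha)  # 20 raw bytes, not 40 hex characters
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


first_tree = git_tree_hash(
    [("100644", "test.txt", "83baae61804e65cc73a7201a7252750c76066a30")]
)
print(first_tree)
```

With the single test.txt entry from above, this should reproduce the hash that write-tree printed.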

Then we can stage the second version of test.txt, add a new file ‘new.txt’, and write a new tree:

$ echo 'new file' > new.txt
$ echo 'version 2' > test.txt
$ git update-index --cacheinfo 100644 \
  1f7a7a472abf3dd9643fd615f6da379c4acb3e3a test.txt
$ git update-index --add new.txt

$ git write-tree
0155eb4229851634a0f03eb265b69f5a2d56f341
$ git cat-file -p 0155eb4229851634a0f03eb265b69f5a2d56f341
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a      test.txt

Note that the new tree no longer contains the first version of test.txt. To keep the previous tree as well (useful for rolling back or checking out), we can read it into the index under a subdirectory with ‘read-tree’:

$ git read-tree --prefix=bak d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git write-tree
3c4e9cd789d88d8d89c1073707c3585e41b0e614
$ git cat-file -p 3c4e9cd789d88d8d89c1073707c3585e41b0e614
040000 tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579      bak
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a      test.txt

[Figure: content structure of the resulting trees]

Commit object: records which tree is the snapshot, who saved it and when, and the commit message

$ echo 'first commit' | git commit-tree d8329f
fdf4fc3344e67ab068f836878b6c4951e3b15f3d

d8329f is the abbreviated SHA-1 of the tree object to commit.
The commit object is stored in ‘.git/objects’ like any other object (commit-tree does not update HEAD or any branch). Its content:

$ git cat-file -p fdf4fc3
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Scott Chacon <schacon@gmail.com> 1243040974 -0700
committer Scott Chacon <schacon@gmail.com> 1243040974 -0700

first commit

We can create more commits; each new commit names its parent commit by SHA-1 with the ‘-p’ option:

$ echo 'second commit' | git commit-tree 0155eb -p fdf4fc3
cac0cab538b970a37ea1e769cbbde608743bc96d
$ echo 'third commit'  | git commit-tree 3c4e9c -p cac0cab
1a410efbd13591db07496601ebc7a059dd55cfe9
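
The link between commits can be made concrete with a small sketch. It assumes the usual text layout of a commit object (a tree line, optional parent lines, author and committer lines, a blank line, then the message). The function name and parameters are illustrative, and the hashes it produces depend on the exact author and timestamp, so they are not expected to match the ones above.

```python
import hashlib


def build_commit(tree, message, author, timestamp, parents=()):
    """Return (body, sha) for a commit object with the given fields."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]  # one line per parent commit
    lines.append("author " + author + " " + timestamp)
    lines.append("committer " + author + " " + timestamp)
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return body, hashlib.sha1(header + body).hexdigest()


who = "Scott Chacon <schacon@gmail.com>"
when = "1243040974 -0700"
first_body, first_sha = build_commit(
    "d8329fc1cc938780ffdd9f94e0d364e0ea74f579", "first commit", who, when
)
second_body, second_sha = build_commit(
    "0155eb4229851634a0f03eb265b69f5a2d56f341", "second commit", who, when,
    parents=[first_sha],
)
print(second_body.decode("utf-8"))
```

Since the parent's hash is part of the child's content, changing any earlier commit changes the hash of every commit after it.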

The commit log then looks like this:

$ git log --stat 1a410e
commit 1a410efbd13591db07496601ebc7a059dd55cfe9
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:15:24 2009 -0700

	third commit

 bak/test.txt | 1 +
 1 file changed, 1 insertion(+)

commit cac0cab538b970a37ea1e769cbbde608743bc96d
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:14:29 2009 -0700

	second commit

 new.txt  | 1 +
 test.txt | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

commit fdf4fc3344e67ab068f836878b6c4951e3b15f3d
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:09:34 2009 -0700

    first commit

 test.txt | 1 +
 1 file changed, 1 insertion(+)

This is essentially what ‘git add’ and ‘git commit’ do under the hood. Here is the structure chart:
[Figure: structure chart of the objects]

2. Structure

Git has four areas: the workspace, the staging area (index), the local repo, and the remote repo.
[Figure: the four areas of Git]
Advantages of the staging area: you can add changes partially, staging does not affect the modifications in the workspace, and it records information about each add, such as the add time.

Local repo storage:
[Figure: files in the local repo]
The local repo stores 4 files rather than 6, and each one is identified by a 40-character hash computed with SHA-1. With the hashes as keys, the storage looks like this:
[Figure: storage keyed by hash]

3. The ‘.git/’ directory stores all of the Git content

[Figure: contents of the .git directory]
config file: stores the configuration parameters.
hooks directory: contains the client-side and server-side hook scripts.
info directory: contains the global exclude file, for ignore patterns that you do not want to record in a .gitignore file.
description file: used only by the GitWeb program (can be ignored).
The important ones:
HEAD file: stores the current branch info.
index file: stores the staging area info.
objects directory: stores all of the data content.
refs directory: stores the pointers (branches, tags, remotes) to commit objects.

Config file

The Git configuration file stores settings such as the tracked branches, submodule info, and fetch rules.
[Figure: example config file]

Objects directory

Git uses SHA-1 to compute a 40-character hash that identifies each object, and the objects are stored under the ‘objects’ directory.
[Figures: the objects directory]
A file is stored as a ‘blob’ object. The first two characters of the hash are the directory name and the remaining 38 characters are the file name.
A directory is stored as a ‘tree’ object.
A commit is stored as a ‘commit’ object.
[Figure: blob, tree, and commit objects]
Git stores the files like this:
[Figure: stored files]

Refs directory

The SHA-1 values are hard to remember, so we use reference files with simple names that point to them.
This directory stores the reference files, such as local branches, remote branches, and tags:
[Figure: the refs directory]
For example, we can create a reference for the ‘third commit’:

$ echo "1a410efbd13591db07496601ebc7a059dd55cfe9" > .git/refs/heads/master

Then you can use the reference instead of the SHA-1 value:

$ git log --pretty=oneline  master
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

A safer way is to use ‘update-ref’:

$ git update-ref refs/heads/test cac0ca

$ git log --pretty=oneline test
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

The resulting structure looks like this:
[Figure: branch references pointing to commits]

HEAD: the HEAD file is a symbolic reference that points to the current branch (just like a pointer):

$ cat .git/HEAD
ref: refs/heads/master

If we check out another branch, for example ‘test’, it changes:

$ git checkout test
$ cat .git/HEAD
ref: refs/heads/test

When we run ‘git commit’, Git uses the SHA-1 stored in the reference that HEAD points to as the parent commit id.
refs/remotes/origin/master: stores the SHA-1 of the remote master branch.
refs/remotes: stores the remote references and remote branches; they are read-only.
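
Resolving HEAD to a commit is just a matter of following files. The sketch below builds a tiny stand-in for a `.git` directory in a temporary folder and follows the `ref:` indirection. It is a simplification: real Git also consults other sources such as packed references, and `resolve_ref` is a hypothetical helper name.

```python
import os
import tempfile


def resolve_ref(git_dir: str, name: str = "HEAD") -> str:
    """Follow symbolic references ("ref: ...") until a SHA-1 is reached."""
    with open(os.path.join(git_dir, name)) as f:
        value = f.read().strip()
    if value.startswith("ref: "):
        # Symbolic reference: the rest of the line names another reference file.
        return resolve_ref(git_dir, value[len("ref: "):])
    return value


# Build a minimal fake .git directory with one branch and a HEAD pointing to it.
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
    f.write("1a410efbd13591db07496601ebc7a059dd55cfe9\n")
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")

print(resolve_ref(git_dir))
```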

Index file:

This file is the staging area; in other words, the staging area is just a file.
[Figure: entries of the index file]
In each entry, the second field is the SHA-1 of the blob that holds the staged content, the third is the conflict status of the file, and the fourth is the path of the file.

.git/HEAD:

Stores the current branch:
[Figure: content of .git/HEAD]

4. File status

[Figure: file status output]
A changed file is shown in one of two colours: changes already staged (between the staging area and the local repo) appear in green, and all other changes appear in red.

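5. Git packing

Git uses zlib to compress objects.
Each version of a file is first stored in full as a loose object. When Git packs objects, it stores similar versions of the same file as deltas, keeping only the differences. Packing happens when you run ‘git gc’, when there are too many loose objects, or when you push.
For example, ‘git gc’ packs the objects: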

$ git gc
Counting objects: 18, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (14/14), done.
Writing objects: 100% (18/18), done.
Total 18 (delta 3), reused 0 (delta 0)

18 objects are packed. We can see the files that remain after packing:

$ find .git/objects -type f
.git/objects/info/packs
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.idx
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.pack

The .pack file contains all of the object data. The .idx file contains the offsets into the packfile. ‘git verify-pack’ shows what was packed:

$ git verify-pack -v .git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.idx
2431da676938450a4d72e260db3bf7b0f587bbc1 commit 223 155 12
69bcdaff5328278ab1c0812ce0e07fa7d26a96d7 commit 214 152 167
80d02664cb23ed55b226516648c7ad5d0a3deb90 commit 214 145 319
43168a18b7613d1281e5560855a83eb8fde3d687 commit 213 146 464
092917823486a802e94d727c820a9024e14a1fc2 commit 214 146 610
702470739ce72005e2edff522fde85d52a65df9b commit 165 118 756
d368d0ac0678cbe6cce505be58126d3526706e54 tag    130 122 874
fe879577cb8cffcdf25441725141e310dd7d239b tree   136 136 996
d8329fc1cc938780ffdd9f94e0d364e0ea74f579 tree   36 46 1132
deef2e1b793907545e50a2ea2ddb5ba6c58c4506 tree   136 136 1178
d982c7cb2c2a972ee391a85da481fc1f9127a01d tree   6 17 1314 1 \
  deef2e1b793907545e50a2ea2ddb5ba6c58c4506
3c4e9cd789d88d8d89c1073707c3585e41b0e614 tree   8 19 1331 1 \
  deef2e1b793907545e50a2ea2ddb5ba6c58c4506
0155eb4229851634a0f03eb265b69f5a2d56f341 tree   71 76 1350
83baae61804e65cc73a7201a7252750c76066a30 blob   10 19 1426
fa49b077972391ad58037050f2a75f74e3671e92 blob   9 18 1445
b042a60ef7dff760008df33cee372b945b6e884e blob   22054 5799 1463
033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob   9 20 7262 1 \
  b042a60ef7dff760008df33cee372b945b6e884e
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a blob   10 19 7282

6. Git transfer protocols

Dumb protocol: the server needs no Git-specific code; the client just issues HTTP GET requests. It is hard to secure or keep private, and the client cannot send data to the server, so most Git servers do not use it.
Smart protocol: runs over SSH or HTTP(S).
For example, the push process for ‘git push origin master’ over SSH:
the client runs the ‘send-pack’ process and the server runs ‘receive-pack’ to upload the data. ‘send-pack’ connects to the server over SSH and starts ‘receive-pack’ with a command like the one below; the server replies with one line per reference it has, and the first line also lists its capabilities:

$ ssh -x git@server "git-receive-pack 'simplegit-progit.git'"
00a5ca82a6dff817ec66f44342007202690a93763949 refs/heads/master report-status \
	delete-refs side-band-64k quiet ofs-delta \
	agent=git/2:2.1.1+github-607-gfba4028 delete-refs
0000

The client then works out which commits it has that the server lacks, and tells the ‘receive-pack’ process which references will be updated. What it sends looks like this:

0076ca82a6dff817ec66f44342007202690a93763949 15027957951b64cf874c3557a0f3547bd83b3ff6 \
	refs/heads/master report-status
006c0000000000000000000000000000000000000000 cdfdb42577e2506715f8cfeacdbabc092bf63e8d \
	refs/heads/experiment
0000

Every line includes its length, the old SHA-1, the new SHA-1, and the reference being updated. The client then sends a packfile with the objects the server does not have yet. Finally the server responds with success or failure:

000eunpack ok

For downloads, on the other hand, the client runs the ‘fetch-pack’ process and the server runs ‘upload-pack’.
Over SSH again, ‘fetch-pack’ connects to ‘upload-pack’ like this:

$ ssh -x git@server "git-upload-pack 'simplegit-progit.git'"

Then ‘upload-pack’ will return the response:

00dfca82a6dff817ec66f44342007202690a93763949 HEAD multi_ack thin-pack \
	side-band side-band-64k ofs-delta shallow no-progress include-tag \
	multi_ack_detailed symref=HEAD:refs/heads/master \
	agent=git/2:2.1.1+github-607-gfba4028
003fe2409a098dc3e53539a9028a94b6224db9d6a6b6 refs/heads/master
0000

Then ‘fetch-pack’ tells the server which objects it wants and which it already has, and ends the request with ‘done’, which prompts ‘upload-pack’ to start sending the packfile:

003cwant ca82a6dff817ec66f44342007202690a93763949 ofs-delta
0032have 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
0009done
0000
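
The four hexadecimal characters at the start of each line are a length prefix. The sketch below assumes the framing seen in these transcripts: the length counts the payload plus the four prefix characters themselves, and `0000` is a special marker that ends a section. The function names are illustrative.

```python
FLUSH = b"0000"  # special packet that ends a section of the conversation


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its total length as 4 hex digits."""
    # The length includes the 4 bytes of the prefix itself.
    return "{:04x}".format(len(payload) + 4).encode("ascii") + payload


def parse_pkt_lines(data: bytes):
    """Split a byte stream into payloads; a flush packet becomes None."""
    pos, packets = 0, []
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            packets.append(None)
            pos += 4
        else:
            packets.append(data[pos + 4:pos + length])
            pos += length
    return packets


request = (
    pkt_line(b"want ca82a6dff817ec66f44342007202690a93763949 ofs-delta\n")
    + pkt_line(b"have 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7\n")
    + pkt_line(b"done\n")
    + FLUSH
)
print(request.decode("ascii"))
```

Encoding the request this way yields the same prefixes as the transcript (003c, 0032, 0009), which is a quick check that the framing assumption is right.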

7. Git maintenance and data recovery

Git occasionally runs ‘git gc --auto’ to clean up garbage data, or you can run ‘git gc’ yourself.
Sometimes you lose a commit or delete a branch, so how do you recover it?
For example, here is the commit history:

$ git log --pretty=oneline
ab1afef80fac8e34258ff41fc1b867c702daa24b modified repo a bit
484a59275031909e19aadb7c92262719cfcdf19a added repo.rb
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

Now we reset to the third commit, 1a410e:

$ git reset --hard 1a410efbd13591db07496601ebc7a059dd55cfe9
HEAD is now at 1a410ef third commit
$ git log --pretty=oneline
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

How do we recover the lost commits? Use the command ‘git reflog’:

$ git reflog
1a410ef HEAD@{0}: reset: moving to 1a410ef
ab1afef HEAD@{1}: commit: modified repo.rb a bit
484a592 HEAD@{2}: commit: added repo.rb

‘git log -g’ shows the same reflog with more detail:

$ git log -g
commit 1a410efbd13591db07496601ebc7a059dd55cfe9
Reflog: HEAD@{0} (Scott Chacon <schacon@gmail.com>)
Reflog message: updating HEAD
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:22:37 2009 -0700

		third commit

commit ab1afef80fac8e34258ff41fc1b867c702daa24b
Reflog: HEAD@{1} (Scott Chacon <schacon@gmail.com>)
Reflog message: updating HEAD
Author: Scott Chacon <schacon@gmail.com>
Date:   Fri May 22 18:15:24 2009 -0700

       modified repo.rb a bit

To recover it, create a new local branch that points to this commit:

$ git branch recover-branch ab1afef
$ git log --pretty=oneline recover-branch
ab1afef80fac8e34258ff41fc1b867c702daa24b modified repo a bit
484a59275031909e19aadb7c92262719cfcdf19a added repo.rb
1a410efbd13591db07496601ebc7a059dd55cfe9 third commit
cac0cab538b970a37ea1e769cbbde608743bc96d second commit
fdf4fc3344e67ab068f836878b6c4951e3b15f3d first commit

If the branch and the reflogs have both been deleted, you can use ‘git fsck --full’, which lists all of the objects that no other object points to:

$ git branch -D recover-branch
$ rm -Rf .git/logs/

$ git fsck --full
Checking object directories: 100% (256/256), done.
Checking objects: 100% (18/18), done.
dangling blob d670460b4b4aece5915caf5c68d12f560a9fe3e4
dangling commit ab1afef80fac8e34258ff41fc1b867c702daa24b
dangling tree aea790b9a58f6cf6f2804eeac9f0abbe9631e4c9
dangling blob 7108f7ecb345ee9d0084193f147cdad4d2998293
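
What fsck reports as dangling can be sketched with a toy object graph. This is a simplification that tracks only commits and their parents (trees and blobs are left out) and uses the abbreviated ids from this example as labels. The function names are hypothetical.

```python
def find_unreachable(objects, refs):
    """objects: id -> list of ids it points to. refs: ids that references name."""
    reachable = set()
    stack = list(refs)
    while stack:
        oid = stack.pop()
        if oid in reachable or oid not in objects:
            continue
        reachable.add(oid)
        stack.extend(objects[oid])
    return set(objects) - reachable


def find_dangling(objects, refs):
    """Unreachable objects that no other object points to."""
    unreachable = find_unreachable(objects, refs)
    pointed_to = {child for oid in unreachable for child in objects[oid]}
    return unreachable - pointed_to


# Commit graph after the reset: master points at the third commit.
commits = {
    "fdf4fc3": [],
    "cac0cab": ["fdf4fc3"],
    "1a410ef": ["cac0cab"],
    "484a592": ["1a410ef"],
    "ab1afef": ["484a592"],
}
print(find_dangling(commits, refs=["1a410ef"]))
```

Only ab1afef is reported: 484a592 is also unreachable from master, but ab1afef still points to it, which is why recovering the one dangling commit brings back both.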

8. Git environment variables

Often used:

GIT_EXEC_PATH: the path where Git looks for its sub-programs
GIT_DIR: the path of the .git directory
GIT_INDEX_FILE: the path of the index file
GIT_OBJECT_DIRECTORY: the path of the directory that is normally .git/objects
GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE: the author name/email/date used when committing
GIT_MERGE_VERBOSITY: controls the output level when merging; the default value is 2 (print the conflicts and modified files)

9. Add & Commit

  • ‘git add’: moves a file from the workspace to the staging area (unstaged -> staged). Git computes the SHA-1 of the file, stores the content as a blob in the database, and then updates the index.
  • ‘git commit’: Git turns the index into a tree object and writes it to the database, then creates a commit object, writes it to the database, and updates the branch pointer.

10. Rebase, Merge, Pull

To be clear:

git pull = git fetch + git merge
git pull --rebase = git fetch + git rebase

So the key is the difference between ‘rebase’ and ‘merge’. For example, you check out a remote branch and make some commits on your own local branch while other people commit to the remote branch. The commit graph now looks like this:
[Figure: local and remote branches have diverged]
If you use ‘git pull’ (fetch + merge), it looks like this:
[Figure: result of the merge]
Git fetches the changes from the remote branch and combines them with your local changes in a new merge commit.

‘git rebase’ instead takes your commits off the local branch and saves them as patches (here C5 and C6, under ‘.git/rebase’), moves the local branch to the newest commit of the origin branch, applies the patches on top of it, and then deletes the patches:
[Figures: the rebase step by step]
The difference:
The ‘git log’ output differs. Suppose C3 was committed at 9:00, C5 at 10:00, C4 at 11:00, and C6 at 12:00.
[Figure: commit times]
If you use ‘git merge’, ‘git log’ shows: C7, C6, C4, C5, C3, C2, C1
If you use ‘git rebase’, ‘git log’ shows: C7, C6’, C5’, C4, C3, C2, C1 (C6’ and C5’ are copies of C6 and C5). From the user's point of view, the order is: C7, C6, C5, C4, C3, C2, C1

11. Git init

‘git init’ creates the ‘.git/’ directory.
