Reprinted from: http://lostechies.com/joshuaflanagan/2010/09/03/use-gitk-to-understand-git/
Moving from Subversion to git can be a struggle as you try to understand what terms like checkout, commit, branch, remote, and rebase mean in the git world. I learned by experimenting in a demo repository, trying out various commands, and using gitk to visualize their impact. This post is broken up into two parts – after reading this, you may want to read the second part.
The gitk screen
I created a simple repository on github to walk through some scenarios. I’ll start by creating a local copy of the repository:
d:\code>git clone git@github.com:joshuaflanagan/gitk-demo.git
Initialized empty Git repository in d:/code/gitk-demo/.git/
remote: Counting objects: 9, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 9 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (9/9), done.
d:\code>cd gitk-demo
d:\code\gitk-demo>gitk --all
There is a lot of information in this single screenshot:
- The upper left pane shows the series of commits to this repository, with the most recent on top.
- There have been three commits, all by Tony Stark.
- The commit message for the most recent commit was “third commit”
- There is a single local branch, named “master”, which points to the most recent commit
- There is a single remote reference branch: the “master” branch from the remote repository named “origin”, which also points to the most recent commit
- The yellow dot next to the top commit indicates that it is the snapshot currently in my working folder (referred to as HEAD)
- I’ve highlighted the second commit, so that I can see its details in the lower pane
- The commit SHA (unique identifier, similar to subversion revision number) of the second commit is 3d024dd9e4a83d8c6a9a143a68b75d4b872115a6
- The lower right shows the list of files impacted by the second commit
- The lower left shows the commit details, including the full diff
- Clicking a file in the lower right pane scrolls the diff in the lower left pane to the corresponding section
A note about “master” and “origin”
When you first create a git repository, it starts with a single branch named “master”. There is nothing special about this branch, other than it is the default. You are free to create a new one, and delete master (although, I don’t see any reason to go against the default convention).
When you first clone a git repository, git will automatically create a remote for you named “origin”. A remote is just a name used to manage references (URLs) to other repositories. There is nothing special about the “origin” remote, other than it is created for you. You are free to create a new one and delete origin. In fact, if you are working with multiple remotes, I recommend you delete the origin remote and create a new one for the same repository, but using a more descriptive name. For example, when I work with the FubuMVC source code, in my local repository I have a remote named “darth” which refers to the main repository owned by DarthFubuMVC, and a remote named “josh”, which refers to my fork. If I had kept the name “origin”, I would always have to remember which one I cloned from.
Branching
What happens when I create a branch?
d:\code\gitk-demo>git branch issue123
Press CTRL-F5 in the gitk window to refresh the repository view.
We see that the new branch marker for the issue123 branch points to the same commit as master and origin/master. It is important to note that “master” is bold, indicating that it is still the current branch. The bold branch label is equivalent to the asterisk in the command-line output:
d:\code\gitk-demo>git branch
  issue123
* master
Now if I switch to the new branch and refresh gitk:
d:\code\gitk-demo>git checkout issue123
Switched to branch 'issue123'
(We’re going to focus on information in the top pane from now on, so I’ll hide the bottom part of gitk)
Note: For convenience, I could have created and switched to the new branch in a single command: git checkout -b issue123
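To make the difference between creating a branch and switching to it concrete, here is a minimal sketch in Python. It is a toy model under my own assumptions, not how git stores anything: the function names are hypothetical, and the only real value is the commit ID bf37c64, taken from the transcripts in this post.

```python
# Toy model (an assumption for illustration, not git's storage format):
# branches map names to commit IDs, and HEAD holds the current branch name.
branches = {"master": "bf37c64"}   # ID of the "third commit" in the transcripts
head = "master"

def create_branch(name):
    """Like `git branch <name>`: add a label at the current commit; HEAD stays put."""
    branches[name] = branches[head]

def checkout(name):
    """Like `git checkout <name>`: only HEAD changes; no label moves."""
    global head
    if name not in branches:
        raise KeyError(f"no such branch: {name}")
    head = name

def list_branches():
    """Like `git branch`: mark the current branch with an asterisk."""
    return [("* " if name == head else "  ") + name for name in sorted(branches)]

create_branch("issue123")
print("\n".join(list_branches()))   # master is still the current branch

checkout("issue123")
print("\n".join(list_branches()))   # now issue123 is current
```

Both labels point at the same commit the whole time; the only thing that changes on checkout is which label is considered current.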
Making changes
When I refer to the “current branch”, I mean “the branch that will advance when I perform a commit”. This is where the gitk visualization really starts to help. I’ll make some changes to a file and then commit with the message “My first commit”:
d:\code\gitk-demo>git commit -am "My first commit"
[issue123 f948bf8] My first commit
 1 files changed, 2 insertions(+), 1 deletions(-)
The issue123 branch now points to my new commit. Neither the master nor origin/master branch pointers have moved.
As I continue to commit, the current branch pointer moves with me:
d:\code\gitk-demo>git commit -am "Added another fruit"
[issue123 cac3c72] Added another fruit
 1 files changed, 1 insertions(+), 0 deletions(-)
But I thought it was a branch
Since I was working in a branch, I expected to see a tree structure, with nodes branching off from the main “trunk”. Something like this:
Instead, gitk shows all of the commits as a single straight line. When first using git, this was very confusing to me. My confusion stemmed from my misunderstanding of branches in git. Thinking about why gitk was showing all of the commits in a straight line finally brought the point home. In git, a branch is a label for a commit. The label moves to new commits as they are created. When you create a git branch, you are not changing anything in the structure of the repository or the source tree. You are just creating a new label.
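The "branch is a label" idea can be sketched in a few lines of Python. This is a toy model, not git's real data structures: each commit records only its parent, and the IDs c1 through c5 are hypothetical stand-ins for the five commits in the walkthrough.

```python
# Toy model, not git's real storage: a commit records only its parent,
# and a branch is a name that maps to one commit ID.
commits = {}                    # commit ID -> parent ID (None for the first commit)
branches = {"master": None}     # branch name -> commit ID
head = "master"                 # name of the current branch

def commit(commit_id):
    """Add a commit on top of the current branch and advance only that label."""
    commits[commit_id] = branches[head]
    branches[head] = commit_id

def history(branch):
    """Follow parent links from a branch label back to the first commit."""
    chain, cid = [], branches[branch]
    while cid is not None:
        chain.append(cid)
        cid = commits[cid]
    return chain

# c1..c5 are hypothetical IDs standing in for the commits in the walkthrough.
for cid in ("c1", "c2", "c3"):
    commit(cid)

branches["issue123"] = branches["master"]   # git branch issue123: one new label
head = "issue123"                           # git checkout issue123
commit("c4")                                # "My first commit"
commit("c5")                                # "Added another fruit"

print(history("issue123"))
print(history("master"))
```

Creating the branch added one dictionary entry and nothing else. The history of issue123 is a single chain that passes straight through the commit master points to, which is why gitk draws a straight line.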
Fast forward
After completing my work in the issue123 branch, I’ll want to merge the changes back into master. Usually when I think of a merge, I think of comparing two trees and applying the differences from one onto the other. I imagine each commit being replayed on the other branch. Merging issue123 into master would require applying each of my two commits to the master branch. However, this work has already been done, when I first performed the commits. Because the master label hasn’t moved since my work began on issue123, applying the diffs would end up with the same result. This is where the “single straight line” visualization really proves valuable – I can see that issue123 is directly ahead of master. Git is smart enough to recognize this situation and performs what it calls a fast-forward merge. A fast-forward merge isn’t really a merge at all – since no file content comparisons are necessary – it simply moves the master branch label up to point to the commit pointed at by the issue123 label.
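As a rough sketch of that reasoning, the Python below models a fast-forward as "check that the target is an ancestor of the source, then move the label". It is a simplification under stated assumptions: the commit IDs come from the transcripts, history before bf37c64 is omitted, and the function names are my own.

```python
# Toy model: commit ID -> parent ID. History before bf37c64 is omitted.
parents = {
    "bf37c64": None,        # "third commit"
    "f948bf8": "bf37c64",   # "My first commit"
    "cac3c72": "f948bf8",   # "Added another fruit"
}
branches = {"master": "bf37c64", "issue123": "cac3c72"}

def is_ancestor(older, newer):
    """True if `older` can be reached by following parent links from `newer`."""
    cid = newer
    while cid is not None:
        if cid == older:
            return True
        cid = parents[cid]
    return False

def merge_fast_forward(target, source):
    """Move the target label forward; refuse if the histories have diverged."""
    if not is_ancestor(branches[target], branches[source]):
        raise ValueError("histories have diverged; a fast-forward is not possible")
    branches[target] = branches[source]

commits_before = dict(parents)
merge_fast_forward("master", "issue123")

print(branches)
print(parents == commits_before)   # no commit was created or changed
```

The merge touches only the branches dictionary. The set of commits is identical before and after, which matches the observation that no file content comparison is needed.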
To merge the changes from issue123 into master, I first check out (switch to) the master branch and then do the merge:
d:\code\gitk-demo>git checkout master
Switched to branch 'master'
d:\code\gitk-demo>git merge issue123
Updating bf37c64..cac3c72
Fast-forward
 fruits.txt | 1 +
 vegetables.txt | 3 ++-
 2 files changed, 3 insertions(+), 1 deletions(-)
A few things to notice:
- The command-line output indicated the merge was a “Fast-forward”.
- A new commit was not created. No new snapshot of the repository is required, since every version of every file and folder involved already exists in the repository.
- The remote origin/master branch has not moved. Everything we’ve done so far (except for the initial clone) has run completely locally, without contacting the github server.
Deleting a branch
The issue123 branch label is now redundant, since it points to the same commit as master. If there is no more work to do for issue123, we can safely get rid of the branch, without losing any historical information. If we later find out we need to make some changes to solve the issue, we can always create another branch (which is just a label). This is what it means when people say that branching in git is easy or lightweight.
d:\code\gitk-demo>git branch -d issue123
Deleted branch issue123 (was cac3c72).
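To illustrate why deleting the label loses nothing, here is a small Python sketch. It is a toy model with a simplified safety check in the spirit of git branch -d (the real check has more cases); the commit IDs come from the transcripts and the function names are hypothetical.

```python
# Toy model: commit ID -> parent ID. History before bf37c64 is omitted.
parents = {"bf37c64": None, "f948bf8": "bf37c64", "cac3c72": "f948bf8"}
branches = {"master": "cac3c72", "issue123": "cac3c72"}

def reachable_from(tip):
    """All commits found by following parent links from `tip`."""
    seen, cid = set(), tip
    while cid is not None:
        seen.add(cid)
        cid = parents[cid]
    return seen

def delete_branch(name, current="master"):
    """Remove a label, but only if its commit is already reachable from `current`.

    This is a simplification of the check git performs, not a reimplementation.
    """
    if branches[name] not in reachable_from(branches[current]):
        raise ValueError(f"{name} has commits that are not in {current}")
    del branches[name]

delete_branch("issue123")

print(branches)
print(sorted(reachable_from(branches["master"])))
```

After the delete, every commit is still reachable from master. Only the label is gone.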
Sharing with the world
As noted above, everything we’ve done so far has been in our local copy of the repository. The “master” branch at the “origin” remote has not moved. If I look at the github page for the repo, I can confirm that none of my commits exist.
To copy changes from my instance of the repository up to github’s servers, I need to push from my master branch to the “master” branch of my remote named “origin”.
d:\code\gitk-demo>git push origin master
Counting objects: 9, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 609 bytes, done.
Total 6 (delta 0), reused 0 (delta 0)
To git@github.com:joshuaflanagan/gitk-demo.git
   bf37c64..cac3c72  master -> master
Take note of the SHA1 ID of the latest commit, cac3c723…. Look back at the previous screenshots and notice that this identifier has not changed through all of the operations (merge, deleting the branch, etc). When we refresh the github page, we can see it has updated with all of the work I did locally. Notice the commit identifier on the web page matches up with the SHA1 ID we see locally. Also, there is no indication that any of the work was done on a separate branch – nobody ever needs to know. You are free to branch as much as you want locally without impacting a shared repository.
In my initial overview, I demonstrated creating a branch, making a couple commits to that branch, and then merging them back into master. In that scenario, there were no changes in my local master (and since it was contrived, I knew there were no changes in the remote origin/master), so the merge was really just a fast-forward. In the real world, my workflow would be slightly different, as I would have to account for other people making changes to our shared repository (my origin remote).
To demonstrate, I’ll rewind time and pretend we’re back at the moment where we switched to master as we prepared to merge in the changes from the issue123 branch. The gitk visualization of the repository looked like:
Before I merge my changes into master, I want to make sure my master branch is in sync with the central repository on github (which I refer to using the remote “origin”). We can see in the screenshot that my master branch refers to the same commit as origin/master, but that’s because I haven’t communicated with origin in a long time. All of my previous operations were done locally. In order to get the latest state from the remote repository, I need to perform a fetch.
d:\code\gitk-demo>git fetch origin
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (6/6), done.
From github.com:joshuaflanagan/gitk-demo
   bf37c64..ec8d10f  master -> origin/master
I’ve downloaded new commits to my local repository and moved the remote branch pointer, but I haven’t changed anything in my local branches. If I were to look in my working folder, I would see that none of my files have changed. To get the latest changes to the master branch from Tony, I need to merge them into my master branch.
d:\code\gitk-demo>git merge origin/master
Updating bf37c64..ec8d10f
Fast-forward
 dairy.txt | 3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)
 create mode 100644 dairy.txt
Once again, since there was a straight line from my local master to origin/master, git was able to perform a fast-forward merge. The master branch has moved to point to Tony’s latest commit. My working directory has been updated accordingly to have the changes he made.
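The fetch-then-merge sequence above can be sketched as two separate steps in Python. This is a toy model, not git's implementation: each repository is a dictionary of commits plus a dictionary of refs, the commit IDs come from the transcripts, and any commits between bf37c64 and ec8d10f are left out for brevity.

```python
# Toy model: commits are ID -> parent ID, refs are name -> commit ID.
# Assumption: ec8d10f is shown as a direct child of bf37c64 to keep this short.
github_commits = {"bf37c64": None, "ec8d10f": "bf37c64"}
github_refs = {"master": "ec8d10f"}

local_commits = {"bf37c64": None, "f948bf8": "bf37c64", "cac3c72": "f948bf8"}
local_refs = {"master": "bf37c64", "issue123": "cac3c72", "origin/master": "bf37c64"}

def fetch(remote_name, remote_commits, remote_refs):
    """Copy missing commits and move only the remote-tracking refs."""
    for cid, parent in remote_commits.items():
        local_commits.setdefault(cid, parent)
    for branch, tip in remote_refs.items():
        local_refs[f"{remote_name}/{branch}"] = tip

# Step 1: git fetch origin
fetch("origin", github_commits, github_refs)
after_fetch = dict(local_refs)

# Step 2: git merge origin/master. master is an ancestor of origin/master,
# so this is a fast-forward: the master label simply moves.
local_refs["master"] = local_refs["origin/master"]

print(after_fetch)
print(local_refs)
```

After step 1, only origin/master has moved; master and issue123 are exactly where they were. The local master label catches up only in step 2.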
Note that none of the changes I made for issue123 have been included in master yet. We need to merge the issue123 branch back into master, and ultimately push the result to the shared repository on github. However, there is no straight line between issue123 and master – neither is a direct descendant of the other – which means we cannot do a fast-forward merge. We have to do either a “real” merge, or rebase.
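One way to picture the "no straight line" test is as an ancestry check in both directions. The Python below is a minimal sketch under my own assumptions: the function names are hypothetical, the commit IDs come from the transcripts, and ec8d10f is shown as a direct child of bf37c64 for brevity.

```python
# Toy model: commit ID -> parent ID. History before bf37c64 is omitted.
parents = {
    "bf37c64": None,        # "third commit"
    "f948bf8": "bf37c64",   # "My first commit"
    "cac3c72": "f948bf8",   # "Added another fruit" (issue123)
    "ec8d10f": "bf37c64",   # "Forgot the yogurt" (master after Tony's changes)
}

def ancestors(cid):
    """Return cid and everything before it, newest first."""
    chain = []
    while cid is not None:
        chain.append(cid)
        cid = parents[cid]
    return chain

def merge_kind(target, source):
    """Classify what merging `source` into `target` would require."""
    if source in ancestors(target):
        return "already up to date"
    if target in ancestors(source):
        return "fast-forward"
    return "diverged"

def merge_base(a, b):
    """Most recent commit that both histories share."""
    seen = set(ancestors(a))
    return next(cid for cid in ancestors(b) if cid in seen)

print(merge_kind("bf37c64", "cac3c72"))   # master before Tony's changes
print(merge_kind("ec8d10f", "cac3c72"))   # master after Tony's changes
print(merge_base("ec8d10f", "cac3c72"))
```

Once master has moved to ec8d10f, neither tip appears in the other's ancestry. The two histories only meet at the shared commit bf37c64.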
Merge
To perform a “real” merge, we just use the merge command as we have all along. By default, git decides whether to do a fast-forward or a real merge – it is not something you need to specify.
d:\code\gitk-demo>git merge issue123
Merge made by recursive.
 fruits.txt | 1 +
 vegetables.txt | 3 ++-
 2 files changed, 3 insertions(+), 1 deletions(-)
Previously, with our fast-forward merges, no new commits were created – git just moved branch pointers. In this case, since there is a new snapshot of the repository that never existed before (it includes Tony’s new changes as well as my changes from issue123), a new commit is required. The commit is created automatically, with an auto-generated commit message indicating it was a merge. The merge commit has multiple parents (indicated by the red line going to the “Forgot the yogurt” commit and the blue line going to the “Added another fruit” commit). We can safely delete the issue123 branch now, but unlike in the fast-forward example, when we push our changes to the central server, there will be evidence that the issue123 branch existed (in the merge commit message, and in the repository history, which shows the branched paths).
d:\code\gitk-demo>git branch -d issue123
Deleted branch issue123 (was cac3c72).
d:\code\gitk-demo>git push origin master
Counting objects: 12, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (8/8), 914 bytes, done.
Total 8 (delta 0), reused 0 (delta 0)
To git@github.com:joshuaflanagan/gitk-demo.git
   ec8d10f..5835415  master -> master
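The structural difference from a fast-forward is that a real merge adds a commit with two parents. Here is a minimal Python sketch of that shape. It is a toy model that ignores file contents entirely; the commit IDs (including the merge commit 5835415) come from the transcripts, and the function names are my own.

```python
# Toy model: commit ID -> list of parent IDs. History before bf37c64 is omitted,
# and ec8d10f is shown as a direct child of bf37c64 for brevity.
parents = {
    "bf37c64": [],
    "f948bf8": ["bf37c64"],   # "My first commit"
    "cac3c72": ["f948bf8"],   # "Added another fruit"
    "ec8d10f": ["bf37c64"],   # "Forgot the yogurt"
}
branches = {"master": "ec8d10f", "issue123": "cac3c72"}

def merge_commit(target, source, new_id):
    """Record a new commit whose parents are both branch tips, then advance target."""
    parents[new_id] = [branches[target], branches[source]]
    branches[target] = new_id

def reachable_from(tip):
    """All commits found by following every parent link from `tip`."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return seen

merge_commit("master", "issue123", "5835415")
del branches["issue123"]          # git branch -d issue123

print(parents["5835415"])
print(sorted(reachable_from(branches["master"])))
```

Even with the issue123 label deleted, both lines of history remain reachable from master through the merge commit's two parents. That is the branched path that shows up after the push.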
Rebase
There are a few reasons not to like the merge approach:
- Branching paths in the history can be unnecessarily complicated
- The extra merge commit.
- Your branch is now no longer a private, local concern. Everyone now knows that you worked in an issue123 branch. Why should they care?
Note: There are some scenarios where you want to preserve the fact that work was done in a separate branch. In those cases, the above “downsides” are not really downsides, but the desired behavior. However, in many cases, the merge is only necessary because of the timing of parallel work, and preserving that timeline is not important.
You can use git rebase to avoid these issues. If you have commits that have never been shared with anyone else, you can have git re-write them with a different starting point. Let’s go back in time to the point right after we merged in Tony’s changes, but before merging in issue123:
Currently, the issue123 commits branch off from the “third commit”. The rest of the world doesn’t need to know that is where we started our work. We can re-write history so that it appears like we started our work from Tony’s latest changes. We want the issue123 commits to branch off from master, the “Forgot the yogurt” commit.
d:\code\gitk-demo>git checkout issue123
Switched to branch 'issue123'
d:\code\gitk-demo>git rebase master
First, rewinding head to replay your work on top of it...
Applying: My first commit
Applying: Added another fruit
After a rebase, the “My first commit” commit now directly follows the “Forgot the yogurt” commit, making the issue123 branch a direct descendant of the master branch. This means we can now do a fast-forward merge to bring issue123’s changes into master.
d:\code\gitk-demo>git checkout master
Switched to branch 'master'
d:\code\gitk-demo>git merge issue123
Updating ec8d10f..b5a86d6
Fast-forward
 fruits.txt | 1 +
 vegetables.txt | 3 ++-
 2 files changed, 3 insertions(+), 1 deletions(-)
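Notice that the tip of issue123 is now b5a86d6 rather than cac3c72: the rebased commits are new commits. The Python sketch below shows why, under a loud simplification: here a commit ID is a hash of only the parent ID and the message, whereas real git also hashes the file tree, author, and timestamps. The IDs it prints are therefore made up and will not match the transcripts.

```python
import hashlib

commits = {}   # commit ID -> (parent ID, message)

def add_commit(parent, message):
    """Create a commit whose ID depends on its parent (simplified hashing)."""
    cid = hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()[:7]
    commits[cid] = (parent, message)
    return cid

def rebase(tip, onto, stop_at):
    """Replay the commits after `stop_at` (up to `tip`) on top of `onto`."""
    to_replay = []
    cid = tip
    while cid != stop_at:
        parent, message = commits[cid]
        to_replay.append(message)
        cid = parent
    new_tip = onto
    for message in reversed(to_replay):   # oldest first
        new_tip = add_commit(new_tip, message)
    return new_tip

base = add_commit(None, "third commit")
first = add_commit(base, "My first commit")
old_tip = add_commit(first, "Added another fruit")
master = add_commit(base, "Forgot the yogurt")

new_tip = rebase(old_tip, onto=master, stop_at=base)

print(old_tip, "->", new_tip)
print(commits[new_tip])
```

Same messages, different parent, so different IDs. The original commits still exist in the dictionary but nothing points to them any more, and the new chain runs in a straight line from master, which is what makes the fast-forward possible.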
When we delete the issue123 branch and push these changes to the remote repository on github, there is no longer any evidence that the issue123 branch ever existed. Anyone who pulls down the repository will see a completely linear history, making it easier to understand.
d:\code\gitk-demo>git branch -d issue123
Deleted branch issue123 (was b5a86d6).
d:\code\gitk-demo>git push origin master
Counting objects: 9, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 626 bytes, done.
Total 6 (delta 1), reused 0 (delta 0)
To git@github.com:joshuaflanagan/gitk-demo.git
   ec8d10f..b5a86d6  master -> master