Version control systems (VCSs) are tools used to track changes to source code (or other collections of files and folders). While other VCSs exist, Git is the de facto standard for version control. Memorize these shell commands and type them to sync up. If you get any errors, save your work elsewhere, delete the project, and download a fresh copy.
Git’s data model
Snapshots
In Git terminology, a file is called a “blob”, and it’s just a bunch of bytes. A directory is called a “tree”, and it maps names to blobs or trees (so directories can contain other directories). A snapshot is the top-level tree that is being tracked.
<root> (root)
|
+- foo (tree)
|  |
|  +- bar.txt (blob, contents = "hello world")
|
+- baz.txt (blob, contents = "git is wonderful")
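As a rough illustration, the snapshot above can be modeled in Python with nested dictionaries standing in for trees and byte strings standing in for blobs. The names `snapshot` and `list_blobs` are made up for this sketch and are not part of Git, and this is not how Git lays data out on disk.

```python
# Trees are modeled as dicts and blobs as bytes (an assumption of this
# sketch, not Git's on-disk format).
snapshot = {
    "foo": {
        "bar.txt": b"hello world",
    },
    "baz.txt": b"git is wonderful",
}


def list_blobs(tree, prefix=""):
    """Return (path, contents) pairs for every blob under a tree, sorted by name."""
    found = []
    for name, entry in sorted(tree.items()):
        path = prefix + name
        if isinstance(entry, dict):
            # A tree: recurse into the subdirectory.
            found.extend(list_blobs(entry, path + "/"))
        else:
            # A blob: just a bunch of bytes.
            found.append((path, entry))
    return found


for path, contents in list_blobs(snapshot):
    print(path, contents)
```

Walking the top-level tree reaches every file in the snapshot, which is why a single tree is enough to describe the whole tracked state.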
Modeling history: relating snapshots
In Git, a history is a directed acyclic graph (DAG) of snapshots: each snapshot refers to a set of "parents", the snapshots that preceded it. Git calls these snapshots "commit"s. In ASCII-art drawings of a history, each "o" corresponds to an individual commit (snapshot), and the arrows point to the parent of each commit.
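To make the graph structure concrete, here is a small Python sketch of a history in which one commit has two parents (a merge). The `Commit` class and the `ancestors` helper are hypothetical names chosen for illustration, and the commit messages are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str
    # A commit may have zero parents (the first commit), one, or several (a merge).
    parents: list = field(default_factory=list)


def ancestors(commit):
    """Return the messages of every commit reachable via parent links, each once."""
    seen, order, stack = set(), [], [commit]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            # Already visited through another path in the DAG.
            continue
        seen.add(id(current))
        order.append(current.message)
        stack.extend(current.parents)
    return order


root = Commit("initial")
feature = Commit("add feature", [root])
bugfix = Commit("fix bug", [root])
merge = Commit("merge feature and fix", [feature, bugfix])
print(ancestors(merge))
```

The `seen` set matters because the initial commit is reachable through both branches; without it the walk would report that commit twice.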
Data model, as pseudocode
On disk, all Git stores is objects and references: that’s all there is to Git’s data model.
// a file is a bunch of bytes
type blob = array<byte>
// a directory contains named files and directories
type tree = map<string, tree | blob>
// a commit has parents, metadata, and the top-level tree
type commit = struct {
    parents: array<commit>
    author: string
    message: string
    snapshot: tree
}
Objects and content-addressing
An “object” is a blob, tree, or commit.
type object = blob | tree | commit
In Git’s data store, all objects are content-addressed by their SHA-1 hash.
objects = map<string, object>
def store(object):
    id = sha1(object)
    objects[id] = object

def load(id):
    return objects[id]
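The pseudocode above can be turned into a small runnable Python sketch for blobs. One detail the pseudocode leaves out is that Git hashes a short header (`blob <size>` followed by a NUL byte) ahead of the file contents. The function name `git_blob_id` is made up for this example, and only blobs are handled here.

```python
import hashlib


def git_blob_id(contents: bytes) -> str:
    """Compute the SHA-1 ID Git assigns to a blob with these contents."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(contents)).encode("ascii") + b"\0"
    return hashlib.sha1(header + contents).hexdigest()


# A toy in-memory object store, keyed by object ID.
objects = {}


def store(contents: bytes) -> str:
    object_id = git_blob_id(contents)
    objects[object_id] = contents
    return object_id


def load(object_id: str) -> bytes:
    return objects[object_id]


first = store(b"hello world\n")
second = store(b"hello world\n")
# Identical contents map to the same ID, so they are stored only once.
print(first == second, len(objects))
```

If Git is installed, the IDs can be compared against the output of `git hash-object <file>` for a file with the same bytes.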
For example, the tree for the example directory structure above (the snapshot) looks like this:
% git cat-file -p 698281bc680d1995c5f4caaf3359721a5a58d48d
100644 blob 4448adbf7ecd394f42ae135bbeed9676e894af85 baz.txt
040000 tree c68d233a33c5c06e0340e4c224f0afca87c8ce87 foo
The tree itself contains pointers to its contents. When objects reference other objects, they don’t actually contain them in their on-disk representation, but have a reference to them by their hash.
% git cat-file -p 4448adbf7ecd394f42ae135bbeed9676e894af85
git is wonderful
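The idea that a tree holds hashes rather than the objects themselves can be sketched in Python as follows. This is an illustration under simplifying assumptions: objects are serialized as JSON before hashing, which is not Git's real object format, so the IDs produced here will not match the ones printed by `git cat-file`. The helper name `read_path` is hypothetical.

```python
import hashlib
import json

objects = {}


def store(obj) -> str:
    """Store a JSON-serializable object under the SHA-1 of its serialized form."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    object_id = hashlib.sha1(data).hexdigest()
    objects[object_id] = obj
    return object_id


# Blobs are stored first; trees then record only the IDs of their entries.
bar_id = store({"type": "blob", "contents": "hello world"})
baz_id = store({"type": "blob", "contents": "git is wonderful"})
foo_id = store({"type": "tree", "entries": {"bar.txt": bar_id}})
root_id = store({"type": "tree", "entries": {"foo": foo_id, "baz.txt": baz_id}})


def read_path(tree_id: str, path: str) -> str:
    """Follow a slash-separated path from a tree down to a blob's contents."""
    obj = objects[tree_id]
    for name in path.split("/"):
        # Each step is a lookup by hash, never a nested copy of the object.
        obj = objects[obj["entries"][name]]
    return obj["contents"]


print(read_path(root_id, "foo/bar.txt"))
```

Because a tree's ID depends on the IDs of its entries, changing one file changes the ID of every tree above it, while unchanged subtrees keep their IDs and can be shared between snapshots.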
References
Git uses human-readable names for SHA-1 hashes, called “references”. References are pointers to commits. For example, the master reference usually points to the latest commit in the main branch of development.
references = map<string, string>
def update_reference(name, id):
    references[name] = id

def read_reference(name):
    return references[name]

def load_reference(name_or_id):
    if name_or_id in references:
        return load(references[name_or_id])
    else:
        return load(name_or_id)
In Git, “where we currently are” is a special reference called “HEAD”. So, when we take a new snapshot, we know what it is relative to (how we set the parents field of the commit).
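A minimal Python sketch of how HEAD could be used to fill in the parents field when a new commit is created is shown below. It makes two simplifying assumptions: HEAD stores a commit ID directly (in Git it usually names a branch, which in turn points to a commit), and objects are hashed via JSON rather than Git's real format. The tree IDs `"tree-1"` and `"tree-2"` are placeholder strings.

```python
import hashlib
import json

objects = {}
references = {}


def store(obj) -> str:
    """Store an object under the SHA-1 of its JSON form (not Git's real format)."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    object_id = hashlib.sha1(data).hexdigest()
    objects[object_id] = obj
    return object_id


def commit(snapshot_id: str, message: str) -> str:
    """Create a commit whose parent is whatever HEAD currently points to."""
    # The very first commit has no parent because HEAD does not exist yet.
    parents = [references["HEAD"]] if "HEAD" in references else []
    commit_id = store({"parents": parents, "message": message, "snapshot": snapshot_id})
    # Advance HEAD so the next commit is made relative to this one.
    references["HEAD"] = commit_id
    return commit_id


first = commit("tree-1", "initial commit")
second = commit("tree-2", "second commit")
print(objects[second]["parents"] == [first])
```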
Repositories
We can now define (roughly) what a Git repository is: it is the data, namely objects and references. All git commands map to some manipulation of the commit DAG by adding objects and adding/updating references.
Whenever you’re typing in any command, think about what manipulation the command is making to the underlying graph data structure. Conversely, if you’re trying to make a particular kind of change to the commit DAG, e.g. “discard uncommitted changes and make the ‘master’ ref point to commit 5d83f9e”, there’s probably a command to do it:
git checkout master
git reset --hard 5d83f9e
Staging area
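Git accommodates scenarios in which you do not want a snapshot of the entire current state (for example, when only some of your modifications belong in the next commit) by allowing you to specify which modifications should be included in the next snapshot through a mechanism called the “staging area”.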
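The role of the staging area can be sketched with a toy Python model in which a commit records only what has been explicitly added. The file names and contents are invented for this example, and the dictionaries are a stand-in for Git's actual index and object store.

```python
# Files as they currently exist on disk (hypothetical contents).
working_directory = {
    "feature.py": "new feature",
    "debug.log": "print statements everywhere",
}
staging_area = {}
commits = []


def add(filename: str) -> None:
    """Copy a file's current contents into the staging area."""
    staging_area[filename] = working_directory[filename]


def commit(message: str) -> dict:
    """Record a snapshot of the staging area, not of the working directory."""
    snapshot = {"message": message, "files": dict(staging_area)}
    commits.append(snapshot)
    return snapshot


add("feature.py")
snapshot = commit("add feature")
# Only the staged file is part of the snapshot; debug.log was never added.
print(sorted(snapshot["files"]))
```

Copying the staging area with `dict(...)` keeps earlier snapshots unchanged when more files are staged later.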
Git command-line interface
Basic
git help <command>
git init # creates a new git repo, with data stored in the `.git` directory
git status # tells you what's going on
git add <filename> # adds files to staging area
git commit # creates a new commit
git log # show a flattened log of history
git log --all --graph --decorate # visualizes history as a DAG
Branching and merging
git branch <name> # creates a branch
git checkout -b <name> # creates a branch and switches to it. same as `git branch <name>; git checkout <name>`
Remotes
git remote add <name> <url> # add a remote
git push <remote> <local branch>:<remote branch> # send objects to remote, and update remote reference
Undo
git commit --amend # edit a commit's contents/message
git reset HEAD <file> # unstage a file
git checkout -- <file> # discard changes
Exercises
git clone https://github.com/missing-semester/missing-semester