Introduction to Git, Part 1

Git is a (distributed) version control system. What is that? A version control system is software that allows a programmer to track and manage the history of a project, where such a project could be a single file, a set of files, or an entire open source project with multiple programmers contributing from around the world.

The distinction of “distributed” means that there is no central server where the source code and history of a project must live. When a Git project is cloned (“clone” is Git’s word for the initial checkout of a project), you receive the entire history of the project, from the current state all the way back to the very first commit. Because the entire project is stored locally on your computer, merging, diffing, and log history lookups need no network access and are extremely fast.

Some of the more commonly used version control systems that you have likely heard about are CVS and Subversion. This tutorial will take a “forget everything you know about CVS or Subversion” approach. As someone who has used all three of these systems in the professional realm, I can testify that some knowledge of CVS or Subversion can be useful when approaching Git, but it is not necessary. The best way to learn Git is to start using Git for what Git is.

Why use Git?

That’s a perfectly cromulent question. If I am just one person working on my own website, why should I bother with version control? I have a pretty good memory and I know others won’t be overwriting my code.

Maybe you have a set of files for a website you are developing for a client, looking something like this.

sean@beerhaus:~/Workspace$ ls
images/  templates/  index.php.save1  index.php.save3
js/      index.php   index.php.save2  index.php.save4

Here, index.php is your current file and index.php.save# are a set of backups from previous days’ work.

Now you are editing index.php. You have just added a pretty neat feature after an overdue but deserved moment of inspiration. To be smart, you make a backup of this file called index.php.save5. The next day you go back to work on index.php and you decide the new feature from yesterday isn’t really going to work and you remove it from the file. A few weeks go by and by this time you’ve saved index.php.save27. The current state of the project is pretty stable so you no longer need the previous 27 saved files, and so you delete them all.

Several days after your cleanup you feel that feature from index.php.save5 deserves a second chance, but now it is gone forever. Sure, you might have some extra backup copies, and you might even have a file undelete tool. But is that really the best way to be managing your source code?

The problem gets worse when you add more people to your project. Even with two people working on the same project, there will be some unintentionally deleted and overwritten files. Maybe you email tarballs of the source code back and forth and you can look up an old email to get the archive with files you have just deleted. Good luck.

This is the part where 99 out of 100 tutorials would predictably repeat the prized cliche “enter Git.”

How to use Git

The first step in using Git is to get Git. If you are using a variety of Linux, this is pretty easy. (Related fact: Git was initially designed and developed for distributed development on the Linux kernel by the same guy who created the kernel, Linus Torvalds.) Whatever distribution of Linux you are running, the package you are going to need will most likely be called “git” or “git-core”. If you are using Windows or Mac, or would just like to build Git from source, see www.git-scm.com for download and installation options.

Creating a Repository

Once you have Git installed you’ll want to add some information about yourself. This will be helpful in identifying who has committed what code when looking at the Git log for history information.

sean@beerhaus:~$ git config --global user.name "FirstName LastName"
sean@beerhaus:~$ git config --global user.email "your@email"
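
To confirm the settings were stored, you can read them back. The following is a minimal sketch, written as a plain script rather than a prompt session. It points HOME at a throwaway directory (my assumption, so the example cannot disturb your real ~/.gitconfig) and reuses the placeholder name and email from above.

```shell
# Use a throwaway HOME so the real ~/.gitconfig is not modified (sketch only).
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global user.name "FirstName LastName"
git config --global user.email "your@email"

# --get prints the value stored for a single key.
configured_name="$(git config --global --get user.name)"
configured_email="$(git config --global --get user.email)"
echo "$configured_name <$configured_email>"
```

On your own machine you would skip the first two lines and simply run the --get commands.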

Now it’s time to create a repository. There are two ways to do this, but essentially one command. If you are starting a new project from scratch and you would like Git to look after it during its whole life span, you can create your directory that will contain all project files and initialize the directory for use with Git:

sean@beerhaus:~/$ mkdir new_project
sean@beerhaus:~/$ cd new_project/
sean@beerhaus:~/new_project$ git init
Initialized empty Git repository in /home/sean/new_project/.git/

You now have an empty Git repository ready for use. If you run a listing of all hidden files in your new directory, you will notice the hidden directory .git was created:

sean@beerhaus:~/new_project$ ls -al
total 12
drwxr-xr-x 3 sean sean 4096 2011-11-19 16:09 .
drwxr-xr-x 5 sean sean 4096 2011-11-19 16:09 ..
drwxr-xr-x 7 sean sean 4096 2011-11-19 16:09 .git

The .git directory is where Git stores all the information and history it needs to track your repository, and so it’s best to leave it alone.

git init will also work if you have a directory of existing source code. Simply enter your directory and run git init as I did in this example.
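
As a rough illustration of the existing-directory case, the sketch below builds a small stand-in project in a temporary location (the directory and file names are hypothetical) and initializes it. The existing files are left untouched, and Git reports them as untracked.

```shell
# Stand-in for a directory of existing source code (names are hypothetical).
project_dir="$(mktemp -d)/existing_project"
mkdir -p "$project_dir/js"
echo "<?php echo 'hello';" > "$project_dir/index.php"
echo "// placeholder" > "$project_dir/js/app.js"

cd "$project_dir"
git init -q    # -q suppresses the "Initialized empty Git repository" message

# Git only adds the .git directory; the existing files are not modified.
# --short prints one line per path, with ?? marking untracked entries.
status_output="$(git status --short)"
echo "$status_output"
```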

Adding/Staging Files

You should still be in your new_project directory that you’ve just initialized for Git. In version control terminology, this is now called a working directory, meaning it is a directory that is under the watch of some version control system.

Let’s create some files for Git to track, starting with a configuration file. Create the file config.php with the following contents in your editor of choice:

<?php
$database = array(
    "driver"   => "mysql",
    "host"     => "localhost",
    "username" => "user",
    "password" => "pass",
    "database" => "new_project");

Once you have saved the file as config.php we can run another Git command, git status.

sean@beerhaus:~/new_project$ git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	config.php
nothing added to commit but untracked files present (use "git add" to track)

The default branch in this example is named master (newer Git installations may be configured to use a different default name, such as main). This is the same idea as “trunk” in other version control systems. A branch is simply a line of development that Git is tracking.
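
If you are unsure what your branch is called, Git can tell you. Here is a small sketch, run in a scratch repository (the directory name is made up for illustration). The name it prints depends on your Git version and configuration, so it may be master, main, or something else.

```shell
repo_dir="$(mktemp -d)/branch_demo"
mkdir -p "$repo_dir" && cd "$repo_dir"
git init -q

# HEAD is a symbolic reference to the current branch; --short strips the
# leading refs/heads/ so that only the branch name is printed.
current_branch="$(git symbolic-ref --short HEAD)"
echo "On branch $current_branch"
```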

Untracked files are files that exist in your working directory, but are not yet tracked by Git because you have not yet told Git to track them.

The output of git status also provides helpful messages, like the message one line down:

(use "git add <file>..." to include in what will be committed)

Go ahead and add the config.php file to Git’s list of tracked files.

sean@beerhaus:~/new_project$ git add config.php

Now if you run git status again you’ll see what has changed.

sean@beerhaus:~/new_project$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#	new file:   config.php
#

config.php is now being tracked by Git, as you can see by the change from “untracked file” to “new file”.

With Git, adding files in this fashion is known as staging. The staging area holds changes that have been added but not yet committed. The term adding does not only apply to new files. It also applies to a set of changes made to an existing file that you want included in the next commit. The staging area allows us to pick and choose which changes to include in a commit before the commit takes place.
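
To make the pick-and-choose idea concrete, here is a minimal sketch in a scratch repository. The second file, index.php, and the file contents are assumptions for illustration only. Just one of the two files is staged, and Git is then asked what the next commit would contain.

```shell
repo_dir="$(mktemp -d)/staging_demo"
mkdir -p "$repo_dir" && cd "$repo_dir"
git init -q

echo "<?php // settings" > config.php
echo "<?php // home page" > index.php

# Stage only config.php; index.php stays untracked for now.
git add config.php

# List the paths currently sitting in the staging area.
staged_files="$(git diff --cached --name-only)"
echo "Staged: $staged_files"

# Short status: "A " marks a staged new file, "??" an untracked one.
status_output="$(git status --short)"
echo "$status_output"
```

A commit made at this point would include config.php and leave index.php out.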

Committing Files

Now you can commit the file. Committing a file is how you tell Git to record a “snapshot” of the current state of your project. It is important that with each commit you provide a commit message. The message is helpful to remind yourself (or others) later what the purpose of each change was. You should keep it short but informative.

The quickest way to provide a commit message is to use the -m option with the git commit command. If you leave the -m option out of the command, an editor window will pop up and you can type your commit message there.

To configure which editor will be used, you can run git config --global core.editor vim. In my case I use vim, but emacs, pico, or any other editor should do.

sean@beerhaus:~/new_project$ git commit config.php -m "Initial commit. Added a configuration file."
[master (root-commit) c1d55de] Initial commit. Added a configuration file.
 1 files changed, 7 insertions(+), 0 deletions(-)
 create mode 100644 config.php

Now if you run git status again you’ll get a message that tells you there have been no new changes since the last commit.

sean@beerhaus:~/new_project$ git status
# On branch master
nothing to commit (working directory clean)

Viewing Project History

The git log command is a very helpful tool for viewing the timeline of your project. You can see each commit in detail, including the date and who made the commit.

sean@beerhaus:~/new_project$ git log
commit c1d55debc7be8f50e363df462f84672ad029b703
Author: FirstName LastName <your@email>
Date:   Sat Nov 19 16:45:35 2011 -0400

    Initial commit. Added a configuration file.

There are a few things to note here:

The name and email address are the information you provided with git config.
The commit message is the message you provided in the first commit.
The word “commit” is followed by a 40-digit hex string.

The 40-character string of hexadecimal digits is known as the commit hash, in reference to the SHA-1 cryptographic hash function used to generate it. All information related to a commit is stored in a commit object. The commit hash is the output of the hash function applied to the contents of the object, and is then used as a unique reference to that particular commit.
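
You can check the relationship between a commit object and its hash yourself. The sketch below is not part of the original walkthrough. It uses a scratch repository with a placeholder identity, prints the commit object, and then feeds the object’s contents back through Git’s own hashing command to see whether the same hash comes out.

```shell
repo_dir="$(mktemp -d)/hash_demo"
mkdir -p "$repo_dir" && cd "$repo_dir"
git init -q
git config user.name "FirstName LastName"   # placeholder identity, local to this repo
git config user.email "your@email"

echo "<?php // settings" > config.php
git add config.php
git commit -q -m "Initial commit. Added a configuration file."

# The full hash of the most recent commit.
commit_hash="$(git rev-parse HEAD)"

# Print the commit object itself: tree, author, committer, and message.
git cat-file -p "$commit_hash"

# Hash the same object contents again with Git's hash function.
recomputed_hash="$(git cat-file commit "$commit_hash" | git hash-object -t commit --stdin)"

echo "stored:     $commit_hash"
echo "recomputed: $recomputed_hash"
```

Because the author date is part of the object, your hash will differ from mine even if everything else is identical.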

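If you want the log to start from a particular commit, you can provide the commit hash. Git lists that commit followed by its ancestors. Because this is the first commit in the repository, only one entry is shown: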

sean@beerhaus:~/new_project$ git log c1d55debc7be8f50e363df462f84672ad029b703

To see which files were involved in a commit, you can use the --stat option:

sean@beerhaus:~/new_project$ git log --stat c1d55debc7be8f50e363df462f84672ad029b703

The log entry now also lists the files that changed:

 config.php |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)
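
Once a repository has more than one commit, naming a hash shows that commit plus everything before it. A sketch of how to narrow the output to one entry, assuming a scratch repository with two throwaway commits (the file contents and the second commit message are made up for illustration):

```shell
repo_dir="$(mktemp -d)/log_demo"
mkdir -p "$repo_dir" && cd "$repo_dir"
git init -q
git config user.name "FirstName LastName"   # placeholder identity, local to this repo
git config user.email "your@email"

echo '"username" => "user",' > config.php
git add config.php
git commit -q -m "Initial commit. Added a configuration file."

echo '"username" => "someone",' > config.php
git add config.php
git commit -q -m "Second commit (hypothetical change)."
second_commit="$(git rev-parse HEAD)"

# Naming a commit makes git log start there and walk back through its ancestors.
entries_from_second="$(git log --format=%H "$second_commit" | wc -l | tr -d ' ')"

# -1 limits the output to a single entry: the named commit itself.
entries_limited="$(git log -1 --format=%H "$second_commit" | wc -l | tr -d ' ')"

echo "without -1: $entries_from_second entries, with -1: $entries_limited entry"

# Combine -1 with --stat to see the files touched by just that commit.
git log -1 --stat "$second_commit"
```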

Viewing Commit Diffs

Now let’s make some changes to config.php in order to see what git diff has to offer. Start with a simple change: set the database username to “Foo”. When you save the file and run git status, this time you’ll see config.php is now in the modified state:

sean@beerhaus:~/new_project$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   config.php
#
no changes added to commit (use "git add" and/or "git commit -a")

To see what has been modified, use git diff.

sean@beerhaus:~/new_project$ git diff
diff --git a/config.php b/config.php
index ab6e11d..02d30f8 100644
--- a/config.php
+++ b/config.php
@@ -2,6 +2,6 @@
 $database = array(
     "driver"   => "mysql",
     "host"     => "localhost",
-    "username" => "user",
+    "username" => "Foo",
     "password" => "pass",
     "database" => "new_project");

Your changes are pretty easy to spot. The leading minus tells you the line has been removed, and the leading plus tells you the line has been added.

By default, git diff compares your working directory with the staging area. When nothing has been staged, as is the case here, that amounts to comparing the current state of your project with the previous commit. You can pass arguments to the git diff command, such as a commit hash or the special pointer HEAD, which refers to the most recent commit on your current branch.
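
The difference between these comparisons is easiest to see by staging a change partway through. This is a minimal sketch in a scratch repository, with a one-line stand-in for config.php and a placeholder identity. It uses --name-only so each comparison prints just the affected file names.

```shell
repo_dir="$(mktemp -d)/diff_demo"
mkdir -p "$repo_dir" && cd "$repo_dir"
git init -q
git config user.name "FirstName LastName"   # placeholder identity, local to this repo
git config user.email "your@email"

echo '"username" => "user",' > config.php
git add config.php
git commit -q -m "Initial commit. Added a configuration file."

# Edit the file without staging it.
echo '"username" => "Foo",' > config.php

# Plain git diff: working directory compared with the staging area.
unstaged_before="$(git diff --name-only)"

git add config.php

# After staging, the working directory and the staging area match again.
unstaged_after="$(git diff --name-only)"

# --cached compares the staging area with the last commit.
staged="$(git diff --cached --name-only)"

# Naming HEAD compares the working directory with the last commit directly.
against_head="$(git diff --name-only HEAD)"

echo "git diff before add:  [$unstaged_before]"
echo "git diff after add:   [$unstaged_after]"
echo "git diff --cached:    [$staged]"
echo "git diff HEAD:        [$against_head]"
```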

Go ahead and commit the changed file so you can compare some commit hashes.

sean@beerhaus:~/new_project$ git commit -am "Changed database username to Foo."
[master e69a7f9] Changed database username to Foo.
 1 files changed, 1 insertions(+), 1 deletions(-)
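
The -am used here combines two options: -a stages every modified file that Git is already tracking, and -m supplies the message. Files Git has never been told about are not picked up. A short sketch of that behavior, assuming a scratch repository and a hypothetical untracked file called notes.txt:

```shell
repo_dir="$(mktemp -d)/commit_all_demo"
mkdir -p "$repo_dir" && cd "$repo_dir"
git init -q
git config user.name "FirstName LastName"   # placeholder identity, local to this repo
git config user.email "your@email"

echo '"username" => "user",' > config.php
git add config.php
git commit -q -m "Initial commit. Added a configuration file."
first_commit="$(git rev-parse HEAD)"

# Modify a tracked file and create a brand-new, untracked one.
echo '"username" => "Foo",' > config.php
echo "scratch notes" > notes.txt

# -a stages every modified tracked file before committing.
git commit -q -am "Changed database username to Foo."

# Files that differ between the first commit and the new one.
committed_files="$(git diff --name-only "$first_commit" HEAD)"
echo "committed: $committed_files"

# notes.txt was never added, so it is still untracked (??).
status_after="$(git status --short)"
echo "$status_after"
```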

When you run git log this time, you’ll see a timeline of changes building up.

sean@beerhaus:~/new_project$ git log
commit e69a7f9b3b55c116a5c2edf730cd03df7e093eda
Author: FirstName LastName <your@email>
Date:   Sun Nov 20 14:16:43 2011 -0400

    Changed database username to Foo.

commit c1d55debc7be8f50e363df462f84672ad029b703
Author: FirstName LastName <your@email>
Date:   Sat Nov 19 16:45:35 2011 -0400

    Initial commit. Added a configuration file.

When you run git diff using the two commit hashes, the output is identical to the diff you looked at before the last commit.

sean@beerhaus:~/new_project$ git diff c1d55debc7be8f50e363df462f84672ad029b703 e69a7f9b3b55c116a5c2edf730cd03df7e093eda
diff --git a/config.php b/config.php
index ab6e11d..02d30f8 100644
--- a/config.php
+++ b/config.php
@@ -2,6 +2,6 @@
 $database = array(
     "driver"   => "mysql",
     "host"     => "localhost",
-    "username" => "user",
+    "username" => "Foo",
     "password" => "pass",
     "database" => "new_project");

Try it again using the --color option. This makes changes a little easier to see.
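
You also don’t have to type all 40 characters. Git accepts any unique prefix of a hash, such as the seven-character form that git commit printed earlier. The sketch below, again in a scratch repository with stand-in contents, compares two commits by their short hashes and then repeats the diff with --color.

```shell
repo_dir="$(mktemp -d)/color_demo"
mkdir -p "$repo_dir" && cd "$repo_dir"
git init -q
git config user.name "FirstName LastName"   # placeholder identity, local to this repo
git config user.email "your@email"

echo '"username" => "user",' > config.php
git add config.php
git commit -q -m "Initial commit. Added a configuration file."
first_full="$(git rev-parse HEAD)"
first_short="$(git rev-parse --short HEAD)"

echo '"username" => "Foo",' > config.php
git add config.php
git commit -q -m "Changed database username to Foo."
second_full="$(git rev-parse HEAD)"
second_short="$(git rev-parse --short HEAD)"

# Full and abbreviated hashes name the same commits, so the diffs match.
diff_full="$(git diff "$first_full" "$second_full")"
diff_short="$(git diff "$first_short" "$second_short")"

# --color adds terminal escape codes that highlight removed and added lines.
diff_colored="$(git diff --color "$first_short" "$second_short")"
printf '%s\n' "$diff_colored"
```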

Summary

After getting used to Git and its cycle of file edits, git add, and git commit, you will never look back to the days of using file backups or your favorite editor’s undo feature as a substitute for proper version control. In this article I showed you the essential commands you’ll need to get started with Git. Other topics including branching, merging, and remote repositories will follow.