Introduction to the Git Version Control System

Sarath Pillai

Hi all, this is the first post in a complete documentation series on the Git version control system. This post concentrates only on the introductory part and covers the following topics.

  1. What is a VCS, or version control system?
  2. What is Git?
  3. How is Git different from other version control systems?

Let's understand what a version control system is:

A version control system is a software package that lets you control the versions of code, a document, a project, or anything else. In other words, changes to documents, programs, and code can be managed through a version control system. Its primary job is to record changes to files over time and to let you switch between different versions.

A version control system allows you to do the following things:

  • Revert files to a previous version
  • Restore an entire project to a previous state
  • Analyse changes across different versions
  • Find out who made a specific change

In short, if you screw things up while working in a version-controlled environment, you can easily revert.

Traditionally, people stored a copy of the entire project in another directory, often one named with a timestamp, so that they could revert if something went wrong. This approach is very error prone, because it is easy to be in the wrong directory and write to it by mistake.

An older version control system that used a simple database to maintain versions was RCS, which is still used by some people on Mac OS X. RCS keeps the differences between files from one edit to the next on disk.
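The idea of keeping only the differences between edits can be sketched in a few lines of Python. This is a minimal illustration and not RCS's actual file format: the helper names `make_delta`, `apply_delta`, and `checkout` are made up for this example, and the deltas here run forward from the oldest version for simplicity.

```python
from difflib import SequenceMatcher

def make_delta(old, new):
    """Record only the regions where `new` differs from `old`."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def apply_delta(old, delta):
    """Rebuild the newer version from the older one plus a delta."""
    result, pos = [], 0
    for i1, i2, replacement in delta:
        result.extend(old[pos:i1])   # copy the unchanged lines
        result.extend(replacement)   # add the changed or inserted lines
        pos = i2                     # skip the lines that were replaced
    result.extend(old[pos:])
    return result

def checkout(base, deltas, n):
    """Return version n by applying the first n deltas to the base version."""
    lines = base
    for delta in deltas[:n]:
        lines = apply_delta(lines, delta)
    return lines

# Three edits of the same file, each a list of lines.
v1 = ["hello", "world"]
v2 = ["hello", "git", "world"]
v3 = ["hello", "git"]

# Only the first version is kept in full; later versions are deltas.
deltas = [make_delta(v1, v2), make_delta(v2, v3)]
```

With this layout, `checkout(v1, deltas, 2)` rebuilds the third version from the first version and the two stored deltas, without ever storing `v2` or `v3` in full.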

Now what if developers want to collaborate with other developers on other systems? To solve this problem, centralized version control systems were developed: a central server holds all the data, and clients check out files from that server. Popular version control systems such as CVS and Subversion follow this model.

This central server model has some advantages and also some major disadvantages.

Advantages of Centralized Version Control Systems:

  • Developers know what other developers are doing
  • Administrators have more control over who can do what
  • Easier administration

Disadvantages of Centralized Version Control Systems:

  • A failure of the central server can stop developers from collaborating for some time
  • A disk failure on the central server will cause data loss if proper backups have not been taken

 

Now let's look at another approach to version control: distributed version control systems.

In a DVCS (distributed version control system), clients do not just check out the latest version of the files; they mirror the entire project repository. So even if the server has problems, the repository from any client can be copied back to the server.

Every checkout in a distributed version control system is a full backup of the entire data.

You can also collaborate with different groups of people in different ways, at the same time, within the same project.

Git, Bazaar, and Darcs are distributed version control systems.

History of Git

In 2002, development of the Linux kernel started using a proprietary distributed version control system called BitKeeper. In 2005, due to some issues between the Linux development team and the BitKeeper team, Linus Torvalds, the creator of the Linux kernel, ended up developing another distributed version control system, called Git, for kernel development.

Some notable plus points of Git are:

  • Very fast
  • Can handle very large projects
  • A flexible branching system

The major difference between Git and other version control systems is the way it handles data. Most version control systems store data as a list of changes made to each file over time. Git takes a different approach: it thinks of its data as a set of snapshots of a small filesystem. Whenever you commit changes, Git stores the state of all the files at that point in time and adds a reference to that snapshot.

If a file has not changed since the last commit, Git does not store the file again; it just links to the identical file it has already stored.
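The snapshot idea can be illustrated with a small Python sketch. It is a toy model under simplifying assumptions, not Git's real storage format: the `SnapshotStore` class and its method names are hypothetical, everything lives in memory, and file content is keyed by a plain SHA-1 of its bytes.

```python
import hashlib

class SnapshotStore:
    """Toy snapshot store: every commit records the state of all files."""

    def __init__(self):
        self.objects = {}    # content hash -> file content, stored once
        self.snapshots = []  # one {filename: content hash} dict per commit

    def commit(self, files):
        """Record the state of every file and return the snapshot index."""
        snapshot = {}
        for name, content in files.items():
            key = hashlib.sha1(content).hexdigest()
            # Content that is already stored is not stored again;
            # the new snapshot simply points at the existing object.
            self.objects.setdefault(key, content)
            snapshot[name] = key
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def checkout(self, index):
        """Rebuild all files exactly as they were in the given snapshot."""
        return {name: self.objects[key]
                for name, key in self.snapshots[index].items()}

store = SnapshotStore()
first = store.commit({"a.txt": b"one", "b.txt": b"two"})
second = store.commit({"a.txt": b"one", "b.txt": b"two, edited"})
```

After the two commits, `store.objects` holds three entries rather than four: `a.txt` did not change, so both snapshots refer to the same stored content, while each snapshot still describes the complete state of the project.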

Much of Git's speed comes from the fact that the entire repository is local (as mentioned before, every machine working on a project has the entire repository). For example, to check the history of a project, there is no need to connect to a remote server.

In version control systems like Subversion and CVS, you can edit files offline, but you cannot commit your changes until you have connectivity to the central server.

Git Uses SHA-1 Hashes

Everything in the repository is referenced by its checksum, and Git uses the SHA-1 algorithm to compute it. Data integrity is therefore built into Git: because of the checksums, Git can easily detect data corruption or any other change to the content.

In fact, in Git's database, data is stored not under file names but under the SHA-1 hash of its contents.
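As a rough illustration, the hash Git assigns to a file's contents (a "blob") can be reproduced in Python. Git hashes a short header, made of the word `blob`, the content length, and a NUL byte, followed by the content itself. The function name `git_blob_hash` is made up for this sketch, and it assumes a repository that uses SHA-1 object names.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object name Git would give to these file contents."""
    # Header format: "blob <size in bytes>\0", followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The empty file always hashes to the same well-known object name.
empty = git_blob_hash(b"")   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Any change to the content, even a single byte, gives a different hash,
# which is how corruption or tampering shows up.
original = git_blob_hash(b"test content\n")
modified = git_blob_hash(b"test content!\n")
print(original != modified)
```

The result for a given file can be compared with the output of `git hash-object <file>`. Note that the file name is not part of the hash, so two files with identical contents share one stored object.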

 

I hope this introduction to Git was helpful! The next post in this Git documentation series will cover the different states of a file in Git, installing Git, and setting up Git.

Thank you all.
