Version Control System

Exploring the Depths of Git's Hidden Gems

You've probably heard of Git, the version control system developers use worldwide. But have you ever wondered what it actually is, and why it matters so much in modern software development? In this article, we'll take a closer look at version control systems in general and Git in particular, exploring why they are essential tools for any developer who wants to work efficiently, collaborate effectively, and keep track of changes to a codebase over time.

WHY VERSION CONTROL?

Version control systems (VCSs) are powerful tools designed to monitor changes to source code, folders, and other collections of files. They maintain a comprehensive history of these changes, facilitating collaboration and enabling users to:

  • View previous versions of a project

  • Track the reasons behind certain changes

  • Work on parallel development branches

  • Share code changes more easily

  • Resolve conflicts that may arise during concurrent development

VCSs also store metadata such as:

  • The creator of each snapshot

  • Associated messages

  • Other relevant information

Whether you work solo or in a team, a VCS is an essential tool for managing projects effectively and ensuring that changes are tracked and recorded accurately.

Although other VCSs are available, Git has become the most widely used among developers. Its dominance is so significant that an XKCD comic was created to humorously reflect Git's reputation.

Git's interface can be confusing and difficult to learn, and simply memorizing a few commands can lead to errors and frustration. However, Git's underlying design and concepts are elegant and can be better understood through a bottom-up approach.

Git Data Model

Git relies on two related data structures to track changes to source code and other files. The contents of a project at a given moment are modeled as a tree of directories and files, and the history of the project is modeled as a directed acyclic graph (DAG). Each commit in the Git history is a node in that graph, and the edges between nodes represent the parent-child relationships between commits.

Don't worry if you come across technical terms like "DAG" and "tree" while learning Git. These concepts may seem intimidating at first, but they're simply ways of describing the data structures that Git uses to manage changes to your project.

       o---o---o  master
      /         \
 o---o---o---o---o  develop

  • Git's tree structure is like a family tree for your project's files.

  • The root directory sits at the top, with all other directories and files contained within it.

  • Each directory can contain one or more subdirectories or files, forming a branching structure like a tree.

  • Git stores the contents of each directory and file as a snapshot.

  • Each snapshot is linked to its parent snapshot, allowing Git to track changes to individual files and directories over time while still keeping a complete history of the entire project.

  • Another way to picture this structure is shown below.

Git Repository Structure

<root> (main folder)
|
+- my_folder (folder)
|  |
|  + my_file.txt (file, contents = "Hello, world!")
|
+- README.md (file, contents = "This is my project's README file")

In the tree shown above:

  • <root> represents the main folder in Git's tree data structure.

  • my_folder is a subfolder inside the root folder.

  • my_file.txt is a file located inside "my_folder".

  • README.md is a file located in the root folder, containing the project's README information.
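The folder layout above can be sketched in a few lines of Python. This is only an illustrative model, not Git's actual implementation: the class names Blob and Tree borrow Git's terminology, while the helper list_paths is a hypothetical name made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Blob:
    """A file: just a bunch of bytes."""
    data: bytes


@dataclass(frozen=True)
class Tree:
    """A directory: maps names to files (blobs) or subdirectories (trees)."""
    entries: Dict[str, Union[Blob, "Tree"]]


# The same structure as the diagram above.
root = Tree({
    "my_folder": Tree({"my_file.txt": Blob(b"Hello, world!")}),
    "README.md": Blob(b"This is my project's README file"),
})


def list_paths(tree: Tree, prefix: str = "") -> list:
    """Walk the tree and return the path of every file in it (hypothetical helper)."""
    paths = []
    for name, entry in sorted(tree.entries.items()):
        if isinstance(entry, Tree):
            paths.extend(list_paths(entry, prefix + name + "/"))
        else:
            paths.append(prefix + name)
    return paths


print(list_paths(root))
```

A snapshot of the whole project is then simply the top-level tree, since everything else is reachable from it.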

Snapshots & Commits

When it comes to version control systems, there's a lot going on behind the scenes. One of the ways that Git manages its version history is by creating a directed acyclic graph (DAG) of snapshots. This might sound like a complicated idea, but it's actually pretty straightforward.

In a linear history, snapshots are simply listed in chronological order. But Git's approach is more sophisticated: each snapshot (which Git calls a "commit") has a set of parents, the snapshots that came before it. This set of parents can include multiple commits, as a result of merging different branches of development.

To visualize this, imagine a series of commits arranged in a graph-like structure, with arrows pointing from each commit to its parents. This might look something like this:

o <-- o <-- o <-- o
            ^
             \
              --- o <-- o

Each "o" represents a commit, and the arrows point to the commit's parent or parents. As development progresses, history can branch out into separate lines of development. In the future, these branches may be merged together to create a new commit that incorporates both sets of changes.


o <-- o <-- o <-- o <---- o
            ^            /
             \          v
              --- o <-- o

It's worth noting that commits in Git are immutable. This means that any changes to the commit history are actually creating entirely new commits, rather than modifying existing ones. Nevertheless, Git provides ways to manage these changes and keep track of references to different versions of the code.
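To make the idea of parents and immutability concrete, here is a minimal Python sketch of the merged history drawn above. It is a toy model under simplifying assumptions: a real commit also records a snapshot, an author, and more, and the names Commit and reachable are made up for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Commit:
    message: str
    parents: Tuple["Commit", ...] = ()


def reachable(commit: Commit) -> set:
    """Return the messages of the commit and every ancestor, each counted once."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(current.parents)  # follow the arrows back to the parents
    return {c.message for c in seen}


# History branches after the third commit and is merged again later.
first = Commit("first")
second = Commit("second", (first,))
third = Commit("third", (second,))
main_tip = Commit("work on main", (third,))
feature_a = Commit("feature part 1", (third,))
feature_tip = Commit("feature part 2", (feature_a,))
merge = Commit("merge feature", (main_tip, feature_tip))  # two parents

print(sorted(reachable(merge)))
```

Because the class is frozen, "changing" a commit means building a new Commit object, which mirrors how rewriting history in Git creates new commits instead of editing old ones.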

Git Objects and Content-Addressing

  • Git uses objects to represent the data in its repository, and every object is identified by a content-based address: its SHA-1 hash.

  • Objects in Git are blobs, trees, or commits. When one object refers to another, its on-disk representation stores the other object's hash rather than the object itself.

  • SHA-1 hashes are hard for humans to remember, so Git uses references to give them human-readable names. A reference is a pointer to a specific commit, and unlike objects, references are mutable: they can be updated to point to a new commit.
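Content-addressing is easy to try out yourself. The sketch below computes the ID of a blob the way Git does for file contents: it hashes a short header ("blob", the size in bytes, and a null byte) followed by the contents. The function name git_blob_id is made up for this example, and trees and commits are left out.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID for a blob with the given contents."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Same contents always give the same ID; any change gives a different one.
print(git_blob_id(b"Hello, world!"))
print(git_blob_id(b"Hello, world?"))
```

If you have Git installed, you can compare the result with the output of git hash-object for a file holding the same bytes.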

Repository

As you now know, a Git repository is nothing more than data objects and references stored on disk. Git commands manipulate the commit DAG by adding objects and updating references. Whenever you run a Git command, it helps to think about what change it makes to the underlying graph.
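Here is a rough sketch of that idea in Python: a repository as two dictionaries, one for objects and one for references. It is a toy model under stated assumptions, not how Git stores data. The Repo class and its methods are hypothetical, and objects are serialized as JSON purely for readability, which is not Git's real format.

```python
import hashlib
import json


class Repo:
    def __init__(self):
        self.objects = {}     # hash -> serialized object (only ever added to)
        self.references = {}  # name -> hash (freely updated)

    def store(self, obj: dict) -> str:
        """Serialize an object, file it under its own hash, return the hash."""
        data = json.dumps(obj, sort_keys=True).encode()
        object_id = hashlib.sha1(data).hexdigest()
        self.objects[object_id] = data
        return object_id

    def commit(self, branch: str, message: str) -> str:
        """Add one commit object, then move the branch reference to it."""
        parent = self.references.get(branch)
        parents = [parent] if parent else []
        object_id = self.store({"message": message, "parents": parents})
        self.references[branch] = object_id
        return object_id


repo = Repo()
first = repo.commit("main", "initial commit")
second = repo.commit("main", "add README")

print(repo.references["main"] == second)  # the reference moved forward
print(first in repo.objects)              # the old commit is still there
```

Notice that committing never touches an existing object: it adds a new one and repoints a reference.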

Staging Area

The staging area, also known as the index, is a concept in Git that allows you to specify which modifications should be included in the next snapshot or commit. Unlike some version control tools that create a new snapshot based on the current state of the working directory, Git's staging area lets you create clean snapshots and selectively choose modifications to include in a commit. This is helpful when you want to create separate commits for different features or when you want to commit a bugfix while discarding debugging print statements.
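The difference between the working directory and the staging area can be sketched like this. It is a simplified, file-level model with made-up names (Workspace, add, commit). Real Git can also stage individual parts of a file, which this sketch does not attempt.

```python
class Workspace:
    def __init__(self):
        self.working = {}  # path -> contents currently on disk
        self.index = {}    # path -> contents staged for the next commit
        self.commits = []  # list of (message, snapshot) pairs

    def add(self, path: str) -> None:
        """Stage the current contents of one file."""
        self.index[path] = self.working[path]

    def commit(self, message: str) -> dict:
        """Snapshot what is staged, not what happens to be on disk."""
        snapshot = dict(self.index)
        self.commits.append((message, snapshot))
        return snapshot


ws = Workspace()
ws.working["app.py"] = "result = compute()  # the bugfix"
ws.working["debug.py"] = "print('debugging...')"

ws.add("app.py")  # stage only the bugfix
snapshot = ws.commit("Fix compute bug")

print(sorted(snapshot))  # only the staged file ends up in the commit
```

The debugging file stays in the working directory, but it never becomes part of the history.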

Git Providers

Git is not GitHub. GitHub is a hosting service for Git repositories, and it has its own way of contributing code to other projects, called pull requests. GitHub is not special, either: there are many other Git repository hosts, such as:

  • GitLab: Another popular Git hosting service that offers both free and paid options for public and private repositories. GitLab also provides continuous integration and deployment (CI/CD) pipelines, issue tracking, and other features.

  • Bitbucket: A Git hosting service owned by Atlassian that offers free private repositories for small teams and paid options for larger teams. It also includes issue tracking and integrates with other Atlassian products such as Jira.

  • SourceForge: It is a free Git hosting service that provides unlimited public and private repositories. It also offers features such as issue tracking, project

Resources

If you have reached this point and have honestly read every part of this blog, give yourself a pat on the back. Keep learning, and stay tuned for more such amazing blogs 🚀🙌

Connect with Me

If you enjoyed this post and would like to stay updated on my work, feel free to connect with me on social media:

  • LinkedIn: Click here

  • Twitter: Click here

I'm always open to new connections and networking opportunities, so don't hesitate to reach out and say hello. Thank you for reading!
