How I built a version control system (VCS) using pure Go 🚀

Abdulsamet İLERİ
Published in Level Up Coding
4 min read · Apr 14, 2022


Every single artifact related to the creation of your software should be under version control. [1]

A VCS is a system that tracks revisions (versions) of files over time. [2]

Demo

https://asciinema.org/a/487303

Source Code

https://github.com/Abdulsametileri/vX

Motivation

While reading an excellent book [3] to better understand event-driven systems and the idea of event sourcing, I came across a very good example:

“As an analogy, imagine you are building a version control system like SVN or Git. When a user commits a file for the first time, the system saves the whole file to disk. Subsequent commits, reflecting changes to that file, might save only the delta - that is, just the lines that were added, changed, or removed. Then, when the user checks out a certain version, the system opens the version-0 file and applies all subsequent deltas, in order, to derive the version the user asked for.” [3]

I wanted to implement a VCS with this strategy, so I started an experimental hobby project called vX. Everything here is the result of just three days of hard work. 😅

Architecture

First of all, every file that vX manages lives inside the .vx folder. (The Go tool ignores any directories or files whose names begin with “_” or “.”.)

.vx
├── checkout
│   ├── v1
│   └── v2
├── commit
│   ├── v1
│   └── v2
├── staging-area.txt
└── status.txt

v1, v2, ..., vN are commit versions. I describe what these folders contain in more detail in the sections below.

checkout is a folder that holds the result of merging all files up to and including the specified commit version. For example, checkout/v2 is the combination of commit/v1 and commit/v2.

It's reasonable to create a checkout directory because “a good practice in the UNIX world is to deploy each version of the application into a new directory and have a symbolic link that points to the current version” [1]. I have not implemented the symbolic link yet, but the directory layout is ready for it.

The staging area lists the files that are going to be part of the next commit. In this project it's just a plain text file opened in append-only mode, because a file can be updated after it was first added, and each update is appended as a new line. Every line is formatted as file path | file modification time | file status. For example, part of the file looks like this:

"testdata/status.txt|2022-04-14 05:42:15|Created",
"testdata/z.go|2022-04-14 05:11:04|Created",
"README.md|2022-04-14 05:42:11|Created",
"testdata/a1.txt|2022-04-13 06:58:03|Created",
"README.md|2022-04-14 05:49:09|Updated",

For example, README.md was added twice before the commit operation, with different modification times and statuses. The latest state of this file is therefore Updated at 2022-04-14 05:49:09. This is very similar to the idea of event sourcing, that is, representing the changes to a database as a log of immutable events [4].
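To make the "latest state" idea concrete, here is a minimal sketch of how such a log could be replayed. The type and function names (entry, parseLine, latestState) are hypothetical and are not taken from the vX source; the sketch only assumes the path|time|status line format shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is one parsed record of the staging area: path|modification time|status.
type entry struct {
	Path    string
	ModTime string // kept as text, in the "2006-01-02 15:04:05" form of the sample
	Status  string
}

// parseLine splits a single "path|time|status" record.
func parseLine(line string) (entry, error) {
	parts := strings.Split(line, "|")
	if len(parts) != 3 {
		return entry{}, fmt.Errorf("malformed record: %q", line)
	}
	return entry{Path: parts[0], ModTime: parts[1], Status: parts[2]}, nil
}

// latestState replays the append-only log in order. A later record for the
// same path overwrites the earlier one, so the map ends up holding the most
// recent state of every file.
func latestState(lines []string) (map[string]entry, error) {
	state := make(map[string]entry)
	for _, l := range lines {
		e, err := parseLine(l)
		if err != nil {
			return nil, err
		}
		state[e.Path] = e
	}
	return state, nil
}

func main() {
	records := []string{
		"README.md|2022-04-14 05:42:11|Created",
		"testdata/a1.txt|2022-04-13 06:58:03|Created",
		"README.md|2022-04-14 05:49:09|Updated",
	}
	state, err := latestState(records)
	if err != nil {
		panic(err)
	}
	// prints "Updated 2022-04-14 05:49:09"
	fmt.Println(state["README.md"].Status, state["README.md"].ModTime)
}
```

Nothing is ever edited in place: the log only grows, and the current state is derived by reading it from top to bottom.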

status is a text file that keeps a persistent record of every tracked file. Because I clear the contents of the staging area after each successful commit, I need a separate place that remembers which files are under version control.

Project structure

I followed the project structure that was recommended for any Cobra-based application.

I used a testdata directory. (The Go tool ignores any directory named testdata, so its contents are ignored when compiling your application.)

Unsupported actions

Currently, I don't know how to detect deleted files, so vX tracks only created and modified files. The status is derived from the file modification time reported by the file system.
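As an illustration of a status based on modification time, the sketch below classifies a file by comparing its current modification time with the last one recorded. The tracker type, the observe method and the "Unchanged" label are assumptions made for this example, not part of vX.

```go
package main

import (
	"fmt"
	"time"
)

// timeLayout matches the timestamps shown in the staging area sample.
const timeLayout = "2006-01-02 15:04:05"

// tracker remembers the last modification time seen for each path.
type tracker struct {
	seen map[string]time.Time
}

func newTracker() *tracker {
	return &tracker{seen: make(map[string]time.Time)}
}

// observe classifies a file and records its modification time.
// "Unchanged" is a label invented for this sketch.
func (t *tracker) observe(path string, modTime time.Time) string {
	prev, known := t.seen[path]
	t.seen[path] = modTime
	switch {
	case !known:
		return "Created"
	case !modTime.Equal(prev):
		return "Updated"
	default:
		return "Unchanged"
	}
}

// mustParse converts a sample timestamp, panicking on malformed input.
func mustParse(s string) time.Time {
	ts, err := time.Parse(timeLayout, s)
	if err != nil {
		panic(err)
	}
	return ts
}

func main() {
	tr := newTracker()
	// In a real tool the time would come from os.Stat(path) and info.ModTime().
	fmt.Println(tr.observe("README.md", mustParse("2022-04-14 05:42:11"))) // Created
	fmt.Println(tr.observe("README.md", mustParse("2022-04-14 05:49:09"))) // Updated
	fmt.Println(tr.observe("README.md", mustParse("2022-04-14 05:49:09"))) // Unchanged
}
```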

Storing only the changes (deltas) and merging them at checkout time is really hard to implement, so I have postponed that to a later release. For now, every commit operation saves the staged files as a whole, and after some research I chose rsync to combine the commits at checkout.

Commands

init : creates directories and files.
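A rough sketch of what init could do, based only on the .vx layout shown in the Architecture section. The function names (initRepo, layout, runDemo) and the file permissions are assumptions; the real vX command may differ.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// initRepo creates the .vx layout under root. Calling it twice is harmless.
func initRepo(root string) error {
	base := filepath.Join(root, ".vx")
	for _, dir := range []string{"checkout", "commit"} {
		if err := os.MkdirAll(filepath.Join(base, dir), 0o755); err != nil {
			return err
		}
	}
	for _, name := range []string{"staging-area.txt", "status.txt"} {
		// O_CREATE without O_TRUNC leaves existing content untouched.
		f, err := os.OpenFile(filepath.Join(base, name), os.O_CREATE|os.O_WRONLY, 0o644)
		if err != nil {
			return err
		}
		if err := f.Close(); err != nil {
			return err
		}
	}
	return nil
}

// layout lists the entries directly inside root/.vx, sorted by name.
func layout(root string) ([]string, error) {
	entries, err := os.ReadDir(filepath.Join(root, ".vx"))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

// runDemo initialises a throwaway repository in a temporary directory.
func runDemo() ([]string, error) {
	root, err := os.MkdirTemp("", "vx-init")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	if err := initRepo(root); err != nil {
		return nil, err
	}
	// A second call must neither fail nor wipe anything.
	if err := initRepo(root); err != nil {
		return nil, err
	}
	return layout(root)
}

func main() {
	names, err := runDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(names) // [checkout commit staging-area.txt status.txt]
}
```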

status : reads the staging area text file, parses each line into a struct, and uses tablewriter to show the results. If you look carefully, the functions accept an io.Writer. In unit tests, I pass a bytes.Buffer and assert on its contents easily. I recommend reading this great article about interfaces in Go.
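The io.Writer pattern can be sketched as follows. To stay free of third-party dependencies, this example swaps tablewriter for the standard library's text/tabwriter; fileStatus, printStatus and render are hypothetical names, not the ones used in vX.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"text/tabwriter"
)

// fileStatus is one row of the status table.
type fileStatus struct {
	Path, ModTime, Status string
}

// printStatus renders rows to any io.Writer: os.Stdout in the CLI,
// a bytes.Buffer in tests.
func printStatus(w io.Writer, rows []fileStatus) error {
	tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
	fmt.Fprintln(tw, "FILE\tMODIFIED\tSTATUS")
	for _, r := range rows {
		fmt.Fprintf(tw, "%s\t%s\t%s\n", r.Path, r.ModTime, r.Status)
	}
	return tw.Flush()
}

// render shows the test-side usage: capture the output in memory
// instead of printing it to the terminal.
func render(rows []fileStatus) (string, error) {
	var buf bytes.Buffer
	err := printStatus(&buf, rows)
	return buf.String(), err
}

func main() {
	rows := []fileStatus{
		{Path: "README.md", ModTime: "2022-04-14 05:49:09", Status: "Updated"},
	}
	if err := printStatus(os.Stdout, rows); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Because printStatus depends only on the interface, the same function serves the command line and the unit test without any mocking.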

history : shows all commits. To implement this, I keep a metadata.txt file in every commit directory. In this file, I store the commit message and the commit time, separated by |.

.
├── v1
│   ├── ...
│   └── metadata.txt
└── v2
    ├── ...
    └── metadata.txt
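Reading such a file back could look like the sketch below. It assumes a single "message|time" line per metadata.txt; the field order, the sample content and the names commitMeta and parseMetadata are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// commitMeta is what the history command needs to print for one commit.
type commitMeta struct {
	Version string // e.g. "v1", taken from the directory name
	Message string
	Time    string
}

// parseMetadata splits "message|time" at the last separator, so a commit
// message that itself contains '|' still parses.
func parseMetadata(version, content string) (commitMeta, error) {
	line := strings.TrimSpace(content)
	i := strings.LastIndex(line, "|")
	if i < 0 {
		return commitMeta{}, fmt.Errorf("%s: missing '|' separator", version)
	}
	return commitMeta{Version: version, Message: line[:i], Time: line[i+1:]}, nil
}

func main() {
	// Hypothetical file content; the real format may differ.
	m, err := parseMetadata("v1", "initial commit|2022-04-14 05:50:00\n")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s  %s  %s\n", m.Version, m.Time, m.Message)
}
```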

add : adds the specified files and directories to status.txt and staging-area.txt. As mentioned earlier, status.txt holds the latest state of every file, so I truncate it and write fresh data every time. staging-area.txt is append-only, so there is nothing to do except append the new records. Duplicate records are not a problem, because I calculate the latest state of each file when committing.
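The two write strategies can be contrasted in a few lines of Go. This is only a sketch under the assumptions stated in the paragraph above (one file appended to, one file rewritten); appendLines, rewriteLines and runDemo are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// appendLines adds records to the end of the file (staging-area.txt style).
func appendLines(path string, lines []string) error {
	if len(lines) == 0 {
		return nil
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.WriteString(strings.Join(lines, "\n") + "\n"); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// rewriteLines replaces the whole file (status.txt style).
// os.WriteFile truncates an existing file before writing.
func rewriteLines(path string, lines []string) error {
	data := ""
	if len(lines) > 0 {
		data = strings.Join(lines, "\n") + "\n"
	}
	return os.WriteFile(path, []byte(data), 0o644)
}

// runDemo performs two "add" rounds and returns both files' final contents.
func runDemo() (string, string, error) {
	dir, err := os.MkdirTemp("", "vx-add")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	stagingPath := filepath.Join(dir, "staging-area.txt")
	statusPath := filepath.Join(dir, "status.txt")

	rounds := [][]string{
		{"README.md|2022-04-14 05:42:11|Created"},
		{"README.md|2022-04-14 05:49:09|Updated"},
	}
	for _, batch := range rounds {
		if err := appendLines(stagingPath, batch); err != nil {
			return "", "", err
		}
		if err := rewriteLines(statusPath, batch); err != nil {
			return "", "", err
		}
	}
	staging, err := os.ReadFile(stagingPath)
	if err != nil {
		return "", "", err
	}
	status, err := os.ReadFile(statusPath)
	if err != nil {
		return "", "", err
	}
	return string(staging), string(status), nil
}

func main() {
	staging, status, err := runDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print("staging-area.txt:\n", staging, "status.txt:\n", status)
}
```

After the two rounds, the staging file holds both records while the status file holds only the most recent one.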

commit : reads the staging-area.txt file, copies the listed files into the directory of the new commit (v1, v2, ...), and truncates staging-area.txt once the operation finishes.

For example, suppose the user added README.md and testdata/ in the v1 commit, and Makefile in the v2 commit. The commit folders will then look like this:

├── commit
│   ├── v1
│   │   ├── README.md
│   │   ├── metadata.txt
│   │   └── testdata
│   │       └── example
│   │           ├── a1.txt
│   │           ├── a2.txt
│   │           ├── example.go
│   │           ├── src
│   │           │   └── hello.js
│   │           └── z.go
│   └── v2
│       ├── Makefile
│       └── metadata.txt
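The copy-then-truncate flow of commit might be sketched like this. It is an illustration under assumptions, not the vX implementation: the helper names (copyFile, commit, runDemo) are hypothetical, the staged paths are passed in already parsed, and writing metadata.txt is left out for brevity.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// copyFile copies src to dst, creating dst's parent directories first.
func copyFile(src, dst string) error {
	if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
		return err
	}
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// commit copies every staged path (relative to root) into
// root/.vx/commit/<version>/ and then empties the staging file.
func commit(root, version string, staged []string) error {
	dest := filepath.Join(root, ".vx", "commit", version)
	for _, rel := range staged {
		if err := copyFile(filepath.Join(root, rel), filepath.Join(dest, rel)); err != nil {
			return err
		}
	}
	// Truncate only after every copy succeeded.
	return os.Truncate(filepath.Join(root, ".vx", "staging-area.txt"), 0)
}

// runDemo commits two files in a temporary directory and reports the copied
// content of one of them plus the size of the staging file afterwards.
func runDemo() (string, int64, error) {
	root, err := os.MkdirTemp("", "vx-commit")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(root)
	staging := filepath.Join(root, ".vx", "staging-area.txt")
	files := map[string]string{
		filepath.Join(root, "README.md"):          "hello\n",
		filepath.Join(root, "testdata", "a1.txt"): "a1\n",
		staging: "README.md|2022-04-14 05:42:11|Created\n",
	}
	for path, content := range files {
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return "", 0, err
		}
		if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
			return "", 0, err
		}
	}
	if err := commit(root, "v1", []string{"README.md", "testdata/a1.txt"}); err != nil {
		return "", 0, err
	}
	copied, err := os.ReadFile(filepath.Join(root, ".vx", "commit", "v1", "testdata", "a1.txt"))
	if err != nil {
		return "", 0, err
	}
	info, err := os.Stat(staging)
	if err != nil {
		return "", 0, err
	}
	return string(copied), info.Size(), nil
}

func main() {
	copied, size, err := runDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("copied=%q staging size=%d\n", copied, size)
}
```

Truncating last means a failed copy leaves the staging area intact, so the commit can simply be retried.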

checkout : uses rsync to copy from the commit/ directory into the checkout/ directory for the given commit id. When the same file exists on both sides, rsync also takes care of it for us by updating the copy in the destination.

├── checkout
│   ├── v1
│   │   ├── README.md
│   │   └── testdata
│   │       └── example
│   │           ├── a1.txt
│   │           ├── a2.txt
│   │           ├── example.go
│   │           ├── src
│   │           │   └── hello.js
│   │           └── z.go
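To show the idea without depending on an external binary, the sketch below replaces rsync with a plain Go directory overlay: commits are applied oldest first, so a file from a later commit overwrites the same file from an earlier one. This is a stand-in technique, not what vX does internally. The names overlay, checkout and runDemo are hypothetical, and skipping metadata.txt is an assumption drawn from the tree above, where checkout/v1 has no metadata.txt.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// overlay copies every file under src into dst, overwriting files that
// already exist there. The commit's own metadata.txt is skipped.
func overlay(src, dst string) error {
	return filepath.WalkDir(src, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(src, path)
		if err != nil {
			return err
		}
		target := filepath.Join(dst, rel)
		if d.IsDir() {
			return os.MkdirAll(target, 0o755)
		}
		if rel == "metadata.txt" {
			return nil
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		return os.WriteFile(target, data, 0o644)
	})
}

// checkout builds checkout/<last version> by overlaying the given versions
// in order, oldest first.
func checkout(vxDir string, versions []string) error {
	if len(versions) == 0 {
		return fmt.Errorf("no versions given")
	}
	dst := filepath.Join(vxDir, "checkout", versions[len(versions)-1])
	for _, v := range versions {
		if err := overlay(filepath.Join(vxDir, "commit", v), dst); err != nil {
			return err
		}
	}
	return nil
}

// runDemo creates two small commits and checks out v2. It returns the
// checked-out README.md and whether metadata.txt was left out.
func runDemo() (string, bool, error) {
	vx, err := os.MkdirTemp("", "vx-checkout")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(vx)
	files := map[string]string{
		"commit/v1/README.md":    "one\n",
		"commit/v1/metadata.txt": "first|2022-04-14 05:50:00\n",
		"commit/v2/README.md":    "two\n",
		"commit/v2/Makefile":     "all:\n",
	}
	for rel, content := range files {
		path := filepath.Join(vx, filepath.FromSlash(rel))
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return "", false, err
		}
		if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
			return "", false, err
		}
	}
	if err := checkout(vx, []string{"v1", "v2"}); err != nil {
		return "", false, err
	}
	readme, err := os.ReadFile(filepath.Join(vx, "checkout", "v2", "README.md"))
	if err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(filepath.Join(vx, "checkout", "v2", "metadata.txt"))
	return string(readme), os.IsNotExist(statErr), nil
}

func main() {
	readme, skipped, err := runDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("README.md=%q metadata skipped=%v\n", readme, skipped)
}
```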


References

[1] Continuous Integration: Improving Software Quality and Reducing Risk by Andrew Glover, Paul Duvall, and Steve Matyas

[2] Software Engineering at Google: Lessons Learned from Programming Over Time by Titus Winters, Tom Manshreck, and Hyrum Wright

[3] Designing Event-Driven Systems by Ben Stopford

[4] Making Sense of Stream Processing by Martin Kleppmann
