How I built a version control system (VCS) using pure Go
Every single artifact related to the creation of your software should be under version control. [1]
A VCS is a system that tracks revisions (versions) of files over time. [2]
Demo
https://asciinema.org/a/487303
Source Code
https://github.com/Abdulsametileri/vX
Motivation
While reading a beautiful book [3] to better understand event-driven systems and the idea of event sourcing, I came across a very good example:
“As an analogy, imagine you are building a version control system like SVN or Git. When a user commits a file for the first time, the system saves the whole file to disk. Subsequent commits, reflecting changes to that file, might save only the delta - that is, just the lines that were added, changed, or removed. Then, when the user checks out a certain version, the system opens the version-0 file and applies all subsequent deltas, in order, to derive the version the user asked for.” [3]
I wanted to implement a VCS with this strategy, so I started an experimental hobby project called vX. All of this is the result of just three days of hard work.
Architecture
First of all, all related files live in the .vx folder. (The Go tool ignores any directories or files whose names begin with "_" or ".".)
.vx
├── checkout
│   ├── v1
│   └── v2
├── commit
│   ├── v1
│   └── v2
├── staging-area.txt
└── status.txt
v1, v2, .., vN are commit versions. I describe the contents of these folders in more detail in the following sections.
checkout is a folder that holds the result of merging all files up to the specified commit version. For example, checkout/v2 contains the combination of commit/v1 + commit/v2.
It's reasonable to create a checkout directory because “a good practice in the UNIX world is to deploy each version of the application into a new directory and have a symbolic link that points to the current version” [1]. I have not implemented the symbolic link yet, but the directory layout is ready for it.
staging area is the set of files that are going to be part of the next commit. In this context, it's just a basic text file opened in append-only mode, because a file can be updated again after it was first added. Each line is formatted as file path | file modification time | file status. For example, part of the content of this file looks like this:
"testdata/status.txt|2022-04-14 05:42:15|Created",
"testdata/z.go|2022-04-14 05:11:04|Created",
"README.md|2022-04-14 05:42:11|Created",
"testdata/a1.txt|2022-04-13 06:58:03|Created",
"README.md|2022-04-14 05:49:09|Updated",
For example, README.md was added twice before the commit operation, with different modification times and statuses, so the latest state of this file is Updated at 2022-04-14 05:49:09. This is very similar to the idea of event sourcing; that is, representing the changes to a database as a log of immutable events [4].
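The replay idea can be sketched in a few lines of Go. This is only an illustration, not the code from vX: the Entry type and the parseEntry and latestState functions are hypothetical names, and it assumes the file stores raw path|time|status lines without the surrounding quotes and commas shown in the listing.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// timeLayout matches the timestamps in the sample lines, e.g. "2022-04-14 05:42:15".
const timeLayout = "2006-01-02 15:04:05"

// Entry is one line of the staging log (hypothetical type, not taken from vX).
type Entry struct {
	Path    string
	ModTime time.Time
	Status  string
}

// parseEntry splits a "path|modification time|status" line into an Entry.
func parseEntry(line string) (Entry, error) {
	parts := strings.Split(strings.TrimSpace(line), "|")
	if len(parts) != 3 {
		return Entry{}, fmt.Errorf("malformed line %q", line)
	}
	t, err := time.Parse(timeLayout, parts[1])
	if err != nil {
		return Entry{}, fmt.Errorf("bad time in %q: %w", line, err)
	}
	return Entry{Path: parts[0], ModTime: t, Status: parts[2]}, nil
}

// latestState replays the log from top to bottom and keeps the last entry
// seen for every path, so later lines win over earlier ones.
func latestState(log string) (map[string]Entry, error) {
	state := make(map[string]Entry)
	for _, line := range strings.Split(log, "\n") {
		if strings.TrimSpace(line) == "" {
			continue // skip blank lines, e.g. the trailing newline
		}
		e, err := parseEntry(line)
		if err != nil {
			return nil, err
		}
		state[e.Path] = e
	}
	return state, nil
}
```

Because the log is never edited in place, the current state is always a pure function of the lines written so far, which is the property event sourcing relies on.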
status is a text file that persistently keeps track of all files under version control. Because I clear the contents of the staging area after a successful commit, I need a separate place that remembers the tracked files.
Project structure
I followed the project structure that was recommended for any Cobra-based application.
I used a testdata directory. (The Go tool ignores any directory called testdata, so its contents are ignored when compiling your application.)
Unsupported actions
Currently, I don't know how to detect deleted files, so I only track created and modified files. The status is derived from the file modification time provided by the file system.
Storing only the changes and merging them on checkout is a really hard job, so I have postponed it to a later release. After some research, I found rsync for the checkout step. Because of this, every commit operation saves the staged files as a whole.
Commands
init: creates the directories and files under .vx.
status: reads the staging area text files, parses them into the appropriate structs, and uses tablewriter to show the results. If you look carefully, the functions take an io.Writer interface. In unit tests, I pass a bytes.Buffer and assert on its contents easily. I recommend reading this great article about interfaces in Go.
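The io.Writer trick looks roughly like this. To keep the sketch dependency-free it uses text/tabwriter from the standard library instead of the tablewriter package, and the FileStatus type and both function names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"text/tabwriter"
)

// FileStatus is a hypothetical row type for the status table.
type FileStatus struct {
	Path, ModTime, Status string
}

// printStatus writes one aligned row per file to w. Because it accepts an
// io.Writer, callers can pass os.Stdout, a file, or an in-memory buffer.
func printStatus(w io.Writer, rows []FileStatus) error {
	tw := tabwriter.NewWriter(w, 0, 4, 2, ' ', 0)
	fmt.Fprintln(tw, "FILE\tMODIFIED\tSTATUS")
	for _, r := range rows {
		fmt.Fprintf(tw, "%s\t%s\t%s\n", r.Path, r.ModTime, r.Status)
	}
	return tw.Flush() // nothing reaches w until the table is flushed
}

// renderStatus captures the table in a bytes.Buffer, which is what a unit
// test can do to inspect the output without touching the terminal.
func renderStatus(rows []FileStatus) (string, error) {
	var buf bytes.Buffer
	err := printStatus(&buf, rows)
	return buf.String(), err
}
```

In the command itself the same printStatus function would receive os.Stdout, so the production path and the test path share all of the formatting code.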
history: shows all commits. To implement this, I keep a metadata.txt file in every commit directory. In this file, I store the commit message and time, separated with |.
.
├── v1
│   ├── ..
│   └── metadata.txt
└── v2
    ├── ...
    └── metadata.txt
add: adds the specified files and directories to status.txt and staging-area.txt. As previously mentioned, status.txt keeps the latest state of every file so that updated statuses can be shown, so I truncate it and write fresh data every time. staging-area.txt is append-only, so there is nothing else to do: I just append the new data. Duplicate entries are not a problem; after a successful commit, I calculate the latest state.
commit: reads the staging-area.txt file, copies the listed files into the directory of the new commit (v1, v2, ...), and truncates staging-area.txt once the operation finishes.
For example, suppose the user added README.md and testdata/ in the v1 commit and Makefile in the v2 commit. The commit folders will then look like this:
├── commit
│   ├── v1
│   │   ├── README.md
│   │   ├── metadata.txt
│   │   └── testdata
│   │       └── example
│   │           ├── a1.txt
│   │           ├── a2.txt
│   │           ├── example.go
│   │           ├── src
│   │           │   └── hello.js
│   │           └── z.go
│   └── v2
│       ├── Makefile
│       └── metadata.txt
checkout: uses rsync to copy from commit/ to the checkout/ directory for the specified commit id. rsync also handles files that appear in more than one commit for us: by default it replaces the existing copy in the destination instead of duplicating it.
├── checkout
│   ├── v1
│   │   ├── README.md
│   │   └── testdata
│   │       └── example
│   │           ├── a1.txt
│   │           ├── a2.txt
│   │           ├── example.go
│   │           ├── src
│   │           │   └── hello.js
│   │           └── z.go
References
[1] Continuous Integration: Improving Software Quality and Reducing Risk by Andrew Glover, Paul Duvall, and Steve Matyas
[2] Software Engineering at Google: Lessons Learned from Programming Over Time by Titus Winters, Tom Manshreck, Hyrum Wright
[3] Designing Event-Driven Systems by Ben Stopford
[4] Making Sense of Stream Processing by Martin Kleppmann