# Content Addressable System
Marfeel uses a Git-based flow to deploy files. Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it. This means that, at its core, Git is a simple key-value data store: you can insert any kind of content into it, and it gives you back a key that you can use to retrieve the content again at any time.
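As a minimal sketch of the key-value idea, the Python snippet below stores byte strings under the SHA-1 of their content. The `ContentStore` class and its `put`/`get` methods are hypothetical names used only for illustration. Git additionally prefixes a type and length header before hashing, so the keys produced here will not match Git's object names.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (illustrative, not Git's format)."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # The key is derived only from the content, never from a file name.
        key = hashlib.sha1(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ContentStore()
css = b".container { border: 1px solid black; }"
key_a = store.put(css)
key_b = store.put(css)  # same bytes inserted again, as if from another file

print(key_a == key_b)    # identical content yields the identical key
print(store.get(key_a))  # the key retrieves the original bytes
```

Because the key depends only on the bytes, inserting the same content twice stores it once.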
Git uses Merkle trees as its fundamental underlying data structure. Essentially, a Merkle tree is a tree in which each node is labeled with the cryptographic hash (SHA-1 in Git's case) of its contents, which include the labels of its children.
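The labeling rule can be sketched in a few lines of Python. This is an assumption-laden illustration: the `label` function and the `blob:`/`tree:` serialization are invented for this example and are not Git's on-disk tree format. It only shows that a change in one leaf propagates to the root while untouched subtrees keep their labels.

```python
import hashlib


def label(node) -> str:
    """Return the SHA-1 label of a node.

    A node is either bytes (a file) or a dict mapping names to child nodes
    (a directory). The serialization is illustrative, not Git's format.
    """
    if isinstance(node, bytes):
        return hashlib.sha1(b"blob:" + node).hexdigest()
    # A directory's label covers each child's name and the child's own label.
    lines = [f"{name} {label(child)}" for name, child in sorted(node.items())]
    return hashlib.sha1(("tree:" + "\n".join(lines)).encode()).hexdigest()


site = {"index.jsp": b"<div class='container'></div>",
        "styles": {"main.css": b".container {}"}}
changed = {"index.jsp": b"<div class='c1'></div>",
           "styles": {"main.css": b".container {}"}}

print(label(site) != label(changed))                      # the root label changes
print(label(site["styles"]) == label(changed["styles"]))  # untouched subtree keeps its label
```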
Using content-addressable references (the SHA-1 of the content) rather than file names as identifiers gives a strong guarantee that we serve the same content regardless of the file name.
That also means that each deploy is immutable. We always serve the contents of the same tree under a domain. When we finish processing new deploys, we only swap the tree to serve. Having immutable trees also prevents us from showing mixed content (serving different files from different branches).
No changes go live on your site’s public URL before all changes have been uploaded. Once all the changes are ready, the new version of the site immediately goes live on the CDN.
This means deploys are atomic, and your site is never in an inconsistent state while you’re uploading a new deploy.
With FTP or S3 uploads, each file is just pushed live one after the other, so you can easily get into situations where a new HTML page is live before the supporting assets (images, scripts, CSS) have been uploaded. And if your connection cuts out in the middle of an upload, your site could get stuck in a broken state for a long time.
Atomic deploys guarantee that your site is always consistent.
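The swap described above can be modeled as a single pointer update that happens only after every file has arrived. The following Python sketch is a toy model under stated assumptions: the `Site` class, its `deploy` and `serve` methods, and the `broken_upload` generator are hypothetical and do not describe Marfeel's actual implementation.

```python
from types import MappingProxyType


class Site:
    """Toy model of atomic deploys: read-only trees plus one 'live' pointer."""

    def __init__(self):
        self.trees = {}   # deploy id -> read-only mapping of path -> content
        self.live = None  # id of the tree currently being served

    def deploy(self, deploy_id, files):
        staged = {}
        for path, content in files:  # may raise midway, e.g. a failed upload
            staged[path] = content
        # Only reached when every file arrived: publish with a single swap.
        self.trees[deploy_id] = MappingProxyType(staged)
        self.live = deploy_id

    def serve(self, path):
        return self.trees[self.live][path]


def broken_upload():
    yield "index.jsp", b"v2 markup"
    raise ConnectionError("connection cut out mid-upload")


site = Site()
site.deploy("v1", [("index.jsp", b"v1 markup"), ("scripts/main.js", b"v1 script")])
try:
    site.deploy("v2", broken_upload())
except ConnectionError:
    pass

print(site.live)                # still "v1": the partial deploy never went live
print(site.serve("index.jsp"))
```

The interrupted deploy leaves no trace in what is served, which is the property that file-by-file FTP or S3 uploads lack.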
To fully understand the internal mechanics of a content-addressable repository, let’s go through the process of creating a simple HTML page with associated CSS and JS files.
foo@bar:~/Developer$ mkdir contentAddressable
foo@bar:~/Developer$ cd contentAddressable
foo@bar:~/Developer/contentAddressable$ git init
Initialized empty Git repository in /Developer/contentAddressable/.git/
Add the following files. Source index.jsp:
<!doctype html>
<head>
<title>The HTML5 Herald</title>
<link rel="stylesheet" href="css/main.css">
<script src="js/main.js"></script>
</head>
<body>
<div class="container"></div>
</body>
</html>
Source styles/main.css:
.container {
border: 1px solid black;
background: #999;
}
Source scripts/main.js:
document.querySelector(".container").style.border = "3px solid red"
Commit the files to git:
foo@bar:~/Developer/contentAddressable|master$ git status
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
index.jsp
scripts/
styles/
nothing added to commit but untracked files present (use "git add" to track)
foo@bar:~/Developer/contentAddressable|master$ git add .
foo@bar:~/Developer/contentAddressable|master$ git commit -m "First Commit"
[master (root-commit) c3694c6] First Commit
3 files changed, 17 insertions(+)
create mode 100644 index.jsp
create mode 100644 scripts/main.js
create mode 100644 styles/main.css
Git internally stores these files under the folder .git/objects, using the SHA-1 of the content as their file name:
foo@bar:~/Developer/contentAddressable|master$ tree .git/objects
.git/objects
├── 22
│ └── 2c559fe2643740b3220215d9b97f369870cb13
├── 72
│ └── 1203cff658f05463d577a66cff6627b0071221
├── 76
│ └── 5eba6cdb7cd4f9cb9b63deb71a2762056695e2
├── 92
│ └── eedb5da79a153ff2b3685ceb2e67c2b5e2718d
├── c3
│ └── 694c6459957d559142755096641706c1ee2ee1
├── c7
│ └── 860dc9e8f829432b95a77ef65eee3bc56730e6
├── eb
│ └── bb1a3d69e040f7ecffc8fbedf9756cf9ae0390
├── info
└── pack
You can inspect the content of any of these objects using git cat-file -p. Bear in mind that the hash you pass must include the two characters of the parent folder name:
foo@bar:~/Developer/contentAddressable|master$ git cat-file -p 765eba6cdb7cd4f9cb9b63deb71a2762056695e2
<!doctype html>
<head>
<title>The HTML5 Herald</title>
<link rel="stylesheet" href="css/styles.css">
<script src="js/scripts.js"></script>
</head>
<body>
<div class="container"></div>
</body>
</html>
Similarly, you could run git show 765eba6cdb7cd4f9cb9b63deb71a2762056695e2.
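To make the folder-plus-file naming concrete, here is a hedged Python sketch of how a loose object can be located and read without Git. The helper names `object_path` and `read_loose_object` are hypothetical. The sketch assumes a loose (unpacked) object, which Git stores as zlib-compressed bytes consisting of a `<type> <size>` header, a NUL byte, and the content. Objects that have been moved into .git/objects/pack are not handled.

```python
import hashlib
import os
import tempfile
import zlib


def object_path(objects_dir: str, sha: str) -> str:
    # The first 2 hex characters name the folder, the remaining 38 name the file.
    return os.path.join(objects_dir, sha[:2], sha[2:])


def read_loose_object(objects_dir: str, sha: str):
    """Return (type, content) for a loose (unpacked) object."""
    with open(object_path(objects_dir, sha), "rb") as fh:
        raw = zlib.decompress(fh.read())
    header, _, content = raw.partition(b"\x00")
    obj_type, _size = header.split(b" ")
    return obj_type.decode(), content


# Demo on a throwaway directory so the sketch runs without a real repository.
objects_dir = tempfile.mkdtemp()
body = b"hello\n"
raw = b"blob %d\x00" % len(body) + body
sha = hashlib.sha1(raw).hexdigest()
os.makedirs(os.path.dirname(object_path(objects_dir, sha)))
with open(object_path(objects_dir, sha), "wb") as fh:
    fh.write(zlib.compress(raw))

print(object_path(".git/objects", "765eba6cdb7cd4f9cb9b63deb71a2762056695e2"))
print(read_loose_object(objects_dir, sha))
```

Pointing `read_loose_object` at a real .git/objects folder should behave like git cat-file for blobs, as long as the object has not been packed.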
Let’s now make a simple change to the code and rename class="container" to class="c1". This means we have to change both the HTML file (the markup) and the JS file (the querySelector expression).
foo@bar:~/Developer/contentAddressable|master$ git branch feature1
foo@bar:~/Developer/contentAddressable|master$ git checkout feature1
foo@bar:~/Developer/contentAddressable|feature1$ vi index.jsp
foo@bar:~/Developer/contentAddressable|feature1$ vi scripts/main.js
foo@bar:~/Developer/contentAddressable|feature1$ git commit -a -m "Rename container selector to c1"
[feature1 d4ecea6] Rename container selector to c1
2 files changed, 2 insertions(+), 2 deletions(-)
At this point we have 2 branches:
- master: contains the old class="container"
- feature1: contains the new class="c1", with changes in main.js and index.jsp
We can verify this by running git ls-tree -r on both branches:
foo@bar:~/Developer/contentAddressable|feature1$ git ls-tree -r master
100644 blob 765eba6cdb7cd4f9cb9b63deb71a2762056695e2 index.jsp
100644 blob 92eedb5da79a153ff2b3685ceb2e67c2b5e2718d scripts/main.js
100644 blob c7860dc9e8f829432b95a77ef65eee3bc56730e6 styles/main.css
foo@bar:~/Developer/contentAddressable|feature1$ git ls-tree -r feature1
100644 blob 72378aac61444489ca5d48611322f5d4f511ea7d index.jsp
100644 blob 6165b405f423d195d5da81fa8661fe6bcef2b195 scripts/main.js
100644 blob c7860dc9e8f829432b95a77ef65eee3bc56730e6 styles/main.css
As expected, the SHA-1 of index.jsp and main.js differs between the two branches, while the one for main.css remains the same (thus the object is reused).
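The comparison can also be done mechanically. The Python sketch below parses the two listings shown above (copied verbatim as string literals) and reports which paths point at the same object on both branches. The function name `parse_ls_tree` is a hypothetical helper, and the parser assumes the simple `mode type hash path` layout seen in this output.

```python
MASTER = """\
100644 blob 765eba6cdb7cd4f9cb9b63deb71a2762056695e2 index.jsp
100644 blob 92eedb5da79a153ff2b3685ceb2e67c2b5e2718d scripts/main.js
100644 blob c7860dc9e8f829432b95a77ef65eee3bc56730e6 styles/main.css
"""

FEATURE1 = """\
100644 blob 72378aac61444489ca5d48611322f5d4f511ea7d index.jsp
100644 blob 6165b405f423d195d5da81fa8661fe6bcef2b195 scripts/main.js
100644 blob c7860dc9e8f829432b95a77ef65eee3bc56730e6 styles/main.css
"""


def parse_ls_tree(listing: str) -> dict:
    """Map each path to its object hash from `git ls-tree -r` style output."""
    entries = {}
    for line in listing.strip().splitlines():
        _mode, _type, sha, path = line.split(None, 3)
        entries[path] = sha
    return entries


master = parse_ls_tree(MASTER)
feature1 = parse_ls_tree(FEATURE1)

# Same hash on both branches means the very same stored object is shared.
reused = sorted(p for p in master if feature1.get(p) == master[p])
changed = sorted(p for p in master if p in feature1 and feature1[p] != master[p])

print("reused:", reused)
print("changed:", changed)
```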
git ls-tree gives you the mapping between hashes and file paths for a given tree. For the files currently in the index, you can also use git ls-files --stage.
For any given file in the working tree, you can compute its SHA-1 (and, with -w, also write it to the Git object store) by running:
foo@bar:~/Developer/contentAddressable|feature1$ git hash-object -w index.jsp
72378aac61444489ca5d48611322f5d4f511ea7d
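For readers curious about what goes into that hash, the following Python sketch reproduces the blob hashing scheme. It assumes a repository using the default SHA-1 object format and content that is hashed as-is (no line-ending conversion or other filters applied). The function name `git_blob_hash` is hypothetical.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    # Git hashes "blob <size>\0" followed by the raw bytes, not the bytes alone.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
```

Running `git_blob_hash(open("index.jsp", "rb").read())` in the example repository should match the output of git hash-object under those assumptions.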