
It’s been a big two weeks in Xgit land!

When I last posted, I had just made the first public release of the from-the-ground-up version of Xgit.

Since then, there have been two significant releases and two others that were twiddling bits trying to get the documentation to a happy place.

In these two releases, I doubled down on a couple of development patterns that I like:

  • Make modules (and files) as small as possible. A module should have one clear purpose and do that one thing well. You’ll find that Xgit already has many small modules and correspondingly many test scripts. Further, when a git command has multiple forms that have essentially incompatible argument sets, I break those into independent modules. You’ll see an example of that later in this article.

  • Make releases as small and as frequent as possible. Each of the two releases I called “significant” introduces one new developer-facing feature to Xgit. (A “developer-facing feature” is a plumbing or porcelain command.) I take that feature and drill down recursively to implement whatever is necessary to make that top-level feature possible. (You’ll also find that I apply the same logic to commits to the master branch.)

So What’s New?

git cat-file

I’ll start with the first of the releases since it’s the smaller one.

I had previously implemented git hash-object, which writes a single object (blob, tag, tree, or commit) into the loose object store.

In the 0.1.1 release, I implemented the inverse of that: git cat-file, which finds an object and returns its type, size, and content. For now, since Xgit doesn’t understand pack files, it can only read from the loose object store.

This command is implemented as Xgit.Plumbing.CatFile.
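To make the hash-object / cat-file relationship concrete, here is a minimal sketch of the loose object encoding that git uses: the zlib-compressed bytes of "<type> <size>\0<content>". The module and function names below are hypothetical and are not Xgit's actual API; the sketch also works on in-memory binaries rather than files in the object store.

```elixir
defmodule LooseObjectSketch do
  # Hypothetical module for illustration; not part of Xgit.

  @types %{"blob" => :blob, "tree" => :tree, "commit" => :commit, "tag" => :tag}

  # Encode content the way git stores a loose object:
  # zlib-compressed "<type> <size>\0<content>".
  def encode(type, content) when is_atom(type) and is_binary(content) do
    size = Integer.to_string(byte_size(content))
    :zlib.compress([Atom.to_string(type), " ", size, 0, content])
  end

  # Decode a compressed loose object into its type, size, and content.
  # Assumes the input is a valid zlib stream (:zlib.uncompress raises otherwise).
  def decode(compressed) when is_binary(compressed) do
    raw = :zlib.uncompress(compressed)

    with [header, content] <- :binary.split(raw, <<0>>),
         [type_str, size_str] <- String.split(header, " "),
         {:ok, type} <- Map.fetch(@types, type_str),
         {size, ""} <- Integer.parse(size_str),
         true <- size == byte_size(content) do
      {:ok, %{type: type, size: size, content: content}}
    else
      _ -> {:error, :invalid_object}
    end
  end
end
```

Round-tripping a blob through `encode/2` and `decode/1` gives back the same three pieces of information that cat-file reports.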

In this release, I also changed the pattern of error responses from {:error, "reason"} to {:error, :reason} and added @spec documentation to call out all possible error responses.
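As a rough illustration of that pattern (the function and the reason atoms here are made up for the example, not taken from Xgit), an atom-based error response with a matching @spec might look like this:

```elixir
defmodule ErrorAtomSketch do
  # Hypothetical example; the function name and error reasons are illustrative only.

  @type parse_reason :: :empty | :not_a_number

  # The @spec lists every error reason a caller may need to handle.
  @spec parse_size(String.t()) :: {:ok, non_neg_integer} | {:error, parse_reason}
  def parse_size(""), do: {:error, :empty}

  def parse_size(text) when is_binary(text) do
    case Integer.parse(text) do
      {n, ""} when n >= 0 -> {:ok, n}
      _ -> {:error, :not_a_number}
    end
  end
end
```

Because the reasons are atoms, callers can pattern-match on them directly (for example, `{:error, :empty} -> ...`) instead of comparing strings.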

git ls-files --stage

As I mentioned in the previous post, I’m reading the git objects chapter of the git internals book, implementing each portion in Elixir as I read it. I’m now working through the portion titled Tree Objects.

After hash-object and cat-file, the next command that is discussed is git update-index --add. I started to implement that, but realized that it involved writing code for both reading and writing the .git/index file (and creating abstractions for both actions at several layers).

In the “keep it simple” spirit, I decided to build only the reading side of it. That command is git ls-files --stage.

Given that ls-files has many distinct flavors and response patterns, depending on how it is invoked, I decided to split this apart more finely than git itself does.

I implemented only the version that lists the contents of the index file, and it is known in Xgit as Xgit.Plumbing.LsFiles.Stage.

This involved creating new abstractions at several levels. I’ll walk through them from top to bottom:

  • Xgit.Plumbing.LsFiles.Stage, which implements the developer-facing API.

  • Xgit.Repository.WorkingTree, which implements the on-disk manifestation of a working tree. A larger concept of working tree (index file and checked-out file content) will evolve here; only the .git/index file support is implemented in this version.

  • Xgit.Repository.WorkingTree.ParseIndexFile, which specifically reads the version of the .git/index file that I have encountered most frequently (version 2).

  • Xgit.Core.DirCache, which provides an abstract concept of a directory cache, independent from any specific file format.

This distinction points up an important piece of the design work inherent in a project like this: drawing abstractions at the right places. It was initially tempting to create a DirCache module that did all the things. Doing so would have been at cross-purposes to a core goal of Xgit: Storage of a git repository need not be file-system based. Doing the extra work of splitting out the file format parsing from the concept of a directory cache better supported that goal.
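Here is a small sketch of what that split can look like. It is not Xgit's code: both module names are hypothetical, and the parser only reads the 12-byte index header (the "DIRC" signature, a 32-bit version, and a 32-bit entry count, all big-endian) rather than the entries themselves.

```elixir
defmodule DirCacheSketch do
  # Hypothetical stand-in for an abstract directory cache:
  # plain data with no knowledge of any on-disk format.
  defstruct version: 2, entry_count: 0
end

defmodule IndexHeaderSketch do
  # Hypothetical parser that knows only the .git/index header layout.
  # Integers in binary patterns default to unsigned big-endian.
  def parse_header(<<"DIRC", version::32, entry_count::32, _rest::binary>>) do
    if version == 2 do
      {:ok, %DirCacheSketch{version: version, entry_count: entry_count}}
    else
      {:error, :unsupported_version}
    end
  end

  def parse_header(_other), do: {:error, :invalid_format}
end
```

A different storage backend could build the same struct without ever touching this parser, which is the point of keeping the two apart.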

All of this is now available on Hex and HexDocs as version 0.1.4.

Help Wanted

There are a handful of open issues in GitHub. While I’m always grateful for any attention to any of those issues, many of them are just “to do” list items that I’ve punted on in the interest of getting into deeper and more interesting topics.

There are a few issues that are particularly challenging. Some feedback from people with deeper git or Elixir knowledge would be helpful.

  • Elixir/OTP design advice wanted. Issue #88: Xgit.Repository.WorkingTree.dir_cache/1 output is potentially large. This function returns a full copy of the Xgit.Core.DirCache struct, complete with all of its entries. That’s fine for a small repo, but risky for a large one, where the number of entries is effectively unbounded. I’d love advice on how to pass back a snapshot of the DirCache struct without overwhelming the VM message-passing mechanism.

  • Git internals knowledge wanted. Issue #67: There is content at the end of a .git/index file that I don’t understand. I’ve read the git index format specification closely. Every index file I’ve been able to construct comes up as a version 2 file and the index entries match the specification as written. But what follows the entries (where I would expect to find the extensions portion) makes no sense. Where I would expect to see the first extension signature, there is instead a 32-byte block of data that I can’t comprehend. If someone knows what is actually being written there, I’d love to hear about it. (Update: I’ve since figured this out.)

  • Erlang zlib library knowledge wanted. Issue #50: No code coverage for zlib :continue case. The documentation for function :zlib.safeInflate/2 says that the function may return {:continue, data}, but I have not been able to construct any test data for which that occurs. If someone knows how to trigger that … (Update: Got the help I needed. Problem solved. Thank you!)
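For readers unfamiliar with the :continue case in the zlib issue, the following is a minimal sketch of a loop that handles both return shapes of :zlib.safeInflate/2. It follows the usage pattern in the Erlang zlib documentation (call again with empty input after a {:continue, output} result); the module and function names are hypothetical, and this is not the code from Xgit. It makes no claim about which inputs actually produce {:continue, output}.

```elixir
defmodule SafeInflateSketch do
  # Hypothetical helper for illustration; not Xgit's implementation.

  # Inflate a complete zlib stream, handling both :continue and :finished.
  def inflate_all(compressed) when is_binary(compressed) do
    z = :zlib.open()

    try do
      :ok = :zlib.inflateInit(z)
      chunks = collect(z, :zlib.safeInflate(z, compressed), [])
      :ok = :zlib.inflateEnd(z)
      IO.iodata_to_binary(chunks)
    after
      # Always release the zlib stream handle.
      :zlib.close(z)
    end
  end

  # More output is pending: keep this chunk and ask again with no new input.
  defp collect(z, {:continue, output}, acc),
    do: collect(z, :zlib.safeInflate(z, []), [acc, output])

  # All queued input has been decompressed.
  defp collect(_z, {:finished, output}, acc), do: [acc, output]
end
```

Whether a given call returns :continue depends on an implementation-defined threshold, so a test built on this loop should assert only on the final decompressed bytes.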

If you have knowledge that can help – especially on these three issues – please respond in comments here, on Elixir Forum, or in the GitHub issues themselves. Thank you!

What’s Next

Having taught Xgit how to read index files, I now want to teach it how to write them. So, as I have time coming up, I’ll be working on an implementation of git update-index.

Thank you for taking the time to follow along!