I have been working on a personal project that incorporates Git, and along the way I worked through a few issues that you might also face if you’re thinking about building your own Git server. The problems I had to deal with were:
Building a basic, working Git HTTP server
Intercepting commits to work with other parts of the project (e.g. storing commits in the database)
Building a Git HTTP server that just works
As for the first bullet, there’s a pretty simple and well-known solution: invoking the git-*-pack binaries, namely git-receive-pack and git-upload-pack. Here’s the code:
Although I wrote it with Flask to keep everything simple, you can easily adapt this code to other frameworks such as Django, since it is not very framework-dependent. As you can see above, all we had to implement was three endpoints: repo/info/refs, repo/git-receive-pack, and repo/git-upload-pack. The code above is almost a literal translation of the Node.js code on this webpage [1].
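Stripped of the Flask routing, the core of those three endpoints can be sketched with the standard library alone. This is a minimal sketch, not the original code: the function names (`pkt_line`, `info_refs`, `service_rpc`) are made up for illustration, and it assumes a `git` executable on the PATH that accepts the `--stateless-rpc` and `--advertise-refs` options.

```python
import subprocess

SERVICES = ("git-upload-pack", "git-receive-pack")
FLUSH_PKT = b"0000"  # pkt-line "flush" marker


def pkt_line(payload):
    """Frame payload as a pkt-line: 4 hex digits of total length, then the payload."""
    return b"%04x" % (len(payload) + 4) + payload


def _check(service):
    # Never pass an arbitrary, client-supplied service name to the shell.
    if service not in SERVICES:
        raise ValueError("unsupported service: %r" % (service,))


def info_refs(repo_path, service):
    """Body and Content-Type for GET <repo>/info/refs?service=<service>."""
    _check(service)
    refs = subprocess.run(
        ["git", service[len("git-"):], "--stateless-rpc", "--advertise-refs", repo_path],
        stdout=subprocess.PIPE, check=True,
    ).stdout
    body = pkt_line(b"# service=" + service.encode() + b"\n") + FLUSH_PKT + refs
    return body, "application/x-%s-advertisement" % service


def service_rpc(repo_path, service, request_body):
    """Body and Content-Type for POST <repo>/git-upload-pack or <repo>/git-receive-pack."""
    _check(service)
    result = subprocess.run(
        ["git", service[len("git-"):], "--stateless-rpc", repo_path],
        input=request_body, stdout=subprocess.PIPE, check=True,
    ).stdout
    return result, "application/x-%s-result" % service
```

In a web framework, each route would just call one of these functions with the request's query parameter or raw body and return the bytes with the given Content-Type.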
Now, let’s try to run it. Type the following lines in your terminal:
Try to clone http://localhost:5000/test and see what you get.
Yes, simple as that.
Intercepting Git commits to store into the database
Although I did this to store commits into the database, this approach can be used in various applications (an IRC bot, CI, etc.). First of all, to understand how it’s going to be done, let’s look at the data the Git client and the git-*-pack binaries send each other.
This is the HTTP request sent when you push to the remote repository. What we have to look at here is the data after the ‘PACK’ signature. The PACK [2] data contains the Git objects you’re trying to add to the repository.
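To make the layout of that request body concrete, here is a rough sketch that splits it into the ref update commands and the raw PACK bytes, then reads the 12-byte PACK header. The function names are hypothetical, and the sketch assumes an uncompressed body and a plain push (no push certificate or shallow lines).

```python
import struct


def split_receive_pack_body(body):
    """Split a git-receive-pack request body into (commands, pack_bytes)."""
    commands = []
    pos = 0
    while True:
        length = int(body[pos:pos + 4], 16)  # pkt-line length prefix (hex)
        if length == 0:                      # "0000" flush-pkt ends the command list
            pos += 4
            break
        line = body[pos + 4:pos + length]
        pos += length
        # The first command carries the capability list after a NUL byte.
        line = line.split(b"\0", 1)[0].rstrip(b"\n")
        old_sha, new_sha, ref = line.decode().split(" ", 2)
        commands.append((old_sha, new_sha, ref))
    # Everything after the flush-pkt is the PACK stream
    # (it is empty when the push only deletes refs).
    return commands, body[pos:]


def pack_header(pack):
    """Return (version, object_count) from the start of a PACK stream."""
    signature, version, count = struct.unpack(">4sII", pack[:12])
    if signature != b"PACK":
        raise ValueError("not a PACK stream")
    return version, count
```

Each command is a triple of old object ID, new object ID, and ref name, which is already enough to tell which branch a push is updating.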
In the Node.js world, it wasn’t too hard to find a simple solution to decode a PACK file.
But in the Python world, it was pretty hard to find equivalents of packages like git-list-pack, git-object-commit, etc. So I decided to dig into Dulwich [3] and use its internal, undocumented functions [4] to decode a PACK file. After some digging, I figured out which function I should use: dulwich.pack.PackStream.read_objects. So here’s the sample code I managed to write:
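To give a feel for what such a decoder does under the hood, here is a standard-library sketch. It is not the Dulwich-based code and not Dulwich's implementation: `iter_pack_objects` is a hypothetical name, and the sketch only handles whole objects (commit, tree, blob, tag). Real pushes frequently contain delta objects, which is one reason to lean on a library.

```python
import struct
import zlib

TYPE_NAMES = {1: "commit", 2: "tree", 3: "blob", 4: "tag"}


def iter_pack_objects(pack):
    """Yield (type_name, data) for each non-delta object in a PACK stream."""
    signature, version, count = struct.unpack(">4sII", pack[:12])
    if signature != b"PACK":
        raise ValueError("not a PACK stream")
    pos = 12
    for _ in range(count):
        # Entry header: bits 6-4 of the first byte hold the type, bits 3-0 the
        # low bits of the size; the MSB says whether more size bytes follow.
        byte = pack[pos]
        pos += 1
        obj_type = (byte >> 4) & 0x7
        size = byte & 0x0F
        shift = 4
        while byte & 0x80:
            byte = pack[pos]
            pos += 1
            size |= (byte & 0x7F) << shift
            shift += 7
        if obj_type not in TYPE_NAMES:
            raise NotImplementedError("delta objects are not handled in this sketch")
        # The object body is one zlib stream; unused_data is whatever follows it.
        inflater = zlib.decompressobj()
        data = inflater.decompress(pack[pos:])
        pos = len(pack) - len(inflater.unused_data)
        if len(data) != size:
            raise ValueError("object size mismatch")
        yield TYPE_NAMES[obj_type], data
```

The 20-byte checksum trailer at the end of the stream is simply left unread here; a stricter decoder would verify it.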
Let’s integrate this code with the original Flask project.
Run the code and try to push any commits. Here’s what you get:
With the objects decoded, you can do whatever you like with them. Hooray!
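As one illustration of the database use case, the sketch below parses the raw bytes of a commit object and stores it in SQLite. The `commits` table, its columns, and both function names are assumptions made up for this example, and the commit ID is computed with SHA-1, which assumes a repository using the default SHA-1 object format.

```python
import hashlib
import sqlite3


def parse_commit(data):
    """Parse the raw (decompressed) bytes of a Git commit object into a dict."""
    header, _, message = data.partition(b"\n\n")
    commit = {
        # A Git object ID is the hash of "<type> <size>\0" followed by the content.
        "sha": hashlib.sha1(b"commit %d\0" % len(data) + data).hexdigest(),
        "tree": None,
        "parents": [],
        "author": None,
        "message": message.decode("utf-8", "replace"),
    }
    for line in header.split(b"\n"):
        if line.startswith(b" "):  # continuation line (e.g. inside gpgsig)
            continue
        key, _, value = line.partition(b" ")
        value = value.decode("utf-8", "replace")
        if key == b"parent":
            commit["parents"].append(value)
        elif key in (b"tree", b"author"):
            commit[key.decode()] = value
    return commit


def store_commit(conn, commit):
    """Insert a parsed commit into a hypothetical `commits` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commits "
        "(sha TEXT PRIMARY KEY, tree TEXT, parents TEXT, author TEXT, message TEXT)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO commits VALUES (?, ?, ?, ?, ?)",
        (commit["sha"], commit["tree"], " ".join(commit["parents"]),
         commit["author"], commit["message"]),
    )
    conn.commit()
```

Using the commit ID as the primary key with `INSERT OR IGNORE` means pushing the same commit twice does not create duplicate rows.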
I would like to briefly mention authentication as well, because it can be easily done with HTTP authentication (Basic or Digest).
All we have to do is add a decorator on every endpoint.
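The idea of such a decorator can be sketched without any framework. This is only an illustration under assumptions: the `USERS` dictionary, the handler signature (a headers dict in, a `(status, headers, body)` tuple out), and the realm name are all made up, and a real Flask version would read the header from the request object instead.

```python
import base64
import binascii
import functools
import hmac

USERS = {"alice": "s3cret"}  # hypothetical credentials; use a real user store


def parse_basic_auth(header):
    """Return (username, password) from an Authorization header, or None."""
    scheme, _, token = (header or "").partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    username, sep, password = decoded.partition(":")
    return (username, password) if sep else None


def requires_auth(handler):
    """Reject the request with 401 unless it carries valid Basic credentials."""
    @functools.wraps(handler)
    def wrapper(headers, *args, **kwargs):
        creds = parse_basic_auth(headers.get("Authorization"))
        expected = USERS.get(creds[0]) if creds else None
        # compare_digest avoids leaking the password through timing differences.
        if expected is None or not hmac.compare_digest(
                creds[1].encode(), expected.encode()):
            return 401, {"WWW-Authenticate": 'Basic realm="git"'}, b"Unauthorized"
        return handler(headers, *args, **kwargs)
    return wrapper
```

The 401 response with a `WWW-Authenticate` header is what makes the Git client prompt for a username and password and retry.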
If you’re building a commercial product, please note that you must use HTTPS: over plain HTTP, this kind of authentication is very vulnerable to packet sniffing, since Basic authentication sends the credentials merely base64-encoded.
Here’s the full source code of the sample Git HTTP server I used in this post: https://gist.github.com/stewartpark/1b079dc0481c6213def9
[1] For those who are looking for a general explanation of a Git HTTP server, it is worth a look. Link