I have been working on a personal project that incorporates Git. Along the way I worked through a few issues you might face too if you're thinking about building your own Git server. The problems I had to deal with were:

  • Building a basic, working Git HTTP server
  • Intercepting commits to work with other parts of the project (e.g. storing commits in a database)
  • Security

Building a Git HTTP server that just works

As for the first bullet, there's a simple and well-known solution: invoke the git-*-pack binaries (git-receive-pack and git-upload-pack) in --stateless-rpc mode and relay their input and output over HTTP. Here's the code:

#!/usr/bin/env python3

import gzip
import os.path
import subprocess

from flask import Flask, abort, make_response, request

app = Flask(__name__)

# Only these two services belong to Git's smart HTTP protocol; never run
# an arbitrary binary named by the client.
SERVICES = ('git-upload-pack', 'git-receive-pack')


def run_service(service, project_name, *flags, data=b''):
    cmd = [service, '--stateless-rpc', *flags, os.path.join('.', project_name)]
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() writes the whole request, closes stdin and reads all of
    # stdout, avoiding the deadlock a write()-then-read() sequence can cause.
    out, _ = p.communicate(data)
    return out


def request_body():
    data = request.get_data()
    # The Git client may gzip larger request bodies (in practice, fetches).
    if request.headers.get('Content-Encoding') == 'gzip':
        data = gzip.decompress(data)
    return data


def no_cache_response(data, content_type):
    res = make_response(data)
    res.headers['Expires'] = 'Fri, 01 Jan 1980 00:00:00 GMT'
    res.headers['Pragma'] = 'no-cache'
    res.headers['Cache-Control'] = 'no-cache, max-age=0, must-revalidate'
    res.headers['Content-Type'] = content_type
    return res


@app.route('/<string:project_name>/info/refs')
def info_refs(project_name):
    service = request.args.get('service')
    if service not in SERVICES:
        abort(403)
    # pkt-line: 4 hex digits giving the total length (prefix included),
    # followed by a "0000" flush packet.
    packet = ('# service=%s\n' % service).encode()
    data = b'%04x' % (len(packet) + 4) + packet + b'0000'
    data += run_service(service, project_name, '--advertise-refs')
    return no_cache_response(data, 'application/x-%s-advertisement' % service)


@app.route('/<string:project_name>/git-receive-pack', methods=('POST',))
def git_receive_pack(project_name):
    data = run_service('git-receive-pack', project_name, data=request_body())
    return no_cache_response(data, 'application/x-git-receive-pack-result')


@app.route('/<string:project_name>/git-upload-pack', methods=('POST',))
def git_upload_pack(project_name):
    data = run_service('git-upload-pack', project_name, data=request_body())
    return no_cache_response(data, 'application/x-git-upload-pack-result')


if __name__ == '__main__':
    app.run()

I used Flask to keep everything simple, but the code doesn't depend much on the framework, so it is easy to port to others such as Django. As you can see above, only three endpoints are needed: repo/info/refs, repo/git-receive-pack and repo/git-upload-pack. The code is almost a literal translation of the Node.js code on this webpage[1].
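The only protocol-specific logic in info_refs is Git's pkt-line framing: each line is prefixed with four hex digits giving its total length (the four digits included), and the special packet 0000 (a "flush-pkt") ends a section. Here is a minimal standalone sketch of encoding and decoding pkt-lines; the names pkt_line, FLUSH and read_pkt_lines are my own helpers, not part of Git or any library.

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then the data."""
    return b'%04x' % (len(payload) + 4) + payload


FLUSH = b'0000'  # flush-pkt: ends a section of pkt-lines


def read_pkt_lines(buf: bytes, pos: int = 0):
    """Decode pkt-lines from buf starting at pos, up to the first flush-pkt.

    Returns (list of payloads, offset just past the flush-pkt).
    """
    payloads = []
    while True:
        length = int(buf[pos:pos + 4], 16)
        if length == 0:  # flush-pkt
            return payloads, pos + 4
        if length < 4:
            raise ValueError('unsupported pkt-line length %d' % length)
        payloads.append(buf[pos + 4:pos + length])
        pos += length


header = pkt_line(b'# service=git-upload-pack\n') + FLUSH
print(header)  # b'001e# service=git-upload-pack\n0000'
print(read_pkt_lines(header))
```

The 0x1e (30) prefix is the 26-byte payload plus the 4 length digits themselves, which is exactly what the `len(packet) + 4` in info_refs computes.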

Now, let’s try to run it. Type the following lines in your terminal:

$ mkdir test
$ git init --bare test
$ python git-server.py

Try cloning http://localhost:5000/test and see what you get. Yes, it's that simple.

Intercepting Git commits to store them in the database

I did this to store commits in a database, but the same approach can be used for other applications (an IRC bot, CI, etc.). First, to understand how it works, let's look at the data exchanged between the Git client and the git-*-pack binaries:

POST /test/git-receive-pack HTTP/1.1
User-Agent: git/1.9.3 (Apple Git-50)
Host: localhost:5000
Accept-Encoding: gzip
Content-Type: application/x-git-receive-pack-request
Accept: application/x-git-receive-pack-result
Content-Length: 618

x0a30000000000000000000000000000000000000000 87b980a711c0e9fa74535dffab7b6453ac723d21 refs/heads/master report-status side-band-64k agent=git/1.9.3.(Apple.Git-50)0000PACKxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx....

This is the HTTP request sent when you push to the remote repository. The part we care about is everything from the 'PACK' signature onward: this PACK data[2] contains the Git objects you're pushing to the repository.
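Rather than searching for the first occurrence of 'PACK', the body can be walked pkt-line by pkt-line. Per Git's pack protocol, each command is "old-id new-id refname", the first one followed by a NUL byte and the client's capabilities, and a flush-pkt separates the commands from the pack. Below is a minimal sketch under simplifying assumptions (uncompressed body, no shallow or push-cert lines); parse_receive_pack_request is a hypothetical helper, and the example body is rebuilt from the dump above with the pack bytes elided.

```python
def parse_receive_pack_request(body: bytes):
    """Split a receive-pack request body into (commands, capabilities, pack).

    Simplified: assumes an uncompressed body with no shallow or push-cert lines.
    """
    commands, capabilities = [], []
    pos = 0
    while True:
        length = int(body[pos:pos + 4], 16)
        if length == 0:  # flush-pkt: the command list is over, the pack follows
            pos += 4
            break
        line = body[pos + 4:pos + length]
        pos += length
        if b'\0' in line:  # only the first command carries capabilities
            line, caps = line.split(b'\0', 1)
            capabilities = caps.decode().split()
        old, new, ref = line.rstrip(b'\n').decode().split(' ', 2)
        commands.append((old, new, ref))
    return commands, capabilities, body[pos:]


# Rebuild the request from the dump above (the NUL byte is invisible there).
old_id = '0' * 40  # all zeros: the ref is being created
new_id = '87b980a711c0e9fa74535dffab7b6453ac723d21'
first = ('%s %s refs/heads/master\0 report-status side-band-64k '
         'agent=git/1.9.3.(Apple.Git-50)' % (old_id, new_id)).encode()
body = b'%04x' % (len(first) + 4) + first + b'0000' + b'PACK...'

commands, capabilities, pack = parse_receive_pack_request(body)
print(body[:4])      # b'00a3'
print(commands)
print(capabilities)  # ['report-status', 'side-band-64k', 'agent=git/1.9.3.(Apple.Git-50)']
print(pack[:4])      # b'PACK'
```

A side benefit: the (old, new, ref) tuples tell you which branch each pushed commit belongs to, which the pack alone does not.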

In the Node.js world, it wasn’t too hard to find a simple solution to decode a PACK file.

var list = require('git-list-pack');
var commit = require('git-object-commit');
var fs = require('fs');

fs.createReadStream('./pack-file.pack').pipe(list())
  .on('data', function (obj) {
    console.log(commit.read(obj.data));
  });

In the Python world, however, it was hard to find equivalents of git-list-pack, git-object-commit, etc. So I dug into Dulwich[3] and used its internal, undocumented functions[4] to decode a PACK file. The function I ended up using is dulwich.pack.PackStreamReader.read_objects. Here's the sample code I wrote:

import dulwich.pack

# read_objects() is a generator, so consume it while the file is still open.
with open('./pack-file.pack', 'rb') as f:
    for obj in dulwich.pack.PackStreamReader(f.read).read_objects():
        print(obj)

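Per the pack format documentation[2], a pack starts with a 12-byte header: the signature PACK, then a 4-byte version number and a 4-byte object count, both big-endian. That is why the first object in the output below starts at offset 12. Checking the header before handing the bytes to Dulwich is a cheap sanity check and tells you how many objects to expect. A minimal standard-library sketch; parse_pack_header is my own hypothetical helper, not a Dulwich function.

```python
import struct


def parse_pack_header(data: bytes):
    """Return (version, object_count) from the first 12 bytes of a pack."""
    if len(data) < 12 or data[:4] != b'PACK':
        raise ValueError('not a pack file')
    version, count = struct.unpack('>LL', data[4:12])
    if version not in (2, 3):
        raise ValueError('unsupported pack version %d' % version)
    return version, count


# A version-2 pack header announcing three objects (object data omitted):
header = b'PACK' + struct.pack('>LL', 2, 3)
print(parse_pack_header(header))  # (2, 3)
```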
Let’s integrate this code with the original Flask project.

import io

from dulwich.pack import PackStreamReader

@app.route('/<string:project_name>/git-receive-pack', methods=('POST',))
def git_receive_pack(project_name):
    data_in = request.get_data()
    # Everything from the 'PACK' signature onward is the pack file.
    # A push that only deletes refs carries no pack, so check first.
    start = data_in.find(b'PACK')
    if start != -1:
        objects = PackStreamReader(io.BytesIO(data_in[start:]).read)
        for obj in objects.read_objects():
            if obj.obj_type_num == 1:  # Commit
                print(obj)
    p = subprocess.Popen(['git-receive-pack', '--stateless-rpc', os.path.join('.', project_name)], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() sends the request, closes stdin and collects all output.
    data_out, _ = p.communicate(data_in)
    res = make_response(data_out)
    res.headers['Expires'] = 'Fri, 01 Jan 1980 00:00:00 GMT'
    res.headers['Pragma'] = 'no-cache'
    res.headers['Cache-Control'] = 'no-cache, max-age=0, must-revalidate'
    res.headers['Content-Type'] = 'application/x-git-receive-pack-result'
    return res
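The obj_type_num == 1 check relies on the numeric object type stored in each pack entry. Per the pack format documentation, types 1 to 4 are commit, tree, blob and tag, while 6 and 7 are deltas (OFS_DELTA and REF_DELTA) whose real type is only known once they are resolved against a base object, so a commit the client chose to send as a delta could slip past that check. The sketch below tallies what a push contains; FakeObject is a hypothetical stand-in for Dulwich's UnpackedObject so the example runs without Dulwich.

```python
from collections import Counter, namedtuple

# Pack entry type numbers from Git's pack format (5 is reserved).
PACK_OBJECT_TYPES = {
    1: 'commit',
    2: 'tree',
    3: 'blob',
    4: 'tag',
    6: 'ofs_delta',  # delta against an earlier entry in the same pack
    7: 'ref_delta',  # delta against an object named by its object ID
}


def summarize(objects) -> Counter:
    """Count pack entries by type name; works on anything with obj_type_num."""
    return Counter(PACK_OBJECT_TYPES.get(o.obj_type_num, 'unknown') for o in objects)


# Hypothetical stand-in for Dulwich's UnpackedObject.
FakeObject = namedtuple('FakeObject', 'obj_type_num')

entries = [FakeObject(1), FakeObject(2), FakeObject(3), FakeObject(7)]
counts = summarize(entries)
print(counts['commit'], counts['ref_delta'])  # 1 1
if counts['ofs_delta'] or counts['ref_delta']:
    print('some entries are deltas; their real type needs resolving first')
```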

Run the server and push a commit. Here's what you get:

UnpackedObject(offset=12L, _sha=None, obj_type_num=1, obj_chunks=['tree dc9bcf301e0cb555c62efb0685ff47520085800d\nparent 57fb4bf49aa08c95ace708d30fd2484d8ee92077\nauthor Stewart Park <stewartpark92> 1420243630 -0800\ncommitter Stewart Park <stewartpark92> 1420243630 -0800\n\nThis is a test commit.\n'], pack_type_num=1, delta_base=None, comp_chunks=None, decomp_chunks=['tree dc9bcf301e0cb555c62efb0685ff47520085800d\nparent 57fb4bf49aa08c95ace708d30fd2484d8ee92077\nauthor Stewart Park <[email protected]> 1420243630 -0800\ncommitter Stewart Park <[email protected]> 1420243630 -0800\n\nThis is a test commit.\n'], decomp_len=247, crc32=None)
127.0.0.1 - - [02/Jan/2015 16:07:57] "POST /test/git-receive-pack HTTP/1.1" 200 -

With the objects decoded, you can do whatever you like with them. Hooray!

Security

I'd like to mention security briefly as well, since basic protection is easy to add with HTTP authentication (Basic or Digest).

All we have to do is add a decorator to every endpoint. Here's an example using the Flask-HTTPAuth extension:

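# flask.ext.* imports were removed in Flask 1.0; import the package directly.
from flask_httpauth import HTTPBasicAuth

auth = HTTPBasicAuth()

# Demo credentials, stored in plaintext.
users = {
    "john": "hello",
    "susan": "bye"
}

@auth.get_password
def get_pw(username):
    # Returning None for unknown users makes authentication fail.
    return users.get(username)

@app.route('/<string:project_name>/info/refs')
@auth.login_required
def info_refs(project_name):
    ...  # same body as before

# Add @auth.login_required to git_receive_pack and git_upload_pack as well.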

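If you're building a commercial product, note that you must use HTTPS: Basic authentication sends the username and password merely base64-encoded, so anyone sniffing the traffic can read them, and even Digest authentication leaves the repository data itself unencrypted.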
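To see how little Basic authentication hides, the sketch below decodes an Authorization header with nothing but the standard library. It also shows one way to avoid keeping plaintext passwords like the users dict above: store a salted PBKDF2 hash and compare in constant time. This is an illustration under my own assumptions (the helper names, the 16-byte salt and the iteration count are arbitrary choices), not how Flask-HTTPAuth works internally.

```python
import base64
import hashlib
import hmac
import os


def decode_basic_auth(header: str):
    """Recover (username, password) from an 'Authorization: Basic ...' value."""
    scheme, _, encoded = header.partition(' ')
    if scheme.lower() != 'basic':
        raise ValueError('not Basic auth')
    username, _, password = base64.b64decode(encoded).decode('utf-8').partition(':')
    return username, password


def hash_password(password: str, salt: bytes = None):
    """Salted PBKDF2-SHA256; store (salt, digest) instead of the password."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100_000)
    return salt, digest


def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    # compare_digest avoids leaking, via timing, how much of the digest matched.
    return hmac.compare_digest(hash_password(password, salt)[1], digest)


# Anyone who sees this header on the wire can decode it:
header = 'Basic ' + base64.b64encode(b'john:hello').decode()
print(decode_basic_auth(header))  # ('john', 'hello')

salt, digest = hash_password('hello')
print(check_password('hello', salt, digest))  # True
```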

Here’s the full source code of the sample Git HTTP server I used in this post: https://gist.github.com/stewartpark/1b079dc0481c6213def9

  1. For a general explanation of how a Git HTTP server works, this page is worth a look. Link

  2. Git pack file format

  3. Pure Python implementation of the Git file formats and protocols

  4. I assumed the functions I used were undocumented because they don't appear in the latest official documentation, but eventually I found this precious, hard-to-google documentation.

Stewart Park

A problem solver with the mind of an engineer