
Svn:externals, data segregation for cheap

Mon, 19 Feb 2007

To keep things simple it's common to place different projects in the same repository, which also comes in handy when several projects share a common one. But this is not always convenient, and sometimes not even possible. To mitigate the added management complexity of multiple repositories, SVN offers a special property called svn:externals. With it we can declare that when directory A of repoX is checked out, directory B from repoY is pulled in as well.
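As a rough sketch of the idea, assuming two made-up repository URLs (svn.example.org is a placeholder, not a real server) and a working copy of repoX in the current directory, the property can be set like this. The commands are wrapped in functions so nothing runs until you call them.

```shell
# Hypothetical repository URLs; replace them with your own.
REPO_X="http://svn.example.org/repoX"
REPO_Y="http://svn.example.org/repoY"

# Run from the root of a repoX working copy.
# Declares that A/B comes from repoY, then commits the property change.
attach_external() {
    svn propset svn:externals "B ${REPO_Y}/B" A &&
    svn commit -m "Pull B from repoY into A" A
}

# From now on a checkout (or update) of A also fetches B into A/B.
checkout_with_external() {
    svn checkout "${REPO_X}/A" A
}
```

The destination (B) is relative to the directory that carries the property, so the external ends up in A/B.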
This site itself can serve as an example: I host an svn repository for all my stuff, part of which is private. I make the whole website available through an svn view, which lets you browse the website as a bunch of files. This makes it easy to download entire sections or to see past versions of documents, including the site itself.
With an svn web frontend pointed at the repository, all the content, including the private files, would be visible. And if one argues that some frontends can filter out directories and files, remember that a bug discovered in the frontend software might allow those restrictions to be bypassed. Bottom line: data segregation helps, a lot!
In a simple scenario where private and public data are already separated, e.g. html/ and docs/, the former public and the latter private, it'd be easy to set up two repos and check them out separately within a "spikelab" directory. Unfortunately that's not the case here: private and public files are mixed up, and separating them into different directories would disrupt the order and make management harder. But as mentioned earlier, it's possible to tell svn that when a specific directory is checked out, content from other repositories should be fetched as well. This way it becomes easy to manage a mixed repo while keeping the private files out of the frontend's reach.
Below I explain how I migrated the spikelab.org repository to something whose working copy looks exactly the same but is actually made of two repositories.
This was the original layout:

`-- trunk
    |-- blog
    |-- cgi-bin
    |-- docs
    |-- flavours
    |   `-- html.flav
    |-- img
    `-- plugins
At the time of writing I don't feel like publishing flavours, plugins and my cgi-bin (if you are thinking "tell us the truth, you are simply ashamed of the poor code you wrote", I'm afraid you are right :) ). So I need to split it up and have two repos:

A public one:
`-- trunk
    |-- blog
    |-- docs
    `-- img
And a private one:
`-- trunk
    |-- cgi-bin
    |-- flavours
    |   `-- html.flav
    `-- plugins
I am not going to cover the whole migration process because it's already very well documented here.
Pay attention to the bit saying "Simply give it either a list of paths you wish to keep, or a list of paths you wish to not keep": that means absolute paths starting with the repo name, in my case 'spikelab/trunk/cgi-bin' and so on.
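The quoted sentence reads like the svndumpfilter documentation, so here is a hedged sketch of the split under that assumption. The public paths are extrapolated from the 'spikelab/trunk/cgi-bin' pattern above, and the dump file names and function names are made up for illustration. It uses the "paths you wish to not keep" form (exclude) for both halves, since that leaves the parent directories in the dump.

```shell
# Paths as they appear inside the original repository.
# The public ones are an assumption based on the pattern given above.
PRIVATE_PATHS="spikelab/trunk/cgi-bin spikelab/trunk/flavours spikelab/trunk/plugins"
PUBLIC_PATHS="spikelab/trunk/blog spikelab/trunk/docs spikelab/trunk/img"

# $1 = filesystem path of the original repository.
# Produces public.dump and private.dump in the current directory.
split_dump() {
    svnadmin dump "$1" > full.dump || return 1
    # The path lists are left unquoted on purpose so each path
    # becomes a separate argument.
    svndumpfilter exclude $PRIVATE_PATHS < full.dump > public.dump || return 1
    svndumpfilter exclude $PUBLIC_PATHS < full.dump > private.dump
}

# $1 = path of the new repository, $2 = dump file to load into it.
load_dump() {
    svnadmin create "$1" && svnadmin load "$1" < "$2"
}
```

Usage would be along the lines of `split_dump /path/to/old/repo`, followed by one `load_dump` call per new repository. If the history contains copies between public and private paths the filter may refuse to run, in which case the documentation mentioned above is the place to look.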
Once that's done, check out the private repo and edit the svn:externals property of the trunk directory (obviously you want to bring the public bits into the private repo rather than the other way around):
[spike@bebop ~/wrk/svn/private/]$ svn propedit svn:externals trunk
That will bring up your default editor and let you enter dst src pairs, one per line and separated by whitespace, with dst being the directory you want to fetch src into. Don't forget to commit afterwards, or you'll lose your properties. In my scenario, if I use propget to list my settings, I get:
svn propget svn:externals trunk
img     svn+ssh://bebop/var/lib/svn/spikelab.org/slpub/trunk/img/
docs    svn+ssh://bebop/var/lib/svn/spikelab.org/slpub/trunk/docs/
blog    svn+ssh://bebop/var/lib/svn/spikelab.org/slpub/trunk/blog/
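If you'd rather not go through the editor, the same property can be set from a file with propset's -F option. The following is only a sketch: the base URL is copied from the output above, while externals.txt and the function names are made up. Run it from the root of the private working copy.

```shell
# Base URL of the public trunk, taken from the propget output above.
PUB_URL="svn+ssh://bebop/var/lib/svn/spikelab.org/slpub/trunk"

# Print one "dst src" line for each directory name given as argument.
externals_for() {
    for dir in "$@"; do
        printf '%s %s/%s/\n' "$dir" "$PUB_URL" "$dir"
    done
}

# Write the definitions to a scratch file, set the property from it,
# commit the change, then update so the externals are actually fetched.
set_externals() {
    externals_for img docs blog > externals.txt &&
    svn propset svn:externals -F externals.txt trunk &&
    svn commit -m "Pull public directories in as externals" trunk &&
    svn update trunk
}
```

Adding another public directory later is then a matter of adding its name to the `externals_for` call and running `set_externals` again.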
Trailing slashes work rsync-style: with a slash the content gets copied, without it the directory is created as well. You can't use '.', '..' or '/' as a destination, and file names are not allowed as src: they must be directories.
That's it! It's not perfect, though: if I want to branch I have to create branches in both repos and then sync the public one into the private one. But it's not much of a hassle considering all the advantages, and I don't branch much in this project anyway.
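For the record, one possible reading of that branching routine is sketched below. It assumes both repositories already have a branches/ directory next to trunk, that the private repository is called slpriv (a made-up name, only slpub appears above), and that branch names contain nothing sed would treat specially.

```shell
PUB_ROOT="svn+ssh://bebop/var/lib/svn/spikelab.org/slpub"
PRIV_ROOT="svn+ssh://bebop/var/lib/svn/spikelab.org/slpriv"   # assumed name

# $1 = branch name. Creates the branch in both repositories.
branch_both() {
    svn copy -m "Branch $1" "$PUB_ROOT/trunk" "$PUB_ROOT/branches/$1" &&
    svn copy -m "Branch $1" "$PRIV_ROOT/trunk" "$PRIV_ROOT/branches/$1"
}

# Rewrite externals lines on stdin so they point at the public
# branch $1 instead of the public trunk.
retarget() {
    sed "s|/slpub/trunk/|/slpub/branches/$1/|"
}

# Run from the root of a working copy of the private branch $1.
repoint_externals() {
    svn propget svn:externals . | retarget "$1" > externals.txt &&
    svn propset svn:externals -F externals.txt . &&
    svn commit -m "Point externals at branch $1" .
}
```

Without the last step the private branch would keep pulling img, docs and blog from the public trunk.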




SpikeLab.org is a Filippo Spike Morelli copyright 2005-2008
This work is licensed under Creative Commons Att-SA License.