Subversion: Miscellaneous Documentation

Here's an opportunity to play with Subversion in some hands-on examples. The Subversion commands demoed here are just small examples of what Subversion can do; see Chapter 3 in the Subversion Book for full explanations of each.

Make a Repository

The Subversion client has an abstract interface for accessing a repository. Three “Repository Access” (RA) implementations currently exist as libraries. You can see which methods are available to your svn client like so:

$ svn --version
svn, version 1.0.4 (r9844)
   compiled May 23 2004, 14:04:22

Copyright (C) 2000-2004 CollabNet.
Subversion is open source software, see http://subversion.tigris.org/
This product includes software developed by CollabNet (http://www.Collab.Net/).

The following repository access (RA) modules are available:

* ra_dav : Module for accessing a repository via WebDAV (DeltaV) protocol.
  - handles 'http' scheme
  - handles 'https' scheme
* ra_local : Module for accessing a repository on local disk.
  - handles 'file' scheme
* ra_svn : Module for accessing a repository using the svn network protocol.
  - handles 'svn' scheme

If you don't see ra_local, it probably means that Berkeley DB (or relevant database back-end) wasn't found when compiling your client binary. To continue with these examples, you'll need to have ra_local available.
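A script can make the same check before running the examples. The following is a minimal sketch that assumes the module listing uses the line format shown above; the function name has_ra_local is hypothetical and not part of Subversion.

```shell
# Succeeds if the text on stdin lists the ra_local module.
# Assumes the "* ra_local : ..." line format shown in the output above.
has_ra_local() {
    grep -q '^\* ra_local'
}

# Typical use (requires svn on the PATH):
#   svn --version | has_ra_local || echo "ra_local is not available" >&2
```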

Start by creating a new, empty repository using the svnadmin tool:

$ svnadmin create myrepos

Let's assume you have a directory someproject which contains files that you wish to place under version control:

someproject/foo
            bar
            baz/
            baz/gloo
            baz/bloo

Once the repository exists, you can initially import your data into it, using the ra_local access method (invoked by using a “file” URL):

$ svn import someproject file:///absolute/path/to/myrepos/trunk/someproject
…
Committed revision 1.

The example above creates a new directory tree trunk/someproject in the root of the repository's filesystem, and copies all the data from someproject into it.

Make Some Working Copies

Now check out a fresh “working copy” of your project. To do this, we specify a URL to the exact directory within the repository that we want. The parameter after the URL allows us to name the working copy we check out.

$ svn checkout file:///absolute/path/to/myrepos/trunk/someproject wc
A  wc/foo
A  wc/bar
A  wc/baz
A  wc/baz/gloo
A  wc/baz/bloo

Now we have a working copy in a local directory called wc, which represents the location /trunk/someproject in the repository (assuming the repository's root is file:///absolute/path/to/myrepos.)

For the sake of example, let's duplicate the working copy, and pretend it belongs to someone else:

$ cp -R wc wc2

From here, let's make some changes within our original working copy:

$ cd wc
$ echo "new text" >> bar           # change bar's text
$ svn propset color green foo      # add a metadata property to foo
$ svn delete baz                   # schedule baz directory for deletion
$ touch newfile                    # create an empty file
$ svn add newfile                  # schedule newfile for addition

That's a lot of changes! If we were to leave and come back tomorrow, how could we remember what changes we'd made? Easy. The status command will show us all of the “local modifications” in our working copy:

$ svn status                   # See what's locally modified
M   ./bar
_M  ./foo
A   ./newfile
D   ./baz
D   ./baz/gloo
D   ./baz/bloo

According to this output, three items are scheduled to be (D)eleted from the repository, one item is scheduled to be (A)dded to the repository, and two items have had their contents (M)odified in some way. For more details, be sure to read about svn status in Chapter 3 of the Subversion Book.
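The tally in the paragraph above can be reproduced mechanically. This is only a sketch: it assumes status output in the layout shown above (first column for the item itself, second column for its properties), and summarize_status is a hypothetical helper name.

```shell
# Tally status codes read from stdin.
# Column 1 is the item's own status; column 2 is its property status.
summarize_status() {
    awk '
        { text = substr($0, 1, 1); props = substr($0, 2, 1) }
        text == "A" { added++ }
        text == "D" { deleted++ }
        text == "M" || props == "M" { modified++ }
        END { printf "added=%d deleted=%d modified=%d\n", added, deleted, modified }
    '
}

# Typical use (inside a working copy):
#   svn status | summarize_status
```

Fed the six status lines shown above, the helper counts one addition, three deletions, and two modifications.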

Now we decide to commit our changes, creating Revision 2 in the repository:

$ svn commit -m "fixed bug #233"
Sending    bar
Sending    foo
Adding     newfile
Deleting   baz
Transmitting data...
Committed revision 2.

The -m argument is a way of specifying a log message: that is, a specific description of your change-set sent to the repository. The log message is now attached to Revision 2. A future user might peruse repository log messages, and now will know what your Revision 2 changes were for.

Finally, pretend that you are now Felix, or some other collaborator. If you go to wc2 (that other working copy you made), it will need the svn update command to receive the Revision 2 changes:

$ cd ../wc2                # change to the back-up working copy

$ svn update               # get changes from repository
U   ./bar
_U  ./foo
A   ./newfile
D   ./baz

The output of the svn update command tells Felix that baz was (D)eleted from his working copy, newfile was (A)dded to his working copy, and that bar and foo had their contents (U)pdated.

If for some reason bar contained some local changes made by Felix, then the server changes would be merged into bar: that is, bar would now contain both sets of changes. Whenever server changes are merged into a locally-modified file, two possible things can happen:

The merge can go smoothly. That is, the two sets of changes do not overlap. In this case, svn update prints a G (“mer(G)ed”).
The sets of changes overlap, and a C for (C)onflict is printed. See section ??? for information about how conflict resolution works.
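When an update touches many files, it is easy to miss a C in the output. The sketch below filters update output for conflict lines. It assumes the layout shown above, with status codes in the first two columns followed by a space; treating a C in the second column as a conflict is an assumption based on the _U line in that output, and list_conflicts is a hypothetical helper name.

```shell
# Print only the lines of update output that report a conflict.
# Exit status is non-zero when no conflict lines are found.
list_conflicts() {
    grep -E '^(C.|.C) '
}

# Typical use (inside a working copy):
#   svn update | list_conflicts && echo "resolve conflicts before committing"
```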

Chapter 2. Best Practices

Table of Contents

Source Code Formatting

When You Have To Reformat
Ignoring Whitespace Differences
Line Endings

When you commit

Binary Files

Tips to use Subversion more effectively.

In this chapter, we'll focus on how to avoid some pitfalls of version control systems in general and Subversion specifically.

Source Code Formatting

Subversion's diffs and merges of text files work on a line-by-line basis. They don't understand the syntax of programming languages or even know when you've just reflowed text to a different line width.

Given this design, it's important to avoid unnecessary reformatting. It creates unnecessary conflicts when merging branches, updating working copies, and applying patches. It also can drown you in noise when viewing differences between revisions.

You can avoid these problems by following clearly-defined formatting rules. The Subversion project's own hacking.html document (http://svn.collab.net/repos/svn/trunk/www/hacking.html) and the Code Conventions for the Java Programming Language (http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html) are good examples.

Tabs are particularly important. Some projects, like Subversion, do not use tabs at all in the source tree. Others always use them and define a particular tab size.

It can be very helpful to have an editor smart enough to help adhere to these rules. For example, vim can do this on a per-project basis with .vimrc commands like the following:

autocmd BufRead,BufNewFile */rapidsvn/*.{cpp,h} setlocal ts=4 noexpandtab
autocmd BufRead,BufNewFile */subversion/*.[ch] setlocal sw=2 expandtab cinoptions=>2sn-s{s^-s:s

Check your favorite editor's documentation for more information.

When You Have To Reformat

In the real world, we're not always so perfect. Formatting preferences may change over time, or we may just make mistakes. There are things you can do to minimize the problems of reformatting.

These are good guidelines to follow:

If you're making a sweeping reformatting change, do it in a single commit with no semantic changes. Give precise directions on duplicating formatting changes.
If you've made semantic changes to some area of code and see inconsistent formatting in the immediate context, it's okay to reformat. Causing conflicts is not as great a concern because your semantic changes are likely to do that anyway.

Here's an example of a sweeping reformat:

$ svn co file:///repo/path/trunk indent_wc
$ indent -gnu indent_wc/src/*.[ch]
$ svn commit -m 'Ran indent -gnu src/*.[ch]' indent_wc

This follows all the rules: there were no semantic changes mixed in (no files were changed other than through indent). The indent command line was given, so the changes can be very easily duplicated. All the reformatting was done in a single revision.

Let's say these changes occurred to the trunk at revision 26. The head revision is now 42. You created a branch at revision 13 and now want to merge it back into the trunk. Ordinarily you'd do this:

$ svn co file:///repo/path/trunk merge_wc
$ svn merge -r 13:head file:///repo/path/branches/mybranch merge_wc
… # resolve conflicts
$ svn commit -m 'Merged branch' merge_wc

But with the reformatting changes, there will be many, many conflicts. If you follow these rules, you can merge more easily:

$ svn co -r 25 file:///repo/path/trunk merge_wc
$ svn merge -r 13:head file:///repo/path/branches/mybranch merge_wc
… # resolve conflicts
$ indent -gnu merge_wc/src/*.[ch]
$ svn up merge_wc
… # resolve conflicts
$ svn commit -m 'Merged branch' merge_wc

In English, the procedure is:

Check out a pre-reformatting trunk working copy.
Merge all branch changes. Fix conflicts.
Reformat in the same manner.
Update to the head revision. Fix conflicts.
Check in the merged working copy.

Ignoring Whitespace Differences

When viewing differences between revisions, you can customize svn diff output to hide whitespace changes. The -x argument passes arguments through to GNU diff. Here are some useful arguments:

Table 2.1. Some useful GNU diff arguments

Option	Description
`-b`	Ignore changes in the amount of whitespace.
`-B`	Ignore added/removed blank lines.
`-i`	Ignore changes in case.
`-t`	Expand tabs to spaces to preserve alignment.
`-T`	Output a tab rather than a space at the beginning of each line to start on a tab stop.
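The effect of -b is easy to see with diff alone. The sketch below builds two throwaway files that differ only in the amount of whitespace and compares them with and without the option; the file names and contents are made up for illustration. With Subversion, the same option would be passed through as described above, for example svn diff -x -b.

```shell
# Two versions of a file that differ only in the amount of whitespace.
tmp=$(mktemp -d)
printf 'int  x = 1;\n' > "$tmp/old.c"
printf 'int x  =  1;\n' > "$tmp/new.c"

# Plain diff reports a difference (exit status 1) ...
plain_status=0
diff "$tmp/old.c" "$tmp/new.c" > /dev/null || plain_status=$?

# ... while -b treats runs of whitespace as equivalent (exit status 0).
ignore_ws_status=0
diff -b "$tmp/old.c" "$tmp/new.c" > /dev/null || ignore_ws_status=$?

echo "plain=$plain_status with_b=$ignore_ws_status"
rm -rf "$tmp"
```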

The commit emails always show whitespace-only changes. commit-email.pl uses svnlook diff to get differences, which doesn't support the -x option.

Line Endings

Different platforms (Unix, Windows, Mac OS) have different conventions for marking the line endings of text files. Simple editors may rewrite line endings, causing problems with diff and merge. This is a subset of the formatting problems.

Subversion has built-in support for normalizing line endings. To enable it, set the svn:eol-style property to “native”. See Properties in the Subversion book for more information.
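Before setting the property, it can help to find out which files currently carry non-Unix line endings. The following is a rough sketch; files_with_cr is a hypothetical helper, and it simply looks for a carriage-return character anywhere in each file.

```shell
# Print the names of the given files that contain a carriage return,
# which usually indicates Windows (CRLF) or classic Mac OS (CR) line endings.
files_with_cr() {
    cr=$(printf '\r')
    grep -l -e "$cr" -- "$@"
}

# Typical use (inside a working copy), followed by the property change
# for each file that should be normalized:
#   files_with_cr src/*.c
#   svn propset svn:eol-style native src/somefile.c
```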

When you commit

It pays to take some time before you commit to review your changes and create an appropriate log message. Every commit publishes the changed project anew. This is true in two senses:

When you commit, you are potentially destabilizing the head revision. Many projects have a policy that the head revision is “stable”—it should always parse/compile, it should always pass unit tests, etc. If you don't get something right, you may be inconveniencing an arbitrary number of people until someone commits a fix.
You cannot easily remove revisions. (There is no equivalent to cvs admin -o.) If you might not want something to be in the repository, make sure it is not included in your commit. Check for sensitive information, autogenerated files, and unnecessary large files.

If you later don't like your log message, it is possible to change it. The svnadmin setlog command will do this locally. You can set up the script tweak-log.cgi (http://svn.collab.net/repos/svn/trunk/tools/cgi/tweak-log.cgi) to allow the same thing remotely. All the same, creating a good log message beforehand helps clarify your thoughts and avoid committing a mistake.

You should run svn diff before each commit and ask yourself:

Do these changes belong together? It's best that each revision is a single logical change. It's very easy to forget that you've started another change.
Do I have a log entry for these changes?

Defining a log entry policy is also helpful; the Subversion hacking.html document (http://svn.collab.net/repos/svn/trunk/www/hacking.html) is a good model. If you always embed filenames, function names, etc., then you can easily search through the logs with search-svnlog.pl (http://svn.collab.net/repos/svn/trunk/tools/client-side/search-svnlog.pl).

You may want to write the log entry as you go. It's common to create a file named changes with your log entry in progress. When you commit, use svn ci -F changes.

If you do not write log entries as you go, you can generate an initial log entry file from the output of svn status, which lists all modified files and directories, and then write a comment for each one.
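One way to produce such a starting file is sketched below. It assumes the status layout shown earlier in this document (status columns, whitespace, then a path without embedded spaces), skips unversioned items marked with ?, and uses a hypothetical helper name, status_to_log_template. The "* path:" skeleton is only an illustration, not a required format.

```shell
# Turn status output on stdin into a log entry skeleton, one line per item.
# Unversioned items ("?") are skipped; a leading "./" is dropped from paths.
status_to_log_template() {
    awk 'NF >= 2 && $1 != "?" { sub(/^\.\//, "", $NF); printf "* %s:\n", $NF }'
}

# Typical use (inside a working copy):
#   svn status | status_to_log_template > changes
#   ... edit the file named changes, then: svn ci -F changes
```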

Binary Files

Subversion does not have any way to merge or view differences of binary files, so it's critical that these have accurate log messages. Since you can't review your changes with svn diff immediately before committing, it's a particularly good idea to write the log entry as you go.

Chapter 3. Directory Versioning

Table of Contents

Directory Revisions

The Lagging Directory

The Problem
The Solution

The Overeager Directory

The Problem
The Solution

User Impact

“The three cardinal virtues of a master technologist are: laziness, impatience, and hubris.” (Larry Wall)

This chapter describes some of the theoretical pitfalls around the (possibly arrogant) notion that one can simply version directories just as one versions files.

Directory Revisions

To begin, recall that the Subversion repository is an array of trees. Each tree represents the application of a new atomic commit, and is called a revision. This is very different from a CVS repository, which stores file histories in a collection of RCS files (and doesn't track tree-structure.)

So when we refer to “revision 4 of foo.c” (written foo.c:4) in CVS, this means the fourth distinct version of foo.c—but in Subversion this means “the version of foo.c in the fourth revision (tree)”. It's quite possible that foo.c has never changed at all since revision 1! In other words, in Subversion, different revision numbers of the same versioned item do not imply different contents.

Nevertheless, the content of foo.c:4 is still well-defined. The file foo.c in revision 4 has specific text and properties.

Suppose, now, that we extend this concept to directories. If we have a directory DIR, define DIR:N to be “the directory DIR in the Nth revision.” The contents are defined to be a particular set of directory entries (dirents) and properties.

So far, so good. The concept of versioning directories seems fine in the repository—the repository is very theoretically pure anyway. However, because working copies allow mixed revisions, it's easy to create problematic use-cases.

The Lagging Directory

The Problem

This is the first part of the “Greg Hudson” problem, so named because he was the first one to bring it up and define it well. :-)

Suppose our working copy has directory DIR:1 containing file foo:1, along with some other files. We remove foo and commit.

Already, we have a problem: our working copy still claims to have DIR:1. But on the repository, revision 1 of DIR is defined to contain foo—and our working copy DIR clearly does not have it anymore. How can we truthfully say that we still have DIR:1?

One answer is to force DIR to be updated when we commit foo's deletion. Assuming that our commit created revision 2, we would immediately update our working copy to DIR:2. Then the client and server would both agree that DIR:2 does not contain foo, and that DIR:2 is indeed exactly what is in the working copy.

This solution has nasty, un-user-friendly side effects, though. It's likely that other people may have committed before us, possibly adding new properties to DIR, or adding a new file bar. Now pretend our committed deletion creates revision 5 in the repository. If we instantly update our local DIR to 5, that means unexpectedly receiving a copy of bar and some new propchanges. This clearly violates a UI principle: “the client will never change your working copy until you ask it to.” Committing changes to the repository is a server-write operation only; it should not modify your working data!

Another solution is to do the naive thing: after committing the deletion of foo, simply stop tracking the file in the .svn administrative directory. The client then loses all knowledge of the file.

But this doesn't work either: if we now update our working copy, the communication between client and server is incorrect. The client still believes that it has DIR:1—which is false, since a “true” DIR:1 contains foo. The client gives this incorrect report to the repository, and the repository decides that in order to update to revision 2, foo must be deleted. Thus the repository sends a bogus (or at least unnecessary) deletion command.

The Solution

After deleting foo and committing, the file is not totally forgotten by the .svn directory. While the file is no longer considered to be under version control, it is still secretly remembered as having been “deleted”.

When the user updates the working copy, the client correctly informs the server that the file is already missing from its local DIR:1; therefore the repository doesn't try to re-delete it when patching the client up to revision 2.

Note to developers

How the “deleted” flag works under the hood.

The svn status command won't display a deleted item, unless you make the deleted item the specific target of status.
When a deleted item's parent is updated, one of two things will happen:
1. The repository will re-add the item, thereby overwriting the entire entry. (no more “deleted” flag)
2. The repository will say nothing about the item, which means that it's fully aware that your item is gone, and this is the correct state to be in. In this case, the entire entry is removed. (no more “deleted” flag)
If a user schedules an item for addition that has the same name as a “deleted” entry, then the entry will have both flags simultaneously. This is perfectly fine:
1. The commit-crawler will notice both flags and do a delete() and then an add(). This ensures that the transaction is built correctly. (without the delete(), the add() would be on top of an already-existing item.)
2. When the commit completes, the client rewrites the entry as normal. (no more “deleted” flag)
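The “deleted” flag itself is internal to the working copy, but its effect can be observed from the command line. The sketch below replays the lagging-directory scenario in a throwaway repository and checks whether the final update tries to delete foo a second time. The function name and the temporary layout are hypothetical, and the check assumes that update reports a deletion as a line beginning with D.

```shell
# Hypothetical helper: delete a file, commit, update, and report whether
# the update tried to delete the file again. Requires svn and svnadmin.
demo_lagging_directory() {
    work=$(mktemp -d) || return 1
    svnadmin create "$work/repos" || return 1
    svn checkout -q "file://$work/repos" "$work/wc" || return 1
    (
        cd "$work/wc" &&
        echo "some text" > foo &&
        svn add -q foo &&
        svn commit -q -m "add foo" &&       # creates revision 1
        svn update -q &&                    # DIR is now at revision 1
        svn delete -q foo &&
        svn commit -q -m "delete foo" &&    # creates revision 2; DIR stays at 1
        svn update > ../update.log          # bring DIR up to revision 2
    ) || return 1
    if grep -q '^D' "$work/update.log"; then
        echo "update re-deleted foo"
    else
        echo "update did not re-delete foo"
    fi
    rm -rf "$work"
}

# Typical use:
#   demo_lagging_directory
```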

The Overeager Directory

This is the second part of the “Greg Hudson” problem.

The Problem

Again, suppose our working copy has directory DIR:1 containing file foo:1, along with some other files.

Now, unbeknownst to us, somebody else adds a new file bar to this directory, creating revision 2 (and DIR:2).

Now we add a property to DIR and commit, which creates revision 3. Our working-copy DIR is now marked as being at revision 3.

Of course, this is false; our working copy does not have DIR:3, because the “true” DIR:3 on the repository contains the new file bar. Our working copy has no knowledge of bar at all.

Again, we can't follow our commit of DIR with an automatic update (and addition of bar). As mentioned previously, commits are a one-way write operation; they must not change working copy data.

The Solution

Let's enumerate exactly those times when a directory's local revision number changes:

When a directory is updated: If the directory is either the direct target of an update command, or is a child of an updated directory, it will be bumped (along with many other siblings and children) to a uniform revision number.
When a directory is committed: A directory can only be considered a “committed object” if it has a new property change. (Otherwise, to “commit a directory” really implies that its modified children are being committed, and only such children will have local revisions bumped.)

In this light, it's clear that our “overeager directory” problem only happens in the second situation—those times when we're committing directory propchanges.

Thus the answer is simply not to allow property-commits on directories that are out-of-date. It sounds a bit restrictive, but there's no other way to keep directory revisions accurate.
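From the user's side, this restriction shows up as a refused commit followed by an explicit update. The sketch below replays a simplified version of the scenario in a throwaway repository. The function name, the two working copy names, and the property are hypothetical, and the exact error text printed by the client is deliberately not relied upon.

```shell
# Hypothetical helper: try to commit a property change on an out-of-date
# directory, then retry after updating. Requires svn and svnadmin.
demo_overeager_directory() {
    work=$(mktemp -d) || return 1
    svnadmin create "$work/repos" || return 1
    svn checkout -q "file://$work/repos" "$work/wc" || return 1    # ours
    svn checkout -q "file://$work/repos" "$work/wc2" || return 1   # someone else's

    # Someone else adds bar, creating a newer revision of the directory.
    ( cd "$work/wc2" && echo "text" > bar && svn add -q bar &&
      svn commit -q -m "add bar" ) || return 1

    # Our directory is now out of date, so the property commit should be refused.
    svn propset -q color green "$work/wc" || return 1
    if svn commit -q -m "set color" "$work/wc" 2> "$work/commit.err"; then
        echo "commit accepted"
    else
        echo "commit refused"
    fi

    # After an explicit update, the same commit goes through.
    svn update -q "$work/wc" && svn commit -q -m "set color" "$work/wc" &&
        echo "commit accepted after update"
    rm -rf "$work"
}

# Typical use:
#   demo_overeager_directory
```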

User Impact

Really, the Subversion client seems to have two difficult—almost contradictory—goals.

First, it needs to make the user experience friendly, which generally means being a bit “sloppy” about deciding what a user can or cannot do. This is why it allows mixed-revision working copies, and why it tries to let users execute local tree-changing operations (delete, add, move, copy) in situations that aren't always perfectly, theoretically “safe” or pure.

Second, the client tries to keep the working copy correctly in sync with the repository using as little communication as possible. Of course, this is made much harder by the first goal!

So in the end, there's a tension here, and the resolutions to problems can vary. In one case (the “lagging directory”), the problem can be solved through a bit of clever entry tracking in the client. In the other case (“the overeager directory”), the only solution is to restrict some of the theoretical laxness allowed by the client.