Here's an opportunity to play with Subversion in some hands-on examples. The Subversion commands demoed here are just small examples of what Subversion can do; see Chapter 3 in the Subversion Book for full explanations of each.
The Subversion client has an abstract interface for accessing a repository. Three “Repository Access” (RA) implementations currently exist as libraries. You can see which methods are available to your svn client like so:
$ svn --version
svn, version 1.0.4 (r9844)
   compiled May 23 2004, 14:04:22
Copyright (C) 2000-2004 CollabNet.
Subversion is open source software, see http://subversion.tigris.org/
This product includes software developed by CollabNet (http://www.Collab.Net/).
The following repository access (RA) modules are available:
* ra_dav : Module for accessing a repository via WebDAV (DeltaV) protocol.
  - handles 'http' scheme
  - handles 'https' scheme
* ra_local : Module for accessing a repository on local disk.
  - handles 'file' scheme
* ra_svn : Module for accessing a repository using the svn network protocol.
  - handles 'svn' scheme
If you don't see ra_local, it probably means that Berkeley DB (or relevant database back-end) wasn't found when compiling your client binary. To continue with these examples, you'll need to have ra_local available.
Start by creating a new, empty repository using the svnadmin tool:
$ svnadmin create myrepos
Let's assume you have a directory someproject which contains files that you wish to place under version control:
someproject/foo
            bar
            baz/
            baz/gloo
            baz/bloo
Once the repository exists, you can initially import your data into it, using the ra_local access method (invoked by using a “file” URL):
$ svn import someproject file:///absolute/path/to/myrepos/trunk/someproject
…
Committed revision 1.
The example above creates a new directory tree trunk/someproject in the root of the repository's filesystem, and copies all the data from someproject into it.
Now check out a fresh “working copy” of your project. To do this, we specify a URL to the exact directory within the repository that we want. The parameter after the URL allows us to name the working copy we check out.
$ svn checkout file:///absolute/path/to/myrepos/trunk/someproject wc
A  wc/foo
A  wc/bar
A  wc/baz
A  wc/baz/gloo
A  wc/baz/bloo
Now we have a working copy in a local directory called wc, which represents the location /trunk/someproject in the repository (assuming the repository's root is file:///absolute/path/to/myrepos).
For the sake of example, let's duplicate the working copy, and pretend it belongs to someone else:
$ cp -R wc wc2
From here, let's make some changes within our original working copy:
$ cd wc
$ echo "new text" >> bar       # change bar's text
$ svn propset color green foo  # add a metadata property to foo
$ svn delete baz               # schedule baz directory for deletion
$ touch newfile
$ svn add newfile              # schedule newfile for addition
That's a lot of changes! If we were to leave and come back tomorrow, how could we remember what changes we'd made? Easy. The status command will show us all of the “local modifications” in our working copy:
$ svn status                   # See what's locally modified
M   ./bar
_M  ./foo
A   ./newfile
D   ./baz
D   ./baz/gloo
D   ./baz/bloo
According to this output, three items are scheduled to be (D)eleted from the repository, one item is scheduled to be (A)dded to the repository, and two items have had their contents (M)odified in some way. For more details, be sure to read about svn status in Chapter 3 of the Subversion Book.
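If you want to double-check a tally like the one above, a few lines of shell will do it. This is only a sketch: summarize_status is a hypothetical helper name, and it assumes the status code is the first whitespace-separated field on each line, as in the sample output shown here.

```shell
# Hypothetical helper: count how many entries carry each status code.
# Reads "svn status" output on standard input and prints "CODE COUNT" lines.
summarize_status() {
    awk '{ count[$1]++ } END { for (code in count) print code, count[code] }' |
        LC_ALL=C sort
}

# Feed it the sample output from the example above.
summarize_status <<'EOF'
M ./bar
_M ./foo
A ./newfile
D ./baz
D ./baz/gloo
D ./baz/bloo
EOF
```

In a real working copy you would pipe the command in directly: svn status | summarize_status.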
Now we decide to commit our changes, creating Revision 2 in the repository:
$ svn commit -m "fixed bug #233"
Sending    bar
Sending    foo
Adding     newfile
Deleting   baz
Transmitting data...
Committed revision 2.
The -m argument is a way of specifying a log message: that is, a specific description of your change-set sent to the repository. The log message is now attached to Revision 2. A future user might peruse repository log messages, and now will know what your Revision 2 changes were for.
Finally, pretend that you are now Felix, or some other collaborator. If you go to wc2 (that other working copy you made), it will need the svn update command to receive the Revision 2 changes:
$ cd ../wc2                    # change to the back-up working copy
$ svn update                   # get changes from repository
U   ./bar
_U  ./foo
A   ./newfile
D   ./baz
The output of the svn update command tells Felix that baz was (D)eleted from his working copy, newfile was (A)dded to his working copy, and that bar and foo had their contents (U)pdated.
If for some reason bar contained some local changes made by Felix, then the server changes would be merged into bar: that is, bar would now contain both sets of changes.
Whenever server changes are merged into a locally-modified file, two possible things can happen:
The merge can go smoothly. That is, the two sets of changes do not overlap. In this case, svn update prints a G (“mer(G)ed”).
The sets of changes overlap, and a C for (C)onflict is printed. See section ??? for information about how conflict resolution works.
Tips to use Subversion more effectively.
In this chapter, we'll focus on how to avoid some pitfalls of version control systems in general and Subversion specifically.
Subversion's diffs and merges of text files work on a line-by-line basis. They don't understand the syntax of programming languages or even know when you've just reflowed text to a different line width.
Given this design, it's important to avoid unnecessary reformatting. It creates unnecessary conflicts when merging branches, updating working copies, and applying patches. It also can drown you in noise when viewing differences between revisions.
You can avoid these problems by following clearly-defined formatting rules. The Subversion project's own hacking.html document (http://svn.collab.net/repos/svn/trunk/www/hacking.html) and the Code Conventions for the Java Programming Language (http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html) are good examples.
Tabs are particularly important. Some projects, like Subversion, do not use tabs at all in the source tree. Others always use them and define a particular tab size.
It can be very helpful to have an editor smart enough to help adhere to these rules. For example, vim can do this on a per-project basis with .vimrc commands like the following:
autocmd BufRead,BufNewFile */rapidsvn/*.{cpp,h} setlocal ts=4 noexpandtab
autocmd BufRead,BufNewFile */subversion/*.[ch] setlocal sw=2 expandtab cinoptions=>2sn-s{s^-s:s
Check your favorite editor's documentation for more information.
In the real world, we're not always so perfect. Formatting preferences may change over time, or we may just make mistakes. There are things you can do to minimize the problems of reformatting.
These are good guidelines to follow:
If you're making a sweeping reformatting change, do it in a single commit with no semantic changes. Give precise directions on duplicating formatting changes.
If you've made semantic changes to some area of code and see inconsistent formatting in the immediate context, it's okay to reformat. Causing conflicts is not as great a concern because your semantic changes are likely to do that anyway.
Here's an example of a sweeping reformat:
$ svn co file:///repo/path/trunk indent_wc
$ indent -gnu indent_wc/src/*.[ch]
$ svn commit -m 'Ran indent -gnu src/*.[ch]' indent_wc
This follows all the rules: there were no semantic changes mixed in (no files were changed other than through indent). The indent command line was given in the log message, so the changes can be very easily duplicated. All the reformatting was done in a single revision.
Let's say these changes occurred to the trunk at revision 26. The head revision is now 42. You created a branch at revision 13 and now want to merge it back into the trunk. Ordinarily you'd do this:
$ svn co file:///repo/path/trunk merge_wc
$ svn merge -r 13:head file:///repo/path/branches/mybranch merge_wc
… # resolve conflicts
$ svn commit -m 'Merged branch' merge_wc
But with the reformatting changes, there will be many, many conflicts. If you follow these rules, you can merge more easily:
$ svn co -r 25 file:///repo/path/trunk merge_wc
$ svn merge -r 13:head file:///repo/path/branches/mybranch merge_wc
… # resolve conflicts
$ indent -gnu merge_wc/src/*.[ch]
$ svn up merge_wc
… # resolve conflicts
$ svn commit -m 'Merged branch' merge_wc
In English, the procedure is:
Check out a pre-reformatting trunk working copy.
Merge all branch changes. Fix conflicts.
Reformat in the same manner.
Update to the head revision. Fix conflicts.
Check in the merged working copy.
When viewing differences between revisions, you can customize svn diff output to hide whitespace changes. The -x argument passes arguments through to GNU diff. Here are some useful arguments:
Table 2.1. Some useful GNU diff arguments
Option | Description |
---|---|
-b | Ignore differences in whitespace only. |
-B | Ignore added/removed blank lines. |
-i | Ignore changes in case. |
-t | Expand tabs to spaces to preserve alignment. |
-T | Output a tab rather than a space at the beginning of each line to start on a tab stop. |
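To get a feel for what -b does before passing it through svn diff, you can try it with plain diff on two throwaway files. This is only a sketch: the file names and contents are made up for illustration, and it assumes a diff that accepts -b (GNU diff does).

```shell
# Two scratch files that differ only in the amount of whitespace.
tmp=$(mktemp -d)
printf 'int  x = 1;\n' > "$tmp/old.c"
printf 'int x = 1;\n'  > "$tmp/new.c"

# Plain diff reports a difference; diff -b treats the files as identical.
if diff "$tmp/old.c" "$tmp/new.c" > /dev/null; then plain=same; else plain=differ; fi
if diff -b "$tmp/old.c" "$tmp/new.c" > /dev/null; then ignoring=same; else ignoring=differ; fi
echo "plain diff: $plain; diff -b: $ignoring"

rm -rf "$tmp"
```

Inside a working copy, the corresponding invocation would be svn diff -x -b.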
The commit emails always show whitespace-only changes. commit-email.pl uses svnlook diff to get differences, which doesn't support the -x option.
Different platforms (Unix, Windows, Mac OS) have different conventions for marking the line endings of text files. Simple editors may rewrite line endings, causing problems with diff and merge. This is a subset of the formatting problems.
Subversion has built-in support for normalizing line endings. To enable it, set the svn:eol-style property to “native”. See Properties in the Subversion book for more information.
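Setting the property is a one-line svn propset call. The sketch below wraps it in a small function; set_native_eol is a hypothetical name chosen for illustration, and the files you pass are up to you.

```shell
# Hypothetical wrapper: ask Subversion to convert the line endings of the
# given files to the local platform's convention.
# Usage: set_native_eol FILE...
set_native_eol() {
    svn propset svn:eol-style native "$@"
}
```

For example, set_native_eol src/*.c followed by svn commit records the property in the repository, so every working copy gets the same treatment.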
It pays to take some time before you commit to review your changes and create an appropriate log message. You are publishing the newly changed project anew every time you commit. This is true in two senses:
When you commit, you are potentially destabilizing the head revision. Many projects have a policy that the head revision is “stable”—it should always parse/compile, it should always pass unit tests, etc. If you don't get something right, you may be inconveniencing an arbitrary number of people until someone commits a fix.
You cannot easily remove revisions. (There is no equivalent to cvs admin -o.) If you might not want something to be in the repository, make sure it is not included in your commit. Check for sensitive information, autogenerated files, and unnecessary large files.
If you later don't like your log message, it is possible to change it. The svnadmin setlog command will do this locally. You can set up the script tweak-log.cgi (http://svn.collab.net/repos/svn/trunk/tools/cgi/tweak-log.cgi) to allow the same thing remotely. All the same, creating a good log message beforehand helps clarify your thoughts and avoid committing a mistake.
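For reference, svnadmin setlog takes the repository path, a revision, and a file holding the new message. The wrapper below is only a sketch: replace_log_message and its argument order are made up for illustration.

```shell
# Hypothetical wrapper around svnadmin setlog.
#   $1 = path to the repository on local disk
#   $2 = revision whose log message should be replaced
#   $3 = file containing the new log message
replace_log_message() {
    svnadmin setlog "$1" -r "$2" "$3"
}
```

A call might look like replace_log_message myrepos 2 newlog.txt. Keep in mind that revision properties are not versioned, so the old message is gone once it has been replaced.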
You should run svn diff before each commit and ask yourself:
Do these changes belong together? It's best that each revision is a single logical change. It's very easy to forget that you've started another change.
Do I have a log entry for these changes?
Defining a log entry policy is also helpful: the Subversion hacking.html document (http://svn.collab.net/repos/svn/trunk/www/hacking.html) is a good model. If you always embed filenames, function names, etc., then you can easily search through the logs with search-svnlog.pl (http://svn.collab.net/repos/svn/trunk/tools/client-side/search-svnlog.pl).
You may want to write the log entry as you go. It's common to create a file changes with your log entry in progress. When you commit, use svn ci -F changes.
If you do not write log entries as you go, you can generate an initial log entry file using the output of svn status which contains a list of all modified files and directories and write a comment for each one.
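One way to build that initial file is to turn each line of svn status output into a heading you can write a comment under. This is a rough sketch: status_to_log_template is a hypothetical helper, it assumes the path is the last field on each line, and it will mishandle paths that contain spaces.

```shell
# Hypothetical helper: turn "svn status" output into a log entry skeleton
# with one "* path:" line per listed item.
status_to_log_template() {
    awk '{ path = $NF; sub(/^\.\//, "", path); print "* " path ":" }'
}
```

You might run svn status | status_to_log_template > changes, fill in a comment after each path, and then commit with svn ci -F changes.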
“The three cardinal virtues of a master technologist are: laziness, impatience, and hubris.” -- Larry Wall
This describes some of the theoretical pitfalls around the (possibly arrogant) notion that one can simply version directories just as one versions files.
To begin, recall that the Subversion repository is an array of trees. Each tree represents the application of a new atomic commit, and is called a revision. This is very different from a CVS repository, which stores file histories in a collection of RCS files (and doesn't track tree-structure.)
So when we refer to “revision 4 of foo.c” (written foo.c:4) in CVS, this means the fourth distinct version of foo.c; in Subversion it means “the version of foo.c in the fourth revision (tree)”. It's quite possible that foo.c has never changed at all since revision 1! In other words, in Subversion, different revision numbers of the same versioned item do not imply different contents.
Nevertheless, the content of foo.c:4 is still well-defined. The file foo.c in revision 4 has specific text and properties.
Suppose, now, that we extend this concept to directories. If we have a directory DIR, define DIR:N to be “the directory DIR in the Nth revision.” The contents are defined to be a particular set of directory entries (dirents) and properties.
So far, so good. The concept of versioning directories seems fine in the repository—the repository is very theoretically pure anyway. However, because working copies allow mixed revisions, it's easy to create problematic use-cases.
This is the first part of the “Greg Hudson” problem, so named because he was the first one to bring it up and define it well. :-)
Suppose our working copy has directory DIR:1 containing file foo:1, along with some other files. We remove foo and commit.
Already, we have a problem: our working copy still claims to have DIR:1. But on the repository, revision 1 of DIR is defined to contain foo, and our working copy DIR clearly does not have it anymore. How can we truthfully say that we still have DIR:1?
One answer is to force DIR to be updated when we commit foo's deletion. Assuming that our commit created revision 2, we would immediately update our working copy to DIR:2. Then the client and server would both agree that DIR:2 does not contain foo, and that DIR:2 is indeed exactly what is in the working copy.
This solution has nasty, un-user-friendly side effects, though. It's likely that other people may have committed before us, possibly adding new properties to DIR, or adding a new file bar. Now pretend our committed deletion creates revision 5 in the repository. If we instantly update our local DIR to 5, that means unexpectedly receiving a copy of bar and some new propchanges. This clearly violates a UI principle: “the client will never change your working copy until you ask it to.” Committing changes to the repository is a server-write operation only; it should not modify your working data!
Another solution is to do the naive thing: after committing the deletion of foo, simply stop tracking the file in the .svn administrative directory. The client then loses all knowledge of the file.
But this doesn't work either: if we now update our working copy, the communication between client and server is incorrect. The client still believes that it has DIR:1, which is false, since a “true” DIR:1 contains foo. The client gives this incorrect report to the repository, and the repository decides that in order to update to revision 2, foo must be deleted. Thus the repository sends a bogus (or at least unnecessary) deletion command.
The solution Subversion uses is a bit of entry tracking in the client: after deleting foo and committing, the file is not totally forgotten by the .svn directory. While the file is no longer considered to be under version control, it is still secretly remembered as having been “deleted”.
When the user updates the working copy, the client correctly informs the server that the file is already missing from its local DIR:1; therefore the repository doesn't try to re-delete it when patching the client up to revision 2.
This is the second part of the “Greg Hudson” problem.
Again, suppose our working copy has directory DIR:1 containing file foo:1, along with some other files.
Now, unbeknownst to us, somebody else adds a new file bar to this directory, creating revision 2 (and DIR:2).
Now we add a property to DIR and commit, which creates revision 3. Our working-copy DIR is now marked as being at revision 3.
Of course, this is false; our working copy does not have DIR:3, because the “true” DIR:3 on the repository contains the new file bar. Our working copy has no knowledge of bar at all.
Again, we can't follow our commit of DIR with an automatic update (and addition of bar). As mentioned previously, commits are a one-way write operation; they must not change working copy data.
Let's enumerate exactly those times when a directory's local revision number changes:
If the directory is either the direct target of an update command, or is a child of an updated directory, it will be bumped (along with many other siblings and children) to a uniform revision number.
A directory can only be considered a “committed object” if it has a new property change. (Otherwise, to “commit a directory” really implies that its modified children are being committed, and only such children will have local revisions bumped.)
In this light, it's clear that our “overeager directory” problem only happens in the second situation—those times when we're committing directory propchanges.
Thus the answer is simply not to allow property-commits on directories that are out-of-date. It sounds a bit restrictive, but there's no other way to keep directory revisions accurate.
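The rule can be written down as a one-line comparison. What follows is only a toy model of the idea, not how the Subversion client or server implements the check; dir_propcommit_allowed and both of its arguments are made up for illustration.

```shell
# Toy model: a property commit on a directory is allowed only if the
# working copy's revision of the directory is not older than the revision
# in which the repository last changed that directory.
#   $1 = revision the working copy claims for the directory
#   $2 = revision in which the repository last changed the directory
dir_propcommit_allowed() {
    [ "$1" -ge "$2" ]
}
```

In the scenario above, the working copy holds DIR:1 while the repository changed DIR in revision 2, so dir_propcommit_allowed 1 2 fails and the commit would be refused. After an update brings the directory to revision 2, dir_propcommit_allowed 2 2 succeeds.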
Really, the Subversion client seems to have two difficult—almost contradictory—goals.
First, it needs to make the user experience friendly, which generally means being a bit “sloppy” about deciding what a user can or cannot do. This is why it allows mixed-revision working copies, and why it tries to let users execute local tree-changing operations (delete, add, move, copy) in situations that aren't always perfectly, theoretically “safe” or pure.
Second, the client tries to keep the working copy correctly in sync with the repository using as little communication as possible. Of course, this is made much harder by the first goal!
So in the end, there's a tension here, and the resolutions to problems can vary. In one case (the “lagging directory”), the problem can be solved through a bit of clever entry tracking in the client. In the other case (“the overeager directory”), the only solution is to restrict some of the theoretical laxness allowed by the client.