Darwinweb

Creating and Applying External Patches with Subversion

July 25, 2006     

I’ve been delving deeper into Subversion recently for work and have run into some interesting patching situations where Subversion’s native tools really fall short. Although Subversion offers the ability to use external diff commands, doing so is more complicated than I had hoped.

Messy Codebase

In this case Subversion was only half the problem. I happen to work with a codebase that has been replicated for hundreds of clients, with untracked changes and some disturbing code irregularities. The most striking is randomly mixed tabs and spaces. Even worse are mixed line endings, mixed encodings, and even some malformed files.

My first mistake after importing the code was to try to fix some of the formatting issues. As a result, diff reported many lines as changed even though the code on them was identical, which made it difficult to patch older versions of the code that had been customized independently.

Subversion Patching Process

In Subversion, patching (via svn merge) is applied directly to a working copy. When there is a conflict, the competing versions are shown inline in the file, and the original file and the two compared files (left and right) are stored as new files next to it.

Issues

Subversion handles merges more elegantly than CVS, but it still leaves a lot to be desired. For instance:

  1. Merges aren’t tracked. It’s a good idea to adopt a convention for annotating merges. I usually use something like MERGED -r50:100, and I try to commit merges separately from other changes.
  2. Can’t ignore whitespace changes. If too many whitespace changes creep into your files Subversion merging can quickly become useless. In my case, some of my files would pick up changes automatically from BBEdit or TextMate as their malformedness was repaired.
  3. Can’t tweak the patch. Basically you can review the changes via svn diff, but svn merge applies the changes directly. With the aforementioned whitespace problems and in other scenarios, it can be quite helpful to manually edit a patch file before applying it.
  4. GNU patch has a different interface. Instead of showing conflicts inline, patch applies what it can and leaves the rest in a .rej file. This isn’t necessarily better than svn merge, but I’ve found it easier to work with.
  5. Code isn’t a working copy. This is probably a rare scenario, but I have hundreds of copies of this code in production with no version control, so it makes sense to have a process to quickly apply individual patches without the overhead of setting up a Subversion branch.

Howto

The process I use is straightforward. Simply capture svn diff output in a file like so:


svn diff -r 20:30 >application.patch

From there I can open up the patch in TextMate (which has beautiful syntax highlighting for patches) and make any necessary tweaks. When I’m ready to apply the patch I move it to the target directory and apply it (with backup):


patch -p0 -b <application.patch
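Because -p0 uses the paths in the patch exactly as written, it can help to check which files the patch expects to find before running it in the target directory. A minimal sketch, assuming the patch came from svn diff (which starts each file's section with an "Index:" line); patch_files is a hypothetical helper name, not a Subversion or patch command:

```shell
# patch_files: print the path from every "Index:" line of an svn diff
# read on stdin. Hypothetical helper for illustration only.
patch_files() {
    sed -n 's/^Index: //p'
}

# Usage (run from the target directory):
#   patch_files <application.patch
```

If any listed path doesn't exist relative to the current directory, patch will most likely stop and ask which file to patch.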

Diff Format Incompatibility

The first issue is that svn diff may produce output that is slightly incompatible with patch. Specifically, any line starting with a backslash is asking for trouble. svn diff produces a lot of these:


\ No newline at end of file

After removing those lines the patch generally goes off without a hitch. There may be other incompatibilities, but I haven’t run into them.
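Removing those lines by hand gets tedious on a large patch. A minimal sketch of doing it with grep, assuming that every line starting with a backslash is one of these markers; strip_backslash_lines and the output file name are hypothetical:

```shell
# strip_backslash_lines: copy a patch from stdin to stdout, dropping every
# line that starts with a backslash ("\ No newline at end of file").
# Hypothetical helper for illustration only.
strip_backslash_lines() {
    # grep exits non-zero when nothing is left to print; don't treat that as an error.
    grep -v '^\\' || true
}

# Usage:
#   strip_backslash_lines <application.patch >application.clean.patch
```

One side effect to be aware of: without the marker, patch treats the affected line as ending in a newline, so the patched file may gain a trailing newline that the original lacked.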

Patch Tweaking

Each hunk of a unified diff has a line that looks like this:


@@ -1263,7 +1263,8 @@

The first number in each pair is the starting line in the old (-) and new (+) file, and the number after each comma is how many lines of that file appear in the hunk, including context lines. When a patch is applied, patch looks around the stated line number for the context lines and the lines marked with a minus (-), and attempts to replace the minus lines with the plus (+) lines. In order to manually edit a patch file you need to understand this, because the numbers will most likely have to be changed. If you do screw up the numbers, the worst that will happen is a patch error. Although I’m not intimately familiar with the exact algorithm, I believe the line counts (the numbers after the commas) are more important than the exact line numbers (the numbers before the commas). The reason is that the target file may have been changed, so the exact location of the lines to remove may have moved.
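After hand-editing a hunk it is easy to leave the counts stale. In a unified diff the old count is the number of context lines plus minus lines, and the new count is the number of context lines plus plus lines. The sketch below checks each hunk against that rule. It assumes svn diff style input, where each file's section starts with an "Index:" line, and check_hunk_counts is a hypothetical helper name:

```shell
# check_hunk_counts: read a unified diff on stdin and print one line for
# every hunk whose header counts disagree with its body.
# Hypothetical helper; assumes each file section starts with "Index:".
check_hunk_counts() {
    awk '
        function report() {
            if (in_hunk && (nold != want_old || nnew != want_new))
                printf "%s: body has -%d +%d lines\n", header, nold, nnew
        }
        /^@@ / {
            report()
            header = $0; in_hunk = 1; nold = 0; nnew = 0
            # Fields look like -1263,7 and +1263,8; a missing count means 1.
            n = split(substr($2, 2), a, ","); want_old = (n > 1) ? a[2] + 0 : 1
            n = split(substr($3, 2), b, ","); want_new = (n > 1) ? b[2] + 0 : 1
            next
        }
        !in_hunk { next }                 # file headers and other non-hunk lines
        /^ /  { nold++; nnew++; next }    # context line: counts for both files
        /^-/  { nold++; next }            # removed line: old file only
        /^\+/ { nnew++; next }            # added line: new file only
        /^\\/ { next }                    # "\ No newline at end of file" marker
        { report(); in_hunk = 0 }         # anything else ends the hunk
        END { report() }
    '
}

# Usage:
#   check_hunk_counts <application.patch
```

No output means every header matches its body. A reported hunk shows the counts the body actually contains, which are the numbers to put after the commas.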

The Sub Diff

In order to solve my whitespace problems, I copy the minus and plus sections of the particular hunk into two separate files in BBEdit. I then run the following regexp search and replace on each file, with an empty replacement string:


^(\+|-)

This strips the leading plus and minus markers and turns the sections back into regular files. I also do any necessary whitespace normalization and then run a diff comparison directly in BBEdit. If you don’t have BBEdit you can save them to disk and run a diff on the command line:


diff -ub temp_file1.txt temp_file2.txt

This will show you the real differences (non-whitespace) between the files so you can go back and tweak the patch.
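The copy-and-strip step can also be done from the command line. A minimal sketch, assuming the body of a single hunk (without its @@ header) has been saved to a file; hunk_sides and hunk.txt are hypothetical names:

```shell
# hunk_sides: split one hunk body into its minus and plus sections,
# with the leading - or + marker removed from each line.
#   $1 = file holding the hunk body
#   $2 = output file for the minus (old) lines
#   $3 = output file for the plus (new) lines
# Hypothetical helper for illustration only.
hunk_sides() {
    sed -n 's/^-//p' "$1" >"$2"
    sed -n 's/^+//p' "$1" >"$3"
}

# Usage:
#   hunk_sides hunk.txt temp_file1.txt temp_file2.txt
#   diff -ub temp_file1.txt temp_file2.txt
```

Context lines are left out on purpose, since they are identical on both sides and would not show up in the comparison anyway.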