Tom Lord - Diagnosing Subversion

People have recently said things here along the lines of "svn fails to
significantly improve upon CVS and, to the degree it does, meta-CVS
and dcvs do the same job in a better way" (I pretty much agree) and
"it looks like an ego driven project" (perhaps, but then I'd like to
think that arch is a pride driven project and ultimately, isn't that
just a slight difference in spin?).

I've thought a lot about "what went wrong" with svn (and take it as
axiomatic, on this list, that _something_ went wrong) for two reasons:
(1) like Bob, I really tried to like svn; (2) as I started to think
about "what went wrong" -- it seemed like what went wrong was a bunch
of mistakes of exactly the sort that I am inclined towards myself and
therefore have to actively resist: there, but for the grace of
something, stand I.

Here's what I think went wrong.  This is just my unscientific
impression based on following news of the project over the years.

A) It started with a brilliant idea for a hack: a transactional,
   write-once, versioned, hierarchical filesystem database.

   Around the time svn started, that idea was "going around" -- I 
   even had my own version for a little while.

   As an abstract data structure, that kind of database is a neat
   thing with many potential applications.  If you ever spend time
   trying to write robust data-intensive apps on top of a unix
   filesystem without using a database, you really long for that kind
   of functionality.

   Moreover, it's _conceptually_ simple to implement: it's essentially
   just trees written in a functional (side-effectless) style.  To
   "modify" a tree, you build a new tree, sharing unmodified nodes
   with the previous tree.  Seems relatively cheap and transactions
   just fall out of that nearly for free.
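
   To make the "build a new tree, share the unmodified nodes" idea
   concrete, here is a minimal sketch.  It assumes a toy model in
   which a directory is a dict and a file is a string; the names
   (write, rev0 ... rev3, revisions) are made up for illustration and
   have nothing to do with svn's actual code.

```python
from typing import Dict, Tuple, Union

# A node is either file contents (str) or a directory (dict of name -> node).
# Nodes are never mutated after creation; "modifying" builds new nodes.
Node = Union[str, Dict[str, "Node"]]


def write(root: Dict[str, Node], path: Tuple[str, ...], data: str) -> Dict[str, Node]:
    """Return a new root with `data` stored at `path`; `root` is left untouched."""
    name, rest = path[0], path[1:]
    new_dir = dict(root)  # shallow copy: siblings are shared, not duplicated
    if rest:
        child = root.get(name, {})
        if not isinstance(child, dict):
            raise ValueError(f"{name!r} is a file, not a directory")
        new_dir[name] = write(child, rest, data)  # copy only along the path
    else:
        new_dir[name] = data
    return new_dir


rev0: Dict[str, Node] = {}
rev1 = write(rev0, ("src", "main.c"), "int main(void) { return 0; }")
rev2 = write(rev1, ("doc", "README"), "hello")
rev3 = write(rev2, ("doc", "README"), "hello, world")

# "Committing" is just keeping the new root; old roots stay valid forever.
revisions = [rev0, rev1, rev2, rev3]
```

   Only the nodes along the modified path get copied, so rev3 still
   shares its "src" directory with rev1, and a half-built tree is
   invisible until its root is published.  That is the sense in which
   transactions come nearly for free, at least on paper.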

   So here's the first mistake:  the idea of a transactional FS is 
   like a shiny new hammer.  It's pretty natural to let it possess
   you and start running around looking for nails.


B) It took off from there with an underdeveloped notion of revision
   control.

   Suppose you have the same intuition that Walter expressed a while
   back, which I'll paraphrase as:  "The first and most fundamental
   task of a revision control system is to take snapshots of working
   directories." 

   If you don't believe that that's a seductive (even though wrong)
   intuition, go back and look at how I replied.  It took many, quite
   abstract paragraphs.  What revision control is really about
   (archival, access, and manipulation of changesets) is subtle and
   _non_-intuitive.  (Anecdotally, a few years before arch, I made an
   earlier attempt at revision control based on, guess what:
   snapshotting.)  What's worse is that a set of working tree
   snapshots combined with a little meta-data is a kind of dual space
   to the kinds of records rev ctl is really about (they're logically
   interconvertible representations).  Anything you say to a
   snapshotting fan about what you want to do with a
   changeset-librarian orientation they can reply to with "Yeah, but
   we could do that, too."  So it's not even that the snapshot
   intuition is completely wrong: it's just putting an emphasis on the
   wrong details.
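
   The "logically interconvertible" claim can be sketched in a few
   lines.  This assumes a deliberately crude model (a snapshot is a
   map from path to contents, a changeset is a map from path to new
   contents or None for a deletion); the names diff and apply are
   hypothetical, not any real tool's interface.

```python
from typing import Dict, List, Optional

Snapshot = Dict[str, str]              # path -> contents
Changeset = Dict[str, Optional[str]]   # path -> new contents, None = deleted


def diff(old: Snapshot, new: Snapshot) -> Changeset:
    """Changeset that turns `old` into `new`."""
    delta: Changeset = {p: c for p, c in new.items() if old.get(p) != c}
    delta.update({p: None for p in old if p not in new})
    return delta


def apply(base: Snapshot, delta: Changeset) -> Snapshot:
    """Snapshot obtained by applying `delta` to `base`."""
    result = dict(base)
    for path, contents in delta.items():
        if contents is None:
            result.pop(path, None)
        else:
            result[path] = contents
    return result


snapshots: List[Snapshot] = [
    {},
    {"README": "v1"},
    {"README": "v2", "Makefile": "all:"},
    {"Makefile": "all:"},
]

# snapshots -> changesets
changesets = [diff(a, b) for a, b in zip(snapshots, snapshots[1:])]

# changesets -> snapshots
rebuilt = [snapshots[0]]
for delta in changesets:
    rebuilt.append(apply(rebuilt[-1], delta))
```

   The round trip reproduces the original snapshots, which is all
   "interconvertible" means here.  It says nothing about which of
   the two views makes the better primary record.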

   Now the transactional filesystem DB takes snapshots handily.  It's
   ideal for that.  So if you have the snapshot-intuition, and the
   transactional fs hammer -- you're apt to leap to a wrong
   conclusion: you've solved the problem!

   And if, as some of the original svn contributors were, you're
   coming from hacking CVS and its screwy (historically constrained)
   repository format, an apparent escape route from that mess is just
   going to strengthen your convictions.

   Second mistake: The assumption that "a filesystem DB pretty much
   solves revision control -- all the rest is just a small matter of
   hacking".


C) It underwent fuzzy design conceptualization.

   I infer from some of the design documents and other materials that,
   early on, there must have been some bull sessions to plan out
   how svn would work.  As an example, "history-sensitive merging" 
   has been part of the plan (such as it is) for as long as I've been
   aware of the project.

   Whatever planning there was: it didn't nail the details.  Instead,
   it reduced a lot of problems, in a sort of hand-wavy manner, to
   meta-data mechanisms.  I'm guessing (and inferring from docs), for
   example, that somebody straw-manned an intelligent merge operator,
   never really worried about its limitations, but worried more about 
   what kind of meta-data it needed.   Since functionality like file 
   properties seemed more than adequate to record that meta-data,
   the problem was considered reduced to "a small matter of hacking".

   Well, that's the problem with some design patterns like attaching
   property lists to everything under the sun:  they don't really
   solve design problems but they give you an operational language
   in which to _restate_ design problems.  It's sometimes very hard to
   recognize the difference between a restatement of a design problem
   in operational terms and its actual solution.   Application of
   patterns like property lists in a design bull session all too easily
   gives rise to the feeling that "all the problems we're thinking
   about have natural solutions in this design" even though all you're
   really saying is "the problems we need to solve can be expressed
   in terms of associative lookup".
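
   A small sketch of that trap, under invented names: the
   "merged-from" property, record_merge, and smart_merge are all
   hypothetical and are not taken from svn's design.  Recording the
   meta-data is trivial; the operator that was supposed to use it is
   still an open question.

```python
from typing import Dict

# Hypothetical property list attached to one versioned file.
props: Dict[str, str] = {}


def record_merge(props: Dict[str, str], source: str, revision: int) -> None:
    """Storing merge history is easy: it is just associative lookup."""
    merged = props.get("merged-from", "")
    entry = f"{source}@{revision}"
    props["merged-from"] = f"{merged},{entry}" if merged else entry


def smart_merge(ours: str, theirs: str, props: Dict[str, str]) -> str:
    """The actual design problem: what should this do with the history?"""
    raise NotImplementedError("the meta-data format does not answer this")


record_merge(props, "branches/feature", 42)
record_merge(props, "branches/feature", 57)
```

   Everything above the NotImplementedError "works", and it is easy
   to mistake that for progress on the merge operator itself.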

   Third mistake: insufficient skepticism about their own design 
   sketches, early on.


D) Narrow design focus combined with grand technology ambitions

   The original contributors included people who worked on CVS,
   people who used CVS, and people working on products that 
   incorporate CVS.   In some sense, the itch they must have had
   in common was "Get this CVS monkey off my back;  I'm sick of it."

   At the same time, they (justifiably) had their eyes on a real
   treasure -- that transactional filesystem database.

   In that context, it'd be hard to get behind the idea of just
   incrementally fixing CVS.  It'd be hard to invent meta-CVS, for
   example.

   As the project has progressed, over the years, those conflicting
   goals have tended to be resolved in the "get 1.0 out the door"
   direction -- a scaling back of functionality ambitions in the
   direction of CVS necessitated by the degree of difficulty of the
   grand technology ambition.

   Fourth mistake: conflicting goals at opposite ends of the ambition
   spectrum -- the low end ultimately defining official success, the
   high end providing the personal motivations.


E) Leaping to unstable proto-standards.

   SVN came into being at a time when it looked to many like HTTP and
   Apache were the spec and best implementation of the new distributed
   OS for the world that would solve everything beautifully.  There
   was a kind of dog-pile onto everything W3C based on irrational
   exuberance.  Well, they weren't that OS and they don't solve
   everything beautifully.

   Fifth mistake: jumping on the W3C bandwagon.


F) The API Fallacy

   When you lack confidence about your intended way to implement
   something, a common pattern is to decide to hide the implementation
   under an API.   That way you can always change the implementation
   later, right?

   The problems are: (1) unless you have at least one fully worked
   design for how to implement your API, you shouldn't have any
   confidence that good implementations can exist; (2) unless you have
   at least two fully worked designs for how to implement your API,
   and they make usefully contrary trade-offs, you should really start
   to wonder whether doing extra work to make an abstraction here is
   the right way to proceed.

   Sixth mistake: assuming that defining APIs all over the place 
   would improve chances for success.



G) Collision with reality.

   The transactional filesystem idea is, indeed, conceptually simple
   and elegant.   The reality of implementing it, however, is a swamp
   of external considerations and inconvenient realities.

   Suppose you want to achieve high transaction rates and
   size-scalability.  You have a _lot_ to consider: locking
   (contention over the root of the tree is especially fun),
   deadlocks, logging, crash recovery, physical layout of data, I/O
   bottlenecks, network protocols, etc. etc.
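
   To see why the root of the tree is the fun part, consider this
   single-threaded sketch.  It assumes a hypothetical Repo class that
   publishes a new root only when the transaction was based on the
   latest one; a real implementation would also need locking around
   commit, which is omitted here.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, str]  # flattened stand-in for a versioned tree: path -> contents


class Repo:
    """Hypothetical store: a history of immutable roots, newest last."""

    def __init__(self) -> None:
        self.roots: List[Tree] = [{}]

    def begin(self) -> Tuple[int, Tree]:
        """Start a transaction: remember which root it was based on."""
        base = len(self.roots) - 1
        return base, dict(self.roots[base])

    def commit(self, base: int, tree: Tree) -> bool:
        """Install a new root only if nobody else committed since `base`."""
        if base != len(self.roots) - 1:
            return False  # stale: the caller must redo the work on the new root
        self.roots.append(tree)
        return True


repo = Repo()
base_a, tree_a = repo.begin()
base_b, tree_b = repo.begin()       # both transactions see the same root
tree_a["src/a.c"] = "A"
tree_b["doc/b.txt"] = "B"           # unrelated path, yet it meets A at the root

ok_a = repo.commit(base_a, tree_a)  # first committer wins
ok_b = repo.commit(base_b, tree_b)  # second is rejected and must retry
```

   The two transactions touch unrelated paths, but every commit has
   to replace the root, so they collide anyway.  Doing better than
   "reject and retry" is where the locking, merging, and deadlock
   questions start.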

   In short, implementing a really good, high-performance,
   transactional fs is an undertaking comparable in scope and
   complexity to implementing a really good, high-performance,
   relational database storage manager -- only while there's tons of
   literature and experience about RDB implementation, transactional
   filesystems are fresh territory.  (As an aside, if you were to
   seriously undertake to make a transactional FS, I think you would
   not want to burden yourself with the extra work of concurrently 
   building a revision control system on top of it -- give that task
   to a separate team after you have something working.)

   Wanting to make progress simply and quickly, they spotted the
   Berkeley DB library.   After all: it provides transactions with
   ACID properties for our favorite handwavy design tool -- the
   associative lookup table.   As we all know, design problems can be
   "solved" simply by restating them in terms of associative tables.
   And anyway, even if Berkeley DB isn't the _best_ choice to
   implement this, it'll be the fastest way to get something working,
   and anyway it'll be hidden behind an API.

   Well, I think Berkeley DB is a lousy choice for this application.
   It creates administrative headaches, and it's optimized for simple
   associations, not hierarchical filesystems.  It doesn't natively
   provide any sort of delta-compression -- you'll have to layer that.
   Ultimately _all_ that it buys you is transactions and locks --
   every other aspect is a force-fit.
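
   A rough illustration of the force-fit, using a plain Python dict
   as a stand-in for an associative table.  This is NOT the Berkeley
   DB API and not svn's real schema; the node-id layout and the
   lookup function are assumptions made up for the example.

```python
from typing import Dict, Tuple, Union

# Keys are node ids; a directory maps names to child ids, a file holds text.
Value = Union[str, Dict[str, int]]
table: Dict[int, Value] = {
    0: {"src": 1, "README": 2},   # root directory
    1: {"lib": 3},
    2: "read me",
    3: {"util.c": 4},
    4: "/* util */",
}


def lookup(path: str, root_id: int = 0) -> Tuple[Value, int]:
    """Resolve `path`, returning the value and how many table reads it took."""
    node, reads = table[root_id], 1
    for name in path.split("/"):
        if not isinstance(node, dict):
            raise NotADirectoryError(name)
        node, reads = table[node[name]], reads + 1
    return node, reads


contents, reads = lookup("src/lib/util.c")
```

   In this layout, resolving a three-component path takes four table
   reads, one per level of the hierarchy.  The table knows nothing
   about paths, directories, or deltas; all of that has to be layered
   on top by hand.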

   And what resulted?  Sure enough: years of fighting against
   excessive space consumption, disk-filling log files, and poor
   performance, characterized by substantial rewrites of core
   functionality and API changes.

   A similar mistake happened with network protocols: W3C solves
   everything from authentication to proxying to browsers-for-free;
   WebDAV just sweetens the deal, right?  Except, no, as with Berkeley
   DB for physical storage, a lot is left out and, again, the result
   has been years of rewrites and API changes trying to get somewhere
   in the neighborhood of good performance, plus lots of dependencies
   on unstable externally developed libraries.

   Seventh mistake: underestimating the degree of difficulty of a 
   transactional FS.

   Eighth mistake: overconfidence in dubiously selected building
   blocks.


H) Failure to Fail

   If a team went away for six months and came back with SVN as it
   works today, I think it'd be pretty easy to say: "That's a great and
   even useful prototype.  It definitely proves, at least
   schematically, the idea of using a transactional FS to back
   revision control.  There are clearly some bad choices in the
   implementation, and clearly some important neglected revision
   control issues that competing projects are starting to leapfrog you
   over.  And there's a heck of a lot of code here for what it does.
   Let's suspend development on it for a little while, and invest in
   a design effort to see how to get this thing _right_".

   That isn't the situation, though.  A team spent _years_ on this
   and they justified the project institutionally and publicly not by
   saying "let's build a proof of concept" but by saying "let's
   replace CVS".

   And, sure enough, if you suggest to them a stop-and-think phase at
   this late date you get back, basically: "Um...no...not gonna
   happen."

   I won't label that one of their mistakes because I don't think
   the root cause is something they could have easily avoided.
   I'll label it:

   First bad circumstance: crappy socio-economic circumstances for
   smart, ambitious programmers in the free software industry and
   community -- way too much weight given to what supposedly
   successful projects "smell like" and too much resulting pressure
   on hackers to project an image resembling that false idol.

So, in summary, I don't think they're a bunch of egomaniacs or
anything.  I think they rushed into something that looked like it
would be easier than it is, got boxed in by the mythologies of open
source and W3C, and now have way too much momentum and constraints to
do much about it.

What's really disappointed me most, though, is that, while I do
perceive them as smart and ambitious, they don't seem terribly
open-minded about stepping back to review their project for deep
structural mistakes that need fixing.  My sense is that most of them
are pretty young and several have been associated with some successful
projects (like Apache and CVS) -- good, young programmers, since they
tend to be capable of so much more than their average peers, often
fall into a pit of overconfidence which is hard to recognize from the
inside until you've experienced a few disasters.  The situation is
made worse since there's so little effective mentoring in the industry
from old-salts who are good at making a religion of the
K.I.S.S. principle and making fun of the wealth of bloated, crappy,
yet slow-to-fail stall-ware projects that dominate so much of the
landscape.  If you ask me, explosive growth during the dot-com bubble
really blunted the technology edges of the free software movement and
our industry generally.  It left us collectively struggling to do
things the hard way, svn being just one small example.

-t