Tom Lord - Diagnosing Subversion
People have recently said things here along the lines of "svn fails to
significantly improve upon CVS and, to the degree it does, meta-CVS
and dcvs do the same job in a better way" (I pretty much agree) and
"it looks like an ego driven project" (perhaps, but then I'd like to
think that arch is a pride driven project and ultimately, isn't that
just a slight difference in spin?).
I've thought a lot about "what went wrong" with svn (and take it as
axiomatic, on this list, that _something_ went wrong) for two reasons:
(1) like Bob, I really tried to like svn; (2) as I started to think
about "what went wrong" -- it seemed like what went wrong was a bunch
of mistakes of exactly the sort that I am inclined towards myself and
therefore have to actively resist: there, but for the grace of
something, stand I.
Here's what I think went wrong. This is just my unscientific
impression based on following news of the project over the years.
A) It started with a brilliant idea for a hack: a transactional,
write-once, versioned, hierarchical filesystem database.
Around the time svn started, that idea was "going around" -- I
even had my own version for a little while.
As an abstract data structure, that kind of database is a neat
thing with many potential applications. If you ever spend time
trying to write robust data-intensive apps on top of a unix
filesystem without using a database, you really long for that kind
of functionality.
Moreover, it's _conceptually_ simple to implement: it's essentially
just trees written in a functional (side-effectless) style. To
"modify" a tree, you build a new tree, sharing unmodified nodes
with the previous tree. Seems relatively cheap and transactions
just fall out of that nearly for free.
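To make the "trees written in a functional style" idea concrete, here is a minimal sketch (my own illustration, not svn's actual data model) of path-copying: writing a file builds a new spine of nodes from the changed leaf up to the root, and every untouched subtree is shared with the previous revision, so each root is a cheap, consistent snapshot.

```python
# A write-once, functional tree, path-copying style.  "Modifying" a file
# rebuilds only the nodes on the path to the root; all other subtrees are
# shared with the previous revision, so committing is just publishing a
# new root.

class Node:
    def __init__(self, name, children=None, content=None):
        self.name = name
        self.children = children if children is not None else {}
        self.content = content

def write(root, path, content):
    """Return a NEW root with `content` at `path`; the old root is untouched."""
    if not path:
        return Node(root.name, root.children, content)
    head, rest = path[0], path[1:]
    child = root.children.get(head, Node(head))
    new_children = dict(root.children)          # copy only this directory level
    new_children[head] = write(child, rest, content)
    return Node(root.name, new_children)

base = write(Node("/"), ["docs", "notes"], "some notes")
rev1 = write(base, ["trunk", "README"], "hello")
rev2 = write(rev1, ["trunk", "README"], "hello, world")

assert rev1.children["trunk"].children["README"].content == "hello"   # old rev intact
assert rev2.children["docs"] is rev1.children["docs"]                 # subtree shared
```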
So here's the first mistake: the idea of a transactional FS is
like a shiny new hammer. It's pretty natural to let it possess
you and start running around looking for nails.
B) It took off from there with an underdeveloped notion of revision
control.
Suppose you have the same intuition that Walter expressed a while
back, which I'll paraphrase as: "The first and most fundamental
task of a revision control system is to take snapshots of working
directories."
If you don't believe that that's a seductive (even though wrong)
intuition, go back and look at how I replied. It took many, quite
abstract paragraphs. What revision control is really about
(archival, access, and manipulation of changesets) is subtle and
_non_-intuitive. (Anecdotally, a few years before arch, I made an
earlier attempt at revision control based on, guess what:
snapshotting.) What's worse is that a set of working tree
snapshots combined with a little meta-data is a kind of dual space
to the kinds of records rev ctl is really about (they're logically
interconvertible representations). Anything you tell a
snapshotting fan you want to do with a changeset-librarian
orientation, they can reply to with "Yeah, but we could do that,
too." So it's not even that the snapshot
intuition is completely wrong: it's just putting an emphasis on the
wrong details.
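The duality is easy to see in miniature. In this hedged sketch (illustrative names; trees flattened to path-to-content maps), successive snapshots determine the changesets between them, and a base snapshot plus those changesets reconstructs any later snapshot -- the two representations carry the same information:

```python
# Snapshots and changesets as interconvertible representations.

def diff(old, new):
    """Changeset between two snapshots; None means the path was deleted."""
    cs = {}
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):
            cs[path] = new.get(path)
    return cs

def apply_changeset(tree, cs):
    """Rebuild the next snapshot from the previous one plus a changeset."""
    out = dict(tree)
    for path, content in cs.items():
        if content is None:
            out.pop(path, None)
        else:
            out[path] = content
    return out

s0 = {"README": "v1", "main.c": "int main(){}"}
s1 = {"README": "v2", "util.c": "..."}           # main.c deleted, util.c added

cs = diff(s0, s1)
assert apply_changeset(s0, cs) == s1             # snapshots -> changesets -> snapshots
```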
Now the transactional filesystem DB takes snapshots handily. It's
ideal for that. So if you have the snapshot-intuition, and the
transactional fs hammer -- you're apt to leap to a wrong
conclusion: you've solved the problem!
And if, as some of the original svn contributors were, you're
coming from hacking CVS and its screwy (historically constrained)
repository format, an apparent escape route from that mess is just
going to strengthen your convictions.
Second mistake: The assumption that "a filesystem DB pretty much
solves revision control -- all the rest is just a small matter of
hacking".
C) It underwent fuzzy design conceptualization.
I infer from some of the design documents and other materials that,
early on, there must have been some bull sessions to plan out
how svn would work. As an example, "history-sensitive merging"
has been part of the plan (such as it is) for as long as I've been
aware of the project.
Whatever planning there was: it didn't nail the details. Instead,
it reduced a lot of problems, in a sort of hand-wavy manner, to
meta-data mechanisms. I'm guessing (and inferring from docs), for
example, that somebody straw-manned an intelligent merge operator,
never really worried about its limitations, but worried more about
what kind of meta-data it needed. Since functionality like file
properties seemed more than adequate to record that meta-data,
the problem was considered reduced to "a small matter of hacking".
Well, that's the problem with some design patterns like attaching
property lists to everything under the sun: they don't really
solve design problems but they give you an operational language
in which to _restate_ design problems. It's sometimes very hard to
recognize the difference between a restatement of a design problem
in operational terms and its actual solution. Application of
patterns like property lists in a design bull session all too easily
gives rise to the feeling that "all the problems we're thinking
about have natural solutions in this design" even though all you're
really saying is "the problems we need to solve can be expressed
in terms of associative lookup".
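To see the restatement trap in miniature, here is a toy (entirely hypothetical) property-list mechanism. It can record any meta-data a history-sensitive merge might consult, and yet it contains not one line of merge logic -- the hard problem has been expressed, not solved:

```python
# A generic property list attached to paths: pure associative lookup.
# Note that nothing here merges anything.

properties = {}   # path -> {property name: value}

def set_prop(path, name, value):
    properties.setdefault(path, {})[name] = value

def get_prop(path, name):
    return properties.get(path, {}).get(name)

# We can "express" merge history...
set_prop("trunk/foo.c", "merged-from", ["branches/b1@10", "branches/b2@14"])

# ...but the intelligent merge operator that would consume it is still
# entirely unwritten.  The design problem has only been restated.
assert get_prop("trunk/foo.c", "merged-from") == ["branches/b1@10", "branches/b2@14"]
```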
Third mistake: insufficient skepticism about their own design
sketches, early on.
D) Narrow design focus combined with grand technology ambitions
The original contributors included people who worked on CVS,
people who used CVS, and people working on products that
incorporate CVS. In some sense, the itch they must have had
in common was "Get this CVS monkey off my back; I'm sick of it."
At the same time, they (justifiably) had their eyes on a real
treasure -- that transactional filesystem database.
In that context, it'd be hard to get behind the idea of just
incrementally fixing CVS. It'd be hard to invent meta-CVS, for
example.
As the project has progressed, over the years, those conflicting
goals have tended to be resolved in the "get 1.0 out the door"
direction -- a scaling back of functionality ambitions in the
direction of CVS necessitated by the degree of difficulty of the
grand technology ambition.
Fourth mistake: conflicting goals at opposite ends of the ambition
spectrum -- the low end ultimately defining official success, the
high end providing the personal motivations.
E) Leaping to unstable proto-standards.
SVN came into being at a time when it looked to many like HTTP and
Apache were the spec and best implementation of the new distributed
OS for the world that would solve everything beautifully. There
was a kind of dog-pile onto everything W3C based on irrational
exuberance. Well, they weren't that OS and they don't solve
everything beautifully.
Fifth mistake: jumping on the W3C bandwagon.
F) The API Fallacy
When you lack confidence about your intended way to implement
something, a common pattern is to decide to hide the implementation
under an API. That way you can always change the implementation
later, right?
The problems are: (1) unless you have at least one fully worked
design for how to implement your API, you shouldn't have any
confidence that good implementations can exist; (2) unless you have
at least two fully worked designs for how to implement your API,
and they make usefully contrary trade-offs, you should really start
to wonder whether doing extra work to make an abstraction here is
the right way to proceed.
Sixth mistake: assuming that defining APIs all over the place
would improve chances for success.
G) Collision with reality.
The transactional filesystem idea is, indeed, conceptually simple
and elegant. The reality of implementing it, however, is a swamp
of external considerations and inconvenient realities.
Suppose you want to achieve high transaction rates and
size-scalability. You have a _lot_ to consider: locking
(contention over the root of the tree is especially fun),
deadlocks, logging, crash recovery, physical layout of data, I/O
bottlenecks, network protocols, etc. etc.
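The root-contention point deserves emphasis: in a path-copied tree, every commit, however small, ends by publishing a new root, so the root pointer is a global serialization point. A toy sketch (an assumed design, not svn's implementation) of an optimistic commit loop shows where writers pile up:

```python
# Every writer, no matter which file it touches, must win the race to
# swing the single root pointer -- hence "contention over the root of
# the tree is especially fun".

import threading

class Repo:
    def __init__(self, root):
        self._root = root            # the one mutable cell in the system
        self._lock = threading.Lock()

    def commit(self, build_new_root):
        # Optimistic loop: build a new tree against a possibly stale root,
        # then try to publish it; on a race, rebuild against the winner.
        while True:
            base = self._root
            candidate = build_new_root(base)
            with self._lock:         # every writer funnels through here
                if self._root is base:
                    self._root = candidate
                    return candidate
            # lost the race: someone else committed first; retry

repo = Repo(0)
repo.commit(lambda root: root + 1)   # stand-in for "build a new tree"
assert repo._root == 1
```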
In short, implementing a really good, high-performance,
transactional fs is an undertaking comparable in scope and
complexity to implementing a really good, high-performance,
relational database storage manager -- only while there's tons of
literature and experience about RDB implementation, transactional
filesystems are fresh territory. (As an aside, if you were to
seriously undertake to make a transactional FS, I think you would
not want to burden yourself with the extra work of concurrently
building a revision control system on top of it -- give that task
to a separate team after you have something working.)
Wanting to make progress simply and quickly, they spotted the
Berkeley DB library. After all: it provides transactions with
ACID properties for our favorite handwavy design tool -- the
associative lookup table. As we all know, design problems can be
"solved" simply by restating them in terms of associative tables.
And anyway, even if Berkeley DB isn't the _best_ choice to
implement this, it'll be the fastest way to get something working,
and anyway it'll be hidden behind an API.
Well, I think Berkeley DB is a lousy choice for this application.
It creates administrative headaches, and it's optimized for simple
associations, not hierarchical filesystems. It doesn't natively
provide any sort of delta-compression -- you'll have to layer that.
Ultimately _all_ that it buys you is transactions and locks --
every other aspect is a force-fit.
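For the delta-compression point, here is a sketch of what "layer that" means in practice (an assumed design, not svn's actual schema): the flat store only maps keys to opaque values, so revision N of a file has to be stored as instructions for rebuilding it from revision N-1, and reads have to walk the chain.

```python
# Delta storage layered over a flat key-value table.

import difflib

store = {}  # stand-in for a Berkeley DB table: key -> opaque value, nothing more

def _delta(old, new):
    """Keep only what's needed to rebuild `new` from `old`."""
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=old, b=new).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))     # reuse a slice of the old text
        else:
            ops.append(("insert", new[j1:j2]))
    return ops

def _apply(old, ops):
    return "".join(old[op[1]:op[2]] if op[0] == "copy" else op[1] for op in ops)

def put(path, rev, text):
    if rev == 0:
        store[(path, rev)] = ("full", text)
    else:
        store[(path, rev)] = ("delta", _delta(get(path, rev - 1), text))

def get(path, rev):
    kind, payload = store[(path, rev)]
    if kind == "full":
        return payload
    return _apply(get(path, rev - 1), payload)   # walk the delta chain

put("README", 0, "v1: hello\n")
put("README", 1, "v1: hello\nv2: more\n")
assert store[("README", 1)][0] == "delta"
assert get("README", 1) == "v1: hello\nv2: more\n"
```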
And what resulted? Sure enough: years of fighting against
excessive space consumption, disk-filling log files, and poor
performance, characterized by substantial rewrites of core
functionality and API changes.
A similar mistake happened with network protocols: W3C solves
everything from authentication to proxying to browsers-for-free;
WebDAV just sweetens the deal, right? Except, no, as with Berkeley
DB for physical storage, a lot is left out and, again, the result
has been years of rewrites and API changes trying to get somewhere
in the neighborhood of good performance, plus lots of dependencies
on unstable externally developed libraries.
Seventh mistake: underestimating the degree of difficulty of a
transactional FS.
Eighth mistake: overconfidence in dubiously selected building
blocks.
H) Failure to Fail
If a team went away for six months and came back with SVN as it
works today, I think it'd be pretty easy to say: "That's a great and
even useful prototype. It definitely proves, at least
schematically, the idea of using a transactional FS to back
revision control. There are clearly some bad choices in the
implementation, and clearly some important neglected revision
control issues that competing projects are starting to leapfrog you
over. And there's a heck of a lot of code here for what it does.
Let's suspend development on it for a little while, and invest in
a design effort to see how to get this thing _right_".
That isn't the situation, though. A team spent _years_ on this
and they justified the project institutionally and publicly not by
saying "let's build a proof of concept" but by saying "let's
replace CVS".
And, sure enough, if you suggest to them a stop-and-think phase at
this late date you get back, basically: "Um...no...not gonna
happen."
I won't label that one of their mistakes because I don't think
the root cause is something they could have easily avoided.
I'll label it:
First bad circumstance: crappy socio-economic circumstances for
smart, ambitious programmers in the free software industry and
community -- way too much weight given to what supposedly
successful projects "smell like" and too much resulting pressure
on hackers to project an image resembling that false idol.
So, in summary, I don't think they're a bunch of egomaniacs or
anything. I think they rushed into something that looked like it
would be easier than it is, got boxed in by the mythologies of open
source and W3C, and now have way too much momentum and constraints to
do much about it.
What's really disappointed me most, though, is that, while I do
perceive them as smart and ambitious, they don't seem terribly
open-minded about stepping back to review their project for deep
structural mistakes that need fixing. My sense is that most of them
are pretty young and several have been associated with some successful
projects (like Apache and CVS) -- good, young programmers, since they
tend to be capable of so much more than their average peers, often
fall into a pit of overconfidence which is hard to recognize from the
inside until you've experienced a few disasters. The situation is
made worse since there's so little effective mentoring in the industry
from old-salts who are good at making a religion of the
K.I.S.S. principle and making fun of the wealth of bloated, crappy,
yet slow-to-fail stall-ware projects that dominate so much of the
landscape. If you ask me, explosive growth during the dot-com bubble
really blunted the technology edges of the free software movement and
our industry generally. It left us collectively struggling to do
things the hard way, svn being just one small example.
-t