S-exp vs XML, by Erik Naggum
From: Erik Naggum <erik@naggum.no>
Newsgroups: comp.lang.lisp
Subject: Re: S-exp vs XML, HTML, LaTeX (was: Why lisp is growing)
Date: 28 Dec 2002 03:08:55 +0000
Organization: Naggum Software, Oslo, Norway
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
* thelif...@gmx.net (thelifter)
| I don't understand your criticism of XML.
I sometimes regret that human memory is such a great tool for one's
personal life that coming to rely on the wider context it provides
in one's communication with others is so fragile. I have explained
this dozens of times, but I guess each repetition adds something.
| Basically XML is just another way of writing S-expr or Trees or
| whatever you want to call it.
They are not identical. The aspects you are willing to ignore are
more important than the aspects you are willing to accept. Robbery
is not just another way of making a living, rape is not just another
way of satisfying basic human needs, torture is not just another way
of interrogation. And XML is not just another way of writing S-exps.
There are some things in life that you do not do if you want to be a
moral being and feel proud of what you have accomplished.
SGML was a major improvement on the markup languages that preceded
it (including GML), which helped create better publishing systems
and helped people think about information in much improved ways, but
when the zealots forgot the publishing heritage and took the notion
that information can be separated from presentation out of the world
of publishing into general data representation because SGML had had
some success in "database publishing", something went awry, not only
superficially, but fundamentally. It is not unlike when a baby,
whose mother satisfies its every need before it is even aware that
it has been expressed, grows up to believe that the world in general
is both influenced by and obliged to satisfy its whims. Even though
nobody in their right mind would argue that babies should fend for
themselves and earn their own living, at some point in the child's
life, it must begin a progression towards independence, which is not
merely a quantitative difference from having every need satisfied by
crying, but a qualitative difference of enormous consequence. Many
an idea or concept not only looks, but /is/ good in its infancy, yet
turns destructive later in life. Scaling and maturation are not the
obvious processes they appear to be because they take so much time
that the accumulated effort is easy to overlook. To be successful,
they must also be very carefully guided by people who can envision
the end result, but that makes it appear to many as if it merely
"happens". Take a good idea out of its infancy, let it age without
guidance so it does not mature, and it generally goes bad. If GML
was an infant, SGML is the bright youngster who far exceeded expectations
and made its parents too proud, but XML is the drug-addicted gang
member who had committed his first murder before he had sex, which
was rape.
SGML is a good idea when the markup overhead is less than 2%. Even
attributes is a good idea when the textual element contents is the
"real meat" of the document and attributes only aid processing, so
that the printed version of a fully marked-up document has the same
characters as the document sans tags. Explicit end-tags is a good
idea when the distance between start- and end-tag is more than the
20-line terminal the document is typed on. Minimization is a good
idea in an already sparsely tagged document, both because tags are
hard to keep track of and because clusters of tags are so intrusive.
Character entities is a good idea when your entire character set is
EBCDIC or ASCII. Validating the input prior to processing is a good
idea when processing would take minutes, if not hours, and consume
costly resources, only to abend. SGML had an important potential in
its ability to let the information survive changes in processing
equipment or software where its predecessors clearly failed. But,
to continue the baby metaphor, you have to go into fetishism to keep
using diapers as you age but fail to mature. (I note in passing that
the stereotypical American male longs for much larger than natural
female breasts, presumably to maintain the proportion to his own
size from his infancy, which has caused the stereotypical American
female to feel a need for breasts that will give the next generation
a demand for even more disproportionally large breasts.) When the
markup overhead exceeds 200%, when attribute values and element
contents compete for the information, when the distance between 99%
of the "tags" is /zero/, when the character set is Unicode, and when
validation takes more time than processing, not to mention the sorry
fact that information longevity is more /threatened/ by XML than by
any other data representation in the history of computing, then SGML
has gone from good kid, via bad teenager, to malfunctioning, evil
adult as XML. SGML was in many ways smarter than necessary at the
time it was a bright idea, it was evidence of too much intelligence
applied to the problems it solved. A problem mankind has not often
had to deal with is that of excessive intelligence; more often than
not, technological solutions are barely intelligent enough to solve
the problem at hand. If a solution is much smarter than the problem
and really stupid people notice it, they believe they have got their
hands on something /great/, and so they destroy it, not unlike how
giving stupid people too much power can threaten world peace and
unravel legal concepts like due process and presumption of innocence.
I once believed that it would be very beneficial for our long-term
information needs to adorn the text with as much meta-information as
possible. I still believe that the world would be far better off if
it had evolved standardized syntactic notations for time, location,
proper names, language, etc, and that even prose text would be
written in such a way that precision in these matters would not be
sacrificed, but most people are so obsessively concerned with their
immediate personal needs that anything that could be beneficial on a
much larger scale has no chance of surviving. Look at the United
States of America, with its depressingly moronic units instead of
going metric, with its inability to write dates in either ascending
or descending order of unit size, and with its insistence upon the
12-hour clock, clearly evidencing the importance of the short-term
pain threshold and resistance to doing anyone else's bidding. And
now the one-time freest nation of the world has turned dictatorship
with a dangerous moron in charge, set to attack Iraq to revenge his
father's loss. Those who laughed when I said that stupidity is the
worst threat to mankind laugh no more; they wait with bated breath
to see if the world's most powerful incoherent moron will launch the
world into a world war simply because he is too fucking stupid. But
what really pisses me off is the spineless American people who fail
to stop this madness. Presidents have been shot and killed before.
I seem to be digressing -- the focal point is that the masses, those
who exert no effort to better themselves, cannot be expected to help
solve any problems larger than their own, and so they must be forced
by various means, such as compulsory education, spelling checkers,
newspaper editors who do /not/ publish their letters to the editor,
and not least by the courts that restrain the will to revenge, in
order to keep a modicum of sanity in the frail structure that is
human society. We are clearly not at the stage of human development
where writers are willing to accept the burden of communicating to
the machine what they are thinking. One has to marvel at the wide
acceptance of our existing punctuation marks and the sociology of
their acceptance. "Tagging" text for semantic constructs that the
human mind is able to discern from context must be millennia off.
In many ways, the current American presidency and XML have much in
common. Both have clear lineages back to very intelligent people.
Both demonstrate what happens when you give retards the tools of the
intelligent. Some Americans obsess over gun control, to limit the
number of handguns in the hands of their civilians, but support the
most out-of-control nutcase in the young history of the nation and
rally behind his world-threatening abuse of guns. The once noble
concern over validation to curb excessive costs of too powerful a
tool for the people who used it, has turned into an equally insane
abuse of power in the XML world. How could such staggering idiots
as have become "leaders" of the XML world and the free world come to
their power? Clearly, they gain support from the masses who have no
concerns but their immediate needs, no ability to look for long-term
solutions and stability, no desire to think further ahead than that
each individual decision they make be the best for them. Lethargy
and pessimism, lack of long-term goals, apathy towards consequences,
they are all symptoms of depressed people, and it is perhaps no
coincidence that the world economy is now in a depression. My take
on it is that it is because too much growth also rewarded people of
such minuscule intellectual prowess that they turned to fraud rather
than tackle the coming negative trends intelligently. Whether Enron
or W3C or the GOP, everyone knows that fraud does pay in the short
term and that bad money drives out good. When even the staggering
morons are rewarded, the honest and intelligent must lose, and even
the best character will have a problem when being honest means that
he forfeits a chance to receive a hundred million dollars. In both
the Bush administration and the W3C standards administration, we see
evidence that large groups of people did not believe that it would
matter who assumed power. I am quite certain that just as Bush is
supposed to be a thoroughly /likable/ person, the people who work up
the most demented "standards" in the W3C lack that personality trait
that is both abrasive and exhibits leadership potential. When the
overall growth of something is so rapid that an idiotic decision no
longer causes any immediate losses, the number of such decisions
will grow without bounds until the losses materialize, such as in an
economic depression. When the losses are so diffused as to not even
affect the idiots behind the decisions, they can stay in power for a
very long time until they are blamed for a large number of ills they
had no power to predict, but that is precisely what caused them.
| I use XML on a daily basis and think it is a simple and intelligent
| way to represent data.
A comment on this statement is by now entirely superfluous.
| I would like to hear why you think it is so bad, can you be more
| specific please?
If you really need more information, search the Net, please.
| And how would you improve on it?
A brief summary, then: Remove the syntactic mess that is attributes.
(You will then find that you do not need them at all.) Enclose the
/element/ in matching delimiters, not the tag. These simple things
make people think differently about how they use the language.
Contrary to the foolish notion that syntax is immaterial, people
optimize the way they express themselves, and so express themselves
differently with different syntaxes. Next, introduce macros that
look exactly like elements, but that are expanded in place between
the reader and the "object model". Then, remove the obnoxious
character entities and escape special characters with a single
character, like \, and name other entities with letters following
the same character. If you need a rich set of publishing symbols,
discover Unicode. Finally, introduce a language for micro-parsers
that can take more convenient syntaxes for commonly used elements
with complex structure and make them /return/ element structures
more suitable for processing on the receiving end, and which would
also make validation something useful. The overly simple regular
expression look-alike was a good idea when processing was expensive
and made all decisions at the start-tag, but with a DOM and less
stream-like processing, a much better language should be specified
that could also do serious computation before validating a document
-- so that once again processing could become cheaper because of the
"markup", not more expensive because of it.
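As an illustration only, and not code from the original post, the
first few proposals above (delimiters around the whole element, a
single escape character, and macros expanded between the reader and
the object model) might be sketched in Python roughly as follows. The
names read_element and expand_iso_date, the iso-date element, and the
list-based object model are all assumptions made for this sketch.

```python
def read_element(text, pos=0, macros=None):
    """Read one element starting at text[pos] == '('.

    Returns (element, next_pos). An element is a list
    [name, child, ...] where each child is a text string or a
    nested element. Simplification: text runs are stripped of
    surrounding whitespace.
    """
    macros = macros or {}
    if text[pos] != "(":
        raise ValueError(f"expected '(' at position {pos}")
    pos += 1

    # The element name runs up to the first whitespace or delimiter.
    start = pos
    while pos < len(text) and text[pos] not in " \t\n()":
        pos += 1
    name = text[start:pos]

    children, buf = [], []

    def flush():
        # Turn the pending characters into one text child, if any.
        chunk = "".join(buf).strip()
        if chunk:
            children.append(chunk)
        buf.clear()

    while pos < len(text):
        ch = text[pos]
        if ch == "\\":
            # Single-character escape: take the next character literally.
            if pos + 1 >= len(text):
                raise ValueError("dangling escape character")
            buf.append(text[pos + 1])
            pos += 2
        elif ch == "(":
            flush()
            child, pos = read_element(text, pos, macros)
            children.append(child)
        elif ch == ")":
            flush()
            element = [name] + children
            # Macros look exactly like elements, but are expanded in
            # place before the result reaches the object model.
            if name in macros:
                element = macros[name](children)
            return element, pos + 1
        else:
            buf.append(ch)
            pos += 1
    raise ValueError("unterminated element")


def expand_iso_date(children):
    """Hypothetical macro: (iso-date 2002-12-28) becomes a date element."""
    year, month, day = children[0].split("-")
    return ["date", ["year", year], ["month", month], ["day", day]]


doc, _ = read_element(r"(p Hello \(world\) (em now))")
print(doc)

dated, _ = read_element("(p Posted (iso-date 2002-12-28))",
                        macros={"iso-date": expand_iso_date})
print(dated)
```

The macro hook also hints at the micro-parser idea: a compact
notation in the source, a fuller element structure on the receiving
end. A real design would need a proper language for such parsers;
this sketch merely calls a Python function.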
But the one thing I would change the most from a markup language
suitable for marking up the incidental instruction to a type-setter
to the data representation language suitable for the "market" that
XML wants, is to go for a binary representation. The reasons for
/not/ going binary when SGML competed with ODA have been reversed:
When information should survive changes in the software, it was an
important decision to make the data format verbose enough that it
was easy to implement a processor for it and that processors could
liberally accept what other processors conservatively produced, but
now that the data formats that employ XML are so easily changed
that the software can no longer keep up with it, we need to slam on
the brakes and tell the redefiners to curb their enthusiasm, get it
right before they share their experiments with the world, and show
some respect for their users. One way to do that is to increase the
cost of changes to implementations without sacrificing readability
and without making the data format more "brittle", by going binary.
Our information infrastructure has become so much better that the
nature of optimization for survivability has changed qualitatively.
The question of what we humans need to read and write no longer has
any bearing on what the computers need to work with. One of the
most heinous crimes against computing machinery is therefore to
force them to parse XML when all they want is the binary data. As
an example, think of the Internet Protocol and Transmission Control
Protocol in XML terms. Implementors of SNMP regularly complained
that parsing the ASN.1 encodings took a disproportionate amount of
processing time, but they also acknowledged that properly done, it
mapped directly to the values they needed to exchange. Now, think
of what would have happened had it not been a Simple, but instead
some moronic excuse for an eXtensible Network Management Protocol.
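To make the IP/TCP thought experiment concrete, here is a rough
Python sketch, not taken from the original post, that encodes the
same three values once as fixed-width binary fields and once as XML.
The field selection (two ports and a sequence number), the element
names, and the function names are assumptions for illustration; this
is not the real TCP header layout.

```python
import struct
import xml.etree.ElementTree as ET

# Network byte order: two 16-bit ports and one 32-bit sequence number.
HEADER = struct.Struct("!HHI")
FIELDS = ("src-port", "dst-port", "seq")  # hypothetical element names


def encode_binary(src_port, dst_port, seq):
    """Pack the three values into a fixed 8-byte record."""
    return HEADER.pack(src_port, dst_port, seq)


def decode_binary(data):
    """Unpack the record; the bytes map directly onto the values."""
    return HEADER.unpack(data)


def encode_xml(src_port, dst_port, seq):
    """Serialize the same three values as one XML element per field."""
    root = ET.Element("header")
    for tag, value in zip(FIELDS, (src_port, dst_port, seq)):
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root)


def decode_xml(data):
    """Parse the XML, then convert each text node back to an integer."""
    root = ET.fromstring(data)
    return tuple(int(root.findtext(tag)) for tag in FIELDS)


values = (1234, 80, 305419896)
binary = encode_binary(*values)
xml = encode_xml(*values)
print(len(binary), "bytes of binary versus", len(xml), "bytes of XML")
```

Both forms round-trip to the same tuple, but the binary record is
always 8 bytes and decodes with a single unpack call, while the XML
form must be tokenized, parsed into a tree, and converted from text.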
Another thing is that we have long had amazingly rich standards for
such "display attributes" as many now use HTML and the like for. The
choice to use SGML for web publication was not entirely braindead,
but it should have been obvious from the outset that page display
would become important, if not immediately, then after watching what
people were trying to do with HTML. The Web provided me with a much
needed realization that information cannot be /fully/ separated from
its presentation, and showed me something I knew without verbalizing
explicitly, that the presentation form we choose communicates real
information. Encoding all of it via markup would require a very
fine level of detail, not to mention /awareness/ of issues so widely
dispersed in the population that only a handful of people per
million grasp them. Therefore, to be successful, there must be an
upper limit to the complexity of the language defined with SGML, and
one must go on to solve the next problem, not sit idle with a set of
great tools and think "I ought to use these tools for something".
Stultifying as the language of content models may be, it amazes me
that people do not grasp that they need to use something else when
it becomes too painful to express with SGML, but I am in the highly
privileged position of knowing a lot more than SGML when I pronounce
my judgment on XML. For one thing, I knew Lisp before I saw SGML,
so I know what brilliant minds can do under optimal conditions and
when they ensure that the problem is still bigger than the solution.
--
Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.