From: Erik Naggum <e...@naggum.no>
Newsgroups: comp.lang.lisp
Subject: Re: S-exp vs XML, HTML, LaTeX (was: Why lisp is growing)
Message-ID: <3250033735497397@naggum.no>
Date: 28 Dec 2002 03:08:55 +0000

* thelif...@gmx.net (thelifter)
| I don't understand your criticism of XML.

I sometimes regret that human memory is such a great tool for one's personal
life that coming to rely on the wider context it provides in one's
communication with others is so fragile.  I have explained this dozens of
times, but I guess each repetition adds something.

| Basically XML is just another way of writing S-expr or Trees or
| whatever you want to call it.

They are not identical.  The aspects you are willing to ignore are more
important than the aspects you are willing to accept.  Robbery is not just
another way of making a living, rape is not just another way of satisfying
basic human needs, torture is not just another way of interrogation.  And XML
is not just another way of writing S-exps.  There are some things in life that
you do not do if you want to be a moral being and feel proud of what you have
accomplished.

SGML was a major improvement on the markup languages that preceded it
(including GML), which helped create better publishing systems and helped
people think about information in much improved ways, but when the zealots
forgot the publishing heritage and took the notion that information can be
separated from presentation out of the world of publishing into general data
representation because SGML had had some success in "database publishing",
something went awry, not only superficially, but fundamentally.  It is not
unlike when a baby, whose mother satisfies its every need before it is even
aware that it has been expressed, grows up to believe that the world in general
is both influenced by and obliged to satisfy its whims.  Even though nobody in
their right mind would argue that babies should fend for themselves and earn
their own living, at some point in the child's life, it must begin a
progression towards independence, which is not merely a quantitative difference
from having every need satisfied by crying, but a qualitative difference of
enormous consequence.  Many an idea or concept not only looks, but /is/ good in
its infancy, yet turns destructive later in life.  Scaling and maturation are
not the obvious processes they appear to be because they take so much time that
the accumulated effort is easy to overlook.  To be successful, they must also
be very carefully guided by people who can envision the end result, but that
makes it appear to many as if it merely "happens".  Take a good idea out of its
infancy, let it age without guidance so it does not mature, and it generally
goes bad.  If GML was an infant, SGML is the bright youngster who far exceeds
expectations and made its parents too proud, but XML is the drug-addicted gang
member who had committed his first murder before he had sex, which was rape.

SGML is a good idea when the markup overhead is less than 2%.  Even attributes
is a good idea when the textual element contents is the "real meat" of the
document and attributes only aid processing, so that the printed version of a
fully marked-up document has the same characters as the document sans tags.
Explicit end-tags is a good idea when the distance between start- and end-tag
is more than the 20-line terminal the document is typed on.  Minimization is a
good idea in an already sparsely tagged document, both because tags are hard to
keep track of and because clusters of tags are so intrusive.  Character
entities is a good idea when your entire character set is EBCDIC or ASCII.
Validating the input prior to processing is a good idea when processing would
take minutes, if not hours, and consume costly resources, only to abend.  SGML
had an important potential in its ability to let the information survive
changes in processing equipment or software where its predecessors clearly
failed.  But, to continue the baby metaphor, you have to go into fetishism to
keep using diapers as you age but fail to mature. (I note in passing that the
stereotypical American male longs for much larger than natural female breasts,
presumably to maintain the proportion to his own size from his infancy, which
has caused the stereotypical American female to feel a need for breasts that
will give the next generation a demand for even more disproportionally large
breasts.)  When the markup overhead exceeds 200%, when attribute values and
element contents compete for the information, when the distance between 99% of
the "tags" is /zero/, when the character set is Unicode, and when validation
takes more time than processing, not to mention the sorry fact that information
longevity is more /threatened/ by XML than by any other data representation in
the history of computing, then SGML has gone from good kid, via bad teenager,
to malfunctioning, evil adult as XML.  SGML was in many ways smarter than
necessary at the time it was a bright idea, it was evidence of too much
intelligence applied to the problems it solved.  A problem mankind has not
often had to deal with is that of excessive intelligence; more often than not,
technological solutions are barely intelligent enough to solve the problem at
hand.  If a solution is much smarter than the problem and really stupid people
notice it, they believe they have got their hands on something /great/, and so
they destroy it, not unlike how giving stupid people too much power can
threaten world peace and unravel legal concepts like due process and
presumption of innocence.
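
To make the overhead figures above (2% versus 200%) a little more concrete,
here is a rough sketch in Python.  It is not a measurement of any real corpus:
the two sample documents are invented, and "overhead" is taken to mean markup
characters divided by text-content characters, which is only one reasonable
reading of the term.

```python
import xml.etree.ElementTree as ET

def markup_overhead(source):
    """Ratio of markup characters to text-content characters in an XML string.

    Approximation: entity references are counted by their decoded length,
    and a document with no text content at all would divide by zero.
    """
    root = ET.fromstring(source)
    text_chars = len("".join(root.itertext()))
    markup_chars = len(source) - text_chars
    return markup_chars / text_chars

# Invented samples: one prose-like element, one data-like record.
prose = "<p>The quick brown fox jumps over the lazy dog and keeps running through the field.</p>"
record = "<point><x>1</x><y>2</y></point>"

print("prose:  %.0f%%" % (100 * markup_overhead(prose)))
print("record: %.0f%%" % (100 * markup_overhead(record)))
```

The prose sample stays far below the record sample, where the tags outweigh
the two characters of actual data many times over.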

I once believed that it would be very beneficial for our long-term information
needs to adorn the text with as much meta-information as possible.  I still
believe that the world would be far better off if it had evolved standardized
syntactic notations for time, location, proper names, language, etc, and that
even prose text would be written in such a way that precision in these matters
would not be sacrificed, but most people are so obsessively concerned with
their immediate personal needs that anything that could be beneficial on a much
larger scale has no chance of surviving.  Look at the United States of
America, with its depressingly moronic units instead of going metric, with its
inability to write dates in either ascending or descending order of unit size,
and with its insistence upon the 12-hour clock, clearly evidencing the
importance of the short-term pain threshold and resistance to doing anyone
else's bidding.  And now the one-time freest nation of the world has turned
dictatorship with a dangerous moron in charge, set to attack Iraq to revenge
his father's loss.  Those who laughed when I said that stupidity is the worst
threat to mankind laugh no more; they wait with bated breath to see if the
world's most powerful incoherent moron will launch the world into a world war
simply because he is too fucking stupid.  But what really pisses me off is the
spineless American people who fail to stop this madness.  Presidents have been
shot and achine what they are thinking.  One has to marvel at the wide
acceptance of our existing punctuation marks and the sociology of their
acceptance.  "Tagging" text for semantic constructs that the human mind is able
to discern from context must be millennia off.

In many ways, the current American presidency and XML have much in common.
Both have clear lineages back to very intelligent people.  Both demonstrate
what happens when you give retards the tools of the intelligent.  Some
Americans obsess over gun control, to limit the number of handguns in the hands
of their civilians, but support the most out-of-control nutcase in the young
history of the nation and rally behind his world-threatening abuse of guns.
The once noble concern over validation to curb excessive costs of too powerful
a tool for the people who used it, has turned into an equally insane abuse of
power in the XML world.  How could such staggering idiots as have become
"leaders" of the XML world and the free world come to their power?  Clearly,
they gain support from the masses who have no concerns but their immediate
needs, no ability to look for long-term solutions and stability, no desire to
think further ahead than that each individual decision they make be the best
for them.  Lethargy and pessimism, lack of long-term goals, apathy towards
consequences, they are all symptoms of depressed people, and it is perhaps no
coincidence that the world economy is now in a depression.  My take on it is
that it is because too much growth also rewarded people of such minuscule
intellectual prowess that they turned to fraud rather than tackle the coming
negative trends intelligently.  Whether Enron or W3C or the GOP, everyone knows
that fraud does pay in the short term and that bad money drives out good.  When
even the staggering morons are rewarded, the honest and intelligent must lose,
and even the best character will have a problem when being honest means that he
forfeits a chance to receive a hundred million dollars.  In the W3C standards
administration, we see evidence that large groups of people did not believe
that it would matter who assumed power.  I am quite certain that just as Bush
is supposed to be a thoroughly /likable/ person, the people who work up the
most demented "standards" in the W3C lack that personality trait that is both
abrasive and exhibits leadership potential.  When the overall growth of
something is so rapid that an idiotic decision no longer causes any immediate
losses, the number of such decisions will grow without bounds until the losses
materialize, such as in an economic depression.  When the losses are so
diffused as to not even affect the idiots behind the decisions, they can stay
in power for a very long time until they are blamed for a large number of ills
they had no power to predict, but that is precisely what caused them.

| I use XML on a daily basis and think it is a simple and intelligent
| way to represent data.

A comment on this statement is by now entirely superfluous.

| I would like to hear why you think it is so bad, can you be more
| specific please?

If you really need more information, search the Net, please.

| And how would you improve on it?

A brief summary, then: Remove the syntactic mess that is attributes.  (You will
then find that you do not need them at all.)  Enclose the /element/ in matching
delimiters, not the tag.  These simple things make people think differently
about how they use the language.  Contrary to the foolish notion that syntax is
immaterial, people optimize the way they express themselves, and so express
themselves differently with different syntaxes.  Next, introduce macros that
look exactly like elements, but that are expanded in place between the reader
and the "object model".  Then, remove the obnoxious character entities and
escape special characters with a single character, like \, and name other
entities with letters following the same character.  If you need a rich set of
publishing symbols, discover Unicode.  Finally, introduce a language for
micro-parsers that can take more convenient syntaxes for commonly used elements
with complex structure and make them suitable for processing on the receiving end, and which
would also make validation something useful.  The overly simple regular
expression look-alike was a good idea when processing was expensive and made
all decisions at the start-tag, but with a DOM and less stream-like processing,
a much better language should be specified that could also do serious
computation before validating a document -- so that once again processing could
become cheaper because of the "markup", not more expensive because of it.
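
The first few points of that summary can be sketched in Python.  This is one
possible reading, not a specification: the parenthesized output notation, the
choice of which characters to escape, and the names `escape` and `to_sexp` are
all assumptions made for illustration.  Attributes are demoted to ordinary
child elements, the whole element is enclosed in matching delimiters, and
special characters are escaped with a single backslash.

```python
import xml.etree.ElementTree as ET

# Characters that need escaping in the assumed output notation.
SPECIAL = "\\()"

def escape(text):
    """Escape the delimiters and the escape character with one backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

def to_sexp(elem):
    """Render an element as (name child ...), attributes first.

    Simplification: leading and trailing whitespace in text is dropped.
    """
    parts = [elem.tag]
    for name, value in elem.attrib.items():
        # An attribute becomes an ordinary child element.
        parts.append("(%s %s)" % (name, escape(value)))
    if elem.text and elem.text.strip():
        parts.append(escape(elem.text.strip()))
    for child in elem:
        parts.append(to_sexp(child))
        if child.tail and child.tail.strip():
            parts.append(escape(child.tail.strip()))
    return "(" + " ".join(parts) + ")"

xml_source = '<para lang="en">Use <emph>matching</emph> delimiters (really).</para>'
sexp = to_sexp(ET.fromstring(xml_source))
print(sexp)
```

Macros and micro-parsers are left out of the sketch; they would sit between
the reader and whatever consumes the nested structure.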

But the one thing I would change the most from a markup language suitable for
marking up the incidental instruction to a type-setter to the data
representation language suitable for the "market" that XML wants, is to go for
a binary representation.  The reasons for /not/ going binary when SGML competed
with ODA have been reversed: When information should survive changes in the
software, it was an important decision to make the data format verbose enough
that it was easy to implement a processor for it and that processors could
liberally accept what other processors conservatively produced, but now that
the data formats that employ XML are so easily changed that the software can no
longer keep up with it, we need to slam on the brakes and tell the redefiners
to curb their enthusiasm, get it right before they share their experiments with
the world, and show some respect for their users.  One way to do that is to
increase the cost of changes to implementations without sacrificing readability
and without making the data format more "brittle", by going binary.  Our
information infrastructure has become so much better that the nature of
optimization for survivability has changed qualitatively.  The question of what
we humans need to read and write no longer has any bearing on what the
computers need to work with.  One of the most heinous crimes against computing
machinery is therefore to force them to parse XML when all they want is the
binary data.  As an example, think of the Internet Protocol and Transmission
Control Protocol in XML terms.  Implementors of SNMP regularly complained that
parsing the ASN.1 encoding took a large amount of processing time, but they also acknowledged
that properly done, it mapped directly to the values they needed to exchange.
Now, think of what would have happened had it not been a Simple, but instead
some moronic excuse for an eXtensible Network Management Protocol.
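
The thought experiment can be tried on a small scale in Python.  This is a
sketch under stated assumptions: the five fields are a hypothetical subset of
an IPv4 header, not the real layout, and the XML vocabulary is invented purely
for the size comparison.

```python
import socket
import struct

# Hypothetical subset of IPv4 header fields, in network byte order:
# ttl (1 byte), protocol (1 byte), total length (2 bytes),
# source address (4 bytes), destination address (4 bytes).
HEADER = struct.Struct("!BBH4s4s")

def pack_header(ttl, protocol, length, src, dst):
    """Encode the fields as fixed-width binary."""
    return HEADER.pack(ttl, protocol, length,
                       socket.inet_aton(src), socket.inet_aton(dst))

def unpack_header(data):
    """Decode straight back to the values, with no parsing step."""
    ttl, protocol, length, src, dst = HEADER.unpack(data)
    return ttl, protocol, length, socket.inet_ntoa(src), socket.inet_ntoa(dst)

fields = (64, 6, 1500, "192.0.2.1", "198.51.100.7")
binary = pack_header(*fields)

# The same fields in an invented XML vocabulary, for comparison only.
as_xml = ("<header><ttl>%d</ttl><protocol>%d</protocol><length>%d</length>"
          "<src>%s</src><dst>%s</dst></header>" % fields)

print(len(binary), "bytes binary;", len(as_xml.encode("ascii")), "bytes as XML")
```

The binary form is a fixed 12 bytes and unpacks directly into the values; the
XML form has to be scanned character by character before any of them exist.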

Another thing is that we have long had amazingly rich standards for such
"display attributes" as many now use HTML and the like.  The choice to use SGML
for web publication was not entirely braindead, but it should have been obvious
from the outset that page display would become important, if not immediately,
then after watching what people were trying to do with HTML.  The Web provided
me with a much needed realization that information cannot be /fully/ separated
from its presentation, and showed me something I knew without verbalizing
explicitly, that the presentation form we choose communicates real information.
Encoding all of it via markup would require a very fine level of detail, not to
mention /awareness/ of issues so widely dispersed in the population that only a
handful of people per million grasp them.  Therefore, to be successful, there
must be an upper limit to the complexity of the language defined with SGML, and
one must go on to solve the next problem, not sit idle with a set of great
tools and think "I ought to use these tools for something".  Stultifying as the
language of content models may be, it amazes me that people do not grasp that
they need to use something else when it becomes too painful to express with
SGML, but I am in the highly privileged position of knowing a lot more than
SGML when I pronounce my judgment on XML.  For one thing, I knew Lisp before I
saw SGML, so I know what brilliant minds can do under optimal conditions and
when they ensure that the problem is still bigger than the solution.

-- 
Erik Naggum, Oslo, Norway