
S-exp vs XML, by Erik Naggum

From: Erik Naggum <erik@naggum.no>

Newsgroups: comp.lang.lisp
Subject: Re: S-exp vs XML, HTML, LaTeX (was: Why lisp is growing)
Date: 28 Dec 2002 03:08:55 +0000
Organization: Naggum Software, Oslo, Norway
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

* thelif...@gmx.net (thelifter)
| I don't understand your criticism of XML.

  I sometimes regret that human memory is such a great tool for one's
  personal life that coming to rely on the wider context it provides
  in one's communication with others is so fragile.  I have explained
  this dozens of times, but I guess each repetition adds something.

| Basically XML is just another way of writing S-expr or Trees or
| whatever you want to call it.

  They are not identical.  The aspects you are willing to ignore are
  more important than the aspects you are willing to accept.  Robbery
  is not just another way of making a living, rape is not just another
  way of satisfying basic human needs, torture is not just another way
  of interrogation.  And XML is not just another way of writing S-exps.
  There are some things in life that you do not do if you want to be a
  moral being and feel proud of what you have accomplished.

  SGML was a major improvement on the markup languages that preceded
  it (including GML), which helped create better publishing systems
  and helped people think about information in much improved ways, but
  when the zealots forgot the publishing heritage and took the notion
  that information can be separated from presentation out of the world
  of publishing into general data representation because SGML had had
  some success in "database publishing", something went awry, not only
  superficially, but fundamentally.  It is not unlike when a baby,
  whose mother satisfies its every need before it is even aware that
  it has been expressed, grows up to believe that the world in general
  is both influenced by and obliged to satisfy its whims.  Even though
  nobody in their right mind would argue that babies should fend for
  themselves and earn their own living, at some point in the child's
  life, it must begin a progression towards independence, which is not
  merely a quantitative difference from having every need satisfied by
  crying, but a qualitative difference of enormous consequence.  Many
  an idea or concept not only looks, but /is/ good in its infancy, yet
  turns destructive later in life.  Scaling and maturation are not the
  obvious processes they appear to be because they take so much time
  that the accumulated effort is easy to overlook.  To be successful,
  they must also be very carefully guided by people who can envision
  the end result, but that makes it appear to many as if it merely
  "happens".  Take a good idea out of its infancy, let it age without
  guidance so it does not mature, and it generally goes bad.  If GML
was an infant, SGML is the bright youngster who far exceeded expectations
  and made its parents too proud, but XML is the drug-addicted gang
  member who had committed his first murder before he had sex, which
  was rape.

  SGML is a good idea when the markup overhead is less than 2%.  Even
  attributes is a good idea when the textual element contents is the
  "real meat" of the document and attributes only aid processing, so
  that the printed version of a fully marked-up document has the same
  characters as the document sans tags.  Explicit end-tags is a good
  idea when the distance between start- and end-tag is more than the
  20-line terminal the document is typed on.  Minimization is a good
  idea in an already sparsely tagged document, both because tags are
  hard to keep track of and because clusters of tags are so intrusive.
  Character entities is a good idea when your entire character set is
  EBCDIC or ASCII.  Validating the input prior to processing is a good
  idea when processing would take minutes, if not hours, and consume
  costly resources, only to abend.  SGML had an important potential in
  its ability to let the information survive changes in processing
  equipment or software where its predecessors clearly failed.  But,
  to continue the baby metaphor, you have to go into fetishism to keep
  using diapers as you age but fail to mature. (I note in passing that
  the stereotypical American male longs for much larger than natural
  female breasts, presumably to maintain the proportion to his own
  size from his infancy, which has caused the stereotypical American
  female to feel a need for breasts that will give the next generation
  a demand for even more disproportionally large breasts.)  When the
markup overhead exceeds 200%, when attribute values and element
  contents compete for the information, when the distance between 99%
  of the "tags" is /zero/, when the character set is Unicode, and when
  validation takes more time than processing, not to mention the sorry
  fact that information longevity is more /threatened/ by XML than by
  any other data representation in the history of computing, then SGML
  has gone from good kid, via bad teenager, to malfunctioning, evil
  adult as XML.  SGML was in many ways smarter than necessary at the
  time it was a bright idea, it was evidence of too much intelligence
  applied to the problems it solved.  A problem mankind has not often
  had to deal with is that of excessive intelligence; more often than
  not, technological solutions are barely intelligent enough to solve
  the problem at hand.  If a solution is much smarter than the problem
  and really stupid people notice it, they believe they have got their
  hands on something /great/, and so they destroy it, not unlike how
  giving stupid people too much power can threaten world peace and
  unravel legal concepts like due process and presumption of innocence.
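  The overhead figures above (under 2% where SGML shines, over 200%
  for typical XML data) can be made concrete with a rough measure.
  This is a minimal sketch that assumes "markup overhead" means the
  characters inside angle-bracket tags divided by all remaining
  characters; that definition and the helper name `markup_overhead`
  are illustrative assumptions, and entities, comments, and
  declarations are simply counted as content.

```python
import re

# Anything from '<' to the next '>' counts as markup in this sketch.
TAG = re.compile(r"<[^>]*>")


def markup_overhead(doc: str) -> float:
    """Return markup characters as a percentage of content characters.

    Assumption: only angle-bracket tags are markup; entities and
    whitespace are treated as content.
    """
    markup = sum(len(m) for m in TAG.findall(doc))
    content = len(doc) - markup
    if content == 0:
        raise ValueError("document has no content characters")
    return 100.0 * markup / content
```

  For instance, `<p>Hello</p>` comes out at 140%, while a data record
  such as `<point><x>1</x><y>2</y></point>` reaches 1450%, because only
  two of its characters are content and the tags sit at distance zero.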

  I once believed that it would be very beneficial for our long-term
  information needs to adorn the text with as much meta-information as
  possible.  I still believe that the world would be far better off if
  it had evolved standardized syntactic notations for time, location,
  proper names, language, etc, and that even prose text would be
  written in such a way that precision in these matters would not be
  sacrificed, but most people are so obsessively concerned with their
  immediate personal needs that anything that could be beneficial on a
much larger scale has no chance of surviving.  Look at the United
  States of America, with its depressingly moronic units instead of
  going metric, with its inability to write dates in either ascending
  or descending order of unit size, and with its insistence upon the
  12-hour clock, clearly evidencing the importance of the short-term
  pain threshold and resistance to doing anyone else's bidding.  And
  now the one-time freest nation of the world has turned dictatorship
  with a dangerous moron in charge, set to attack Iraq to revenge his
  father's loss.  Those who laughed when I said that stupidity is the
  worst threat to mankind laugh no more; they wait with bated breath
  to see if the world's most powerful incoherent moron will launch the
  world into a world war simply because he is too fucking stupid.  But
what really pisses me off is the spineless American people who fail
  to stop this madness.  Presidents have been shot and killed before.
  I seem to be digressing -- the focal point is that the masses, those
  who exert no effort to better themselves, cannot be expected to help
  solve any problems larger than their own, and so they must be forced
  by various means, such as compulsory education, spelling checkers,
  newspaper editors who do /not/ publish their letters to the editor,
  and not least by the courts that restrain the will to revenge, in
  order to keep a modicum of sanity in the frail structure that is
  human society.  We are clearly not at the stage of human development
  where writers are willing to accept the burden of communicating to
  the machine what they are thinking.  One has to marvel at the wide
  acceptance of our existing punctuation marks and the sociology of
  their acceptance.  "Tagging" text for semantic constructs that the
  human mind is able to discern from context must be millennia off.

  In many ways, the current American presidency and XML have much in
  common.  Both have clear lineages back to very intelligent people.
  Both demonstrate what happens when you give retards the tools of the
  intelligent.  Some Americans obsess over gun control, to limit the
  number of handguns in the hands of their civilians, but support the
  most out-of-control nutcase in the young history of the nation and
  rally behind his world-threatening abuse of guns.  The once noble
  concern over validation to curb excessive costs of too powerful a
  tool for the people who used it, has turned into an equally insane
  abuse of power in the XML world.  How could such staggering idiots
  as have become "leaders" of the XML world and the free world come to
  their power?  Clearly, they gain support from the masses who have no
  concerns but their immediate needs, no ability to look for long-term
  solutions and stability, no desire to think further ahead than that
  each individual decision they make be the best for them.  Lethargy
  and pessimism, lack of long-term goals, apathy towards consequences,
  they are all symptoms of depressed people, and it is perhaps no
  coincidence that the world economy is now in a depression.  My take
  on it is that it is because too much growth also rewarded people of
such minuscule intellectual prowess that they turned to fraud rather
  than tackle the coming negative trends intelligently.  Whether Enron
  or W3C or the GOP, everyone knows that fraud does pay in the short
  term and that bad money drives out good.  When even the staggering
  morons are rewarded, the honest and intelligent must lose, and even
  the best character will have a problem when being honest means that
he forfeits a chance to receive a hundred million dollars.  In both
  the Bush administration and the W3C standards administration, we see
  evidence that large groups of people did not believe that it would
  matter who assumed power.  I am quite certain that just as Bush is
  supposed to be a thoroughly /likable/ person, the people who work up
  the most demented "standards" in the W3C lack that personality trait
that is both abrasive and exhibits leadership potential.  When the
  overall growth of something is so rapid that an idiotic decision no
  longer causes any immediate losses, the number of such decisions
  will grow without bounds until the losses materialize, such as in an
  economic depression.  When the losses are so diffused as to not even
  affect the idiots behind the decisions, they can stay in power for a
  very long time until they are blamed for a large number of ills they
  had no power to predict, but that is precisely what caused them.

| I use XML on a daily basis and think it is a simple and intelligent
| way to represent data.

  A comment on this statement is by now entirely superfluous.

| I would like to hear why you think it is so bad, can you be more
| specific please?

  If you really need more information, search the Net, please.

| And how would you improve on it?

  A brief summary, then: Remove the syntactic mess that is attributes.
  (You will then find that you do not need them at all.)  Enclose the
  /element/ in matching delimiters, not the tag.  These simple things
make people think differently about how they use the language.
  Contrary to the foolish notion that syntax is immaterial, people
  optimize the way they express themselves, and so express themselves
  differently with different syntaxes.  Next, introduce macros that
  look exactly like elements, but that are expanded in place between
  the reader and the "object model".  Then, remove the obnoxious
  character entities and escape special characters with a single
  character, like \, and name other entities with letters following
  the same character.  If you need a rich set of publishing symbols,
  discover Unicode.  Finally, introduce a language for micro-parsers
that can take more convenient syntaxes for commonly used elements
  with complex structure and make them /return/ element structures
  more suitable for processing on the receiving end, and which would
  also make validation something useful.  The overly simple regular
  expression look-alike was a good idea when processing was expensive
  and made all decisions at the start-tag, but with a DOM and less
  stream-like processing, a much better language should be specified
  that could also do serious computation before validating a document
  -- so that once again processing could become cheaper because of the
  "markup", not more expensive because of it.
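  The first few suggestions (no attributes, delimiters around the
  whole element, a single escape character, element-shaped macros
  expanded between the reader and the object model) can be
  illustrated with a small reader.  This is a minimal sketch of a
  hypothetical notation, not Naggum's own design or any existing
  library: `<` opens an element, the name follows immediately, one
  space separates the name from the contents, `>` closes the whole
  element, and a backslash makes the next character literal.  The
  names `parse` and `expand` are illustrative.

```python
from typing import Callable, Dict, List, Tuple, Union

# A node is either a run of text or a (name, children) element.
Node = Union[str, Tuple[str, list]]
Macro = Callable[[List[Node]], List[Node]]

NAME_STOP = " <>\\"  # characters that end an element name


def parse(src: str) -> List[Node]:
    """Read the hypothetical element-delimited notation into nested tuples.

    '<p Hello, <em world>!>' -> [('p', ['Hello, ', ('em', ['world']), '!'])]
    A backslash makes the next character literal, so '\\<' is a plain '<'.
    """
    pos = 0

    def content(inside_element: bool) -> List[Node]:
        nonlocal pos
        nodes: List[Node] = []
        text: List[str] = []

        def flush() -> None:
            if text:
                nodes.append("".join(text))
                text.clear()

        while pos < len(src):
            ch = src[pos]
            if ch == "\\":
                if pos + 1 >= len(src):
                    raise ValueError("dangling escape at end of input")
                text.append(src[pos + 1])
                pos += 2
            elif ch == "<":
                flush()
                pos += 1
                start = pos
                while pos < len(src) and src[pos] not in NAME_STOP:
                    pos += 1
                name = src[start:pos]
                if not name:
                    raise ValueError(f"missing element name at offset {start}")
                if pos < len(src) and src[pos] == " ":
                    pos += 1  # the single space after the name is not content
                nodes.append((name, content(inside_element=True)))
            elif ch == ">":
                if not inside_element:
                    raise ValueError(f"unbalanced '>' at offset {pos}")
                flush()
                pos += 1
                return nodes
            else:
                text.append(ch)
                pos += 1
        if inside_element:
            raise ValueError("unterminated element at end of input")
        flush()
        return nodes

    return content(inside_element=False)


def expand(nodes: List[Node], macros: Dict[str, Macro]) -> List[Node]:
    """Replace macro elements after reading, before any object model sees them.

    Children are expanded first, and a macro's output is expanded again.
    """
    out: List[Node] = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
            continue
        name, children = node
        children = expand(children, macros)
        if name in macros:
            out.extend(expand(macros[name](children), macros))
        else:
            out.append((name, children))
    return out
```

  For example, with a hypothetical macro table
  `{"today": lambda _: ["2002-12-28"]}`, the input `<p Date: <today>>`
  expands to `[("p", ["Date: ", "2002-12-28"])]`.  Because macro output
  is expanded again, a macro that returns its own element would
  recurse without end; a real design would need a guard.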

  But the one thing I would change the most from a markup language
  suitable for marking up the incidental instruction to a type-setter
  to the data representation language suitable for the "market" that
  XML wants, is to go for a binary representation.  The reasons for
  /not/ going binary when SGML competed with ODA have been reversed:
  When information should survive changes in the software, it was an
  important decision to make the data format verbose enough that it
  was easy to implement a processor for it and that processors could
  liberally accept what other processors conservatively produced, but
  now that the data formats that employ XML are so easily changed
  that the software can no longer keep up with it, we need to slam on
the brakes and tell the redefiners to curb their enthusiasm, get it
  right before they share their experiments with the world, and show
  some respect for their users.  One way to do that is to increase the
  cost of changes to implementations without sacrificing readability
  and without making the data format more "brittle", by going binary.
  Our information infrastructure has become so much better that the
  nature of optimization for survivability has changed qualitatively.
  The question of what we humans need to read and write no longer has
  any bearing on what the computers need to work with.  One of the
  most heinous crimes against computing machinery is therefore to
  force them to parse XML when all they want is the binary data.  As
  an example, think of the Internet Protocol and Transmission Control
  Protocol in XML terms.  Implementors of SNMP regularly complained
  that parsing the ASN.1 encodings took a disproportionate amount of
  processing time, but they also acknowledged that properly done, it
  mapped directly to the values they needed to exchange.  Now, think
  of what would have happened had it not been a Simple, but instead
  some moronic excuse for an eXtensible Network Management Protocol.
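  The contrast with the ASN.1 case can be sketched with a deliberately
  simplified tag-length-value encoding.  This is not real ASN.1 BER:
  it borrows only the universal tag numbers for INTEGER (2) and
  UTF8String (12), and assumes a fixed 4-byte big-endian length and
  8-byte signed integers for brevity.  What it shows is a decoder that
  reads each value straight into the type the receiver needs, with no
  lexical scanning, escaping, or whitespace handling.

```python
import struct
from typing import List, Union

Value = Union[int, str]

# Tag numbers borrowed from ASN.1's universal class; everything else is simplified.
TAG_INTEGER = 0x02
TAG_UTF8STRING = 0x0C
HEADER = struct.Struct(">BI")  # 1-byte tag, 4-byte big-endian length
INT64 = struct.Struct(">q")    # 8-byte big-endian signed integer


def encode(values: List[Value]) -> bytes:
    """Encode a flat list of ints and strings as tag-length-value records."""
    out = bytearray()
    for v in values:
        if isinstance(v, bool):
            raise TypeError("booleans are not supported in this sketch")
        if isinstance(v, int):
            out += HEADER.pack(TAG_INTEGER, INT64.size) + INT64.pack(v)
        elif isinstance(v, str):
            data = v.encode("utf-8")
            out += HEADER.pack(TAG_UTF8STRING, len(data)) + data
        else:
            raise TypeError(f"unsupported type: {type(v).__name__}")
    return bytes(out)


def decode(buf: bytes) -> List[Value]:
    """Decode records back into Python values without any text parsing."""
    values: List[Value] = []
    pos = 0
    while pos < len(buf):
        if pos + HEADER.size > len(buf):
            raise ValueError(f"truncated header at offset {pos}")
        tag, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        if pos + length > len(buf):
            raise ValueError(f"truncated value at offset {pos}")
        body = buf[pos:pos + length]
        pos += length
        if tag == TAG_INTEGER and length == INT64.size:
            values.append(INT64.unpack(body)[0])
        elif tag == TAG_UTF8STRING:
            values.append(body.decode("utf-8"))
        else:
            raise ValueError(f"unexpected tag {tag:#04x} with length {length}")
    return values
```

  A production format would use an established encoding such as ASN.1
  BER or DER rather than this ad hoc layout.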

  Another thing is that we have long had amazingly rich standards for
  such "display attributes" as many now use HTML and the like.  The
  choice to use SGML for web publication was not entirely braindead,
  but it should have been obvious from the outset that page display
  would become important, if not immediately, then after watching what
  people were trying to do with HTML.  The Web provided me with a much
  needed realization that information cannot be /fully/ separated from
  its presentation, and showed me something I knew without verbalizing
  explicitly, that the presentation form we choose communicates real
  information.  Encoding all of it via markup would require a very
  fine level of detail, not to mention /awareness/ of issues so widely
  dispersed in the population that only a handful of people per
  million grasp them.  Therefore, to be successful, there must be an
  upper limit to the complexity of the language defined with SGML, and
  one must go on to solve the next problem, not sit idle with a set of
  great tools and think "I ought to use these tools for something".
  Stultifying as the language of content models may be, it amazes me
  that people do not grasp that they need to use something else when
  it becomes too painful to express with SGML, but I am in the highly
  privileged position of knowing a lot more than SGML when I pronounce
  my judgment on XML.  For one thing, I knew Lisp before I saw SGML,
  so I know what brilliant minds can do under optimal conditions and
  when they ensure that the problem is still bigger than the solution.

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.