Why is SVG going to be REALLY BIG?


Abstract


Why is SVG going to be REALLY BIG? There are all the ordinary reasons that SVG will be big. Most of us involved in the SVG community can recite them by heart: it is an open standard; it uses XML; the graphics are scalable; server-side processing is minimized with much of the processing being off-loadable to the client; it works in concert with JavaScript, XML DOM, CSS, AJAX and HTML; browser support is almost universal (with a shrinking list of exceptions); SMIL is utterly cool; et cetera. But, why is it going to be REALLY BIG? Well, it has to do with the nature of communication itself. Communication has suffered a number of setbacks over the years, not the least of which was the advent of the alphabet. SVG offers the possibility of expanding the bandwidth of communication in terms of the rate at which humans can produce and consume information. In short, it offers a paradigm shift for which the Internet is just the first of a series of developments that will constitute a revolution in human communication. It will be argued that SVG, in that revolution, is actually more important than HTML and that future historians will look upon the twenty year radius of the present as the time that the World Wide Web and SVG came into existence. HTML will be largely forgotten.

Consider the relationship between language and thought. Thought is non-linear. It is cross-referential. It is sometimes verbal (or lexical), and sometimes it is not. There are many times that we experience the phenomenon of having a thought first and only afterward attempting to put that thought into words. That is, not all thoughts are composed of words, though many can be translated into words. While differing theories about the evolution of human language exist, some linguists postulate that human communication prior to spoken language may have been more gestural and less auditory. Regardless of this, there is good evidence that humans did develop an oral tradition prior to the development of written language. There is also evidence that many of the written languages were ideographic and spatial, rather than alphabetic and sequential. The first of our mistakes in developing written language may have been to pattern our writing after our speech, rather than patterning it after our underlying thoughts. Given the advances in printing and distribution of manuscripts that the Internet has afforded, perhaps we no longer need to be bound by the shortcomings that historical accident has infused into our written expression. SVG is key in making the next step.


Table of Contents

Major advances in the technology of communication.
Why the development of speech was a mistake
Why the development of the alphabet was a bigger mistake
How HTML is a linear medium like speech and the alphabet.
How thought is non-linear.
How SVG is non-linear.
How graph theory plus semantics plus pictures can convey thoughts in a cross cultural way.
How SVG can get where we need to go quicker than HTML can
Bibliography

The SVG community's artist friend, Jerry Maddox, has worked with SVG for many years. He has encouraged me, perhaps in not so many words, to make this talk and paper a bit provocative. Certainly, at least, I think he would hope that a bit of historical breadth be brought to the table by another senior member of the SVG community. As such I will lay into this paper large slabs of speculation salted with morsels of hyperbole, for your reading enjoyment and consternation.

A.S. Diamond, in The History and Origin of Language [Diamond1965], like many scholars speculates on the emergence of language as we stepped out of the primordial ooze as a species. Just what was it that first brought us to both raise our eyebrows knowingly and grunt at the same time?

One of the most readable and entertaining accounts comes from Lincoln Barnett’s The Treasure of Our Tongue [Barnett1962], a celebrated account of popular linguistics by a non-linguist. He summarizes the competing theories of the day (which appear not that different from the competing theories of this day) as the bow-wow (onomatopoeic), pooh-pooh (mammalian anatomy), yo-he-ho (differentiated grunts), ta-ta (a Darwinian idea based on co-evolution of gesture and grunt), ding-dong (attributed to Max Muller but seeming to have a bit of Jung thrown in), or sing-song (again due to Darwin: speech began as song). My own view is perhaps a fusion of more than one of these, but like Darwin’s is based on observation. In my case it comes from observing humans rather than reptiles. As such, I present these ideas more for the sake of contemplation than persuasion.

In the good old days when humans were out camping and grilling caribou and asparagus over an open fire and eating blueberries and honey for dessert, our hands were largely free for communicating. If we were not eating or hunting or preparing food or tools, our hands were able to do many things. We could make mudras, play charades, point, and use multi-finger control on our audience’s visual displays. There are times, certainly, that the pointer is much richer than a keyboard and times when a body-full of pointers is richer than six keyboards. In fact, this theory of language holds that the oral tradition was not really just oral. It was oro-gestural! Whether the oral or the gestural preceded the other is rather irrelevant. Gestures work quite well for certain communication even when there is no shared spoken language, as is well known to the international tourist.

At some time, humans developed speech. In the next section I argue that this was a step backwards in the evolution of the species, but it did happen and there is a plenitude of historical and comparative linguistic evidence to suggest that happen it did, and that it happened long before writing systems emerged.

One Julian Jaynes, while a faculty member at Princeton, made quite a name for himself by positing that a major landmark in human neurological evolution happened between the writing of the Odyssey (a largely oral tradition finally recorded) and the Iliad. He saw the difference as signaling the development of the corpus callosum and its remarkable ability to prevent the left half from knowing what the right half is doing. At any rate, the transition from an oral tradition to a written one happened sometime during the early development of writing systems. However, with the exception of a few ideographic systems, most writing systems have sought to encode not what ideas mean but rather what words have been used to say those ideas. We have chosen the spoken word itself as the unit of speech to serve as the basis for our writing system.

The most troubling aspect of this is that speech itself is a low bandwidth channel compared to vision, and when choosing to make our ideas visible, most cultures have turned with their writing system (whether through alphabet, syllabary, or ideography) to focus on speech for inspiration. The idea is that two people within the same language group would be able to "read writing" and come up with the same words in conveying said idea to yet another person. First, however, speech is slow compared to vision, and when we choose speech (with all of the imperfection of its mapping to ideas) as the basis of a new visual form of communication we have chosen a very flawed metaphor for ideas themselves. Writing had its advantages though, since ideas could travel farther than either gestures or speech. And writing became a way to hold the new empires together.

After the development of oral language, followed by the written language, the next major transition usually posited by scholars tends to be the advent of the printing press. No longer was the "word" controlled by just the monarchies or the Church, each with their own agenda, but suddenly words were free to migrate openly through much of Europe (there was, after all, a Renaissance going on). This led to all manner of upheaval of the status quo. History sort of ambled along for another 500 years with minor and major squabbles and wars consuming much of Europe and Asia, until suddenly one day with no warning nor prior art, the Internet was given us as some sort of mysterious and divine act. With the Internet came new ways of packaging and broadcasting information, and new ways of distributing and accessing it. The Internet is usually seen as the next major technology of writing and its implications are predicted to be every bit as profound as the three previous advances in the technology of communication.

So back to the pre-oral days of human communication: that period when the gesture was worth a thousand grunts. We humans were in a pre-literate day of oro-gestural communication, getting along quite well as hunters and gatherers. Then one day, some pharaoh or Emperor Cuzco in a distant land teaches his followers to cultivate and pick vegetables. Suddenly the hands that are the mainstay of human communication are stripped of their millennia-old role of communicating poetry and folklore, and assigned instead to the inglorious job of picking vegetables for the rulers of the newly emerging empires. The empires become so bloated on the overabundance of sugar beets and barley, so opulent and corpulent, that their thirst for new riches and new sources of gluttony pushes the empire further and further until at last writing is invented to bring messages from point A to B. In the earlier days experiments with gestural writing (pictograms) were tried, but since the peasants (the majority after all) have lost the ability to gesture (which is just as well for keeping their communication bandwidth low), even the nobility starts to lose the ability to communicate except in the painstaking manner of grunts that have been codified as speech.

Almost 30 years ago, a student of mine who was doing an independent study with me on the linguistics of American Sign Language (a topic she knew far more about than I) believed it was time for the professor to actually see what ASL looked like when it was spoken. We went to a party for the hearing impaired, and it made an enduring impression. Beyond the experience of having her translate to me the rich ideas being expressed with such ease and facility in ASL, I also saw something done which would not have been feasible using speech. Three people, standing in a triangle, were all “speaking” at the same time. Furthermore, they were all smiling and responding to one another as though they understood. When hearing people speak, not even two people can talk at the same time and still “hear” each other, but here were three people, apparently signing and understanding all at the same time. (For those who do not know it, ASL only resorts to an alphabet when words that have no gesture need to be conveyed -- things like personal names, product names, animals never before encountered, or infrared radiation -- things much like those my semantic primitives did not encode well.) I asked my friend if this were common, and she said it was.

I reasoned as follows. First, the overall bandwidth (in terms of bits per second) of the visual system is probably far greater than that of the auditory system. (See Edward Tufte's comments on the visual system, which purportedly handles about 10 megabits per second: http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0002NC ) Additionally, the rate at which we process visual information in terms of identifying, labeling and understanding what we see is probably far greater than the capacity of the auditory system. Speech can be processed and understood at a rate several orders of magnitude lower than the rate at which the retina conveys information to the brain, though it is not obvious how we might measure the meaningful comprehension of visual information. Nevertheless, given the human's well developed visual cortex, it stands to reason that we might indeed be able to process considerably more information visually than auditorially. The notion, therefore, that speech is an “advance” over gesture as a means of communicating can thus be held somewhat suspect.

Thus seen, the development of speech is not a glorious triumph of the human intellect, but a miserable setback caused by the greed of monarchial agribusiness. Humans used to tell good stories, until their hands got too busy picking vegetables.

Okay, so a setback occurs. It might not have been so bad, if we did not then stoop a step further. Most of our cultures then decided to invent writing (to ensure that messages could be sent intact across the vast agri-empire) and, adding insult to injury, we patterned our writing not after our thoughts (which were far richer than our grunts) but after the audible grunts themselves, known as speech. Ah, eh, iii, oh, oooo! Yabadabadoo!

And to think we actually wrote this nonsense down and called it literature and then came to revere literature as though it had something to do with the human spirit. The selling of the alphabet was the greatest boondoggle ever. It slowed down the speed of communication between all humans, hence providing an inexorable momentum for preserving the status quo.

I mentioned that this would be provocative for the mere sake of provocation. The reader is under no more obligation to believe any of this than she is to believe the QWERTY keyboard is the best way to communicate with fellow humans!

HTML at its core is (with the exception of the <table> element and possibly other elements related to spatial arrangement) 1.5 dimensional: that is, its fundamental metaphor consists of written speech (text), with occasional embedded belches of multimedia (<object>, <img>, <audio>, <video>) plus graph theoretic cross-references that provide a modest foray into translinearity.

For alphabet A={a1, a2, ... , an } and graphics G={G1, G2, ...Gk} (where A intersect G is empty) we may represent a typical text as


For vocabulary V={w1, w2, ... , wn }, including graphics G and anaphora H={h1, h2, ...hk} (where G and H are subsets of V). An example of anaphora might include H={this, that, which, he, her}

linear text with internal and external links

Figure 2. The hypertext -- words and pix in sequence with references.


How thought is non-linear.

Consider the following sentence:

[1] I concluded that the woman at the table and the man who was with her refused to look at me since they thought I wanted them to give me a tip.

Such thoughts are not uncommonly found in the ordinary privacy of our individual cerebra; such utterances are not uncommon in common discourse. We talk about people's motives, our interpretations of them and why we think they thought something about our own motives. It is all a common part of the human experience and the languages of the world have become rich with modality, aspect and grammar to help accommodate these nuances of the hypothetical realm of social motives.

Let us consider some of the inferences that it conveys, either directly or indirectly.

All these micro-inferences are derivable in the best of Chomskian, post-Chomskian, Lakoffian and post-Lakoffian linguistics. In fact, take any adjective Q that can be prepended successfully as both Q-linguistics and post-Q-linguistics and all inferences above seem to be natural inferences from the spoken phrase. (It is good that we never had the phrase Skinnerian linguistics or my claim would be false.)

Let us analyze more carefully:

[1] I concluded that the woman at the table and the man who was with her refused to look at me since they thought I wanted them to give me a tip.

The narrator clearly perceives that the man is with the woman. That the man and the woman would be collectively engaged in any thoughts about him/her is clearly speculative. For the sake of analysis of the time frame of these cognitive events, we may consider that the narrator's perception (that they refused to look at him) happened prior to his postulation of a reason. That is, we have a putative time1 at which the alleged refusal to look occurred, and that precedes, in time, a time2 at which the conclusion about motives was reached. We may view each instance of speculation about another individual as the invention of a hypothetical clone of that other individual: the individual cloned into a world of speculation. Our hypothesis creates a clone with properties much like the original, but with imaginary motives and possible futures. Those concerned that this analysis is ethnocentric need not worry. Examine the rich structure of aspect in Navajo verbs or the verb-based agglutination and postpositions of Quechua and you will see the subtlety of these nuances of relative time, the onset of behavior and the probability of continuation, expressed grammatically.

Some of the many connections between entities in this sentence can be seen as follows:

The sum of these inferential dependencies, and hence of the Chomskian deep structure might be diagrammed as follows. [hypotheticalMe.jpg]

It is argued that all of the drawn connections are in one form or another required to map the inferential semantics of the base structure.

Figure 3 portrays one particular semantic/syntactic representation of the underlying inferential structure embodied by the sentence [1] "I concluded that the woman at the table and the man who was with her refused to look at me since they thought I wanted them to give me a tip."

Figure 3. Semantic inferential structure associated with sentence [1]


Of course, every linguist and psycholinguist since 1970 has probably advanced her own diagrams of such a sentence together with powerful arguments as to why all other representations were flawed. This is actually a good thing, since it allows those outside the fray to see that all of the diagrams are at some fundamental level equivalent. Since the above is a simple directed graph with nodes labeled by simple semantic elements with subscripts, one can see some superficial resemblance between it and certain representations of Turing machines when drawn as directed graphs with labeled nodes.

The reason for this argument is that many of the readers of this paper may not have much familiarity with the cognitive science of the 1970's when such things as behaviorism and Bloomfieldian linguistics were being supplanted (with an enthusiasm typical of the era) by cognitive science, neural networks, the progenitors of the world wide web and objective yet semantic theories of language. It is important in this context, because of the claim that while spoken (and written) language are primarily linear, owing to their allegiance to the portrayal of speech, the underlying structures of meaning are non-linear.

Any reasonable diagram of the meaning of a sentence will require at least two dimensions. After dabbling with diagrams of the meanings of utterances such as these:

Table 1.
1972 Experiment A: mapping semantic inferential content
1972 Experiment B: mapping semantic inferential content


I became intrigued by the graph theoretic question of whether or not, given any canonical illustration of the inferentially relevant relationships between semantic elements in an utterance, could that diagram be drawn in the plane without crossing lines?

It was trivial to demonstrate that ideas were non-linear (from a graph theoretic perspective). Consider the mere facts that

  1. every language appears to use anaphora (pronouns and other pointers to previously developed concepts)

  2. many languages wrestle with the issue of scope of quantification and scope of negation inventing such constructs as doo…da (Navajo), ne … pas (French) and mana… cu (Quechua) (I recall from Mongolian class that that language did the same thing)

  3. ambiguity for referents persists in sentences like these due to Lakoff: The city council refused to give the women a permit to demonstrate because they X-ed Y. When X and Y are "feared violence" then "they" refers to the city council. But when X and Y are "advocated revolution" then "they" refers to the women.

All these things made it clear that semantics was nonlinear: the graph of its lexical identification could not be drawn as a graph embedded on the line, nor even on the circle. Cross-segmental links, defying a linear embedding, must always be made for proper semantics. Deep structures (to use Chomsky's term), or meanings, to use another term, are nonlinear.
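As a minimal sketch of this claim (an illustration only, not the analysis I carried out at the time), the Lakoff sentence can be treated as a chain of word positions with one extra anaphoric edge. The function name `branch_points` and the crude whitespace tokenization are assumptions made for the example. A graph drawn on a line or on a circle can have at most two edges meeting at any node, so a single node with three or more edges is already enough to rule out both embeddings.

```python
from collections import Counter


def branch_points(edges):
    """Return the sorted nodes that have three or more incident edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(node for node, d in degree.items() if d >= 3)


tokens = ("The city council refused to give the women a permit "
          "to demonstrate because they feared violence").split()

# Linear text: each word is linked only to its successor.
chain = [(i, i + 1) for i in range(len(tokens) - 1)]

# Hypothetical anaphoric edge for the "feared violence" reading:
# "they" points back to "council".
anaphora = (tokens.index("they"), tokens.index("council"))

plain = branch_points(chain)
linked = branch_points(chain + [anaphora])

print(plain)                        # no branch points in the bare chain
print([tokens[i] for i in linked])  # words where three edges meet
```

The bare chain has no branch points, while the single anaphoric link creates two of them, at "council" and at "they". Choosing the other reading ("women") moves one branch point but does not remove it.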

Well, that stands to reason. Humans can think about relationships that are non-linear, so shouldn't our thoughts reflect a bit of the intrinsic structure of what we think about? Is there any reason to think that the cognitive or neurological topology of an idea like a Klein bottle could somehow be implemented in a linear strand of simple automata or neurons? In the book Flatland by Edwin Abbott, we are asked to contemplate whether a two dimensional brain can really comprehend the third dimension even if given incontrovertible proof of its existence. Taken to its extreme, however, Raymond Cattell should never have been able to invent the MMPI in the 1930's and Charles Osgood should never have been able to factor analyze the dimensions of semantics in the 1950's and thence discover the semantic differential.

At any rate it was with a sense of glee that I discovered in 1971, with absolute certainty, the impending obsolescence of the written word. A small number of semantic primitives (far smaller than the rather mushy 850-word lexicon of Charles Ogden's Basic English) was sufficient to encode all "universal meaning" of human language and of most categories of non-human language that I was able to observe at the time as well. What semantic primitives were not so good for (and this was a part of the reason Navajo was so successful as a code language in the Pacific in World War II) was the trappings of material civilization. My 72 (plus or minus) primitives worked perfectly well for a couple of years, allowing me to take notes in my classes in college and graduate school, so long as the conversation kept to mathematics, philosophy, linguistics or psychology. As soon as mold, feathers, Coca-Cola, walruses and infrared radiation entered the discussion, my primitives did not extend any better than Navajo in inventing words for diesel engines and Sherman tanks. But this was okay, since the majority of things that could not be axiomatized (with inferential models for reasoning) could be drawn using pictures.

Hence, I concluded in 1972 as shown in the following diagram that fortunately, language will soon be written as a graph in N dimensions, doughnuts and all.

More diagrams of the graph of a thought

Figure 6.


What was meant by this was that the marks of language are the nodes of a graph held together by the edges provided by predicates, aspect, and the relationships provided by postpositions, inflections and other insignia left upon those nodes by forms that resemble verbs. That graph cannot be drawn on the line, though it may be drawn on the plane, with the occasional need for toroidal crossovers.

That literature (linear text) was dead became apparent. Linear would be replaced by two-dimensional and we would find surfaces (of sufficient topological genus) onto which to scribble our ideas, using a small set of semantic primitives appropriate for all languages of the world. Now some might argue that we need three dimensions in which to write three-dimensional ideas. Perhaps. But until we figure out ways to make our sensory apparatus more than 2-dimensional (which vision basically is, except for those tantalizing 3D cues offered by the wee bit of parallax of the offset of our two eyes), our writing is rather doomed to surfaces. No worry though. The surface can be folded into doughnuts and pretzels and foams in ways that make surfaces as rich as higher dimensional manifolds, at least for the sake of semantic expression, if not for driving theories of physics.

Just as a simple illustration, let us point out that while the graph K5 (exercise: take all five fingers on one hand and let them touch one another pairwise for all ten pairs, simultaneously -- Frank Harary and my father were the only two people I have known to solve this problem on its first presentation to them) is non-planar (meaning that five nodes cannot all be connected to one another in the plane without crossing lines), it can be drawn without crossings on the doughnut. Not only that, but K6 and K7 can be drawn as well without crossing lines on the surface of the doughnut. Add to this richness of connectivity that the torus (doughnut) provides the fact that humans actually can navigate on it with practice. Consider the wonderful early arcade game of Asteroids. I can, by way of a parade of expert witnesses A through Z, elicit and exhibit testimony demonstrating that those well versed in such toroidal games did indeed learn to navigate the torus (anticipating non-planar events) with aplomb. The standard screen-wrap interface to the torus, with its 7/4 ratio of connective richness to the plane (K7 versus K4 as the richest complete graph that can be embedded), suggests that pseudo-planarity has not yet been mined of its potential for software interface into the richness of complex semantic structures like full-texts and the WWW.
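The K4-versus-K7 comparison can be checked with a short calculation. The sketch below relies on the Ringel-Youngs formula for the genus of a complete graph, genus(K_n) = ceil((n - 3)(n - 4) / 12) for n >= 3, which is a standard result in topological graph theory rather than anything derived in this paper; the function names are chosen only for illustration.

```python
def complete_graph_genus(n):
    """Genus of K_n for n >= 3, by the Ringel-Youngs formula."""
    if n < 3:
        raise ValueError("the formula applies for n >= 3")
    # -(-x // 12) is integer ceiling division of x by 12.
    return -(-(n - 3) * (n - 4) // 12)


def largest_complete_graph(genus):
    """Largest n such that K_n embeds on a surface of the given genus."""
    n = 3
    while complete_graph_genus(n + 1) <= genus:
        n += 1
    return n


for n in range(4, 9):
    print(f"K{n}: genus {complete_graph_genus(n)}")

plane = largest_complete_graph(0)  # genus 0: the plane (or sphere)
torus = largest_complete_graph(1)  # genus 1: the doughnut
print(f"plane: K{plane}, torus: K{torus}")
```

Under that formula K5, K6 and K7 all have genus 1, K8 needs a surface of genus 2, and the richest complete graphs on the plane and the torus are K4 and K7 respectively, which is the 7/4 ratio mentioned above.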

How SVG is non-linear.

"As I see it, SVG is better than Flash or Silverlight or HTML or speech or text or gesture since it provides an opportunity to expand the bandwidth of human communication. The others (because of limitations on either licensure or fundamental metaphor) just can't go there. The proper authoring tools are, hence, not going to be easy to make. They will be rich, spatial, temporal, semantic, graph theoretic, and declarative -- that much is clear...."

"Given its emphasis on material displayed on planar devices, and given the intrinsic importance of spatial relations in the presentation of content using a spatial metaphor, SVG's user community may be expected to grow to embrace and support more diversity even than HTML, which at its core is (with the exception of the <table> element and possibly other elements related to spatial arrangement) 1.5 dimensional: that is, its fundamental metaphor consists of written speech (text), with occasional embedded belches of multimedia (<object>, <img>, <audio>, <video>) plus graph theoretic cross-references that provide a modest foray into translinearity. SVG has every bit as much semantic and pragmatic reference to meaning as HTML, it is every bit as hypertextual, but its core metaphor for expression is the plane rather than the line. As such it has come to attract, already, artists, physical and social scientists, and mathematicians whose needs for expression transcend the ability to belch static frames generated elsewhere into an otherwise translinear stream of text. For this community, multidimensional space is not the occasional 2D painting on the wall of an otherwise 1.5 dimensional hypertext; rather, for n>=2, n-dimensional space is home."

How graph theory plus semantics plus pictures can convey thoughts in a cross cultural way.

I once proved a theorem that some, at the time, interpreted as suggesting that a search for semantic primitives (in the Ogdenian sense of Basic English) was quixotic. That theorem was that given a monolingual dictionary, the computational task of finding a smallest set of semantic primitives was NP-complete. Let me illustrate with one of my favorite examples, taken from almost any dictionary written prior to 1990: "didapper -- a dabchick or other small grebe." Whenever I have asked students if they know what a "didapper," "dabchick" or "grebe" is, not one of the thousand or so students I've asked has known. Continuing in the dictionary we find "dabchick -- any of a variety of small grebes including the didapper." Not until we get to "grebe" do we find out that a grebe is a small waterfowl. So the question is: if we were to limit the defining lexicon of our dictionary to the smallest vocabulary whatever (so as to make the dictionary accessible to, say, a three-year-old), how small would that vocabulary be? Well, the problem, in the case of an arbitrary language with its monolingual dictionary of N words, I showed to be NP-complete. Space travelers encountering new species should not necessarily think that stealing a dictionary and plying an informant with single-malt scotch will automatically produce a Rosetta stone.
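To make the dictionary question concrete, here is a minimal sketch under a simplifying assumption that is mine for this example and not necessarily the formulation used in the theorem: a word counts as "understood" once every word in its definition is understood, and a primitive set is adequate when every headword eventually becomes understood. The first three entries of the toy dictionary paraphrase the definitions quoted above, reduced to content words; the remaining entries are hypothetical, invented only so that every defining word is itself a headword.

```python
from itertools import combinations


def understood(dictionary, primitives):
    """Words that become known, starting from the primitives."""
    known = set(primitives)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            # A word is learned once all its defining words are known.
            if word not in known and definition <= known:
                known.add(word)
                changed = True
    return known


def smallest_primitive_set(dictionary):
    """Exhaustive search, smallest candidate sets first."""
    words = sorted(dictionary)
    for size in range(len(words) + 1):
        for candidate in combinations(words, size):
            if understood(dictionary, candidate) == set(words):
                return candidate


toy_dictionary = {
    "didapper": {"dabchick", "grebe"},
    "dabchick": {"grebe", "didapper"},
    "grebe": {"waterfowl"},
    # Hypothetical entries below, added to close the vocabulary.
    "waterfowl": {"bird", "water"},
    "bird": {"animal"},
    "animal": {"bird"},
    "water": {"liquid"},
    "liquid": {"water"},
}

primitives = smallest_primitive_set(toy_dictionary)
print(len(primitives), primitives)
```

In this toy case three primitives are needed, one to break each circular chain of definitions. The exhaustive search examines subsets of the vocabulary and so takes time exponential in the size of the dictionary, which is in keeping with, though of course no proof of, the hardness result described above.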

A set of semantic primitives rich enough to encode the following, and axiomatized (with a predicate calculus) to support a basic inference engine is probably sufficient for most of human expression (at least that containing no molecules).

The following are offered as examples of the sort of semantic primitives I have in mind

Once again, the molecular world populated by halibut, Coca-Cola, guitars and rhinos is likely to require an open and extensible format (and probably moving vectors), but plain old human thought as expressed in philosophy, teleology and mechanism is likely not to require much more, until, perhaps, we mutate.

Of course the expressive power of such a system includes undecidable subsystems and likely allows the derivation of contradictions, but humans have generally not been known to implode under exposure to simple contradictions, so that need not be a problem for inference engines.

How SVG can get where we need to go quicker than HTML can

SVG has every bit as much semantic and pragmatic reference to meaning as HTML, it is every bit as hypertextual, but its core metaphor for expression is the plane rather than the line. As such it has come to attract, already, artists, physical and social scientists, and mathematicians whose needs for expression transcend the ability to belch static frames generated elsewhere into an otherwise translinear stream of text. For this community, multidimensional space is not the occasional 2D painting on the wall of an otherwise 1.5 dimensional hypertext; rather, for n>=2, n-dimensional space is home.

In short, the fundamental metaphor of SVG for the expression of human meaning is closer to the realm in which meaning exists (at least in some abstract mathematical sense). SVG has not suffered from the setbacks that millennia of bad assumptions about how to communicate thoughts have saddled onto speech, text, hypertext and, hence, HTML.

Beyond this convenience of having been out of town while all these shenanigans were transpiring, SVG has another amazing advantage over HTML: very little legacy content. The SVG community has not yet burdened the spec writers with billions of pages that reinforce the momentum of a flawed model of communication. That is, we are free to do it right for once.

This, of course, raises the question of what is right. What can SVG offer that will empower a web-based, semantically and inferentially rich expressive medium that simplifies the task of writing one's thoughts, and which enables the ready perception of an author's meaning?

It needs the following capabilities (many of which it already contains):

Bibliography

[Diamond1965] A.S. Diamond. The History and Origin of Language. Copyright © 1965 Citadel Press, NY.

[Barnett1962] Lincoln Barnett. The Treasure of Our Tongue. Copyright © 1962 Alfred A. Knopf.