
The Theory Designers Need

some directions for grounding WWWeb design in cognitive science

by Jorn Barger

Analytic table of contents
# this essay embodies a new approach to cog sci
# the new approach is aimed at creating simulated 'agents'
# start by imagining an abstract agent seeing an artifact
# trace the path of the agent's gaze
# an artifact communicates its creator's personality to the agent
# a text artifact produces an imaginary voice in readers' minds
# hypertext linking is like scrolling
# choosing a hyper-path involves complex cost-benefit analysis


# Although the topic of this essay is hypertext design on the WWWeb, what I've tried to do is *ground* this design theory in basic cognitive science... so the challenge has been to start from the most abstract plane, but then consistently reconnect it with very concrete design issues.

And because these design issues were derived all and only from 'field experience' on the WWWeb, the cognitive principles that I've had to invoke may often be unfamiliar *as cognitive science*, but they should all be quite familiar from the 'folk wisdom' of design. A great benefit of the WWWeb, for hypertext design, is the extraordinary quantity of thoroughly uninhibited experimentation.

# The format I'll use is the *abstract history* of how any person interacts with (first) a totally generic 'artifact', and then secondarily with a *hypertext artifact*, inventorying the ways they're changed by that interaction, and what psychological principles must lie behind these effects. Though I express these connections as truths, they should be understood merely as hypotheses that may never have been tested.

This 'abstract history' methodology is one I've been evolving since 1971, when I wrote my first social simulation, in Fortran, for Lonnie Supnick at Antioch College. I've been exercising the full freedom of the amateur, since then, in a search for psychological models that might be translatable into AI's *language of rules*, in the hope of someday building a realistic computer 'agent'. (One vivid way to think of these agents is as realistic characters in a *video game*.)

And one of the most powerful heuristics I've adopted has been to avoid jargon as far as humanly possible, and use instead the strengths of ordinary language for capturing behavioral subtleties. So this essay will try to present a detailed ordinary-language depiction of what a computer simulation would have to include, to model the behavior of a human agent interacting with a hypertext.

# So: start with an otherwise totally abstract human agent, into whose visual field an artifact (entirely generic at this point) has entered. (Shortly we will narrow this artifact to a page or screen of hypertext.) We want to anticipate as far as possible the agent's responses to this appearance.

The artifact may be expected or unexpected, intentionally summoned or accidentally encountered. The agent will interpret it according to her expectations, based on her understanding of the context, which in the case of the WWWeb will include the browser and the path of links just chosen. And she'll begin to build a model of the provider/creator of the artifact, as well.

Since the provider will also have built a model of future viewers of the artifact, and since each agent-- viewer and provider-- has a distinct set of motives, their interaction can probably be viewed as a sort of 'game' in the sense of classical game theory.

The viewer will be continually evaluating the expected benefits of further exploration of the artifact, as well as the costs, and also the costs and benefits of retreating from it. If she's intentionally summoned it, she'll have specific benefits in mind, and the artifact may enhance or frustrate this pursuit.

On first glance, the viewer may recognize this artifact as seen before, bringing along a past understanding that makes re-orientation quicker. It may seem only partially familiar, having been changed since the last visit. Or it may be entirely unfamiliar, so this first impression will be a critical point in determining the viewer's future opinion of it. Even if it's unfamiliar, it should resemble past artifacts and allow *some* transfer of past understanding.

# From its first entry into the agent's visual field, we could in theory trace a continuous curving path for her gaze. The artifact-designer must to some extent *engineer* the likeliest paths, and optimize them. Where an artifact includes text, the gaze may at those points shift into *reading*. But reading is more costly than a simpler gaze, so too much text is sometimes counterproductive.

(WWWeb pages may appear gradually with some browsers. If text usually appears before graphics, this may be taken into account in engineering the path of the gaze.)

# Page layout conveys personality. No matter how constrained the medium-- and simple HTML is very constrained-- the layout remains a stubborn witness to the mind of the provider, even in the choice of words, or the care in proofreading. Fonts and formatting and graphics allow a much greater expressiveness, and providers will normally seek to convey confidence, authority, intelligence, good taste, and even sexiness and glamor. Clutter and haste normally make a bad impression.

Providers therefore find great value in increased layout control-- the more expressive the better.

Unfamiliar elements attract the eye. Non-central matter should blend if possible into the background. Contrasts also attract the eye, and different sorts of contrast (bold, italic, whitespace, etc.) convey different moods. The eye is attracted to beginnings of paragraphs, to the first text after any whitespace, and to highlighted text. These will normally be scanned from page-top to page-bottom.
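As a rough illustration of how a simulation might list these attractors, the sketch below walks a plain-text page and collects candidate fixation points in top-to-bottom order. The function name `fixation_points`, the treatment of blank lines as whitespace, and the use of asterisks as highlight markers are all assumptions made for this example; a real page would need a real layout model.

```python
import re

def fixation_points(page: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, text) for spots likely to attract the eye.

    Assumptions for this sketch: a blank line counts as whitespace, and
    *asterisks* mark highlighted text.
    """
    points = []
    after_blank = True  # the top of the page behaves like text after whitespace
    for number, line in enumerate(page.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            after_blank = True
            continue
        if after_blank:
            # Beginning of a paragraph: keep only its first few words.
            opening = " ".join(stripped.split()[:4])
            points.append((number, "paragraph-start", opening))
            after_blank = False
        for match in re.finditer(r"\*([^*]+)\*", stripped):
            points.append((number, "highlight", match.group(1)))
    return points  # already in page-top to page-bottom order
```

The sketch ignores unfamiliarity and the different moods of different contrasts, which would need some record of what this particular viewer has seen before.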

# When the gaze shifts to reading mode, an imagined speaker's voice will begin to be established. Normally there will be one main speaker perceived for a page (or site). The provider's choices of words and layout can strengthen or weaken this sense of a unique voice. Use of boldface in the middle of normal text tends to overwhelm this illusion.

Use of lists (like hotlists) also tends to break this continuity, although care in arranging and annotating them can minimize this. There may be a tradeoff between the costs of reading longer texts, versus the reduced continuity of shorter ones.

# By convention, text on a computer screen is usually understood to be upwardly scrollable, with new text appearing while the old partially or wholly disappears. Choosing a hyperlink causes a similar transition. The path of the eye must bridge this change in some manner. And the viewer should normally be reassured that the transition is reversible, via a 'back' button, or scrollbars, etc. (There will be an expected latency-cost for these reversals. Caching reduces this expense.)
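The latency-cost of a reversal can be made concrete with a toy model. What follows is a minimal sketch, not a description of how any actual browser works: the class name `History`, its methods, and the two cost figures are hypothetical, chosen only to show a 'back' step becoming cheap once the earlier page is cached.

```python
class History:
    """Toy browser history: a stack of visited pages plus an optional cache."""

    def __init__(self, fetch_cost: float = 5.0, cache_cost: float = 0.5,
                 use_cache: bool = True):
        self.stack: list[str] = []
        self.cache: set[str] = set()
        self.fetch_cost = fetch_cost  # assumed seconds to load over the network
        self.cache_cost = cache_cost  # assumed seconds to redisplay a cached page
        self.use_cache = use_cache

    def _load(self, page: str) -> float:
        # A page already seen is cheap to show again, but only if caching is on.
        cached = self.use_cache and page in self.cache
        self.cache.add(page)
        return self.cache_cost if cached else self.fetch_cost

    def follow(self, page: str) -> float:
        """Follow a link to `page`; return the latency paid."""
        self.stack.append(page)
        return self._load(page)

    def back(self) -> float:
        """Reverse the last transition; return the latency paid."""
        if len(self.stack) < 2:
            raise IndexError("nothing to go back to")
        self.stack.pop()
        return self._load(self.stack[-1])
```

With the default figures, following two links and then stepping back costs 0.5 instead of 5.0 for the reversal; constructing `History(use_cache=False)` shows the uncached case.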

The bottom of the screen may suggest the option of a scrolling-choice, just as a hyperlink may signal, in some way, the option of a display transition. (Oddly, current scrollbar designs give little hint of hidden text.) HTML's #-link can implement internal scrolling in the form of a hyperlink. To reduce confusion, such internal links should be clearly distinguished from links that load a new page. (A simple solution is to use a visible "#" as the flag.) Browsers should be able to reverse an internal #-link just as they can any other.
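A page-building script could apply the visible-"#" convention mechanically. The sketch below uses `urlsplit` from Python's standard library to recognize a link that names only a fragment; the helper names `is_internal_link` and `flag_linktext` are made up for this example. It assumes that 'internal' means a bare fragment such as `#costs`, so a link that spells out the current page's own address before the fragment is not caught.

```python
from urllib.parse import urlsplit

def is_internal_link(href: str) -> bool:
    """True when href names only a fragment on the current page, e.g. '#costs'."""
    parts = urlsplit(href)
    elsewhere = parts.scheme or parts.netloc or parts.path or parts.query
    return bool(parts.fragment) and not elsewhere

def flag_linktext(href: str, linktext: str) -> str:
    """Prefix the linktext of an internal link with a visible '#'."""
    return f"# {linktext}" if is_internal_link(href) else linktext
```

So `flag_linktext("#costs", "costs and benefits")` gives `# costs and benefits`, while the linktext of a link to `other.html` is left alone.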

While some hyperlinks may be purely graphical, most will involve linktext, some of which will usually be highlighted (clickable). To some extent, greater quantities of highlighted text will suggest a more important link. Choosing whether to follow a link, or continue down a page, or retreat to an earlier point, is a complex cost-benefit calculation.

# The site provider will try to accumulate credibility and goodwill that encourages the viewer to follow the provider's lead. Balanced against this in the viewer's calculations will be expectations about the number of steps required before reaching the desired payoff (the most popular destinations should take the fewest steps), the amount of reading required, the lagtime (latency) for loading each page, the risk of disappointment (as when a page provides only a single paragraph of text), the reduced continuity of the authorial 'voice', the danger of disorientation, the confidence in reversibility, and confidence that the search will move closer to the goal rather than farther away.

Benefits include the satisfaction of a simple desire for novelty (especially GIFs or novel layouts), the answer to a particular question, or a gratifying sense of becoming acquainted with the author/provider. Viewers who are under time-pressure will of course be less patient with obstacles and delays.
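To suggest what the rule-language version of this calculation might look like, here is a deliberately crude sketch. Every name, field, and unit in it is hypothetical: it folds only some of the costs listed above (steps, reading, latency, risk of disappointment) into a single number, treats goodwill as a plain multiplier, and leaves out disorientation and the continuity of the authorial voice entirely.

```python
from dataclasses import dataclass

@dataclass
class LinkEstimate:
    """The viewer's guesses about one link. All fields and units are hypothetical."""
    payoff: float          # expected benefit if the goal is reached
    steps_to_payoff: int   # pages expected before the payoff
    reading_cost: float    # effort of reading each page
    latency: float         # lagtime per page load
    risk: float            # chance (0..1) that the path disappoints
    trust: float           # credibility of the provider (0..1)

def expected_value(link: LinkEstimate, time_pressure: float = 1.0) -> float:
    """Expected benefit minus expected cost of following the link.

    `time_pressure` scales every cost: a hurried viewer weighs delays more.
    """
    chance_of_payoff = (1.0 - link.risk) * link.trust
    benefit = link.payoff * chance_of_payoff
    cost = link.steps_to_payoff * (link.reading_cost + link.latency)
    return benefit - time_pressure * cost

def choose(options: dict[str, LinkEstimate], stay_value: float = 0.0,
           time_pressure: float = 1.0) -> str:
    """Pick the best-scoring link, or 'stay' if none beats staying put."""
    best_name, best_value = "stay", stay_value
    for name, link in options.items():
        value = expected_value(link, time_pressure)
        if value > best_value:
            best_name, best_value = name, value
    return best_name
```

Even this toy version reproduces one of the claims above: raise `time_pressure` and a link that was worth following stops being worth it, so the viewer stays where she is.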
