
The DrawBack browser project

Jorn Barger June 2002

THIS JUST IN: Windows browser add-on Proxomitron has many of my desired features (but not open source?) [bbs] [docs] [screenshots] [sample filters]

This is the initial proposal for an open-source web-browser project, recently re-inspired by a thread on alt.hypertext.

The startingpoint should be very simple: load an offsite html-file, and display the body-text as plain-vanilla text... while suppressing all the 'html-junk'.
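A minimal sketch of that startingpoint, in Python, assuming the page has already been fetched into a string (fetching it, for instance with urllib.request.urlopen, is left out so the sketch runs offline). The names BodyText and body_text, and the list of block-level tags, are illustrative assumptions, not part of any existing codebase:

```python
import re
from html.parser import HTMLParser

# Assumed list of tags that should force a line break in the plain text.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr"}
# Tags whose contents are never reading-text.
SKIP_TAGS = {"script", "style"}


class BodyText(HTMLParser):
    """Collect the text found inside <body>, dropping tags, scripts and styles."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.parts.append(data)


def body_text(html):
    """Return the body text of an HTML string, one block per line."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = [re.sub(r"\s+", " ", line).strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)
```

Links are simply flattened to their text here; keeping them clickable would need a richer output format than a plain string.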

Links can still display as links, but all the burdensome heap-o'-link navigation sections should be relocated to the end.

But that's just the startingpoint!

The way I hope it will evolve is towards smart local stylesheets (maintained on your local drive) that allow you great flexibility in re-formatting others' page-designs, while automatically learning how to generalise those formatting choices to other pages on the same site.

These may use simple pattern-recognition (e.g. regular expressions) to distinguish the useful parts of a page from the annoying parts, hopefully even if the page is considerably revised.

In 'reformat mode' the html-source might be displayed with each open-tag replaced by a popup menu of alternate tags, so you can quickly specify what you want to see, what to suppress, and what to reformat.

For starters, the text display will be quite primitive, but a high priority will be re-wrapping paragraphs. Eventually images should be supported as well. And I'd like it to evolve some semantic-parsing capability: [ideas]
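Re-wrapping is the easy part. A rough sketch using Python's standard textwrap module, assuming paragraphs arrive separated by blank lines (the function name rewrap and the default width of 72 are my own choices for illustration):

```python
import textwrap


def rewrap(text, width=72):
    """Re-wrap each blank-line-separated paragraph to the given width."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    # Join each paragraph's words with single spaces, then refill it.
    return "\n\n".join(
        textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs
    )
```

A real browser would re-run this whenever the window is resized, passing the current column count as width.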

DrawBack

The name implies 'draw back to leap', a concept popularised by Arthur Koestler in The Ghost in the Machine. The idea is that browsers have gotten bogged down in feature-itis, and need to back up and start over from some simpler model. The French expression for this is reculer pour mieux sauter.

code

SourceForge has at least three codebases to explore: netrik, w3m, links. Plus there's lynx and probably many others.

I'm stuck on a slow Mac that will have Linux eventually. I'm not in a big hurry to make DrawBack my main browser-- I really just want to test the concept. When/if it reaches the point of adding fancier formatting, we'll probably want to switch to the Mozilla codebase or another graphical browser.

butwaitthere'smore

I don't see why I couldn't add a 'dback' attribute to my link-anchors:

<a href="http://blahblah" dback="http://mystyles">

so that DrawBack-users following that link would see it first with my re-styling applied...?

...and since the dback-stylesheet is already extracting and reformatting info from one page, there's no reason not to extend it to handle multiple pages-- extracting the interesting bits from several pages and presenting them as a single new page.

Rather than bulking-up the link-anchor with an href2, href3, etc, the relevant URLs could be embedded in the dback-stylesheet itself, and the link might take this surprising new form:

<a dback="http://mystyles">
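Both anchor forms could be picked up with the same small scan. This is only a sketch of how a DrawBack client might notice them: 'dback' is the hypothetical attribute proposed above, and DbackLinks and dback_links are names invented for the example.

```python
from html.parser import HTMLParser


class DbackLinks(HTMLParser):
    """Collect (href, dback) pairs from anchors carrying a dback attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        if "dback" in attr:
            # href is None for the stylesheet-only form of the anchor.
            self.links.append((attr.get("href"), attr["dback"]))


def dback_links(html):
    """Return every (href, dback) pair found in an HTML string."""
    parser = DbackLinks()
    parser.feed(html)
    parser.close()
    return parser.links
```

When href comes back as None, the client would have to fetch the dback-stylesheet first and read the list of source URLs from it; the format of that list is not settled here.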

parsing

The challenge will be to make our parser robust enough to recover intelligently when the original page-design changes.

The first level of analysis will be the hierarchy of tags/containers (the parse-tree). The dback-stylesheet should probably summarise the expected form for this, so that major changes can be spotted and accommodated.
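One way to summarise the expected form, sketched here under my own assumptions: record the path of every open-tag (html/body/div/p and so on) when the stylesheet is made, then compare that list against the paths of the page as it looks today. The similarity score comes from difflib.SequenceMatcher in the standard library; the names tag_paths and structure_similarity and the short list of void tags are illustrative only.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Tags that never take a close-tag (an assumed, incomplete list).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagPaths(HTMLParser):
    """Record the container path of every open-tag, in document order."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = []

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate sloppy markup: pop back to the nearest matching open-tag,
        # and ignore close-tags that were never opened.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the list of tag paths for an HTML string."""
    parser = TagPaths()
    parser.feed(html)
    parser.close()
    return parser.paths


def structure_similarity(expected_paths, html):
    """Score from 0.0 to 1.0: how closely the page matches the stored summary."""
    return SequenceMatcher(None, expected_paths, tag_paths(html)).ratio()
```

A score well below 1.0 would be the cue that the site has been redesigned and the stylesheet needs another look.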

In order to suppress or relocate inessential segments within this parsetree, the segments might be numbered according to their desired position ('0' for sections to be ignored).
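The numbering scheme is simple enough to sketch directly. Here the segments are assumed to have been cut out already and given names; the dictionaries and the function name reorder_segments are hypothetical.

```python
def reorder_segments(segments, positions):
    """Return segment texts in their desired order, dropping position 0.

    segments:  dict mapping a segment name to its text
    positions: dict mapping a segment name to its desired position,
               where 0 means 'ignore this segment'
    """
    kept = [
        (pos, name)
        for name, pos in positions.items()
        if pos > 0 and name in segments
    ]
    return [segments[name] for pos, name in sorted(kept)]
```

For example, positions of {"article": 1, "nav": 2, "ads": 0} would put the article first, the navigation after it, and drop the ads.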

A second, partially redundant level of parsing might identify segments by a start-pattern and end-pattern.
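A sketch of that second level using regular expressions from Python's re module. The function name extract_segment and the comment-marker patterns in the example are assumptions; a real stylesheet would store whatever patterns suit the site.

```python
import re


def extract_segment(html, start_pattern, end_pattern):
    """Return the text between the first start-pattern match and the
    next end-pattern match after it, or None if either is missing."""
    start = re.search(start_pattern, html)
    if start is None:
        return None
    rest = html[start.end():]
    end = re.search(end_pattern, rest)
    if end is None:
        return None
    return rest[:end.start()]
```

Returning None rather than guessing lets the caller fall back on the parse-tree level when the patterns no longer match.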

A third level might analyse the proportion of body-text to anchor-text (etc) to guess which sections are reading-text, and which are just heaps-o-links...
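A crude version of that third level, sketched with assumptions of my own: count the non-whitespace characters inside anchors against all the non-whitespace characters in a fragment, and push fragments above some threshold to the end. The threshold of 0.5 and the names link_density and links_last are arbitrary choices for the example.

```python
from html.parser import HTMLParser


class LinkDensity(HTMLParser):
    """Count text characters inside and outside <a> elements."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0
        self.anchor_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth:
            self.anchor_depth -= 1

    def handle_data(self, data):
        n = len("".join(data.split()))  # ignore whitespace
        self.total_chars += n
        if self.anchor_depth:
            self.anchor_chars += n


def link_density(fragment):
    """Fraction of a fragment's text that sits inside link-anchors."""
    parser = LinkDensity()
    parser.feed(fragment)
    parser.close()
    if parser.total_chars == 0:
        return 0.0
    return parser.anchor_chars / parser.total_chars


def links_last(fragments, threshold=0.5):
    """Keep reading-text in place and move link-heavy fragments to the end."""
    reading = [f for f in fragments if link_density(f) <= threshold]
    heaps = [f for f in fragments if link_density(f) > threshold]
    return reading + heaps
```

This would misjudge pages like weblogs, where the reading-text is itself dense with links, so the threshold probably belongs in the per-site stylesheet.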




Hosting provided by instinct.org. Content may be copied under Open Web Content License.