
URL-hacking in Google News

Jorn Barger May 2005

Ajax Challenge: Create a form that lets people set all the options below via checkboxes and popup menus.


This page was inspired-in-reverse by Tim Yang's ScrappyGoo service, which allows free access to GoogleNews queries as RSS feeds. I realised that Google's own formatting was much more readable than the RSS, and began to wonder how I could tweak it to suit my own needs.

I wanted lots of general results in reverse chronological order. The best I've come up with for a 'blank' query is "+the" (if you don't put the plus first, it may ignore the 'the').


http://news.google.com/news?
    num=100        this can be any number of results from 1 to 100 
    &scoring=d     this sorts the results by date
    &q=+the
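
The same URL can be assembled in a few lines of Python. This is only a sketch: the function name news_url and its argument names are made up for illustration, and only the three parameters (num, scoring, q) come from the breakdown above. It also assumes a literal plus sign is what's wanted in the query, so it lets urlencode write it as %2B rather than leaving a raw '+' (which servers commonly decode as a space).

```python
from urllib.parse import urlencode

BASE = "http://news.google.com/news"

def news_url(query="+the", num=100, by_date=True):
    """Build a Google News URL from the parameters described above."""
    if not 1 <= num <= 100:
        raise ValueError("num must be between 1 and 100")
    params = [("num", num)]
    if by_date:
        params.append(("scoring", "d"))  # d = sort results by date
    params.append(("q", query))
    # urlencode percent-encodes '+' as %2B, so the plus survives as a
    # literal character instead of being read as a space.
    return BASE + "?" + urlencode(params)

print(news_url())
```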

This probably gives too many results (~5000/hr), and too general, so there are various ways to narrow it down. Adding

&topic=t
will limit it to sci/tech topics (substitute w for world news, n for US news, e for entertainment, m for health, or b for business).

Examples: tech, world, US, ent, health, biz

(Trying other letters, it looks like someday c will mean China, l will mean life, and y will mean society... or maybe these are rejects?)
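
To generate one URL per topic, something like the following might do. The six letter codes are the ones listed above; the dictionary labels and the function name topic_url are just illustrative choices, and the speculative letters (c, l, y) are deliberately left out.

```python
from urllib.parse import urlencode

# Topic codes as listed in the text; the labels are only for readability.
TOPICS = {
    "tech": "t",
    "world": "w",
    "US": "n",
    "ent": "e",
    "health": "m",
    "biz": "b",
}

def topic_url(label, query="+the", num=100):
    """Return a date-sorted Google News URL limited to one topic."""
    params = {"num": num, "scoring": "d", "topic": TOPICS[label], "q": query}
    return "http://news.google.com/news?" + urlencode(params)

for label in TOPICS:
    print(label, topic_url(label))
```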

You can of course try a batch of keywords to use in place of 'the', joined by 'OR' or the equivalent symbol '|' (above the backslash key, normally).

Example: blog|blogs|blogger|bloggers
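
A batch of keywords can be joined mechanically. In this sketch the helper name any_of is hypothetical; the only thing taken from the text is that '|' stands in for OR. Note that urlencode will write the '|' as %7C in the finished URL.

```python
from urllib.parse import urlencode

def any_of(words):
    """Join keywords with '|', which the text describes as equivalent to OR."""
    return "|".join(words)

query = any_of(["blog", "blogs", "blogger", "bloggers"])
url = "http://news.google.com/news?" + urlencode(
    {"num": 100, "scoring": "d", "q": query}
)
print(query)
print(url)
```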

Or you can specify a batch of preferred news sources, separated by 'OR', using the syntax "source:cnn" or whatever. (Pinning down the whatever may take a little research on the advanced search page.)

Examples (with approx weekly post-frequency):

source:abc_news (4000/week)
source:guardian_unlimited (2500)
source:san_jose_mercury_news (2500)
source:washington_post (2000)
source:newsday (2000)
source:forbes (2000)
source:boston_globe (1500)
source:reuters (1500)
source:bloomberg (1500)
source:businessweek (1200)
source:bbc_news (1000)
source:msnbc (1000)
source:new_york_times (1000)
source:times_online (900)
source:usa_today (750)
source:cbc_news (650)
source:cnn (600)
source:international_herald_tribune (600)
source:wired_news (500)
source:ha_aretz (400)
source:e_commerce_times (400)
source:independent (350)
source:new_york_post (350)
source:cnet_news_com (150)
source:space_ref (130)
source:slashdot (100)
source:eurekalert (100)
source:computerworld (100)
source:christian_science_monitor (100)
source:editor___publisher (75)   [three underscores: one for the ampersand and one for each space around it]
source:gamespot (60)
source:pc_world (50)
source:pc_magazine (50)
source:search_engine_watch (50)
source:eweek (40)
source:infoworld (30)
source:cbs_marketwatch (0?)
source:economist__subscription_ (0?)
source:financial_times__subscription_ (0?)
source:the_globe_and_mail (0?)
source:al_jazeera (0?)
source:vnunet (0?)
source:the_register (0?)

(What look like embedded blanks are underscores. Question marks mean these need to be debugged: Google recognises the string but doesn't seem to apply the filter.)
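
Judging only from the entries above, the source names seem to follow a simple pattern: lowercase the publication's name and replace every character that isn't a letter or digit with an underscore. That rule is a guess inferred from this list, not anything Google documents, and the function names below are made up, but as a sketch it reproduces the odd cases:

```python
import re

def source_token(name):
    """Guess the source: token for a publication name.

    Assumed rule, inferred only from the list above: lowercase the name
    and turn every character that is not a letter or digit into '_'.
    """
    return "source:" + re.sub(r"[^a-z0-9]", "_", name.lower())

def sources_query(names):
    """Join several guessed source tokens with OR."""
    return " OR ".join(source_token(n) for n in names)

print(source_token("Editor & Publisher"))
print(source_token("Economist (subscription)"))
print(sources_query(["CNN", "BBC News", "Wired News"]))
```

A token guessed this way still needs checking against the advanced search page, since a name Google files differently will simply match nothing.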

It's probably more important to sort these by frequency of update than by topic-- infrequent high-quality publications can easily get drowned by frequent low-quality ones.
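
One way to act on this is to split the sources into frequency tiers and give each tier its own query, so the quiet publications get a page to themselves. The counts below are a small subset copied from the list above; the cutoffs (1000 and 100 per week) and the function name tiers are arbitrary choices for illustration.

```python
# Approximate weekly counts, a small subset of the list above.
WEEKLY = {
    "abc_news": 4000,
    "reuters": 1500,
    "cnn": 600,
    "slashdot": 100,
    "eweek": 40,
    "infoworld": 30,
}

def tiers(counts, cutoffs=(1000, 100)):
    """Split sources into tiers, busiest first.

    With the default cutoffs: >= 1000/week, >= 100/week, everything else.
    """
    result = [[] for _ in range(len(cutoffs) + 1)]
    for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        for i, cutoff in enumerate(cutoffs):
            if n >= cutoff:
                result[i].append(name)
                break
        else:
            result[-1].append(name)  # below every cutoff
    return result

for group in tiers(WEEKLY):
    print(" OR ".join("source:" + s for s in group))
```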

You can mix and match these tricks [eg], and you can put a minus sign in front of most of them to exclude them. ("-subscription" will handily exclude sites that require subscriptions... though it sometimes overdoes this.)
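
Mixing and matching might look like the sketch below, which combines a topic, a batch of keywords, and a couple of exclusions. The function name combined_url and its arguments are hypothetical. How Google groups the terms when OR-ed keywords and minus-terms share one query isn't stated here, so treat the result as something to experiment with rather than a guaranteed filter.

```python
from urllib.parse import urlencode

def combined_url(keywords, exclude=(), topic=None, num=100):
    """Combine OR-ed keywords, minus-prefixed exclusions and an optional topic."""
    terms = ["|".join(keywords)] + ["-" + t for t in exclude]
    params = {"num": num, "scoring": "d"}
    if topic:
        params["topic"] = topic
    params["q"] = " ".join(terms)
    return "http://news.google.com/news?" + urlencode(params)

print(combined_url(["blog", "blogs"],
                   exclude=["subscription", "source:slashdot"],
                   topic="t"))
```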





Old background on url-hacking in general: here

Tally of GoogleNews sources: [PrivateRadio]

Tag 'googlenews' on del.icio.us: [multipage]

More GoogleNews-to-RSS converters: [Pfister] [Voidstar]


Thoughts on RSS

I don't think I've found any mass-media news site that does its RSS feed right-- ie, including all stories in the order they're filed, rather than trying to keep the most important one at the top of the feed.

At first I thought it was stupid that RSS strips so much personality from people's page-design, but then I realised that this allowed me to follow sites where the page-design was so ego-y/obnoxious that I'd otherwise find them not-worth-it.

Sage in Firefox lets you style the RSS feeds however you want using a CSS stylesheet.






Hosting provided by instinct.org. Content may be copied under Open Web Content License.