URL-hacking in Google News
Jorn Barger May 2005
Ajax Challenge: Create a form that lets people set all the options below via checkboxes and popup menus.
This page was inspired-in-reverse by Tim Yang's ScrappyGoo service, which allows free access to GoogleNews queries as RSS feeds. I realised that Google's own formatting was much more readable than the RSS, and began to wonder how I could tweak it to suit my own needs.
I wanted lots of general results in reverse chronological order. The best I've come up with for a 'blank' query is "+the" (if you don't put the plus first, it may ignore the 'the').
http://news.google.com/news?
    num=100       this can be any number of results from 1 to 100
    &scoring=d    this sorts the results by date
    &q=+the
This probably gives too many (~5000/hr), too general results, so there are various ways to narrow it. Adding
    &topic=t
will limit it to sci/tech topics (substitute w for world news, n for US news, e for entertainment, m for health, or b for business).
Examples: tech, world, US, ent, health, biz
(Trying other letters, it looks like someday c will mean China, l will mean life, and y will mean society... or maybe these are rejects?)
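The URL above can also be assembled by a small script. This is only a minimal Python sketch, assuming the parameters behave as described on this page; the names news_url and TOPICS are made up for illustration, and safe="+" is passed so that the plus sign stays literal, as in the hand-written URL.

```python
from urllib.parse import urlencode

BASE = "http://news.google.com/news"

# Topic letters as listed above (the table name is hypothetical).
TOPICS = {"tech": "t", "world": "w", "us": "n",
          "ent": "e", "health": "m", "biz": "b"}

def news_url(query="+the", num=100, by_date=True, topic=None):
    """Build a GoogleNews URL from the options described on this page."""
    if not 1 <= num <= 100:
        raise ValueError("num must be between 1 and 100")
    params = {"num": num}
    if by_date:
        params["scoring"] = "d"      # sort the results by date
    params["q"] = query
    if topic is not None:
        params["topic"] = TOPICS[topic]
    # safe="+" keeps the leading plus of "+the" unescaped
    return BASE + "?" + urlencode(params, safe="+")

print(news_url())
print(news_url(topic="tech"))
```

The first call reproduces the 'blank' query; the second adds the sci/tech filter.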
You can of course try a batch of keywords to use in place of 'the', joined by 'OR' or the equivalent symbol '|' (above the backslash key, normally).
Example: blog|blogs|blogger|bloggers
Or you can specify a batch of preferred news sources, separated by 'OR', using the syntax "source:cnn" or whatever. (Pinning down the whatever may take a little research on the advanced search page.)
Examples (with approx weekly post-frequency):
source:abc_news (4000/week)
source:guardian_unlimited (2500)
source:san_jose_mercury_news (2500)
source:washington_post (2000)
source:newsday (2000)
source:forbes (2000)
source:boston_globe (1500)
source:reuters (1500)
source:bloomberg (1500)
source:businessweek (1200)
source:bbc_news (1000)
source:msnbc (1000)
source:new_york_times (1000)
source:times_online (900)
source:usa_today (750)
source:cbc_news (650)
source:cnn (600)
source:international_herald_tribune (600)
source:wired_news (500)
source:ha_aretz (400)
source:e_commerce_times (400)
source:independent (350)
source:new_york_post (350)
source:cnet_news_com (150)
source:space_ref (130)
source:slashdot (100)
source:eurekalert (100)
source:computerworld (100)
source:christian_science_monitor (100)
source:editor___publisher (75) [three underlines because of ampersand]
source:gamespot (60)
source:pc_world (50)
source:pc_magazine (50)
source:search_engine_watch (50)
source:eweek (40)
source:infoworld (30)
source:cbs_marketwatch (0?)
source:economist__subscription_ (0?)
source:financial_times__subscription_ (0?)
source:the_globe_and_mail (0?)
source:al_jazeera (0?)
source:vnunet (0?)
source:the_register (0?)
(What looks like embedded blanks are underlines. Question marks mean these need to be debugged: Google recognises the string but doesn't seem to utilise the filter.)
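The source strings above look as if they are derived from each publication's name by lowercasing it and replacing every character that isn't a letter or digit with an underline. That rule is only inferred from the examples (it would explain the three underlines in editor___publisher), and the display names used below are guesses, so treat this Python sketch as a starting point for research rather than as Google's actual method. The function name source_term is hypothetical.

```python
import re

def source_term(display_name):
    """Guess the 'source:' string for a publication name (inferred rule)."""
    # Lowercase, then turn every non-alphanumeric character into '_'
    slug = re.sub(r"[^a-z0-9]", "_", display_name.lower())
    return "source:" + slug

# Display names here are assumptions, not taken from Google's pages.
for name in ["CNN", "Editor & Publisher", "CNET News.com",
             "Economist (subscription)"]:
    print(source_term(name))
```

If a guessed string doesn't filter anything, the advanced search page is still the place to check it.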
It's probably more important to sort these by frequency of update than by topic-- infrequent high-quality publications can easily get drowned by frequent low-quality ones.
You can mix and match these tricks [eg], and you can put a minus sign in front of most of them to exclude them. ("-subscription" will handily exclude sites that require subscriptions... though it sometimes overdoes this.)
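Mixing and excluding can be sketched in the same spirit. This Python fragment assumes that terms separated by blanks must all match, that '|' joins alternatives, and that a leading minus excludes a term, as described above; the helper name build_query is hypothetical.

```python
from urllib.parse import urlencode

def build_query(any_of=(), sources=(), exclude=()):
    """Combine OR'd keywords, OR'd sources and excluded terms into one query."""
    parts = []
    if any_of:
        parts.append("|".join(any_of))
    if sources:
        parts.append("|".join("source:" + s for s in sources))
    parts.extend("-" + term for term in exclude)
    return " ".join(parts)

q = build_query(any_of=["blog", "blogs"],
                sources=["cnn", "wired_news"],
                exclude=["subscription"])
print(q)

# safe="|:" leaves those symbols readable; blanks are encoded as '+'
url = "http://news.google.com/news?" + urlencode(
    {"num": 100, "scoring": "d", "q": q}, safe="|:")
print(url)
```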
Old background on url-hacking in general: here
Tally of GoogleNews sources: [PrivateRadio]
Tag 'googlenews' on del.icio.us: [multipage]
More GoogleNews-to-RSS converters: [Pfister] [Voidstar]
Thoughts on RSS
I don't think I've found any mass-media news site that does its RSS feed right-- ie, including all stories in the order they're filed, rather than trying to keep the most important one at the top of the feed.
At first I thought it was stupid that RSS strips so much personality from people's page-design, but then I realised that this allowed me to follow sites where the page-design was so ego-y/obnoxious that I'd otherwise find them not-worth-it.
Sage in Firefox allows you to style the RSS feeds however you want using a CSS stylesheet.
Hosting provided by instinct.org. Content may be copied under Open Web Content License.