You have reached the blog of
Matthew Gray
matthew@gray.org
I am a father, board gamer and software engineer.

Internet

In addition to my blog (this page), you can find me on BoardGameGeek, Twitter, Flickr, LinkedIn, FriendFeed, and various other places. I also have a slightly stale homepage.

Personal

I am an avid board gamer. I am one of the (volunteer) admins of BoardGameGeek, maintainer of the GameStoreDB, board game blogger, and gaming software geek.

Professional

I am a staff software engineer at Google. Previously, I was the CTO at an 802.11 location and security company, Newbury Networks in Boston. In June 1999, I received my master's degree from the MIT Media Lab. I graduated from MIT with an undergraduate degree in physics in June 1997. Prior to that I was CTO of net.Genesis from 1994 to 1996.

While at MIT, I was one of the three members of the Student Information Processing Board (SIPB) who set up www.mit.edu in the spring of 1993. I am also a former/inactive member of the Apache group, a volunteer group of developers of Apache, the world's most popular web server.

Blog

Tuesday, December 14, 2004

Del.icio.us clusters

Del.icio.us is interesting. It lends itself well to the following
experiment/analysis: some pairs of tags co-occur on the same URL much
more often than others, which creates natural clusters. For example,
"php" and "mysql" are more likely to apply to the same URL than, say,
"philosophy" and "web". It's pretty straightforward to analyze the RSS
feeds they provide and generate the clusters automatically, so I did.
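
As a rough illustration of the "co-occur much more often" idea, here is a minimal Python sketch. It is not the code I actually ran. The `cooccurrence_lift` function, the lift score, and the sample URLs are all assumptions made up for this example. It counts how often each pair of tags lands on the same URL and compares that with what independent tags would give.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_lift(tagged_urls):
    """Return {(tag_a, tag_b): lift} for every tag pair seen on the same URL.

    lift = P(a and b) / (P(a) * P(b)). Values above 1 mean the pair shows
    up together more often than independent tags would.
    """
    n = len(tagged_urls)
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tagged_urls.values():
        unique = sorted(set(tags))  # sorted so each pair has one canonical key
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (a, b): (count / n) / ((tag_counts[a] / n) * (tag_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


# Made-up sample data (hypothetical, not taken from the real feeds).
sample = {
    "http://example.com/1": ["php", "mysql", "web"],
    "http://example.com/2": ["php", "mysql"],
    "http://example.com/3": ["philosophy"],
    "http://example.com/4": ["web", "css"],
}
lift = cooccurrence_lift(sample)
for pair, score in sorted(lift.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Pairs that never share a URL (like "philosophy" and "web" in this toy data) simply do not appear in the result. With real data you would probably also want a minimum count per pair, since a pair seen once can get a misleadingly high score.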

(Note: to illustrate some of these clusters I use a stylesheet that
most RSS readers will not pick up, so visit the actual page if the
diagrams make no sense.) Most clusters are exactly the kind of thing
you'd expect, though some have interesting structure. Others are
amusing or surprising. One "standard" cluster is essentially a web
development cluster. I've collapsed some of the sub-clusters for clarity:



Web development cluster

php mysql lamp
apache mod_perl perl work

creative css html design
javascript webdev

foaf semantic xml rdf
software web
jsp java j2ee programming



So that's a pretty coherent, straightforward cluster. There are others, including the Blogging/RSS cluster (rss atom syndication cool tech blog), the Recreation cluster (photography art photo annotation flickr geo photos games fun flash humor funny comics comic), and the Politics cluster, drawn out here.


Politics cluster


rumsfeld iraq
rnc dnc
democrats
political gop
bush
politics
kerry
election fraud
usa



When you get to the higher-level connections (which are that much more tenuous), there are some less obvious arrangements. All-caps words represent an entire subcluster whose structure isn't shown:


RSS/BLOGGING


LINUX
mail windows
ftp
spam


IETF RFCs
imap email
sputnik darpa
1950s 1960s


INTERNET HISTORY
free reference
snort security
tools

WEB DEVELOPMENT




Some of the odd connections (e.g. "mail windows" and "ftp") may just be an artifact of the relatively small sample (fewer than 6,000 tagged URLs).
I'll have to collect more data and continue the experiment. In any case, it's interesting to me that it exposes a hierarchical ontology in a rather straightforward way.
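
To show how a hierarchy can fall out of nothing but co-occurrence, here is a small Python sketch of greedy agglomerative clustering. It is an assumption about one way to do this, not a description of my actual method. The Jaccard similarity, the `threshold` value, the `cluster_tags` name, and the sample data are all hypothetical choices for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Similarity of two URL sets: shared URLs over all URLs."""
    return len(a & b) / len(a | b)


def cluster_tags(tagged_urls, threshold=0.2):
    """Greedily merge the two most similar clusters until none are similar.

    Each cluster is (tree, urls): tree is a tag name or a nested pair of
    subtrees, urls is the set of URLs carrying any tag in the cluster.
    Returns the list of trees left when no pair reaches `threshold`.
    """
    urls_by_tag = {}
    for url, tags in tagged_urls.items():
        for tag in tags:
            urls_by_tag.setdefault(tag, set()).add(url)
    clusters = sorted(urls_by_tag.items())
    while len(clusters) > 1:
        # Pick the pair of current clusters with the highest similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: jaccard(clusters[p[0]][1], clusters[p[1]][1]),
        )
        if jaccard(clusters[i][1], clusters[j][1]) < threshold:
            break
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] | clusters[j][1],
        )
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [merged]
    return [tree for tree, _ in clusters]


# Made-up sample data (hypothetical, not taken from the real feeds).
sample = {
    "u1": ["php", "mysql"],
    "u2": ["php", "mysql", "lamp"],
    "u3": ["css", "html"],
    "u4": ["css", "html", "design"],
    "u5": ["bush", "kerry"],
}
trees = cluster_tags(sample)
for tree in trees:
    print(tree)
```

Each nested pair records one merge, so tightly linked tags end up deep in the tree and looser ones attach further out. The looser, higher-level merges are also the ones most sensitive to a small sample.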

Posted by Matthew Gray at 11:44 PM
Labels: delicious, Technology


Disclaimer

I work for Google as a Software Engineer. This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.