Tuesday, December 14, 2004

Del.icio.us clusters

Del.icio.us is interesting. It lends
itself well to the following experiment: some pairs of tags co-occur
much more often than others, and this creates natural clusters.
For example, "php" and "mysql" are more likely to apply to the
same URL than, say, "philosophy" and "web".
It's pretty straightforward to analyze the RSS feeds they provide
and generate the clusters automatically, so I did it.
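
The post doesn't say how co-occurrence was actually measured, so here is only a rough sketch of one way to score tag pairs. It assumes the feed has already been parsed into a mapping from URL to tag list, and it uses Jaccard similarity as the score. The function name, the sample URLs, and the choice of Jaccard are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(tagged_urls):
    """Return {(tag_a, tag_b): jaccard} for every pair of tags sharing a URL.

    tagged_urls maps a URL to the list of tags applied to it. Pair keys are
    sorted alphabetically, so ("mysql", "php") rather than ("php", "mysql").
    """
    tag_counts = Counter()   # number of URLs carrying each tag
    pair_counts = Counter()  # number of URLs carrying both tags of a pair
    for tags in tagged_urls.values():
        unique = sorted(set(tags))
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), both in pair_counts.items():
        either = tag_counts[a] + tag_counts[b] - both  # URLs with a or b
        scores[(a, b)] = both / either
    return scores


# Hypothetical sample data, not taken from the real feed.
sample = {
    "http://example.com/1": ["php", "mysql", "web"],
    "http://example.com/2": ["php", "mysql"],
    "http://example.com/3": ["philosophy"],
    "http://example.com/4": ["web", "css"],
}
scores = cooccurrence_scores(sample)
print(scores[("mysql", "php")])  # the pair appears together on every URL it touches
```

Pairs that never share a URL, such as "philosophy" and "web" in the sample, simply don't appear in the result, which is the same as a score of zero.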

(Note: to illustrate some of these clusters I use a stylesheet which
will not get picked up by most RSS readers, so visit the actual page
if the diagrams make no sense.)

Most clusters are exactly the kind of thing you'd expect, though some
have interesting structure. Others are amusing or surprising. One
"standard" cluster is essentially a web development cluster. I've
collapsed some of the sub-clusters for clarity:

Web development cluster

php mysql lamp

apache mod_perl perl work

creative css html design

javascript webdev

foaf semantic xml rdf

software web

jsp java j2ee programming

So that's a pretty coherent, straightforward cluster. There are others, including the Blogging/RSS cluster (rss atom syndication cool tech blog), the Recreation cluster (photography art photo annotation flickr geo photos games fun flash humor funny comics comic), and the Politics cluster, drawn out here.

Politics cluster

rumsfeld iraq
rnc dnc
political gop
election fraud

When you get to the higher-level connections (which are that much more tenuous), there are some less obvious arrangements. Each all-caps word represents an entire subcluster whose structure isn't shown:


mail windows

imap email
sputnik darpa
1950s 1960s

free reference
snort security


Some of the odd connections (e.g. "mail windows" and "ftp") may just be an artifact of the relatively small sample (fewer than 6,000 tagged URLs).
I'll have to collect more data and continue the experiment. In any case, it's interesting to me that it exposes a hierarchical ontology in a rather straightforward way.
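
To make the "hierarchical" part concrete, here is a minimal sketch of how nested clusters can fall out of pairwise tag similarities. It assumes greedy average-linkage merging with a cutoff, which is a standard technique and not necessarily what produced the diagrams above. The function name, the threshold, and the similarity numbers are made up for illustration.

```python
def agglomerate(tags, similarity, threshold=0.2):
    """Greedily merge the two most similar clusters until none clears threshold.

    similarity maps (tag_a, tag_b) to a score; missing pairs count as 0.
    Returns a list of clusters, each either a tag or a nested 2-tuple.
    """
    def sim(a, b):
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    def leaves(node):
        # Flatten a nested cluster back into its member tags.
        if isinstance(node, str):
            return [node]
        return [leaf for child in node for leaf in leaves(child)]

    def linkage(x, y):
        # Average similarity over all cross-cluster tag pairs.
        xs, ys = leaves(x), leaves(y)
        return sum(sim(a, b) for a in xs for b in ys) / (len(xs) * len(ys))

    clusters = list(tags)
    while len(clusters) > 1:
        best, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break  # remaining connections are too tenuous to merge
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Made-up similarity scores, for illustration only.
similarity = {
    ("php", "mysql"): 0.8,
    ("php", "lamp"): 0.6,
    ("mysql", "lamp"): 0.5,
    ("rumsfeld", "iraq"): 0.7,
}
result = agglomerate(["php", "mysql", "lamp", "rumsfeld", "iraq"], similarity)
print(result)
```

Lowering the threshold lets the more tenuous higher-level merges through, which is where a small sample is most likely to produce odd pairings.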

