Tuesday, December 14, 2004

Del.icio.us clusters

Del.icio.us is interesting. It lends
itself well to the following experiment: some pairs of tags co-occur
much more often than others, and this creates natural
clusters. For example, "php" and "mysql" are a pair of tags that are
more likely to apply to the same URL than, say, "philosophy" and "web".
It's pretty straightforward to analyze the RSS feeds they provide
and generate the clusters automatically, so I did.
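
To make the co-occurrence idea concrete, here is a minimal sketch in Python. It is not the code used for this experiment. The sample URLs and tags are made up, the function names are hypothetical, and the choice of Jaccard similarity as the score is an assumption; the post does not say which measure was used.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tagged_urls):
    """Count how often each tag, and each unordered pair of tags, appears across URLs."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tagged_urls.values():
        unique = sorted(set(tags))  # sorted so each pair has one canonical order
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return tag_counts, pair_counts


def jaccard(a, b, tag_counts, pair_counts):
    """Fraction of the URLs carrying either tag that carry both (assumed measure)."""
    both = pair_counts[tuple(sorted((a, b)))]
    either = tag_counts[a] + tag_counts[b] - both
    return both / either if either else 0.0


# Made-up sample data, purely for illustration.
sample = {
    "http://example.com/1": ["php", "mysql", "web"],
    "http://example.com/2": ["php", "mysql"],
    "http://example.com/3": ["philosophy"],
    "http://example.com/4": ["web", "css"],
}

tag_counts, pair_counts = cooccurrence(sample)
print(jaccard("php", "mysql", tag_counts, pair_counts))
print(jaccard("philosophy", "web", tag_counts, pair_counts))
```

In this toy data "php" and "mysql" always appear together, while "philosophy" and "web" never do, so the first pair scores far higher than the second.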

(Note: to illustrate some of these clusters I use a stylesheet that
most RSS readers will not pick up, so visit the actual page
if the diagrams make no sense.)

Most clusters are exactly the kind of
thing you'd expect, though some have interesting structure. Others
are amusing or surprising. One "standard" cluster is essentially a
web development cluster. I've collapsed some of the sub-clusters for clarity:



Web development cluster





php mysql lamp


apache mode_perl perl work





creative css html design

javascript webdev





foaf semantic xml rdf

software web

jsp java j2ee programming



So that's a pretty coherent, straightforward cluster. There are others, including the Blogging/RSS cluster (rss atom syndication cool tech blog), the Recreation cluster (photography art photo annotation flickr geo photos games fun flash humor funny comics comic), and the Politics cluster, drawn out here.


Politics cluster


rumsfeld iraq
rnc dnc
democrats
political gop
bush
politics
kerry
election fraud
usa



When you get to the somewhat higher-level connections (which are that much more tenuous), there are some less obvious arrangements. All-caps words represent an entire subcluster whose structure isn't shown:


RSS/BLOGGING


LINUX
mail windows
ftp
spam


IETF RFCs
imap email
sputnik darpa
1950s 1960s


INTERNET HISTORY
free reference
snort security
tools

WEB DEVELOPMENT




Some of the odd connections (e.g. "mail windows" and "ftp") may just be an artifact of the relatively small sample (fewer than 6,000 tagged URLs).
I'll have to collect more data and continue the experiment. In any case, it's interesting to me that it exposes a hierarchical ontology in a rather straightforward way.
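
One way a hierarchy like this can fall out of co-occurrence data is bottom-up (agglomerative) clustering: repeatedly merge the two most similar clusters and keep the merges as nested groups. The sketch below is only an illustration under stated assumptions, not the method used here. The tag data is made up, the helper names are hypothetical, and both the Jaccard score and the average-linkage rule are choices I picked for the example.

```python
from itertools import combinations


def jaccard(urls_a, urls_b):
    """Overlap between two sets of URLs, from 0.0 (disjoint) to 1.0 (identical)."""
    union = urls_a | urls_b
    return len(urls_a & urls_b) / len(union) if union else 0.0


def leaves(node):
    """Flatten a nested-tuple cluster into its list of tags."""
    if isinstance(node, str):
        return [node]
    return [tag for child in node for tag in leaves(child)]


def average_similarity(c1, c2, tag_urls):
    """Average-linkage score: mean tag-to-tag similarity across two clusters."""
    pairs = [(a, b) for a in leaves(c1) for b in leaves(c2)]
    return sum(jaccard(tag_urls[a], tag_urls[b]) for a, b in pairs) / len(pairs)


def agglomerate(tag_urls, threshold=0.0):
    """Merge the two most similar clusters until no pair scores above threshold."""
    clusters = sorted(tag_urls)  # start with every tag as its own cluster
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            score = average_similarity(clusters[i], clusters[j], tag_urls)
            if best is None or score > best[0]:
                best = (score, i, j)
        score, i, j = best
        if score <= threshold:
            break  # remaining clusters are unrelated; leave them separate
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Made-up data: each tag maps to the set of URL ids it was applied to.
tag_urls = {
    "php": {1, 2, 3},
    "mysql": {1, 2},
    "html": {3, 4, 5},
    "css": {4, 5},
    "iraq": {7},
}

hierarchy = agglomerate(tag_urls)
print(hierarchy)
```

With this toy data the tightly linked pairs merge first, those two pairs then join through the single URL that "php" and "html" share, and "iraq" stays on its own. The late, weak merge is the kind of tenuous higher-level link that a small sample can produce by accident.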
