Matthew Gray's Blog: Del.icio.us clusters

Tuesday, December 14, 2004

Del.icio.us clusters

Del.icio.us is interesting. It lends
itself well to the following experiment: some tags co-occur with
certain other tags much more often than they do with the rest, which
creates natural clusters. For example, "php" and "mysql" are a pair of
tags that are more likely to apply to the same URL than, say,
"philosophy" and "web". It's pretty straightforward to analyze the RSS
feeds Del.icio.us provides and generate the clusters automatically, so I did.
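To make the co-occurrence idea concrete, here is a minimal sketch, not necessarily the code behind the analysis in this post. It assumes the feed data has already been reduced to a mapping from URL to tag list, and it scores each tag pair with the Jaccard index (URLs carrying both tags divided by URLs carrying either), which is one reasonable choice among several. The function name `cooccurrence_scores` and the sample URLs are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(tagged_urls):
    """Return {(tag_a, tag_b): jaccard} for every pair of tags seen together.

    Keys are alphabetically ordered pairs, so ("mysql", "php") rather
    than ("php", "mysql").
    """
    tag_counts = Counter()   # number of URLs carrying each tag
    pair_counts = Counter()  # number of URLs carrying both tags of a pair
    for tags in tagged_urls.values():
        unique = sorted(set(tags))  # ignore duplicate tags on one URL
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), both in pair_counts.items():
        either = tag_counts[a] + tag_counts[b] - both
        scores[(a, b)] = both / either
    return scores


# Hypothetical sample data, not taken from the real feeds.
sample = {
    "http://example.com/1": ["php", "mysql", "web"],
    "http://example.com/2": ["php", "mysql"],
    "http://example.com/3": ["philosophy"],
    "http://example.com/4": ["web", "css"],
}
scores = cooccurrence_scores(sample)
```

In this toy sample "php" and "mysql" always appear together and score highest, while "philosophy" and "web" never share a URL and so get no score at all.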

(Note: to illustrate some of these clusters I use a stylesheet that
most RSS readers will not pick up, so visit the actual page
if the diagrams make no sense.)

Most clusters are exactly the kind of
thing you'd expect, though some have interesting structure. Others
are amusing or surprising. One "standard" cluster is essentially a
web development cluster. I've collapsed some of the sub-clusters for clarity:

Web development cluster

php mysql lamp

apache mod_perl perl work

creative css html design

javascript webdev

foaf semantic xml rdf

software web

jsp java j2ee programming

So that's a pretty coherent, straightforward cluster. There are others, including the Blogging/RSS cluster (rss atom syndication cool tech blog), the Recreation cluster (photography art photo annotation flickr geo photos games fun flash humor funny comics comic), and the Politics cluster, drawn out here.

Politics cluster

rumsfeld iraq

rnc dnc

democrats

political gop

bush

politics

kerry

election fraud

usa

When you get to the somewhat higher-level connections (which are that much more tenuous), there are some less obvious arrangements. Words in all caps represent an entire subcluster whose structure isn't shown:

RSS/BLOGGING

LINUX

mail windows

ftp

spam

IETF RFCs

imap email

sputnik darpa

1950s 1960s

INTERNET HISTORY

free reference

snort security

tools

WEB DEVELOPMENT

Some of the odd connections (e.g. "mail windows" and "ftp") may just be an artifact of the relatively small sample (fewer than 6,000 tagged URLs).
I'll have to collect more data and continue the experiment. In any case, it's interesting to me that the analysis exposes a hierarchical ontology in a rather straightforward way.
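One way a hierarchy like this can fall out of pairwise tag similarities is greedy agglomerative clustering: repeatedly merge the two most similar clusters and record each merge as a nested pair. The sketch below is an illustration under assumptions, not necessarily the procedure used for the diagrams here. It assumes a `scores` dictionary keyed by alphabetically ordered tag pairs, uses average linkage, and stops at a hypothetical `threshold` parameter; the function name `build_hierarchy` and the similarity values are made up.

```python
from itertools import combinations


def build_hierarchy(scores, threshold=0.0):
    """Greedily merge the most similar clusters into nested tuples.

    scores: {(tag_a, tag_b): similarity} with tag_a < tag_b.
    Returns a list of trees; each tree is a tag or a pair of subtrees.
    """
    tags = {tag for pair in scores for tag in pair}
    # Map each cluster's member set to the nested tree built so far.
    clusters = {frozenset([tag]): tag for tag in tags}

    def similarity(x, y):
        # Average linkage: mean pairwise score, missing pairs count as 0.
        total = sum(scores.get((min(a, b), max(a, b)), 0.0)
                    for a in x for b in y)
        return total / (len(x) * len(y))

    while len(clusters) > 1:
        # Sort clusters by their members so ties break deterministically.
        ordered = sorted(clusters, key=sorted)
        x, y = max(combinations(ordered, 2), key=lambda p: similarity(*p))
        if similarity(x, y) <= threshold:
            break  # nothing left that is similar enough to merge
        clusters[x | y] = (clusters.pop(x), clusters.pop(y))
    return list(clusters.values())


# Hypothetical similarities, not measured from real data.
example_scores = {
    ("mysql", "php"): 1.0,
    ("css", "web"): 0.5,
    ("mysql", "web"): 0.2,
    ("php", "web"): 0.2,
}
tree = build_hierarchy(example_scores)
loose = build_hierarchy(example_scores, threshold=0.15)
```

With the default threshold everything ends up in one tree whose top-level link is the weakest one, which matches the observation that the higher-level connections are the more tenuous. Raising the threshold leaves separate clusters instead.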
