Cloud Street: So much that hides

Alex points to this piece by Rashmi Sinha on 'Findability with tags': the vexed question of using tags to find the material that you've tagged, rather than as an elaborate way of building a mind-map.

I should stress, parenthetically, that that last bit wasn't meant as a putdown - it actually describes my own use of Simpy. I regularly tag pages, but almost never use tags to actually retrieve them. Sometimes - quite rarely - I do pull up all the pages I've tagged with a generic "write something about this" tag. Apart from that, I only ever ask Simpy two questions: one is "what was that page I tagged the other day?" (for which, obviously, meaningful tags aren't required); the other is "what does my tag cloud look like?".

Now, you could say that the answer to the second question isn't, strictly speaking, information; it's certainly not information I use, unless you count the time I spend grooming the cloud by splitting, merging and deleting stray tags. I like tag clouds and don't agree with Jeffrey Zeldman's anathema, but I do agree with Alex that they're not the last word in retrieving information from tags. Which is where Rashmi's article comes in.

Rashmi identifies three ways of layering additional information on top of the basic item/tag pairing, all of which hinge on partitioning the tag universe in different ways. This is most obvious in the case of faceted tagging: here, the field of information is partitioned before any tags are applied. Rashmi cites the familiar example of wine, where a 'region' tag would carry a different kind of information from 'grape variety', 'price' or for that matter 'taste'. Similar distinctions can be made in other areas: a news story tagged 'New Labour', 'racism' and 'to blog about' is implicitly carrying information in the domains 'subject (political philosophy)', 'subject (social issue)' and 'action to take'.

There are two related problems here. A unique tag, in this model, can only exist within one dimension: if I want separate tags for New Labour (the people) and New Labour (the philosophy), I'll either have to make an artificial distinction between the two (New_Labour vs New_Labour_philosophy) or add a dimension layer to my tags (political_party.New_Labour vs political_philosophy.New_Labour). Both solutions are pretty horrible. More broadly, you can't invoke a taxonomist's standby like the wine example without setting folksonomic backs up, and with some reason: part of the appeal of tagging is precisely that you start with a blank sheet and let the domains of knowledge emerge as they may.
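To make the 'dimension layer' option concrete, here's a minimal Python sketch of what a tool would have to do with tags written as facet.value. The dot convention comes from the example above; the function names, the "unfaceted" fallback and the sample tags are my own assumptions for illustration, not anything Simpy or del.icio.us actually does.

```python
from collections import defaultdict


def split_facet(tag, default_facet="unfaceted"):
    """Split 'facet.value' into (facet, value).

    Tags with no dot have no declared dimension, so they fall into a
    catch-all facet (a hypothetical convention, chosen for this sketch).
    """
    facet, sep, value = tag.partition(".")
    if not sep:
        return default_facet, tag
    return facet, value


def group_by_facet(tags):
    """Group tag values under their facet, keeping the input order."""
    grouped = defaultdict(list)
    for tag in tags:
        facet, value = split_facet(tag)
        grouped[facet].append(value)
    return dict(grouped)


# The same label can now live in two dimensions without colliding.
story_tags = [
    "political_party.New_Labour",
    "political_philosophy.New_Labour",
    "social_issue.racism",
    "to_blog_about",
]
facets = group_by_facet(story_tags)
```

Note that the dimensions have to be typed in by the tagger, every time, which is exactly the overhead that makes this solution unattractive.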

Clustered tagging (a new one on me) addresses both of these problems, as well as answering the much-evaded question of how those domains are supposed to emerge. A tag cluster - as seen on Flickr - consists of a group of tags which consistently appear together, suggesting an implicit 'domain'. Crucially, a single tag can occur in multiple clusters. The clusters for the Flickr 'election' tag, for example, are easy to interpret:

vote, politics, kerry, bush, voting, ballot, poster, cameraphone, democrat, president

wahl, germany, deutschland, berlin, cdu, spd, bundestagswahl

canada, ndp, liberal, toronto, jacklayton, federalelection

and, rather anticlimactically,

england, uk

Clustering, I'd argue, represents a pretty good stab at building emergent domains. The downside is that it only becomes possible when there are huge numbers of tagging operations.
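I don't know how Flickr actually computes its clusters, so what follows is only a rough Python sketch of one way the idea could work: take every item carrying a seed tag, count how often its other tags appear together, link pairs that co-occur at least min_count times, and treat each connected group as a cluster. The function name, the threshold and the toy data are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cluster_cotags(items, seed, min_count=2):
    """Group the tags that accompany `seed` into co-occurrence clusters.

    items: iterable of sets of tags, one set per tagged item.
    Two co-tags are linked if they appear together (alongside the seed)
    on at least `min_count` items; clusters are the connected groups.
    """
    pair_counts = defaultdict(int)
    for tags in items:
        if seed not in tags:
            continue  # only items carrying the seed tag are relevant
        others = sorted(tags - {seed})
        for a, b in combinations(others, 2):
            pair_counts[(a, b)] += 1

    # Build an undirected graph of sufficiently frequent pairs.
    neighbours = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Walk the graph to collect connected components.
    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            tag = stack.pop()
            if tag in component:
                continue
            component.add(tag)
            stack.extend(neighbours[tag] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


photos = [
    {"election", "vote", "bush", "kerry"},
    {"election", "vote", "kerry"},
    {"election", "bush", "vote"},
    {"election", "wahl", "berlin"},
    {"election", "wahl", "berlin", "spd"},
    {"election", "spd", "wahl"},
    {"wahl", "berlin"},  # no seed tag, so ignored
]
clusters = cluster_cotags(photos, "election")
# clusters == [["berlin", "spd", "wahl"], ["bush", "kerry", "vote"]]
```

In this sketch it's the seed tag ('election') that belongs to several clusters at once; the co-tags themselves are split into non-overlapping groups. With only a handful of items the threshold is doing very little work, which is the downside in miniature: the pair counts only become meaningful with a great many tagging operations.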

The third enhancement to tagging Rashmi describes is the use of tags as pivots:

When everything (tag, username, number of people who have bookmarked an item) is a link, you can use any of those links to look around you. You can change direction at any moment.

Lurking behind this, I think, is Thomas's original tripartite definition of 'folksonomy':

the three needed data points in a folksonomy tool [are]: 1) the person tagging; 2) the object being tagged as its own entity; and 3) the tag being used on that object. Flattening the three layers in a tool in any way makes that tool far less valuable for finding information. But keeping the three data elements you can use two of the elements to find a third element, which has value. If you know the object (in del.icio.us it is the web page being tagged) and the tag you can find other individuals who use the same tag on that object, which may lead (if a little more investigation) to somebody who has the same interest and vocabulary as you do. That person can become a filter for items on which they use that tag.

This, I think, is pivoting in action: from the object and its tags, to the person tagging and the tags they use, to the person using particular tags and the objects they tag. (There's a more concrete description here.)
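The 'use two of the elements to find a third' idea can be sketched in a few lines of Python, provided the three data points are kept as unflattened (person, object, tag) triples. This is an illustration only: the Tagging record, the two lookup functions, and the names and URLs in the sample data are all made up, and no real service's API is implied.

```python
from typing import NamedTuple


class Tagging(NamedTuple):
    """One tagging act: who tagged which object with which tag."""
    person: str
    obj: str
    tag: str


def people_for(taggings, obj, tag):
    """Pivot from (object, tag) to the people who used that tag on it."""
    return sorted({t.person for t in taggings if t.obj == obj and t.tag == tag})


def objects_for(taggings, person, tag):
    """Pivot from (person, tag) to the objects that person tagged with it."""
    return sorted({t.obj for t in taggings if t.person == person and t.tag == tag})


taggings = [
    Tagging("alice", "example.org/findability", "folksonomy"),
    Tagging("bob", "example.org/findability", "folksonomy"),
    Tagging("bob", "example.org/clusters", "folksonomy"),
    Tagging("bob", "example.org/wine", "facets"),
    Tagging("carol", "example.org/findability", "tagging"),
]

# Start from a page and a tag, and find who else used that tag on it...
fellow_taggers = people_for(taggings, "example.org/findability", "folksonomy")
# fellow_taggers == ["alice", "bob"]

# ...then use one of those people as a filter for other items.
bobs_items = objects_for(taggings, "bob", "folksonomy")
# bobs_items == ["example.org/clusters", "example.org/findability"]
```

If the triples were flattened into, say, a simple count of tags per page, neither lookup would be possible, which is the point of keeping all three layers.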

Alex suggests that using tags as pivots could also be considered a subset of faceted browsing. I'd go further, and suggest that facets, clusters and pivots are all subsets of a larger set of solutions, which we can call domain-based tagging. If you use facets, the domains are imposed: this approach is a good fit to relatively closed domains of knowledge and finite groups of taggers. If you've got an epistemological blank sheet and a limitless supply of taggers, you can allow the domains to emerge: this is where clusters come into their own. And if what you're primarily interested in is people - and, specifically, who's saying what about what - then you don't want multiple content-based domains but only the information which derives directly from human activity: the objects and their taggers. Or rather, you want the objects and the taggers, plus the ability to pivot into a kind of multi-dimensional space: instead of tags existing within domains, each tag is a domain in its own right, and what you can find within each tag-domain is the objects and their taggers.

What all of this suggests is that, unsurprisingly, there is no 'one size fits all' solution. I suggested some time ago that

If 'cloudiness' is a universal condition, del.icio.us and Flickr and tag clouds and so forth don't enable us to do anything new; what they are giving us is a live demonstration of how the social mind works.

All knowledge is cloudy; all knowledge is constructed through conversation; conversation is a way of dealing with cloudiness and building usable clouds; social software lets us see knowledge clouds form in real time. I think that's fine as far as it goes; what it doesn't say is that, as well as having conversations about different things, we're having different kinds of conversations and dealing with the cloud of knowing in different ways. Ontology is not, necessarily, overrated; neither is folksonomy.

Thursday, August 03, 2006
