Cloud Street

Friday, August 05, 2005

So say I

Why I use 'ethnoclassification' rather than 'folksonomy'.
  1. 'Ethnoclassification' recalls 'ethnomethodology', Harold Garfinkel's coinage for the study of the collective construction of everyday life. Garfinkel took a great deal from Alfred Schutz; I think some of his work develops Schutz's social phenomenology in the wrong direction, but to have Schutz's work developed at all is a good thing. In this context, the term 'ethnoclassification' suggests a process that's continual, provisional and embedded in practical activity: the place where it happens (to borrow a phrase from Russell Hoban) is Everywhere All The Time. I think this is a good emphasis.
  2. 'Folksonomy', by contrast, suggests both a process and the end result (a viable folk-taxonomy); as such it's confusing and promotes fuzzy argument.
  3. It's also a term with a strong positive value: forward the taxonomy of the folk! à bas les bibliothécaires! It's a marketing term as well as a term of analysis, and lends itself to slippage between description and advocacy.
  4. (Last and least) It's etymologically ghastly and obtrusively American (I don't say 'candy', I don't say 'diaper' and I don't say 'folks').
Henceforth - starting in the previous post, to be more precise - I'll be using 'ethnoclassification' to refer to the (real, universal, continuing) process and 'folksonomy' to refer to the (hyped, unrealised, arguably unrealisable) end result.

Not available before

Thanks to a couple of links posted by Thomas, I've just read Bryan Boyer's Correspondance Romano (Corriere Romano, surely? never mind) closely followed by this post from February by Tom Evslin. Tom:
People don’t think hierarchically – at least most people don’t. We think in terms of associations. Our dreams give this away as they hyperlink through experiences of the day and memories of the distant past. A conversation meanders horizontally from one topic to the next.
Hierarchies like Lotus Notes or the Dewey Decimal System were necessary when computing power was non-existent or very expensive. As computing power has become relentlessly cheaper thanks to Moore’s law, hierarchies of information have become unnecessary. ... So long as Google or its competitors can index almost everything I might ever want to find, why should any arbitrary order be imposed on information?
Once we didn’t need hierarchies to organize our approach to information, they became an impediment. It is very hard for one person to figure out which node in which folder tree another person would have put a particular piece of information. A document may be relevant to one researcher for entirely different reasons than it is relevant to another researcher.
The relationship between documents is actually dynamic depending on the needs of the reader. Not incidentally, open tagging and hyperlinking are both ways to impose particular relationships on documents to meet the need of some subset of readers.
In passing, this suggests that the contribution of tagging to the grunt work of actually finding stuff may not be all that significant. After all, "a document may be relevant to one researcher for entirely different reasons than it is relevant to another researcher": in this respect the same strictures apply to tags as to folders, with the proviso that tagging does at least give you multiple chances to get it right. I've found useful and interesting stuff by browsing tags, but I've also found useful and interesting stuff by browsing library catalogues, running partial name searches on booksellers' sites, googling common phrases and going to the eighth page of results, and so forth. But then, I'm a catalogue-hound and I like being surprised. If you're looking for something specific, Tom's argument (inadvertently?) suggests, you're probably better off with Google.
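The folder-versus-tag point can be sketched in a few lines. Everything here is hypothetical (the document ID, the folder path, the tags and both lookup functions): a folder lookup succeeds only on an exact guess of the one path chosen by whoever filed the document, while a tag lookup succeeds if any guessed tag was applied, and still fails if nobody applied the tag you guess.

```python
# Hypothetical data: one document, filed once in a folder tree and also tagged.
folder_of = {"evslin-post": ("computing", "history")}
tags_of = {"evslin-post": {"hierarchy", "search", "mooreslaw", "tagging"}}


def found_by_folder(doc: str, guessed_path: tuple[str, ...]) -> bool:
    # A folder lookup works only if the reader guesses the one path the filer chose.
    return folder_of.get(doc) == guessed_path


def found_by_tags(doc: str, guessed_tags: set[str]) -> bool:
    # A tag lookup works if any guessed tag is among those applied:
    # several chances to get it right, but a tag nobody applied finds nothing.
    return bool(tags_of.get(doc, set()) & guessed_tags)
```

So `found_by_folder("evslin-post", ("information", "search"))` is false, `found_by_tags("evslin-post", {"search", "google"})` is true, and `found_by_tags("evslin-post", {"dewey"})` is false: more chances, same strictures.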

Bryan's post doesn't discuss taxonomies, ontologies or search engines, largely because it's a series of emails from 2002. But it does contain this beautiful piece of ethnoclassification:
Italy is about all of these things: cured meats, standing up to drink your coffee, stiffling heat, mid-day naps, skulls in churches, hot men in suits on scooters, Ananas, and cheap groceries.
This is very much the kind of freewheeling associational approach to knowledge that Tom describes - and very much the kind of ground-up, non-exclusive, plural, open-ended classifying process which has become known as 'folksonomy'.

But what happens if we take that sentence and map it onto the current 'folksonomic' toolset? Is there an 'Italy' resource somewhere - a really really authoritative Web page, say - that we can tag with 'curedmeat', 'coffeestandingup', 'stifflingheat' and so on? (Never mind the problem of cross-matching with the tags 'meat.cured', 'coffee.standing' and 'heat.stiffling' - let alone 'heat.stifling'.) Or are we going to use an 'italy' tag and apply it to single identifiable resources on 'cured meat', 'hot men in suits on scooters', etc? If so, did all those resources exist before we tried to tag them - and if not, are we going to have to create them?
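The cross-matching problem can be made concrete with a deliberately crude heuristic. This is a sketch under my own assumptions (the `squash` and `probably_same` names, the 0.9 threshold and the use of Python's `difflib` are illustrative, not how any tagging service actually reconciles tags): it lower-cases tags, drops separators, and compares what is left character by character.

```python
import re
from difflib import SequenceMatcher


def squash(tag: str) -> str:
    """Lower-case a tag and drop everything except ASCII letters and digits,
    so 'Heat.Stifling' becomes 'heatstifling'. (Accented letters are dropped too.)"""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def probably_same(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two tags as the same if their squashed forms are near-identical
    character sequences. The 0.9 threshold is an arbitrary assumption."""
    return SequenceMatcher(None, squash(a), squash(b)).ratio() >= threshold
```

Here `probably_same("heat.stiffling", "heat.stifling")` is true, so the spelling variant is caught; but `probably_same("curedmeat", "meat.cured")` and `probably_same("stifflingheat", "heat.stiffling")` are both false, because a character-level comparison is defeated by a simple change of word order.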

The kind of association described by Tom - and exemplified by Bryan's old mails - is actually a very bad fit for the Technorati style of document tagging, for two reasons. One is that it's two-way: if 'Italy' is associated with 'skulls in churches' then 'skulls in churches' is necessarily associated with 'Italy'. (In the case of document-based tagging, the relationship is asymmetrical and the inverse relationship is weaker: Document 1 'is about' T1, T2, T3; Topic 1 'has some relevant information in' D1, D2, D3.) The other is that it's descriptive rather than annotative: we're not tagging stuff-about-stuff, we're tagging... well, stuff, and tagging it with other stuff. These bi-directional relationships between concepts can be approximated by the associations between tags which emerge out of the cumulative process of document tagging, but this seems like going a very long way round. "We think in terms of associations": should we have to say

this has been applied to resources which have also been classified as that

when what we want to say is

this is like that?
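The difference between those two ways of saying it can be sketched as two data structures. The document IDs, tags and function names below are hypothetical, not any real service's API: document tagging is a one-way map from documents to tags, from which a tag-to-documents index and any tag-to-tag link have to be derived; a direct association is stored as an unordered pair, so it is two-way by construction and needs no documents at all.

```python
from collections import defaultdict

# Hypothetical document tagging: each document 'is about' a set of tags.
doc_tags = {
    "D1": {"italy", "curedmeat"},
    "D2": {"italy", "churchskull"},
    "D3": {"curedmeat", "recipes"},
}


def invert(doc_tags: dict[str, set[str]]) -> dict[str, set[str]]:
    """Derive the inverse relation: tag -> documents that carry it."""
    tag_docs: dict[str, set[str]] = defaultdict(set)
    for doc, tags in doc_tags.items():
        for tag in tags:
            tag_docs[tag].add(doc)
    return tag_docs


def also_classified_as(tag_docs: dict[str, set[str]], this: str, that: str) -> set[str]:
    """'This has been applied to resources which have also been classified as that':
    the link between two tags exists only via documents carrying both."""
    return tag_docs.get(this, set()) & tag_docs.get(that, set())


# 'This is like that': a direct association between concepts.
# A frozenset pair is unordered, so the relation is symmetric by construction.
associations = {frozenset(("italy", "stifflingheat"))}


def is_like(this: str, that: str) -> bool:
    return frozenset((this, that)) in associations
```

With `tag_docs = invert(doc_tags)`, `also_classified_as(tag_docs, "italy", "curedmeat")` returns `{"D1"}`, but `also_classified_as(tag_docs, "italy", "stifflingheat")` returns an empty set because no document carries both tags, whereas `is_like("stifflingheat", "italy")` is true.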

There's one glaring exception to this argument: Flickr. It's easy to imagine an 'italy' photoset including images which were also tagged with 'curedmeat', 'churchskull' and so forth. Descriptive tagging, bi-directional associations, it's all there - job done. This is deceptive, however. Flickr runs on discrete objects - individual images - and the relationships between Flickr tags really describe the images themselves, or at most the universe of Flickr images. If we didn't have any images of stifling heat in Italy, that association wouldn't exist; if we had three salami pictures and only one of a skull in a church, the 'curedmeat'/'italy' association would automatically be three times as strong as 'churchskull'/'italy'. Once again, we'd have to go to considerable lengths in order to represent the associations which Bryan effortlessly set out in 32 hastily composed words.
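The weighting effect can be sketched by counting, for each pair of tags, how many images carry both. The photoset is invented, and treating association strength as a raw co-occurrence count is my simplifying assumption, not a description of how Flickr actually relates tags.

```python
from collections import Counter
from itertools import combinations

# Hypothetical photoset: three salami pictures and one skull in a church.
images = [
    {"italy", "curedmeat"},
    {"italy", "curedmeat"},
    {"italy", "curedmeat"},
    {"italy", "churchskull"},
]


def tag_pair_weights(images: list[set[str]]) -> Counter:
    """Weight each unordered tag pair (stored alphabetically) by the number
    of images tagged with both."""
    weights: Counter = Counter()
    for tags in images:
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return weights
```

With this data, `tag_pair_weights(images)` gives `("curedmeat", "italy")` a weight of 3 and `("churchskull", "italy")` a weight of 1, while `("italy", "stifflingheat")` scores 0 because no image carries both tags: the weights describe the photoset, not Italy.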

Ethnoclassification: do we have the technology?