Cloud Street

Thursday, September 22, 2005

A place for everything

Or: what ethnoclassification is, and what folksonomy isn't.

When it comes to tagging, I'm facing both ways. I think it's fascinating and powerful and new - qualitatively new, that is: it's worth writing about not just because it's shiny, but because there's still work to be done on understanding it. At the same time, I think it's been massively oversold, often on the back of rhetorical framings which only have a glancing relationship with evidence or logic. Tagging is fascinating and powerful and new, but a lot of the talk about tagging has me tearing my hair.

I'll pick on a recent post by Dave Weinberger. (Personal to DW: sorry, Dave. I'm emphatically not (is that emphatic enough?) suggesting that you're the worst offender in this area.)
Let's say you type in "africa," "agriculture" and "grains" because that's what you're researching. You'll get lots of results, but you may miss pages about "couscous" because Google is searching for the word "grain" and doesn't know that that's what couscous is made of. Google knows the words on the pages, but doesn't know what the pages are about. That's much harder for computers because what something is about really depends on what you're looking for. That same page on couscous that to you is about economics could be about healthy eating to me or about words that repeat syllables to someone else. And that's the problem with all attempts by experts and authorities to come up with neat organizations of knowledge: What something is about depends on who's looking.
...
Let's say you come across the Moroccan couscous web page and you want to remember it. So you upload its Web address to your free page at del.icio.us that lists all the pages you've saved. Then del.icio.us asks you to enter a word or two as tags so you can find the Moroccan page later. You might tag it with Morocco, recipe, couscous, and main course, and then later you can see all the pages you've tagged with any of those words.

That's a handy way to organize a large list of pages, but tagging at del.icio.us really took off because it's a social activity: Everyone can see all the pages anyone has tagged with say, Morocco or main course or agriculture. This is a great research tool because just by checking the tag "agriculture" now and then, you'll see every page everyone else at delicious has tagged that way. Some of those pages will be irrelevant to you, of course, but many won't be. It's like having the world of people who care about a topic tell you everything they've found of interest. And unlike at Google, you'll find the pages that other humans have decided are ABOUT your topic.
What strikes me about this passage is that Dave changes scenarios in mid-stream: Let's say you come across the Moroccan couscous web page... How? Google couldn't find it. Let's compare like with like, and say that you're still looking for your couscous page: what do you do then, if not go to del.icio.us and type in "africa," "agriculture" and "grains"? Once again, assuming that whole-site searches aren't timing out, you'll get lots of results (particularly since del.icio.us doesn't seem to allow ANDing of search terms) but you may miss pages about "couscous" - and checking the tag "agriculture" now and then won't necessarily help. Google will miss the page if the term 'couscous' doesn't appear in the source (which doesn't necessarily mean 'appear on screen', of course); del.icio.us will miss it if the term hasn't been used to tag it (even if it is in the source).

Google vs del.icio.us is an odd comparison, in other words, and it's not at all clear to me that the comparison favours del.icio.us. It's great to get classificatory(?) input from the users of a document, of course - as I said above, tagging is fascinating and powerful and new - but in terms of information retrieval it can only score over a full-text search if

1. the page has been purposefully tagged by a user
2. the page has been tagged with a term which doesn't appear in the page source
3. a second user is searching for information which is contained in the page, using the term with which the first user tagged it
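
The three conditions can be made concrete with a toy model. What follows is only a sketch: the page, its text, the tags and every function name are invented for the example, and neither Google nor del.icio.us works this simply. It treats full-text search as "the term occurs in the source" and tag search as "someone applied this exact term as a tag", then checks when tagging finds a page that full-text search misses.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A toy page: its source text plus whatever tags users have attached."""
    url: str
    source: str
    tags: set[str] = field(default_factory=set)


def full_text_match(page: Page, term: str) -> bool:
    """Full-text search: the term must occur somewhere in the page source."""
    return term.lower() in page.source.lower()


def tag_match(page: Page, term: str) -> bool:
    """Tag search: some user must have applied exactly this term as a tag."""
    return term.lower() in {t.lower() for t in page.tags}


def tagging_wins(page: Page, term: str) -> bool:
    """True only when tag search finds the page and full-text search does not."""
    return tag_match(page, term) and not full_text_match(page, term)


# Hypothetical page, invented for this example.
couscous = Page(
    url="http://example.com/couscous",
    source="A Moroccan couscous recipe made with semolina.",
)

# Condition 1 fails: nobody has tagged the page, so tag search finds nothing.
untagged = tagging_wins(couscous, "grains")

# Conditions 1 and 2 hold: a user adds a tag that is absent from the source,
# and (condition 3) the searcher happens to use the same term.
couscous.tags.add("grains")
searcher_uses_same_term = tagging_wins(couscous, "grains")

# Condition 3 fails: the searcher picks a different word from the tagger.
searcher_uses_other_term = tagging_wins(couscous, "cereals")

# Condition 2 fails: the tag repeats a word already in the source,
# so full-text search would have found the page anyway.
couscous.tags.add("couscous")
tag_duplicates_source = tagging_wins(couscous, "couscous")
```

Of the four cases, only the one where all three conditions hold at once comes out in tagging's favour.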

I don't think tagging advocates think enough about what those conditions imply. For example, at present I'm the only del.icio.us user to have tagged Mr Chichimichi's Tags are not a panacea; I tagged it with 'tagging', 'search' and 'ethnoclassification'. Until I did so, anyone looking for it would have been out of luck. Even Google wouldn't be much help - the word 'ethnoclassification' doesn't appear anywhere in the text. No, until a couple of days ago your only way of stumbling on that post would have been to run a clumsy, counter-intuitive Google search on terms like 'tagging', 'tags', 'folksonomies' and 'social software'. (Google even knows that 'folksonomies' is the plural of 'folksonomy', so searching on the singular form would work just as well. That's just not fair.)

Dave also contrasts the world of collective knowledge through distributed tagging with attempts by experts and authorities to come up with neat organizations of knowledge. Further along in the same piece, he writes:
This takes classification and about-ness out of the hands of authors and experts. Now it's up to us readers to decide what something is about.

Not only does this let us organize stuff in ways that make more sense to us, but we no longer have to act as if there's only one right way of understanding everything, or that authors and other authorities are the best judges of what things are about.
One question: who ever said that there was only one right way of understanding everything? OK, too easy. I'll rephrase that: before tagging came along, who was saying there was one right way, etc? Who are the tagging advocates actually arguing against? (It certainly isn't librarians (context here).)

There's a difference between classifications which have a single pre-determined set of definitions and classifications which are user-defined and user-extensible. But that's not the same as the difference between having an underlying ontology and not having one, or the difference between hierarchical and flat organisations of knowledge, or the difference between single and multiple sets of classifications. A closed, expert-defined, locked-down controlled vocabulary may contain multiple sets of overlapping terms; it may be a flat list of categories rather than a 'tree'; it may even be innocent of ontology. (Thanks to Jay for pointing this out, in comments here.) If tagging is better than top-down classification, it's better because it's user-defined and user-extensible - not because it's free of the vices of ontology, hierarchy and uniformity. The idea that tagging - and only tagging - stands in opposition to a classifying universe built on hierarchical uniformity is a straw man. (But the librarians get it both ways - if a top-down classifying system is shown to be flat and plural, this can be put forward as a sign of the weakness of top-down systems; the fact that bottom-up systems are more, not less, vulnerable to Chinese Encyclopedia Syndrome is passed over.)

So, tagging systems make lousy search engines, and they don't mark a qualitative leap in the organisation of human knowledge. What they're really good for - and what makes them fascinating and powerful - is conversation. Tagging, I'm suggesting, isn't there to tell us about stuff: it's there to tell us about what people say about stuff. As such, it performs rather poorly when you're asking "where is X?" or "what is X?", and it comes into its own when you're asking "what are people saying about X?" (Of course, much tag-advocacy is driven by the tacit belief that there's no fundamental difference between what people say about X and expert knowledge of X - and that an aggregate of what people say would be equivalent, if not superior, to expert knowledge. But that's an argument for another post.)

Tagging is good for telling us what people say about stuff, anyway - and when it's good, it's very good. To see what I'm talking about, have a look at Reader2 (via Thomas). It's a book recommendation site, implemented on the basis of a del.icio.us-like user/tag system. It's powerful stuff already, and it's still being developed. Does it tell me what books are really like? No - but it tells me what people are saying about them, which is precisely what I want to know. And it couldn't do this nearly as well, it seems to me, without tags - and tag clouds in particular. This, for me, is what tagging's all about. Ethnoclassification: classification as an open-ended collective activity, as one element of the continual construction of social reality.

1 Comment:

  • Phil, I disagree with you less than you probably think and maybe less than I should.

    Tags are not a replacement for full text searching, nor vice versa. Nor is hierarchy always bad. In fact, hierarchy is maybe the most efficient way we've ever invented of understanding certain complex systems (e.g., the universe). But I disagree that tagging is primarily useful for finding out what people are saying about something. E.g., delicious tagstreams are useful to me frequently because they show me pages, not what people are saying about pages.

    As for the question of whether anyone recommends single ways of understanding things: On the one hand, you're right that it's easy for me (and others) to slip into strawman-ship. Yet, there are two reasons why some taxonomies do get preferred to the point of monopolizing our understanding. First, some organize physical objects which, by necessity, can only be on one branch at a time. E.g., your local bookstore has to decide if "The Tipping Point" gets shelved under Business or Sociology. (Yeah, it could put it in both, but that'd be an exception.) Even the need to record metadata on paper has tended to force one and only one taxonomy. Second, some taxonomies dominate some fields, providing a framework for discussion: Periodic table, biological taxonomies, taxonomies of heavenly objects. I agree that no practitioner would say that these are the only conceivable ways of organizing their data, but we've been constrained largely for practical reasons to act that way. Now, thanks to the digitizing of info, we are shuffling off that constraint. IMO.

    By Blogger David, at 23/9/05 16:06  
