Cloud Street

Wednesday, March 28, 2007

Strange clothes of sand

There hasn't been a lot here lately; there hasn't been a huge amount on my home weblog Actually Existing either, and quite a lot of what I have posted there has been tagged as work-related. So this will be the last post here; I'm merging my weblogs and taking the opportunity to leave Google and go to WordPress. I'll see you at The Gaping Silence. (True to the name, there's nothing much there now, but I'll put some new stuff up one of these days.)


Wednesday, February 07, 2007

Great big bodies

I think the thing that really irritates me about the Long Tail is just how basic the statistical techniques underlying it are. If you've got all that data, why on earth wouldn't you do something more interesting and more informative with it? It's really not hard. (In fact it's so easy that I can't help feeling the Long Tail image must have some other appeal - but more on that later.)

As you may have noticed, this weblog hasn't been updated for a while. In fact, when I compared it with the rest of my RSS feed I found it was a bit of an outlier:

[Figure: blogs2 - histogram of blogs by days since last update]

The Y axis is 'number of blogs': two updated today (zero days ago), 11 in the previous 10 days, 1 in the 10-day period before that, and so on until you get to the 71-80 column. Note that each column is a range of values, and that the columns are touching; technically this is a histogram rather than a bar chart.

You can do something similar with 'posts in last 100 days':

[Figure: blogs1 - histogram of blogs by posts in the last 100 days]

This shows that the really heavy posters are in the minority in this sample; twelve out of the eighteen have 30 or fewer posts in the last 100 days.
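Charts like this really are quick to produce. By way of illustration, here's a minimal sketch in Python with matplotlib - the numbers are invented stand-ins for my feed data:

    # Minimal sketch: the two histograms above, rebuilt in matplotlib.
    # All numbers are invented stand-ins for the real feed data.
    import matplotlib.pyplot as plt

    # One value per blog: days since last update, and posts in the last 100 days
    days_since_update = [0, 0, 3, 5, 7, 8, 9, 9, 10, 12, 14, 16, 18, 25, 28, 31, 52, 74]
    posts_last_100 = [2, 4, 5, 8, 10, 12, 15, 18, 22, 25, 28, 30, 45, 60, 75, 90, 120, 150]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # Ranges of values, touching columns: a histogram, not a bar chart
    ax1.hist(days_since_update, bins=range(0, 90, 10))
    ax1.set_xlabel('Days since last update')
    ax1.set_ylabel('Number of blogs')

    ax2.hist(posts_last_100, bins=range(0, 160, 10))
    ax2.set_xlabel('Posts in last 100 days')
    ax2.set_ylabel('Number of blogs')

    plt.tight_layout()
    plt.show()

Bin the values, count per bin, label the axes - that's the whole job.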

So it looks as if I'm reading a lot of reasonably regular but fairly light bloggers, and a few frequent fliers. If you put the two series together you can see the two groups reflected in the way the sample smears out along the X and Y axes without much in the middle:

[Figure: blogs3 - the two series plotted together]

My question is this. If you can produce readable and informative charts like this quickly and easily (and I assure you that you can - we're talking an hour from start to finish, and most of that went on counting the posts), what on earth would make you prefer this:

[Figure: blogs5 - blogs ranked in descending order]

or this:

[Figure: blogs4 - another descending-order ranking]

I can only think of two reasons. One is that it looks kind of like a power law distribution, and that's a cool idea. Except that it isn't a power law distribution, or any kind of distribution - it's a list ranked in descending order, and, er, that's it. The same criticism applies, obviously, to the classic 'power law' graphic ranking weblogs in descending order of inbound links.

DIGRESSION
You can compute a distribution of inbound links across weblogs using very much the techniques I've used here - so many weblogs with one link, so many with two and so forth. Oddly enough, what you end up with then is a curve which falls sharply then tapers off - there are far fewer weblogs with two links than with only one, but not so much of a difference between the '20 links' and '21 links' categories. However, even that isn't a power law distribution, for reasons explained here and here (reasons which, for the non-mathematician, can be summed up as 'a power law distribution means something specific, and this isn't it').
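To see the difference at a glance, here's a rough sketch (Python with matplotlib; the link counts are invented) plotting the same data both ways - as a descending ranked list, and as a distribution proper:

    # The same (invented) inbound-link counts, shown two ways: a ranked
    # list is not a distribution, whatever it looks like.
    import random
    from collections import Counter
    import matplotlib.pyplot as plt

    random.seed(1)
    # Invented data: inbound-link counts for 500 hypothetical weblogs
    links = [int(random.paretovariate(1.2)) for _ in range(500)]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # The 'long tail' picture: weblogs ranked in descending order of links.
    # Any positive data at all, ranked this way, slopes downwards.
    ax1.plot(sorted(links, reverse=True))
    ax1.set_xlabel('Rank')
    ax1.set_ylabel('Inbound links')

    # The distribution proper: how many weblogs have 1 link, 2 links, ...
    counts = Counter(links)
    ax2.bar(list(counts.keys()), list(counts.values()), width=1.0)
    ax2.set_xlabel('Inbound links')
    ax2.set_ylabel('Number of weblogs')

    plt.tight_layout()
    plt.show()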
END DIGRESSION

The other reason - and, I suspect, the main reason - is that the Long Tail privileges ranking: the question it suggests isn't 'how many of which are doing what?' but 'who's first?'. A histogram might give more information, but it wouldn't tell me who's up there in the big head, or how far down the tail I am.

People want to be on top; failing that, they want to fantasise about being on top and identify with whoever's up there now. Not everyone, but a lot of people. The popularity of the Long Tail image has a lot in common with the popularity of celebrity gossip magazines.


Wednesday, November 22, 2006

They don't know about us

Some dystopian thoughts on data harvesting, usage tracking, recommendation engines and consumer self-expression. First, here's Tom, then me:
"This is going to be one of the great benefits of ambient/pervasive computing or everyware - not the tracking of objects but the tracking and collating of you yourself through objects."

This sentence works just as well with the word 'benefits' replaced by 'threats'. It all depends who gets to do the tracking and collating, I suppose.

Now here's Max Levchin, formerly of Paypal, and his new toy Slide (via Thomas):
If Slide is at all familiar, it's as a knockoff of Flickr, the photo-sharing site. Users upload photos, which are displayed on a running ticker or Slide Show, and subscribe to one another's feeds. But photos are just a way to get Slide users communicating, establishing relationships, Levchin explains.

The site is beginning to introduce new content into Slide Shows. It culls news feeds from around the Web and gathers real-time information from, say, eBay auctions or Match.com profiles. It drops all of this information onto user desktops and then watches to see how they react.

Suppose, for example, there's a user named YankeeDave who sees a Treo 750 scroll by in his Slide Show. He gives it a thumbs-up and forwards it to his buddy - we'll call him Smooth-P. Slide learns from this that both YankeeDave and Smooth-P have an interest in a smartphone and begins delivering competing prices. If YankeeDave buys the item, Slide displays headlines on Treo tips or photos of a leather case. If Smooth-P gives a thumbs-down, Slide gains another valuable piece of data. (Maybe Smooth-P is a BlackBerry guy.) Slide has also established a relationship between YankeeDave and Smooth-P and can begin comparing their ratings, traffic patterns, clicks and networks.

Based on all that information, Slide gains an understanding of people who share a taste for Treos, TAG Heuer watches and BMWs. Next, those users might see a Dyson vacuum, a pair of Forzieri wingtips or a single woman with a six-figure income living within a ten-mile radius. In fact, that's where Levchin thinks the first real opportunity lies - hooking up users with like-minded people. "I started out with this idea of finding shoes for my girlfriend and hotties on HotOrNot for me," Levchin says with a wry smile. "It's easy to shift from recommending shoes to humans."

If this all sounds vaguely creepy, Levchin is careful to say he's rolling out features slowly and will only go as far as his users will allow. But he sees what many others claim to see: Most consumers seem perfectly willing to trade preference data for insight. "What's fueling this is the desire for self-expression," he says.

Nick:
I'm not sure that I see, in today's self-portraits on MySpace or YouTube or Flickr, or in the fetishistic collecting of virtual tokens of attention, the desire to mark one's place in a professional or social stratum. What they seem to express, more than anything, is a desire to turn oneself into a product, a commodity to be consumed. And since, as I wrote earlier, "self-commoditization is in the end indistinguishable from self-consumption," the new portraiture seems at its core narcissistic. The portraits are advertisements for a commoditized self.

Granny Weatherwax:
"And sin, young man, is when you treat people as things. Including yourself. That's what sin is. ... People as things, that's where it starts."

More precisely, that's where some extraordinarily unequal and dishonest social relationships can start.


Monday, November 13, 2006

Got a web between his toes

Now that Nick has read the last rites for Web 2.0, perhaps it's safe to return to a question that's never quite been resolved.

To wit: what is Web 2.0? (We've established that it's not a snail.) Over at What I wrote, I've just put up a March 2003 article called "In Godzilla's footprint". In it, I asked similar questions about e-business, taking issue with the standard rhetoric of 'efficiency' and 'empowerment'. I suggested that e-business wasn't - or rather isn't - a phenomenon in its own right, but the product of three much larger trends: standardisation, automation and externalisation of costs. (Read the whole thing.)

Assuming for the moment that I called this one correctly - and I find my arguments pretty persuasive - what of Web 2.0? More of the same, only featuring the automation of income generation (AdSense) and the externalisation of payroll costs ('citizen journalism')? Or is there more going on - and if so, what?

Update 16/11

It would be remiss of me not to give any pointers to my own thinking on Web 2.0. So I'm republishing another column at What I wrote, this time from February of this year. Most of you will probably have seen it the first time round, when it appeared in iSeries NEWS UK, but I think it's worth giving it another airing. Have a gander.


Monday, November 06, 2006

Simplify, reduce, oversimplify

An interesting post on 'folksonomies' at Collin Brooke's blog prompted this comment, which I thought deserved a post of its own.

I think Peter Merholz's coinage 'ethnoclassification' could be useful here. As I've argued elsewhere, I think we can see all taxonomies (and ultimately all knowledge) as the product of an extended conversation within a given community: in this respect a taxonomy is simply an accredited 'folksonomy'.

However, I think there's a dangerous (but interesting) slippage here between what folksonomies could be and what folksonomies are: between the promise of the project of 'folksonomy' (F1) and what's delivered by any identifiable folksonomy (F2). (You can get into very similar arguments about Wikipedia 1 and Wikipedia 2 - sometimes with the same people.) Compared to the complexity and exhaustiveness of any functioning taxonomic scheme, I don't believe that any actually-existing 'folksonomy' is any more than an extremely sketchy work in progress.

For this reason (among others), I believe we need different words for the activity and the endpoint. So we could contrast classification with Peterme's 'ethnoclassification', on one hand, and note that the only real difference between the two is that the former takes place within structured and credentialled communities. On the other hand, we could contrast actual taxonomies with 'folksonomies'. The latter could have very much the same relationship with officially-credentialled taxonomies as classification does with ethnoclassification - but they aren't there yet.

The shift from 'folksonomy' to 'ethnoclassification' has two interesting side-effects, which I suspect are both fairly unwelcome to folksonomy boosters (a group in which I don't include Thomas Vander Wal, ironically enough). On one hand, divorcing process and product reminds us that improvements to one don't necessarily translate as improvements in the other. The activity that goes into producing a 'folksonomy', as distinct from a taxonomy, may give more participants a better experience (more egalitarian, more widely distributed, more chatty, more fun) but you wouldn't necessarily expect the end product to show improvements as a result. (You'd expect it to be a bit scrappy, by and large.) On the other hand, divorcing process from technology reminds us that ethnoclassification didn't start with del.icio.us; the aggregation of informal knowledge clouds is something we've been doing for a long time, perhaps as long as we've been human.

Tuesday, October 31, 2006

A taxonomy of terror

I attended part of a very interesting conference on terrorism last week. The organisers intend to launch a network and a journal devoted to 'critical terrorism studies', a project which I strongly support. As the previous blog entry suggests, I've studied a bit of terrorism in my time - and I'm very much in favour of people being encouraged to approach the phenomenon critically, which is to say without necessarily endorsing the definitions and interpretive frameworks offered by official sources.

However, it seems to me that the nature of the object of study still needs to be defined - and defined at once more precisely and more loosely. In other words, I don't believe there's much common ground between someone who thinks of terrorism in terms of gathering intelligence on the IRA, and someone who maintains that George W. Bush is a bigger terrorist than Osama bin Laden; I don't think it's particularly productive to try to find common ground between those two images of terrorism, or to simply allow them to coexist without defining the differences between them. On the other hand, I don't see much mileage in a 'purist' Terrorism Studies which would focus solely on groups akin to the IRA - or in an alternative purism which would concentrate on terror attacks by Western governments.

A third approach offers to resolve the gap between these two - although I should say straight away that I don't believe it does so. This approach is that of terrorism as an object of discourse: what is under analysis is not so much an identifiable set of actions, or types of action, as the texts and utterances which purport to analyse and describe terrorism. The effect is to turn the analytical gaze back on the governmental discourse of terrorism, which in turn makes it possible to contrast the official image of the terrorist threat with data from other sources; an interesting example of this approach in practice is Richard Jackson's paper Religion, Politics and Terrorism: A Critical Analysis of Narratives of “Islamic Terrorism” (DOC file available from here).

I think this is a powerful and constructive approach - my own thesis (as yet unpublished) includes some quite similar work on Italian left-wing armed groups of the 1970s, whose presentation in both the mainstream and the Communist press was heavily shaped by differing ideological assumptions. But I think it should be recognised that it's an approach of a different order from the other two. To combine them would be to mix ontological and epistemological arguments - to say, in other words, That's what is officially labelled terrorism, but this is real terrorism. (Or: That's what they call terrorism, but this is what we know to be the reality of terrorism.) The problem with this is that it implies a commitment to a particular idea of real terrorism, without actually suggesting a candidate. At best, this formulation frees the analyst to retain his or her prior commitments, bolstered with added ontological certitude. At worst, it suggests that real terrorism is the inverse of officially labelled terrorism - or at least that there is no possible overlap between officially labelled terrorism and real terrorism. This is surely inadequate: a critical approach should be able to do more with the official version than simply reverse it.

I believe that the study of terrorism must include all of these elements, and recognise that they may overlap but don't coincide. In other words, it must include the following:
  1. Organised political violence by non-state actors: 'terrorism' as a political intervention (call it T1)
  2. Indiscriminate large-scale attacks on civilians: terror as a tactic, in warfare or otherwise (T2)
  3. The constructed antagonist of the War on Terror: 'Terrorism' as object of discourse (T3)
We can think of it as a three-circle Venn diagram, with areas of intersection between each pair of circles and a triple intersection in the middle.

[Figure: three-circle Venn diagram of T1, T2 and T3; the colour-coded regions are referred to below]
What is immediately apparent about this list is how little of the field of terrorism falls into all three categories. The (white) triple intersect - mass killing of civilians by a non-state political actor, officially labelled (and denounced) as terrorism - is represented by a relatively small number of horrific events, chief among them September 11th. By contrast, much of what students of terrorism - myself included - would like to be able to look at under that name falls into only two categories, or even one. The (red) intersect of T1 and T3, most obviously, is represented by those acts by armed groups which are officially denounced but don't involve mass killing of civilians: the 'execution' of Aldo Moro and the IRA's Brighton bomb, for example. The use of terror tactics by non-governmental death squads, such as the Nicaraguan Contras and the Salvadorean ORDEN militia, falls into the blue intersect of T1 and T2. The use of state terror by official enemies and 'rogue states' - such as the Syrian Hama massacre or Saddam Hussein's gassing of the people of Halabja - falls into the green intersect of T2 and T3. And this is without considering all those activities which fall into only one category: T1 (magenta) alone, activities by armed groups which fall below the radar of the discourse of 'terrorism' (a large and interesting category); T2 (cyan) alone, terror tactics used by states and not denounced as terrorism; and T3 (yellow) alone, officially-denounced 'terrorism' which involves neither an organised armed group nor a mass attack on civilians.

I don't, myself, see any problem with studying all three of these categories - or rather, all seven. I hope the remit of the new Critical Terrorism Studies is broad enough to encompass all of these without imposing an artificial unity on them. Paramilitary fundraising in Northern Ireland cannot be studied in the same way as the attack on Fallujah or press reporting of the 'ricin plot'; each of these deserves to be studied, however, and the different approaches appropriate to studying them can only strengthen the field.

Monday, September 18, 2006

The people with the answers

Nick:
Larry Sanger, the controversial online encyclopedia's cofounder and leading apostate, announced yesterday, at a conference in Berlin, that he is spearheading the launch of a competitor to Wikipedia called The Citizendium. Sanger describes it as "an experimental new wiki project that combines public participation with gentle expert guidance."

The Citizendium will begin as a "fork" of Wikipedia, taking all of Wikipedia's current articles and then editing them under a new model that differs substantially from the model used by what Sanger calls the "arguably dysfunctional" Wikipedia community. "First," says Sanger, in explaining the primary differences, "the project will invite experts to serve as editors, who will be able to make content decisions in their areas of specialization, but otherwise working shoulder-to-shoulder with ordinary authors. Second, the project will require that contributors be logged in under their own real names, and work according to a community charter. Third, the project will halt and actually reverse some of the 'feature creep' that has developed in Wikipedia."

I've been thinking about Wikipedia, and about what makes a bad Wikipedia article so bad, for some time - this March 2005 post took off from some earlier remarks by Larry Sanger. I'm not attempting to pass judgment on Wikipedia as a whole - there are plenty of good Wikipedia articles out there, and some of them are very good indeed. But some of them are bad. Picking on an old favourite of mine, here's the first paragraph of the Wikipedia article on the Red Brigades, with my comments.

The Red Brigades (Brigate Rosse in Italian, often abbreviated as BR) are

The word is 'were'. The BR dissolved in 1981; its last successor group gave up the ghost in 1988. There's a small and highly violent group out there somewhere which calls itself "Nuove Brigate Rosse" - the New Red Brigades - but its continuity with the original BR is zero. This is a significant disagreement, to put it mildly.

a militant leftist group located in Italy. Formed in 1970, the Marxist Red Brigades

'Marxist' is a bizarre choice of epithet. Most of the Italian radical left was Marxist, and almost all of it declined to follow the BR's lead. Come to that, the Italian Communist Party (one of the BR's staunchest enemies) was Marxist. Terry Eagleton's a Marxist; Jeremy Hardy's a Marxist; I'm a Marxist myself, pretty much. The BR had a highly unusual set of political beliefs, somewhere between Maoism, old-school Stalinism and pro-Tupamaro insurrectionism. 'Maoist' would do for a one-word summary. 'Marxist' is both over-broad and misleading.

sought to create a revolutionary state through armed struggle

Well, yes. And no. I mean, I don't think it's possible to make any sense of the BR without acknowledging that, while they did have a famous slogan about portare l'attacco al cuore dello stato ('attacking at the heart of the state'), their anti-state actions were only a fairly small element of what they did. To begin with they were a factory-based group, who took action against foremen and personnel managers; in their later years - which were also their peak years - the BR, like other armed groups, got drawn into what was effectively a vendetta with the police, prioritising revenge attacks over any kind of 'revolutionary' programme. You could say that the BR were a revolutionary organisation & consequently had a revolutionary programme throughout, even if their actions didn't always match it - but how useful would this be?

and to separate Italy from the Western Alliance

Whoa. I don't think the BR were particularly in favour of Italy's NATO membership, but the idea that this was one of their key goals is absurd. If the BR had been a catspaw for the KGB, intent on fomenting subversion so as to destabilise Italy, then this probably would have been high on their list. But they weren't, and it wasn't.

In 1978, they kidnapped and killed former Prime Minister Aldo Moro under obscure circumstances.

Remarkably well-documented circumstances, I'd have said.

After 1984's scission

This is just wrong - following growing and unresolvable factionalism, the BR formally dissolved in October 1981.

Red Brigades managed with difficulty to survive the official end of the Cold War in 1989

This is both confused and wrong. Given that there was a split, how would the BR have survived beyond 1981 (or 1984), let alone 1989? As for the BR's successor groups, the last one to pack it in was last heard from in 1988.

even though it is now a fragile group with no original members.

Or rather, even though the name is now used by a small group about which very little is known, but which is not believed to have any connection to the original group (whose members are after all knocking on a bit by now).

Throughout the 1970’s the Red Brigades were credited with 14,000 acts of violence.

Good grief. Credited by whom? According to the sources I've seen, between 1970 and 1981 Italian armed struggle groups were responsible for a total of 3,258 actions, including 110 killings; the BR's share of the total came to 472 actions, including 58 killings. (Most 'actions' consisted of criminal damage and did not involve personal violence.) I'd be the first to admit that the precision of these figures is almost certainly spurious, but even if we doubled that figure of 472 we'd be an awful long way short of 14,000.

I'm not even going to look at the body of the article.

I think there are two main problems here; the good news is that Larry's proposals for the neo-Wikipedia (Nupedia? maybe not) would address both of them.

Firstly, first mover advantage. The structure of Wikipedia creates an odd imbalance between writers and editors. Writing a new article is easy: the writer can use whatever framework he or she chooses, in terms both of categories used to structure the entry and of the overall argument of the piece. Making minor edits to an article is easy: mutter 1984? no way, it was 1981!, log on, a bit of typing and it's done. But making major edits is hard - you can see from the comments above just how much work would be needed to make that BR article acceptable, starting from what's there now. It would literally be easier to write a new article. What's more, making edits stick is hard; I deleted one particularly ignorant falsehood from the BR article myself a few months ago, only to find my edit reverted the next day. (Of course, I re-reverted it. So there!)

Larry's suggestion of getting experts on board is very much to the point here. Slap my face and call me a credentialled academic, but I don't believe that everyone is equally qualified to write an encyclopedia article about their favourite topic - and I do think it matters who gets the first go.

Secondly, gaming the system. Wikipedia is a community as well as an encyclopedia. I'll pass over Larry's suggestion that Wikipedia is dysfunctional as a community, but I do think it's arguable that some behaviours which work well for Wikipedia-the-community are dysfunctional for Wikipedia-the-resource. It's been suggested, for instance, that what really makes Wikipedia special is the 'history' pages, which take the lid off the debate behind the encyclopedia and let us see knowledge in the process of formation. It follows from this that to show the world a single, 'definitive' version of an article on a subject would actually be a step backwards: The discussion tab on Wikipedia is a great place to point to your favorite version ... Does the world need a Wikipedia for stick-in-the-muds? W. A. Gerrard objects:
Of what value is publicly documenting the change history of an encyclopedia entry? How can something that purports to be authoritative allow the creation of alternative versions which readers can adopt as favorites?

If an attempt to craft a wiki that strives for accuracy, even via a flawed model, is considered something for “stick-in-the-muds”, then it’s apparent that many of Wikipedia’s supporters value the dynamics of its community more than the credibility of the product they deliver.

I think this is exactly right: the history pages are worth much more to members of the Wikipedia community than to Wikipedia users. People like to form communities and communities like to chat - and edits and votes are the currency of Wikipedia chat. And gaming the system is fun (hence the word 'game'). Aaron Swartz quotes comments about Wikipedia regulars who delete your newly[-]create[d] article without hesitation, or revert your changes and accuse you of vandalis[m] without even checking the changes you made, or who "edited" thousands of articles ... [mostly] to remove material that they found unsuitable. This clearly suggests the emergence of behaviours which are driven more by social expectations than by a concern for Wikipedia. The second writer quoted above continues: Indeed, some of the people-history pages contained little "awards" that people gave each other -- for removing content from Wikipedia.

Now, all systems can be gamed, and all communities chat. The question is whether the chatting and the gaming can be harnessed for the good of the encyclopedia - or, failing that, minimised. I'm not optimistic about the first possibility, and I suspect Larry Sanger isn't either. Larry does, however, suggest a very simple hack which would help with the second: get everyone to use their real name. This would, among other things, make it obvious when a writer had authority in a given area. I don't entirely agree with Aaron's conclusion:
Larry Sanger famously suggested that Wikipedia must jettison its anti-elitism so that experts could feel more comfortable contributing. I think the real solution is the opposite: Wikipedians must jettison their elitism and welcome the newbie masses as genuine contributors to the project, as people to respect, not filter out.

This is half right: Wikipedia-the-community has produced an elite of 'regulars', whose influence over Wikipedia-the-resource derives from their standing in the community rather than from any kind of claim to expertise. I agree with Aaron that this is an unhealthy situation, but I think Larry was right as well. The artificial elitism of the Wikipedia community doesn't only marginalise the 'masses' who contribute most of the original content; it also sidelines the subject-area experts who, within certain limited domains, have a genuine claim to be regarded as an elite.

I don't know if the Citizendium is going to address these problems in practice; I don't know if the Citizendium is going anywhere full stop. But I think Larry Sanger is asking the right questions. It's increasingly clear that Wikipedia isn't just facing in two directions at once, it's actually two different things - and what's good for Wikipedia-the-community isn't necessarily good for Wikipedia-the-resource.

Tuesday, September 05, 2006

Back in the garage

I have begun to see what I think is a promising trend in the publishing world that may just transform the industry for good.

Paul Hartzog's Many-to-Many post on publishing draws some interesting conclusions from the success of Charlie Stross's Accelerando (nice one, Charlie), but makes me a bit nervous, partly because of the liberal use of excitable bolding.
What I am suggesting is happening is the reversal of traditional publishing, i.e. the transformation of the system in which authors create and distribute their work. In the old system, it is assumed that the publishing process acts as a quality control filter ... but it ends up merely being a profit-capturing filter.
[...]
Conversely, in the new system, the works are made available, and it is up to the community-at-large to pass judgement on their quality. In the emerging system, authors create and distribute their work, and readers, individually and collectively, including fans as well as editors and peers, review, comment, rank, and tag, everything.

Setting aside the formatting - and the evangelistic tone, something which never fails to set my teeth on edge - this is all interesting stuff. My problem is that I'm not sure about the economics of it. It's not so much that writers won't write if they don't get paid - writers will write, full stop - as that writers won't eat if they don't get paid: some money has to change hands some time. If the kind of development Paul is talking about takes hold, I can imagine a range of more-or-less unintended consequences, all with different overtones but few of them, to this jaundiced eye, particularly desirable:
  1. Mass amateurisation means that nobody pays for anything, which in turn means that nobody makes a living from writing; this is essentially the RIAA/BPI anti-filesharing nightmare scenario, transposed to literature
  2. Mass amateurisation doesn't touch the Dan Brown/Katie Price market, but gains traction in specialist areas of literature to the point where nobody can make a living from writing unless they're writing for the mass market; this is Charlie Gillett's argument for keeping CDs expensive (and the line the BPI would use against filesharing if they had any sense)
  3. Downloads like Accelerando function essentially as tasters and people end up buying just as many actual books, if not more; this scenario will also be familiar from filesharing arguments, as it's the line generally used to counter the previous two
  4. Mass amateur production becomes a new sphere of economic activity, linked in with and subordinate to the major mainstream operators: this is the MySpace scenario (at least, the MySpace makes money for Murdoch scenario)
  5. Mass amateur production becomes a new sphere of non-economic activity, with a few star authors subsidised by publishing companies for the sake of the cachet they bring: the open source scenario
  6. Mass amateur production becomes a new sphere of economic activity, existing on the margins and in the shadows, out of the reach of the major mainstream operators: the punk scenario (or, for older readers, the hippie scenario)
We can dismiss the first, RIAA-nightmare scenario. The third ('tasters') would be bearable, although it wouldn't go halfway to justifying Paul's argument. Most of the rest look pretty ghastly to me. Perhaps Paul is thinking in terms of the last scenario or something like it - but in that case I'd have to say that his optimism is just as misplaced, for different but related reasons, as the pessimism of the first scenario (although a new wave of garage literature would be a fine thing to see).

The trouble with making your own history is that you don't do it in circumstances of your own choosing. The participatory buzz of Web 2.0 tends to eat away at the structural and procedural walls that stop people getting their hands on stuff - but that can just mean that only the strongest and highest walls are left standing. Besides, walls can be useful, particularly if you want to keep a roof over your head.

Thursday, August 31, 2006

We're all together now, dancing in time

Ryan Carson:
I’d love to add friends to my Flickr account, add my links to del.icio.us, browse digg for the latest big stories, customise the content of my Netvibes home page and build a MySpace page. But you know what? I don’t have time and you don’t either...

Read the whole thing. What's particularly interesting is a small straw poll at the end of the article, where Ryan asks people who actually work on this stuff what social software apps they use on a day-to-day basis. Six people made 30 nominations in all; Ryan had five of his own for a total of 35.

Here are the apps which got more than one vote:

Flickr (four votes)
Upcoming (two)
Wikipedia (two)

And, er, that's it.

Social software looks like very big news indeed from some perspectives, but when it's held to the standard of actually helping people get stuff done, it fades into insignificance. I think there are three reasons for this apparent contradiction. First, there's the crowd effect - and, since you need a certain number of users before network effects start taking off, any halfway-successful social software application has a crowd behind it. It can easily look as if everyone's doing it, even if the relevant definition of 'everyone' looks like a pretty small group to you and me.

Then there's the domain effect: tagging and user-rating are genuinely useful and constructive, in some not very surprising ways, within pre-defined domains. (Think of a corporate intranet app, where there is no need for anyone to specify that 'Dunstable' means one of the company's offices, 'Barrett' means the company's main competitor and 'Monkey' means the payroll system.) For anyone who is getting work done with tagging, in other words, tagging is going to look pretty good - and, thanks to the crowd effect, it's going to look like a good thing that everyone's using.

Thirdly, social software is new, different, interesting and fun, as something to play with. It's a natural for geeks with time to play with stuff and for commentators who like writing about new and interesting stuff - let alone geek commentators. The hype generates itself; it's the kind of development that's guaranteed to look bigger than it is.

Put it all together - and introduce feedback effects, as the community of geek commentators starts to find social software apps genuinely useful within its specialised domain - and social software begins to look like a Tardis in reverse: much, much bigger on the outside than it is on the inside.

That's not to say that social software isn't interesting, or that it isn't useful. But I think that in the longer term those two facets will move apart: useful and productive applications of tagging will be happening under the commentator radar, often behind organisational firewalls, while the stuff that's interesting and fun to play with will remain... interesting and fun to play with.

Thursday, August 03, 2006

So much that hides

Alex points to this piece by Rashmi Sinha on 'Findability with tags': the vexed question of using tags to find the material that you've tagged, rather than as an elaborate way of building a mind-map.

I should stress, parenthetically, that that last bit wasn't meant as a putdown - it actually describes my own use of Simpy. I regularly tag pages, but almost never use tags to actually retrieve them. Sometimes - quite rarely - I do pull up all the pages I've tagged with a generic "write something about this" tag. Apart from that, I only ever ask Simpy two questions: one is "what was that page I tagged the other day?" (for which, obviously, meaningful tags aren't required); the other is "what does my tag cloud look like?".

Now, you could say that the answer to the second question isn't strictly speaking information; it's certainly not information I use, unless you count the time I spend grooming the cloud by splitting, merging and deleting stray tags. I like tag clouds and don't agree with Jeffrey Zeldman's anathema, but I do agree with Alex that they're not the last word in retrieving information from tags. Which is where Rashmi's article comes in.

Rashmi identifies three ways of layering additional information on top of the basic item/tag pairing, all of which hinge on partitioning the tag universe in different ways. This is most obvious in the case of faceted tagging: here, the field of information is partitioned before any tags are applied. Rashmi cites the familiar example of wine, where a 'region' tag would carry a different kind of information from 'grape variety', 'price' or for that matter 'taste'. Similar distinctions can be made in other areas: a news story tagged 'New Labour', 'racism' and 'to blog about' is implicitly carrying information in the domains 'subject (political philosophy)', 'subject (social issue)' and 'action to take'.

There are two related problems here. A unique tag, in this model, can only exist within one dimension: if I want separate tags for New Labour (the people) and New Labour (the philosophy), I'll either have to make an artificial distinction between the two (New_Labour vs New_Labour_philosophy) or add a dimension layer to my tags (political_party.New_Labour vs political_philosophy.New_Labour). Both solutions are pretty horrible. More broadly, you can't invoke a taxonomist's standby like the wine example without setting folksonomic backs up, and with some reason: part of the appeal of tagging is precisely that you start with a blank sheet and let the domains of knowledge emerge as they may.
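To make the ugliness concrete, here's a hypothetical sketch of the two workarounds as data structures (the tags follow the examples above; no real system is being described):

    # Hypothetical sketch of the two workarounds, using the tags above.
    # Workaround 1: artificial distinctions baked into the tag name itself
    flat_tags = {'New_Labour', 'New_Labour_philosophy', 'racism', 'to_blog_about'}

    # Workaround 2: a dimension layer bolted onto every tag, so that the
    # same tag name can live in two domains without colliding
    faceted = {
        ('political_party', 'New_Labour'): {'news_story_1'},
        ('political_philosophy', 'New_Labour'): {'news_story_1', 'essay_2'},
        ('subject_social_issue', 'racism'): {'news_story_1'},
        ('action_to_take', 'to_blog_about'): {'news_story_1'},
    }

    # Retrieval now needs the facet as well as the tag
    def items_tagged(facet, tag):
        return faceted.get((facet, tag), set())

    print(items_tagged('political_party', 'New_Labour'))

Either way, the tagger is doing the taxonomist's work by hand.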

Clustered tagging (a new one on me) addresses both of these problems, as well as answering the much-evaded question of how those domains are supposed to emerge. A tag cluster - as seen on Flickr - consists of a group of tags which consistently appear together, suggesting an implicit 'domain'. Crucially, a single tag can occur in multiple clusters. The clusters for the Flickr 'election' tag, for example, are easy to interpret:

vote, politics, kerry, bush, voting, ballot, poster, cameraphone, democrat, president

wahl, germany, deutschland, berlin, cdu, spd, bundestagswahl

canada, ndp, liberal, toronto, jacklayton, federalelection


and, rather anticlimactically,

england, uk

Clustering, I'd argue, represents a pretty good stab at building emergent domains. The downside is that it only becomes possible when there are huge numbers of tagging operations.
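Mechanically, crude clusters can be pulled out of co-occurrence counts. Here's a toy sketch - invented tag sets, and certainly not Flickr's actual algorithm:

    # Rough sketch of cluster-building from tag co-occurrence.
    # Toy data and a naive method; Flickr's real algorithm isn't public here.
    from collections import Counter
    from itertools import combinations

    # Each photo's tag set (invented examples around an 'election' tag)
    taggings = [
        {'election', 'vote', 'politics', 'kerry', 'bush'},
        {'election', 'vote', 'ballot', 'democrat'},
        {'election', 'wahl', 'germany', 'cdu', 'spd'},
        {'election', 'wahl', 'deutschland', 'bundestagswahl'},
        {'election', 'canada', 'ndp', 'toronto'},
    ]

    # Count how often each pair of tags appears on the same item
    pair_counts = Counter()
    for tags in taggings:
        for a, b in combinations(sorted(tags), 2):
            pair_counts[(a, b)] += 1

    # Tags that co-occur with 'election' at least twice form a crude 'domain'
    cluster = {t for (a, b), n in pair_counts.items() if n >= 2
               for t in (a, b) if 'election' in (a, b) and t != 'election'}
    print(cluster)  # with this data: {'vote', 'wahl'}

This also shows why huge volumes matter: with only a handful of taggings, almost nothing co-occurs often enough to count.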

The third enhancement to tagging Rashmi describes is the use of tags as pivots:
When everything (tag, username, number of people who have bookmarked an item) is a link, you can use any of those links to look around you. You can change direction at any moment.

Lurking behind this, I think, is Thomas's original tripartite definition of 'folksonomy':
the three needed data points in a folksonomy tool [are]: 1) the person tagging; 2) the object being tagged as its own entity; and 3) the tag being used on that object. Flattening the three layers in a tool in any way makes that tool far less valuable for finding information. But keeping the three data elements you can use two of the elements to find a third element, which has value. If you know the object (in del.icio.us it is the web page being tagged) and the tag you can find other individuals who use the same tag on that object, which may lead (if a little more investigation) to somebody who has the same interest and vocabulary as you do. That person can become a filter for items on which they use that tag.

This, I think, is pivoting in action: from the object and its tags, to the person tagging and the tags they use, to the person using particular tags and the objects they tag. (There's a more concrete description here.)
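As a sketch of the mechanics (hypothetical data; nothing to do with how del.icio.us actually stores things), pivoting amounts to fixing any two elements of the (person, object, tag) triple and reading off the third:

    # Sketch of pivoting over (person, object, tag) triples, per the
    # tripartite definition quoted above. Data and names are hypothetical.
    triples = [
        ('alice', 'example.org/page1', 'folksonomy'),
        ('bob',   'example.org/page1', 'folksonomy'),
        ('bob',   'example.org/page2', 'tagging'),
        ('carol', 'example.org/page2', 'folksonomy'),
    ]

    def pivot(known, triples):
        # Fix any subset of person (0) / object (1) / tag (2); recover the rest
        return [t for t in triples
                if all(t[i] == v for i, v in known.items())]

    # From an object and a tag, find who else uses that tag on it...
    print(pivot({1: 'example.org/page1', 2: 'folksonomy'}, triples))
    # ...then pivot on one of those people to see what else they tag.
    print(pivot({0: 'bob'}, triples))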

Alex suggests that using tags as pivots could also be considered a subset of faceted browsing. I'd go further, and suggest that facets, clusters and pivots are all subsets of a larger set of solutions, which we can call domain-based tagging. If you use facets, the domains are imposed: this approach is a good fit to relatively closed domains of knowledge and finite groups of taggers. If you've got an epistemological blank sheet and a limitless supply of taggers, you can allow the domains to emerge: this is where clusters come into their own. And if what you're primarily interested in is people - and, specifically, who's saying what about what - then you don't want multiple content-based domains but only the information which derives directly from human activity: the objects and their taggers. Or rather, you want the objects and the taggers, plus the ability to pivot into a kind of multi-dimensional space: instead of tags existing within domains, each tag is a domain in its own right, and what you can find within each tag-domain is the objects and their taggers.

What all of this suggests is that, unsurprisingly, there is no 'one size fits all' solution. I suggested some time ago that
If 'cloudiness' is a universal condition, del.icio.us and Flickr and tag clouds and so forth don't enable us to do anything new; what they are giving us is a live demonstration of how the social mind works.

All knowledge is cloudy; all knowledge is constructed through conversation; conversation is a way of dealing with cloudiness and building usable clouds; social software lets us see knowledge clouds form in real time. I think that's fine as far as it goes; what it doesn't say is that, as well as having conversations about different things, we're having different kinds of conversations and dealing with the cloud of knowing in different ways. Ontology is not, necessarily, overrated; neither is folksonomy.

Wednesday, July 05, 2006

The users geeks don't see

Nick writes, provocatively as ever, about the recent 'community-oriented' redesign of the netscape.com portal:
A few days ago, Netscape turned its traditional portal home page into a knockoff of the popular geek news site Digg. Like Digg, Netscape is now a "news aggregator" that allows users to vote on which stories they think are interesting or important. The votes determine the stories' placement on the home page. Netscape's hope, it seems, is to bring Digg's hip Web 2.0 model of social media into the mainstream. There's just one problem. Normal people seem to think the entire concept is ludicrous.

Nick cites a post titled Netscape Community Backlash, from which this line leapt out at me:
while a lot of us geeks and 2.0 types are addicted to our own technology (and our own voices, to be honest), it's pretty darn obvious that A LOT of people want to stick with the status quo

This reminded me of a minor revelation I had the other day, when I was looking for the Java-based OWL reasoner 'pellet'. I googled for
pellet owl
- just like that, no quotes - expecting to find a 'pellet' link at the bottom of forty or fifty hits related to, well, owls and their pellets. In fact, the top hit was "Pellet OWL Reasoner". (To be fair, if you google
owl pellet
you do get the fifty pages of owl pellets first.)

I think it's fair to say that the pellet OWL reasoner isn't big news even in the Web-using software development community; I'd be surprised if everyone reading this post even knows what an OWL reasoner is (or has any reason to care). But there's enough activity on the Web around pellet to push it, in certain circumstances, to the top of the Google rankings (see for yourself).

Hence the revelation: it's still a geek Web. Or rather, there's still a geek Web, and it's still making a lot of the running. When I first started using the Internet, about ten years ago, there was a geek Web, a hobbyist Web, an academic Web (small), a corporate Web (very small) and a commercial Web (minute) - and the geek Web was by far the most active. Since then the first four sectors have grown incrementally, but the commercial Web has exploded, along with a new sixth sector - the Web-for-everyone of AOL and MSN and MySpace and LiveJournal (and blogs), whose users vastly outnumber those of the other five. But the geek Web is still where a lot of the new interesting stuff is being created, posted, discussed and judged to be interesting and new.

Add social software to the mix - starting, naturally, within the geek Web, as that's where it came from - and what do you get? You get a myth which diverges radically from the reality. The myth is that this is where the Web-for-everyone comes into its own, where millions of users of what was built as a broadcast Web with walled-garden interactive features start talking back to the broadcasters and breaking out of their walled gardens. The reality is that the voices of the geeks are heard even more loudly - and even more disproportionately - than before. Have a look at the 'popular' tags on del.icio.us: as I write, six of the top ten (including all of the top five) relate directly to programmers, and only to programmers. (Number eight reads: "LinuxBIOS - aims to replace the normal BIOS found on PCs, Alphas, and other machines with a Linux kernel". The unglossed reference to Alphas says it all.) Of the other four, one's a political video, two are photosets and one is a full-screen animation of a cartoon cat dancing, rendered entirely in ASCII art. (Make that seven of the top ten.)

I'm not a sceptic about social software: ranking, tagging, search-term-aggregation and the other tools of what I persist in calling ethnoclassification are both new and powerful. But they're most powerful within a delimited domain: a user coming to del.icio.us for the first time should be looking for the 'faceted search' option straight away ("OK, so that's the geek cloud, how do I get it to show me the cloud for European history/ceramics/Big Brother?") The fact that there is no 'faceted search' option is closely related, I'd argue, to the fact that there is no discernible tag cloud for European history or ceramics or Big Brother: we're all in the geek Web. (Even Nick Carr.) (Photography is an interesting exception - although even there the only tags popular enough to make the del.icio.us tag cloud are 'photography', 'photo' and 'photos'. There are 40 programming-related tags, from ajax to xml.)

Social software wasn't built for the users of the Web-for-everyone. Reaction to the Netscape redesign tells us (or reminds us) that there's no reason to assume they'll embrace it.

Update Have a look at Eszter Hargittai's survey of Web usage among 1,300 American college students, conducted in February and March 2006. MySpace is huge, and Facebook's even huger, but Web 2.0 as we know it? It's not there. 1.9% use Flickr; 1.6% use Digg; 0.7% use del.icio.us. Answering a slightly different question, 1.5% have ever visited Boingboing, and 1% Technorati. By contrast, 62% have visited CNN.com and 21% bbc.co.uk. It's still, very largely, a broadcast Web with walled-garden interactivity. Comparing results like these with the prophecies of tagging replacing hierarchy, Long Tail production and mashups all round, I feel like invoking the story of the blind men and the elephant - except that I'm not even sure we've all got the same elephant.

Monday, June 12, 2006

We hear the sound of machines

Sooner or later, the Internet will need to be saved from Google. Because Google - which appears to be an integral part of the information-wants-to-be-free Net dream, the search engine which gives life to the hyperlinked digital nervous system of a kind of massively-distributed Xanadu project - is nothing of the sort. Google is a private company; Google's business isn't even search. Google's business is advertising - and, whatever we think about how well search goes together with tagging and folksonomic stumbling-upon, search absolutely doesn't go with advertising. (Update 15th June: this is a timely reminder that Google is a business, and its business is advertising. Mass personalisation, online communities, interactive rating and ranking, it's all there - and it's all about the advertising.)

I had thought that, in the context of plain vanilla Web search, Google actually had this cracked - that the prominence of 'sponsored links', displayed separately from search results, allowed them to deliver an unpolluted service and still make money. I hadn't reckoned with AdSense. AdSense doesn't in itself pollute Google's search results. What it does is far worse: it encourages other people to pollute the Net. Which will mean, ultimately, that Google will paint (or choke) itself into a corner - but that, if we're not careful, an awful lot of users will be stuck in that corner with them.

For a much fuller and more cogent version of this argument, read Seth Jayson (via Scott). One point in particular stood out: Google (Nasdaq: GOOG) insiders are continuing to drop shares on the public at a rate that boggles the mind. It's true. Over the last year, as far as published records show, Sun insiders have sold $50,000 worth of shares, net. In the same period, IBM insiders have sold $6,500,000; Microsoft insiders have sold $1,500,000,000; and Google insiders have sold $5,000,000,000. See for yourself. That's a lot of shares.

Monday, June 05, 2006

I couldn't make it any simpler

I hate to say this - I've always loathed VR boosters and been highly sceptical about the people they boost - but Jaron Lanier's a bright bloke. His essay Digital Maoism doesn't quite live up to the title, but it's well worth reading (thanks, Thomas).

I don't think he quite gets to the heart of the current 'wisdom of the crowds' myth, though. It's not Maoism so much as Revivalism: there's a tight feedback loop between membership of the collective, collective activity and (crucially) celebration of the activity of the collective. Or: celebration of process rather than end-result - because the process incarnates the collective.

Put it this way. Say that (for example) the Wikipedia page on the Red Brigades is wildly wrong or wildly inadequate (which is just as bad); say that the tag cloud for an authoritative Red Brigades resource is dominated by misleading tags ('kgb', 'ussr', 'mitrokhin'...). Would a wikipedian or a 'folksonomy' advocate see this situation as a major problem? Not being either I can't give an authoritative answer, but I strongly suspect the answer would be No: it's all part of the process, it's all part of the collective self-expression of wikipedians and the growth of the folksonomy, and if the subject experts don't like it they should just get their feet wet and start tagging and editing themselves. And if, in practice, the experts don't join in - perhaps, in the case of Wikipedia, because they don't have the stomach for the kind of 'editing' process which saw Jaron Lanier's own corrections get reverted? Again, I don't know for sure, but I suspect the answer would be another shrug: the wiki's open to all - and tagspace couldn't be more open - so who's to blame, if you can't make your voice heard, but you? There's nothing inherently wrong with the process, except that you're not helping to improve it. There's nothing inherently wrong with the collective, except that you haven't joined it yet.

Two quotes to clarify (hopefully) the connection between collective and process. Michael Wexler:
our understanding of things changes and so do the terms we use to describe them. How do I solve that in this open system? Do I have to go back and change all my tags? What about other people’s tags? Do I have to keep in mind all the variations on tags that reflect people’s different understanding of the topics?

The social connected model implies that the connections are the important part, so that all you need is one tag, one key, to flow from place to place and discover all you need to know. But the only people who appear to have time to do that are folks like Clay Shirky. The rest of us need to have information sorted and organized since we actually have better things to do than re-digest it.
<...>
What tagging does is attempt to recreate the flow of discovery. That’s fine… but what taxonomy does is recreate the structure of knowledge that you’ve already discovered. Sometimes, I like flowing around and stumbling on things. And sometimes, that’s a real pita. More often than not, the tag approach involves lots of stumbling around and sidetracks.
<...>
It's like Family Feud [a.k.a. Family Fortunes - PJE]. You have to think not of what you might say to a question, you have to guess what the survey of US citizens might say in answer to a question. And that’s really a distraction if you are trying to just answer the damn question.

And our man Lanier:
there's a demonstrative ritual often presented to incoming students at business schools. In one version of the ritual, a large jar of jellybeans is placed in the front of a classroom. Each student guesses how many beans there are. While the guesses vary widely, the average is usually accurate to an uncanny degree.

This is an example of the special kind of intelligence offered by a collective. It is that peculiar trait that has been celebrated as the "Wisdom of Crowds,"
<...>
The phenomenon is real, and immensely useful. But it is not infinitely useful. The collective can be stupid, too. Witness tulip crazes and stock bubbles. Hysteria over fictitious satanic cult child abductions. Y2K mania. The reason the collective can be valuable is precisely that its peaks of intelligence and stupidity are not the same as the ones usually displayed by individuals. Both kinds of intelligence are essential.

What makes a market work, for instance, is the marriage of collective and individual intelligence. A marketplace can't exist only on the basis of having prices determined by competition. It also needs entrepreneurs to come up with the products that are competing in the first place. In other words, clever individuals, the heroes of the marketplace, ask the questions which are answered by collective behavior. They put the jellybeans in the jar.
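(The jellybean phenomenon, at least, is easy to reproduce in simulation - a minimal sketch, assuming nothing more than unbiased, widely-scattered guesses:

    # Minimal simulation of the jellybean ritual quoted above.
    # The guessing model (unbiased but widely scattered) is an assumption.
    import random

    random.seed(42)
    TRUE_COUNT = 850   # hypothetical number of beans in the jar

    # Guesses vary wildly, but with no systematic bias either way
    guesses = [random.uniform(200, 1500) for _ in range(100)]

    average = sum(guesses) / len(guesses)
    print(f"true count: {TRUE_COUNT}, average guess: {average:.0f}")

The individual errors cancel in the mean - which is also why the trick stops working when the errors are correlated, as in the tulip crazes and stock bubbles Lanier mentions.)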

To illustrate this, once more (just the once) with the Italian terrorists. There are tens of thousands of people, at a conservative estimate, who have read enough about the Red Brigades to write that Wikipedia entry: there are a lot of ill-informed or partially-informed or tendentious books about terrorism out there, and some of them sell by the bucketload. There are probably only a few hundred people who have read Gian Carlo Caselli and Donatella della Porta's long article "The History of the Red Brigades: Organizational structures and Strategies of Action (1970-82)" - and I doubt there are twenty who know the source materials as well as the authors do. (I'm one of the first group, obviously, but certainly not the second.) Once the work's been done anyone can discover it, but discovery isn't knowledge: the knowledge is in the words on the pages, and ultimately in the individuals who wrote them. They put the jellybeans in the jar.

This is why (an academic writes) the academy matters, and why academic elitism is - or at least can be - both valid and useful. Jaron:
The balancing of influence between people and collectives is the heart of the design of democracies, scientific communities, and many other long-standing projects. There's a lot of experience out there to work with. A few of these old ideas provide interesting new ways to approach the question of how to best use the hive mind.
<...>
Scientific communities ... achieve quality through a cooperative process that includes checks and balances, and ultimately rests on a foundation of goodwill and "blind" elitism — blind in the sense that ideally anyone can gain entry, but only on the basis of a meritocracy. The tenure system and many other aspects of the academy are designed to support the idea that individual scholars matter, not just the process or the collective.

I'd go further, if anything. Academic conversations may present the appearance of a collective, but it's a collective where individual contributions are preserved and celebrated ("Building on Smith's celebrated critique of Jones, I would suggest that Smith's own analysis is vulnerable to the criticisms advanced by Evans in another context..."). That is, academic discourse looks like a conversation - which wikis certainly can do, although Wikipedia emphatically doesn't.

The problem isn't the technology, in other words: both wikis and tagging could be ways of making conversation visible, which inevitably means visualising debate and disagreement. The problem is the drive to efface any possibility of conflict, effectively repressing the appearance of debate in the interest of presenting an evolving consensus. (Or, I could say, the problem is the tendency of people to bow and pray to the neon god they've made, but that would be a bit over the top - and besides, Simon and Garfunkel quotes are far too obvious.)

Update 13th June

I wrote (above): It's not Maoism so much as Revivalism: there's a tight feedback loop between membership of the collective, collective activity and (crucially) celebration of the activity of the collective. Or: celebration of process rather than end-result - because the process incarnates the collective.

Here's Cory Doctorow, responding to Lanier:
Wikipedia isn't great because it's like the Britannica. The Britannica is great at being authoritative, edited, expensive, and monolithic. Wikipedia is great at being free, brawling, universal, and instantaneous.
<...>
If you suffice yourself with the actual Wikipedia entries, they can be a little papery, sure. But that's like reading a mailing-list by examining nothing but the headers. Wikipedia entries are nothing but the emergent effect of all the angry thrashing going on below the surface. No, if you want to really navigate the truth via Wikipedia, you have to dig into those "history" and "discuss" pages hanging off of every entry. That's where the real action is, the tidily organized palimpsest of the flamewar that lurks beneath any definition of "truth." The Britannica tells you what dead white men agreed upon, Wikipedia tells you what live Internet users are fighting over.

The Britannica truth is an illusion, anyway. There's more than one approach to any issue, and being able to see multiple versions of them, organized with argument and counter-argument, will do a better job of equipping you to figure out which truth suits you best.

Quoting myself again, There's nothing inherently wrong with the process, except that you're not helping to improve it. There's nothing inherently wrong with the collective, except that you haven't joined it yet.

Thursday, May 25, 2006

When there is no outside

Nick Carr's hyperbolically-titled The Death of Wikipedia has received a couple of endorsements and some fairly vigorous disagreement, unsurprisingly. I think it's as much a question of tone as anything else. When Nick reads the line
certain pages with a history of vandalism and other problems may be semi-protected on a pre-emptive, continuous basis.

it clearly sets alarm bells ringing for him, as indeed it does for me ("Ideals always expire in clotted, bureaucratic prose", Nick comments). Several of his commenters, on the other hand, sincerely fail to see what the big deal might be: it's only a handful of pages, it's only semi-protection, it's not that onerous, it's part of the continuing development of Wikipedia editing policies, Wikipedia never claimed to be a totally open wiki, there's no such thing as a totally open wiki anyway...

I think the reactions are as instructive as the original post. No, what Nick's pointing to isn't really a qualitative change, let alone the death of anything. But yes, it's a genuine problem, and a genuine embarrassment to anyone who takes the Wikipedian rhetoric seriously. Wikipedia ("the free encyclopedia that anyone can edit") routinely gets hailed for its openness and its authority, only not both at the same time - indeed, maximising one can always be used to justify limits on the other. As here. But there's another level to this discussion, which is to do with Wikipedia's resolution of the openness/authority balancing-act. What happens in practice is that the contributions of active Wikipedians take precedence over both random vandals and passing experts. In effect, both openness and authority are vested in the group.

In some areas this works well enough, but in others it's a huge problem. I use Wikipedia myself, and occasionally drop in an edit if I see something that's crying out for correction. Sometimes, though, I see a Wikipedia article that's just wrong from top to bottom - or rather, an article where verifiable facts and sustainable assertions alternate with errors and misconceptions, or are set in an overall argument which is based on bad assumptions. In short, sometimes I see a Wikipedia article which doesn't need the odd correction, it needs to be pulled and rewritten. I'm not alone in having this experience: here's Tom Coates on 'penis envy' and Thomas Vander Wal (!) on 'folksonomy', as well as me on 'anomie'.

It's not just a problem with philosophical concepts, either - I had a similar reaction more recently to the Wikipedia page on the Red Brigades. On the basis of the reading I did for my doctorate, I could rewrite that page from start to finish, leaving in place only a few proper names and one or two of the dates. But writing this kind of thing is hard and time-consuming work - and I've got quite enough of that to do already. So it doesn't get done.

I don't think this is an insurmountable problem. A while ago I floated a cunning plan for fixing pages like this, using PledgeBank to mobilise external reserves of peer-pressure; it might work, and if only somebody else would actually get it rolling I might even sign up. But I do think it's a problem, and one that's inherent to the Wikipedia model.

To reiterate, both openness and authority are vested in the group. Openness: sure, Wikipedia is as open to me as any other registered editor d00d, but in practice the openness of Wikipedia is graduated according to the amount of time you can afford to spend on it. As for authority, I'm not one, but (like Debord) I have read several good books - better books, to be blunt, than those relied on by the author[s] of the current Red Brigades article. But what would that matter unless I was prepared to defend what I wrote against bulk edits by people who disagreed - such as, for example, the author[s] of the current article? On the other hand, if I was prepared to stick it out through the edit wars, what would it matter whether I knew my stuff or not? This isn't just random bleating. When I first saw that Red Brigades article I couldn't resist one edit, deleting the completely spurious assertion that the group Prima Linea was a Red Brigades offshoot. When I looked at the page again the next day, my edit had been reverted.

Ultimately Wikipedia isn't about either openness or authority: it's about the collective activity of editing Wikipedia and being a Wikipedian. From that, all else follows.

Update 2/6/06 (in response to David, in comments)

There are two obvious problems with the Wikipedia page on the Brigate Rosse, and one that's larger but more diffuse. The first problem is that it's written in the present tense; it's extremely dubious that there's any continuity between the historic Brigate Rosse and the gang who shot Biagi, let alone that they're simply, unproblematically the same group. This alone calls for a major rewrite. Secondly, the article is written very much from a police/security-service/conspiracist stance, with a focus on questions like whether the BR was assisted by the Czech security services or penetrated by NATO. But this tends to reinforce an image of the BR as a weird alien force which popped up out of nowhere, rather than an extreme but consistent expression of broader social movements (all of which has been documented).

The broader problem - which relates to both of the specific points - goes back to a problem with the amateur-encyclopedia format itself: Wikipedia implicitly asks what a given topic is, which prompts contributors to think of their topic as having a core, essential meaning (I wrote about this last year). The same problem can arise in a 'proper' encyclopedia, but there it's generally mitigated by expertise: somebody who's spent several years studying the broad Italian armed struggle scene is going to be motivated to relate the BR back to that scene, rather than presenting it as an utterly separate thing. The motivation will be still greater if the expert on the BR has also been asked to contribute articles on Prima Linea, the NAP, etc. This, again, is something that happens (and works, for all concerned) in the kind of restricted conversations that characterise academia, but isn't incentivised by the Wikipedia conversation - because the Wikipedia conversation doesn't go anywhere else. Doing Wikipedia is all about doing Wikipedia.

Monday, May 15, 2006

Who's there?

At Many-to-Many, Ross Mayfield reports that Clay Shirky and danah boyd have been thinking about "the lingering questions in our field", viz. the field of social software. I was a bit surprised to see that

How can communities support veterans going off topic together and newcomers seeking topical information and connections?

still qualifies as a 'lingering question'; I distinctly remember being involved in thrashing this one out, together with Clay, the best part of nine years ago. But this was the one that really caught my eye, if you'll pardon the expression:

What level of visual representation of the body is necessary to trigger mirror neurons?

Uh-oh. Sherry Turkle (subscription-only link):

a woman in a nursing home outside Boston is sad. Her son has broken off his relationship with her. Her nursing home is taking part in a study I am conducting on robotics for the elderly. I am recording the woman’s reactions as she sits with the robot Paro, a seal-like creature advertised as the first ‘therapeutic robot’ for its ostensibly positive effects on the ill, the elderly and the emotionally troubled. Paro is able to make eye contact by sensing the direction a human voice is coming from; it is sensitive to touch, and has ‘states of mind’ that are affected by how it is treated – for example, it can sense whether it is being stroked gently or more aggressively. In this session with Paro, the woman, depressed because of her son’s abandonment, comes to believe that the robot is depressed as well. She turns to Paro, strokes him and says: ‘Yes, you’re sad, aren’t you. It’s tough out there. Yes, it’s hard.’ And then she pets the robot once again, attempting to provide it with comfort. And in so doing, she tries to comfort herself.

What are we to make of this transaction? When I talk to others about it, their first associations are usually with their pets and the comfort they provide. I don’t know whether a pet could feel or smell or intuit some understanding of what it might mean to be with an old woman whose son has chosen not to see her anymore. But I do know that Paro understood nothing. The woman’s sense of being understood was based on the ability of computational objects like Paro – ‘relational artefacts’, I call them – to convince their users that they are in a relationship by pushing certain ‘Darwinian’ buttons (making eye contact, for example) that cause people to respond as though they were in relationship.

Further reading: see Kathy Sierra on mirror neurons and the contagion of negativity. See also Shelley's critique of Kathy's argument, and of attempts to enforce 'positive' feelings by manipulating mood. And see the sidebar at Many-to-Many, which currently reads as follows:
Recent Comments

viagra on Sanger on Seigenthaler’s criticism of Wikipedia

hydrocodone cheap on Sanger on Seigenthaler’s criticism of Wikipedia

viagra on Sanger on Seigenthaler’s criticism of Wikipedia

alprazolam online on Sanger on Seigenthaler’s criticism of Wikipedia

Timur on Sanger on Seigenthaler’s criticism of Wikipedia

Timur on Sanger on Seigenthaler’s criticism of Wikipedia

Recent Trackbacks

roulette: roulette

jouer casino: jouer casino

casinos on line: casinos on line

roulette en ligne: roulette en ligne

jeux casino: jeux casino

casinos on line: casinos on line

Tuesday, May 09, 2006

Some day this will all be yours

Scott Karp:
What if dollars have no place in the new economics of content?
...
In media 1.0, brands paid for the attention that media companies gathered by offering people news and entertainment (e.g. TV) in exchange for their attention. In media 2.0, people are more likely to give their attention in exchange for OTHER PEOPLE’S ATTENTION. This is why MySpace can’t effectively monetize its 70 million users through advertising — people use MySpace not to GIVE their attention to something that is entertaining or informative (which could thus be sold to advertisers) but rather to GET attention from other users.
...
MySpace can’t sell attention to advertisers because the site itself HAS NONE. Nobody pays attention to MySpace — users pay attention to each other, and compete for each other’s attention — it’s as if the site itself doesn’t exist.

You see the same phenomenon in blogging — blogging is not a business in the traditional sense because most people do it for the attention, not because they believe there’s any financial reward. What if the economics of media in the 21st century begin to look like the economics of poetry in the 20th century? — Lots of people do it for their own personal gratification, but nobody makes any money from it.

Pedantry first: it's inconceivable that we'll reach a point where nobody makes any money from the media, at least this side of the classless society. Even the hard case of blogging doesn't really stand up - I could name half a dozen bloggers who have made money or are making money from their blogs, without pausing to think.

It's a small point, but it's symptomatic of the enthusiastic looseness of Karp's argument. So I welcomed Nicholas Carr's counterblast, which puts Karp together with some recent comments by Esther Dyson:
"Most users are not trying to turn attention into anything else. They are seeking it for itself. For sure, the attention economy will not replace the financial economy. But it is more than just a subset of the financial economy we know and love."

Here's Carr:
I fear that to view the attention economy as "more than just a subset of the financial economy" is to misread it, to project on it a yearning for an escape (if only a temporary one) from the consumer culture. There's no such escape online. When we communicate to promote ourselves, to gain attention, all we are doing is turning ourselves into goods and our communications into advertising. We become salesmen of ourselves, hucksters of the "I." In peddling our interests, moreover, we also peddle the commodities that give those interests form: songs, videos, and other saleable products. And in tying our interests to our identities, we give marketers the information they need to control those interests and, in the end, those identities. Karp's wrong to say that MySpace is resistant to advertising. MySpace is nothing but advertising.

Now, this is good, bracing stuff, but I think Carr bends the stick a bit too far the other way. I know from my own experience that there's a part of my life labelled Online Stuff, and that most of my reward for doing Online Stuff is attention from other people doing Online Stuff. Real-world payoffs - money, work or just making new real-world friends - are nice to get, but they're not what it's all about.

The real trouble is that Karp has it backwards. Usenet - where I started doing Online Stuff, ten years ago - is a model of open-ended mutual whuffie exchange. (A very imperfect model, given the tendency of social groups to develop boundaries and hierarchies, but at least an unmonetised one.) Systematised whuffie trading came along later. The model case here is eBay, where there's a weird disconnect between meaning and value. Positive feedback doesn't really mean that you think the other person is a "great ebayer" - it doesn't really mean anything, any more than "A+++++" means something distinct from "A++++" or "A++++++". What it does convey is value: it makes it that much easier for the other person to make money. It also has attention-value, making the other person feel good for no particular real-world reason, but even this is quantifiable ("48! I'm up to 48!").

Ultimately Dyson and Carr are both right. The 'attention economy' of Online Stuff is new, absorbing and unlike anything that went before - not least because of the way in which it gratifies fantasies of being truly appreciated, understood, attended to. But, to the extent that the operative model is eBay rather than Usenet, it is nothing other than a subset of the financial economy. Karp may be right about the specific case of MySpace, but I can't help distrusting his exuberance - not least because, in my experience, the suffix '2.0' is strongly associated with a search for new ways to cash in.

Thursday, April 27, 2006

Not a fish at all

On the subject of broadcast vs broadband, Tom writes:
There's nothing rapid about this transition at all. It's been happening in the background for fifteen years. So let me rephrase it in ways that I understand. Shock revelation! A new set of technologies has started to displace older technologies and will continue to do so at a fairly slow rate over the next ten to thirty years!
...
My sense of these media organisations that use this argument of incredibly rapid technology change is that they're screaming that they're being pursued by a snail and yet they cannot get away! 'The snail! The snail!', they cry. 'How can we possibly escape!?'. The problem being that the snail's been moving closer for the last twenty years one way or another and they just weren't paying attention.

In comments, Will writes:
If one person is claiming that the world is moving fairly slowly, and has some sound advice on what this might look like (as you are doing here), and another person is claiming that the world is moving extraordinarily quickly, but offers some quickfire measures through which to cope with this, the sense of emergency will win purely because it is present. From here, it almost becomes *risky* not to then adopt the quickfire measures suggested by the second person. Panic becomes a safer strategy than calmness. Which explains management consultancy...

and John asks:
does web2.0 count as a snail too?

But Web 2.0 is not a snail.

Web 2.0 is the people pointing and shouting 'The snail! The snail!'

Web 2.0 is also the people who overhear the first group and join in, shouting 'The whale! The whale!' and pointing vaguely upwards and towards the nearest ocean.

Web 2.0 is also the people who hear the second group and panic about the approaching whale, or is it a land-whale? what is a land-whale anyway? whatever it is, there's one coming and we'd all better... well, we'd better tell someone about it, anyway - I mean, there's a land-whale coming, how often does something like that happen?

Web 2.0 is also the people who hear the third group and improvise a land-whale parade, with floats and dancers and drummers and at its centre a giant paper land-whale held aloft by fifteen people, because, I don't know, but everyone was talking about land-whales and it just seemed like a good idea, you know?

And Web 2.0 is the people who come along halfway through the parade and sell the roadside spectators standing-room tickets.

Cloudbuilding (3)

By way of background to this post - and because I think it's quite interesting in itself - here's a short paper I gave last year at this conference (great company, shame about the catering). It was co-written with my colleagues Judith Aldridge and Karen Clarke. I don't stand by everything in it - as I've got deeper into the project I've moved further away from Clay's scepticism and closer towards people like Carole Goble and Keith Cole - but I think it still sets out an argument worth having.

Mind the gap: Metadata in e-social science

1. Towards the final turtle

It’s said that Bertrand Russell once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the centre of our galaxy. At the end of the lecture, a little old lady at the back of the room got up and said: “What you have told us is rubbish. The world is really a flat plate supported on the back of a giant tortoise.”

Russell smiled and replied, “What is the tortoise standing on?”

“You’re very clever, young man, very clever,” said the old lady. “But it’s turtles all the way down.”

The Russell story is emblematic of the logical fallacy of infinite regress: proposing an explanation which is just as much in need of explanation as the original fact being explained. The solution, for philosophers (and astronomers), is to find a foundation on which the entire argument can be built: a body of known facts, or a set of acceptable assumptions, from which the argument can follow.

But what if infinite regress is a problem for people who want to build systems as well as arguments? What if we find we’re dealing with a tower of turtles, not when we’re working backwards to a foundation, but when we’re working forwards to a solution?
WSDL [Web Services Description Language] lets a provider describe a service in XML [Extensible Markup Language]. [...] to get a particular provider’s WSDL document, you must know where to find them. Enter another layer in the stack, Universal Description, Discovery, and Integration (UDDI), which is meant to aggregate WSDL documents. But UDDI does nothing more than register existing capabilities [...] there is no guarantee that an entity looking for a Web Service will be able to specify its needs clearly enough that its inquiry will match the descriptions in the UDDI database. Even the UDDI layer does not ensure that the two parties are in sync. Shared context has to come from somewhere, it can’t simply be defined into existence. [...] This attempt to define the problem at successively higher layers is doomed to fail because it’s turtles all the way up: there will always be another layer above whatever can be described, a layer which contains the ambiguity of two-party communication that can never be entirely defined away. No matter how carefully a language is described, the range of askable questions and offerable answers make it impossible to create an ontology that’s at once rich enough to express even a large subset of possible interests while also being restricted enough to ensure interoperability between any two arbitrary parties.
(Clay Shirky)

Clay Shirky is a longstanding critic of the Semantic Web project, an initiative which aims to extend Web technology to encompass machine-readable semantic content. The ultimate goal is the codification of meaning, to the point where understanding can be automated. In commercial terms, this suggests software agents capable of conducting a transaction with all the flexibility of a human being. In terms of research, it offers the prospect of a search engine which understands the searches it is asked to run and is capable of pulling in further relevant material unprompted.
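To make the idea of 'machine-readable semantic content' concrete, here is a minimal sketch of the subject-predicate-object representation on which such initiatives build: a hand-rolled triple store in Python. Every dataset and vocabulary name in it is invented for illustration, not drawn from any real ontology.

# A minimal sketch of machine-readable semantic content: facts encoded as
# subject-predicate-object triples (the data model underlying RDF).
# All names below are invented for illustration.
triples = {
    ("BritishCrimeSurvey2003", "hasVariable", "sex_of_respondent"),
    ("sex_of_respondent", "instanceOf", "DemographicVariable"),
    ("LabourForceSurvey2003", "hasVariable", "respondent_sex"),
    ("respondent_sex", "instanceOf", "DemographicVariable"),
}

def query(subject=None, predicate=None, obj=None):
    """Return every triple matching the given pattern (None = wildcard)."""
    return [(s, p, o) for (s, p, o) in triples
            if subject in (None, s)
            and predicate in (None, p)
            and obj in (None, o)]

# A 'semantic' search: find every variable recorded as demographic,
# whichever survey it comes from and whatever it happens to be called.
print(query(predicate="instanceOf", obj="DemographicVariable"))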

This type of development is fundamental to e-social science: a set of initiatives aiming to enable social scientists to access large and widely-distributed databases using ‘grid computing’ techniques.
A Computational Grid performs the illusion of a single virtual computer, created and maintained dynamically in the absence of predetermined service agreements or centralised control. A Data Grid performs the illusion of a single virtual database. Hence, a Knowledge Grid should perform the illusion of a single virtual knowledge base to better enable computers and people to work in cooperation.
(Keith Cole et al)

Is Shirky’s final turtle a valid critique of the visions of the Semantic Web and the Knowledge Grid? Alternatively, is the final turtle really a Babel fish — an instantaneous universal translator — and hence (excuse the mixed metaphors) a straw person: is Shirky setting the bar impossibly high, posing goals which no ‘semantic’ project could ever achieve? To answer these questions, it’s worth reviewing the promise of automated semantic processing, and setting this in the broader context of programming and rule-governed behaviour.

2. Words and rules

We can identify five levels of rule-governed behaviour. In rule-driven behaviour, firstly, ‘everything that is not compulsory is forbidden’: the only actions which can be taken are those dictated by a rule. In practice, this means that instructions must be framed in precise and non-contradictory terms, with thresholds and limits explicitly laid down to cover all situations which can be anticipated. This is the type of behaviour represented by conventional task-oriented computer programming.

A higher level of autonomy is given by rule-bound behaviour: rules must be followed, but there is some latitude in how they are applied. A set of discrete and potentially contradictory rules is applied to whatever situation is encountered. Higher-order rules or instructions are used to determine the relative priority of different rules and resolve any contradiction.

Rule-modifying behaviour builds on this level of autonomy, by making it possible to ‘learn’ how and when different rules should be applied. In practice, this means that priority between different rules is decided using relative weightings rather than absolute definitions, and that these weightings can be modified over time, depending on the quality of the results obtained. Neither rule-bound nor rule-modifying behaviour poses any fundamental problems in terms of automation.
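By way of illustration, here is a compressed sketch in Python of the second and third levels; the rules, weights and records are invented for the purpose. Discrete, potentially contradictory rules are arbitrated by weights (rule-bound behaviour), and feedback on the quality of the results modifies those weights over time (rule-modifying behaviour).

# Rule-bound: discrete, potentially contradictory rules, with weights
# standing in for the higher-order rules that resolve conflicts.
# (All rules, weights and records are invented for illustration.)
rules = {
    "record 'mugging' as personal violence": lambda rec: "mugging" in rec,
    "record 'mugging' as property crime":    lambda rec: "mugging" in rec,
}
weights = {"record 'mugging' as personal violence": 0.5,
           "record 'mugging' as property crime": 0.5}

def classify(record):
    """Apply every rule that matches; let the weights break the tie."""
    matching = [name for name, rule in rules.items() if rule(record)]
    return max(matching, key=lambda name: weights[name]) if matching else None

def feedback(rule_name, correct, rate=0.1):
    """Rule-modifying: reweight a rule according to the quality of its results."""
    weights[rule_name] += rate if correct else -rate

label = classify("witness report of a mugging")
feedback(label, correct=False)   # poor result: downgrade the winning rule
# Next time round, the competing rule takes priority instead.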

Rule-discovering behaviour, in addition, allows the existing body of rules to be extended in the light of previously unknown regularities which are encountered in practice (“it turns out that many Xs are also Y; when looking for Xs, it is appropriate to extend the search to include Ys”). This level of autonomy — combining rule observance with reflexive feedback — is fairly difficult to envisage in the context of artificial intelligence, but not impossible.

The level of autonomy assumed by human agents, however, is still higher, consisting of rule-interpreting behaviour. Rule-discovery allows us to develop an internalised body of rules which corresponds ever more closely to the shape of the data surrounding us. Rule-interpreting behaviour, however, enables us to continually and provisionally reshape that body of rules, highlighting or downgrading particular rules according to the demands of different situations. This is the type of behaviour which tells us whether a ban is worth challenging, whether a sales pitch is to be taken literally, whether a supplier is worth doing business with, whether a survey’s results are likely to be useful to us. This, in short, is the level of Shirky’s situational “shared context” — and of the final turtle.

We believe that there is a genuine semantic gap between the visions of Semantic Web advocates and the most basic applications of rule-interpreting human intelligence. Situational information is always local, experiential and contingent; consequently, the data of the social sciences require interpretation as well as measurement. Any purely technical solution to the problem of matching one body of social data to another is liable to suppress or exclude much of the information which makes it valuable.

We cannot endorse comments from e-social science advocates such as this:
variable A and variable B might both be tagged as indicating the sex of the respondent where sex of the respondent is a well defined concept in a separate classification. If Grid-hosted datasets were to be tagged according to an agreed classification of social science concepts this would make the identification of comparable resources extremely easy.
(Keith Cole et al)

Or this:
work has been undertaken to assert the meaning of Web resources in a common data model (RDF) using consensually agreed ontologies expressed in a common language [...] Efforts have concentrated on the languages and software infrastructure needed for the metadata and ontologies, and these technologies are ready to be adopted.
(Carole Goble and David de Roure; emphasis added)

Statements like these suggest that semantics are being treated as a technical or administrative matter, rather than as a problem in its own right; in short, that meaning is being treated as an add-on.

3. Google with Craig

To clarify these reservations, let’s look at a ‘semantic’ success story.
The service, called “Craigslist-GoogleMaps combo site” by its creator, Paul Rademacher, marries the innovative Google Maps interface with the classifieds of Craigslist to produce what is an amazing look into the properties available for rent or purchase in a given area. [...] This is the future….this is exactly the type of thing that the Semantic Web promised
(Joshua Porter)

‘This’ is an application which calculates the location of properties advertised on the ‘Craigslist’ site and then displays them on a map generated from Google Maps. In other words, it takes two sources of public-domain information and matches them up, automatically and reliably.

That’s certainly intelligent. But it’s also highly specialised, and there are reasons to be sceptical about how far this approach can be generalised. On one hand, the geographical base of the application obviates the issue of granularity. Granularity is the question of the ‘level’ at which an observation is taken: a town, an age cohort, a household, a family, an individual? a longitudinal study, a series of observations, a single survey? These issues are less problematic in a geographical context: in geography, nobody asks what the meaning of ‘is’ is. A parliamentary constituency; a census enumeration district; a health authority area; the distribution area of a free newspaper; a parliamentary constituency (1832 boundaries) — these are different ways of defining space, but they are all reducible to a collection of identifiable physical locations. Matching one to another, as in the CONVERTGRID application (Keith Cole et al) — or mapping any one onto a uniform geographical representation — is a finite and rule-bound task. At this level, geography is a physical rather than a social science.
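Here is a minimal sketch in Python of why such matching is finite and rule-bound (all areas and grid cells are invented): once each definition of space has been reduced to a set of shared physical locations, matching one scheme against another is simple set intersection.

# Two ways of defining the same space, both reduced to shared grid cells.
# (Area names and cell coordinates are invented for illustration.)
constituency = {
    "Northtown": {(0, 0), (0, 1), (1, 0)},
    "Southtown": {(1, 1), (2, 1), (2, 2)},
}
health_area = {
    "District A": {(0, 0), (0, 1)},
    "District B": {(1, 0), (1, 1), (2, 1), (2, 2)},
}

def overlap(areas_a, areas_b):
    """For each area in one scheme, list the areas it overlaps in the other."""
    return {a: [b for b, cells_b in areas_b.items() if cells_a & cells_b]
            for a, cells_a in areas_a.items()}

print(overlap(constituency, health_area))
# {'Northtown': ['District A', 'District B'], 'Southtown': ['District B']}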

The issue of trust is also potentially problematic. The Craigslist element of the Rademacher application brings the social element to bear, but does so in a way which minimises the risks of error (unintentional or intentional). There is a twofold verification mechanism at work. On one hand, advertisers — particularly content-heavy advertisers, like those who use the ‘classifieds’ and Craigslist — are motivated to provide a (reasonably) accurate description of what they are offering, and to use terms which match the terms used by would-be buyers. On the other hand, offering living space over Craigslist is not like offering video games over eBay: Craigslist users are not likely to rely on the accuracy of listings, but will subject them to in-person verification. In many disciplines, there is no possibility of this kind of ‘real-world’ verification; nor is there necessarily any motivation for a writer to use researchers’ vocabularies, or conform to their standards of accuracy.

In practice, the issues of granularity and trust both pose problems for social science researchers using multiple data sources, as concepts, classifications and units differ between datasets. This is not just an accident that could have been prevented with more careful planning; it is inherent in the nature of social science concepts, which are often inextricably contingent on social practice and cannot unproblematically be recorded as ‘facts’. The broad range covered by a concept like ‘anti-social behaviour’ means that coming up with a single definition would be highly problematic — and would ultimately be counter-productive, as in practice the concept would continue to be used to cover a broad range. On the other hand, concepts such as ‘anti-social behaviour’ cannot simply be discarded, as they are clearly produced within real — and continuing — social practices.

The meaning of a concept like this — and consequently the meaning of a fact such as the recorded incidence of anti-social behaviour — cannot be established by rule-bound or even rule-discovering behaviour. The challenge is to record both social ‘facts’ and the circumstances of their production, tracing recorded data back to its underlying topic area; to the claims and interactions which produced the data; and to the associations and exclusions which were effectively written into it.

4. Even better than the real thing

As an approach to this problem, we propose a repository of content-oriented metadata on social science datasets. The repository will encompass two distinct types of classification. Firstly, those used within the sources themselves; following Barney Glaser, we refer to these as ‘In-Vivo Concepts’. Secondly, those brought to the data by researchers (including ourselves); we refer to these as ‘Organising Concepts’. The repository will include:

• relationships between Organising Concepts
‘theft from the person’ is a type of ‘theft’

• associations between In-Vivo Concepts and data sources
the classification of ‘Mugging’ appears in ‘British Crime Survey 2003’

• relationships between In-Vivo Concepts
‘Snatch theft’ is a subtype of the classification of ‘Mugging’

• relationships between Organising Concepts and In-Vivo Concepts
the classification of ‘Snatch theft’ corresponds to the concept of ‘theft from the person’

The combination of these relationships will make it possible to represent, within a database structure, a statement such as

Sources of information on Theft from the person include editions of the British Crime Survey between 1996 and the present; headings under which it is recorded in this source include Snatch theft, which is a subtype of Mugging

The structure of the proposed repository has three significant features. Firstly, while the relationships between concepts are hierarchical, they are also multiple. In English law, the crime of Robbery implies assault (if there is no physical contact, the crime is recorded as Theft). The In-Vivo Concept of Robbery would therefore correspond both to the Organising Concept of Theft from the person and that of Personal violence. Since different sources may share categories but classify them differently, multiple relationships between In-Vivo Concepts will also be supported. Secondly, relationships between concepts will be meaningful: it will be possible to record that two concepts are associated as synonyms or antonyms, for example, as well as recording one as a sub-type of the other. Thirdly, the repository will not be delivered as an immutable finished product, but as an open and extensible framework. We shall investigate ways to enable qualified users to modify both the developed hierarchy of Organising Concepts and the relationships between these and In-Vivo Concepts.
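As a rough indication of how such a repository might be encoded (the concept names are taken from the examples above; the structures themselves are an illustrative assumption, not a description of the system as built), consider the following Python sketch:

# A sketch of the repository's core: typed, meaningful relationships
# between Organising Concepts and In-Vivo Concepts. The structures are
# illustrative assumptions, not the actual implementation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    subject: str
    relation: str   # meaningful link type: 'subtype_of', 'corresponds_to', ...
    object: str

sources = {("Mugging", "British Crime Survey 2003")}   # In-Vivo -> dataset

relations = {
    Relation("theft from the person", "subtype_of", "theft"),
    Relation("Snatch theft", "subtype_of", "Mugging"),
    Relation("Snatch theft", "corresponds_to", "theft from the person"),
    # Multiple correspondence: Robbery maps to two Organising Concepts.
    Relation("Robbery", "corresponds_to", "theft from the person"),
    Relation("Robbery", "corresponds_to", "personal violence"),
}

def headings_for(concept):
    """In-Vivo headings under which an Organising Concept is recorded."""
    return {r.subject for r in relations
            if r.relation == "corresponds_to" and r.object == concept}

print(headings_for("theft from the person"))   # {'Robbery', 'Snatch theft'}

Because each relationship is itself a typed record, Robbery can correspond to two Organising Concepts at once, and qualified users could later add, retype or reweight links without disturbing the rest of the structure; this is what the open, extensible framework requires.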

In the context of the earlier discussion of semantic processing and rule-governed behaviour, this repository will demonstrate the ubiquity of rule-interpreting behaviour in the social world by exposing and ‘freezing’ the data which it produces. In other words, the repository will encode shifting patterns of correspondence, equivalence, negation and exclusion, demonstrating how the apparently rule-bound process of constructing meaning is continually determined by ‘shared context’.

The repository will thus expose and map the ways in which social data is structured by patterns of situational information. The extensible and modifiable structure of the repository will facilitate further work along these lines: the further development of the repository will itself be an example of rule-interpreting behaviour. The repository will not — and cannot — provide a seamless technological bridge over the semantic gap; it can and will facilitate the work of bridging the gap, but without substituting for the role of applied human intelligence.