Cloud Street

Monday, March 27, 2006

We are bored in the city

And the swimming pool in the rue des Fillettes. And the police station in the rue du Rendez-Vous. The medical-surgical clinic and the free employment office on the quai des Orfèvres. The artificial flowers in the rue du Soleil. The hôtel des Caves du Château, the bar de l'Océan and the café du Va et Vient. The hôtel de l'Epoque.

And the strange statue of Doctor Philippe Pinel, benefactor of the insane, in the last evenings of summer. Exploring Paris.

The early situationists, following Chtcheglov's lead, turned urban wandering into a form of political/psychological exploration, a group encounter with the city mediated only by alcohol. At a less exalted level, I've long been fascinated by the kind of odd urban poetry evoked here, in Manchester as much as Paris, and by the changing articulation of city space: established cities are a slow-motion example of Marx's dictum about how we make our lives within conditions we have inherited. So it's easy to see how well this could work:
Socialight lets you put virtual "sticky" notes called StickyShadows anywhere in the real world. Share pictures, notes and more using your cell phone.
But - for all that the site says about restricting access to Groups and Contacts - it's also easy to see how very badly it could work.
* I leave a note for all my friends at the mall to let them know where I'm hanging out. All my friends in the area see it.
* A woman shows all her close friends the tree under which she had her first kiss.
* An entire neighborhood gets together and documents all the unwanted litter they find in an effort to share ownership of a community problem.
* A food-lover uses Socialight to share her thoughts on the amazing vanilla milkshakes at a new shop.
* The neighborhood historian creates her own walking tour for others to follow.
* A group of friends create their own scavenger hunt.
* A tourist takes place-based notes about stores in a shopping district, only for himself, for a time when he returns to the same city.
* A small business places StickyShadows that its customers would be interested in finding.
* A band promotes an upcoming show by leaving a StickyShadow outside the venue.
It was all going so well (although I did wonder why that entire neighbourhood couldn't just pick up the litter) right up to the last two. Advertising - yep, that's just what we all want more of in our urban lives. Lots of nice intrusive advertising.

The worst thing about taking-for-granted that our experiences with the city and each other will be "enriched" by more data, by more information, by making the invisible visible, etc., is that we never have to account for or be accountable to how.
More specifically, there's a huge difference between enabling conversation and enabling people to be informed - in other words, between talking-with and being-talked-at. Social software is all about conversation - about enabling people to talk together. Moreover, any conversation is defined as much by what it shuts out as what it includes; it's hard to listen to the people you want to talk with when you're being talked at. Even setting aside the information-overload potential of all those overlapping groups (do I need to know where so-and-so had her first kiss? do I need to know now?), it's clear that Socialight is trying to serve two ends which are not only incompatible but opposed - and only one of which pays money. Which is probably why, even though the technology is still in beta, I already feel that using it constructively would be going against the grain.

Friday, March 03, 2006

Cloudbuilding (2)

Here's a problem I ran into, halfway through building my first ontology, and some thoughts on what the solution might be.

Question 47 of the Mixmag survey reads:

Have you ever had an instance [sic] where your drug use caused you to:
Get arrested?
Lose a job?
Fail an exam?
Crash a car/bike?
Be kicked out of a club?

What this tells us is that one of the things the Mixmag questionnaire is 'about' - one of the in vivo concepts (or groups of in vivo concepts) that we need to record - is misadventures consequent on drug use. The question is how we define this concept logically - and this isn't just an abstract question, as the way that we define it will affect how people can access the information. There are three main possibilities.

1. Model the world
We could say that to have a job is to be a party to a contract of employment, which is a type of agreement between two parties, which is agreed on a set occasion and covers a set timespan. Hence to lose a job is to cease to be a party to a previously-agreed contract of employment; this may occur as a consequence of drug use (defined, in the Mixmag context, as the use of a psychoactive substance other than alcohol and tobacco).

This is all highly logical and would make it explicit that the Mixmag data contains some information on terminations of contracts of employment (as well as on drug-related stuff). However, the Mixmag survey isn't actually about contracts of employment, and doesn't mandate the definitional assumptions I made above. So this isn't really legitimate. (It would also be incredibly laborious, particularly when we turn our attention away from the relatively succinct Mixmag survey and look at more typical social survey data: surveys of physical capacity, for example, routinely ask people whether they can (a) walk to the shops (b) walk to the Post Office (c) walk to the nearest bus stop, and so on down to (j) or (k). All, in theory, capable of being modelled logically - but perhaps only in theory.)
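To give a sense of how much scaffolding even one item on the list would need, here is a minimal sketch of the 'model the world' definition of job loss. The class and field names are my own inventions for illustration, not anything the Mixmag survey or OWL supplies, and Python dataclasses are standing in for what would really be an ontology.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Agreement:
    """An agreement between two parties, made on a set occasion for a set timespan."""
    party_a: str
    party_b: str
    agreed_on: date
    ends_on: Optional[date] = None  # None while the agreement is still in force


@dataclass
class EmploymentContract(Agreement):
    """A contract of employment, modelled as one type of agreement."""


@dataclass
class JobLoss:
    """Ceasing to be a party to a previously agreed contract of employment."""
    contract: EmploymentContract
    lost_on: date
    consequent_on_drug_use: bool = False


def lose_job(contract, on, drug_related=False):
    """End the contract and record the termination as a job-loss event."""
    if contract.ends_on is not None:
        raise ValueError("contract has already ended")
    if on < contract.agreed_on:
        raise ValueError("a job cannot be lost before the contract was agreed")
    contract.ends_on = on
    return JobLoss(contract, on, drug_related)


# Hypothetical respondent and dates, purely for illustration.
contract = EmploymentContract("respondent", "employer", date(2004, 1, 5))
event = lose_job(contract, date(2005, 3, 1), drug_related=True)
print(event.consequent_on_drug_use, contract.ends_on)
```

That is three classes and a function for one of the five misadventures in Question 47, none of which the survey itself asks us to define.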

2. Stick to the theme
Alternatively, we could begin by taking a view as to the key concepts which a data source is about - in this case, psychoactive consumption, feelings about psychoactive consumption, consequences of psychoactive consumption, and sexual behaviour - and draw the line at anything beyond those concepts. On this assumption the fact that the survey covers misadventures consequent on drug use would be within scope, but the list of misadventures given above wouldn't be: that's part of the data that researchers will find when they look at the data source itself, not part of the conceptual 'catalogue' that we're building. The advantage of this is that it's conceptually very 'clean' and makes it that much clearer what a source is about; the disadvantage is obviously that it cuts off some ways in to the data and hides some information.

3. Include black boxes
What I've got at the moment - following the principle of using the definitions supplied by the source - is an ontology in which some concepts are defined and others are undefined (black boxes). For instance, I've got a concept of Job loss, but all that OWL 'knows' about it is that it's a type of Misadventure (which may be consequent on drug use) - which is in turn a type of Life event (which is a type of event that happens to one person). This would allow anyone searching for events consequent on drug use to get to job loss as a type of misadventure, but wouldn't let them get to drug-related misadventure from job loss - unless they happened to enter the exact name of the 'job loss' concept. I'm coming to believe that this is unsatisfactory: we should define the model in terms of what a data source is about. This means that we've got to either take a narrow, domain-specific view or take the view that each source gives us one piece of a much larger picture - in which case we're inevitably committed to modelling the world. But the 'black box' option isn't really sustainable.
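The one-way nature of the black box can be shown with a toy version of the hierarchy. This is only a sketch: the subclass links are the ones described above, the top-level "Event" class and the wording of the definitions are my assumptions, and plain dictionaries stand in for OWL.

```python
# Subclass links as described in the post; "Event" as the top class is assumed.
SUBCLASS_OF = {
    "Job loss": "Misadventure",
    "Misadventure": "Life event",
    "Life event": "Event",
}

# The only definitions available; "Job loss" has none, so it is a black box.
DEFINITIONS = {
    "Misadventure": "life event which may be consequent on drug use",
    "Life event": "event that happens to one person",
}


def ancestors(concept):
    """Walk up the hierarchy from a concept to the top."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def descendants(concept):
    """Every concept that has the given concept somewhere above it."""
    return sorted(c for c in SUBCLASS_OF if concept in ancestors(c))


def search(term):
    """Match a term against concept names and whatever definitions exist."""
    term = term.lower()
    names = set(SUBCLASS_OF) | set(SUBCLASS_OF.values())
    return sorted(
        n for n in names
        if term in n.lower() or term in DEFINITIONS.get(n, "").lower()
    )


print(descendants("Misadventure"))  # top-down: job loss is reachable
print(search("drug use"))           # matches the definition of Misadventure
print(search("employment"))         # nothing: Job loss has no definition to match
print(search("job loss"))           # only the exact concept name gets there
```

Coming down from Misadventure finds Job loss, but someone who starts from employment finds nothing, because there is nothing inside the box to match against.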

Cloudbuilding (1)

This one's about work.

I'm currently documenting the concepts underlying the 2005 Mixmag Drug Survey using Protégé. Here's why:

The documentation of social science datasets on a conceptual level, so as to make multiple datasets comprehensible within a shared conceptual framework, is inherently problematic: the concepts on which the data of the social sciences are constructed are imprecise, contested and mutable, with key concepts defined differently by different sources. When a major survey release is published, for example, the accompanying metadata often includes not only a definition of key terms, but discussion of how and why the definitions have changed since the previous release. This information is of crucial importance to the social scientist, both as a framework for understanding statistical data and as a body of social data in its own right.

It follows that we cannot think in terms of ironing out inconsistencies between social science datasets and resolving ambiguities. Rather, documenting the datasets must include documenting the definitions of the conceptual framework on which the datasets are built, however imprecise or inappropriate these concepts might appear in retrospect. This will also involve preserving - and exposing - the variations between different sources, or successive releases from a single source.

There are currently two main approaches to conceptually-oriented data documentation. A ‘top down’ approach is exemplified by the European Language Social Science Thesaurus (ELSST). The Madiera portal allows researchers to explore ELSST and access European survey data which has been linked to ELSST keywords. The limitations of the top-down approach can be gauged from ELSST’s concepts relating to drug use. Drug Abuse, Drug Addiction, Illegal Drugs and Drug Effects are all 'leaf' concepts - headings which have no subheadings under them. However, they are in different parts of the overall ELSST tree: for example, Drug Abuse is under Social Problems->Abuse, while Drug Effects is under Biology->Pharmacology. Although the hierarchy is augmented by a list of 'related' concepts, to some extent facilitating horizontal as well as vertical navigation, the hierarchy inevitably makes some types of enquiry easier than others. Anyone using the ELSST 'tree' will be visually reminded of the affinities identified by ELSST’s authors between Pharmacology and Physiology, or between Drug Abuse and Child Abuse. These problems follow from the initial design choice of a single conceptual hierarchy.
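A rough sketch of why a single hierarchy favours some enquiries over others: measure how far apart two leaf concepts sit in the tree. The paths for Drug Abuse and Drug Effects are the ones given above; the path for Child Abuse is an assumption based on the affinity just mentioned, and the distance measure is mine rather than anything ELSST provides.

```python
# Root-to-leaf paths in a single hierarchy. The placements of Drug Abuse and
# Drug Effects follow the text; the Child Abuse path is an assumption.
PATHS = {
    "Drug Abuse": ["Social Problems", "Abuse", "Drug Abuse"],
    "Child Abuse": ["Social Problems", "Abuse", "Child Abuse"],
    "Drug Effects": ["Biology", "Pharmacology", "Drug Effects"],
}


def shared_ancestors(a, b):
    """Return the headings two concepts have in common, from the top down."""
    shared = []
    for x, y in zip(PATHS[a], PATHS[b]):
        if x != y:
            break
        shared.append(x)
    return shared


def tree_distance(a, b):
    """Steps up from one leaf and down to the other, via an implicit root if needed."""
    common = len(shared_ancestors(a, b))
    return (len(PATHS[a]) - common) + (len(PATHS[b]) - common)


print(shared_ancestors("Drug Abuse", "Child Abuse"))
print(tree_distance("Drug Abuse", "Child Abuse"))
print(tree_distance("Drug Abuse", "Drug Effects"))
```

On these paths Drug Abuse is two steps from Child Abuse and six from Drug Effects, although a researcher interested in drugs would probably put the last two side by side.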

This approach to classification has recently come under criticism. Advocates of 'bottom-up' approaches argue that top-down taxonomies like the Dewey Decimal System or ELSST are an artificial imposition on the world of knowledge, which is better represented as a set of individual acts of labelling or ‘tagging’. It is argued that the 'trees' of hierarchical taxonomies can be replaced with a pile of 'leaves'.

One successful 'bottom-up' approach is the framework for documenting survey data developed by the Data Documentation Initiative (DDI). The DDI standard makes it possible to search on keywords associated with surveys, sections of surveys and individual questions; the short text of individual questions is also searchable. Searches of DDI metadata can also be run from the Madiera portal: a search on ‘marijuana’, for instance, brings back short text items including the following:

- Health Behaviour in School-Aged Children (Switzerland, 1990)
- Scottish Social Attitudes Survey (Scotland, 2001): Smoking cannabis should be legal? Q2.31
- Eurobarometer 37.0 (EU-wide, 1992)

Clearly, this way in to the data makes it easy for a well-prepared researcher to track the use of particular concepts 'in the wild' (in vivo concepts). However, this gain comes at the cost of some information. There is wide variation both in the terminology used in the surveys and in the concepts to which they refer. In one survey smoking cannabis might be a type of petty crime; in others it might figure as a type of leisure activity or a potential health risk. These conceptual differences are reflected in the vocabulary used by data sources - and by researchers. Depending on context, three researchers using 'marijuana', 'hashish' and 'cannabis' as search terms may be asking for the same data or for three different sets of data.
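The vocabulary problem is easy to reproduce with a toy search over question short texts. In this sketch only the first short text is quoted from the results above; the other three are hypothetical, and the matcher is a bare substring test that ignores the keyword metadata a real DDI search would also use.

```python
# Question short texts. Only the first is quoted from the search results in
# the post; the rest are hypothetical examples invented for this sketch.
SHORT_TEXTS = [
    "Smoking cannabis should be legal?",
    "Ever tried marijuana",
    "Hashish use in last 12 months",
    "Cannabis or marijuana: health risk",
]


def search(term):
    """Return the short texts that literally contain the search term."""
    term = term.lower()
    return [text for text in SHORT_TEXTS if term in text.lower()]


for term in ("marijuana", "hashish", "cannabis"):
    print(term, "->", search(term))
```

Three researchers, three different result sets, and nothing in the results to say whether the surveys behind them meant the same thing by the word.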

Neither the 'top-down' nor the 'bottom-up' approach articulates the conceptual assumptions which underlie the construction of a dataset - assumptions expressed both in the definition of in vivo concepts and in relationships between them. Rather than leaving much of this conceptual information undocumented (the DDI approach) or encoding one 'correct' set of assumptions while excluding or sidelining others (the ELSST approach), we propose to offer a coherent hierarchy of in vivo concepts for each individual source, based on the definitions (explicit and implicit) used in each source. Comparing the in vivo conceptual hierarchies used in multiple datasets will enable researchers both to see where concepts are directly comparable and to see where - and how - their definitions diverge and overlap.

To document hierarchies of in vivo concepts, we shall use description logic and the Semantic Web language OWL-DL (Web Ontology Language - Description Logic). OWL-DL makes it possible to formulate a precise logical specification of concepts such as

- use of cannabis (either marijuana or hashish) in the month prior to the survey
- use of either Valium or temazepam, at any time
- seizures of Class A drugs by HM Customs in the financial year 2004/5
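In description logic notation, the three examples might be written roughly as follows. This is a sketch only: every class and property name here (DrugUse, ofSubstance, occursIn, seizedBy and so on) is a placeholder of my own, not a term from the Mixmag survey or from any published ontology.

```latex
\begin{align*}
\mathit{RecentCannabisUse} &\equiv \mathit{DrugUse}
  \sqcap \exists\,\mathit{ofSubstance}.(\mathit{Marijuana} \sqcup \mathit{Hashish})
  \sqcap \exists\,\mathit{occursIn}.\mathit{MonthBeforeSurvey} \\
\mathit{ValiumOrTemazepamUse} &\equiv \mathit{DrugUse}
  \sqcap \exists\,\mathit{ofSubstance}.(\mathit{Valium} \sqcup \mathit{Temazepam}) \\
\mathit{ClassASeizure0405} &\equiv \mathit{Seizure}
  \sqcap \exists\,\mathit{ofSubstance}.\mathit{ClassADrug}
  \sqcap \exists\,\mathit{seizedBy}.\{\mathit{HMCustoms}\}
  \sqcap \exists\,\mathit{occursIn}.\mathit{FinancialYear2004\_05}
\end{align*}
```

The union stands for 'either ... or', the intersection for 'and', and the existential restriction for 'has some'; the second definition has no time restriction because it covers use at any time.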

At least, that's the idea. Now wait for part 2...

Thursday, March 02, 2006

Nor mine, now

I nearly installed Hyperwords this morning; the only reason I didn't is that I haven't moved to Firefox 1.5 yet (and don't intend to until I'm confident it won't break any of the extensions I'm already using). And, in principle, it looks great:
With the Hyperwords Firefox Extension installed just select any text and a menu appears. You can search major search engines, look things up in reference sites, check dictionary definitions, translate, email quickly and much more.
So why does the thought of actually using it give me the creeps? Alex is similarly ambivalent:
In principle, it's a handy tool. But I would have to overcome a few personal adoption barriers before I started using it on a regular basis. As a consumer, I can see the appeal of opening up texts to interact with the rest of the Web; but as a writer, I instinctively bristle at the idea of giving up that kind of control. I suspect that disposition colors the way I read things on the Web; I like my documents to feel fixed, not fluid. And the Web feels squishy enough as it is. That, and somehow the premise of cracking open someone else's document with a toolbox of Web services feels like a kind of violation. This is undoubtedly my own personal neurotic hangup.
Well, if it is, it's mine too. Mark Bernstein gets some of it:
In the very early days of hypertext research, people worried a lot about hand-crafted links. "How will we ever afford to put in all those links?" We also worried about how we'd ever manage to afford to digitize stuff for the Web, not to mention paying people to create original Web pages. Overnight, we discovered that we'd got the sign wrong: people would pay for the privilege of making Web sites. The problem isn't the 'tyranny' of the links, and replacing it with the tyranny of the link server might not be a great solution.
Authors don't offer navigation options to be "useful"; thoughtful writers use links to express ideas. Argumentation seeks understanding, not merely access.
Let's put some of that together: cracking open someone else's document with a toolbox of Web services; the tyranny of the link server; thoughtful writers use links to express ideas. In other words, Hyperwords doesn't extend existing hyperlink practice but undermines it. In the Hyperwords world you'll no longer read a document, you'll mine it for information - or rather, mine it for jumping-off points for retrieving information from authoritative sources. (Or retrieving whatever other stuff you may want to retrieve.)

Alex mentioned Xanadu, but I don't think Hyperwords is a step in that direction. If anything, it's a step backwards. (One of Xanadu's key words is "author-based".) Hyperlinks and the Web of dialogic, socially-produced content go together just fine; as Mark says, mass amateurism is already providing an answer to the question of where all those links are going to come from. It's messy and incomplete, but it's here - and it's, well, ours (as a writer, I instinctively bristle at the idea of giving up that kind of control). You can see two visions of the Web here: the mass amateurisation of writing as against the 'consumer'-oriented, authority-led, broadcast Web. Hyperwords ostensibly enhances horizontal, transverse linkage, but its effect would be to pull the Web further towards broadcast mode - albeit an 'empowered', roll-your-own broadcast mode.

Can't keep quiet for long - I'm a human being!
Can't help singing this song - I'm a human being!
You won't listen to me,
I'm not an authority...

- Steve Mason, "Eclipse"