Friday, December 3, 2010

Google's automatic writing and the gendering of birds

The almost meaningless faux-text-mining of a Google search on "birdlike woman" and "birdlike man" turns up the following results:

Vanilla Google:
             "woman"   "man"   ratio "woman"/"man"
"birdlike"   16,100    2,990   5.38

Google Books:
             "woman"   "man"   ratio "woman"/"man"
"birdlike"   1,520     606     2.5
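The ratios are just one hit count divided by the other. A minimal sketch of the arithmetic in Python, assuming the figures above read as 16,100 and 2,990 hits for vanilla Google and 1,520 and 606 for Google Books (the dictionary layout and the function name are mine, purely for illustration):

```python
# Hit counts as read from the two tables above (an assumed reading of the figures).
hits = {
    "Google": {"woman": 16100, "man": 2990},
    "Google Books": {"woman": 1520, "man": 606},
}


def ratio(counts):
    """Return the "woman"/"man" hit ratio, rounded to two decimal places."""
    return round(counts["woman"] / counts["man"], 2)


for source, counts in hits.items():
    print(source, ratio(counts))
```

That the division works out is the only thing this confirms; it says nothing about what the hit counts themselves measure.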

This probably tells us more about Google than about the correlation of gender and the term "birdlike." Hyphenating the term ("bird-like") makes a big difference in the search. This particular search also doesn't catch instances like "her movements were quick and birdlike," where the gendered word and the adjective aren't adjacent.
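A windowed search over an actual text would catch cases like that last one. Here is a rough sketch in Python, not anything I actually ran on a corpus: the word lists, the function name gender_near, and the six-word window are all assumptions made up for illustration, and the pattern simply treats "birdlike" and "bird-like" as the same term.

```python
import re
from collections import Counter

# Crude, hypothetical word lists; a real study would need something better.
FEMALE = {"she", "her", "hers", "woman", "women", "girl"}
MALE = {"he", "him", "his", "man", "men", "boy"}


def gender_near(text, term_pattern=r"bird-?like", window=6):
    """Count occurrences of the term that have a gendered word within
    `window` tokens on either side."""
    # Lowercase word tokens; internal hyphens are kept so "bird-like" is one token.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    term = re.compile(term_pattern)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if term.fullmatch(tok):
            nearby = tokens[max(0, i - window): i + window + 1]
            if any(w in FEMALE for w in nearby):
                counts["female"] += 1
            if any(w in MALE for w in nearby):
                counts["male"] += 1
    return counts


print(gender_near("Her movements were quick and birdlike."))
```

Even this has obvious blind spots: a window is not a sentence, "her" might belong to someone other than the birdlike party, and an occurrence with both kinds of word nearby gets counted twice.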

I often think it would be interesting to do some small bit of real text-mining, just to have a global look at a corpus, but it's always incidental to the argument, so I never follow up.

The appeal of text-mining, which I think is actually magnified in the Google search, is that it's a kind of automatic writing, in which the body of the text (corpus) is made to give up its latent spirit. That the Google algorithm is unknown except insofar as it is known to maximize ad revenue does not diminish this appeal, the temptation to present Google hits as data. Since so much of our daily information is filtered through the Google algorithm anyway, it serves as a sort of corporate unconscious, whose essence is perhaps more compelling than truth.

The appeal of the Google search in lieu of text-mining is formalized in toys like Googlefight, which simply runs two Google searches at once and visualizes the results:


The bar graph calls on a visual form designed to represent meaningful data; although of course such forms are routinely abused (I particularly enjoy April Winchell's pie charts), the form still invites one to seriously compare the numbers. Yet the tongue-in-cheek cheesy stick-figure animation acknowledges the unseriousness of the Google fight. A Google fight is only good for settling a certain kind of argument, the confrontational flame-war variety that isn't particularly invested in actually solving a problem, not a debate but a "FIGHT." (I tried to get a screen shot of the "FIGHT" title, but I'm just not that quick on the draw, apparently.)

Yet for all that, toys like Googlefight are amusing (try Foucault versus Habermas!) and a little beguiling. I don't have time to prepare a corpus and an algorithm, but I do have three seconds to do a Google search, or make a Wordle.

Word cloud for Beatrice Forbes-Robertson Hale's The Nest-Builder (1916).

Such tools get you somewhere; they just don't get you far. It's interesting ("merely" interesting?) that the above word cloud says nothing about birds or nests, and that some of the most prominent words are "know" and "time." But of course not all words are weighted equally in a novel, and it matters that the chapters are titled "Mate-Song," "Mated," "The Nestling," "Wings," etc.--that indeed the whole marriage plot is structured around a bird allegory that disappears in the word cloud. And this may be another reason it's so appealing to let a simple Google search stand in for data, even when its unreliability is universally acknowledged. It gets you somewhere but it doesn't get you far, and in the end this is true of most text-mining, too. In the end we're fascinated by automatic writing, the possibility of forcing the body to secrete a hidden spirit, but we're also agnostic about spirit tout court. A highly sophisticated search with a known margin of error probes an ontological terrain that's suspiciously similar to the corporate unconscious, which we're tempted to say is all phony advertising anyway--or it isn't--one or the other.

