I'm not really a PMLA-hater in the way that so many seem to be. I feel that I've read many a good article in it, and one of the great things about PMLA is that it's ubiquitous and covers a lot of scholarly terrain that I forget that I give a crap about when not reminded from time to time. PMLA is the irregularly-appearing, late-arriving, flimsily bound, comically hyperprofessionalized flagship publication of our field. From where I stand it would be a little hard not to love it.*
But I think PMLA should fold.
Think about it. The MLA is forever passing resolutions, which are well-intentioned, controversial only to a limited subset of nerds, and nonbinding. A statement from the MLA on the ethical treatment of adjuncts is a welcome thing, but it isn't likely to change hiring practices or people's life situations all that much. I actually think the most awesome thing the MLA has done in recent years is open a Twitter account for Rosemary Feal.** (The MLA: it's friendly!) Prestige only gets you so far. (Please correct me if I'm wrong; I'd love to hear that the MLA made some department start adding tenure lines.)
It's a different matter, though, when it comes to publishing. The MLA does actually put out its own publications, and they're some of the most useful and central to the profession, much as we love to affectionately deride PMLA and compare the MLA Handbook unfavorably to the magisterial CMoS. As slow to update as the MLA Bibliography is, as prone to crashing as the JIL is, where would we be without these things?
The greatest virtue of being the center of the profession is being the center of the profession. Suppose PMLA were to move entirely to open peer review and open access, reconstituting itself as a Public Library of the MLA (on the model of PLoS), perhaps with the help of an NEH grant. Would its prestige disappear? I doubt it. Its reputation for stodginess might, though.
Unlike those resolutions, the move to open peer review and open access would amount to an action, not just a recommendation. Because PMLA comes with a boatload of prestige, it would solve one problem that open peer review projects often face: fear among contributors that their work won't count toward promotion and tenure. It could be a decisive move to open up the acceptance of open peer review in the humanities, and make it easier for other journals to move to such models.
I am, of course, leaving aside the logistical complications, which are legion. And in a way, I understand the value of PMLA behaving like the most conservative journal of all time. It can't really afford to make sudden moves, precisely because it's big and central and purports to represent such a broad range of scholars.*** Any move PMLA makes is likely to be consequential, whether or not it's where we want the profession to be. But that's what drives so many people nuts about it now, and that's why the switch could effect a powerful change in how we handle scholarly publication.
*One of the most regularly made criticisms of PMLA (which Gregory Jusdanis, for example, makes) is that it is insufficiently "cutting-edge" or "revolutionary." I'm already on record as being against innovation, so it will come as no surprise that I think this desideratum is misguided. This topic is worth more than a footnote, but for now I'll just say that, first, newness is not self-evidently a good, and second, to reserve approval for that scholarship which thoroughly changes our thinking is analogous to a capitalist logic of production, in which only "growth" counts.
**I genuinely think this is awesome.
***I grant that the committee-driven compromise that could easily come out of such a proposal would probably be something like a pilot program that effectively creates two tiers: "real" PMLA and "experimental" (perhaps all-online, perhaps less valuable toward t&p) PMLA. And that would never do.
Wednesday, December 29, 2010
Monday, December 27, 2010
No, seriously, how am I going to squish this down into a twenty-minute talk?
Labels: MLA, William Carlos Williams
Sunday, December 19, 2010
A Supposedly Fun Thing: Text-Mining and the Amusement/Knowledge System; or, the Epistemological Sentimentalists
If we could text-mine the internets of the last few days for the correlation between the words "n-gram" and "fun," I'm sure we'd get a nontrivial number. One of the most striking things about the reception of the Google Books Ngrams, largely in the form of the web tool, is the giddy delight with which people have announced how much fun it is. Exhibit A is the bit I quoted yesterday from Patricia Cohen at the New York Times:
The intended audience is scholarly, but a simple online tool allows anyone with a computer to plug in a string of up to five words and see a graph that charts the phrase’s use over time -- a diversion that can quickly become as addictive as the habit-forming game Angry Birds.
But that's just one example--the fun of the Google Books Ngrams tool is almost universally noted. See, for instance, "Fun With Google's Ngram Viewer" (Mother Jones), "Fun with Google NGram Viewer" (WSJ), and "BRB, Can't Stop NGraming" (The Awl). And Dorothea Salo tweets,
What I like about the GBooks n-grams is seeing all kinds of people playing with it. Just playing. THAT, friends, is how one learns.
The prevalence of this language of play raises two questions.
1. What rhetorical work is this move (calling Google Books Ngrams a fun toy) doing?
2. What experiential dimension of Google Books Ngrams does this rhetorical move describe, and what does it tell us about the tool's epistemic significance?
Answering the first question feeds into answering the second. To call the Google Books Ngrams web tool (henceforth "GBN") a fun toy is to hedge one's bets, to express approval without necessarily venturing into the higher-stakes terrain of approving it as a research method. Any assessment of the tool's epistemic value is channeled through an expression of pleasure (or, as Patricia Cohen and The Awl's Choire Sicha rather interestingly suggest, compulsion). Play can of course be a form of learning, and very important--that's what Dorothea Salo's tweet indicates. But play is a good learning environment precisely because the stakes are low and mistakes can be made safely, as a comment by Bill Flesch suggests: "I played around with it for about half an hour. Now I'm bored." New toy, please! With respect to knowledge, the language of play is deeply ambivalent.
As I read it, the universal declaration of fun that has surrounded the release of GBN is as much about guilt as about pleasure. Those who are compulsively "ngraming," as Sicha so amusingly puts it, are often all too aware of GBN's limitations, which have been blogged extensively, all the way down to what Natalie Binder points out, in her much-retweeted post, has to underlie the whole operation: inevitably imperfect OCR.*
Why does the GBN web tool even exist? Not to advance knowledge, I don't think, or at least not directly, but rather because it's fun. Because it directs interest toward the more substantive element of the project, the downloadable data set that relatively few people are actually going to download.
There are huge problems with using GBN (and throughout I'm alluding to the web tool/toy that everybody is saying is so much fun) as any sort of meaningful index of culture, and everyone knows it. And yet.
I would argue that the universal declaration of fun is a form of confession: I am deriving epistemological satisfaction from this unsound tool, with its built-in Words for Snowism. It's a guilty pleasure, epistemic candy: the sensation of knowledge, lacking in any nutritional value.
But the guilt goes rather deeper than the simple tension between GBN's unreliability for actual research and the "gee whiz!" quality of the graphs: GBN is fun because it is so limited.
That great scholar of nineteenth-century culture, Walter Benjamin, described a mode of writing that he called "information."
Villemessant, the founder of Le Figaro, characterized the nature of information in a famous formulation. 'To my readers,' he used to say, 'an attic fire in the Latin Quarter [Paris] is more important than a revolution in Madrid.' This makes strikingly clear that what gets the readiest hearing is no longer intelligence coming from afar, but the information which supplies a handle for what is nearest. Intelligence that came from afar--whether over spatial distance (from foreign countries) or temporal (from tradition)--possessed an authority which gave it validity, even when it was not subject to verification. Information, however, lays claim to prompt verifiability. The prime requirement is that it appear 'understandable in itself.' (147, emphasis added)
What GBN delivers is information in this sense. It is near at hand, easy to use, and puts out a nice visualization that appears "understandable in itself." It's easy to deliver, in that way, not unlike a pizza. It's no good to point out, as Mark Davies does, that the Corpus of Historical American English (COHA) allows one to look at specific syntactic forms, or include related words, or track usages by the genre of the source. Such capacities only raise anxieties. (For example, what gets tagged as "nonfiction"? Where, for instance, do autobiographies go? I once, to my astonishment, saw The Autobiography of Alice B. Toklas in the nonfiction section of a book store--along with Three Lives! But I digress.)
As soon as we raise such questions, the graph stops being "understandable in itself," stops being information. Conversely, when you aren't given the choice to sort by genre, how genres are defined necessarily stops being a question. It's the very fact that the toy is a black box and a blunt instrument that makes it feel immediate and incontrovertible and, in that very satisfying way, obvious. We get the epistemic satisfaction of information, and the thing that gives it to us is precisely that information's lack of nuance.
Yesterday I used the word "cheap" to describe the kind of historical narratives GBN suggests. There is indeed a kind of economic dimension to the satisfaction that GBN delivers. Of Oscar Wilde's many quotable lines, I am reminded of this one:
The fact is that you were, and are I suppose still, a typical sentimentalist. For a sentimentalist is simply one who desires to have the luxury of an emotion without paying for it. (768)
Feeling, Wilde suggests, has to be earned.** Bracketing the question of whether this is a good description of sentimentalism, it's a good analogue for the epistemic candy of GBN. One receives the apparent solidity of research--the nice graph that summarizes and visualizes what might otherwise be years of labor in the making--without having to have actually done any research. This is only a cheap thrill, "fun," when it is actually cheap--that is, when we don't inquire into how the corpus was prepared, or what effects GBN's case-sensitivity is having on our results.
The analogy to sentimentalism is useful not only because it gives us a model for understanding the economy of feeling here, but also because it allows us to recognize that there is an element of feeling in the way that we encounter information. We are likely to find it ethically reprehensible when our emotions or what we believe we know are manipulated. And yet there are times when we want the cheap thrill. Most people I know will freely cop to liking a good emotionally manipulative movie or novel, whether a thriller or a romance or one of those movies where the dog dies. As the fun of ngrams demonstrates, we like a little intellectual manipulation too.
(I know, I know, it doesn't tell you anything conclusively, but...try Foucault versus Habermas!)
What does it mean, this liking it?
I mentioned Bill Brown's term, the "amusement/knowledge system," in my title above because it's another, perhaps more explicit way of describing the close interweaving of knowledge and fun at the end of the nineteenth century that so fascinated Benjamin (208). In my own work I have tried to make a case for taking seriously both the knowledge and the amusement in that system, notably in naturalist fiction, because it's often in such liminal places that the terms of what counts as knowledge are most at stake. Part of the reason experimental literature seems to be here to stay is that the amusement/knowledge system is, too.
The point is not to condemn fun as something that has no place in knowledge--far from it. Fun is central to how we vet knowledge--just think of how important it is that research be "interesting"! It is our highest (and also most common) praise.*** Indeed, play lies at the heart of our most cherished models of intellectual inquiry--a nonutilitarian curiosity to "see what happens." As I quoted Dorothea Salo at the beginning of this post: "THAT, friends, is how one learns."
So condemning fun is not at all on my agenda. Rather, I want to draw attention to the emotional content of the way we talk about knowledge, and to the ambivalence that intellectual "fun" signifies. Ours is an age of "news junkies" (again with the pleasure bordering on unpleasurable compulsion, à la the "addictive" ngrams) and "armchair policy wonks" and people who read voraciously, but only in the proverbial dubiously defined "nonfiction" category. Nate Silver and the Freakonomics dudes are minor celebrities. Lies, damned lies, and statistics are our idea of fun, as powerfully as a Victorian melodrama was ever considered fun. Which means we need to think much more about how fun operates, and why, and what that means for knowledge. And just as crucially: what knowledge means for pleasure.
*In fairness, Ben Schmidt argues that GBN's OCR is pretty accurate, given the state of the field, and also that "No one is in a position to be holier-than-thou about metadata. We all live in a sub-development of glass houses." But there's a big difference between "this is really good, for OCR" and "this degree of accuracy is good enough for supplying evidence for X kinds of claims."
**Taken out of context, Wilde appears here to be describing sentimentalism through an economic metaphor. In fact, it's rather the reverse, or at the very least something more confused than that: most of the surrounding text is taken up with Wilde chastising Douglas for his financial mooching.
***As Sianne Ngai points out, the "interesting," like the language of play, has a hedging quality, bridging epistemological and aesthetic domains.
Benjamin, Walter. "The Storyteller: Observations on the Works of Nikolai Leskov." Trans. Harry Zohn. Selected Writings: Volume 3, 1935-1938. Ed. Howard Eiland and Michael W. Jennings. Cambridge, Mass.: Belknap-Harvard UP, 2002. Print.
Brown, Bill. The Material Unconscious: American Amusement, Stephen Crane, and the Economies of Play. Cambridge, Mass.: Harvard UP, 1996. Print.
Ngai, Sianne. "Merely Interesting." Critical Inquiry 34.4 (Summer 2008): 777-817. Print.
Wilde, Oscar. "To Alfred Douglas." Jan.-Mar. 1897. The Complete Letters of Oscar Wilde. Eds. Merlin Holland and Rupert Hart-Davis. New York: Henry Holt, 2000. Print.
Previously on text-mining:
Google Books Ngrams and the number of words for "snow"
Dec. 16, 2010
Dec. 14, 2010
Google's automatic writing and the gendering of birds
Friday, December 17, 2010
Google Books Ngrams and the number of words for "snow"
As I mentioned yesterday, Google has put out a big data set (downloadable) and a handy interface for tracking the incidence of words and phrases. As many have pointed out, one can do a lot more with the raw data set than with the handy, handy online tool, but it's the latter that the New York Times called
a diversion that can quickly become as addictive as the habit-forming game Angry Birds.
(I've never heard of Angry Birds, but that's the kind of thing I'm likely to be out of the loop on, so okay.)
I said yesterday that Google Books Ngrams was a lot more sophisticated than Googlefight, and it is. But I'm troubled by the model of cheap history that's presented in the NYT article--as if to suggest that if you want to do cultural studies now, all you need to do is Google (Books Ngram) it:
With a click you can see that “women,” in comparison with “men,” is rarely mentioned until the early 1970s, when feminism gained a foothold. The lines eventually cross paths about 1986.
You can also learn that Mickey Mouse and Marilyn Monroe don’t get nearly as much attention in print as Jimmy Carter; compare the many more references in English than in Chinese to “Tiananmen Square” after 1989; or follow the ascent of “grilling” from the late 1990s until it outpaced “roasting” and “frying” in 2004.
“The goal is to give an 8-year-old the ability to browse cultural trends throughout history, as recorded in books,” said Erez Lieberman Aiden, a junior fellow at the Society of Fellows at Harvard.
I will concede that newspaper articles are necessarily glib, but it's easy to see how the fallacy that this article promotes would be broadly accepted. The first quoted paragraph above correlates the incidence of words with known historical events; the second moves on to suggest the ngrams' predictive capacity. There's a narrative implicit in each statement of "just the facts," only the assumptions that go into them are effaced.
Let's look at the first of these reports: "With a click you can see that “women,” in comparison with “men,” is rarely mentioned until the early 1970s, when feminism gained a foothold."
The implicit narrative is that nobody even bothered to talk about women until second-wave feminism came along. In fact, if you go by the incidence of the words "men" and "women" in the Google Books Ngrams data set, sure, you might be tempted to really believe that the 1970s was the time "when feminism gained a foothold." I can imagine the suffragists who fought for and won the franchise that I as a woman can enjoy annually asking, "what are we, chopped liver?"
What distinguishes the feminist movements of the 1970s, for the purposes of this data set, is their renewed attention to language. The suffragists wanted a policy change: they wanted the vote (and the freedoms that the vote could give them). The second-wave feminists wanted policy changes too (still working on that wage gap, people!) but they also wanted a deeper change: they wanted to change the way we thought about women and--here's the kicker--spoke about women. The 1970s is when it became broadly recognized as problematic to treat "man" as a synonym for "person," and I suspect that a significant percentage of the uses of "men" were and remain the "universal" usage. That's a nuance that the online Ngrams tool can't give you ("with a click").
Likewise, if you got your understanding of history through Google Books Ngrams, you wouldn't expect to hear this from 1929:
Have you any notion of how many books are written about women in the course of one year? Have you any notion how many are written by men? Are you aware that you are, perhaps, the most discussed animal in the universe? Here had I come with a notebook and a pencil proposing to spend a morning reading, supposing that at the end of the morning I should have transferred the truth to my notebook. But I should need to be a herd of elephants, I thought, and a wilderness of spiders, desperately referring to the animals that are reputed longest lived and most multitudinously eyed, to cope with all this. I should need claws of steel and beak of brass even to penetrate the husk. How shall I ever find the grains of truth embedded in all this mass of paper, I asked myself, and in despair began running my eye up and down the long list of titles. Even the names of the books gave me food for thought. Sex and its nature might well attract doctors and biologists; but what was surprising and difficult of explanation was the fact that sex--woman, that is to say--also attracts agreeable essayists, light-fingered novelists, young men who have taken the M.A. degree; men who have taken no degree; men who have no apparent qualification save that they are not women. (27)
That's Virginia Woolf, of course, giving a fictionalized, subjective encounter with the British Library. Yes, it's a bit longer than a sentence, and you have to read it; you can't just click! But it gives you much more women's history than does the Google Books Ngrams example cited by the NYT.
Google Books Ngrams is a fun tool (as everyone keeps pointing out) and, if you download the data set, even a useful one. But it can only get you so far, and uncontextualized, it encourages assumptions that it does not announce. I mention the number of words for "snow" in my title above because it's a famous fallacy--the notion that Inuit has [insert high number here] words for snow, always with the implicit suggestion that having a lot of words for something means that something is extremely important to the culture. Language Log uses this as their go-to example of stupid assertions about language widely believed by the public; it's a cheap Whorfism, claiming broad cultural significance for something incidental. We have a widely accepted term for a magical being that flies by night and runs a clandestine cash-for-baby-teeth operation. That doesn't make it central to American culture. ("Mom, is the Tooth Fairy real?" "Yes! Check Google Books Ngrams if you don't believe me!")
There's a certain Words For Snowism in the online Google Books Ngrams tool, the suggestion that the more frequently a word is used, the more important it is in a collective unconscious of which the Google Books data set serves as a convenient index. This importance is not the same thing as significance, in the sense of significant digits or statistical significance; it's not the difference that makes a difference, but rather a psychologized importance--attachment, cathexis. Which is really kind of garbage.
The web interface is, as my friend Will says, a toy. For the serious scholar, there's much more to be done with ngrams, and one can be careful as well as lazy with the conclusions one draws. But the toy has a "boom! proven with statistics!" quality, a reality-effect that's enormously pleasurable, even, as Patricia Cohen writes for the NYT, "addictive." (That's the point of toys, isn't it?) That's why I'm inclined to agree with Jen Howard, who writes that her "skepticism is mostly directed at how people will use it and what kinds of conclusions they will jump to on dubious evidence." That sort of jumping is practically built into the ngrams tool.
Woolf, Virginia. A Room of One's Own. Annot. and introd. Susan Gubar. 1929; Orlando: Harcourt, 2005. Print.
Previously on text-mining:
Dec. 16, 2010
Dec. 14, 2010
Google's automatic writing and the gendering of birds
Woolf, Virginia. A Room of One's Own. Annot. and introd. Susan Gubar. 1929; Orlando: Harcourt, 2005. Print.
Previously on text-mining:
Dec. 16, 2010
Dec. 14, 2010
Google's automatic writing and the gendering of birds
Thursday, December 16, 2010
As a follow-up to the last few posts, I see that Google has just released the Books Ngram Viewer, which is a lot more sophisticated than a Googlefight! Questions remain about what's in the corpus, and given Google Books's well-known problems with metadata, I also wonder about the dates. Still, it's a nice thing to have around.
Labels:
new media/old media,
text-mining
Tuesday, December 14, 2010
Given that I'm a text-mining skeptic, it is only fitting that I hope to pursue a small project with Aditi Muralidharan next semester.
For any humanistic question really worth asking, text-mining can never provide an adequate answer. But it can provide supplementary evidence, or provide part of an answer, or point one toward a way of reading. Aditi is working on an interface that will reduce the up-front cost (in time, etc.) of doing text-mining, which, to my mind, will make it more natural for scholars to use quantitative evidence without feeling pressure (due to massive time-investment) to make it the centerpiece of the argument.
I'm still persuaded that text-mining, or any operation that wrings data out of discourse, is an incitement to automatic writing, a way of forcing the body of the text to reveal an unconscious that it didn't know to keep secret. There's something unsporting about it--and something naïvely idealistic, too. The hidden is accorded special powers, its occultism its epistemic guarantee. (After all, what would be the point of using a computer to do something that could easily be done by hand? The whole point is that we're not experiencing it, not actually reading it.) Despite its apparent superficiality, text-mining is a hypnosis to close reading's talking cure.
If we must make our texts hysterical, then, what better questions to ask them than questions about gender circa 1900?
That's what we're going to do.
Labels:
new media/old media,
psychoanalysis,
text-mining
Wednesday, December 8, 2010
I think it needs to be said that Sady Doyle is a Writer To Watch (TM). I've been reading her since her blogspot days; she was hilarious then, and she's only gotten better--her voice more mature and confident, less prone to taking refuge in irony--though her irony has always been deftly wielded, too. Sady Doyle's pop culture and lit criticism is smart and infused with a healthy dose of nerd. For every close reading of a Kelly Clarkson song, she has a review of an epistolary novel by Chris Kraus; better yet, she has equally smart and engaging things to say about both. She is a Tolkien to so many bloggers' Lewis (or Rowling). You can tell she's got appendices squirreled away somewhere, and a hand-drawn map. She has English major chops, in the best sense. Her recent series at The Awl is great. Conclusion: if you aren't reading Sady Doyle, you should be.
Labels:
feminism,
new media/old media,
Sady Doyle,
writing
Sunday, December 5, 2010
I was just made aware that there is a movie coming out called How Do You Know (trailer). I would dearly love for this to be an adaptation of "Melanctha," but the odds aren't looking good.
I did notice that there's no question mark in the movie title, though, which is very Stein.
Labels:
film,
Gertrude Stein,
pop culture
Friday, December 3, 2010
Google's automatic writing and the gendering of birds
The almost meaningless faux-text-mining of a Google search on "birdlike woman" and "birdlike man" turns up the following results:
Vanilla Google:
| | "woman" | "man" | ratio "woman"/"man" |
|---|---|---|---|
| "birdlike" | 16,100 | 2,990 | 5.38 |
| "bird-like" | 74,400 | 272,000 | 0.27 |
Google Books:
| | "woman" | "man" | ratio "woman"/"man" |
|---|---|---|---|
| "birdlike" | 1,520 | 606 | 2.5 |
| "bird-like" | 633 | 490 | 1.29 |
This probably tells us more about Google than about the correlation of gender and the term "birdlike." The hyphen makes a big difference in the search. This particular search also doesn't catch instances like "her movements were quick and birdlike."
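The ratio column is plain division, and it can be checked in a few lines. Here is a minimal Python sketch, assuming the hit counts are simply typed in by hand from the tables above; the names `hits`, `ratio`, and `mentions_birdlike` are my own illustrations and not part of any Google API. The second half shows why a quoted-phrase search misses the sentence just mentioned, while a looser pattern that tolerates the optional hyphen catches it.

```python
import re

# Hit counts copied by hand from the tables above.
hits = {
    "Vanilla Google": {
        "birdlike": {"woman": 16_100, "man": 2_990},
        "bird-like": {"woman": 74_400, "man": 272_000},
    },
    "Google Books": {
        "birdlike": {"woman": 1_520, "man": 606},
        "bird-like": {"woman": 633, "man": 490},
    },
}


def ratio(counts):
    """Return hits for "woman" divided by hits for "man"."""
    return counts["woman"] / counts["man"]


# One rounded ratio per (source, spelling) pair.
ratios = {
    (source, term): round(ratio(counts), 2)
    for source, terms in hits.items()
    for term, counts in terms.items()
}

for (source, term), value in ratios.items():
    print(f"{source:15} {term:10} {value}")


def mentions_birdlike(sentence):
    """True if the sentence uses "birdlike" or "bird-like" anywhere."""
    return re.search(r"\bbird-?like\b", sentence, re.IGNORECASE) is not None


sentence = "Her movements were quick and birdlike."

# What a quoted-phrase search effectively asks: is the exact phrase present?
exact_phrase_found = "birdlike woman" in sentence.lower()

# What a looser pattern asks: is the word there at all, hyphen or not?
loose_match_found = mentions_birdlike(sentence)

print(exact_phrase_found, loose_match_found)
```

Run as written, this gives 2.51 rather than 2.5 for the Google Books "birdlike" row, which is the only place the rounding differs from the table. Of course, a pattern that catches "birdlike" anywhere still can't say whose movements are being described; that part requires reading.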
I often think it would be interesting to do some small bit of real text-mining, just to have a global look at a corpus, but it's always incidental to the argument, so I never follow up.
The appeal of text-mining, which I think is actually magnified in the Google search, is that it's a kind of automatic writing, in which the body of the text (corpus) is made to give up its latent spirit. That the Google algorithm is unknown except insofar as it is known to maximize ad revenue does not diminish this appeal, the temptation to present Google hits as data. Since so much of our daily information is filtered through the Google algorithm anyway, it serves as a sort of corporate unconscious, whose essence is perhaps more compelling than truth.
The appeal of the Google search in lieu of text-mining is formalized in toys like Googlefight, which simply runs two Google searches at once and visualizes the results:
[Image: Googlefight bar graph.] (Source.)
The bar graph calls on a visual form designed to represent meaningful data; although of course such forms are routinely abused (I particularly enjoy April Winchell's pie charts), the form still invites one to seriously compare the numbers. Yet the tongue-in-cheek cheesy stick-figure animation acknowledges the unseriousness of the Google fight. A Google fight is only good for settling a certain kind of argument, the confrontational flame-war variety that isn't particularly invested in actually solving a problem, not a debate but a "FIGHT." (I tried to get a screen shot of the "FIGHT" title, but I'm just not that quick on the draw, apparently.)
Yet for all that, toys like Google Fight are amusing (try Foucault versus Habermas!) and a little beguiling. I don't have time to prepare a corpus and an algorithm, but I do have three seconds to do a Google search, or make a Wordle.
[Image: word cloud for Beatrice Forbes-Robertson Hale's The Nest-Builder (1916).]
Such tools get you somewhere; they just don't get you far. It's interesting ("merely" interesting?) that the above word cloud says nothing about birds or nests, and that some of the most prominent words are "know" and "time." But of course not all words are weighted equally in a novel, and it matters that the chapters are titled "Mate-Song," "Mated," "The Nestling," "Wings," etc.--that indeed the whole marriage plot is structured around a bird allegory that disappears in the word cloud. And this may be another reason it's so appealing to let a simple Google search stand in for data, even when its unreliability is universally acknowledged. It gets you somewhere but it doesn't get you far, and in the end this is true of most text-mining, too. In the end we're fascinated by automatic writing, the possibility of forcing the body to secrete a hidden spirit, but we're also agnostic about spirit tout court. A highly sophisticated search with a known margin of error probes an ontological terrain that's suspiciously similar to the corporate unconscious, which we're tempted to say is all phony advertising anyway--or it isn't--one or the other.
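What a word cloud does under the hood is, roughly, count words, drop the commonest ones, and size the rest by frequency. Here is a minimal Python sketch of that counting step, assuming a tiny hand-made stop list and an invented sample passage; neither is Wordle's actual stop list nor text from the novel, and the names `STOPWORDS` and `word_counts` are hypothetical.

```python
import re
from collections import Counter

# A tiny, hypothetical stop list; real word-cloud tools use much longer ones.
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "it", "was",
    "she", "he", "her", "his", "that", "not", "did",
}


def word_counts(text):
    """Count every word in the text that is not on the stop list."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOPWORDS)


# An invented stand-in for a chapter: a one-word title, then some prose.
sample = (
    "WINGS. She did not know the time. He did not know it. "
    "In time she would know."
)

counts = word_counts(sample)
print(counts.most_common(2))
print(counts["wings"])
```

In the sample, "Wings" appears once, as a chapter title, so it counts for exactly as much as any other word that appears once, while "know" and "time" float to the top. Structural weight of that kind is just what a frequency count has no way of registering.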
Thursday, December 2, 2010
Works Cited, remember the ladies edition
Look, I have a blog that's called Works Cited, so I'm going to have to just point out something that's been driving me nuts.
Rei Terada recently posted an insightful reflection on the aims and meaning of WikiLeaks, drawing on some easily available but little-read essays by Julian Assange. There are still zero comments on that blog post. My colleague Aaron Bady (or "Adrian," as Clay Shirky recently accidentally dubbed him in a Twitter shout-out) then wrote a post on WikiLeaks that I feel confident in saying would have been impossible without Terada's post. (Aaron explicitly links her post.)
Aaron's post has received 326 comments and counting and links from the likes of Jon Dresner and Clay Shirky. Granted, a large percentage of those comments come from internet armchair-policy-wonk blowhards, but that comes with the territory. While it makes sense that Aaron's post would initially attract more comments than Terada's--he has a wider readership and posts more regularly than she does--the stark disparity between the attention the two posts are getting strikes me as almost unbelievable. Three hundred and twenty-six versus zero. Without diminishing Aaron's post--it is smart, and bonus points for gratuitous Teddy Roosevelt--Terada's already contains the core insight that makes Aaron's post so interesting, i.e. that Wikileaks is less about revealing secrets for the sake of the specific information involved than about disrupting what Assange calls, with a capaciousness that powerfully reorients the way we understand legitimacy, "conspiracy."
Given these facts, it's hard not to see this as another episode of When A Woman Says It, Crickets; When A Man Says It (Later), Genius!!! There are certainly circumstantial reasons that Aaron's post would get more attention than Terada's, but there's no legitimate reason that her post would be ignored. Every professional woman has had this happen to her, and every time it happens, the ghost of Simone de Beauvoir weeps. Now, I'm sure Terada herself isn't even remotely fussed about this. What would she want with three hundred blowhards commenting on her blog? But whether or not any individual commenter or linker is thinking, "whose substantive post should I read and link, that of a dude or that of a lady? Oh who are we kidding definitely a dude!", the effect is the same: dude gets a signal boost and is credited with genius, lady disappears from the political conversation. Citation is partly about credit, and there's some credit due here.
I'm not holding my breath, though.
-----
UPDATE. I've seen the following objections raised to the above post, and while I probably oughtn't address this gender studies 101 stuff, well, I will -- briefly.
The objections:
1. There are factors besides gender that explain the popularity of Aaron's blog post.
Response: Yes, of course there are. My point (as I explicitly state above) is not that people who read or link to Aaron are making an active choice to ignore one of his sources on the grounds of gender, but rather that this pattern (as Meg brilliantly abbreviates it, WAWSIC;WAMSILG) is pervasive, and that this is objectively speaking an instance thereof. Voilà les crickets. Voilà the attention. This post is first of all an attempt to draw more attention to the earlier and very worthwhile post by Rei Terada. It is, second, a remark on the attention disparity, and on the broader pattern of failure to attend to what women say that is necessarily its context. This is a gender issue irrespective of whether any individual person is consciously or unconsciously deciding that women aren't worth listening to.
2. One quite often doesn't know the gender of a blogger, and 3. one of the first and most influential bloggers to link Aaron was Digby, a woman (2, 3).
Response: Since my argument was never "people are ignoring Rei Terada purely because she is a woman," these objections aren't quite on point. But it's worth observing that, since femininity is marked and masculinity is unmarked in present culture, the blogger of unknown gender, in the absence of stereotypical feminine markers, is usually (consciously or unconsciously) presumed masculine. Digby is actually the classic example of this. Digby rose to prominence as a left blogger under her pseudonym, and was for years almost universally presumed male, until she "came out" as female by accepting an award in person in 2007. She's understood as female now, but her reputation was made when she was "ungendered," which was received by default as male. Naturally you are enlightened and truly without gender bias in all that you do. But generally speaking the world isn't.
Labels:
Aaron Bady,
feminism,
new media/old media,
Rei Terada,
Wikileaks