Wednesday, December 22, 2010

Google Words

Last week The Chronicle reported on a paper published in Science, “Quantitative Analysis of Culture Using Millions of Digitized Books,” which analyzed some 360 billion English words in the five million books digitized by Google (“about 4% of all books ever printed”). From the abstract:
Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. "Culturomics" extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
Geoffrey Nunberg writes:
Whatever misgivings scholars may have about the larger enterprise, the data will be a lot of fun to play around with. And for some—especially students, I imagine—it will be a kind of gateway drug that leads to more-serious involvement in quantitative research.
The data is available from Ngrams.GoogleLabs.com.


Speaking of data visualization, there is also Google Fusion Tables.