Wednesday, December 22, 2010

Google Words

Last week The Chronicle reported on a paper published in Science, “Quantitative Analysis of Culture Using Millions of Digitized Books,” which analyzed some 360 billion English words in the five million books digitized by Google (“about 4% of all books ever printed”). From the abstract: 
Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. "Culturomics" extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
Geoffrey Nunberg writes:
Whatever misgivings scholars may have about the larger enterprise, the data will be a lot of fun to play around with. And for some—especially students, I imagine—it will be a kind of gateway drug that leads to more-serious involvement in quantitative research.
Access to the data is available from,

Speaking of data visualization, there is also Google Fusion Tables.

Wednesday, December 15, 2010

Digital Forensics

CLIR has released a report called Digital Forensics and Born-Digital Content in Cultural Heritage Collections. From the introduction:
The purpose of this report is twofold: first, to introduce the field of digital forensics to professionals in the cultural heritage sector; and second, to explore some particular points of convergence between the interests of those charged with collecting and maintaining born-digital cultural heritage materials and those charged with collecting and maintaining legal evidence. ...
There are deep historical connections between the emergence of archival science and the Roman law of antiquity, founded on concepts such as chain of custody. (The forensics of modern evidentiary standards is etymologically rooted in the forensics of verbal disputation—“forensics” comes from the Latin forensis, “before the forum.”)

Monday, December 13, 2010

The OED on Information

From “The Information Palace” (NYRblog): 
The earliest citation comes from the Rolls of Parliament for 1386: “Thanne were such proclamacions made ... bi suggestion & informacion of suche that wolde nought her falsnesse had be knowen to owre lige Lorde.” For centuries thereafter, informations were filed, or recorded, or laid, against people.
From then to now the word takes a twisty path, and the OED’s lexicographers hold our hand around every corner. Information can be “a teaching; an instruction.” It can be “divine influence or direction; inspiration, esp. through the Holy spirit.” It can be “that of which one is apprised or told; intelligence, news.”
Ever lurking behind the arras is the ancient Latin precursor: the verb informare—to give form to; to shape; to mold. Information is the act of infusion with form. ...

Friday, December 3, 2010

Cartographic Realty Versus Reality

I’ve been researching the role of books in the history of the Pacific Northwest (the first byproduct of that work is here), so I enjoyed reading in the December issue of Fine Books & Collections an article about the search for the Northwest Passage. From “Wishful Thinking”: 
Perhaps the strangest and most imaginative maps showing a Northwest Passage were those based on the apocryphal 1640 voyage of the Spanish admiral Bartholemew de Fonte. In a letter published in a 1708 edition of the British magazine The Monthly Miscellany or Memoirs for the Curious, de Fonte claimed to have sailed up the Pacific coast of the Americas. Somewhere north of Vancouver Island he found a strait that led to a great inland sea where he met a merchant ship from Boston. Although it is now believed that the magazine’s editor wrote the piece, the great English proponent of the passage, Arthur Dobbs, took the article to be genuine and gave it credibility when he included the de Fonte expedition in his An account of the countries adjoining to Hudson’s Bay in the north-west part of America (1744).
The de Fonte ‘discovery’ remained a cartographic realty for more than half a century and entrapped some of the brightest and most prominent mapmakers …

Thursday, December 2, 2010

The Most Important Components of Information Management: Attention and Judgment

From Ann Blair’s “Information Overload, Then and Now” (The Chronicle): 
Complaints about "too many books" echo across the centuries, from when books were papyrus rolls, parchment manuscripts, or hand printed. … 
Early negative responses include Ecclesiastes 12:12 ("Of making books there is no end," probably from the fourth or third century BC) and Seneca's "distringit librorum multitudo" ("the abundance of books is distraction," first century AD). But we also find enthusiasm for accumulation—of papyri at the Library of Alexandria (founded in the early third century BC) or of the 20,000 "facts" that Pliny the Elder accumulated in Historia naturalis (completed in AD 77). Though we no longer care especially about ancient precedent, we hear the same doom and praise today.
Blair discusses the long history of text management practices (reusable wax tablets, florilegia, alphabetical indices, text divisions, commonplaces, bibliographies, compendia, periodicals, book reviews, dictionaries, and encyclopedias) and wonders what is at risk as academic scholarship moves to electronic media. She identifies three areas of concern:

  1. Storing: “computers preserve only what has been upgraded to match their ever-changing specifications. Documents without anyone interested in using them and upgrading them to new platforms may become inaccessible.”
  2. Sorting: “search engines can track the keywords chosen by individual users and writers, but we still need library catalogers and indexers who can identify relevant category terms that do not appear explicitly in the text and who can group related topics under consistent subject headings.”
  3. Selecting and summarizing: “making and using shortcuts skillfully and responsibly requires judgment.”
Blair concludes:
we need to proceed carefully in the transition to electronic media, lest we lose crucial methods of working that rely on and foster thoughtful decision making. Like generations before us, we need all the tools for gathering and assessing information that we can muster—some inherited from the past, others new to the present. Many of our technologies will no doubt rapidly seem obsolete, but, we can hope, not human attention and judgment, which should continue to be the central components of thoughtful information management.