Wednesday, December 22, 2010

Google Words

Last week The Chronicle reported on a paper published in Science, “Quantitative Analysis of Culture Using Millions of Digitized Books,” which analyzed some 360 billion English words in the five million books digitized by Google (“about 4% of all books ever printed”). From the abstract: 
Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. "Culturomics" extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
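The basic measure behind these trend lines is, roughly, how often a word appears in a given year relative to all the words published that year. Below is a minimal sketch of that calculation. The three-year "corpus" and the function name `relative_frequency` are hypothetical illustrations; the real data are distributed as precomputed n-gram counts, not as raw text.

```python
from collections import Counter

# Hypothetical toy corpus: year -> tokens. Stands in for the
# per-year word counts of the real (much larger) dataset.
corpus = {
    1800: "letters travel slowly by ship and horse".split(),
    1900: "the telegraph and the telephone connect the world".split(),
    2000: "the internet connects the world".split(),
}

def relative_frequency(corpus, word):
    """Return {year: occurrences of `word` / total tokens that year}."""
    result = {}
    for year, tokens in sorted(corpus.items()):
        counts = Counter(tokens)
        total = sum(counts.values())
        # Dividing by the year's total keeps years with more books
        # from looking more "interested" in every word.
        result[year] = counts[word] / total if total else 0.0
    return result

for year, freq in relative_frequency(corpus, "telegraph").items():
    print(year, round(freq, 3))
# 1800 0.0
# 1900 0.125
# 2000 0.0
```

Normalizing by each year's total is the design choice that matters: raw counts would rise for almost every word simply because more books were printed over time.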
Geoffrey Nunberg writes:
Whatever misgivings scholars may have about the larger enterprise, the data will be a lot of fun to play around with. And for some—especially students, I imagine—it will be a kind of gateway drug that leads to more-serious involvement in quantitative research.
Access to the data is available from,

Speaking of data visualization, there is also Google Fusion Tables.

Wednesday, December 15, 2010

Digital Forensics

CLIR has released a report called Digital Forensics and Born-Digital Content in Cultural Heritage Collections. From the introduction:
The purpose of this report is twofold: first, to introduce the field of digital forensics to professionals in the cultural heritage sector; and second, to explore some particular points of convergence between the interests of those charged with collecting and maintaining born-digital cultural heritage materials and those charged with collecting and maintaining legal evidence. ...
There are deep historical connections between the emergence of archival science and the Roman law of antiquity, founded on concepts such as chain of custody. (The forensics of modern evidentiary standards is etymologically rooted in the forensics of verbal disputation—“forensics” comes from the Latin forensis, “before the forum.”)

Monday, December 13, 2010

The OED on Information

From “The Information Palace” (NYRblog): 
The earliest citation comes from the Rolls of Parliament for 1386: “Thanne were such proclamacions made ... bi suggestion & informacion of suche that wolde nought her falsnesse had be knowen to owre lige Lorde.” For centuries thereafter, informations were filed, or recorded, or laid, against people.
From then to now the word takes a twisty path, and the OED’s lexicographers hold our hand around every corner. Information can be “a teaching; an instruction.” It can be “divine influence or direction; inspiration, esp. through the Holy spirit.” It can be “that of which one is apprised or told; intelligence, news.”
Ever lurking behind the arras is the ancient Latin precursor: the verb informare—to give form to; to shape; to mold. Information is the act of infusion with form. ...

Friday, December 3, 2010

Cartographic Realty Versus Reality

I’ve been researching the role of books in the history of the Pacific Northwest (the first byproduct of that work is here), so I enjoyed reading an article about the search for the Northwest Passage in the December issue of Fine Books & Collections. From “Wishful Thinking”: 
Perhaps the strangest and most imaginative maps showing a Northwest Passage were those based on the apocryphal 1640 voyage of the Spanish admiral Bartholemew de Fonte. In a letter published in a 1708 edition of the British magazine The Monthly Miscellany or Memoirs for the Curious, de Fonte claimed to have sailed up the Pacific coast of the Americas. Somewhere north of Vancouver Island he found a strait that led to a great inland sea where he met a merchant ship from Boston. Although it is now believed that the magazine’s editor wrote the piece, the great English proponent of the passage, Arthur Dobbs, took the article to be genuine and gave it credibility when he included the de Fonte expedition in his An account of the countries adjoining to Hudson’s Bay in the north-west part of America (1744).
The de Fonte ‘discovery’ remained a cartographic realty for more than half a century and entrapped some of the brightest and most prominent mapmakers …

Thursday, December 2, 2010

The Most Important Components of Information Management: Attention and Judgment

From Ann Blair’s “Information Overload, Then and Now” (The Chronicle): 
Complaints about "too many books" echo across the centuries, from when books were papyrus rolls, parchment manuscripts, or hand printed. … 
Early negative responses include Ecclesiastes 12:12 ("Of making books there is no end," probably from the fourth or third century BC) and Seneca's "distringit librorum multitudo" ("the abundance of books is distraction," first century AD). But we also find enthusiasm for accumulation—of papyri at the Library of Alexandria (founded in the early third century BC) or of the 20,000 "facts" that Pliny the Elder accumulated in Historia naturalis (completed in AD 77). Though we no longer care especially about ancient precedent, we hear the same doom and praise today.
Blair discusses the long history of text management practices (reusable wax tablets, florilegia, alphabetical indices, text divisions, commonplaces, bibliographies, compendia, periodicals, book reviews, dictionaries, and encyclopedias) and wonders what is at risk as academic scholarship moves to electronic media. She identifies three areas of concern:

  1. Storing: “computers preserve only what has been upgraded to match their ever-changing specifications. Documents without anyone interested in using them and upgrading them to new platforms may become inaccessible.”
  2. Sorting: “search engines can track the keywords chosen by individual users and writers, but we still need library catalogers and indexers who can identify relevant category terms that do not appear explicitly in the text and who can group related topics under consistent subject headings.”
  3. Selecting and summarizing: “making and using shortcuts skillfully and responsibly requires judgment.”
Blair concludes:
we need to proceed carefully in the transition to electronic media, lest we lose crucial methods of working that rely on and foster thoughtful decision making. Like generations before us, we need all the tools for gathering and assessing information that we can muster—some inherited from the past, others new to the present. Many of our technologies will no doubt rapidly seem obsolete, but, we can hope, not human attention and judgment, which should continue to be the central components of thoughtful information management.

Tuesday, November 30, 2010

Fishy Records

The authors of “Coding Early Naturalists' Accounts into Long-Term Fish Community Changes in the Adriatic Sea (1800–2000)” (PLoS ONE) highlight the value of historical qualitative sources for fishery science.

Wednesday, November 24, 2010

All This, and Libraries Too

From “On Gratitude in Academe” (The Chronicle):
Libraries and librarians: Our colleagues who are information professionals provide us with the scholarly resources we need for our research and teaching, and they do so with minimal recognition and considerable pressure to adapt to rapidly changing technologies. While the Internet has been a boon to scholarly research, the physical library is—more than stadiums, more than student centers—the heart of the academic enterprise: It's a place for solitary reflection as well as serendipitous encounters in the context of intellectual seriousness. Nothing can replace libraries as places, even if they are no longer primarily based on the circulation of printed materials.
On the historical dimension of our national day of gratitude, see “Peace, Love and Puritanism” (The New York Times):
Are our present-day values and practices aligned with the historical record, or have they been remade by our consumer culture? Is anything authentic in our own celebrations of Thanksgiving?

Friday, November 19, 2010

For Friday

Good hats, jolly fellows, and speechifiers—some great book reviews over at Common-place (11:1.5).

Now I’m off, leaving my computer to itself …

Wednesday, November 17, 2010

Data: The Next Big Idea in the Humanities

An interesting exemplar of a digital humanities project is “Reading: Harvard Views of Readers, Readership, and Reading History,” an exploration of the history of reading through historical materials in Harvard’s libraries.

Saturday, November 13, 2010

Diary Riddle

Last week I picked up for our collection the diary of Northwest historian W. D. Lyman. The first volume opens with this: “The year 1884 dawns chill and dark upon the world.” In the last two entries, from 1920, Lyman records that he has become professor emeritus, that his pension is in place, and that he feels rotten. Lastly, he writes: “This period of my history marks another stage of my life. It may have some fine opportunities.” He died two days later.

The future of his history is now in our archives, waiting for a new kind of life (although not of the sort He-Who-Must-Not-Be-Named got through his fifty-year-old diary). 

Friday, November 12, 2010

The Real Purpose of a Library

From “Problematizing Patron-Driven Acquisitions” (Library Journal):
a library is more than a shopping site built to satisfy immediate patron needs. A well-chosen collection is a cartography of knowledge that helps guide the novice researcher toward books that they would never think to ask for. … Umberto Eco, who argued for library coffee shops decades before they became trendy, said at the opening of a new library in Milan that "the whole idea of a library is based on a misunderstanding: that the reader goes into the library to find a book whose title he knows." Its real purpose, he said, "is to discover books of whose existence the reader has no idea."
There is also the importance of developing collections for patrons yet unborn.