I just got a university-wide email regarding a pilot project that Harvard is starting with Google. It looks like Google will also be joining with other universities in this project, which will begin the work of digitizing, and in the case of public domain works providing public access to, the contents of the Harvard library system. The email included a short summary of the initial pilot and didn’t ask me to keep this confidential so I will reproduce the description of the project below:
Harvard University is embarking on a collaboration with Google that could harness Google’s search technology to provide to both the Harvard community and the larger public a revolutionary new information location tool to find materials available in libraries. In the coming months, Google will collaborate with Harvard’s libraries on a pilot project to digitize a substantial number of the 15 million volumes held in the University’s extensive library system. Google will provide online access to the full text of those works that are in the public domain. In related agreements, Google will launch similar projects with Oxford, Stanford, the University of Michigan, and the New York Public Library. As of 9 am on December 14, an FAQ detailing the Harvard pilot program with Google will be available at http://hul.harvard.edu.
The Harvard pilot will provide the information and experience on which the University can base a decision to launch a large-scale digitization program. Any such decision will reflect the fact that Harvard’s library holdings are among the University’s core assets, that the magnitude of those holdings is unique among university libraries anywhere in the world, and that the stewardship of these holdings is of paramount importance. If the pilot is deemed successful, Harvard will explore a long-term program with Google through which the vast majority of the University’s library books would be digitized and included in Google’s searchable database. Google will bear the direct costs of digitization in the pilot project.
By combining the skills and library collections of Harvard University with the innovative search skills and capacity of Google, a long-term program has the potential to create an important public good. According to Harvard President Lawrence H. Summers, “Harvard has the greatest university library in the world. If this experiment is successful, we have the potential to provide the world’s greatest system for dissemination as well.”
In addition, there would be special benefits to the Harvard community. Plans call for the eventual development of a link allowing Google users at Harvard to connect directly to the online HOLLIS (Harvard Online Library Information System) catalog (http://holliscatalog.harvard.edu) for information on the location and availability at Harvard of works identified through a Google search. This would merge the search capacity of the Internet with the deep research collections at Harvard into one seamless resource, a development especially important for undergraduates who often see the library and the Internet as alternative and perhaps rival sources of information.
Eventually, Harvard users would benefit from far better access to the 5 million books located at the Harvard Depository (HD). If the University undertakes the long-term program, Harvard users would gain online access to the full text of out-of-copyright books stored at HD. For books still in copyright, Harvard users could gain the ability to search for small snippets of text and, possibly, to view tables of contents. In short, the Harvard student or faculty member would gain some of the advantages of browsing that remote storage of books at HD cannot currently provide.
According to Sidney Verba, Carl H. Pforzheimer University Professor and Director of the University Library, “The possibility of a large-scale digitization of Harvard’s library books does not in any way diminish the University’s commitment to the collection and preservation of books as physical objects. The digital copy will not be a substitute for the books themselves. We will continue actively to acquire materials in all formats and we will continue to conserve them. In fact, as part of the pilot we are developing criteria for identifying books that are too fragile for digitizing and for selecting them out of the project.
“It is clear,” Verba continued, “that the new century presents unparalleled challenges and opportunities to Harvard’s libraries. Our pilot program with Google can prove to be a vital and revealing first step in a lengthy and rewarding process that will benefit generations of scholars and others.”
When Harvard or Google make their official announcement, I’ll link to whatever I can find. I personally think this is really big news. I hope that after other major universities join this movement the Library of Congress will follow. Of course, I’m not happy about Google having a monopoly on this sort of thing but I suspect that these universities will not consent to any kind of exclusive licensing and we will see competing services emerge soon. This is a great day for scholarship and, in my opinion, for democratizing access to knowledge.
UPDATES:
21:00 – The first hit for this on Google News is a KOTV article, which was just posted. I’m guessing this will be big news tomorrow. In the article, Harvard comes across as one of the least cooperative of the libraries involved in the project.
22:00 – The New York Times has just posted an article on this. Note that the article is dated December 14th, so it is probably in tomorrow’s print edition. The article also had this to add:
Last night the Library of Congress and a group of international libraries from the United States, Canada, Egypt, China and the Netherlands announced a plan to create a publicly available digital archive of one million books on the Internet. The group said it planned to have 70,000 volumes online by next April.
It looks like Harvard is only allowing some 40,000 volumes to start and is being very protective about its collection. I don’t believe for a second Harvard President Summers’s quote in the NYT article saying that Harvard has always held its library to be a “global resource,” especially when, judging only from these initial articles, it seems to be one of the least enthusiastic participants in this new project. Michigan and Stanford are leading the way by committing millions of books in their collections. I’m really excited about this, and I really hope the movement will spread quickly to Japan’s National Diet Library and countless other libraries around the world. This is truly an exciting time!
Does the digitization mean optical text recognition, or just digital photographs of each page? If it is optical text recognition, how advanced is this technology in different languages?
Good question, but I think there will definitely be text recognition of some of the works since they are to be searchable. This is probably harder for old works or those in bad condition.
I don’t know much about this project yet though, so I look forward to hearing more…
I am wondering how this will interface with Project Gutenberg?