Published February 17, 2006 | https://doi.org/10.63485/mqww6-aa739

Google's book scans are not of archival quality

Jim Jacobs, Thoughts on Google Book Search, Diglet, February 16, 2006. Excerpt:

Yesterday, I went to the Stanford EE Computer Systems Colloquium to hear Daniel Clancy, the Engineering Director for the Google Book Search Project....Clancy mentioned that Google was NOT going for archival quality (indeed COULD not) in their scans and were ok with skipped pages, missing content and less than perfect OCR -- he mentioned that the OCR process AVERAGED one word error per page of every book scanned! The key point that I took away from this is that the Google book project IS NOT an alternative to library/archive/archival/preservation scans....When I asked if there would be links to libraries on ALL results pages, he hemmed and hawed a bit and wouldn't say one way or the other. He mentioned the difference between the publisher-supplied content and the library-supplied content and seemed to hint that the publisher-supplied content is subject to stricter licensing agreements....92% of the world's books are not generating revenues for copyright holders or publishers!...Someone asked what had surprised him the most since he started. One thing he was surprised about was that about 70% of the book project use was coming from India.

Additional details

Description

Jim Jacobs, Thoughts on Google Book Search, Diglet, February 16, 2006.

Identifiers

UUID
47b83524-21d8-4726-9934-f97618e16cd7
GUID
tag:blogger.com,1999:blog-3536726.post-114018727677765746
URL
https://legacy.earlham.edu/~peters/fos/2006/02/googles-book-scans-are-not-of-archival.html

Dates

Issued
2006-02-17T14:33:00Z
Updated
2006-02-17T14:41:16Z