Published March 9, 2006 | Version v1 | https://doi.org/10.63485/1vw3a-c0k88

How well do search engines index the OA repositories?

Creators

Frank McCown and three co-authors, Search Engine Coverage of the OAI-PMH Corpus, IEEE Internet Computing, March/April 2006.

Abstract: The major search engines are competing to index as much of the Web as possible. Having indexed much of the surface Web, search engines are now using a variety of approaches to index the deep Web. At the same time, institutional repositories and digital libraries are adopting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their holdings, some of which are indexed by search engines and some of which are not. To determine how much of the current OAI-PMH corpus search engines index, we harvested nearly 10M records from 776 OAI-PMH repositories. From these records we extracted 3.3M unique resource identifiers and then conducted searches on samples from this collection. Of this OAI-PMH corpus, Yahoo indexed 65%, followed by Google (44%) and MSN (7%). Twenty-one percent of the resources were not indexed by any of the three search engines.
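The extraction and sampling steps mentioned in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' actual procedure: it assumes records are exposed as unqualified Dublin Core (oai_dc), that a resource identifier is any dc:identifier value beginning with http:// or https://, and that samples are drawn uniformly at random. The function names, the sample response, and the example.org URLs are hypothetical.

```python
import random
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set used inside oai_dc records.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def extract_resource_urls(list_records_xml):
    """Collect unique http(s) dc:identifier values from one ListRecords response.

    Assumption: only URL-shaped identifiers count as resources; ISBNs, local
    call numbers and the like are skipped.
    """
    root = ET.fromstring(list_records_xml)
    urls = set()
    for elem in root.iter(DC_NS + "identifier"):
        text = (elem.text or "").strip()
        if text.lower().startswith(("http://", "https://")):
            urls.add(text)
    return urls


def sample_urls(urls, k, seed=0):
    """Draw up to k URLs uniformly at random (seeded so runs are repeatable)."""
    ordered = sorted(urls)  # fixed order so the seed fully determines the sample
    rng = random.Random(seed)
    return rng.sample(ordered, min(k, len(ordered)))


# Hypothetical two-record response; a real harvest would page through many
# responses per repository and merge the resulting sets.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/paper1.pdf</dc:identifier>
          <dc:identifier>ISBN 0-000-00000-0</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/paper1.pdf</dc:identifier>
          <dc:identifier>http://example.org/paper2.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    unique_urls = extract_resource_urls(SAMPLE_RESPONSE)
    print(len(unique_urls), "unique resource URLs")
    print(sample_urls(unique_urls, 1))
```

Each sampled URL would then be submitted as a query to each search engine, and the fraction of hits per engine gives a coverage estimate of the kind quoted in the abstract.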

Additional details

Identifiers

UUID
a413450f-726f-4e40-8509-57dc72311ca0
GUID
tag:blogger.com,1999:blog-3536726.post-114187436182976780
URL
https://legacy.earlham.edu/~peters/fos/2006/03/how-well-do-search-engines-index-oa.html

Dates

Updated
2006-03-09T03:19:21Z
Issued
2006-03-09T03:15:00Z