Internet Archive
Web & Data Services

Access Services

There are two ways to make a domain crawl available to users: deploying and supporting an access system internally (usually Wayback or a variant), or using a hosted instance that is supported and maintained by IA and can be designed to match a partner’s website. The first method depends on the custodial institution and its resources and capabilities. IA has provided the second for domain-scale crawling partners, and examples are provided below.

Wayback Portal

Example of a Wayback Portal Designed for the German National Library.

Some examples:

Search functionalities

Site search: Includes both URL search and keyword search. Keywords are derived from the anchor text of all webpages linking to a host. Site search is currently available in the new Wayback Machine at https://web.archive.org.
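The idea of deriving a host's keywords from inbound anchor text can be illustrated with a minimal Python sketch. It is not IA's implementation: the input format (pairs of target URL and anchor text) and the function name `host_keywords` are assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def host_keywords(links):
    """Aggregate anchor-text words per target host.

    `links` is a hypothetical input format: an iterable of
    (target_url, anchor_text) pairs extracted from crawled pages.
    Returns a dict mapping each host to a Counter of words.
    """
    keywords = defaultdict(Counter)
    for target_url, anchor_text in links:
        host = urlsplit(target_url).hostname
        if not host:
            # Skip links whose target has no recognizable host.
            continue
        # Lowercase the anchor text and keep alphanumeric runs as words.
        words = re.findall(r"[a-z0-9]+", anchor_text.lower())
        keywords[host].update(words)
    return keywords


links = [
    ("http://www.example.org/", "Example National Library"),
    ("http://www.example.org/catalogue", "library catalogue"),
    ("not a url", "ignored"),
]
index = host_keywords(links)
print(index["www.example.org"].most_common(1))
```

Words that many linking pages use for the same host accumulate higher counts, so they can serve as that host's search keywords even when the host's own pages were never full-text indexed.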

Media search: Media search takes an archived web media resource (such as an image) and “tokenizes” its URL by splitting the filename into individual words, which then become the text for a search index. An example of URL tokenization search can be seen in GifCities, where the search engine is powered by the words in the .gif filenames. Tokenization makes it possible to search resources that may themselves contain no text.
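As a rough illustration of filename tokenization, the following Python sketch splits a media URL's filename into words and builds a small inverted index from them. The splitting rules (separators, camelCase boundaries, digit runs) and the names `tokenize_media_url` and `build_index` are assumptions for this example, not a description of how GifCities is actually implemented.

```python
import re
from collections import defaultdict
from urllib.parse import unquote, urlsplit


def tokenize_media_url(url):
    """Split the filename of a media URL into lowercase word tokens."""
    path = urlsplit(url).path
    filename = unquote(path.rsplit("/", 1)[-1])
    # Drop the file extension, if there is one.
    stem = filename.rsplit(".", 1)[0] if "." in filename else filename
    # Insert a space at lowercase-to-uppercase (camelCase) boundaries.
    stem = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", stem)
    # Keep runs of letters and runs of digits as separate tokens.
    return [token.lower() for token in re.findall(r"[A-Za-z]+|\d+", stem)]


def build_index(urls):
    """Map each token to the set of URLs whose filename contains it."""
    index = defaultdict(set)
    for url in urls:
        for token in tokenize_media_url(url):
            index[token].add(url)
    return index


urls = [
    "http://example.com/pics/dancing_baby-01.gif",
    "http://example.com/img/SpinningGlobe.GIF",
]
index = build_index(urls)
print(sorted(index["baby"]))
```

A query for "baby" then returns the first image even though the image itself contains no text; only the words in its filename were indexed.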

All search indexing at the Internet Archive is done using ElasticSearch, a widely used open-source search tool. ElasticSearch is used across the Internet Archive for both web and non-web search and includes a monitored and maintained search cluster for high performance and easy addition of multiple indices.