Research Services

The goals of Research Services at the Internet Archive are to:

Enable new data-driven forms of research, analysis, and digital humanities scholarship that further demonstrate the value of web archives
Increase use of archived web collections by expanding how these collections can be accessed and queried by users, researchers, and scholars
Give institutions of any size access to collection-derived datasets whose creation requires complex processing and substantial computing infrastructure
Provide the data and access necessary to support new tools, interfaces, visualizations, and other R&D for collecting, managing, and using web archives

ARS is an ongoing program working to expand the ways that users can access and study web archives. Feedback and input from the community are welcome. Email us at aitresearchservices@archive.org.

Available Datasets

Below are links to technical documentation describing how specific datasets are generated, their format and structure, their relative size, and how they relate to, and are affected by, the computational processes involved in web archiving. Each dataset also has a corresponding page that gives example use cases, outlines some of the types of analysis possible, and shows sample data visualizations created using the dataset.

Types of Datasets Currently Available

WAT: Web Archive Transformation files feature key metadata elements that represent every crawled resource in a collection and are derived from a collection’s WARC files.
- WAT Overview and Technical Details & WAT Example Use Cases
LGA: Longitudinal Graph Analysis files feature a complete list of which URIs link to which URIs across an entire collection, each with a timestamp.
- LGA Overview and Technical Details & LGA Example Use Cases
WANE: Web Archive Named Entities files use named-entity recognition tools to generate a list of all the people, places, and organizations mentioned in each URI in a collection, along with a timestamp of the URI's capture.
- WANE Overview and Technical Details & WANE Example Use Cases
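
As a rough illustration of the kind of analysis a link dataset such as LGA makes possible, the minimal Python sketch below counts inbound links per URI and links per capture year. The record layout (a tuple of timestamp, source URI, and target URI), the 14-digit timestamp form, the sample URIs, and the function names inbound_counts and links_by_year are all assumptions made for this example. They are not the actual LGA file format, which is described in the LGA technical documentation.

```python
from collections import Counter

# Hypothetical, simplified link records: (capture timestamp, source URI, target URI).
# This in-memory layout is an assumption for illustration only; the real LGA
# files may be structured differently.
links = [
    ("20130401120000", "http://example.org/", "http://example.org/about"),
    ("20130401120000", "http://example.org/", "http://example.com/"),
    ("20140215083000", "http://example.net/", "http://example.com/"),
]


def inbound_counts(records):
    """Count how many link records point at each target URI."""
    return Counter(target for _timestamp, _source, target in records)


def links_by_year(records):
    """Count link records per capture year (first four timestamp characters)."""
    return Counter(timestamp[:4] for timestamp, _source, _target in records)


if __name__ == "__main__":
    print(inbound_counts(links).most_common(1))
    print(links_by_year(links))
```

The same counting approach could in principle be applied to WANE-style records by tallying entity names instead of target URIs, again assuming the records have first been loaded into a comparable in-memory form.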

Coming soon:

Talks, Workshops, Tutorials, Presentations, Papers + Partner Projects