Whole Earth Web Archive - Explore The 50 Smallest Countries On The Web

Andorra (Domain suffix: .ad)

Antarctica (Domain suffix: .aq)

Aruba (Domain suffix: .aw)

Barbados (Domain suffix: .bb)

Benin (Domain suffix: .bj)

Brunei (Domain suffix: .bn)

Burkina Faso (Domain suffix: .bf)

Chad (Domain suffix: .td)

Comoros (Domain suffix: .km)

Cook Islands (Domain suffix: .ck)

Cuba (Domain suffix: .cu)

Curaçao (Domain suffix: .cw)

Eritrea (Domain suffix: .er)

Falkland Islands (Domain suffix: .fk)

French Guiana (Domain suffix: .gf)

French Polynesia (Domain suffix: .pf)

Gambia (Domain suffix: .gm)

Great Britain (Domain suffix: .gb)

Guam (Domain suffix: .gu)

Guinea (Domain suffix: .gn)

Guinea-Bissau (Domain suffix: .gw)

Iraq (Domain suffix: .iq)

Kiribati (Domain suffix: .ki)

Lesotho (Domain suffix: .ls)

Liberia (Domain suffix: .lr)

Malawi (Domain suffix: .mw)

Marshall Islands (Domain suffix: .mh)

Martinique (Domain suffix: .mq)

Mauritania (Domain suffix: .mr)

Nauru (Domain suffix: .nr)

Netherlands Antilles (Domain suffix: .an)

Niger (Domain suffix: .ne)

Norfolk Island (Domain suffix: .nf)

North Korea (Domain suffix: .kp)

Northern Mariana Islands (Domain suffix: .mp)

Papua New Guinea (Domain suffix: .pg)

Pitcairn Islands (Domain suffix: .pn)

Republic of the Congo (Domain suffix: .cg)

Saint Kitts and Nevis (Domain suffix: .kn)

Sierra Leone (Domain suffix: .sl)

Solomon Islands (Domain suffix: .sb)

South Sudan (Domain suffix: .ss)

Swaziland (Domain suffix: .sz)

Timor-Leste (Domain suffix: .tl)

Togo (Domain suffix: .tg)

United States Virgin Islands (Domain suffix: .vi)

Vanuatu (Domain suffix: .vu)

Vatican City (Domain suffix: .va)

Wallis and Futuna (Domain suffix: .wf)

Yemen (Domain suffix: .ye)

Project Summary

The Whole Earth Web Archive (WEWA) is a proof of concept that explores ways to improve access to the archived websites of underrepresented nations around the world. Starting with a sample set of 50 small nations, we extracted their archived web content from the Internet Archive’s total web archive, built special search and access features on top of this subcollection, and created a dedicated discovery portal for searching and browsing. Further work will focus on improving IA’s harvesting of the national webs of these and other underrepresented countries. It will also explore collaborations with libraries and heritage organizations within these countries, and via international organizations, to contribute technical capacity to local experts who can identify websites of value that document the lives and activities of their citizens.

Project Context

Archived materials from the web play an increasingly necessary role in representation, evidence, historical documentation, and accountability. However, the web’s scale is vast, it changes and disappears quickly, and it requires significant infrastructure and expertise to collect and make permanently accessible. Thus, the community of National Libraries and Governments preserving the web remains overwhelmingly represented by well-resourced institutions from Europe and North America. We hope the WEWA project helps provide enhanced access to archived material otherwise hard to find and browse in the massive 25+ petabytes of IA’s web archive. More importantly, we hope the project provokes a broader reflection upon the lack of national diversity in institutions collecting the web and also spurs collective action towards diminishing the overrepresentation of “first world” nations and peoples in the overall global web archive.

Technical Approach

As with prior special projects by the Web Archiving & Data Services team, such as GifCities (a search engine for animated GIFs from the GeoCities web collection) or Military Industrial Powerpoint Complex (ebooks of PowerPoint files from the archive of the .mil (military) web domain), the project builds on our exploratory work to provide improved access to specific, valuable subsets of the overall global web archive behind the Wayback Machine. The preliminary set of countries in WEWA was determined by selecting the 50 “smallest” countries as measured by the number of websites registered on their national web domain, also known as a ccTLD (a somewhat arbitrary measurement, we realize). The underlying search index (in Elasticsearch) is based on internally developed tools for searching both text and media. Indices are built from features such as page titles and descriptive hyperlinks from other pages. Relevance ranking is boosted by criteria such as the number of inbound links and popularity, and includes a temporal dimension to account for the historicity of web archives. Additional technical information on search engineering can be found in "Exploring Web Archives Through Temporal Anchor Texts".
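To make the two ideas in this section concrete, here is a minimal sketch in Python. It is not the WEWA implementation, which runs on Elasticsearch with internal tooling. The names smallest_cctlds, Capture, and score are hypothetical, the example URL is made up, and the weighting formula (term matches in titles and anchor texts, multiplied by a log-damped inbound-link boost and a decay on year distance) is an assumption chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


def smallest_cctlds(site_counts: dict, n: int = 50) -> list:
    """Return the n ccTLDs with the fewest registered sites, smallest first.

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(site_counts, key=lambda tld: (site_counts[tld], tld))[:n]


@dataclass
class Capture:
    """One archived page (hypothetical record layout)."""
    url: str
    title: str
    anchor_texts: list   # descriptive link texts pointing at this page
    inbound_links: int   # number of pages linking to it
    year: int            # year the capture was made


def score(capture: Capture, query: str, target_year: Optional[int] = None) -> float:
    """Toy relevance score: text matches, boosted by links and time proximity."""
    terms = query.lower().split()
    # Index only the title and inbound anchor texts, as the section describes.
    words = " ".join([capture.title, *capture.anchor_texts]).lower().split()
    matches = sum(words.count(term) for term in terms)
    if matches == 0:
        return 0.0
    # Assumed boost: logarithmic, so heavily linked pages do not dominate.
    link_boost = 1.0 + math.log1p(capture.inbound_links)
    # Assumed temporal factor: captures closer to the requested year rank higher.
    temporal = 1.0
    if target_year is not None:
        temporal = 1.0 / (1.0 + abs(capture.year - target_year))
    return matches * link_boost * temporal


if __name__ == "__main__":
    counts = {"va": 10, "ck": 5, "uk": 1_000_000}  # made-up counts
    print(smallest_cctlds(counts, n=2))
    page = Capture("http://example.ck/", "Cook Islands News", ["daily news"], 100, 2005)
    print(score(page, "news", target_year=2005))
```

Tokenization here is a plain whitespace split, so punctuation and stemming are ignored. A real index would delegate both the text analysis and the boosting to the search engine.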

Future Work

We intend both to do more targeted, high-quality archiving of these and other smaller national webs and to undertake active outreach to national and heritage institutions in these nations, and to related international organizations, to ensure this work is guided by broader community input. If you are interested in contributing to this effort or have any questions, feel free to email us at webservices [at] archive [dot] org. Thanks for browsing the WEWA!

Credits