Web Harvest of the 111th Congress (2010) FAQ

What is the "111th Congress (2010) Web Harvest"?

The 2010 Congressional Web Harvest for the 111th Congress is a National Archives and Records Administration (NARA) project that produced a collection of congressional web sites copied, or harvested, from the World Wide Web between December 1, 2010, and January 12, 2011.

How accurate is the harvest?

The accuracy of each harvest was affected by these factors:

NARA has made every reasonable effort to ensure that the web sites' code and programming were captured accurately. NARA is not responsible for any web site's compliance with Federal laws, regulations, and requirements. NARA is responsible for providing public access to these copied web sites but is not responsible for maintaining code such as links, accessibility features, search or site maps, or other functionality that may have been available on the sites before they were copied.

What does "harvested" mean?

Web harvesting is the process of automatically copying and organizing unstructured information from pages and data on the World Wide Web. It is also known as web mining, web scraping, and web crawling. Web sites are identified with a "seed list" of URLs, which are "harvested" so that content within, or linked to, an identified site is captured and copied.
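
As a rough illustration of how a seed list drives a harvest, the following Python sketch crawls a small in-memory stand-in for the web. It is not the harvest engine used for this project. The `FAKE_WEB` dictionary, the `.example` URLs, and the `harvest` function are hypothetical, and the scope rule (off-site pages are copied, but their links are not followed) is an assumption made for the example.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical stand-in for the live web: each URL maps to the links on that page.
FAKE_WEB = {
    "http://member.example/": [
        "http://member.example/about",
        "http://agency.example/report",
    ],
    "http://member.example/about": ["http://member.example/"],
    "http://agency.example/report": ["http://agency.example/more"],
    "http://agency.example/more": [],
}


def harvest(seed_list, web):
    """Breadth-first copy of pages within, or linked from, a seed site."""
    seed_hosts = {urlparse(url).netloc for url in seed_list}
    queue = deque(seed_list)
    seen = set(seed_list)
    captured = []  # URLs in the order they were copied

    while queue:
        url = queue.popleft()
        if url not in web:  # page does not exist in this toy model
            continue
        captured.append(url)
        # Assumption: only pages on a seed host have their links followed,
        # so off-site pages are copied but the crawl stops there.
        if urlparse(url).netloc not in seed_hosts:
            continue
        for link in web[url]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return captured


captured = harvest(["http://member.example/"], FAKE_WEB)
```

In this toy model the seed site's two pages and the one off-site page it links to are captured, while the page that is two steps away from the seed site is left out.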

Who conducted the harvest?

NARA contracted CACI-ISS to manage the project, and the Internet Archive (IA), a San Francisco nonprofit, performed the harvest.

How large is the collection?

The harvest collection contains approximately 1.3 TB of information and roughly 14,592,000 downloaded files that were active between December 1, 2010, and January 12, 2011.

Why doesn't form input or streaming video work in the collection?

A harvest engine cannot read or use forms, streaming video, or complex JavaScript. As a result, forms and databases are not active in the harvest, and files that could only be streamed from a web site were not harvested.
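
The limitation can be sketched with a simple static link extractor in Python. This is an illustrative assumption about how a basic crawler discovers content, not a description of the actual harvest engine. The sample `PAGE` markup and the `LinkCollector` class are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page: one ordinary link, one search form, and one video URL
# that exists only as a string inside a script.
PAGE = """
<a href="/press/release1.html">Press release</a>
<form action="/search" method="post"><input name="q"></form>
<video id="player"></video>
<script>document.getElementById("player").src = "/videos/stream?id=7";</script>
"""


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags found in the static markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


collector = LinkCollector()
collector.feed(PAGE)
links = collector.links
```

Only the ordinary link ends up in `links`. The form would need someone to type a query and submit it, and the video address appears only after the script runs in a browser, so a crawler that reads static markup never reaches either one.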

Can I search the archive?

Yes, by:

Why isn't the site I'm looking for in the archive?

Sites were not harvested because: