House Publications as Historical Documents
Recommendation Summary
While it is essential that citizens have timely access to current congressional information, it is equally important for citizens to have guaranteed permanent access to the historical record of congressional activities. The provision of timely access does not, by itself, guarantee long-term access. Preservation and long-term access require specific procedures in addition to those that are necessary for short-term access. Those procedures include providing timestamps and hashes for documents, establishing policies for the inclusion of copyrighted works in public materials and distributing documents to memory institutions.
History of the Availability of Congressional Information
Before the digital age, preservation of and long-term access to congressional information were ensured by the deposit of books and other materials into a legally defined system of federal depository libraries. To facilitate this, the United States Code establishes a Federal Depository Library Program (FDLP) of congressionally authorized libraries in virtually every congressional district. These libraries operate under rules and regulations about selecting, acquiring, organizing and preserving government publications and making them publicly accessible. 13
With the advent of better tools for digital publishing, the Government Printing Office (GPO), which administers the FDLP, has stopped distributing most congressional information to depository libraries, in favor of providing online access to these materials through government-controlled Web servers.
This change from a “deposit” model to an “access” model significantly affects the mechanism for preservation. 14 Where once the physical and financial responsibility for preservation resided in a distributed system of independent libraries operating under GPO guidelines, the current system relies on GPO alone to provide the “single authoritative source” for “permanent preservation for public access” to all federal documents. 15 This puts the preservation of congressional information at risk in two significant ways.
First, because this new method relies on a single government agency (GPO) rather than a large number of distributed, separately funded, separately administered depository libraries, a change in federal policies or priorities, or inadequate funding of GPO, could cause loss of access or loss of information for everyone.
Second, the new system makes it more difficult, and in some cases impossible, for any memory institution 16 to identify and acquire digital materials for preservation. One reason is that the government permits the deposit only of what it calls “tangible” publications, which makes most digital congressional information unavailable for preservation at FDLP libraries. The problem is compounded by the lack of timely, machine-processable notices when digital information is added, which forces institutions to rely on inefficient and often ineffective procedures for identifying and acquiring information from government Web sites. 17 Further, because the “access” model produces systems designed for individuals who retrieve small amounts of information at a time, these systems often make it difficult for organizations to acquire information comprehensively and in bulk for systematic preservation.
Historically, there have always been gaps in what information entered the depository system and was therefore preserved. Materials not published by GPO were rarely deposited. Audio-visual materials, such as videos of congressional hearings and House debates, were not made available to depository libraries. Databases were sometimes deposited as static, published documents and sometimes as applications in proprietary formats, if they were deposited at all. Technological advances now make it possible to deliver many of these “fugitive” materials in ways that were technically or financially difficult in the past.
Key Stakeholders and Political Climate
GPO and the FDLP are not the only entities preserving and providing access to congressional materials:
•The Library of Congress provides access to some legislative materials through its THOMAS Web site. 18
•The National Archives and Records Administration retains legislative information, but posts little of it online.
•A variety of collections of digital government information are publicly available from private organizations, such as the Internet Archive, 19 and universities such as the University of North Texas. 20
•C-SPAN television network provides “gavel-to-gavel proceedings of the U.S. House of Representatives and the U.S. Senate” 21 and recently announced a “liberalized copyright policy” that will allow some copying, sharing and posting of C-SPAN video on the Internet. 22 Carl Malamud, who created the first webcasts of the House and Senate floors, has recently experimented with a variety of methods and locations for hosting video on the web and has summarized his findings in an open letter to Speaker of the House Nancy Pelosi. 23 (This is covered in detail in the Congressional Video section of this report.)
•Stanford University Libraries are collaborating with GPO on a pilot project to investigate the effectiveness of using the preservation and distribution software LOCKSS (Lots of Copies Keeps Stuff Safe) for government information. 24
•Private sector companies offer congressional information selectively for fees. 25
Despite the work of all these stakeholders, there are still gaps in preservation, and barriers to preservation and access. There are at least three barriers to long-term preservation:
•Some government publications, which are not normally copyrightable, contain copyrighted information (e.g., hearings may have copyrighted materials submitted to accompany testimony). 26 This adds a potential legal complication to the preservation and re-distribution of digital materials. The Google Books project, for example, treats the U.S. Serial Set as if it were copyrighted. 27
•GPO has a stated mission of distributing electronic government information “on a cost recovery basis.” 28 This creates an economic obstacle both for memory organizations that wish to preserve the information and for members of the public who wish to make direct use of it, and it opens the door for GPO to apply Digital Rights Management technologies and licensing restrictions that prevent the re-distribution of such information products. The effects of this policy can already be seen in the way GPO handles digital congressional materials in a format suitable for preservation: rather than offer these materials for deposit into FDLP libraries, GPO sells them for high fees. 29
•When there is a change in committee leadership or in leadership of the entire Congress, the current system allows the record of past work to be wiped entirely from house.gov. Any files not deposited before such changes are at high risk of loss.
Recommendations
To ensure guaranteed, long-term, free, public access to a comprehensive collection of government information, it is essential to provide more than timely access. The government must distribute preservable versions of the information directly to memory institutions that have as their primary mission the long-term preservation of, and free public access to, government information. To accomplish this for House publications, the House of Representatives should take the following actions:
•Create all digital content with preservation in mind. Specifically:
  •Use only open file formats (e.g., PDF/A 30) rather than proprietary formats (e.g., Apple QuickTime, RealMedia .rm, Microsoft .doc, .xls, .ppt).
  •Use operating-system-neutral formats rather than formats that require a specific computer operating system.
  •Avoid data formats that depend on commercial software or that require commercial operating systems.
  •Reject the use of Digital Rights Management technologies that restrict or limit access in any way.
  •Reject licensing restrictions on the use and reuse of congressional information, including instances where copyrighted works are included in the public document and third parties may attempt to restrict distribution of the document or portions of it.
•For each discrete digital object, provide a digital hash or digital signature that can be used to verify that the object has not been modified. 31
•For each discrete digital object, provide a time stamp and version number, or other indications of when the object was created and its relationship to other versions of the same information.
•Explore legislation that provides for the preservation and redistribution of copyrighted materials that are incorporated into government information products.
•For audio-visual information, such as videos of committee hearings and House proceedings, provide files for downloading after the fact in addition to live streaming feeds.
•Deliver content directly to GPO and to FDLP libraries using an appropriate digital deposit technology. 32 Work with GPO to ensure that new procedures for deposit of digital materials are developed and enforced for FDLP libraries and that all online content is transferred to GPO before changes in committee or other leadership take place.
•Establish a centralized message system for communication among preservationists and between preservationists and the House.
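To illustrate the hash, time stamp and version recommendations above, here is a minimal Python sketch of a per-object manifest record. It assumes SHA-256 as the digest (a plain hash, not a digital signature in the FIPS 186 sense, which would also require a signing key), and the field names and the `supersedes` link between versions are hypothetical choices for illustration, not an existing House or GPO format.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a digital object's bytes as hex text."""
    return hashlib.sha256(data).hexdigest()


def manifest_entry(name: str, data: bytes, version: int,
                   supersedes: Optional[str] = None) -> dict:
    """Build one hypothetical manifest record for a digital object.

    'supersedes' holds the digest of the previous version, if any, so a
    preservationist can reconstruct the chain of versions.
    """
    return {
        "name": name,
        "sha256": sha256_hex(data),
        # UTC time stamp recording when this record was created
        "created": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "version": version,
        "supersedes": supersedes,
    }


def verify(data: bytes, entry: dict) -> bool:
    """Return True if the bytes still match the digest recorded in the entry."""
    return sha256_hex(data) == entry["sha256"]


if __name__ == "__main__":
    draft = b"Committee hearing transcript (preliminary)"
    final = b"Committee hearing transcript (final)"
    v1 = manifest_entry("hearing-transcript.pdf", draft, version=1)
    v2 = manifest_entry("hearing-transcript.pdf", final, version=2,
                        supersedes=v1["sha256"])
    print(json.dumps([v1, v2], indent=2))
    print(verify(final, v2))         # True: unmodified copy
    print(verify(final + b" ", v2))  # False: any change alters the digest
```

A depository library holding a copy could recompute the digest at any time and compare it with the published record. Because a manifest can itself be altered, a production system would likely also sign it; this sketch does not attempt that.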
Conclusion
Preservation as an institutional activity is sometimes overlooked in the age of instant access. When we see the rapid distribution and redistribution of newsworthy congressional information on the Web, it is easy to forget that information that seems mundane or even unimportant when it is released may not be preserved without a concerted effort by memory institutions. Such information will be vital to historians, journalists, lawmakers and citizens.
Congress should not rely on the government alone (e.g., GPO, the Library of Congress, and the National Archives) to guarantee both the preservation of and long-term, free, public access to all congressional information. Nor should Congress assume that congressional information will be comprehensively preserved by others without explicit partnerships and technological planning. By relying on existing law (44 U.S.C. chapter 19) and ensuring the deposit of congressional information in open formats suitable for preservation, Congress can guarantee robust and reliable access to congressional information for future generations.
31 For example, FIPS 186 (http://www.itl.nist.gov/fipspubs/fip186.htm). Accessed March 27, 2007.
32 For example, LOCKSS (http://www.lockss.org/lockss/For_Publishers), File Transfer Protocol, OAI-PMH (http://www.openarchives.org/pmh/), the Sitemaps protocol (http://sitemaps.org/), or similar tools and protocols.