Publishing a Structured Database of Legislative Information
Congress should make available to the public a well-supported database of all bill status and summary information currently accessible through the Library of Congress. This database, as well as its supporting files, should be in a structured, non-proprietary format such as XML. Records in the database should be updated in a timely manner. Such a database would enable independent Web sites to use information in new and creative ways, including educating the public about Congress and providing citizens with customized views of its proceedings. The database, being authoritative, would ensure the accuracy of the information presented to the public by independent Web sites. We also make recommendations regarding THOMAS (Library of Congress) and Government Printing Office products, the availability of bills online before they are debated, and legislative language for future congressional databases.
Overview of Access to Legislative Data
The Library of Congress’ launch of the Web site THOMAS 2, the public’s primary Internet source for the status of federal legislation, was a milestone for transparency in 1995. The Web has changed dramatically since then, growing from a web of static pages to a web of pages and data, from which information can be downloaded and integrated into a variety of customized information resources. What it means to be on the Web today (on what is informally called “Web 2.0”) involves not just creating a Web site to be browsed, but supplementing it with authoritative, structured data that facilitates the efficient reuse of information. Examples of structured data include RSS feeds, as well as XML data downloads such as the roll call vote XML files currently on the House Web site.
We recommend that the House embrace structured data by publishing the status of legislation and other information to the Web not only as it is now, but also in structured data formats.
An example: The Illinois General Assembly maintains an FTP site for its structured legislative data in XML alongside its public access databases. The state’s Legislative Information System Executive Director, Tim Rice, said making the data available was straightforward: “Since we had the data in XML, it made sense to provide it. . . . It made the data available in a useful format, and it provided access to data that kept our site from being crawled constantly by those wanting that data. . . . There really haven’t been any problems with maintaining the FTP site. The data is moved there as part of our regularly scheduled processes, so it doesn’t require special attention.” 3
History of THOMAS
THOMAS was launched by the Library of Congress, at Congress’ request, in January 1995 at the start of the 104th Congress. Its creation was spearheaded by then-Speaker of the House Newt Gingrich, following on the heels of a Democratic initiative to modernize the Government Printing Office (GPO). 4 At the time, the House of Representatives, the Senate, the Library of Congress and the Library’s Congressional Research Service (CRS) already had a history of sharing data for mainframe legislative information systems. The core legislative database on THOMAS is the CRS Bill Summary and Status database, which features CRS summaries of legislation and daily updates on the status of legislation. It integrates this information with selected full-text documents from GPO and links to other data, such as roll call votes on the House and Senate Web sites.
THOMAS now provides online access to authoritative legislative information from Congress, including:
- Bill Summary and Status, 93rd Congress (1973) to present
- Text of Legislation, 101st Congress (1989) to present
- Congressional Record, 101st Congress (1989) to present
- Committee Reports, 104th Congress (1995) to present
- Senate Status of Treaties and Nominations, 90th Congress (1967) to present
THOMAS has evolved over time to include helpful links from a bill to related information, such as committee Web site documents, House special rules, Congressional Budget Office cost estimates, and the Public Laws database at GPO.
The work that has gone into making the THOMAS and GPO Web sites as informative as they are is highly commendable. With recent technological advances, even more is possible. For example, today’s online citizens expect stable links back to a document. THOMAS pages for some documents, including the text of a bill and pages within the Congressional Record, have temporary addresses that expire after a short period of time. 5 These temporary links cannot be cited or e-mailed to another person, discouraging individuals and members from linking to THOMAS and limiting its effectiveness. With up-to-date Web technology, this problem could be resolved.
In a related matter, GPO fees currently discourage the use of GPO’s authoritative version of legislative texts. GPO charges $7,280 yearly for electronic receipt of its Daily Bills file, which contains the text of legislation-as-drafted in a format suitable for reuse by independent Web sites. THOMAS itself purchases this product from GPO to display the text of legislation on its own Web site. The cost is governed by 44 U.S.C. 1708, which states that documents are to be sold at cost plus 50%, and by 44 U.S.C. 4102, which states that electronic documents are to be sold to recover only the incremental cost of distribution. While the distribution of the text of legislation may have been costly in 1995, justifying GPO’s past need to charge for distributing these data files, there is virtually no cost to distribute text over the Internet today. The fee should be adjusted to reflect current costs, if not abolished, and the public should be encouraged to use this public resource.
The Library of Congress is currently working on improving THOMAS and is testing a beta version with a new search engine and a new design. 6 GPO is also working on its Future Digital System (FDsys). It is an opportune time to implement changes that will advance the availability of legislative status information from 20th century state-of-the-art to model 21st century best practices.
Structured Information in the Government
Allowing people to use computers to check the status of legislation will help them understand what Congress does day to day. With the ever-increasing amount of information that is available online, we need computers to help us organize, digest and integrate the information put out by Web sites. Webmasters who publish content in two forms (once in the way we all think of as a Web page, and again in a structured data format) make it possible for computers to transform that content in new ways, giving users new perspectives. RSS and iCal feeds, two popular examples of structured data, are used to aggregate news and blog headlines and events calendars, allowing computers to help people manage the increasing amount of information coming at them. Making legislative information available in a structured data format will provide the public with new and important views into the legislative process.
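As a minimal sketch of what a structured format offers, the Python below reads one bill status record from a small XML document and turns it into a form a program can reuse. The element and attribute names (`bill`, `title`, `sponsor`, `action`) and the sample values are hypothetical, invented for this illustration; they are not the format used by THOMAS or any existing congressional system.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names and values are invented for illustration
# and do not reflect any format actually published by Congress.
SAMPLE_XML = """
<bill congress="110" type="hr" number="9999">
  <title>Example Transparency Act</title>
  <sponsor>Rep. Jane Doe</sponsor>
  <actions>
    <action date="2007-01-04">Introduced in House</action>
    <action date="2007-02-12">Referred to committee</action>
  </actions>
</bill>
"""

def parse_bill(xml_text):
    """Turn one bill record into a plain dictionary a program can reuse."""
    root = ET.fromstring(xml_text.strip())
    actions = [(a.get("date"), a.text) for a in root.iter("action")]
    return {
        "id": f"{root.get('type')}{root.get('number')}-{root.get('congress')}",
        "title": root.findtext("title"),
        "sponsor": root.findtext("sponsor"),
        "actions": actions,
        # ISO dates sort correctly as strings, so max() picks the latest action.
        "latest_action": max(actions) if actions else None,
    }

bill = parse_bill(SAMPLE_XML)
print(bill["id"], "-", bill["title"])
print("Latest action:", bill["latest_action"])
```

Extracting the same fields from a Web page meant for human readers would require guessing at the page layout, and the guess would break whenever the layout changed.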
The notion of structured data is not new to the federal government. The Census Bureau, for instance, has for years provided not only a Web interface for census statistics (that is, a page where users can find simple data such as population numbers) but also the complete set of numeric data files to be downloaded and imported into database and statistics programs. The benefit of a download of the data is that with the complete data set computers can help people delve more deeply into the data and put it in new forms, such as charts and maps, that would be too time-consuming to create by hand. 7 Another example is the Securities and Exchange Commission’s practice of making investment filings available to the public in XML format through its EDGAR program. 8 Likewise, the Federal Election Commission makes campaign contribution information available in a downloadable structured data format, 9 allowing the public to absorb the information in a variety of ways.
Nor is structured data new to the legislative branch. The House has for the last several years published the results of roll call votes in XML, an advance that has made it easier for independent Web sites to put the information to new uses, such as creating visual maps for votes and e-mailing results of votes to interested citizens. 10 Moreover, the House and Senate have been drafting legislation using XML for the last several years, a use of structured data that has improved the internal bill drafting workflow. 11 Making structured data on legislative status available to the public would be a natural extension of Congress’ efforts to make its operations more open and transparent.
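To illustrate the kind of reuse that published roll call data makes possible, the sketch below tallies votes by party from a small XML document. The layout shown is a simplified, hypothetical one written for this example, and the legislators are fictional; the House’s actual roll call files may be organized differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified, hypothetical roll call layout with fictional legislators.
ROLL_CALL_XML = """
<rollcall number="1">
  <question>On Passage</question>
  <recorded-vote party="D" state="VA" name="Member A">Yea</recorded-vote>
  <recorded-vote party="D" state="IL" name="Member B">Nay</recorded-vote>
  <recorded-vote party="R" state="TX" name="Member C">Yea</recorded-vote>
  <recorded-vote party="R" state="OR" name="Member D">Not Voting</recorded-vote>
</rollcall>
"""

def tally_by_party(xml_text):
    """Count each (party, vote) combination in a roll call document."""
    root = ET.fromstring(xml_text.strip())
    return Counter(
        (rv.get("party"), rv.text.strip()) for rv in root.iter("recorded-vote")
    )

tally = tally_by_party(ROLL_CALL_XML)
for (party, vote), count in sorted(tally.items()):
    print(f"{party} {vote}: {count}")
```

The same counts could feed a vote map or an e-mail summary without anyone retyping the results by hand.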
Six states already publish the status of their state legislation in a structured data format: Connecticut, Illinois (as described above), Minnesota, Oregon, Texas and Virginia. The Virginia General Assembly provides a continually updated database of the status of state legislation. As a result, independent Web sites like RichmondSunlight.com can help educate the public about the activities in the legislature in creative ways, such as through graphs and “tag clouds.” Virginia and these other states have shown that the goal presented here is both attainable and beneficial to the public good.
Recommendations for the Legislation Database
Making the status of legislation available in a non-proprietary structured data format, such as XML or RDF, would support the creation of Web sites that can help people (among them students, journalists, researchers, attorneys, federal employees and D.C. insiders) to understand and follow the proceedings of Congress. Citizens can track bills that interest them, follow the actions of their representatives and better understand the legislative process, furthering the goal of making Congress open and transparent. Publishing the information in a structured format allows it to be put to new, creative uses, not all of which the THOMAS staff has the mandate or resources to implement. Such uses could include:
- sending e-mails to citizens when a bill they are following moves forward in the legislative process
- creating an education-oriented Web site that presents legislative terminology on a basic level
- comparing bills and tracking changes to bills as they are amended
- providing RSS feeds of recently introduced bills for a particular subject
- creating a Web site in the style of a wiki for the public, collaborative analysis of legislation
These are just a few ideas, all of which would have significant benefits for the public. None of these applications is possible without a structured database of legislative information. It would be too time-consuming and costly for any person to individually process each of the thousands of bills introduced in Congress.
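As one illustration, the first idea in the list (e-mail alerts when a bill moves) reduces to comparing two snapshots of the database. The sketch below assumes each snapshot is a mapping from a bill identifier to its current status; the identifiers, status strings and subscriber address are hypothetical, and the code only builds the alert messages rather than sending them.

```python
def find_status_changes(previous, current):
    """Return {bill_id: (old_status, new_status)} for bills whose status changed.

    Bills that appear only in `current` are reported with an old status of None.
    """
    changes = {}
    for bill_id, new_status in current.items():
        old_status = previous.get(bill_id)
        if old_status != new_status:
            changes[bill_id] = (old_status, new_status)
    return changes

def build_alerts(changes, subscriptions):
    """Pair each subscriber with a message for every followed bill that moved."""
    alerts = []
    for email, followed in subscriptions.items():
        for bill_id in sorted(followed):
            if bill_id in changes:
                old, new = changes[bill_id]
                alerts.append((email, f"{bill_id}: {old or 'new bill'} -> {new}"))
    return alerts

# Hypothetical snapshots taken on two consecutive days.
yesterday = {"hr9001": "Introduced", "hr9002": "Referred to committee"}
today = {
    "hr9001": "Passed House",
    "hr9002": "Referred to committee",
    "hr9003": "Introduced",
}
subscriptions = {"reader@example.org": {"hr9001", "hr9002"}}

changes = find_status_changes(yesterday, today)
alerts = build_alerts(changes, subscriptions)
for email, message in alerts:
    print(email, "|", message)
```

A comparison like this is only as timely and accurate as the snapshots it is given, which is why the source of the data matters.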
The Legislation Database can be created as an adjunct to what already exists, through an addition to THOMAS, as a new product from the Library of Congress, or simply as a new FTP-based data source. Because the information that would go into the database already exists as an XML database within the Library of Congress (it is precisely what powers THOMAS and the internal Legislative Information System), there may be only a minimal cost to create a database suitable for the public.
An authoritative database of this information is sorely needed. Some of the applications mentioned above already exist in some form because entrepreneurs are independently constructing partial legislative databases of their own, but these independent databases have the potential to unknowingly spread inaccurate information. They are not long-term solutions. The only freely available source for downloading structured legislative data is created and maintained by GovTrack.us, a private, independent effort. GovTrack’s database is the source for the information behind other public Web sites, such as OpenCongress.org, and as a result any errors in the original database have a wide impact. Common errors include delayed bill records, outdated cosponsor lists and incomplete committee membership listings. The errors, gaps and delays stem from the automated way in which the independent databases are reconstructed from the scattered, unstructured information that is available now. An authoritative structured database directly from Congress would provide a current, complete, accurate and reliable basis for these applications.
A database that is well-documented (including supporting files) and provides regularly updated records for download will provide its technical users with the reliable source of data that is now lacking. The importance of timeliness for the existing public legislative database has recently begun to find traction. The Sunlight Act of 2007 (H.R. 170), introduced by Rep. Steve King, would require that bills, resolutions, amendments and conference reports be made available on the Internet, in some cases at least 48 hours before their consideration. We believe the passage of this bill would be a clear signal to the public of a commitment to transparency. (S. 1, already passed by the Senate, would require the same but only of conference reports.) Supporting files for the legislative database should include a roster of Congress and of committee assignments, but other related data sets developed in the House could be included as well. This might include the database of congressional district ZIP codes that powers the “Find Your Representative” tool on the House Web site. Making that additional database available, either as a download or Web API, would widen its impact by allowing independent Web sites to help more people learn which district they live in. The increased number of data sets (beyond the status of legislation itself) and their timeliness will ensure that the public has comprehensive access to congressional information.
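The district lookup mentioned above would be simple for an independent Web site to build once the underlying data set could be downloaded. The following sketch assumes a hypothetical comma-separated file that maps ZIP codes to districts; the column names and sample rows are invented and do not describe the House’s actual database. Because a single ZIP code can span more than one congressional district, the lookup returns a list.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; a real download would be read from a file instead.
SAMPLE_CSV = """zip,state,district
00001,XX,1
00002,XX,1
00002,XX,2
"""

def load_districts(csv_file):
    """Build an index from ZIP code to a list of (state, district) pairs."""
    index = defaultdict(list)
    for row in csv.DictReader(csv_file):
        index[row["zip"]].append((row["state"], int(row["district"])))
    return index

def find_districts(index, zip_code):
    """Return every district overlapping the ZIP code (empty list if unknown)."""
    return index.get(zip_code, [])

index = load_districts(io.StringIO(SAMPLE_CSV))
print(find_districts(index, "00002"))
```

ZIP codes are kept as strings so that leading zeros are not lost.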
It is important for future bills and resolutions that establish new databases to provide for the availability of a downloadable, structured data counterpart to any searchable Web interface. For instance, H.R. 169, introduced by Rep. Dennis Moore, would require a list of earmarks in bills and amendments to “be made available on the Internet in a searchable format to the general public for at least 48 hours before consideration.” We strongly support the consideration of this bill, and recommend amending it to ensure the availability of a downloadable, structured database of the same information. 12 Language already employed in S. 1 to improve lobbying disclosure takes into account the importance of a structured database:
“maintain, and make available to the public over the Internet, without a fee or other access charge, in a searchable, sortable, and downloadable manner, an electronic database”
By adding “downloadable,” the bill provides for the possibility of a structured database of the information and addresses the importance of making information available in a form that encourages reuse. We strongly recommend the inclusion of such language in other bills and resolutions addressing the availability of information on the Internet.
As a potentially low-cost addition to Congress’s Web presence, a structured database of legislative information would have innumerable benefits for an open and transparent Congress. Such a database would enable independent transformative uses of the information, which ultimately create an informed and responsible public. We recommend specifically that the following structured databases be made available:
- the status of legislation (i.e., what is on THOMAS and the Legislative Information System), with provisions for the timely inclusion of bills, resolutions, and amendments
- supporting information, including the roster of Congress and committee membership
- the congressional district ZIP code database used by the House Web site
We also recommend that:
- THOMAS provide permanent links to all documents, in an obvious way, to enable Web researchers to directly refer to these documents
- GPO lower or abolish its fee for downloading the base text of legislation files (the “Daily Bills” product) to reflect today’s lower file distribution costs (as GPO’s mandate requires), thereby encouraging the use of GPO’s authoritative texts of legislation
We also suggest that legislative language in future bills and resolutions regarding “searchable and sortable” public data sets include a provision for making the data sets “downloadable” in a structured format.