<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Open House Project &#187; google</title>
	<atom:link href="http://www.theopenhouseproject.com/category/google/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.theopenhouseproject.com</link>
	<description>Recommendations, Resources, and Reform</description>
	<lastBuildDate>Wed, 23 Feb 2011 16:24:51 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Google Research and the Cloud</title>
		<link>http://www.theopenhouseproject.com/2008/01/22/google-research-and-the-cloud/</link>
		<comments>http://www.theopenhouseproject.com/2008/01/22/google-research-and-the-cloud/#comments</comments>
		<pubDate>Tue, 22 Jan 2008 18:46:07 +0000</pubDate>
		<dc:creator>John Wonderlich</dc:creator>
				<category><![CDATA[OpenHouse]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[google]]></category>

		<guid isPermaLink="false">http://www.theopenhouseproject.com/2008/01/22/google-research-and-the-cloud/</guid>
		<description><![CDATA[It looks like Google will be announcing a new public service, to live at research.google.com, where they&#8217;ll provide free hosting for large public data sets (per tech crunch and Wired).
While this strikes me as a great development, since increasing access to public information should only increase its usefulness and impact, this also raises questions to [...]]]></description>
			<content:encoded><![CDATA[<p>It looks like Google will be announcing a new public service, to live at research.google.com, where they&#8217;ll provide free hosting for large public data sets (per <a id="lb0v" title="tech crunch" href="http://www.techcrunch.com/2008/01/19/google-to-become-open-source-science-repository/">tech crunch</a> and <a id="nk32" title="Wired" href="http://blog.wired.com/wiredscience/2008/01/google-to-provi.html">Wired</a>).</p>
<p>While this strikes me as a great development, since increasing access to public information should only increase its usefulness and impact, it also raises questions for me.</p>
<p>It strikes me that this kind of cloud computing (which I learned about at Princeton&#8217;s CITP <a href="http://citp.princeton.edu/cloud-workshop/">Cloud Computing</a> event) will start to affect the way we think about what is a public utility.  New kinds of relationships will exist between established institutions and new &#8220;cloud&#8221; service providers, which come with new opportunities for gain, abuse, conflict of interest, unseen liabilities, etc.</p>
<p>For example, I expect that Google will be able to see all sorts of interesting metadata about who links to specific Hubble images, or who queries scientific databases, or how.  The question, then, is whether that sort of information will be publicly available (or even if it could be).  If not, then Google&#8217;s benevolence starts to look a lot more like self-interest, where they gain not only by becoming the arbiter of the public&#8217;s access to their information stores, but also by gaining a privileged view of how we relate to our public data.</p>
<p>This isn&#8217;t an isolated academic question, either.  The way research data are cited and linked is itself the subject of scientific inquiry, and will certainly continue to be invaluable.</p>
<p>Perhaps this is gift-horse-mouth looking, and we should be glad that someone wants to provide a free, accessible home to public data.  A little cynicism, however, seems in order, and we might have to rethink what it means to provide a free public service.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theopenhouseproject.com/2008/01/22/google-research-and-the-cloud/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Web Harvest Archive</title>
		<link>http://www.theopenhouseproject.com/2007/12/04/web-harvest-archive/</link>
		<comments>http://www.theopenhouseproject.com/2007/12/04/web-harvest-archive/#comments</comments>
		<pubDate>Tue, 04 Dec 2007 16:58:16 +0000</pubDate>
		<dc:creator>John Wonderlich</dc:creator>
				<category><![CDATA[CLA]]></category>
		<category><![CDATA[Congress]]></category>
		<category><![CDATA[NARA]]></category>
		<category><![CDATA[OpenHouse]]></category>
		<category><![CDATA[archive]]></category>
		<category><![CDATA[archivist]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[government websites]]></category>
		<category><![CDATA[preservation]]></category>
		<category><![CDATA[sitemap protocol]]></category>
		<category><![CDATA[sitemapping]]></category>

		<guid isPermaLink="false">http://www.theopenhouseproject.com/2007/12/04/web-harvest-archive/</guid>
		<description><![CDATA[I&#8217;m glad to have just found the archive of old Web sites from members of Congress, maintained by the Center for Legislative Archives under the National Archives and Records Administration (NARA).
The collection seems well organized and easy to peruse, with solid explanations of their methodology and disclaimers about what&#8217;s available based on the crawling.
My main [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m glad to have just found the <a title="archive of old Web sites" id="e1od" href="http://www.webharvest.gov/collections/">archive of old Web sites</a> from members of Congress, maintained by the <a title="Center for Legislative Archives" id="dugl" href="http://www.google.com/url?sa=t&#038;ct=res&#038;cd=1&#038;url=http%3A%2F%2Fwww.archives.gov%2Flegislative%2F&#038;ei=_YVVR9HQBoX8gAS0tKnyCA&#038;usg=AFQjCNEzLtvCA2NtPrVmqTpY4kDzbg5oNw&#038;sig2=89L1QgDMUonK8VSoLQOmyQ">Center for Legislative Archives</a> under the National Archives and Records Administration (NARA).</p>
<p>The collection seems well organized and easy to peruse, with solid explanations of their methodology and disclaimers about what&#8217;s available based on the crawling.</p>
<p>My main suggestion is that the archiving happen with greater frequency, perhaps coordinated in order to capture the greatest amount of material possible, and that those responsible for the Web Harvest coordinate with the CAO, systems administrators, and vendors to be sure that the digital records management practices used in organizing member sites encourage easy crawling and archiving by NARA and CLA.</p>
<p>The House has a document laying out best practices for document management for House offices; I wonder whether digital materials management should be expanded to include digital materials availability, perhaps including standards like <a title="sitemapping" id="f5e_" href="http://googlepublicpolicy.blogspot.com/2007/11/senate-helping-make-govt-more.html">sitemapping</a>, in order to ensure the preservation of member sites.</p>
<p>My other suggestion is to increase the exposure of the captured sites, perhaps encouraging links from the <a title="bioguides" id="gkt1" href="http://bioguide.congress.gov/biosearch/biosearch.asp">bioguides</a>, or current member sites, and to ensure that the collection itself is crawlable through search engine indexing <a title="practices" id="frdt" href="http://googlepublicpolicy.blogspot.com/2007/11/senate-helping-make-govt-more.html">practices</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theopenhouseproject.com/2007/12/04/web-harvest-archive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sitemap Protocol</title>
		<link>http://www.theopenhouseproject.com/2007/11/20/sitemap-protocol/</link>
		<comments>http://www.theopenhouseproject.com/2007/11/20/sitemap-protocol/#comments</comments>
		<pubDate>Tue, 20 Nov 2007 22:17:47 +0000</pubDate>
		<dc:creator>John Wonderlich</dc:creator>
				<category><![CDATA[OpenHouse]]></category>
		<category><![CDATA[executive]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[government websites]]></category>
		<category><![CDATA[openhouseproject]]></category>
		<category><![CDATA[sitemaps]]></category>

		<guid isPermaLink="false">http://www.theopenhouseproject.com/2007/11/20/sitemap-protocol/</guid>
		<description><![CDATA[Google has been working with federal agencies to help them ensure that their data are accessible through search engines.  Many government databases providing critical information or statistics have existed for much longer than the current standards for public Internet accessibility, so the disconnect between search engines and public databases is understandable.
There is a clear [...]]]></description>
			<content:encoded><![CDATA[<p>Google has been working with federal agencies to help them ensure that their data are accessible through search engines.  Many government databases providing critical information or statistics have existed for much longer than the current standards for public Internet accessibility, so the disconnect between search engines and public databases is understandable.</p>
<p>There is a clear public benefit, however, when search terms like &#8220;Colorado census 1990&#8221;, &#8220;federal childhood immunology standards&#8221;, &#8220;Pennsylvania superfund sites&#8221;, or &#8220;Congressional Record 1930 Stock Market&#8221; return the information the searcher is obviously interested in&#8211;government information.</p>
<p>The solution to this problem is a non-proprietary standard championed by Google, called the <a href="http://www.sitemaps.org/">sitemap protocol</a>.  Implementing this standard helps automated web crawlers (the stuff of search engines) find their way around your entire site.  Given the immense number of government databases and agencies to reach, getting all government information to show up in web searches will take some time&#8211;unless you have a flexible bureaucracy, or an administrative commitment to modern digital government.</p>
<p>Enter Google&#8217;s <a href="http://googlepublicpolicy.blogspot.com/2007/11/senate-helping-make-govt-more.html">public policy team</a>, and the Senate Homeland Security and Governmental Affairs Committee:</p>
<blockquote><p>The Senate Homeland Security and Government Affairs Committee will consider S. 2321, which extends and updates the E-Government Act of 2002. Part of the bill directs the Office of Management and Budget to create guidance and best practices for federal agencies to make their websites more accessible to search engine crawlers, and thus to citizens who rely on search engines to access information provided by their government. It also requires federal agencies to ensure their compliance with that guidance and directs OMB to report annually to Congress on agencies&#8217; progress.</p></blockquote>
<p>From <a href="http://www.govtrack.us/congress/billtext.xpd?bill=s110-2321">the bill</a>:</p>
<blockquote><p>`(i) GUIDELINES- Not later than 1 year after the date of enactment of the E-Government Reauthorization Act of 2007, the Director shall promulgate guidance and best practices to ensure that publicly available online Federal Government information and services are made more accessible to external search capabilities, including commercial and governmental search capabilities. The guidance and best practices shall include guidelines for each agency to test the accessibility of the websites of that agency to external search capabilities.</p></blockquote>
<p>This measure will go a long way toward making governmental information relevant to people&#8217;s lives, by making it accessible in the places we would first expect to find it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theopenhouseproject.com/2007/11/20/sitemap-protocol/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

