It looks like Google will be announcing a new public service, to live at research.google.com, where they’ll provide free hosting for large public data sets (per TechCrunch and Wired).
While this strikes me as a great development, since increasing access to public information should only increase its usefulness and impact, it also raises some questions for me.
It strikes me that this kind of cloud computing (which I learned about at Princeton’s CITP Cloud Computing event) will start to affect the way we think about what is a public utility. New kinds of relationships will exist between established institutions and new “cloud” service providers, which come with new opportunities for gain, abuse, conflict of interest, unseen liabilities, etc.
For example, I expect that Google will be able to see all sorts of interesting metadata about who links to specific Hubble images, or who queries scientific databases, or how. The question, then, is whether that sort of information will be publicly available (or even if it could be). If not, then Google’s benevolence starts to look a lot more like self-interest, where they gain not only by becoming the arbiter of the public’s access to its own information stores, but also by gaining a privileged view of how we relate to our public data.
This isn’t an isolated academic question, either. The way research data are cited and linked is itself the subject of scientific inquiry, and will certainly continue to be invaluable.
Perhaps this is gift-horse-mouth looking, and we should be glad that someone wants to provide a free, accessible home for public data. A little cynicism, however, seems in order, and we might have to rethink what it means to provide a free public service.



1 response so far ↓
John Wonderlich // Jan 22, 2008 at 6:58 pm
I wanted to connect this post with what I was referring to about scientific data citations being the subject of rigorous study, by adding a link to “impact factor”: http://en.wikipedia.org/wiki/Impact_factor