If a researcher wants to call her work science, others must be able to reproduce her results. This “replicability” is at the core of scientific inquiry. Replicability is what turns one person’s experience into a resource for humanity, what moves an anecdote toward a fact.
But much new social and economic research—based on the recent explosion of digital data collection—cannot be reproduced because the necessary data are not available to other researchers. The more researchers can provide open access to the data and computer code needed for replication of their results, the more their work can aspire to the label of social science.
In this spirit, the Center for Global Development has instituted a new research data disclosure policy. It reads, in part:
The original data and all computer code needed to prepare and perform an analysis should be posted on our website in an accessible form so that others can understand and replicate our results. … There will be times when full disclosure of data will not be possible, as a result of confidentiality requirements, commercial ownership, or other professional costs of disclosure. In such cases, the reason for not disclosing data will be made public.
For example, I recently posted the computer code and data needed to reproduce the results in my paper with Gabriel Demombynes on problems with the evaluation of the Millennium Villages Project (earlier, ungated version here). Before we posted these materials online, we provided them to anyone who requested them. Thus anyone on earth can scrutinize and critique every calculation we did in that paper, at no cost. This approach contrasts sharply with the approach of the Millennium Villages Project itself, which has published evaluation results based on data that the project does not release, either online or in response to requests. This fact by itself does not make the Project’s work pseudoscience, but it greatly impedes the scrutiny critical to credibility in scientific inquiry.
In this way we add CGD to the growing global movement for open data. Many of the top research journals, such as the American Economic Review and Political Analysis, have begun to require that authors provide code and data, unless dispensation is granted in special situations and other arrangements for data access are made. Harvard’s Gary King has an excellent website on data sharing and replication in social science, and has exemplified the approach in his own work. There is a parallel global movement for open data to promote transparency in foreign aid.