The Case for Global Standardized Testing

April 27, 2016

Somewhere in a village in Nigeria, a young girl is sitting in school today, just like she does every day, packed onto a crowded wooden bench in a faded school uniform. She represents a victory in the global effort to get all children learning, and her presence will be recorded as progress in the global databases maintained by UNESCO and the World Bank.

There's just one catch. She's not learning anything. She tries to follow along, repeating in unison as her classmates regurgitate what her teacher reads from the blackboard. But after three years of this, she still can't really read. By the time she eventually learns the basics it'll be too little too late, and she'll drift away toward the end of primary school – an adolescent entering the labor market with several years of schooling and roughly the education of my eight-year-old niece.

Don't bother fact-checking my story. You can't – mostly because it's hypothetical, but also because there are no facts to check. Nobody's keeping track of whether this girl or millions like her are actually learning anything. In theory, Nigeria's education ministry can tell you how many children are enrolled, and how many teachers and desks and bathrooms are in each school. But it can't tell you anything about whether anybody's learning anything. Nigeria is not alone, nor even particularly exceptional among low-income countries.

What gets measured gets managed, and for now, learning isn't.

Done well, standardized testing is an egalitarian enterprise 

One of the most attractive features of standardized testing with a representative sample of children is that it forces school systems to confront the reality of children who would otherwise fall through the cracks. The poorest kids, the kids who are struggling, the ones in the worst schools in the most remote places, count just as much for the national ranking on standardized tests as the kids in model schools in rich neighborhoods.

In contrast, existing international standardized tests do roughly the opposite – they celebrate the success of the successful, and sweep most poor kids in most poor countries under the rug. Most children in the developing world are not included in the sampling frame of any of the well-known international learning assessments.


If you follow education news, you'll have heard of tests like PISA, run by the OECD, which measures learning levels of fifteen-year-olds in 65 countries around the world, or TIMSS and PIRLS, which do something very roughly equivalent for primary students. Every few years, PISA, TIMSS, or PIRLS makes big headlines with the release of new country rankings. South Korea places reliably near the top, and China has carefully cultivated its public image by only letting affluent and fast-growing cities like Hong Kong and Shanghai participate in these tests.

The fanfare around international test scores is mostly about who's on top. Much less is written about who's on the bottom. The most obvious reason is that the kids on the bottom of the economic ladder didn't take the test.

Figure 1 shows coverage of these tests by country income level at three different points in the school cycle, roughly corresponding to the three points the UN has proposed measuring for its global goals. Most children are excluded from TIMSS, PIRLS, or PISA, and this is especially true in poorer countries. Even if you broaden the range of tests to include regional initiatives like LLECE in Latin America or PASEC and SACMEQ in Africa, most of the world's population is left out. 

Where learning data is missing, developing-country NGOs are filling the gap

In 2004, an Indian NGO called Pratham launched an ambitious project to measure how many children could read and do basic arithmetic in each of India's 600-plus districts. Pratham assembled an army of over a hundred thousand volunteers through a network of community-based organizations to go village to village and test half a million children in their homes. The model spread. The test, known as ASER, spawned parallel efforts in Pakistan, then in Kenya, Uganda, and Tanzania, and more recently in Mali, Senegal, Nigeria, and Mexico.

The political theory behind ASER was that change happens at the local level. Parents need to see that their children can't read, and district officials need to see that their own schools are failing. ASER's enormous size – allowing it to produce statistics at the district level – was designed to provide that granularity, and the local implementation and simple learning metrics were designed to communicate directly to parents and local officials.

After a decade of ASER testing in India, though, scores have not improved. An evaluation of these "citizen-led assessments" commissioned by the Hewlett Foundation last year was careful to highlight the many ways ASER in India and Uwezo in East Africa have influenced the education debate, but revealed some frustration at the lack of progress.

For all the appeal of the bottom-up approach, perhaps the route to school reform is not through the village. The evidence on information campaigns that seek to stimulate local school accountability by publicizing test scores in the developing world is decidedly mixed. Political scientists Evan Lieberman, Dan Posner, and Lily Tsai ran an experiment in Kenya to test the ASER theory of change by disseminating the results of ASER's sister initiative, Uwezo, back to the villages that were tested, together with exhortations to parents to take action and get involved in their children's schools. They found no impact on parental participation, which they attributed in part to parents' general satisfaction with current learning levels.

A cheap, low-risk gamble

This grassroots approach contrasts sharply with the theory of political change espoused by many advocates of international standardized tests like PISA or PIRLS.

This theory posits that the impetus for policy change comes from the national level, not local schools. To be slightly less than charitable, the theory here is that nationalistic competition among elites drives countries to reform. For those elites, the league tables of PISA scores discussed in The Economist magazine matter more than ASER's community engagement in a remote village. Popular politics may still have a role to play, but those politics play out through a national conversation in the mass media, producing a collective realization that education is in crisis. 

There is some evidence that tests like PISA have actually had this effect. Informed observers say that while Brazil (until very recently) saw itself as an emerging global economic power, PISA scores revealed that its students were still very much in the developing world, opening up a conversation about evidence-based education reforms. And Poland’s realization that its high school students were falling behind their German peers led to a full restructuring of the junior and senior high school curriculum.

Will this same political dynamic work in India or Nigeria, where the political institutions to translate bad test scores into accountability may be less developed? Nobody knows for sure, but it may be relatively cheap to find out. 

The OECD estimates that it could expand the number of countries in the PISA test of fifteen-year-olds by about 30 countries per year, including low-income countries requiring significant technical assistance, for a price of about a million euros each.

It's also worth noting these international tests are not nearly as onerous as the repetitive battery of tests that suburban parents in the U.S. complain about – they're done once every few years, and crucially, they focus only on a small sample of children in each country. (The odds of your child getting sampled are less than 1% in most countries.) The goal is to measure the performance of the school system as a whole, not a particular child, teacher, or school. 

Getting down to specifics: start at the beginning

Setting politics aside, there is another area where homegrown NGO assessments have blazed a trail for international standardized tests to follow. Tests like ASER have focused on testing all children, age six and up, with an instrument that prioritizes very basic literacy and numeracy – implicitly shining a light at the bottom of the learning distribution rather than the top.

And if the world is going to measure learning, there's a good argument for starting at the beginning – ensuring that the littlest kids master the most basic skills.

Experts in early-childhood education, like Nobel Prize winner James Heckman, argue that the return to human capital investments is highest for the youngest children, and decreases as they get older. It's somewhat unfortunate, then, that the biggest gap in international testing in Figure 1 is for the youngest kids. Almost nobody does comparable testing at early primary ages. So if the UN aims to measure global progress on basic literacy around grade two, it's going to be starting almost from scratch. Regional tests like PASEC and LLECE have recently inaugurated early-grade math and reading tests, but together they only cover about 12% of the world's children.

One practical challenge that’s particularly acute in developing countries is that kids of a given age are spread out across a wide range of grades – or not in school at all. As seen in Figure 2, the biggest share of Indian eight-year-olds are in third grade, while in Kenya they’re in second grade – and if you focus on just boys in Uganda, the biggest chunk are still in first grade. But in all cases, you have a sizeable share of eight-year-olds spread across four or five grades.


These differences in grade progression can make comparisons of learning by grade level very deceptive. The ASER and Uwezo-style tests, which are based on household samples of kids at all ages, provide a bird's-eye view of this problem. Comparing India and Kenya at sixth grade gives the impression of much higher learning levels in Kenya (see Figure 3). But this is purely an illusion of different proclivities for promoting failing students. Comparing kids by age in the bottom panel of the figure shows no gap between countries whatsoever.
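
To see how promotion policy alone can manufacture a gap, here is a toy sketch in Python. Every number in it is invented for illustration and none comes from ASER or Uwezo: two hypothetical countries each have 100 twelve-year-olds, 40 of whom can read. One promotes everyone automatically; the other holds non-readers back in fourth grade. The names `Child`, `build_cohort`, and `share_reading` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Child:
    age: int
    grade: int
    can_read: bool


def share_reading(children):
    """Fraction of the given children who can read."""
    children = list(children)
    return sum(c.can_read for c in children) / len(children)


def build_cohort(strict_promotion, n=100, readers=40):
    """Hypothetical cohort of n twelve-year-olds, `readers` of whom can read.

    With strict promotion, only readers reach grade 6 and the rest are
    held back in grade 4. With automatic promotion, everyone is in grade 6.
    """
    cohort = []
    for i in range(n):
        can_read = i < readers
        grade = 6 if (can_read or not strict_promotion) else 4
        cohort.append(Child(age=12, grade=grade, can_read=can_read))
    return cohort


automatic = build_cohort(strict_promotion=False)
strict = build_cohort(strict_promotion=True)

# Grade-based comparison: only look at children sitting in sixth grade.
by_grade_automatic = share_reading(c for c in automatic if c.grade == 6)
by_grade_strict = share_reading(c for c in strict if c.grade == 6)

# Age-based comparison: look at every twelve-year-old, whatever the grade.
by_age_automatic = share_reading(c for c in automatic if c.age == 12)
by_age_strict = share_reading(c for c in strict if c.age == 12)

print(f"Grade 6 readers: automatic={by_grade_automatic:.0%}, strict={by_grade_strict:.0%}")
print(f"Age 12 readers:  automatic={by_age_automatic:.0%}, strict={by_age_strict:.0%}")
```

In this made-up setup the sixth-grade comparison shows 40% against 100%, while the age-based comparison shows 40% in both countries. Real cohorts are messier, but the mechanism is the same: the grade-based number partly measures who gets promoted, not just who has learned.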


Advocates of grade-based tests are not oblivious to this reality. Their argument is that measuring learning levels by grade promotes accountability for schools and teachers, highlighting whether they’re keeping up with the curriculum. The counter-argument is that it’d be small consolation that Kenyan sixth-graders were keeping up with the curriculum (though they’re not) if most of the sixth-grade cohort was still back in fourth grade trying to grasp basic multiplication. Age-based testing can assess whether the system is not only reaching kids who are ready for sixth grade, but also ensuring that kids get there in the first place.

Beyond these sampling issues, there are some sound reasons why serious psychometricians balk at the simple literacy tests used by ASER and Uwezo, and even more sophisticated tools like the Early Grade Reading and Math Assessments (EGRA and EGMA) promoted by USAID. Ensuring comparability across languages and curricula is difficult. ASER and EGRA have a good comeback to this critique: normal standardized tests may be impossible in their contexts. Especially in the poorest countries where reading levels are low, you can't just herd second-graders into an empty classroom, give them a number 2 pencil and a bubble form, and expect them to fill in a multiple-choice test.

But these are technical debates, subject to technical resolution. Political will is much harder to manufacture.

In an ideal scenario, the world will settle on an international testing regime that makes it possible to have national-level debates – comparing Nigeria's performance to Ghana's as well as Malaysia's – while embracing NGOs’ concern for the youngest and most disadvantaged learners. For the time being, international organizations have constructed a vast system of statistics to measure mastery of complex concepts among students in Boston and Shanghai, while making do with crude metrics of enrollment and textbooks in rural Nigeria and the slums of Dhaka. Illiteracy remains a mostly silent epidemic, and it seems unlikely that we’ll fix it before we bother to measure it. It’s time for global standardized testing.

This is one of a series of blog posts from “RISE” – the large-scale education systems research programme supported by the UK’s Department for International Development (DFID) and Australia’s Department of Foreign Affairs and Trade (DFAT). Experts from the Center for Global Development lead RISE’s research team.


CGD blog posts reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions.