“Do kids in developing countries need less reading and math skills than OECD Kids?” This question did not appear on the agenda at a three-day workshop recently organized by USAID. It was not even articulated. But the entire event—rather opaquely titled “Linking Assessments to a Global Standard with Social Moderation”—was predicated on the assumption that new global standards were needed because the definitions of basic reading and math skills used by the OECD are unattainable for many, if not most, developing countries. If that sounds horribly retrograde and paternalistic, it is. Which is why it is never openly discussed. However, new US government policy mandates that the State Department and USAID demonstrate increases in the “percent of learners who attain minimum grade-level proficiency in reading at the end of grade 2 and at the end of primary school” in countries receiving US support. With an $800 million international basic education budget on the line, there are high stakes around how “minimum grade-level proficiency” is defined and measured.
Searching for a global baseline to measure progress
Three years after the SDGs were adopted and established goals for education focused on learning rather than enrollment, there is still no solid global baseline against which to measure progress, and 100 countries still do not measure learning at all. The SDG commission mandated tracking the proportion of young people achieving at least a minimum proficiency level in reading and math at three points in the education cycle: grades 2/3, end of primary school, and end of lower secondary school. For the end of primary and lower secondary school, the UNESCO Institute for Statistics (UIS) has made heroic efforts to patch together data from many different sources to arrive at grosso modo estimates of learning levels, and they are disturbing. Over half of children globally do not achieve minimal proficiency in reading and math by the end of primary school—even while staying in school through completion. For lower secondary, UIS estimates that 6 of 10 adolescents either fail to complete the cycle or do so without attaining minimal reading and math skills.
This is patent evidence that waiting until the end of primary school to measure learning is too late. Countries need an earlier warning that their system is failing their children. The SDG commission seemingly recognized this in calling for a proficiency measure at grades 2/3. But it is not clear they thought through the technical challenges and high costs of measurement at grades 2/3 versus grade four. No major international assessment measures learning below grade four, because children are generally not ready for paper and pencil standardized tests below this age; the one exception is the Latin American regional assessment, which tests children in grade three.
EGRA and EGMA: good for early grade literacy; bad for global comparison
Then why set a measure at grade 2/3? The Commission’s formulation may have been influenced by the spread of diagnostic tools for Early Grade Reading Assessment (EGRA) and Early Grade Math Assessment (EGMA) promoted by USAID across the developing world, and especially in Africa. These tests are administered orally by examiners working one-on-one with each child, so they can be applied in grades two and three.
EGRA and EGMA are undeniable success stories; their use has already had profound and valuable impacts in many of the 100 countries that have no national learning assessments. EGRA, for example, breaks the ability to read into constituent skills that must be mastered along the way: letter recognition, word recognition, sentence recognition, oral comprehension, ability to read a text, and ability to read a text quickly; there is some evidence that reading a text “fluently” (say, at 60 words per minute) is correlated with the ability to comprehend what is read. Using EGRA to measure these skills in a systematic way has exposed problems in the methods used to teach reading in many countries. More importantly, it has inspired the development of new curriculum standards, new reading materials and textbooks, and new teacher practices in the early grades. In countries such as Kenya, which has just scaled up nationally an EGRA-inspired reform of the first two grades of primary school, the early results are pretty amazing. (Stay tuned for my forthcoming blog post on Kenya’s “reading revolution.”)
But psychometricians agree that the same properties that make EGRA and EGMA good at diagnosing early grade literacy and numeracy skills make them unstable and unsuitable for cross-country comparison and global monitoring. First, with oral tests given to individual children in sequence, it takes a long time (and substantial expense) to test a whole class. Using more evaluators working in parallel is equally expensive and even more difficult to standardize. Second, linguistic experts document large variance in the complexity of mastering letter and word recognition in different languages. Given wildly different alphabets and phonics, “fluent” reading in some scripts is 40 words per minute, not 60.
Tests that can reliably track student performance over time and make fair comparisons across settings require complete standardization of both the instrument and its administration, down to proving the psychometric equivalence of different test booklets that present the same items in a different order. This degree of standardization over time and place simply cannot be achieved with oral tests such as EGRA and EGMA. Still worse is the risk that attaching high stakes (donors’ need to show learning increases) to oral tests whose administration cannot be easily standardized or audited would sooner or later corrupt them, destroying all the good they do as low-stakes diagnostic tools.
Enter PIRLS and TIMSS
Standardized administration is achieved by the major international tests: PIRLS (which measures reading skills in grade four) and TIMSS (math and science skills in grades four and eight). Their administration costs are much lower than those of oral tests, as they are paper and pencil tests given to an entire class of students at once. Their validity in measuring what is important for students to know and be able to do by the fourth year of schooling has been refined over decades of experience and expert consultation (to eliminate cultural and linguistic biases) in the 70+ OECD and developing countries that already participate.
PIRLS defines “minimal literacy” (in grade four) as the ability to use a short text “to locate and retrieve explicitly stated information, action or ideas; make straightforward inferences about events and reasons for actions; or begin to interpret story events and central ideas.”
This is not a low standard, even though 96 percent of fourth graders in the US and 97 percent in England achieve it; in Egypt and Morocco the share is only one-third, and in South Africa, one-fourth (Spaull, 2018). But it is a highly meaningful standard: not only does it assure that students are ready to benefit from additional years in school, it demonstrates that the school system is delivering a globally relevant level of education quality.
Why is this standard not relevant for children everywhere?
Put differently, imagine a world where every child in every country truly could read for comprehension to this level in grade four. The SDG vision of a quality education for all children would have been achieved. African children who migrate with their families to Europe or the US attain this standard by grade four. Why should it not be the goal for those remaining at home?
Some argue that given the education quality challenges in many low-income countries, PIRLS would reveal close to 0 percent proficiency in reading comprehension (scores below the 400 benchmark) in many countries, and that this would be a meaningless—and politically embarrassing—exercise. But since the results are on a scale, countries would see whether they are in the 300 range, 200 range, etc. And they would have an accurate measure of their distance to a goal that all children deserve.
It is important to recognize that the alternative to accepting the OECD standards embodied in PIRLS and TIMSS—and in the words of World Bank assessment guru Marguerite Clarke, “getting on with it”—is spending time and money on consultation processes to develop alternative standards and new measurement instruments. This means global education funding diverted away from actual measurement of countries’ current performance and programs to improve it. This means yet more talk and more years—three have already passed since the SDGs were adopted—with no solid global baseline on learning.
Medium- and long-term solutions: better learning data is an investment that pays for itself
Over the medium term, it may be preferable for every region to develop and implement its own assessment, as the Latin America and Southeast Asia regions have done. In Africa, it is time to look beyond the colonial legacies that produced separate learning assessments for francophone and anglophone Africa (and left out 20 other countries) and, building on what exists, develop a single high-quality regional assessment that serves all of Africa. There are straightforward processes for making the results of regional tests comparable to PIRLS and TIMSS results and creating a solid base for global monitoring. But this will take longer and should be carried out in parallel. The need for some data on learning for every country is urgent.
UIS chief Silvia Montoya is a forceful advocate for the view that better data on learning is an investment that pays for itself. Joining PIRLS or TIMSS costs a country roughly $500,000 every four years—an expenditure of $125,000/year for each test. Compared to the estimated $5.8 billion/year that low- and middle-income countries spend on education, it is a pittance. UIS estimates that solid data on learning—gauging whether policies and programs are working, or reforms are needed—could improve education spending efficiency by 5 percent, generating $30 million/year in savings in the average country and paying for the assessments hundreds of times over.
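As a quick sanity check, the cost-benefit arithmetic can be sketched out with the figures quoted above (the $30 million/year savings is UIS’s average-country estimate; the per-test cost and four-year cycle are as stated in the text):

```python
# Back-of-envelope check of the assessment cost-benefit arithmetic.
# All figures are taken from the text; none are independent estimates.
cost_per_test_per_cycle = 500_000   # USD per test (PIRLS or TIMSS) per 4-year cycle
cycle_years = 4
tests = 2                           # a country joining both PIRLS and TIMSS

annual_cost = tests * cost_per_test_per_cycle / cycle_years   # $250,000/year total
avg_country_savings = 30_000_000    # USD/year, UIS average-country estimate

payback_ratio = avg_country_savings / annual_cost
print(f"Annual assessment cost: ${annual_cost:,.0f}")
print(f"Estimated savings cover that cost {payback_ratio:.0f} times over each year")
```

Under these figures, the projected savings cover the combined annual cost of both tests well over a hundred times each year.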
Putting ideas into action
Three years after the SDGs were signed, there are still 100 countries with no data on how much students learn, and no consensus on how to measure learning at grades 2/3. In this context, UIS is calling for pragmatic action to fill the current measurement gaps as efficiently as possible. One proposal is to accept “plus or minus one (grade)” learning data for country reporting, if the assessment used is good quality. This means that joining PIRLS and TIMSS grade four tests could substitute for a grade 2/3 learning measure.
This strategy is sound, and the US government would do well to adopt it for measuring the effectiveness of its foreign aid. No other current option can generate equally meaningful learning data for as many countries as quickly or cheaply. The technical challenge of working with very low-income, low-capacity environments cannot be dismissed. But most developing countries have in fact built their own assessment capacity through the process of working with international experts on regional or international assessments.
After three years, Silvia Montoya is right: the world’s children deserve pragmatic action now.