BLOG POST

From 1,056 Studies to 49 Candidates for Tracking Learning’s Long-Run Effects

Does raising test scores in primary school lead to better economic and health outcomes in adulthood? When we launched the first pilots for the Return to Learning Initiative—seeking to answer exactly that question—we assembled a list of about 35 education experiments to test the feasibility of identifying long-term effects of early skills gains. Now that we are taking the Initiative forward, we have gone back and screened more than 1,000 studies. After applying criteria for study design, sample size, country income, cohort age, and outcomes measured, we are left with 49 eligible studies from 23 countries (Figure 1).

Figure 1. Characteristics of the 49 studies across 23 countries

The Initiative’s policy question remains the same. Random assignment to an effective education programme provides a source of variation in early skills. The question is whether those skill gains translate into differences in adult wages, health, and living standards a decade or more later.
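One conventional way to turn this design into a number, assumed here for illustration rather than taken from the note, is the Wald (instrumental-variables) ratio. It divides the effect of random assignment on an adult outcome by its effect on early test scores. The function name and the numbers below are hypothetical and do not come from any study.

```python
def wald_estimate(reduced_form: float, first_stage: float) -> float:
    """Long-run effect per 1 SD of early skills (Wald / IV ratio).

    reduced_form: effect of random assignment on the adult outcome
    first_stage:  effect of random assignment on test scores, in SD
    """
    if first_stage == 0:
        raise ValueError("zero first stage: ratio is undefined")
    return reduced_form / first_stage


# Hypothetical numbers: a 0.25 SD test-score gain and a 0.05 log-point
# difference in adult wages between treatment and control.
per_sd = wald_estimate(reduced_form=0.05, first_stage=0.25)
print(per_sd)  # 0.2 log points per SD of early skills

# The same wage difference over a much weaker first stage gives a far
# larger ratio.
print(wald_estimate(0.05, 0.05))  # 1.0
```

Because the adult-outcome effect is divided by the first stage, a small first stage inflates both the estimate and any noise in it, which is the usual concern with weak instruments.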

No single study will provide a complete answer on its own. The original trials were designed to detect short-run effects on test scores, not long-run effects on adult outcomes, and they were conducted in a wide range of countries. After more than 15 years in most cases, samples are also likely to shrink as participants are lost to follow-up. Our strategy has therefore always been to pool estimates across multiple studies: pooling can reduce uncertainty around the overall estimate even when individual studies are imprecise. To do that, we need a well-defined pool of candidates.
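The note does not specify a pooling method. As a minimal sketch, assuming a simple fixed-effect (inverse-variance) meta-analysis, the code below shows why pooling helps: each study is weighted by the inverse of its squared standard error, and the pooled standard error shrinks as studies are added. The estimates are hypothetical. Because the trials come from different countries, a random-effects model, which adds a between-study variance term, may be more appropriate in practice.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need non-empty, equal-length inputs")
    weights = [1.0 / se ** 2 for se in std_errors]  # precise studies count more
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Four hypothetical, individually imprecise long-run estimates (SE 0.5 each).
est, se = pool_fixed_effect([0.1, 0.3, -0.05, 0.2], [0.5, 0.5, 0.5, 0.5])
print(round(est, 4), se)  # 0.1375 0.25
```

With four equally precise studies the pooled standard error is half that of any single study, which is the sense in which pooling reduces uncertainty even when each study alone is too noisy to be informative.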

In a new CGD Note, Identifying Studies for the Return to Learning Initiative: An Updated Candidate Studies Dataset, we document the search and screening process used to build that pool. The note describes how we expanded and improved the original candidate list, the criteria used to identify eligible studies, and the reasons particular studies were included or excluded. The aim is to provide a transparent basis for deciding which studies to prioritise for long-run follow-up, and a resource for other researchers.

As a snapshot of what we have found, Figure 2 shows the 20 most promising studies from which we will select our next wave of feasibility pilots. The horizontal x-axis is cohort age in June 2027; the vertical y-axis is the initial effect size on a literacy or language outcome, in standard deviations. Each marker is a separate experimental arm, and four studies appear more than once because they tested more than one design. Where a study has already been followed up (between 3 and 24 months post-intervention), the hanging marker shows the persistent treatment effect. Colour encodes dimensions of tracking feasibility, as described in the figure note.

How did we arrive at this prioritised set of studies? The short version is that we are looking for studies with an initial effect size of at least 0.1 standard deviations (a smaller effect would make random assignment a weak instrument for skills) and in which the original participants will be aged 23 or above in June 2027 (so that labour market impacts are plausibly observable). Within this set, one cluster of studies is above age 30 but has relatively small short-run effects, a second cluster of only four markers has large short-run effects, and the majority of studies fall in the 23 to 27 age band, with short-run effects of between 0.10 and 0.35 standard deviations. For the full details of how we got here (there are many more!), read the note.
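The two headline criteria can be expressed as a simple filter over treatment arms. The sketch below is hypothetical: the `Arm` fields, the use of mean age at baseline, and treating "June 2027" as month 6 of 2027 are illustrative assumptions. The note's full screen also covers study design, sample size, country income and outcomes measured, all omitted here.

```python
from dataclasses import dataclass

MIN_EFFECT_SD = 0.10          # minimum initial literacy/language effect, in SD
MIN_AGE_JUNE_2027 = 23.0      # minimum cohort age at the reference date
REF_YEAR, REF_MONTH = 2027, 6  # assumed reference point: June 2027


@dataclass
class Arm:
    """One treatment arm (hypothetical field names)."""
    study: str
    effect_sd: float             # initial effect on literacy/language, in SD
    baseline_year: int           # year the intervention began
    baseline_month: int          # month the intervention began, 1-12
    mean_age_at_baseline: float  # participants' mean age then, in years

    def age_in_june_2027(self) -> float:
        months = (REF_YEAR - self.baseline_year) * 12 + (REF_MONTH - self.baseline_month)
        return self.mean_age_at_baseline + months / 12


def passes_screen(arm: Arm) -> bool:
    """Keep arms with a large enough first stage and an old enough cohort."""
    return (arm.effect_sd >= MIN_EFFECT_SD
            and arm.age_in_june_2027() >= MIN_AGE_JUNE_2027)


arms = [
    Arm("Study A", 0.25, 2012, 6, 9.0),  # age 24.0 in June 2027: kept
    Arm("Study B", 0.06, 2008, 3, 8.0),  # effect below 0.10 SD: dropped
    Arm("Study C", 0.40, 2016, 9, 7.5),  # age 18.25: too young, dropped
]
shortlist = [a.study for a in arms if passes_screen(a)]
print(shortlist)  # ['Study A']
```

Screening at the level of arms rather than studies mirrors Figure 2, where a study that tested more than one design can contribute several markers.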

Figure 2. The most promising 20 candidates

Notes: Each marker is one treatment arm. Colour categories are as follows. Active (dark teal): five intervention arms currently being tracked. Child panel (yellow) and Repeat cross-section (light grey): the study's original sampling design, indicating what identifying data are held on participants and may be used to trace them. Pathway concern (light teal): studies in which we are less confident that the treatment effect operates only through an increase in cognitive skills, for example where school or household financial resources also changed. Infeasible (dark grey): two studies piloted in 2024 without success (explained here). Five studies have persistence measures; hanging markers show the persistent effect at the later follow-up point.

You can also jump straight to the Candidate Studies Dataset, which provides all the variables associated with each step in the screening and prioritisation process. Its sheets move, in increasing detail, from the original long list down to the most promising candidates.

Lastly, we are always on the lookout for studies that may have been overlooked. If you think there are omissions, send one of us an email and let us know!

DISCLAIMER & PERMISSIONS

CGD's publications reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions. You may use and disseminate CGD's publications under these conditions.

