How We Can Score Development Agencies on Evaluation and Learning Systems

Catherine Blampied
January 14, 2020


The capacity for donors to evaluate and learn from development activities is crucial for supporting evidence-based decision-making and, ultimately, aid effectiveness. In this blog post, we set out a new approach to scoring agencies on their evaluation and learning systems, and we are keen to get feedback on the proposed approach.

We are interested in quantitatively measuring agencies’ capability in learning and evaluation, so that we can assess donors, enable comparisons, and drive up standards as part of a wider project to update and strengthen our suite of Quality of Official Development Assistance (QuODA) measures. There is no existing dataset to directly compare the quality of bilateral and multilateral development agencies’ efforts to evaluate and learn from their aid programmes, so we devised and piloted an approach to create our own.

Evaluation, learning, and links to aid effectiveness

The connection between evaluation, learning, and aid effectiveness is simple: donors that evaluate development programs, and learn from and implement the knowledge gained, should have the greatest potential to advance aid effectiveness through informed decision-making. Evaluations also support the accountability of development spending and results achieved, providing a basis for external assessments of organisational performance (Clements, 2020).

There are lots of reasons to expect that good quality evaluations are associated with better outcomes. Studies of World Bank projects, for instance, have found a significant positive association between the quality of monitoring & evaluation for a project and its outcome rating, and that projects with an impact evaluation are less likely to experience disbursement delays. However, the presence of an evaluation does not guarantee better results, as the quality of development evaluations and the uptake of their findings and lessons for aid programming and policymaking remain highly variable. Such differences mean that there is likely variation in the quality of evaluation and learning systems across development agencies.

Assessing learning and evaluation in development agencies

The inclusion of a quantitative measure of evaluation is not completely new: earlier editions of QuODA used an indicator on "quality of evaluation policy." This involved assessing donors' evaluation policies to identify whether they ensured that evaluations were independent, that results were transparent, that donors learned from findings, and that the scope of evaluations was enshrined in policy. We build on this initial approach, adding a broader suite of sub-indicators and supplementing the assessment of evaluation systems with components on institutional learning. The new approach enhances objectivity and comparability, since we now base our scores on both qualitative and quantitative findings of existing assessments conducted by other institutions (see more below on the sources). It increases the emphasis on actual implementation and practice rather than merely stated policy, and it somewhat reduces the labour-intensity of the process (though there is still a need for CGD analysis based on qualitative source material, so a degree of subjectivity will remain).

Our proposal

We propose developing two new composite indicators to measure the quality of evaluation and learning systems respectively. These indicators will use a comparable set of sub-indicators to assess evaluation and learning systems across both bilateral and multilateral donors.

Framework, indicators and data sources

The indicators take the framework developed by the OECD Development Assistance Committee’s (DAC) Principles for Evaluation of Development Assistance, DAC Criteria for Evaluating Development Assistance, and the DAC Quality Standards for Development Evaluation as a reference for thinking about best practice in evaluation and learning systems. While there isn’t a clear consensus on what good evaluation practices look like, the OECD have invested significantly in providing, refining and updating their evaluation framework, which has been widely used by donors to inform evaluation systems even beyond DAC members.

To ensure comparability across scores for bilateral and multilateral agencies, we develop a methodology that draws from similar data on evaluation and learning systems that is regularly assessed by the DAC Peer Reviews (for bilateral donors) and the Multilateral Organisation Performance Assessment Network (MOPAN) reviews (for multilateral agencies). Chapter 6 of the DAC Peer Reviews provides a qualitative assessment of donors’ evaluation systems and institutional learning, using a framework that draws on the DAC Evaluation Principles. MOPAN’s method is “guided by” DAC’s guidance and numerically scores several aspects of multilaterals’ evaluation, learning and accountability systems under Key Performance Indicator 8 of the MOPAN 3.0 framework. For UN agencies, MOPAN data can be supplemented by periodic reviews conducted by the United Nations Evaluation Group (UNEG)/DAC, as available. Table 1 provides the list of proposed sub-indicators per indicator, and where the data will be sourced for both bilateral and multilateral donors.  

Table 1. Framework, indicators and sources for assessing Evaluation and Institutional Learning

| Area and indicator | Description | DAC | MOPAN |
| --- | --- | --- | --- |
| 1) Evaluation | | | |
| a) Policy | Evaluation policy with defined roles and responsibilities | 6.2 | Narrative |
| b) Plan & budget | Dedicated evaluation plan and budget to allow consistent coverage of activities | 6.2 | 8.1 - 4&5; 8.2 - 3 |
| c) Independence | Evaluation function is independent and impartial | 6.2 | 8.1 - 1, 3&7 |
| d) Expertise | Sufficient expertise and systems in place to ensure quality | 6.2 | 8.1 - 2&6; 8.3 |
| 2) Learning | | | |
| a) Accountability | Programme management and accountability systems ensure follow-up on recommendations and learning | 6.3 | 8.6 |
| b) Knowledge management | A knowledge management system based on results and evidence is used and there is uptake of lessons and best practices | 6.3 | 8.7 |
| c) Improvement | The donor has implemented past recommendations / made progress in areas identified as weak in the previous assessment | Annex A | Narrative |

A more detailed version of this table is available here.

Any set of indicators gives a partial view of donors’ evaluation and learning systems, but we chose these indicators to provide a reasonable minimum standard that can be applied universally and for which there is comparable information available from the two independent sources.

Scoring and aggregation

How do we get from the above sources to a quantitative score? For multilateral agencies, quantitative scores against each sub-indicator are already calculated by MOPAN as part of the assessment process; we take the relevant scores directly from MOPAN reporting and derive a final score by averaging across the sub-indicators for the evaluation and learning indicators, respectively. Where a sub-indicator is compiled from selected MOPAN micro-indicators, we take the simple average of the included micro-indicator scores.

For bilateral agencies, we will code the relevant sections of descriptive text from the DAC Peer Reviews for each sub-indicator, using the same underlying scoring framework developed by MOPAN—agencies receive a score on a scale of 1-4 where 4 represents “highly satisfactory” performance while a score of 1 is considered “highly unsatisfactory.”
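To make the arithmetic concrete, the two-step aggregation described above can be sketched in a few lines of code. This is an illustrative sketch only, not CGD's actual scoring code; the sub-indicator names follow Table 1, but all numeric scores below are hypothetical examples on the shared 1-4 scale.

```python
def average(scores):
    """Simple average of a list of scores on the 1-4 scale."""
    return sum(scores) / len(scores)

# Hypothetical agency: some sub-indicators are built from several MOPAN
# micro-indicators, which are first averaged into a sub-indicator score.
evaluation_micro_scores = {
    "policy": [3.0],                  # single score taken directly
    "plan_budget": [2.0, 3.0, 3.0],   # e.g. micro-indicators 8.1-4, 8.1-5, 8.2-3
    "independence": [4.0, 3.0, 3.0],
    "expertise": [3.0, 2.0],
}

# Step 1: simple average of micro-indicator scores per sub-indicator.
sub_scores = {name: average(micro) for name, micro in evaluation_micro_scores.items()}

# Step 2: final indicator score is the simple average across sub-indicators.
evaluation_score = average(list(sub_scores.values()))
print(round(evaluation_score, 2))
```

For bilateral agencies, the coded DAC Peer Review scores would enter the same step-2 average, since they are expressed on the same 1-4 scale.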

We tested this approach to score a small selection of bilateral and multilateral donors with different characteristics. Our conclusion from this work is that the approach is feasible across DAC and MOPAN reviews undertaken at different times.


This approach has its weaknesses; in particular, the difficulty of translating the qualitative findings of the DAC Peer Reviews, which consider each donor in its own context, into quantitative scores that compare donors and are consistent with the scores awarded to multilaterals by MOPAN. We have sought to mitigate this problem by choosing indicators that mostly reflect specific, objective aspects of practice, but we realise that this exercise will rely on a certain amount of subjective judgement. There will also be missing data points. For example, on the "improvement" indicator, not all DAC members have undergone two or more DAC Peer Reviews (Hungary does not yet have one), and not all multilaterals have undergone a MOPAN assessment.

Timeliness will also be an issue in some cases, since each donor is assessed only once every six years or so (though cross-agency assessments of evaluation systems, like one conducted in 2016, can help). While changes to evaluation and learning systems are likely to be slow-moving, we can supplement information between DAC Peer Reviews using data reported in the country profiles of the OECD's Development Cooperation Reports, which also report on donor evaluation systems and could help to capture any notable changes.

Beyond the problems of data and measurement, the underlying measures also suffer from some limitations—namely, that some of the sub-indicators measure the processes and policies that donors use rather than the quality of these policies or their suitability or implementation. This is particularly true for the sub-indicators measuring policy, plan and budget, and independence in the evaluation systems indicator. For instance, the sub-indicator measuring the presence of a policy (including roles and responsibilities) cannot capture the quality of the policy and its suitability for the donor's development management context: where donors allocate aid through multiple ministries and departments, policies that apply only to single agencies may not be best suited to the system as a whole. Similarly, while we measure the presence of a plan and budget for evaluation, we cannot tell whether the budget is sufficient to fund systematic, high-quality evaluations, nor whether the evaluation plan makes strategic sense. One option for assessing the sufficiency of budgets and plans would be to look at the share of evaluation staff relative to agencies' total staffing; however, data on aid agency staffing are notoriously underreported. Lastly, despite assessing whether donors have an independent and impartial evaluation function—in line with DAC best practices—we are unable to capture the degree of independence or impartiality of the conclusions the evaluations themselves reach.

How would your agency score and compare? Let us know.

Despite the challenges outlined above, we think this is a sound and informative way to score the quality of donors’ evaluation and learning systems, given the available data.

Do you agree? How could we improve upon the proposed approach? For example, are we capturing the most important elements of the OECD/MOPAN frameworks or are we missing any relevant data sources? Please get in touch with any comments or ideas, or if you’d like more information.

We plan to develop this measure over the coming months as part of a wider suite of aid quality measures in our Quality of ODA indicators.

*Catherine Blampied is an independent consultant.

We’re grateful for thoughts and ideas on this measure from Hetty Kovach, Caitlin McKee, and Andrew Rogerson; and for very helpful feedback from John Egan and Suzanne Steensen on an earlier draft. This proposal, and any mistakes, are those of the authors.


CGD blog posts reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions.