Generative AI Evaluation Playbook: Policy Brief

Han Sheng Chia; Tim Ohlenburg

June 08, 2026

From math tutors to farmer advisory tools, generative AI (GenAI) is rapidly expanding in low- and middle-income countries. While some evidence shows development gains, other findings point to harm. Evaluations can assess and help address these risks, but there is little agreement on what they should include. Tech teams prioritize product performance, often overlooking impact, while impact evaluators focus on outcomes but may neglect the underlying technology.

The Center for Global Development convened 30 experts across computer science, economics, gender studies, and development to bridge this gap. Experts aligned on a standard set of evaluation practices captured in the Generative AI Evaluation Playbook, released in April 2026. The practices are organized into four levels that each address a key question:

Level 1 – Model evaluation: Does the AI system perform as intended?
Level 2 – Product evaluation: Does the overall product engage and retain users?
Level 3 – User evaluation: Does the product impact users’ thoughts, feelings, and behavior towards the development outcome?
Level 4 – Impact evaluation: Does the product improve development outcomes?

An accompanying policy brief summarizes the Playbook's core concepts: what to evaluate at each level, why it matters, who should do it, and how. It also outlines Minimum Viable Evaluations (MVEs), the most basic set of practices organizations should pursue at each level.
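For teams that track evaluation plans in code, the four levels and their key questions can be written down as a small data structure. The sketch below is illustrative only: the names `Level`, `KEY_QUESTIONS`, `EvaluationPlan`, and `open_questions` are hypothetical and do not come from the Playbook, and the example does not encode any specific MVE practices.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """The four evaluation levels named in the Playbook."""
    MODEL = 1
    PRODUCT = 2
    USER = 3
    IMPACT = 4


# Key question for each level, quoted from the text above.
KEY_QUESTIONS = {
    Level.MODEL: "Does the AI system perform as intended?",
    Level.PRODUCT: "Does the overall product engage and retain users?",
    Level.USER: (
        "Does the product impact users' thoughts, feelings, and behavior "
        "towards the development outcome?"
    ),
    Level.IMPACT: "Does the product improve development outcomes?",
}


@dataclass
class EvaluationPlan:
    """Hypothetical record of which levels a team has evaluated so far."""
    product_name: str
    completed: set = field(default_factory=set)

    def remaining(self) -> list:
        # Levels not yet evaluated, in ascending order.
        return [lvl for lvl in Level if lvl not in self.completed]


def open_questions(plan: EvaluationPlan) -> list:
    """Return the key questions the plan has not yet addressed."""
    return [f"Level {lvl.value}: {KEY_QUESTIONS[lvl]}" for lvl in plan.remaining()]


if __name__ == "__main__":
    # Assumed example: a tutoring tool with only a model evaluation done.
    plan = EvaluationPlan("math tutor", completed={Level.MODEL})
    for question in open_questions(plan):
        print(question)
```

A structure like this only makes gaps visible, for example a product that has a model evaluation but no impact evaluation. What counts as an adequate evaluation at each level is defined by the Playbook itself, not by this sketch.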

Graphic showing arrows between the four levels of evaluation

Source: Generative AI Evaluation Playbook

How to use the Playbook

• Builders of GenAI tools: The primary audience includes development practitioners, engineers, product managers, data scientists, behavioral researchers, and impact evaluators. They can use the Playbook to plan evaluations and revisit it to check progress.

• Funders and policymakers: While they may not use it daily, they can reference the Playbook to define high-quality evaluations and encourage grantee adherence.

The Playbook is a living document, updated as new practices emerge. Builders of GenAI tools are encouraged to contribute amendments and case studies. Beyond detailed implementation guidance for each level, the Playbook discusses cross-cutting themes such as risk mitigation.

Read the full policy brief here.

Topics

AI for Global Development

CITATION

Sheng Chia, Han, and Tim Ohlenburg. 2026. Generative AI Evaluation Playbook: Policy Brief. Center for Global Development.

DISCLAIMER & PERMISSIONS

CGD's publications reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions. You may use and disseminate CGD's publications under these conditions.

Thumbnail image by: C. De Bode/CGIAR
