Generative AI products are popping up all around us. There’s an AI for doing your taxes and an AI for your kid’s homework. While generative AI has created significant buzz, new technologies do not immediately result in commercial success. When a new technology is developed, there is often ambiguity in how it can be commercially applied. Companies invest huge sums translating new technology into applications that customers love, with many failing in the process.
Not wanting to be left behind, public institutions like the United Nations are paying more attention. UN Secretary-General António Guterres recently stated that “AI offers the possibility of leapfrogging outdated technologies and bringing services directly to people...” To deliver the impact Guterres describes, public institutions must go through a similar process of cutting through the hype and ambiguity to develop applications that advance their missions.
How can practitioners in development and humanitarian work filter out the noise, prioritize use cases for AI, and make good bets that maximize impact for those we serve?
Start with what we already know is effective, and find ways to supercharge it with AI
One way to bring clarity to this emerging field is to look at programs that we already know work and to identify whether AI can increase their scale, speed, or quality.
Over the last 20 years, the development sector has experienced an evidence revolution, with more than 4,000 randomized impact evaluations revealing what works in international development and why. This revolution culminated in the award of the Nobel Prize in Economics to Professors Banerjee, Duflo, and Kremer for pioneering the experimental methods behind this evidence base. It has also led to the emergence of “Smart Buy” recommendations, which categorize interventions by how effective they are per dollar invested, as demonstrated by the evidence base.
This post argues that early applications of AI in our sector should start from these Smart Buys and identify ways to “supercharge” them with AI, rather than trying to dream up entirely new AI-based interventions or hoping that AI will remedy the ills of ineffective programs. AI can “supercharge” many programs by increasing their scale or reducing their cost, but common sense suggests we should not scale up programs that are ineffective.
Unfortunately, a quick scan of the UN’s AI for Good program finds multiple agencies exploring how AI can enhance categories of interventions that global evidence already indicates are routinely ineffective. For example, several of the projects involve traditional Technical and Vocational Education and Training (TVET) programs, which have generally been found to be costly and ineffective at improving earnings. Even when worker skills increase, weak labor demand in emerging economies commonly limits gains in earnings, and AI is not going to solve that problem. AI does not replace the fundamental mechanisms, or active ingredients, that make programs work.
It is important to remember that while new AI applications can delight users and go viral, virality does not equate to impact in the social sector. This is unlike the private sector, where virality and engagement are almost synonymous with success: once a product is widely used, earning a financial return through advertising or subscriptions is a natural next step for a commercial firm. Achieving social impact is more challenging. For example, a viral generative AI-based job training program may enhance skills but fail to raise livelihoods if labor demand is low. Similarly, a popular children's literacy program might keep kids glued to it but lack impact without a proven phonics curriculum.
Fortunately, the last 20 years of research have shown us many programs that are routinely effective, and have even helped us understand why. Using AI to supercharge interventions that are already effective was the central idea behind GiveDirectly’s use of AI to target cash transfers during Covid-19 and natural disasters (I previously led GiveDirectly’s early efforts to apply AI to cash programs). The organization already knew that the evidence on cash transfers was very robust. However, because targeting and screening recipients was so time-consuming, aid took too long to reach people during large-scale crises.
In 2017, responding to Hurricane Maria's impact in Puerto Rico, GiveDirectly struggled to manually estimate the extent of damage across the island's million homes, resulting in slow and incomplete assessments. Since then, advances in AI, such as algorithms developed by companies like Maxar and CrowdAI, along with Google's open-source library, have made rapid satellite-based assessments of all homes affected by a natural disaster possible. In 2023, using Google's automatic damage detection tool, GiveDirectly responded six times faster than it had in Puerto Rico in 2017. Beyond satellite imagery analysis, generative AI is pushing remote assessment a step further: early research from Stanford shows that large language models (LLMs) like ChatGPT can accurately predict the socioeconomic well-being of communities remotely.
Another intervention with a strong evidence base is known as Graduation. Graduation programs have several components and have been shown to deliver large, durable impacts on household economic well-being. In addition to giving households an asset (often a large cash transfer or livestock) or social assistance (a small monthly cash transfer or a food basket), Graduation programs often include a coaching component that provides frequent mentorship on issues such as how to plan for and invest the assets they receive.
Coaching has been shown to complement the asset transfer and social assistance, enhancing recipients’ psychosocial well-being and their ability to plan and make use of the assets they receive. Coaching can, however, be costly, and new programs are evaluating whether impacts would be as large if coaching were delivered in groups or remotely by phone instead of individually. What if AI agents powered by generative AI, together with AI voice and translation tools, could be used for coaching? A single human coach could then conduct fewer touchpoints per household, enabling her to reach more households.
Other sectors have comparable use cases. In education, an intervention called Teaching at the Right Level (TaRL) has shown remarkable results in improving learning. During Covid-19, when students could not attend school in person, Botswana experimented with delivering TaRL-based lessons over the phone, and a randomized controlled trial found sizable impacts. What if generative AI agents were used to supplement a teacher’s touchpoints? They could help answer students’ questions and even walk them through the curriculum via an AI-powered interactive voice system.
The advent of new technology has often tempted public sector practitioners to bolt it onto programs they are already running, whether or not those programs are effective on their own. The reality is that new technology often does not remedy an ineffective program, and hoping otherwise misunderstands both the promise of technology and how impact is achieved in development and humanitarian programs.
New innovations will continue to surprise us, but AI does not remove the need to understand why a program works
While I have argued that investments should focus on supercharging programs that already work, generative AI will also give rise to unexpected new products and industries. For example, generative AI technologies are not only creating new websites in seconds; they are also helping firms such as an insurance company create dynamic, personalized home pages and offers for customers. Google’s Gemini and startups like Elicit tout the ability to read thousands of academic papers over lunch and extract the results into a neat table for researchers, potentially making meta-analyses orders of magnitude faster. Academics may no longer need to hire scores of research assistants to extract data manually and can instead shift them to more sophisticated work. Products that create hyper-personalized websites or review academic papers en masse are novel inventions we have not seen before. This is only the tip of the iceberg: generative AI will certainly transform how we work and create new markets, products, and even interventions in international development that we have not yet imagined.
However, the most successful applications, while novel in form factor, will likely be premised on foundational human experiences that existed before the advent of AI, such as the desire to feel that one’s unique needs are heard and met (as with the insurance provider creating personalized offerings). This also brings new risks, as generative AI can create both customized products and personalized disinformation campaigns. Whether they offer useful products or disinformation, hyper-personalized web pages tap into our desire for individual recognition and can affect us profoundly even when the information is not beneficial.
Even when applications are novel, the success of new products often hinges on familiar underlying mechanisms. This holds true for international development and humanitarian efforts. For instance, the success of the Teaching at the Right Level program discussed above relies on phonics instruction targeted at the student’s learning level rather than their age. Applying generative AI to customize the delivery of these educational components over the phone mirrors the novel customization seen in personalized insurance websites.
Innovation is a good and transformative thing, but when trying to achieve social impact, it has to be coupled with a disciplining mechanism that ensures what we invest in is rigorously tested and shown to be effective. We should be open to novelty in the form factor of programs, but the need to have an underlying causal reason for why programs work will not go away.
AI holds immense potential for humanitarian and development programs. This blog post contends that to navigate the proliferation of new ideas, practitioners should assess AI investments based on their effectiveness at improving development and humanitarian outcomes (literacy, incomes, food security, disease prevalence, etc.). While embracing innovative applications, it's crucial not to compromise on the fundamental causal mechanisms that drive program effectiveness. Prioritizing programs with well-understood mechanisms and enhancing them with AI is likely to yield substantial impact for the world's most vulnerable.
CGD blog posts reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions.