Taking Complexity and Uncertainty Seriously: Increasing Evidence Use in the Conflict Stabilization Sector

I recently finished a detail as an Advancing Evidence-in-Policy fellow with the State Department’s Office of Foreign Assistance, where I led and supported the Global Fragility Act (GFA) interagency Monitoring, Evaluation, and Learning (MEL) group. I discussed that experience in an earlier blog post, and in this follow-up piece, I suggest ideas to increase evidence use in the conflict prevention and stabilization sector.

My suggestions are based on two core principles. First, MEL practitioners—and knowledge producers more generally—should adopt an approach that is attuned to a realistic model of the stages of the policy process and be ready to seize windows of opportunity to apply evidence. This may require navigating competing demands, recognizing tradeoffs, and making hard choices. Second, MEL practitioners should embrace methods more appropriate for situations of deep uncertainty and complexity. 

The vision of GFA strategic MEL

The GFA and associated Strategy to Prevent Conflict and Promote Stability (SPCPS) underscored the importance of data and evidence use. The SPCPS explicitly invoked the Foundations for Evidence-based Policymaking Act and emphasized two key points:

  • the primary purpose of MEL is ongoing learning, decision-making, and course corrections—which often takes a backseat to compliance and accountability.
  • there should be two levels of MEL: one focused on the overall Strategy (the SPCPS), and the other on the priority country plans.

The explicit reference to MEL of the overarching Strategy led agencies to focus on “strategic MEL,” which aims to capture collective effects at a level higher than individual activities implemented by specific agencies or bureaus. This is a notable departure from more traditional efforts to track progress and measure impact.

This ambitious vision faces immense hurdles—conceptual, psychological, bureaucratic, and logistical. MEL is most often oriented towards measuring the effects of individual foreign aid projects, and an agreed-upon template for executing this type of higher-level, decision-focused MEL doesn’t exist in the US foreign aid bureaucracy.

However, the aspirations of strategic MEL are similar to ideas developed within the United Nations Development Program’s MEL Sandbox over the past two years, which uses the phrase “systems-level” or “complexity-aware” MEL. The MEL Sandbox is built on a prodigious literature in global development about how to learn and adapt in complex systems when intended changes are at the systems level. The basic idea is the same and reflects a desire to design a learning and adaptation approach at the level of the overall policy, strategy, or system.

This well-intended vision has been challenging to implement. Relevant US agencies and departments are finalizing GFA MEL plans, at both the country and the global levels. But the process has been slow. Congress first appropriated funding for the GFA in FY 2021, and the country 10-year plans were finalized in March 2023. The MEL plans, once finalized, will need to play catch-up.

Prioritize evidence use at the critical juncture of the strategy and intervention design stage

The emphasis on designing innovative, strategic MEL systems to collect and analyze new data is laudable, but it came at a cost. A critical window of opportunity to prioritize the application of existing evidence to the design of the 10-year country strategies and associated interventions may have been missed.

While the causes of evidence adoption aren’t fully understood, evidence suggests that the stage of the policy process is an intervening variable, and that the initial design stage is a critical window of opportunity. A core finding from cognitive psychology and behavioral economics—that perceptions of “sunk costs” thwart rational decision-making—also suggests the significance of the initial design stage. Furthermore, a rigorous study of evidence adoption by city governments found that organizational inertia is the main barrier, reinforcing the theory that policy or strategy design is a critical juncture and should be fully exploited.

The MEL field typically prioritizes the implementation stage, namely developing systems to collect new data on the results of interventions during implementation. However, in the face of limited resources, it may be more critical to front-load evidence use, prioritize the initial stage of strategy and intervention design, and ensure that the selected approach is both logically sound and supported by the best available empirical evidence.

Use causal effects evidence synthesis products

MEL practitioners in the conflict sector—or other knowledge producers—could embrace the use of causal evidence synthesis products to identify promising intervention types. Evidence synthesis products aggregate results from multiple studies and are more reliable than a single research study. They are more prevalent in the health and education sectors and prioritize quantitative impact evaluations, especially randomized controlled trial evidence, but increasingly such products are available in the peacebuilding sector. For example, a 2020 synthesis of peacebuilding evidence by the International Initiative for Impact Evaluation (3ie) identified 29 completed and 5 ongoing systematic reviews, benefitting from 195 completed and 47 ongoing impact evaluations, a 150 percent increase from its review in 2015.  

MEL practitioners can also use evidence synthesis products that aggregate qualitative research. While the synthesis of qualitative research is inherently more subjective, the US Holocaust Memorial Museum has designed a valuable evidence platform about the effects of atrocity prevention interventions that incorporates qualitative research.  

The reluctance of many donors and implementers in the conflict sector to use causal effects evidence is systematic, not accidental. A typology developed by the political scientist Jeffrey Friedman in Power Without Knowledge elucidates the epistemological disagreement.

Friedman argues that there are four distinct types of knowledge relevant to policymaking:

  • Type 1: Knowledge of which social problems are real and significant.
  • Type 2: Knowledge of what is causing the significant problems.
  • Type 3: Knowledge of which technocratic actions can efficaciously solve, mitigate, or prevent the significant problems.
  • Type 4: Knowledge of the costs of efficacious solutions, including both intended and unintended costs.

Conflict practitioners have typically prioritized type 2 evidence about the causes of the problem, rather than type 3 evidence about the effects of different actions. The GFA embedded this bias: both the law and Strategy called for a locally informed analysis of the causes of violent conflict, described as a baseline analysis, to design the country 10-year plans. But none of the guidance documents—including an unclassified cable to posts about designing the 10-year plans—suggested, let alone mandated, a review of existing intervention effectiveness evidence, through platforms such as the 3ie and Holocaust Memorial Museum websites.

Evidence about the causes of conflict is, of course, essential. But, as Friedman argues, policymakers still have a responsibility to apply other evidence types. For example, if the type 2 evidence about the causes of conflict concluded that unemployment is a driver of conflict, it is still useful to review the type 3 evidence about the effects of interventions that target the reduction of unemployment as an intended outcome, and perhaps conflict reduction as a longer-term outcome.

In recent years, as this report from the United States Institute of Peace suggests, conflict practitioners appear more willing to prioritize type 3 effectiveness evidence but still resist “global” impact evaluation evidence because they argue it is not necessarily generalizable to other contexts.

I would offer two responses. First, the generalizability challenge is relevant to all social science evidence, including qualitative research, as all scientific evidence is generated in specific contexts but must be applied to new contexts. Second, as Mary Anne Bates and Rachel Glennerster have argued, this is a false choice and policymakers should ideally synthesize local evidence with global impact evidence to design programs. They offer a generalizability framework to help address this challenge.

The Haiti example

One example from the GFA illustrates how the application of causal effects evidence could have improved strategy and intervention design. The 10-year plan for Haiti suggests the US will support “hot-spot” or place-based policing. USAID obligated $3.2 million between FY 2020 and FY 2023 to a five-year citizen security program, which instead uses a community policing strategy. Specific components of community policing programs vary, but according to several evidence synthesis products, such as this 2022 State Department-funded synthesis by the National Academy of Sciences, community policing is not effective at reducing crime and violence. Moreover, the conditions under which community policing is more likely to succeed—such as a steady presence of researchers and staff on the ground—are unlikely to hold in an extremely volatile and high-risk context like Haiti.

In March 2023, the State Department’s Bureau of International Narcotics and Law Enforcement summarized and disseminated these findings in an unclassified cable, as my former colleague, Dr. Jessica Lieberman, noted in her remarks at a CGD event last year (37:00).

To be fair, Haiti was an incredibly challenging pilot case, and the community policing program represents a small share of peace and security funding in the country. Far more funding is allocated to training and equipping the Haitian National Police, but details about whether this involves hot-spot policing are not publicly available. The situation in Haiti is fluid and the ongoing crisis may prompt a re-assessment of the current strategy and existing programs.

From theories of change to structured forecasting of possible scenarios

During the strategy and program design stage, MEL practitioners typically develop theories of change and their graphical counterpart, logic models, which clarify the relationship between program activities and intended outcomes, both short and long term.

Although theories of change are useful, they tend to be linear. The challenge is that recurring conflict and violence in fragile contexts constitute a “wicked” problem characterized by extreme complexity and uncertainty. In addition, policies like the GFA that simultaneously deploy multiple foreign policy tools are complex, multicomponent interventions. As a result, implementation may produce system effects, such as unintended consequences and interaction effects. Therefore, a linear theory of change or logic model may be insufficient as a tool to anticipate results.

The analytic tools that may prove more useful for complex and uncertain contexts exist in the intelligence community, which uses structured forecasting and scenario planning methods. In the aftermath of intelligence failures about Iraq’s weapons of mass destruction, the US intelligence community embarked on a concerted effort to improve its forecasting methods through the creation of a research division—the Intelligence Advanced Research Projects Activity. One of the most important outputs of this program was cognitive debiasing training to improve forecasting accuracy among intelligence analysts. In addition, the Defense Department often uses the structured forecasting method of wargaming for strategic decision-making. USAID’s Policy Framework, which was updated last year, recommends investing in broader foresight capabilities, such as scenario-planning, red-teaming, and policy-gaming.
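As a concrete aside, forecasting accuracy of the kind the intelligence community’s tournaments sought to improve is typically measured with a proper scoring rule, most commonly the Brier score. A minimal sketch (the forecast probabilities and outcomes below are hypothetical, purely for illustration):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilistic forecasts and binary outcomes.
    0.0 is a perfect score; a constant 50/50 forecast scores 0.25."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical analyst forecasts (probability an event occurs)
# and the observed outcomes (1 = event occurred, 0 = it did not)
forecasts = [0.9, 0.2, 0.7, 0.4]
outcomes = [1, 0, 1, 1]

print(round(brier_score(forecasts, outcomes), 3))  # prints 0.125
```

Lower scores mean better calibration and discrimination; tracking analysts’ Brier scores over time is one way debiasing training can be evaluated.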

Evidence during implementation and the role of uncertainty

The application of existing evidence during the strategy design stage is a starting point, not the solution. Because of complexity, all knowledge claims are uncertain, and unintended consequences and interaction effects are likely.

The GFA and other conflict stabilization initiatives require systematic data collection and analysis during implementation to facilitate course corrections and policy pivots. This is a tough challenge, and I am not sure anyone has developed the perfect solution. However, some scholars and analysts have suggested ideas worthy of pursuit.

Monitor leading indicators, not only lagging indicators

Several peacebuilding scholars have suggested reforming donor MEL systems that prioritize short-term, quantitative metrics, focusing instead on continuously monitoring feedback from a diverse array of local stakeholders. Susanna Campbell and Séverine Autesserre, in Global Governance and Local Peace and Peaceland: Conflict Resolution and the Everyday Politics of International Intervention, respectively, argue that typical MEL systems create perverse incentives that constrain the ability of country staff to engage, learn from, and adapt to the conflict-affected contexts they are supposed to be helping.

Campbell, along with co-authors Dan Honig and Sarah Rose, suggested a GFA MEL approach in this 2020 Center for Global Development note—one that would monitor a few key strategic outcome indicators and prioritize continuous analysis and adaptation to meaningful local feedback about the activities of external peacebuilders. This type of “social listening” can increasingly be achieved with faster methods such as mobile phone surveys and sentiment analysis of social media data. The key is to capture the perspectives of a set of local stakeholders who meet predefined criteria (such as diversity and influence), whose perceptions are essential to monitor if external peacebuilding efforts are to succeed.

This approach shares common ground with the concept of using leading indicators rather than lagging indicators to forecast future economic changes and set policies—as advised by a number of economists, businesses, and investors. Leading indicators are measurable data that may correspond with a future movement or shift in the economy. Examples include customer satisfaction data, jobless claims, the consumer confidence index, and the purchasing managers’ index. In contrast, lagging indicators are measurable data based on past performance, such as sales, revenues, and profits. Local feedback from communities affected by external conflict stabilization policies is a form of customer satisfaction data and can be classified as a leading indicator.
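The social-listening idea above can be sketched crudely in code. Everything below is invented for illustration—the word lists, the feedback strings, and the trend rule; a real pipeline would rely on trained sentiment models, local-language lexicons, and properly sampled respondents:

```python
# Toy lexicon-based sentiment scorer for community feedback (illustrative only).
POSITIVE = {"safer", "improved", "trust", "helpful"}
NEGATIVE = {"afraid", "worse", "distrust", "abandoned"}

def sentiment_score(text: str) -> int:
    """+1 per positive word, -1 per negative word found in the response."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def trend(scores: list) -> str:
    """Compare the mean of the latest half of responses against the earlier half.
    Assumes at least two responses."""
    mid = len(scores) // 2
    earlier = sum(scores[:mid]) / mid
    latest = sum(scores[mid:]) / (len(scores) - mid)
    return "improving" if latest > earlier else "deteriorating or flat"

# Hypothetical survey-wave responses, earliest first
feedback = [
    "people are afraid and feel abandoned",
    "things are worse and there is distrust",
    "we feel safer and trust has improved",
    "the program has been helpful",
]
scores = [sentiment_score(f) for f in feedback]
print(trend(scores))  # prints "improving"
```

In this framing, a rising score trend across survey waves functions as a leading indicator, while a sustained decline would be a prompt for course correction well before lagging outcome metrics move.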

Learning about disconfirming evidence

A second idea is to prioritize testing a conflict stabilization strategy’s theory of change through a targeted analysis of confirming and disconfirming evidence. The Luminate Foundation, which works in the governance sector, has moved away from an emphasis on performance indicators and metrics, instead using this type of intentional learning approach. Given what we know from cognitive psychology about confirmation bias, this focus on disconfirming evidence may be a helpful debiasing strategy that can serve policymakers’ needs.

This approach aligns with some recent decision-making research that foregrounds uncertainty. Nassim Nicholas Taleb’s work emphasizes the significance of random, black swan events and the errors of naïve empiricism for decision-making. Because the most consequential events cannot be predicted, it is more important to hone alternative decision-making skills, heuristics, and methods. To design more resilient, antifragile systems, he recommends a heuristic of via negativa—a prioritization of negative knowledge, or evidence that disconfirms a hypothesis. In Taleb’s words:

We know a lot more about what is wrong than what is right, or phrased according to the fragile/robust classification, negative knowledge (what is wrong, what does not work) is more robust to error than positive knowledge (what is right, what works). So knowledge grows by subtraction much more than by addition—given what we know today might turn out to be wrong but what we know to be wrong cannot turn out to be right, at least not easily…. Rephrasing it again: since one small observation can disprove a statement, while millions can hardly confirm it, disconfirmation is more rigorous than confirmation.

When designing MEL approaches to assess conflict stabilization effects, embracing this proposed asymmetry between positive and negative evidence may be appropriate.

Skin in the game

A third idea—also proposed by Taleb—is to advocate for more decision-maker “skin in the game,” which requires that decision-makers bear some consequences of their decisions. Taleb persuasively argues that the transfer of downside risk from powerful decision-makers to the powerless has created more fragile systems.

To design a more effective evidence system in fragile contexts, Taleb’s theory would argue for more direct consequences for decision-makers. While this approach complements the local feedback analysis suggested earlier, it is difficult to imagine government officials assuming the level of downside risk (such as job loss or salary reduction) that would be sufficient to optimize decisions. Nonetheless, it seems logical that greater proximity between decision-makers and affected communities could improve decisions. The participants at this Data for Peace panel argued that bringing decision-makers closer to the evidence could help bridge the gap between evidence and action.


As agencies continue to finalize the GFA MEL plans, the end products should be considered serious efforts by dedicated professionals to grapple with some of the most vexing issues in evidence-based policymaking. Congress handed the administration a tough challenge. As researchers and policymakers increasingly confront the realities of complexity and uncertainty—not only in the conflict stabilization sector in faraway contexts but also closer to home—the GFA MEL experience will provide useful lessons for the evidence-based policy community. All stakeholders—evidence producers, policymakers, and communities impacted by policy decisions—will want to stay tuned.


CGD blog posts reflect the views of the authors, drawing on prior research and experience in their areas of expertise. CGD is a nonpartisan, independent organization and does not take institutional positions.
