The results of the three randomized controlled trials (RCTs) of medical adult male circumcision all agree. As recently reviewed by the Cochrane Collaboration, male circumcision reduces the risk that a man will become HIV infected by between 38% and 66% over a period of 24 months. Furthermore, the incidence of “adverse events” was deemed low. For an overview of the last five years of findings on male circumcision, see the UNAIDS web site on the topic here and here.

Wow!  A vaccine this efficacious would be cause for celebration.

But medical researchers distinguish between the efficacy and the effectiveness of an intervention. An intervention, they point out, can be wonderfully “efficacious” under the controlled conditions of an RCT, yet fail miserably in the typical setting for which it is designed.

Last week I had the pleasure of participating in a workshop that gathered many of those involved in planning or potentially evaluating the rollout of medical male circumcision in Botswana, Kenya, South Africa, Swaziland, Zambia and Zimbabwe. Sponsored by the Bill and Melinda Gates Foundation, the workshop in Johannesburg was partly to expose all of these policymakers and researchers to the latest efficacy information (and to a new device for male circumcision called the ShangRing) and partly to consider how the planned rollout of male circumcision in each country should be evaluated.

One view, held by some of the researchers at the meeting, is that medical efficacy has already been evaluated by a rigorous randomized design, so there would be little benefit in further rigorous evaluation during scale-up. In particular there is no need, they felt, to use HIV incidence as the endpoint of any future evaluation activity. All that is necessary is routine monitoring and operations research to determine how to deliver circumcision as efficiently as possible. They came to this conclusion because (1) they think that the efficacy is now virtually a biological certainty and (2) they think that any problems with effectiveness could be picked up by simply counting the number of circumcisions performed and the frequency of adverse events, without checking HIV incidence.

The contrary view is that the range of possible reductions in vulnerability suggested by the three existing trials, 38% to 66%, leaves room for substantial concern. In actual practice, perhaps the benefits would be even smaller than a 38% reduction in risk. For example, maybe during the rollout only those whose sexual practices are already safe would choose the intervention. If this is the case, then counting successful circumcisions without noting HIV infections would overestimate the national effectiveness of the program, and leave policymakers puzzled by the continuing momentum of the epidemic.
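To make the selection concern concrete, here is a minimal arithmetic sketch. Every number in it is hypothetical (the two risk groups, their incidence rates, and the coverage levels are assumptions chosen for illustration, not figures from the trials), and it ignores any indirect effect on onward transmission. It compares two rollouts that circumcise the same number of men but reach different men.

```python
def fraction_averted(groups, efficacy):
    """Share of new male HIV infections averted by the program.

    groups: iterable of (population_share, annual_incidence, coverage),
    where coverage is the fraction of that group circumcised.
    Direct protection only; no herd effects are modeled.
    """
    baseline = sum(share * inc for share, inc, _ in groups)
    averted = sum(share * inc * cov * efficacy for share, inc, cov in groups)
    return averted / baseline


EFFICACY = 0.60  # assumed individual-level protection (upper part of the trial range)

# Hypothetical population: 80% low-risk men, 20% high-risk men.
# Both scenarios circumcise 50% of all men.
uniform_uptake = [(0.8, 0.005, 0.5), (0.2, 0.05, 0.5)]
low_risk_only = [(0.8, 0.005, 0.625), (0.2, 0.05, 0.0)]

for label, groups in [("uniform uptake", uniform_uptake),
                      ("low-risk men only", low_risk_only)]:
    print(f"{label}: {fraction_averted(groups, EFFICACY):.1%} of infections averted")
```

Under these assumed numbers the count of circumcisions performed is identical in both scenarios, yet the share of infections averted is much smaller when uptake is concentrated among low-risk men. A monitoring system that counts only procedures could not tell the two apart.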

In a presentation on the application of the concept of statistical power to evaluating the effectiveness of interventions, Sergio Bautista and I proposed that the design of an evaluation of a program rollout can and should differ from the evaluation of the medical efficacy of the same intervention in several dimensions, each of which would inform an important policy question:

External validity

While a medical efficacy RCT, such as those done for male circumcision, is intended only to achieve internal validity, the evaluation of a large-scale rollout needs to establish both internal and external validity. Internal validity is necessary to be sure that the outcomes can be attributed to the intervention. Information on the context and conditions under which the program is rolled out is necessary for judging the external validity of the findings, so that results can be used to estimate the program’s effectiveness across the whole country.

Cost-effectiveness threshold

Since a medical efficacy RCT should and usually does ignore costs, it need only have the statistical power to reject the hypothesis that the intervention is no better than competing interventions. However, given the costs of achieving and sustaining high coverage of adult male circumcision (MC) in African countries, policymakers need to know that its effectiveness is large enough to render it cost-effective in relation to other interventions. The male circumcision planning model created by Lori Bollinger and colleagues of the Futures Institute calculates that, through the year 2015, MC will cost $1,560 per HIV infection averted at 60% effectiveness, but $4,917 per HIV infection averted at 20% effectiveness. The former figure is attractive, the latter not so much. So instead of just rejecting a null hypothesis of no effect, policymakers might be interested in rejecting the hypothesis that MC is 20% effective or less. This is a more difficult hurdle for MC to clear, but the question could potentially be answered with the large samples that are available in a full-scale rollout.
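A rough sketch of why the 20% hurdle is harder to clear: the function below approximates the person-years of follow-up needed per arm to reject a null hypothesis about effectiveness, using a standard normal approximation on the log of the incidence rate ratio. It is not the calculation from our presentation. The baseline incidence of 2 infections per 100 person-years, the one-sided 5% significance level, the 80% power, the equal-sized arms, and the function name are all assumptions made for illustration, and the formula ignores clustering by community, which a real rollout design would have to handle.

```python
from math import log
from statistics import NormalDist


def person_years_per_arm(true_eff, null_eff, control_incidence,
                         alpha=0.05, power=0.80):
    """Approximate person-years per arm to reject H0: effectiveness <= null_eff.

    Assumes Poisson-distributed infections, two arms with equal follow-up,
    a one-sided test, and Var(log rate ratio) ~ 1/E_treated + 1/E_control.
    """
    if not (0 <= null_eff < true_eff < 1):
        raise ValueError("need 0 <= null_eff < true_eff < 1")
    rr_true = 1 - true_eff   # rate ratio if the program works as hoped
    rr_null = 1 - null_eff   # rate ratio at the cost-effectiveness threshold
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    # Variance of log(RR) multiplied by person-years T per arm:
    # 1/(incidence * rr_true) + 1/incidence
    var_times_t = (1 / rr_true + 1) / control_incidence
    return z ** 2 * var_times_t / log(rr_null / rr_true) ** 2


BASELINE = 0.02  # hypothetical: 2 infections per 100 person-years in comparison areas

vs_no_effect = person_years_per_arm(0.60, 0.00, BASELINE)
vs_threshold = person_years_per_arm(0.60, 0.20, BASELINE)
print(f"H0: no effect      -> about {vs_no_effect:,.0f} person-years per arm")
print(f"H0: <=20% effective -> about {vs_threshold:,.0f} person-years per arm")
```

Under these assumptions, moving the null hypothesis from "no effect" to "20% effective or less" raises the required follow-up, because the gap between the null and the hoped-for 60% effectiveness shrinks on the log scale. The closer the true effectiveness lies to the threshold, the faster the requirement grows.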

Standard of care is a more defensible counterfactual in a rollout

Because IRB ethical standards required that the MC efficacy trials provide other HIV prevention interventions to those who did not receive MC, the RCTs may have underestimated the effectiveness that male circumcision would have in a real setting, where these other HIV prevention interventions are less accessible. An effectiveness evaluation of a rollout, by contrast, would typically compare the effect in the communities that receive the intervention first to that in the communities that receive it later. Until they receive the rolled-out MC, those in the comparison group will be getting no more than is typically available in the country. So the measured impact of MC is likely to be larger in this setting than in the RCTs.

Determinants of effectiveness

Policymakers will want to know how they can maximize the effectiveness of MC.  In the course of a full-scale rollout, there will be natural variation in various factors that influence both the supply and demand side of MC.  A selected few of these factors can be singled out for experimental variation and the rest can be studied with non-experimental methods.  Lessons on the determinants of effectiveness will help those managing the MC program, but would rarely result from efficacy evaluation studies alone.

Secondary outcomes and their determinants

Among the most important potential “spillover” effects of MC are (1) compensating risk behavior that might offset the benefits of MC; (2) infection rates among the female partners of the circumcised; (3) the effect of massive numbers of male circumcisions on the availability of and access to other types of health care; (4) the variation in unit cost of MC as health facilities first become more efficient (due to increasing scale and learning by doing) and then less so (due to decreasing returns); and (5) the reproductive rate of HIV in the whole community as MC coverage increases. The efficacy trials found cause for alarm on the first of these indicators in one of the three studies, but were unable to consider the other four issues. Effectiveness trials can hope to examine all five, and with much greater external validity than could be achieved in a small RCT.

With all the benefits of rigorous evaluation of full-scale rollout, it would be unconscionable not to undertake such studies. Years ago, the feasibility of such studies might have been questioned on many grounds: financial, methodological, political, and ethical. But despite the difficulties, these objections can no longer be sustained. Bill Savedoff reminds me that public agencies and foundations are beginning to provide substantial sums of money for rigorous impact evaluations, including through the recently created International Initiative for Impact Evaluation (3ie). Researchers have demonstrated the feasibility of evaluating large-scale programs, most dramatically with rollouts of national conditional cash transfer programs. Even the political and ethical dimensions of these evaluations have been confronted and worked out by researchers and policymakers, especially those native to the countries in question, who recognize that answering the above questions matters enough to society to justify the effort of explaining the studies to the public and protecting a rigorous evaluation design.