clear all
set more off

global data "H:\CGD work\Health\UNAIDS data"

*** Import UNAIDS data from the AidsInfo database, available at http://www.aidsinfoonline.org/.
*** Variables requested: Annual number of AIDS deaths and Estimated new HIV infections (all ages);
*** point, low, and high estimates for 2005-2011.
import excel "$data\DeathsAndIncidence_2012Report.xls", sheet("Sheet1") firstrow

*** Clean the dataset: remove the blank variable "I" and empty observations.
drop I
drop if TimePeriod == ""

destring TimePeriod, gen(year)      // year variable
gen region = substr(AreaID,1,3)     // region variable; "AFR" is Africa
encode Indicator, gen(indicator)    // UNAIDS indicator: 1 = deaths, 2 = incidence
encode Subgroup, gen(esttype)       // 1 = point estimate, 2 = low estimate, 3 = high estimate
gen imputeddata = real(DataValue)   // numeric value; missing where UNAIDS reports a range such as "<100"
gen imputed = imputeddata == .      // flag the range-coded values that are imputed below

* Replace UNAIDS range estimates with the midpoint of the implied interval,
* taking each "<X" to lie between the next-lower reporting threshold and X
* (e.g. "<200" is treated as 100-200, giving 150; "<100" is treated as 0-100, giving 50):
replace imputeddata = 50  if DataValue == "<100"
replace imputeddata = 150 if DataValue == "<200"
replace imputeddata = 350 if DataValue == "<500"
replace imputeddata = 750 if DataValue == "<1000"

* Flag African countries that UNAIDS classifies as North Africa so they are
* excluded from the Sub-Saharan Africa aggregates:
gen notssa = .
replace notssa = 1 if inlist(AreaName, "Tunisia", "Egypt", "Morocco", "Algeria", "Libya", "Sudan", ///
    "South Sudan")
replace notssa = 0 if notssa == . & region == "AFR"

* AIDS deaths for each year (point estimates, Sub-Saharan Africa).
* The columns of e(b) follow the sorted levels of year, assumed to be exactly 2005-2011:
total imputeddata if region == "AFR" & indicator == 1 & !notssa & esttype == 1, over(year)
mat T = e(b)
forval i = 1/7 {
    local y = 2004 + `i'
    sca deaths_est_`y' = T[1,`i']
}

* New infections (incidence) for each year, with the same restrictions:
total imputeddata if region == "AFR" & indicator == 2 & !notssa & esttype == 1, over(year)
mat T = e(b)
forval i = 1/7 {
    local y = 2004 + `i'
    sca incid_est_`y' = T[1,`i']
}

*** Create a dataset of the aggregate estimates to plot the AIDS epidemic graph:
clear
set obs 7
gen deaths_est = .
gen incid_est = .
gen year = 2004 + _n

* Fill each variable from the yearly scalars saved above:
local varlist deaths_est incid_est
foreach v of local varlist {
    forval i = 1/7 {
        local y = 2004 + `i'
        replace `v' = scalar(`v'_`y') in `i'
    }
}

foreach v of local varlist {    // rescale to millions
    gen `v'_mil = `v'/1000000
}

*** Create a variable for the additional persons living with HIV, i.e. the
*** difference between new infections and AIDS deaths:
gen addlplwh = incid_est - deaths_est
gen addlplwh_mil = addlplwh/1000000    // rescale to millions

*** Create the graph:
graph twoway (connected deaths_est_mil year, color(black)) ///
    (connected incid_est_mil year, color(forest_green)) ///
    (connected addlplwh_mil year, color(maroon)), ///
    ylabel(, angle(horizontal)) ytitle("Annual numbers of people" "(millions)", margin(small)) ///
    xtitle("") xlabel(2005(1)2011) legend(off) plotregion(lstyle(none)) ///
    text(1.95 2009.5 "Number of new infections", color(forest_green)) ///
    text(1.55 2006 "Number of AIDS deaths", color(black)) ///
    text(0.4 2006.2 "Number of additional persons" "living with HIV", color(maroon)) ///
    title("The number of persons living with HIV/AIDS" "continues to increase", margin(medium)) ///
    note("Source: Author's calculations based on incidence and deaths data from UNAIDS AidsInfo." ///
    "UNAIDS country estimates are aggregated for the Sub-Saharan Africa region." ///
    "The number of additional persons is the difference between new infections and AIDS deaths.", margin(medium))
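
*** Optional check (a sketch added for illustration; not part of the original analysis).
*** The title's claim that the number of people living with HIV keeps rising rests on
*** new infections exceeding AIDS deaths in every plotted year. The lines below list
*** the plotted series (in millions) and stop with an error if that condition fails
*** in any year. Note that in Stata a missing value compares greater than 0, so
*** missing values are excluded explicitly.
list year deaths_est_mil incid_est_mil addlplwh_mil, noobs abbreviate(16)
assert addlplwh > 0 & !missing(addlplwh)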