Contrary to popular opinion, there is little reliable evidence showing strong links between student achievement and teachers’ formal qualifications. On the other hand, numerous studies document the relationship between teachers’ classroom performance and student learning outcomes. Getting high-level and consistent performance from teachers in the classroom is central to improving delivery of education services. Yet the performance and effectiveness of teachers vary widely across and within education systems, and even within schools.

Why are some teachers more effective than others? To answer this question, it is necessary to observe what actually happens inside the classroom, the “black box” where the “magic” of education takes place: the transformation of schooling resources into learning outcomes. Classroom observations are a key way to capture a holistic view of teacher practices inside the classroom. Teachers’ effort—for example, how much class time to devote to teaching—largely depends on a teacher’s discretion and judgement rather than formal qualifications.

Barbara Bruns and colleagues at the World Bank have been researching teachers’ classroom practice in Latin American and Caribbean countries for the past six years using a standardized instrument called the Stallings “classroom snapshot” (technically referred to as the Stanford Research Institute Classroom Observation System), with over 18,000 different teacher observations in over 4,000 schools. The Stallings results provide a sobering picture of average teacher performance in the region, but uncover consistent patterns of highly variable teacher performance across and within schools.

While the Stallings instrument has advantages for teacher observations in relatively large samples (either for system diagnosis or program impact evaluation), it is focused on a single domain of teacher practice—classroom management. As such, it is more limited than the “gold standard” teacher observation instrument called CLASS (Classroom Assessment Scoring System), which is extensively used in the US and is beginning to be used in developing countries. The strength of the CLASS instrument is that it measures teacher performance in three key domains: classroom management, instructional support for students, and emotional support for students. This multi-dimensional measure of teacher quality has been validated as predictive of student learning results in several US studies, most notably the Measures of Effective Teaching (MET) study.

The following table summarizes some key differences between the two tools:



CLASS: High level of observer expertise and training required to produce consistent observations
Stallings: Observers achieve high inter-rater reliability with one-week training

CLASS: Material is proprietary
Stallings: Open-source instrument

CLASS: Costly to use at any scale
Stallings: Low costs suitable for large-scale samples


Despite their different rubrics and scales, Stallings and CLASS capture some overlapping dimensions of teacher practice. But how consistent are they? Bruns, along with Soledad De Gregorio and Sandy Taut, found an opportunity to research this question for a new RISE working paper.

In the domain that both instruments measure, classroom management, how well are Stallings variables correlated with those of CLASS? If the two are comparable, the simpler and less costly Stallings instrument could be a useful tool in developing country settings for gaining a basic measure of teachers’ differential performance.

The team applied both instruments to a set of 102 high-quality videos of seventh-grade math teachers in Chile and produced the first direct evidence on the comparability of the two instruments.

Data Source: Bruns, B., De Gregorio, S., & Taut, S., 2016, Measures of Effective Teaching in Developing Countries. Note: * significant with p<0.1, ** significant with p<0.05, *** significant with p<0.01


The classroom organization domain of CLASS is consistently correlated with important Stallings measures in the directions we would expect: CLASS scores on classroom organization are positively correlated with the Stallings measures of time spent on instruction and negatively correlated with time spent on classroom management. CLASS scores on classroom organization are also positively correlated with the Stallings measures of teachers’ ability to keep students engaged.
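For readers who want to see what such a comparison involves, here is a minimal sketch of a Pearson correlation between paired scores, together with the star notation from the note above. It is not the working paper's code or data: the function names, variable names, and the five sets of scores are all invented for illustration, and the p-values that feed the star labels would have to come from a separate statistical test that is not shown here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def significance_stars(p_value):
    """Map a p-value to the star notation: * p<0.1, ** p<0.05, *** p<0.01."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.1:
        return "*"
    return ""


# Hypothetical scores for five classrooms (invented, not from the study).
class_organization = [3.1, 4.0, 4.6, 5.2, 5.9]      # CLASS classroom organization
instruction_share = [0.40, 0.55, 0.60, 0.70, 0.85]  # Stallings: time on instruction
management_share = [0.35, 0.30, 0.22, 0.18, 0.10]   # Stallings: time on management

r_instruction = pearson_r(class_organization, instruction_share)
r_management = pearson_r(class_organization, management_share)

print(f"CLASS organization vs. time on instruction: r = {r_instruction:+.2f}")
print(f"CLASS organization vs. time on management:  r = {r_management:+.2f}")
```

With invented data like this, the first coefficient is positive and the second negative, which is the pattern of signs described above. The magnitudes mean nothing beyond the illustration.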

In their area of overlap, the two instruments provide consistent estimates of teachers’ effectiveness. Both instruments can clearly help school systems identify schools and teachers with outstanding and problematic performance in maximizing instructional time and engaging students. For countries where schools struggle with these issues, Stallings is a more practical and cost-effective instrument. For creating a baseline measure of system-wide performance and tracking the impact of new policies and programs, Stallings can also be useful. However, the lack of correlation between the Stallings measures and the other two CLASS domains (and even across domain scores within the CLASS instrument) suggests that for any high-stakes assessment of teachers’ individual performance—for example, for determining promotions—the Stallings instrument is insufficient and the multi-dimensional CLASS measures are needed.

The central focus of the RISE research program is understanding why learning outcomes in many developing countries remain low despite increases in access and spending. Bruns and colleagues believe that the answers lie in observing how teachers perform in the classroom and building evidence on how policies and programs actually change teacher practice. Robust measurement of classroom dynamics is essential. This working paper provides some confirmation that the Stallings instrument can be a useful tool.

The complete RISE working paper (Measures of Effective Teaching in Developing Countries) by Bruns et al. can be found on the RISE website.

This is one of a series of blog posts from “RISE”, the large-scale education systems research programme supported by the UK’s Department for International Development (DFID) and Australia’s Department of Foreign Affairs and Trade (DFAT). Experts from the Center for Global Development lead RISE’s research team.