Should schools group students by ability?
Editor’s note: This week, Advance features a guest article by Scott J. Peters, a Senior Research Scientist in the Center for School and Student Progress at NWEA, and Jonathan Plucker, the Julian C. Stanley endowed professor of talent development at Johns Hopkins University. The other elements of the newsletter were compiled by Brandon Wright, Advance’s regular writer and editor.
One of the most contentious debates in American education focuses on whether to group students into classrooms using some measure of prior achievement. Whole-class grouping by prior achievement or content mastery is most common for math instruction, less common for English and reading, and least common for other subjects; it appears to be more common in middle schools (48 percent) than in elementary schools (24 percent). The back-and-forth about grouping has become especially heated in recent years, with several high-profile states (Virginia, California) and districts (San Francisco, Seattle, New York) either eliminating grouping or considering adopting the practice for some subjects, grades, or performance levels.
The research on instructional grouping, however, is more positive than grouping critics would have us believe. Complicating the careful study of this practice are the many forms that grouping can take and the roles that grade level, subject matter, and teachers’ ability to differentiate instruction play in the effectiveness of any grouping strategy or intervention. On balance, though, we find that large-scale studies and meta-analyses of grouping show evidence of positive effects for high-performing students and little downside (and often upside) for lower-performing students.
In concept, any kind of grouping by readiness or prior content mastery is a response to American students of any given age being academically diverse in what they know and can do. In some recent work, we and our colleagues found that the typical American classroom includes students who span three to seven grade levels of achievement mastery. This translates to a fifth-grade classroom that includes students who have yet to master second-grade math content, as well as those who have already mastered eighth-grade math content.
A separate 2021 study by Blaine Pedersen et al., using international TIMSS data, showed that the typical American fourth-grade classroom includes students achieving at all four international benchmarks in math. Although that may be hard to picture, note that there are only four international benchmarks, meaning that the entire possible range of student performance is present in the typical fourth-grade classroom! This was true before the pandemic, and recent research suggests that Covid-19 has made American students even more diverse in terms of grade-level content mastery.
Some schools respond to this diversity in academic readiness by grouping students by performance level for instruction. This was the topic of a recent National Bureau of Economic Research working paper by Kate Antonovics, Sandra E. Black, Julie Berry Cullen, and Akiva Yonah Meiselman, who looked at the effects of “tracking” on Texas students from 2010 to 2019. It’s important to note that this wasn’t a study of tracking per se. Rather, it evaluated the relationship between math achievement and being exposed to classrooms that were more or less academically diverse than the school as a whole. Under a scenario where students are assigned to classes at random, every student is taught alongside students from the entire achievement distribution present in the school. What skills a student has mastered has no bearing on class placement. We make this point because there is no way to know exactly what mechanism led to some students being in more- or less-homogenous classrooms, though the authors did control for things like the number of classrooms available in a given school (i.e., if there’s only one classroom for a grade, then of course the classroom will include the full range of academic diversity).
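To make the contrast between random assignment and achievement-based grouping concrete, here is a minimal simulation sketch. The diversity measure used (average within-classroom spread divided by school-wide spread), the function names, and the simulated scores are all illustrative assumptions; they are not the measure or data used in the working paper.

```python
import random
import statistics


def diversity_ratio(classrooms):
    """Average within-classroom spread relative to school-wide spread.

    Hypothetical measure for illustration only: values near 1 mean each
    classroom is about as academically diverse as the whole school;
    values well below 1 mean classrooms are more homogeneous.
    """
    all_scores = [score for room in classrooms for score in room]
    school_sd = statistics.pstdev(all_scores)
    mean_class_sd = statistics.mean(statistics.pstdev(room) for room in classrooms)
    return mean_class_sd / school_sd


def split_into_rooms(scores, n_rooms):
    """Cut a list of scores into n_rooms equal-sized classrooms, in order."""
    size = len(scores) // n_rooms
    return [scores[i * size:(i + 1) * size] for i in range(n_rooms)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
scores = [rng.gauss(50, 10) for _ in range(100)]  # simulated prior-achievement scores

# Random assignment: shuffle students, then fill four classrooms.
random_rooms = split_into_rooms(rng.sample(scores, len(scores)), 4)

# Achievement-based grouping: sort by prior score, then fill four classrooms.
sorted_rooms = split_into_rooms(sorted(scores), 4)

print(f"random assignment ratio: {diversity_ratio(random_rooms):.2f}")
print(f"sorted grouping ratio:   {diversity_ratio(sorted_rooms):.2f}")
```

Under random assignment the ratio stays close to 1, because each classroom samples the school's full achievement range; under sorted grouping it falls well below 1. Real schools sit somewhere between these two extremes, which is the variation the study exploits.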
The researchers came to several important conclusions, in addition to confirming that this form of instructional grouping is more common in middle than elementary school. They also report that grouping by prior test scores is much more common than any kind of grouping by race/ethnicity or socioeconomic status. Although, of course, prior achievement is correlated with demographics, this finding means that most of the observed disparities in racial, ethnic, or socioeconomic representation across classes are due to achievement and not demographics. And finally, the extent of instructional grouping is correlated with the degree of academic diversity within a school grade. Schools with a wider range of academic needs within a given grade tended to group more. In fact, this remained one of the only significant predictors of schools implementing instructional grouping. Average achievement, whether the school was a magnet, the political lean of the county, and the demographic makeup of the school were all non-significant predictors in the final model.
So schools with more academic diversity tend to do more instructional grouping. But is this good? The authors measured how exposure to more- or less-grouped classrooms influenced math achievement as measured by where students fell in comparison to their peers across the state. In other words, how does being exposed to more- or less-grouped classrooms influence student academic achievement, as compared to the rest of the state, and how do those effects differ for students who start out lower achieving versus higher achieving?
The findings are somewhat complicated because the authors used multiple rigorous estimation methods and because effects were examined separately for students achieving at or below the 25th percentile and for students at or above the 75th percentile. In the end, the analyses suggest, greater exposure to instructional grouping is associated with no change in predicted math achievement for low-achieving students, but is positively predictive of upward percentile mobility for high-achieving students. In other words, when lower-achieving kids are taught in classrooms with a narrower range of the achievement distribution than is present in the entire school, it does them no harm. They do as well as if they were taught in classrooms that didn’t involve any form of ability grouping. But when higher-achieving kids are grouped, they do better.
How much better? Not a ton. A one-standard-deviation increase in grouping exposure for kids who were in the top 25 percent of math achievers in third grade predicted a 1.3 percentile-point increase in eighth-grade test scores. Instead of scoring at the 80th percentile among eighth graders in Texas, the student would score just over the 81st percentile. And although there were no similar positive effects for lower-achieving students, those students did end up in smaller classes. The authors hypothesized that this might have helped mitigate any hidden negative effects of grouping on lower-achieving students, but there’s no way to know from these data.
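The arithmetic behind that example can be sketched in a few lines. The function name and the linear scaling are our own illustrative assumptions: the study reports the 1.3-point association for a one-standard-deviation increase, and nothing here implies the relationship holds for larger or smaller changes.

```python
def predicted_percentile(baseline_percentile, sd_increase, points_per_sd=1.3):
    """Predicted eighth-grade percentile after a change in grouping exposure.

    Assumes (for illustration only) a linear relationship of 1.3 percentile
    points per standard deviation of grouping exposure, the figure reported
    for students in the top 25 percent of third-grade math achievers.
    """
    predicted = baseline_percentile + points_per_sd * sd_increase
    # Percentiles cannot fall below 0 or rise above 100.
    return max(0.0, min(100.0, predicted))


# A student who would otherwise score at the 80th percentile:
print(f"{predicted_percentile(80, 1):.1f}")
```

For the student in the example, this gives a predicted score of about the 81.3rd percentile, which is the "just over the 81st percentile" figure cited above.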
This study does not support or contradict a school’s decision to engage in more or less grouping by prior achievement. Instead, it simply shows that this kind of grouping does not appear to harm or hold back lower-achieving students, while it does seem to help higher-achieving students a bit. An implication of such findings is that higher- and lower-achieving students would become even more different in their math achievement, similar to what was seen in the recent NAEP 2022 data.
The study also raises two issues that are common in this type of research. First, flexible grouping and tracking are not the same thing. “Grouping” refers to placing students in flexible groups where membership depends on interest, subject matter, and recent performance and can change throughout the year (a mix of between- and within-class grouping); “tracking” refers to placing students in long-term, full-class, essentially permanent groups. Tracking also has a negative connotation, so the word tends to make people (ourselves included) nervous when it appears in the literature, regardless of the grouping arrangements being studied. Again, the data alone don’t tell us or the researchers what mechanisms led some students to be taught in more- or less-academically-diverse classrooms than others.
Second, what happens within the groups is of paramount importance, but most studies (the present one is no exception) do not look at the curriculum, instructional strategies, or quality of differentiation within the groupings of interest. Given the well-documented importance of these and other factors for student learning, it’s hard to examine the effects of any type of “grouping” without knowing what actually happened within each classroom. As noted earlier, we understand why these facets of education aren’t included in the present study (it’s really difficult!), but their absence is definitely a limiting factor in much of the grouping literature.
Our interpretation of these grouping/tracking studies (and other research on reforms that are primarily organizational in nature) is that they average out the role of curriculum and teachers, which are probably the most important factors. In other words, if we accept that high-quality instruction and curriculum are normally distributed within each group or track (which they probably aren’t), then those factors would balance each other out in most studies. If that is the case, then the results of these studies tell us something important (the organizational reform of grouping students by readiness for instruction appears to have small benefits for the brightest students and no negative impact on students at the lowest readiness levels), but they don’t tell us what the organizational strategies would do in the presence of, for example, pre-differentiated, prescriptive curricula with teachers skilled in differentiation (see here for an interesting, nuanced take).
Put differently, this latest study may show us the lower bound of what a form of instructional grouping can do (it didn’t hold any students back or restrict their learning), but it doesn’t show us the full scope of possible benefits. This is by no means a criticism of the present study, but rather a guide and call for future research that informs the types of grouping that can best facilitate learning for all students.
Scott J. Peters, Ph.D. is a Senior Research Scientist in the Center for School and Student Progress at NWEA.
Jonathan Plucker, Ph.D. is the Julian C. Stanley Endowed Professor of Talent Development at Johns Hopkins University.
QUOTE OF NOTE
“There is a lot we still don’t know about how schools can effectively maintain high expectations and support the academic needs of students across all achievement bands while redoubling their commitment to and investment in our most vulnerable students.”
—“Why it’s time to talk about equity and excellence,” Jessica Levin, Ed Post, September 28, 2022
THREE RECENT STUDIES TO STUDY
“The Two Sides of Cognitive Masking: A Three-Level Bayesian Meta-Analysis on Twice-Exceptionality,” by Furkan Atmaca and Mustafa Baloğlu, Gifted Child Quarterly, Volume 66, Issue 4, 2022
“We compared the Wechsler scores of individuals with twice-exceptionality (2e) and giftedness using a three-level Bayesian meta-analysis. Ninety-five effect sizes were calculated from 15 studies (n = 2,106). Results show that individuals with 2e who have learning disabilities perform lower than individuals with giftedness in Full-Scale Intelligence Quotient (FSIQ; g = −0.62), working memory (g = −0.79), and processing speed (g = −0.75). Individuals with 2e who have attention-deficit/hyperactivity disorder have a distinct profile in which only processing speed differs from individuals with giftedness (g = −0.55). Results suggest that using a single Intelligence Quotient (IQ) score in the identification process will be misleading. Moreover, IQ may mask the strengths or weaknesses of individuals with 2e.”
“Acceptability of a Preventative Coping and Connectedness Curriculum for High School Students Entering Accelerated Curricula,” by Elizabeth Shaunessy-Dedrick, Shannon M. Suldo, Lindsey O’Brennan, Robert Dedrick, Janise Parker, John Ferron, and Letty DiLeo, Journal for the Education of the Gifted, Volume 45, Issue 3, September 2022
“Students report experiencing elevated levels of academic stress while in Advanced Placement (AP) and International Baccalaureate Diploma (IBD) classes. In response, we developed a classwide, preventative coping and connectedness curriculum, which consists of 12 50-minute modules for 9th-grade students enrolled in accelerated coursework. In this pilot study, we implemented the curriculum in 2 schools and sought user feedback... Overall, all stakeholders—including students, parents, and educators—deemed the curriculum highly acceptable. Teachers, administrators, and parents rated the content and lessons as highly acceptable for addressing students’ academic stressors and development of necessary coping and strategies...”
“An analysis of the impact of school closings on gifted services: Recommendations for meeting gifted students’ needs in a post-COVID-19 world,” by Charlton Wolfgang and Daniel Snyderman, Gifted Education International, Volume 38, Issue 1, 2022
“Gifted support services were directly impacted by the COVID-19 shutdown in Spring 2020. This qualitative research study consisting of parents (n = 110) and gifted support teachers (n = 53) explored the impact on gifted students’ services and instruction. Utilizing surveys, open-ended response questions, and in-depth interviews, teachers and parents shared their thoughts and perceptions about challenge, enrichment, and students’ social-emotional health throughout the shutdown... Utilizing the data collected, a model was created to help teachers, parents, and school districts provide challenge, enrichment, and acceleration, as well as address social-emotional concerns in a virtual environment.”
WRITING WORTH READING
“NYC changes controversial high school admissions process,” New York Post, Cayla Bamberger, September 29, 2022
“Why it’s time to talk about equity and excellence,” Ed Post, Jessica Levin, September 28, 2022
“Remembering Betty Meckstroth, trailblazer in gifted education, kind and adventurous appreciator of life,” Berkeleyside, Karen and Anne Meckstroth, September 28, 2022
“Indiana teen is only student in the world to receive perfect AP Calculus exam score this past spring,” Yahoo! News, William Yuk, September 28, 2022
“$8M [Javits] grant to support underserved children [in gifted education and STEM fields], family engagement in education,” University of Hawai’i News, September 28, 2022
“Who gets to be brilliant?” K–12 Dive, Autumn A. Arnett, September 21, 2022
“Japan is hard on gifted children,” Japan Today, Michael Hoffman, September 14, 2022
“Tensions high as NYC soon starts middle and high school admissions season,” Chalkbeat New York, Reema Amin and Amy Zimmer, September 14, 2022
“Court throws out claim that selective NYC high schools discriminate against Asian American students in admissions,” NBC News, Zachary Schermele, September 9, 2022