Are AP exams getting easier?

The College Board is revising the scoring process for a gradually increasing portion of its principal revenue source, the Advanced Placement Program, and has offered two reasons for the change.

Jul 26, 2024

Three decades ago, the College Board “recentered” the SAT. Now it’s “recalibrating” Advanced Placement. Though both adjustments in these enormously influential testing programs can be justified by psychometricians, both are also probable examples of what the late Senator Daniel P. Moynihan famously termed “defining deviancy down.”

Citing Durkheim, Moynihan was referring mostly to crime that was rising across much of the country when he wrote in 1993, but his seminal essay addressed education, too. He observed that America was growing accustomed to low achievement and failing schools—this just ten years after A Nation at Risk—as educators rationalized and justified shoddy performance rather than resolving to rectify it. Sometimes they excused faltering scores by blaming parents, home situations, and poverty. Rarely did they mention the extra funding that often followed weak achievement. (“There is good money to be made out of bad schools,” Moynihan noted.) Whatever the rationale, the key point was acceptance of mediocrity, not resolution to alter the situation.

One year later, the College Board set out to deal with SAT scores, which had been slipping since the mid-1960s. Here, too, many explanations were given for the slippage, some of them based on actual evidence, such as the fact that many more students from more diverse backgrounds were taking the test. Yet all those explanations also radiated acceptance of the depressing status quo.

Yes, one can certainly contend that the College Board and ETS, which administers, scores, and analyzes the tests, are psychometric organizations charged with accurate measurement of what is, not zealous ed-reformers or wishful thinkers about what should be. But it didn’t smell quite right when they said all they were doing was making sure that the center of the score distribution returned to 500 (on the 200–800 scale) rather than the 424 it had fallen to. The reason the center had slipped was that overall student performance on the SAT had been declining for thirty years.

Here’s how Michael Winerip wrote about recentering in the New York Times in June 1994 in a piece titled “S.A.T. Increases the Average Score, by Fiat”:

The S.A.T. score of the average American high school student will soon be going up 100 points. However, that doesn’t mean that anyone is getting smarter.... The bottom score will still be 200 and the top 800, but it will be easier for everyone to get higher scores.
A 430 score on the verbal section of the S.A.T. will suddenly become a 510 under the new scoring method. A 730 verbal score will become an 800.
College Board officials know they are inviting potshots on this one. They know they are going to be accused of instantly turning a generation of Roger Marises into Babe Ruths.

He was surely right about potshots. Newsweek’s editorial about the SAT recentering was headlined “Merchants of Mediocrity.”

Fast forward to 2024, and the College Board is revising the scoring process for a gradually increasing portion of its principal revenue source, the Advanced Placement Program.

I’ve long been an ardent fan of AP, dating back to the days when it enabled me to skip my freshman year of college by obtaining credit for college-level work done in high school. (Thousands of others did likewise.) With Andrew Scanlan, I wrote a book lauding the AP program. Its “gold star” designation, conferred from many quarters including my Fordham colleagues, makes it the gold standard for demonstrating academic excellence. It’s the best thing going for high school students who are capable of high-powered learning and acceleration. (IB is great, too, as are some “honors” and “dual credit” courses, but the latter categories have nothing akin to the uniform standards and external quality control of AP.) That’s why Andrew’s and my book is titled Learning in the Fast Lane.

Though College Board insiders and attendees at the AP program’s big annual confab have known about “recalibration” for several years, to date there’s been no public announcement or explanation of the changes. (I understand they’re working on one.) It fell to test-prep superstar John Moscatiello to break the news. Here’s a bit of his lengthy revelation:

The Advanced Placement program is undergoing a radical transformation. Over the last three years, the College Board has “recalibrated” seven of its most popular AP Exams so that approximately 500,000 more AP exams will earn a 3+ score this year than they would have without recalibration. If this process continues in other exams in the coming years (as we expect it will), approximately 1,000,000 more AP Exams every year will earn a 3+ score. The end result will be a win for AP students everywhere: millions of high school students will save millions of dollars in college credits in the coming years.

Note that he calls the change “a win” for students. That’s because high scores on AP exams do bring tangible benefits in college—and in getting into college: skipping introductory courses, getting into smaller seminar-type classes, often earning actual credit toward diplomas and thus potentially speeding up graduation, and saving some tuition dollars, not to mention wowing admissions committees with what one has accomplished during high school. Thus the more 3+ scores earned by more students, the bigger the “win.”

By contrast, Ira Stoll of The Editors sees the “recalibration”

...as part of an overall trend of confusing mediocrity with excellence, and of trying to address persistent racial and economic inequality by eliminating standardized testing and merit-based distinctions rather than by improving education and expanding opportunity. It’s less complicated to just give a student a higher grade on a test than it is to do the hard work needed to make sure the student can master the material. But at some point, when tasks that really matter are on the line—a patient on an operating table, an airplane being engineered, a presidential vote being cast in a swing state—the person doing the job needs to really know how to do it.

When asked—it’s not yet on the record—College Board leaders proffer a two-part explanation for recalibrating. First, while asserting that it’s not driven by collegiate grade inflation, they also contend that it’s unfair to kids for their AP scores to be significantly lower than their grades would be in the college courses that AP is supposed to be equivalent to. (It’s that equivalence that justifies colleges awarding credit or course-skipping in response to “qualifying” AP scores.) Why ding a student with a 2 on his AP exam when the same performance in a similar course in an actual college would net him a B+ or A? The “fairness” problem is obvious—but it’s impossible not to see inflated college grades tugging AP scores in the same direction.

Note, too, how the marketplace is changing. If that same student can be sure of obtaining college credit just by passing a “dual credit” course—which probably costs him nothing, compared with $98 to sit for an AP exam—isn’t the College Board destined to lose customers over time when credit earned via AP is both pricier and iffier than available alternatives? (It’s true, AP participation has steadily risen to date—but so has dual credit.)

The second explanation AP leaders offer for recalibrating is complex, starting with how best to set standards for what sort of performance deserves what score on an AP test and how to keep those standards consistent over time. Traditionally, AP sought consistency via “common item equating,” whereby some of the same multiple-choice questions would be recycled from one year to the next, and results on those items could be analyzed in ways that yielded stable criteria for judging student work.

As the AP program has overhauled a number of its course frameworks in recent years, however, it can no longer recycle old test questions. And some of the new exams don’t even have multiple-choice items, which led to the deployment of panels of educators to try to reach consensus on what level of student performance warrants what score. But the standards generated by these panels fluctuated, sometimes quite a lot, such that the criteria applied to students weren’t consistent from year to year. This is both unfair to the students and psychometrically hard to justify—maybe even vulnerable to litigation—which launched the College Board on a quest for a stabler and more scientific process.

What they settled on—termed an “evidence-based” method—is hard to wrap one’s brain around if one isn’t a bona fide psychometrician. I’m just partway there myself (maybe you’ll fare better). Here’s how the AP Program describes it:

We use the following steps to define the knowledge and skills required to earn scores of 1, 2, 3, 4, and 5 on an AP Exam.
1. Gather data. First, we survey college and university faculty to gather data on performance of college students in comparable introductory courses. Higher ed faculty review AP Exams and provide information about exam difficulty based on a comparison to their grade-level expectations.
2. Conduct college comparability studies. In subject areas with high consistency in content across college classrooms, higher ed faculty teaching the comparable AP college course administer the AP Exam to students in their related college course. Student AP scores are correlated to their final exam and course grades.
3. Conduct standard-setting studies. During a standard-setting study, a body of data and evidence is assembled, including:
- data and evidence of AP student performance and qualifications
- higher ed faculty expectations for comparable course performance
- college student grades and academic similarities and differences between the population of students taking the subject in college and the AP population
Psychometricians utilize this assembled information to identify appropriate standards for setting AP scores that will be valid in predicting success when students are placed ahead into subsequent courses in the same discipline at a range of colleges and universities. These processes ensure that AP Exam scores achieve the “predictive validity” that has been a hallmark of the AP Program for decades. As a result of these processes, annual studies of AP student performance in college consistently find that AP students with scores of 3 or higher outperform in subsequent college coursework the comparison groups of college students who took the colleges’ own AP-equivalent course.

Because this is opaque to non-specialists, and because specialists, too, can make mistakes, any change in standard-setting and scoring methods on a high-stakes test like AP inevitably invites skepticism, especially when its practical effect seems to have been a big expansion in the number and percentage of “qualifying” scores.

Hence the storm of commentary, criticism, and curiosity that is falling on “recalibration” today—more so because everyone knows that AP is a huge revenue source for the Board, that competition is mounting, and that Advanced Placement has also been faulted on equity grounds for encouraging tens of thousands more students from all sorts of backgrounds to enter its classrooms, many of whom then get 1s and 2s on its end-of-year exams. According to this logic, justice requires some sort of adjustment so that AP students from poor and marginalized groups and weak high schools can reap the same benefits as others who have been garnering more 3s, 4s, and 5s on those end-of-year assessments. Such an adjustment implies recalibrating the scoring rather than beefing up the preparation and within-AP instruction of kids from disadvantaged circumstances.

I’m not saying the College Board is monkeying with AP scores for non-psychometric reasons. I’m saying I understand why one might suspect that they are. And until they produce a transparent explanation of why they’re making these changes, along with clear evidence of why we should have greater confidence in the new system and the resultant higher scores than in the old, we must expect allegations that they’re defining educational deviancy down.

Consider, too, what will eventually happen as colleges face hundreds of thousands more “qualifying” scores. Despite what will surely be valiant efforts by College Board lobbyists to forestall this, colleges and universities will raise the ante on what actually gets credit and become more jaundiced about which AP scores signify maximum performance in high school. The “win” enjoyed by students will gradually lose value, as it has been doing since my day, especially when it comes to bona fide graduation credit. It’s no secret that colleges resist losing tuition dollars—and the more students who seek credit upon arrival, the more loath they’ll be to confer it.

One can’t exit this topic without adding that the College Board has assembled massive data on collegiate grade inflation—which parallels what we at Fordham and others have found at the high school level. American students, both secondary and postsecondary, keep getting higher grades without actually learning more. One can’t be too surprised if the Advanced Placement program—arguably our noblest effort to encourage and reward “college-level” work during high school—is forced into a similar pattern.

“Passing” an Advanced Placement test still brings benefits today, and the AP program does not deserve to have “gold standard” replaced by “fool’s gold.” But is that gold standard a solid 24 karats or more like 18? Might it be headed toward 12 karats, like the lackluster grades given by our colleges and high schools? What I’m pretty sure of is that Moynihan’s oft-cited diagnosis of what’s weakening America is gaining another 24-karat example.

Editor’s note: This was first published by Education Next.
