My guest is Justin Snider, who teaches undergraduate writing at Columbia University and writes for The Hechinger Report, a nonprofit, nonpartisan education-news outlet based at Teachers College, Columbia University.
By Justin Snider
Tough talk on teacher accountability is all the rage this summer. Trouble is, we don't know how to handle the perverse incentives that arise the moment we place undue weight on easily manipulated exams. But that hasn't stopped a slew of education leaders from weighing in on the need to hold teachers' feet to the fire.
In the past few weeks, D.C. Schools Chancellor Michelle Rhee made headlines for firing 241 teachers, Secretary of Education Arne Duncan gave a major speech on education reform, and Race to the Top finalists were announced for Round Two, many of which agreed to overhaul their teacher evaluation and tenure systems.
Even President Barack Obama took up the theme of education, weighing in on his administration's reform agenda for three-quarters of an hour at the National Urban League Centennial Conference – although the president, who relied on teacher-union support in his election, trod carefully.
"I am 110 percent behind our teachers," Obama said. "But all I'm asking in return – as a president, as a parent, and as a citizen – is some measure of accountability. So even as we applaud teachers for their hard work, we've got to make sure we're seeing results in the classroom."
The president dismissed educators' fears that their evaluations would be based on standardized test scores alone.
"Everybody thinks that's unfair. It is unfair," Obama said. "But that's not what Race to the Top is about. What Race to the Top says is, there's nothing wrong with testing – we just need better tests...."
His remarks reflect a newfound perception that recent progress in New York schools has been mostly a mirage, and that the public trusted in tests that were flawed.
The president is right. Yes, we "just" need better tests. But creating better tests is very hard and very expensive. And in a system as vast and complex as ours, it'll be tempting to continue using tests that can be graded quickly and that don't look very different from the ones we now use.
But without a radically different approach to standardized testing in this country, we are unlikely to get different results.
Some people seem to believe, however, that we've got everything figured out already – that we can precisely measure each teacher's performance, and that our standardized tests are not just good but infallible.
In this brave new age of accountability, student scores on standardized tests are being used by some districts to decide, in whole or in part, the following: which teachers are first laid off; which teachers are fired; which teachers are rated effective or ineffective; which teachers receive bonuses, and how big those bonuses are; which principals receive bonuses, and how big those bonuses are; which students are required to repeat a year; and which students graduate from high school.
These scores also have been at the center of debates on mayoral control of schools, especially in New York City and Washington, D.C. These cities' mayors, Michael Bloomberg and Adrian Fenty, respectively, have asked voters to elect and reelect them based on how they run the schools in their cities and how their students perform.
The educational decisions now made in part on standardized test scores are neither few nor inconsequential. This is hardly about who gets a sticker for a job well done, or who gets a slap on the wrist for a student's substandard performance.
It is worth remembering, then, Campbell's Law: "the more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
In other words, when important decisions are based on a handful of numbers – like standardized test scores – the numbers soon become unreliable. The incentives to distort the numbers prove irresistible to just about everyone, from mayors seeking reelection and principals hoping for bonuses to teachers wanting to keep their jobs and students longing to graduate.
New York City provides a case in point. The public has heard for years from Mayor Bloomberg and Schools Chancellor Joel Klein that the city's schools are improving.
Bloomberg and Klein have regularly cited better student test scores as evidence of improvement – that is, higher percentages of students demonstrating "proficiency" on state exams.
But it was recently revealed that these test scores actually show something quite different: not better performances by students, but lower standards and easier-to-pass tests. The same press that dutifully reported student improvement changed its tune.
The New York Daily News titled its piece, "Big, Fat F in Schools," while The Wall Street Journal's headline read "'Hard Truth' on Education."
But what was most surprising about the coverage was that the news surprised anyone.
"You mean students haven't really gotten a lot smarter in the last two years?" some wondered.
No, they haven't. But they haven't gotten a lot dumber either. Their performance is, in fact, largely unchanged.
What changed is simply the state's definition of "proficient." The gains were merely an illusion, sleight of hand on the part of policymakers and politicians.
Mayor Bloomberg said his interpretation was that "the test is harder and more comprehensive," but, in fact, the test isn't harder or more comprehensive; it's just that the minimum passing score was increased.
The real story isn't that years of gains were erased, as The Wall Street Journal said. It's that there was no academic progress in the first place – just a lower bar for determining who was declared proficient.
The skeptics among us – those who have questioned such results for months, if not years – felt vindicated at last. But it's a shame that vindication was so long in coming.
What can we learn from the New York City example? I can think of at least four lessons.
We shouldn't get excited or depressed about short-term changes in test scores. Often they don't mean much. Long-term trends are more reliable – and therefore more meaningful. Scores on the National Assessment of Educational Progress (NAEP) going back one, two and three decades are trustworthy. An individual state's scores from last year probably aren't.
Politicians are prone to slicing and dicing scores to their advantage. This shouldn't surprise us, but neither should it silence us.
Year-to-year changes in scores are unimpressive? Look at the decade-long trend. Long-term trends show no growth? Look at the change over the past two years.
This is the game in which Michelle Rhee engaged last month when the percentage of elementary students in Washington, D.C. deemed proficient in reading and math unexpectedly dropped this year. Rhee touted instead the gains since 2007-08.
When numbers look too good to be true, they're too good to be true. This is no less true of schooling than of baseball and cycling. Seventy-three home runs in a single season? Hmm. An epic comeback in Stage 17 of the 2006 Tour de France? Hmm. Those results strained credulity because they weren't clean – and people suspected so from the start but had to wait years for confirmation.
We've seen similar things in schools. In New York City, 97 percent of elementary and middle schools earned As or Bs on the district report card last year, compared to 79 percent in 2008 and just 61 percent in 2007.
Are most schools getting dramatically better in just one or two years? Probably not. As President Obama said, "Change is hard....We won't see results overnight." We should always be wary of overnight results.
Randi Weingarten, president of the American Federation of Teachers, said in response to President Obama's speech, "There are no silver-bullet solutions for our schools."
There's only hard work, day after day and year after year, with the possibility of gradual – real and substantive – improvement. Instant, immense improvement is as elusive as Halley's Comet. It is therefore also suspect.
We remain very far from an accountability system impervious to perverse incentives. Therefore, we must be very careful in how we use student test scores in any decisions, especially those about personnel.
A new Mathematica study released by the U.S. Department of Education says that "in a typical performance measurement system, more than 1 in 4 teachers who are truly average in performance will be erroneously identified" as below average, with a similar percentage of below-average teachers not showing up as underperformers.
This should scare not just classroom teachers but anyone who believes our current data systems are infallible. They are not.
Importantly, the study also notes that more than 90 percent of the variation in student learning is due to factors beyond a teacher's control. We ignore this fact at our own peril. It does not mean that teachers don't matter, or that teachers cannot or should not be held accountable.
But it does mean that we must proceed cautiously and ask tough questions of those who believe we've finally found the holy grail to measure teacher performance.