Big, impressive study, questionable policy conclusions

A study of the impact of teachers on student success has been drawing lots of attention, including a big story in the New York Times, praise from columnist Nicholas Kristof and analysis in the blogosphere.

On the one hand, the paper by economists Raj Chetty, John Friedman and Jonah Rockoff offers new evidence that good teaching has long-lasting and far-reaching effects. This suggests that the recruitment, preparation and support of teachers should be a high priority for the nation.

But the economists also use their findings to call for rating teachers on the basis of “value-added” models, which use complex formulas to measure teachers’ impact on student test scores – and for firing teachers who don’t measure up. Annie Lowrey writes in the Times:

The authors argue that school districts should use value-added measures in evaluations, and to remove the lowest performers, despite the disruption and uncertainty involved.

“The message is to fire people sooner rather than later,” Professor Friedman said.

Professor Chetty acknowledged, “Of course there are going to be mistakes — teachers who get fired who do not deserve to get fired.” But he said that using value-added scores would lead to fewer mistakes, not more.

This is a little surprising, given that, in the study itself, they caution against that sort of policy conclusion.

“Overall, our study shows that great teachers create great value and that test score impacts are helpful in identifying such teachers. However, more work is needed to determine the best way to use VA for policy,” they write in the executive summary.

They add that two important questions must be resolved before value-added models are used to evaluate teachers. One is whether attaching high stakes to test scores will skew results so much that it undermines the accuracy of the models. The other has to do with the economic cost of firing teachers, sometimes in error – the very “mistakes” that Chetty acknowledged would happen.

Then the New York Times calls and they throw caution to the wind.

The study reportedly breaks new ground.

Reasons for caution on performance-based evaluation of teachers

If only we could give assigned reading to state legislators. At the very least, Indiana lawmakers should read these brief articles as they consider Senate Bill 1, which mandates performance-based pay for educators and makes it easier to fire teachers who get bad evaluations.

Start with this column by Rutgers education professor Bruce Baker. He explains the drawbacks of evaluating teachers on the basis of student test-score improvements, and why “getting a good rating is a statistical crap shoot” with value-added formulas for measuring teacher effectiveness.

“We may be able to estimate a statistical model that suggests that teacher effects vary widely across the education system – that teachers matter,” Baker writes. “But we would be hard-pressed to use that model to identify with any degree of certainty which individual teachers are good teachers and which are bad.”

Michael Winerip, in his “On Education” series in the New York Times, shows what happens when the dice come up snake-eyes. He writes about Stacey Isaacson, by all accounts a dedicated, hard-working English and social-studies teacher at a selective public middle school in Manhattan. Almost all her students scored proficient on state tests; her supervisors and students say she’s a wonderful teacher.

But according to the complex formula used by the New York Department of Education to measure student learning gains, Isaacson is one of the city’s worst teachers.

National Education Policy Center vs. Los Angeles Times

For all the education research and policy types out there, there’s a pretty good food fight taking place between the National Education Policy Center at the University of Colorado and the Los Angeles Times.

It concerns value-added modeling of teacher effectiveness – in particular, the Times project last summer that used test-score data and a value-added formula to give an effectiveness rating to 6,000 elementary-school teachers in the Los Angeles Unified School District.

The Times project, which was highly controversial and sent the LA teachers’ union into a frenzy, was based on a value-added analysis by Richard Buddin, a researcher with Rand Corp. The paper put LA teachers into one of five “effectiveness” categories based on their students’ test-score improvement and published a database of the teacher ratings on its website.

University of Colorado researchers Derek Briggs and Ben Domingue re-analyzed what they believed to be the same data. First they conducted a “sensitivity analysis,” which produced results casting doubt on whether Buddin’s approach measured teacher effectiveness rather than other factors that influence student achievement.

Next, they ran the data using a slightly different but arguably equally valid value-added model. The result was that about half the LA teachers ended up with a different effectiveness rating, calling into question the validity of the entire exercise.

“This study makes it clear that the LA Times and its research team have done a disservice to the teachers, students, and parents of Los Angeles,” said NEPC director Kevin Welner. “The Times owes its community a better accounting for its decision to publish the names and rankings of individual teachers when it knew or should have known that those rankings were based on a questionable analysis.”

The Times reported on the NEPC study but claimed that it “confirms the broad conclusions of a Times analysis of teacher effectiveness in the Los Angeles Unified School District” because it showed that the effectiveness of teachers varies widely and can be reasonably estimated.

More questions about basing teacher evaluations on test scores

Following up on last week’s post, here are some articles and studies about the pros and cons of using test-score data to measure the effectiveness of teachers.

The topic is timely, because Indiana Gov. Mitch Daniels and Superintendent of Public Instruction Tony Bennett want to make such data a major part of teacher evaluations. Evaluations that rely on student test scores, they say, should be used “to inform decisions about hiring, firing, professional development, compensation, placement, transfers and reductions in force.”

This is a national issue, and much is being written about assessing teacher effectiveness with “value-added” measures, which employ sophisticated statistical techniques to rate teachers at improving the test scores of students. (Indiana will apparently use a “growth model,” a less complex measure than value-added, to gauge teacher effectiveness).

Some examples:

An article in District Administration magazine provides an overview. It connects value-added analysis with issues such as merit pay and teacher retention and examines how the approach has been used in New York, Houston and Winston-Salem, N.C.

A New York Times story reveals problems with a teacher ranking system in New York City, where the school district is caught in a battle between the news media and the teachers’ union over whether value-added rankings for individual teachers should be made public.

‘Value-added’ evaluation of teachers: A flawed model?

A recent report from the Economic Policy Institute raises questions about the current push to closely tie decisions about teacher evaluation, discipline and pay to the gains that students make in standardized test scores – and, secondarily, about the value of making teacher effectiveness scores public.

The report, titled Problems with the Use of Student Test Scores to Evaluate Teachers, takes aim at “value-added” models, which rely on measures of test-score improvement from year to year and make allowances for the students’ socio-economic status and other factors.
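To make the idea concrete, here is a deliberately bare-bones sketch of what a value-added calculation looks like – not the models the EPI report examines, which are far more elaborate. It predicts each student’s current score from the prior year’s score alone (a real model would also adjust for socio-economic status and other factors), then treats a teacher’s “effect” as the average amount by which that teacher’s students beat or miss the prediction:

```python
# Illustrative sketch only -- not any district's actual model.
# "Value-added" in miniature: regress this year's score on last
# year's score, then average each teacher's residuals.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def value_added(records):
    """records: list of (teacher, prior_score, current_score)."""
    xs = [r[1] for r in records]
    ys = [r[2] for r in records]
    a, b = fit_line(xs, ys)
    residuals = {}
    for teacher, prior, current in records:
        # Residual = actual score minus the score the line predicts.
        residuals.setdefault(teacher, []).append(current - (a + b * prior))
    # A teacher's "effect" is the mean residual of his or her students.
    return {t: sum(rs) / len(rs) for t, rs in residuals.items()}
```

Everything the critics worry about lives in what this sketch leaves out: which covariates get adjusted for, how students are assigned to teachers, and how much of the residual is really noise.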

The Indiana Department of Education’s “growth model” for measuring student and teacher performance appears to be sort of a poor cousin to a value-added model. It compares a student’s one-year growth in test scores with that of other students who started at the same place; but it doesn’t adjust for non-classroom factors that might influence how well kids perform.
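The growth-model idea described above can be sketched in a few lines – again a hypothetical toy, not Indiana’s actual formula. Each student’s one-year gain is ranked only against students who started at the same score, with no adjustment for outside-the-classroom factors:

```python
# Hypothetical sketch of a peer-comparison "growth model" --
# not the Indiana Department of Education's actual method.

from collections import defaultdict

def growth_percentiles(records):
    """records: list of (student, prior_score, current_score).
    Returns each student's percentile rank of gain among the
    peers who started at the same prior score."""
    peers = defaultdict(list)
    for _, prior, current in records:
        peers[prior].append(current - prior)
    out = {}
    for student, prior, current in records:
        gains = peers[prior]
        gain = current - prior
        # Percentile = share of same-start peers with a smaller gain.
        below = sum(1 for g in gains if g < gain)
        out[student] = 100.0 * below / len(gains)
    return out
```

Note that a student with no same-start peers sits at the 0th percentile by default – one small example of how mechanical choices in these formulas can produce odd individual results.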

The authors of the EPI report are a crew of heavy hitters in the world of education policy and research. They include Linda Darling-Hammond, a well-known education researcher at Stanford; Diane Ravitch, who was an assistant secretary of education in the first Bush Administration; and the institute’s Richard Rothstein, a former national education columnist with the New York Times and the author of several books on student achievement.

Citing studies by the National Research Council, Educational Testing Service and others, they argue that value-added modeling produces results that are too unstable and inconsistent for high-stakes decisions about whether teachers will be fired or promoted. Teachers who are effective in one year, according to value-added growth data, may appear to be ineffective the next year, and vice versa.
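The instability point is easy to demonstrate with a toy simulation – my own illustration, not drawn from the EPI report or the studies it cites. Give every teacher a fixed “true” effect, add yearly measurement noise, sort teachers into quintiles each year, and count how many land in a different quintile the next year:

```python
# Toy simulation of year-to-year rating instability.
# Assumes normally distributed true effects and noise; the
# numbers are illustrative, not estimates from any real study.

import random

def simulate(n_teachers=1000, noise=1.0, seed=1):
    rng = random.Random(seed)
    true_effects = [rng.gauss(0, 1) for _ in range(n_teachers)]

    def yearly_quintiles():
        # Observed score = true effect + this year's measurement noise.
        scores = [(t + rng.gauss(0, noise), i)
                  for i, t in enumerate(true_effects)]
        scores.sort()
        quintile = [0] * n_teachers
        for rank, (_, i) in enumerate(scores):
            quintile[i] = rank * 5 // n_teachers  # quintiles 0..4
        return quintile

    year1, year2 = yearly_quintiles(), yearly_quintiles()
    moved = sum(a != b for a, b in zip(year1, year2))
    return moved / n_teachers  # share of teachers who changed quintile
```

When the noise is as large as the spread in true effects – roughly what the stability studies suggest for single-year estimates – a majority of teachers typically change quintile from one year to the next, even though nothing about their teaching changed.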