...“Why would anyone work here with so much at stake?” one teacher asked me recently. Her rating went from “Effective” to “Needs Improvement” because her VAM score was calculated from a small sample: she works in a high-poverty school with tremendous student mobility, so her stability group is small.
The formula for evaluating teachers is complex. Not only is it intricate, it can be unfair to teachers in some locations and to teachers who teach ESE students.
Previously in our district, we had an evaluation system that was a joke; it was horrendous. I discussed it frequently because it was so bad, almost as bad as a pass/fail civil service evaluation. Everybody is great, everybody wins, you know the type... It was terrible, it did nobody any good, and it needed to be scrapped. Eventually the district put together a much better system that was much more objective.
Recently the state mandated that student test data become a component of teachers’ evaluations, an idea I strongly support if it is done fairly and correctly. Under this scenario, not only is the teacher evaluated on the test scores of the students he/she teaches, but this data also has huge consequences and can significantly impact a teacher’s overall rating.
In some cases, the addition of the test score data (or VAM data) can take a “highly effective” or “effective” teacher all the way down to “needs improvement” or “unsatisfactory”! (Under state law now, two consecutive “unsatisfactory” evaluations can lead to a teacher’s removal from the profession, so the stakes could not be higher.)
So how can this happen? I was wondering that too, so I had a long conversation with the district’s director of evaluation services to understand how the process works. Here it is in a nutshell.
A teacher in Escambia County has two parts to his/her evaluation; each part is weighted at 50%.
--Part 1 is generated at the individual school-site level. It is built on a multi-part evaluation modeled after the Charlotte Danielson framework, which assigns point values to various teaching attributes: pedagogy, instructional delivery, classroom management, after-school participation, continuing professional development, and other school-based observations. The Danielson model is very comprehensive, very thorough, and very good.
--Part 2 is generated via test scores. According to the evaluation services department, the state looks at individual students and has developed an algorithm that accounts for historical achievement, absenteeism, previous test scores, and other socioeconomic factors unique to each student. The state’s algorithm projects what academic progress a student should attain in a year, and compares actual student outcomes to this projection to assign a “score” to the teacher. The student achievement scores that make up this 50% of the evaluation are built using only the students the teacher had in his/her classroom the previous year. This group of students is referred to as the teacher’s “stability group.” The state determines this group by mandating that the local school district match students to teachers at three points during the year: October, February, and again in May. The district then publishes the list and notifies teachers by “flagging” the students on the list who are in each individual teacher’s stability group. Pretty complex, right?
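To make the Part 2 idea concrete, here is a rough sketch of how a projection-versus-actual score might be computed for one stability group. This is not the state’s algorithm, which I have not seen. It simply assumes the teacher’s score is the average gap between each student’s actual and projected progress, and the function name, field names, and numbers are all made up for illustration.

```python
from statistics import mean


def stability_group_score(students):
    """Average gap between actual and projected progress.

    ASSUMPTION: the real state formula is not public in this post;
    a plain average of (actual - projected) is used only to illustrate.
    """
    if not students:
        raise ValueError("stability group is empty")
    return mean(s["actual"] - s["projected"] for s in students)


# Hypothetical three-student stability group (numbers are invented).
group = [
    {"projected": 12.0, "actual": 14.0},  # beat the projection by 2
    {"projected": 10.0, "actual": 9.0},   # fell short by 1
    {"projected": 15.0, "actual": 15.0},  # exactly on target
]

score = stability_group_score(group)
print(f"Illustrative teacher score: {score:.2f}")
```

A positive number would mean the group, on average, beat its projections; a negative number would mean it fell short.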
But here’s where it gets even more complex.
In previous years, if a teacher did not have a large stability group, the teacher was assigned a school-wide average. The utilization of a school-wide average ameliorated the negative effect of small sample sizes for some ESE teachers and for teachers in some schools where there is high student “mobility” (inner-city schools).
This year, however, only the students that the individual teacher taught were used, even if in some cases that resulted in a sample size as small as 12, 10, or even 5 students. A small sample size can skew results badly, which is why surveys with extremely small sample sizes are generally considered unreliable.
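For readers who want to see why group size matters, the standard statistical rule of thumb is that the uncertainty in an average shrinks with the square root of the number of students. The sketch below applies that textbook formula to the group sizes mentioned above. The spread of 10 points between students is an assumed figure chosen only for illustration, not a number from the district.

```python
import math


def standard_error(spread, n):
    """Textbook standard error of a mean: spread / sqrt(n)."""
    if n <= 0:
        raise ValueError("group size must be positive")
    return spread / math.sqrt(n)


ASSUMED_SPREAD = 10.0  # hypothetical student-to-student variation, in points

for n in (5, 10, 12, 20):
    se = standard_error(ASSUMED_SPREAD, n)
    print(f"group of {n:2d}: average is uncertain by about +/- {se:.2f} points")
```

Under that assumption, the average for a group of 5 is twice as uncertain as the average for a group of 20, before any other factor is considered.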
And here’s where it gets worse.
Teachers in suburban schools with good student attendance and decent-sized stability groups (18-22) will not be affected the same way when the stability group scores are added to the overall evaluation; one or two students having a bad test day, or not reaching the target, won’t sink a teacher out in suburbia.
Not so in the inner-city schools.
We now use only the stability group of students for teachers in inner-city schools, which all but guarantees small sample sizes for calculating the VAM score. Add to this the tendency for students in such environments to be frequently absent or tardy and to receive very little support from home, and you suddenly have the potential for a handful of academically struggling students to exert a huge impact on the evaluations of the teachers who are trying to help them. If just one student in a small group tanks on test day, that can badly skew the results for the whole group. Imagine what happens if all the students do badly and the sample size is small.
And that’s how an ESE teacher with a dangerously small “stability group,” even one from a high-performing elementary school, can go from “highly effective” to “unsatisfactory.”
It’s how a teacher from an inner-city school (with constant student churn) who is rated “highly effective” under the Danielson framework alone can be bumped all the way down to “needs improvement” when the VAM scores are added in.
So it is somewhat surprising that very few teachers, percentage-wise, have contacted the district about their stability groups as they’ve been notified. I think they will pay closer attention to these notifications next year.
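The arithmetic behind that “one student” problem is simple enough to sketch. Assuming, purely for illustration, that a teacher’s score is a plain average over the stability group, one student who misses the target by 30 points pulls the average down by 30 divided by the group size. The function name and the 30-point shortfall are hypothetical.

```python
def shift_from_one_bad_score(group_size, shortfall):
    """How far one student's shortfall moves a plain group average.

    ASSUMPTION: the score is a simple average; the state's actual
    formula may weight students differently.
    """
    if group_size <= 0:
        raise ValueError("group size must be positive")
    return -shortfall / group_size


SHORTFALL = 30.0  # hypothetical: one student lands 30 points under target

small = shift_from_one_bad_score(5, SHORTFALL)    # inner-city sized group
large = shift_from_one_bad_score(20, SHORTFALL)   # suburban sized group

print(f"group of 5:  average moves {small:+.1f} points")
print(f"group of 20: average moves {large:+.1f} points")
```

The same bad test day costs the teacher with 5 students four times as much as the teacher with 20.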
I just hope that they don’t disproportionately concentrate teaching attention on the students in their individually flagged stability groups to the exclusion of the other students in their classroom; I’d like to think no teacher would do this, but could the way we are evaluating teachers drive them to do this? That would be a concern.
Or what about this one: encouraging some students not to come to school on test day, based upon some notion of how they might do if they came and took the test. I hate to think this would ever happen. But I know the potential is there when the stakes are high...
Unfortunately, this method of evaluating teachers is yet another stressor on the teachers who already work in challenging schools, and it needs to be fixed.
First and foremost, these teachers need to be paid an additional stipend to work in this environment, given all the dysfunction, discipline issues, social issues, and now these uneven testing machinations. If we don’t pay them more, we are going to see even higher levels of teacher churn out of these schools than we already see. It will be bad.
“Why would anyone work here with so much at stake?” one teacher asked me recently, a teacher whose rating went from “Effective” to “Needs Improvement” based upon a small sample size as described above. This teacher is devastated, and she’s actively looking to escape from this school...
If this anomaly in the evaluations is not addressed, I don’t know how I could disagree with such a teacher in the inner city; I don’t know how I could disagree with her assessment of the situation. Why would she stay?
Their careers are at stake, so it is understandable that they would want to work where the deck is not so completely stacked against them!