Tuesday, March 18, 2008

Audit Shows CATS Assessment Challenges Persist

Martin Cothran of the Family Foundation has challenged the reliability of including writing portfolios in the CATS accountability process. In doing so, he points to a persistent problem.

His claim focuses on the soft underbelly of the CATS assessment, an accountability process that relies on reasonable minds to agree on what constitutes good writing. It's certainly not a perfect science.

In a press release posted on his blog, vere loqui, Cothran reports that "closely held" 2005-2006 audit results show Kentucky's frustrations with inter-rater reliability when it comes to assessing writing portfolios.

“The audit seems to suggest,” said Cothran, “that about 75 percent of the portfolios ranked as ‘distinguished’ in 2005-2006 were graded too high, and almost half of portfolios rated ‘proficient’ were given higher grades than they deserved...”

The Family Foundation supports the use of portfolios for instructional rather than assessment purposes. It also supports Senate Bill 1, which proposes to test writing mechanics without asking students to write.

Cothran's suggestion that the audit reports are "closely held" seems to lack justification. KSN&C asked KDE for the current 2006-2007 report this morning and received it within an hour, without objection or inquiry. By law, KDE must audit portfolios annually to monitor the degree of inter-rater reliability, and the report is a public record.

There is broad agreement that writing is an important skill. But accurately assessing such a complex skill is challenging at best. It is unlikely that such an assessment would ever produce reliability numbers that rival those of, say, mathematics exams. It's the nature of the beast.

And, as Cothran accurately points out, most of the difficulty lies in scoring the best portfolios.

KDE spokeswoman Lisa Gross told KSN&C,

As students become more refined writers, it is sometimes more difficult for two scorers to agree completely on their ability levels. The complexity of their work can cause a score to be "on the edge," meaning that another scorer could consider the work to be of slightly higher or slightly lower quality...What the overall audit results tell us is that we need to enhance scorer training and ensure that everyone is comfortable with the analytic model.

The results, and human nature, might also imply that close calls get bumped up. The only way to even approach reliability is through a carefully described process.

Since the 2005-2006 audit Cothran refers to, the scoring methodology for writing has changed from holistic to analytic scoring. KDE says this was done to provide more feedback on student performance at the local level. Each piece within a portfolio was assigned scores in the subdomains of content, structure, and conventions, and those subdomain scores were summed to produce a raw score for each of the 10,000 portfolios audited.
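For readers curious about the arithmetic, here is a minimal sketch of that analytic approach. The three-subdomain structure comes from KDE's description above; the 0-4 point scale and the sample scores are hypothetical and used only for illustration.

```
# A minimal sketch of analytic scoring as described above: each piece in a
# portfolio receives a score in three subdomains, and the subdomain scores
# are summed into a raw score. The 0-4 scale and the sample scores below are
# hypothetical illustrations, not values from KDE's actual rubric.

SUBDOMAINS = ("content", "structure", "conventions")

def piece_raw_score(piece_scores):
    """Sum one piece's subdomain scores (content + structure + conventions)."""
    return sum(piece_scores[s] for s in SUBDOMAINS)

def portfolio_raw_score(pieces):
    """Sum the raw scores of every piece in a portfolio."""
    return sum(piece_raw_score(p) for p in pieces)

# Example: a three-piece portfolio with made-up subdomain scores (0-4 each).
portfolio = [
    {"content": 3, "structure": 2, "conventions": 3},
    {"content": 4, "structure": 3, "conventions": 3},
    {"content": 2, "structure": 2, "conventions": 2},
]
print(portfolio_raw_score(portfolio))  # 24
```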

It should be noted that the new audit did not affect school scores; it provided research data. Audited schools received reports regarding their scoring, but their scores were not changed.

The scoring methodology included 100% double-blind scoring, in which any discrepant scores were overridden by an independent third read. Quality control measures included "read-behinds" on all readers at a rate of 10%, and scoring supervisors conducted read-behinds of the read-behinds, also at a rate of 10%.
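To make the adjudication step concrete, here is a minimal sketch of that double-blind flow. The report excerpt above does not spell out how "discrepant" is defined or how agreeing pairs are resolved, so the more-than-one-point threshold and the averaging below are assumptions for illustration only.

```
# A minimal sketch of the double-blind flow described above: two readers score
# each portfolio independently, and discrepant pairs are overridden by an
# independent third read. The "more than one point apart" definition of
# discrepant and the averaging of agreeing pairs are assumptions, not KDE's
# published rules.

def score_of_record(reader_one, reader_two, third_read):
    """Return the audit score for one portfolio."""
    if abs(reader_one - reader_two) <= 1:        # exact or adjacent agreement
        return (reader_one + reader_two) / 2     # assumed resolution: average
    return third_read()                          # discrepant: third read overrides

# Example: readers disagree by two points, so a hypothetical third reader decides.
print(score_of_record(2, 4, third_read=lambda: 3))  # 3
```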

KDE reports the overall percentage of inter-rater reliability with a tolerance of plus or minus one point, what it calls exact or adjacent scoring. The results for 2006-2007:



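For readers who want to see what "exact or adjacent" means in practice, here is a minimal sketch of how such a percentage can be computed from pairs of independent reader scores. The score pairs below are made up for illustration and are not figures from the KDE audit.

```
# A minimal sketch of the "exact or adjacent" reliability figure: the share of
# double-scored portfolios whose two independent scores are identical (exact)
# or within one point of each other (adjacent). The pairs below are made-up
# examples, not data from the KDE audit.

def exact_or_adjacent_rate(score_pairs):
    """Percentage of (reader_one, reader_two) pairs within +/- 1 point."""
    within = sum(1 for a, b in score_pairs if abs(a - b) <= 1)
    return 100.0 * within / len(score_pairs)

pairs = [(3, 3), (4, 3), (2, 4), (1, 1), (4, 2)]    # hypothetical reader pairs
print(f"{exact_or_adjacent_rate(pairs):.1f}%")      # 60.0%
```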