An application of a many-faceted Rasch model to writing test analysis
by Yuji Nakamura (Tokyo Keizai University)
"Finding a rating method that is practical, reliable, and statistically well-founded is a problem for many writing teachers."
Task: | Each student wrote an in-class composition on a single topic of his or her own choosing within a 40-minute period. |
Raters: | 4 raters (Rater A, Rater D, Rater M, Rater Y) |
Pairs: | 6 pairs (AD, AM, AY, DM, DY, MY) |
Rater Coverage: | Each student composition was evaluated by two raters. |
Note 1: | Discourse = Logicality, Fluency = Ease of reading (based on word length and accuracy), Content = Originality, Overall = A holistic, general impression |
Note 2: | Sentence length was not measured per se in this study, though it might have affected raters' judgments. |
Rating scale: | 4-point scale (1 = poor, 2, 3, 4 = good) |
Subjects: | 32 Japanese university undergraduate students |
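The six rater pairs listed above are simply every way of choosing two of the four raters. A minimal sketch in Python (the variable names are illustrative and not part of the study) enumerates them and confirms that each rater belongs to three pairs, which is what links all four raters into one connected rating design:

```python
from itertools import combinations

# The four raters named in the study design.
raters = ["A", "D", "M", "Y"]

# Every unordered pair of distinct raters, written as a two-letter label.
pairs = ["".join(pair) for pair in combinations(raters, 2)]
print(pairs)

# Number of pairs each rater takes part in.
pairs_per_rater = {r: sum(r in p for p in pairs) for r in raters}
print(pairs_per_rater)
```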
Columns 1-4: Data; columns 5-7: Quality Control; columns 8-9: Step Calibrations
Category Score | Counts Used | % | Cum. % | Average Measure | Expected Measure | Outfit Mean Square | Step Measure | Step S.E. |
1 | 12 | 3% | 3% | -1.42 | -1.21 | .8 | | |
2 | 94 | 21% | 24% | -.40 | -.33 | .9 | -2.85 | .31 |
3 | 232 | 52% | 75% | 1.37 | 1.33 | 1.0 | -.49 | .14 |
4 | 110 | 25% | 100% | 3.80 | 3.82 | 1.1 | 3.34 | .15 |
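To see what the step calibrations in the table imply, the sketch below computes category probabilities under a Rasch rating scale model, in which the log-odds of category k over category k-1 equal the combined logit minus the k-th step calibration. This is an illustration under assumptions: the function name and the variable `theta` (standing for the combined logit of student ability, rater, and criterion effects) are not taken from the paper, and the three step values are copied from the table above.

```python
import math

def category_probabilities(theta, thresholds):
    """Rating scale model probabilities for categories 1..len(thresholds)+1.

    theta      -- combined logit for one rating (hypothetical input)
    thresholds -- step calibrations between adjacent categories
    """
    # Cumulative log-numerators: category 1 is the reference (0.0),
    # each higher category adds (theta - step calibration).
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]

# Step calibrations reported for categories 2, 3, and 4.
STEP_CALIBRATIONS = [-2.85, -0.49, 3.34]

# Evaluate at the average measures reported for each category.
for theta in (-1.42, -0.40, 1.37, 3.80):
    probs = category_probabilities(theta, STEP_CALIBRATIONS)
    print(theta, [round(p, 2) for p in probs])
```

Because the step calibrations increase from category to category, each category is the most probable one over some range of the logit scale, and two adjacent categories are equally probable exactly at the step calibration between them.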
Sub-Question 3: What was the degree of rater severity/leniency?

Observed Score | Observed Count | Observed Average | Fair-M Average | Measure | Model S.E. | Infit MnSq | Z Std | Outfit MnSq | Z Std | ? | N raters |
309 | 112 | 2.8 | 2.85 | -.76 | .17 | .9 | 0 | 1.0 | 0 | | Rater A |
314 | 112 | 2.8 | 2.79 | -.99 | .17 | .9 | .17 | .9 | 0 | | Rater D |
365 | 126 | 2.9 | 2.98 | -.22 | .16 | 1.1 | 0 | 1.1 | 0 | | Rater M |
348 | 98 | 3.6 | 3.54 | 1.97 | .23 | .7 | -1 | .9 | 0 | | Rater Y |
334 | 112 | 3.0 | 3.04 | .00 | .19 | .9 | -.6 | 1.0 | -.1 | .38 | Mean (Count: 4) |
23.4 | 9.9 | .3 | .30 | 1.17 | .03 | .1 | .9 | .1 | .5 | .03 | S.D. |
RMSE (Model): .19; Adj S.D.: 1.15; Separation: 6.11; Reliability: .97
Fixed (all same) chi-square: 116.0; d.f.: 3; significance: .00
Random (normal) chi-square: 3.0; d.f.: 2; significance: .22
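Two of the summary figures for the rater table can be approximately rebuilt from the rounded values printed in it. The sketch below is an illustration under stated assumptions, not the analysis program's own computation: it takes the adjusted S.D. to be the observed S.D. of the measures with the error variance (RMSE squared) removed, and the fixed (all same) chi-square to be an information-weighted homogeneity statistic. The function and variable names are hypothetical.

```python
import math

# Rater measures and model standard errors, copied from the rater table.
measures = {"A": -0.76, "D": -0.99, "M": -0.22, "Y": 1.97}
std_errors = {"A": 0.17, "D": 0.17, "M": 0.16, "Y": 0.23}

def adjusted_sd(observed_sd, rmse):
    """Observed S.D. of the measures with measurement error removed."""
    return math.sqrt(max(observed_sd ** 2 - rmse ** 2, 0.0))

def fixed_chi_square(measures, std_errors):
    """Homogeneity chi-square for 'all elements share one measure'.

    Assumed formula: sum(w*d^2) - (sum(w*d))^2 / sum(w), with w = 1/SE^2
    and degrees of freedom equal to the number of elements minus one.
    """
    weights = {k: 1.0 / std_errors[k] ** 2 for k in measures}
    sum_w = sum(weights.values())
    sum_wd = sum(weights[k] * measures[k] for k in measures)
    sum_wd2 = sum(weights[k] * measures[k] ** 2 for k in measures)
    return sum_wd2 - sum_wd ** 2 / sum_w, len(measures) - 1

adj_sd = adjusted_sd(observed_sd=1.17, rmse=0.19)
chi_square, dof = fixed_chi_square(measures, std_errors)
print(round(adj_sd, 2), round(chi_square, 1), dof)
```

With these rounded inputs the adjusted S.D. comes out at about 1.15, matching the reported value, and the chi-square comes out at roughly 121 on 3 degrees of freedom, close to but not identical with the reported 116.0. The gap is plausibly due to rounding in the printed measures and standard errors. Either way the value is far too large for the four raters to be treated as equally severe.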
RMSE (Model): .53; Adj S.D.: 1.50; Separation: 2.85; Reliability: .89
Fixed (all same) chi-square: 296.0; d.f.: 31; significance: .00
Random (normal) chi-square: 30.9; d.f.: 30; significance: .42
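The separation and reliability figures in this summary follow from the RMSE and the adjusted S.D. under the usual Rasch definitions: separation is the adjusted ("true") S.D. divided by the RMSE, and reliability is the share of observed variance that is not measurement error. A small sketch, assuming those definitions (the function names are illustrative):

```python
def separation(adj_sd, rmse):
    """Spread of the measures expressed in units of their average error."""
    return adj_sd / rmse

def reliability(sep):
    """Separation reliability: true variance over observed variance."""
    return sep ** 2 / (1.0 + sep ** 2)

# Values from the summary line directly above.
sep = separation(adj_sd=1.50, rmse=0.53)
rel = reliability(sep)
print(round(sep, 2), round(rel, 2))
```

From the rounded inputs the separation comes out slightly below the reported 2.85, since the program presumably works from unrounded values, while the reliability rounds to the reported .89.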