Appendix A, Part 2 of Teacher development and assessment literacy

Appendix A:

Foreign Language Assessment Literacy Test: Preliminary Item Screening

Part 1: Terminology
Part 2: Procedures
Part 3: Test Interpretation
Part 4: Assessment Ethics

PART II. Procedures

(A) Exercise 1

INSTRUCTIONS: Specify the mean and standard deviation of each of the following types of norm-referenced scores, assuming that the scores are normally distributed:

36.	quartile score	mean =	standard deviation=	Level A Level B Level C
37.	percentile score	mean =	standard deviation=	Level A Level B Level C
38.	stanine score	mean =	standard deviation=	Level A Level B Level C
39.	T score	mean =	standard deviation=	Level A Level B Level C
40.	z score	mean =	standard deviation=	Level A Level B Level C
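The Python sketch below shows how these norm-referenced scales relate to one another under a normal distribution. It converts a raw score into a z score and then into the other scales. It uses the conventional definitions:

- z has mean 0 and SD 1.
- T = 50 + 10z.
- Stanines have mean 5 and SD 2, in half-SD bands clipped to 1-9.

Percentile and quartile are read off the normal cumulative distribution. The names `standard_scores` and `StandardScores` are hypothetical.

```python
from dataclasses import dataclass
from math import floor
from statistics import NormalDist


@dataclass
class StandardScores:
    z: float
    t: float
    stanine: int
    percentile: float
    quartile: int


def standard_scores(raw: float, mean: float, sd: float) -> StandardScores:
    """Convert a raw score to common norm-referenced scales, assuming normality."""
    z = (raw - mean) / sd                 # z scale: mean 0, SD 1
    t = 50 + 10 * z                       # T scale: mean 50, SD 10
    # Stanines: mean 5, SD 2, bands half an SD wide, clipped to 1..9.
    stanine = min(9, max(1, floor(5.5 + 2 * z)))
    # Percentile rank read off the standard normal cumulative distribution.
    percentile = NormalDist().cdf(z) * 100
    # Quartile 1..4 derived from the percentile rank.
    quartile = min(4, int(percentile // 25) + 1)
    return StandardScores(z, t, stanine, percentile, quartile)


if __name__ == "__main__":
    for raw in (30, 50, 70):
        print(raw, standard_scores(raw, mean=50, sd=10))
```

The sketch uses `floor(x + 0.5)` rather than Python's `round` for stanines. `round` applies banker's rounding, which would send some band boundaries the wrong way.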

(B) Exercise 2

INSTRUCTIONS: Look at the data from the test below, then answer Questions 41-45 using any electronic device or software program that you know how to operate.

Raw scores on the four sections of a norm-referenced test of general English ability. (Each figure below is the number of items the student answered correctly in that section; the k row gives the number of items in each section.)
	Section 1	Section 2	Section 3	Section 4	Total
k (# of items)	10	30	20	30	90
1. Diana	8	28	15	14	65
2. Cindy	7	22	10	15	54
3. Marilyn	4	11	9	8	32
4. Jack	10	26	19	26	81
5. Chris	5	15	10	16	46
6. Faith	7	18	15	22	62
7. Doug	9	10	12	21	52
8. James	3	10	5	11	29
9. Emiko	8	23	16	25	72
10. Eric	6	19	12	18	55
etc.

41. What is the mean total score?	Level A Level B Level C
42. What is the standard deviation of the total scores?	Level A Level B Level C
43. Which student(s) is/are more than one standard deviation from the mean?	Level A Level B Level C
44. Do any sections of this test correlate strongly enough to be statistically significant at the p < .05 level? (If so, mention which.)	Level A Level B Level C
45. What sort of distribution curve does this test have so far?	Level A Level B Level C

(C) Exercise 3

INSTRUCTIONS: The table below shows hypothetical data for a 50-item test given to two different population samples. Look at the data, then calculate the statistics asked for in Questions 46-50:

	Population A	Population B
sample size:	20	80
mean score:	32	25
standard deviation:	7.5	6
low-high:	14 - 48	12 - 50
alpha reliability estimate:	.7	.8

46. ANOVA:		Level A Level B Level C
47. F-ratio:		Level A Level B Level C
48. Chi-square distribution:		Level A Level B Level C
49. effect size:		Level A Level B Level C
50. standard error of measurement:		Level A Level B Level C
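The summary figures in the Population A / Population B table are enough for a one-way ANOVA, an effect size, and the standard error of measurement. The Python sketch below computes these under three assumptions:

- The reported SDs are sample (n - 1) SDs.
- Cohen's d uses the pooled within-group SD.
- SEM = SD * sqrt(1 - alpha).

No chi-square statistic is computed, because chi-square needs frequency counts and the table gives none. The function names are hypothetical.

```python
from math import sqrt


def anova_from_summary(groups):
    """One-way ANOVA from (n, mean, sd) summaries; sd is assumed to be the n - 1 SD."""
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within),
        "F": ms_between / ms_within,
        "eta_squared": ss_between / (ss_between + ss_within),
        "pooled_sd": sqrt(ms_within),
    }


def cohens_d(mean_1, mean_2, pooled_sd):
    """Standardized mean difference using the pooled within-group SD."""
    return (mean_1 - mean_2) / pooled_sd


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


if __name__ == "__main__":
    # (n, mean, sd) for Population A and Population B, from the table.
    result = anova_from_summary([(20, 32, 7.5), (80, 25, 6)])
    print("df:", result["df"], "F:", round(result["F"], 2))
    print("eta squared:", round(result["eta_squared"], 3))
    print("Cohen's d:", round(cohens_d(32, 25, result["pooled_sd"]), 2))
    print("SEM A:", round(sem(7.5, 0.7), 2), "SEM B:", round(sem(6, 0.8), 2))
```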

(D) Exercise 4

INSTRUCTIONS: Compare the oral interview ratings that two raters gave the same student, then calculate the statistics asked for in Questions 51-55. All ratings use 5-point bands, with 5 representing the highest possible rating.

Category	Rater A	Rater B
Grammar	3.5	3
Fluency	4	4
Pronunciation	4	3.5
Cohesion	4	3.5
Vocabulary	4.5	4
Total	20	18

51. The inter-rater reliability coefficient for A and B is ______.	Level A Level B Level C
52. The Pearson correlation index for the two raters is ______.	Level A Level B Level C
53. The index of concordance between the two raters is ______.	Level A Level B Level C
54. The chi-square test of independence for these two raters is ______.	Level A Level B Level C
55. The kappa coefficient of the combined rating is ______.	Level A Level B Level C

(E) Exercise 5

INSTRUCTIONS: Read the hypothetical data below comparing results on a 60-item classroom pretest and posttest, then answer the questions that follow. After the pretest, the top one-third of the students were classified as an "upper group" and the bottom one-third as a "bottom group":

Category	Pretest	Posttest
sample size:	48	42
total mean:	30	33
total range:	7-44	12-52
total standard deviation:	3.6	4.3
upper group mean score:	45	50
upper group standard deviation:	4.0	3.9
bottom group mean score:	20	20
bottom group standard deviation:	4.2	5.8

56. How did the upper group perform differently from the bottom group? ______	Level A Level B Level C
57. What sort of distribution curve would this posttest likely have? ______	Level A Level B Level C
58. Which type of ANOVA, if any, would be suitable for measuring the pretest/posttest gains made by this sample group? ______	Level A Level B Level C
59. What sort of claims could validly be made about the "progress" of this class? ______	Level A Level B Level C
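The pretest/posttest table gives only summary figures, so a quick arithmetic check may be worth running before interpreting them. The Python sketch below computes the raw mean gain for each group. It also flags two kinds of internal inconsistency:

- a group mean outside the reported score range;
- a reported total SD smaller than the between-group spread alone would allow, by the law of total variance.

The second check assumes the upper, middle, and bottom groups are exact thirds. It infers the middle third's mean from the total mean. That makes it a better fit for the pretest (48 students) than for the posttest, where only 42 students remain and the group sizes are unknown.

On the pretest, both checks fire:

- The upper-group mean (45) exceeds the reported maximum (44).
- The reported SD (3.6) is well below the roughly 10.8 points implied by the group means.

The class and function names are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Summary:
    n: int
    mean: float
    low: float
    high: float
    sd: float
    upper_mean: float
    bottom_mean: float


# Figures copied from the pretest/posttest table.
PRETEST = Summary(n=48, mean=30, low=7, high=44, sd=3.6, upper_mean=45, bottom_mean=20)
POSTTEST = Summary(n=42, mean=33, low=12, high=52, sd=4.3, upper_mean=50, bottom_mean=20)


def gains(pre, post):
    """Raw mean gains; these compare group means, not matched students."""
    return {
        "total": post.mean - pre.mean,
        "upper": post.upper_mean - pre.upper_mean,
        "bottom": post.bottom_mean - pre.bottom_mean,
    }


def range_warnings(s):
    """Flag any reported mean that falls outside the reported low-high range."""
    means = {"total": s.mean, "upper": s.upper_mean, "bottom": s.bottom_mean}
    return [
        f"{label} mean {m} is outside the range {s.low}-{s.high}"
        for label, m in means.items()
        if not s.low <= m <= s.high
    ]


def minimum_total_sd(s):
    """Smallest total SD consistent with the group means (law of total variance).

    Assumes the upper, middle, and bottom groups are exact thirds; the middle
    third's mean is inferred so that the three thirds average to the total mean.
    """
    middle_mean = 3 * s.mean - s.upper_mean - s.bottom_mean
    group_means = (s.upper_mean, middle_mean, s.bottom_mean)
    between_var = sum((g - s.mean) ** 2 for g in group_means) / 3
    return sqrt(between_var)


if __name__ == "__main__":
    print("gains:", gains(PRETEST, POSTTEST))
    print("pretest range check:", range_warnings(PRETEST))
    print("posttest range check:", range_warnings(POSTTEST))
    print(f"pretest SD reported {PRETEST.sd}, at least {minimum_total_sd(PRETEST):.1f} implied")
```

Because 48 students took the pretest and 42 the posttest, a paired analysis of gains would need matched individual scores, which the table does not provide.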