| Part 1: Terminology | Part 2: Procedures | Part 3: Test Interpretation | Part 4: Assessment Ethics | 
INSTRUCTIONS: Below is a list of possible items for three different foreign language assessment literacy tests: one for professional test validators (Level A), one for language teachers with bachelor's degrees in education (Level B), and yet another for first year undergraduate education majors (Level C). If you think that an item represents something that professional language test validators should know, mark "Level A". If you believe that an item should be known by foreign language teachers with B.A. degrees, mark "Level B". If you think that an item is something an education major should know before entering college, mark "Level C". If you believe it's not necessary for any of these three populations to know a given item, leave it blank. Please remember that Levels A, B, and C represent - in your view - the minimal competency levels for each of these three populations. If an item is beyond what you believe a member of a group ought to know, then leave it blank.
It's not necessary to answer any of the items below, but you're welcome to do so if you wish. When clicking the boxes for Levels A, B, and C, remember that you may click more than one box if it seems appropriate, or leave all boxes blank.
When you have completed this document, please email a copy to timothy*at*toyonet*dot*toyo*dot*ac*dot*jp. Thank you for your cooperation.
| (1) df | _13_  (A) chi-square | ||
| (2) F | ___  (B) coefficient of determination | ||
| (3) Ho | ___  (C) degrees of freedom | ||
| (4) k | ___  (D) F-value, variance ratio | ||
| (5) N | ___  (E)  null hypothesis | ||
| (6) n | ___  (F) number of cases in a population | ||
| (7) ρ | ___  (G) number of cases in a sample | ||
| (8) r | ___  (H) number of items in a test | ||
| (9) r² | ___  (I) Pearson's correlation coefficient | ||
| (10) r2 | ___  (J) probability of a Type I error | ||
| (11) ζ, SD, Sx | ___  (K) sample mean | ||
| (12) s² | ___  (L) sample variance | ||
| (13) χ² | ___  (M) standard deviation | ||
| (14)  , M | ___  (N) frequency | ||
| (15) v | ___  (O) x-value | ||
| ___  (P) (1) level of significance, (2) the proportion of  responses to an item that are correct | |||
[ p. 62 ]
(B) Multiple choice questions
| Note that some items have more than one "correct" possible response. |
| 16. Gender, occupation, or nationality are considered ___ variables in most language studies. | |
| 17. If a test only seems to measure what it claims to, then it is said to have ___ validity. | |
| 18. A ___ error occurs when a researcher thinks there is no relationship between two variables, but there actually is. | |
| 19. The cutoff point for a criterion-referenced test should be when the ___ is equal to or greater than 1. | |
| 20. Exams used to determine a student's progress toward mastery of a content area are known as ___ tests. | |
| 21. How many standard deviations a score is from the mean is revealed by a test's ___. | 
| 22. The test excerpt below is an example of a ___ test.     | 
| 23. To find out how well a particular item in a test correlates with the total test score, a ___ should be ascertained. | |
| 24. Any variable that is not part of a research study, but still has an effect on its results is said to ___ that study. | |
| 25. In a 3-parameter IRT test model, the point on an ability scale at which the probability of a correct response for a given item is .5 is known as the ___. | |
| 26. To predict how many more items need to be added to a given test to increase its reliability to a desired value, the ___ should be calculated. | |
| 27. If a test is uni-dimensional, then it should automatically show a high degree of ___. | |
| 28. The tendency of examinee expectations to contaminate test results is known as ___. | |
| 29. A test administration procedure in which a large set of test items is organized into shorter sub-sets, each of which is randomly assigned to a sub-sample, hence avoiding the need to administer all items to all examinees, is known as ___ sampling. | 
| 30. To compare the mean of a particular sub-group to the mean of a larger group that is within the same population, a ___ should be performed. | |
| 31. Briefly explain the difference between the standard error of estimate (SEE)  and standard error of measurement (SEM) in the space below,  mentioning when each of these statistics should be used. | |
| 32. If you want to see how closely "masters" who scored high on a particular CRT test differed from "non-masters" who scored closer to the bottom, which technique(s) might you use? | |
| 33. What's the difference between a predictive and concurrent validation study?  When should each type of study be used? | |
| 34. How do the Kuder-Richardson Formula 20 and Formula 21 differ?  When should each be used? | |
| 35. What does the central limit theorem tell us? | 