The Interface Between Interlanguage, Pragmatics and Assessment: Proceedings of the 3rd Annual JALT Pan-SIG Conference.
May 22-23, 2004. Tokyo, Japan: Tokyo Keizai University.

An examination of situational sensitivity in medium-scale interlanguage pragmatics research

[中間言語を用いたプラグマティックス研究における状況の影響度]
by H. P. L. Molloy and Mika Shimura
(Temple University Japan)


Abstract

How sensitive are L2 English users to situational differences? The particular stimuli used in interlanguage pragmatics research may affect the results. This study examines how three situationally sensitive measures (word count, speech act features, and a coded set of actions) varied among Japanese university students initiating twelve scripted complaints in English. Data were analyzed with G- and D-studies and ad hoc scatterplots. (For information about G- and D-studies, refer to Brown, 2005.) Situations accounted for less than 3% of variance; individual × situation interactions accounted for considerably more. Scatterplots of proficiency test scores against the three measures showed similar ranges of variation. The data suggest that participant idiosyncrasies may be one reason L2 pragmatics performance varies so widely and that situational sensitivity may be less important than previously believed.

Keywords: situational sensitivity, production questionnaires, pragmatics, generalizability theory



Introduction and research background

In pragmalinguistics research, the sensitivity of groups of language users to stimuli has been demonstrated in many studies, both before and since Blum-Kulka (1991) explicitly stated that learners vary the strategies they use depending on situations (p. 263). Blum-Kulka used both the results of the study reported in her 1991 work and the results of many of the studies in the Cross-Cultural Speech Act Realization Project (Blum-Kulka, House, & Kasper, 1989) as evidence for this hypothesis.
Virtually all adult cross-cultural or L2 pragmalinguistics research depends on, and lends evidence to, the assumption that language users demonstrate situational sensitivity in language behavior (cf. Kasper, 2001; Kasper & Dahl, 1991; Kasper & Rose, 1999, 2000).
In other words, nearly all designs in pragmatics research depend on the assumption that different groups of individuals behave differently as situations vary. Much of the published research seems to lend weight to that assumption and underscore the importance of situational sensitivity.
There are cases, however, in which research results have not been correctly analyzed (e.g., Kuha, 2003, and, possibly, Billmyer & Varghese, 2000), have been insufficiently analyzed (e.g., Beebe & Cummings, 1996; Maeshiba et al., 1996; Hill, 1997), or have been presented with ad hoc analyses that prove impossible to relate to other studies (e.g., Nakabachi, 1996). (Reanalyses of several of these studies are presented in Shimura & Molloy, 2003a.)
These studies, regardless of their intrinsic merits, serve as reminders of how well entrenched the assumption of situational sensitivity is in L2 pragmalinguistics research. The analytically problematic studies cited above have various shortcomings. Kuha (2003) and Billmyer and Varghese (2000) used single-sample methods (χ² and t tests, respectively) for repeated-measures studies, resulting in statistical misinterpretations. Beebe and Cummings (1996), Hill (1997), and Maeshiba and colleagues (1996) did not account for measurement error in their findings and so, in some instances, interpreted small measured differences as reflecting underlying differences in behavior. Nakabachi (1996) simply presented results in an idiosyncratic fashion that makes it difficult to tell whether his interpretations were well grounded.
Most of these studies are, despite their problems, useful contributions to the literature that happen to have been published in fora in which calls for statistical reform (cf. Wainer, 1996; Thompson, 2002; Fidler, 2002; Keselman, Cribbie, & Holland, 2002) do not seem to have been heeded. The studies are brought up here not for criticism but as evidence of the pervasiveness of the assumption of situational sensitivity in pragmalinguistics research: the published results are not presented in a way that allows a clear interpretation that differences in situational sensitivity exist (whether between or within groups), yet each of the researchers or research teams interpreted the results as suggesting that they do.
If situational sensitivity were not regarded as so important in current pragmatics research, two differences would probably ensue. First, studies such as those cited above, in which the published evidence does not conclusively point either way, would probably have been interpreted differently: some might have been read as suggesting that situational sensitivity is important, others as suggesting that it is not. Second, virtually every piece of research done so far using quantitative analytic techniques would not have been designed, as it has been, to measure situational sensitivity by manipulating situational variables. (The only exception we know of is Murphy and Neu, 1996, who studied only one situation.)
". . . the literature has little information on the effect of sample sizes or the relative contribution of behavior by individual participants."

This second point concerns the way data from pragmalinguistics studies are grouped. At times, the grouping is based only on situations (as in Kuha, 2003, or Goldschmidt, 1996); more commonly, it is done either by research instrument (as in Beebe & Cummings, 1996; Houck & Gass, 1996; Sasaki, 1998; or Rose, 1992) or by social background, that is, first language (as in Takahashi & Beebe, 1993, or Olshtain & Weinbach, 1987). Differences between groups are thus the focus of most pragmalinguistics research.
An unfortunate consequence of this focus on groups is that the literature has little information on the effects of sample size or on the relative contributions of individual participants' behavior.
Effect-size reporting enables research consumers to estimate how much weight to give to reported results. This is particularly important, we believe, if the consumer intends to use the research for purposes such as making pedagogic decisions (as in Olshtain & Cohen, 1990) or designing further research.
Effect-size reporting alone, however, may not be sufficient. For example, in a recent study (Shimura & Molloy, 2003b), we dutifully reported error estimates, effect size, and power when using an analysis of variance procedure to compare solicited and invented prompts for pragmalinguistic production questionnaires. The effect size (η² = 0.46) would seem to indicate that we had found an important difference (that invented prompts were considered more realistic than solicited prompts), and the power statistic (0.998) that we had examined the problem with a sufficient number of participants. Nevertheless, it was clear from the error in the data we presented that the statistically significant difference would not be useful as a sole criterion for deciding whether to use invented or solicited prompts: there was simply too much within-groups variance for group membership alone to adequately predict whether a prompt would be considered realistic.
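For readers unfamiliar with the statistic, η² in an analysis of variance is simply the proportion of the total sum of squares attributable to the effect of interest (this is the standard textbook definition, not a reconstruction of the exact computation in the 2003b study):

$$\eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}}$$

Read this way, η² = 0.46 means that prompt type was associated with about 46% of the variability in realism ratings, leaving the remaining 54% unexplained, which is the within-groups variance referred to above.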



To our knowledge, within-groups variance has been examined closely only twice in the applied linguistics literature. The first study, Benjamin (1992), examined people's ability to guess others' ages simply by listening to their speech; Benjamin found that, although participants as a group were accurate in estimating age, the within-group variance was such that no single participant could be considered able to estimate ages simply by virtue of membership in a given group. The second, Molloy (2004), studied how reactions to advice-giving varied in a test-retest procedure and found that the instrument used gave consistent results for groups, but not for individuals. The implication is the same as that of Benjamin's study.
Of course, it follows from the estimates of central tendency used in most studies that there will be variation around the mean; though the central limit theorem shows (Maxwell & Delaney, 1990) that the arithmetic mean will be the best estimate, this should not be taken to imply that variance is unimportant or somehow cancelled by the parameter estimation. The use of statistical estimates of error is an acknowledgement of this.
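As a concrete instance of the error estimates meant here, the standard error of a mean and the corresponding normal-theory confidence interval take the familiar forms

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}, \qquad \text{99\% CI: } \bar{x} \pm 2.576\,\frac{s}{\sqrt{n}}$$

With the word-count figures reported in Table 7 below (s = 5.03 over 257 participants), these give SE ≈ 0.31 and a 99% half-width of about 0.81, matching the values in that table.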
Unfortunately, error estimates are not consistently used in the pragmalinguistics literature; the importance of such estimates is never discussed.
This study can be considered an examination of error. We believe it is an important one because it is among the first to provide evidence about the closeness of fit (we use this term non-technically) between parameter estimates and particular individuals.
Our research question is this: how sensitive are individual L2 English users to situational differences? We address this question in ways explained in detail below, but it may help to conceptualize our approach as rotating (in the factor-analysis sense) a normal set of production questionnaire data: instead of looking, as is usual, down the columns of data that represent situations, we turn the data table on its side and look down new columns representing individual participants.
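A minimal sketch of this rotation, assuming (purely for illustration) that responses are stored as a participants × situations table of word counts; the values and labels below are invented:

```python
import pandas as pd

# Invented participants-x-situations table of word counts:
# one row per participant, one column per situation prompt.
data = pd.DataFrame(
    [[12, 18, 9, 15],
     [22, 25, 30, 19],
     [8, 7, 10, 6]],
    index=["p1", "p2", "p3"],
    columns=["s1", "s2", "s3", "s4"],
)

# The usual view: summarize down each situation column
# (how responses differ between situations).
print(data.mean(axis=0))

# The rotated view: summarize along each participant row
# (how responses differ between individuals).
print(data.mean(axis=1))
```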

Method

Participants

The participants were 259 Japanese university students of English enrolled at five universities of varied academic reputations in the Tokyo area; they have been described in greater detail elsewhere (Molloy & Shimura, 2002, 2003). The 45 participants who opted out of all twelve situations were not included in the study, and two further participants were dropped because they did not respond to the requisite number of prompts (see Appendix 3). The final total of participants was 257.
All of the participants were aged 18 to 21, and 80.2% were women; one of the universities involved is a women's university.
The participants were normally distributed with regard to English proficiency, as measured by a cloze test with reasonable internal reliability (Cronbach's alpha = 0.91; Kuder-Richardson 20 = 0.91) that was developed by Brown (1980), used by Hill (1997), and modified by Molloy and Shimura (2002).
All participants signed a bilingual informed consent agreement, and site permission was obtained orally.

Materials

The materials for the study comprised 3084 responses (257 participants × 12 prompts) to a set of twelve complaint-initiation prompts. As mentioned earlier, opting out was allowed; hence, 2766 tokens of complaint initiations were represented in the data set. (Data collection procedures have been presented in greater detail elsewhere; see Molloy & Shimura, 2002, 2003.)
The data were stored in a Microsoft Excel 2000 (Microsoft Excel, 1999) spreadsheet. Analyses were done with Microsoft Excel 2000 and the Genova computer program (Brennan, 2001a).
All of the data were coded at two levels, as explained in Molloy and Shimura (2003). One level of codes was termed "speech acts." The definitions for these codes were taken from Clarke's (1983) work on conversation sequencing, with one exception: we added a code to designate acts that draw the attention of the interlocutor. The list of speech act codes and their definitions appears in Appendix 1. (Not all of Clarke's codes are listed because not all of them were used in our analysis.) The speech act codes were specific to English and were used to code each clause or clause-like element in each participant's responses.
Initially, the overall inter-rater reliability for the speech act coding, calculated with Cohen's kappa, was 0.82; after a discrepancy in the way the two coders were treating question forms was discussed, the data were recoded independently, and the reliability of the recoded data, calculated with a Pearson correlation, was 0.96.
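For readers who wish to compute comparable reliability figures, here is a minimal sketch using standard library routines; the codes and counts below are invented for illustration and do not reproduce our raters' data:

```python
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Invented speech-act codes assigned by two raters to the same five clauses.
rater1 = ["request", "assert", "question", "request", "thank"]
rater2 = ["request", "assert", "request", "request", "thank"]
print(cohen_kappa_score(rater1, rater2))  # chance-corrected categorical agreement

# Invented per-response counts of one action code from each rater; a Pearson
# correlation suits counts where one datum may carry several codes.
counts1 = [2, 0, 1, 3, 1]
counts2 = [2, 1, 1, 3, 0]
r, p = pearsonr(counts1, counts2)
print(r)
```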



The second level of codes was termed "actions." These were designed by the researchers to be language-independent and designated the researchers' judgment of the participant's intention in "saying" a particular thing. These codes can be considered roughly equivalent to the "semantic formula" notion introduced by Beebe, Takahashi, and Uliss-Weltz (1985) and used in many subsequent studies (e.g., Bardovi-Harlig & Hartford, 1993). Note that it was possible to assign more than one action code to a particular string of words, so the number of speech acts a participant used and the number of actions the same participant used would not necessarily be highly correlated: repeating a request, for example, would be coded as the speech act "request" twice, but as the action "redress" only once. The action codes are presented in Appendix 2.
The inter-rater reliability calculated (using a Pearson correlation) for the action codes was 0.84. A Pearson correlation was used in lieu of Cohen's kappa here because the action codes were assigned on a simple presence-absence criterion rather than in ordered groups, as the speech act codes were; Cohen's kappa requires that each datum receive only one code, but data could receive multiple action codes.
Intra-rater reliability (after a two-week delay) was also checked for a subset of 711 tokens of complaint initiations by one rater. Reliability was 0.94.
The number of words in each response was counted with a Microsoft Excel (Microsoft Excel, 1999) formula.

Analysis

The Genova program (Brennan, 2001a) was used to conduct G and D studies for each of the three measures: number of words, number of speech acts, and number of actions. The D studies, note, were not performed for the usual reasons (such as reliability calculations); they were run with parameters identical to those of the G studies only because D-study output in Genova is more conveniently formatted than G-study output and gives a clearer picture of the relative weights of the facets involved in the scores.
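Although we used Genova, the variance components for a one-facet crossed persons × situations design can be recovered by hand from the ANOVA mean squares via the usual expected-mean-squares equations. The sketch below is checked against the word-count figures in Tables 1 and 2:

```python
# Expected-mean-squares estimates for a crossed p x s G-study design
# (cf. Brennan, 2001b); ms_p, ms_s, and ms_ps are ANOVA mean squares.
def variance_components(ms_p, ms_s, ms_ps, n_p, n_s):
    var_ps = ms_ps                  # person-x-situation interaction (residual)
    var_p = (ms_p - ms_ps) / n_s    # individuals
    var_s = (ms_s - ms_ps) / n_p    # situations
    return var_p, var_s, var_ps

# Mean squares from Table 1 (number of words) reproduce Table 2's components.
var_p, var_s, var_ps = variance_components(
    ms_p=346.94188, ms_s=2703.33228, ms_ps=38.30550, n_p=259, n_s=12)
print(var_p, var_s, var_ps)          # ~25.71970, 10.28968, 38.30550

# Generalizability coefficient over a 12-situation D study (~0.88959).
print(var_p / (var_p + var_ps / 12))
```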
The various rates, proportions, confidence intervals, and correlation coefficients reported below were calculated with Microsoft Excel 2000 (Microsoft Excel, 1999).

Results

G- and D-study results

In the terminology of generalizability theory (Brennan, 2001b), the design of the study is not a completely crossed one, nor even a nested one: every participant contributed one score for each of the three measures, and there was only one facet, situations. (Individuals are not considered facets in generalizability theory.) In cases of missing information (opting out), the contributed score was zero.
The generalizability coefficient (equivalent to classical test reliability measures) for the word count was 0.88959 (with an 8.05723 signal-to-noise ratio), and the phi coefficient was 0.86379 (with a 6.53117 signal-to-noise ratio). The phi coefficient can also be considered equivalent to classical test reliability measures, but it is a more conservative measure, incorporating an estimate of possible variance in the entire universe of interest, which in this case would be Japanese university users of English. Presented below, in Tables 1 through 6, are the ANOVA tables and then the tables of variance components for, respectively, the number of words, the number of speech acts, and the number of actions. For the argument in this study, the most relevant items are the variance component columns from the three D studies, which show the relative contributions of the single facet in the studies (situations) and the interaction facet to the total variance in the three sets of scores. For convenience of interpretation, we have inserted a column showing the percentage of variance accounted for, after Shavelson and Webb (2003).
Roughly, the variance component estimates for individuals can be interpreted as estimates of how much the individuals in the study varied on the three measures; that is, they show simple differences between individuals. The situation variance component estimates how much the prompts themselves affected scores. The interaction variance component estimates the extent to which the relative ranking of individuals changed from prompt to prompt; that is, it shows the extent to which scores depended on differing individual reactions to the 12 complaint-initiation prompts.
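For reference, the two coefficients reported here follow the standard one-facet formulas (see Brennan, 2001b), where $\hat{\sigma}^2_p$, $\hat{\sigma}^2_s$, and $\hat{\sigma}^2_{ps}$ are the estimated variance components for individuals, situations, and their interaction, and $n_s = 12$ is the number of situations:

$$E\hat{\rho}^2 = \frac{\hat{\sigma}^2_p}{\hat{\sigma}^2_p + \hat{\sigma}^2_{ps}/n_s}, \qquad \hat{\Phi} = \frac{\hat{\sigma}^2_p}{\hat{\sigma}^2_p + (\hat{\sigma}^2_s + \hat{\sigma}^2_{ps})/n_s}$$

Substituting the word-count components from Table 2 (25.71970, 10.28968, and 38.30550) reproduces the generalizability coefficient of 0.88959 reported above.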



Table 1. ANOVA table showing G- and D-study results for number of words used in responses to the complaint-initiation production questionnaire.

Sample sizes: individuals = 259; situations = 12. All universes assumed to be infinite.

Effect | df | SS (mean scores) | SS (score effects) | Mean square | F statistic* | F-test df (num., denom.)
Individuals | 258 | 574336.33333 | 89511.00386 | 346.94188 | 9.05723 | 258, 2838
Situations | 11 | 514561.98456 | 29736.65508 | 2703.33228 | 70.57295 | 11, 2838
Individuals × situations | 2838 | 712784.00000 | 108711.01158 | 38.30550 | |
Mean | | 484825.32947 | | | |
Total | 3107 | | 227958.67053 | | |

*Note: For generalizability analyses, F statistics should be ignored.

Table 2. G- and D-study results for word counts.

Effect | df | Variance component (algorithm/EMS) | % of variance | Standard error | D-study estimate (mean scores) | Standard error
Individuals | 258 | 25.7196979 | 86.4 | 2.5371492 | 25.71970 | 2.53715
Situations | 11 | 10.2896787 | 2.9 | 4.0939562 | 0.85747 | 0.34116
Individuals × situations | 2838 | 38.3055009 | 10.7 | 1.0165224 | 3.19213 | 0.08471

Note: The "algorithm" and "EMS" estimated variance components are identical when there are no negative estimates.

The generalizability coefficient for number of speech acts was 0.87493 (with a 6.96117 signal-to-noise ratio), and the phi coefficient was 0.86550 (with a 6.43473 signal-to-noise ratio).

Table 3. ANOVA table showing G- and D-study results for number of speech acts used in responses to the complaint-initiation production questionnaire.

Sample sizes: individuals = 259; situations = 12. All universes assumed to be infinite.

Effect | df | SS (mean scores) | SS (score effects) | Mean square | F statistic* | F-test df (num., denom.)
Individuals | 258 | 15440.83333 | 1846.88224 | 7.15846 | 7.42569 | 258, 2838
Situations | 11 | 13829.25097 | 235.29987 | 21.39090 | 22.18945 | 11, 2838
Individuals × situations | 2838 | 18412.00000 | 2735.86680 | 0.96401 | |
Mean | | 13593.95109 | | | |
Total | 3107 | | 4818.04891 | | |

*Note: For generalizability analyses, F statistics should be ignored.


Table 4. G- and D-study results for number of speech acts.

Effect | df | Variance component (algorithm/EMS) | % of variance | Standard error | D-study estimate (mean scores) | Standard error
Individuals | 258 | 0.5162038 | 86.5 | 0.0523633 | 0.51620 | 0.05236
Situations | 11 | 0.0788683 | 1.0 | 0.0323947 | 0.00607 | 0.00249
Individuals × situations | 2838 | 0.9640123 | 12.4 | 0.0255822 | 0.07415 | 0.00197

Note: The "algorithm" and "EMS" estimated variance components are identical when there are no negative estimates.

The generalizability coefficient for number of actions was 0.84883 (with a 5.61504 signal-to-noise ratio), and the phi coefficient was 0.83469 (with a 5.04909 signal-to-noise ratio).

Table 5. ANOVA table showing G- and D-study results for number of actions used in responses to the complaint-initiation production questionnaire.

Sample sizes: individuals = 259; situations = 12. All universes assumed to be infinite.

Effect | df | SS (mean scores) | SS (score effects) | Mean square | F statistic* | F-test df (num., denom.)
Individuals | 258 | 14717.16667 | 1571.20013 | 6.08992 | 6.61504 | 258, 2838
Situations | 11 | 13450.08494 | 304.11840 | 27.64713 | 30.03104 | 11, 2838
Individuals × situations | 2838 | 17634.00000 | 2612.71493 | 0.92062 | |
Mean | | 13145.96654 | | | |
Total | 3107 | | 4488.03346 | | |

*Note: For generalizability analyses, F statistics should be ignored.

Table 6. G- and D-study results for number of actions.

Effect | df | Variance component (algorithm/EMS) | % of variance | Standard error | D-study estimate (mean scores) | Standard error
Individuals | 258 | 0.4307754 | 83.4 | 0.0445567 | 0.43078 | 0.04456
Situations | 11 | 0.1031912 | 1.7 | 0.0418692 | 0.00860 | 0.00349
Individuals × situations | 2838 | 0.9206184 | 14.9 | 0.0244307 | 0.07672 | 0.00204

Note: The "algorithm" and "EMS" estimated variance components are identical when there are no negative estimates.


Descriptive statistics for the three variables of interest are presented below in Table 7. Note that these descriptive statistics refer to the means of each individual's 12 responses. Of particular interest are those items indicating the range of responses.

Table 7. Descriptive statistics for word counts, number of speech acts, and number of actions.

Note: The statistics below describe each participant's average response, so the minimum in the "Mean words" column represents the smallest average response length, not the smallest response.

Statistic | Mean words | Mean speech acts | Mean actions
Mean | 13.72 | 2.32 | 2.28
Standard deviation | 5.03 | 0.68 | 0.61
Standard error | 0.31 | 0.04 | 0.04
Median | 13.33 | 2.33 | 2.33
Mode | 9.33 | 2.00 | 2.00
Kurtosis | 0.26 | 0.30 | 0.36
Skewness | 0.54 | 0.51 | 0.34
Range | 26.29 | 3.55 | 3.36
Minimum | 3.63 | 1.00 | 1.00
Maximum | 29.92 | 4.55 | 4.36
99% CI for mean (±) | 0.81 | 0.11 | 0.10

Table 7 gives an idea of the overall variance of the three variables, but it still represents the participants as a group. Because part of the argument of this study is that representing participants in groups obscures important variation, it is useful to look more closely at how some of the variation between participants is distributed.
Figures 1 through 3 show scatterplots of cloze test scores and mean word counts, mean speech acts, and mean actions, respectively. The spreads of scores along the x-axes show the variance in the three count variables; the relative vertical density in the scatterplots allows the reader to interpret the scatterplots as histograms.

Figure 1. Scatterplot of cloze test scores (y-axis) versus mean word counts (x-axis) in responses to the complaint-initiation production questionnaire. Labels refer to the participants' universities.


Figure 2. Scatterplot of cloze test scores (y-axis) versus mean speech act counts (x-axis) in responses to the complaint-initiation production questionnaire. Labels refer to the participants' universities.

Figure 3. Scatterplot of cloze test scores (y-axis) versus mean action counts (x-axis) in responses to the complaint-initiation production questionnaire. Labels refer to the participants' universities.

Another point of interest is the variety of speech acts the participants used in responding to the production questionnaire. Two figures are presented to give an idea of how much participants varied their responses by prompts. These can be considered measures of situational sensitivity.
Figure 4 is a scatterplot of the total number of different speech acts each participant used (x-axis) versus the number of different speech acts used per response (y-axis). Points higher on the y-axis mean that a greater number of different kinds of speech acts were used per response; hence, a participant who always responded to prompts in the same way (that is, showed less situational sensitivity) appears lower on the y-axis, and a participant who used a greater variety of speech acts depending on the particular prompt (that is, showed greater situational sensitivity) appears higher. A participant far to the right on the x-axis used a greater number of different speech acts overall (that is, showed more variety in language use), and a participant far to the left used fewer. The scatterplot can thus be read as a measure of situational sensitivity, with the x-axis showing the ability to use different kinds of speech acts and the y-axis the tendency to tailor speech acts to particular situations.
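One plausible way to compute the two axes from coded responses is sketched below; the data structure and values are invented for illustration and may not match our exact tabulation:

```python
# Invented coded data: each participant maps to a list of responses,
# each response being a list of speech-act codes.
coded = {
    "p1": [["attend", "request"], ["attend", "request"], ["attend", "request"]],
    "p2": [["assert", "request"], ["question", "apologize"], ["attend", "threat"]],
}

for pid, responses in coded.items():
    kinds = {act for resp in responses for act in resp}
    total_kinds = len(kinds)                     # x-axis: overall variety
    per_response = total_kinds / len(responses)  # y-axis: variety per response
    print(pid, total_kinds, round(per_response, 2))
# p1 answers every prompt the same way (low y); p2 varies by prompt (higher x, y).
```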


Figure 4. Differences in situational sensitivity (y-axis) versus overall variety in speech act use (x-axis).

Figure 5 shows a similar measure of situational sensitivity: the tendency to start complaint initiations in different ways depending on the situation. Participants higher on the y-axis varied the starts of their responses more (that is, showed greater situational sensitivity). The R² value (0.086) shows that there is little relationship between the number of prompts a participant responded to and the number of different ways he or she began those responses.

Figure 5. Differences in situational sensitivity (y-axis) versus number of prompts responded to (x-axis).


Discussion

This study was designed to compare situational sensitivity with idiosyncratic tendencies in English use. There does seem to be evidence here that people differ in their tendency to use the same kinds of language consistently across all twelve of the situations studied.
As might be expected, all three G studies show that the variance in the three measures is mostly attributable to individual differences; the percentage for all three measures is higher than 80%. This is no surprise: a fairly wide range of English proficiency levels is represented in the sample, and any language allows too much variation, even for low-proficiency users, to expect everyone to behave in the same way. (In a criterion-referenced mathematics test, by contrast, one would expect the variance associated with persons to be lower, as there is little chance to give several different correct answers to a given problem. Shavelson and Webb, 2003, for example, report variance attributable to persons in a mathematics test as 27% of the total variance.)
Somewhat surprising, however, is how the remainder of the variance is divided. If situational sensitivity were consistently strong, one would expect situations to account for a substantial amount of variance. They did not. In the number of words participants used, situation accounted for 2.9% of the variance; we attribute this in large measure to the telephone company situation (Appendix 3), in which many participants reused a long clause from the prompt in their responses. In the number of speech acts and number of actions, measures that are not so sensitive to clause or phrase length, situation accounted for only 1.0% and 1.7% of the variance, respectively. Which situation participants were responding to did have a detectable effect on the various length measures, but we would argue that the effect was negligible.
Support for this contention comes from the attribution of the remaining variance. The interaction between individuals and situations accounted for 10.7% of the variance in number of words, 12.4% in number of speech acts, and 14.9% in number of actions. As mentioned above, this interaction component can be read as an indication of how the rank of individuals (in number of words used, and so on) changes across situations. If situational sensitivity were a uniformly important factor in determining how much language participants used in responding to prompts, we would expect the interaction component to be smaller: the most verbose people would tend always to be the most verbose, regardless of situation, and the least verbose to remain the least verbose. This, however, is not the case: some degree of situational sensitivity was evident, but participants differed in their sensitivity to situations.
The scatter and low R² figures shown in Figures 1 through 3 do not contradict the implications of the G-study evidence. The three scatterplots show that verbosity levels are fairly widely distributed and that verbosity does not seem to have much to do with linguistic proficiency. That is, some people were, overall, more verbose than others; nevertheless, the G-study results show that the relative tendency to verbosity changed by situation. This is, in a way, a kind of situational sensitivity, but the G-study results show that participants were differently sensitive to situations.
Finally, the two ad hoc measures of variety shown in Figures 4 and 5 also suggest that tendencies to use language in idiosyncratic ways, rather than tendencies to change one's language depending on the situation, are stronger in some participants than in others.
Figure 4 shows that the participants differed in the speech acts they selected depending on situation. Marks higher on the vertical scale represent participants who were more likely to use the same sets of speech acts for each situation; marks lower on the vertical scale indicate a tendency to use different sets of speech acts for different prompts (that is, greater situational sensitivity).
Figure 5 shows a similar measure. In this scatterplot, the number of different first speech acts used is contrasted with the total number of prompts responded to (that is, the total number of first speech acts). Here, marks higher on the y-axis indicate greater situational sensitivity; the direction is thus the opposite of Figure 4. It can be seen, for example, that at least one participant began each of 9 responses to 9 different prompts in the same way.


Conclusion and directions for further research

This study provides interesting results that should be explored further. First, the study must be replicated: does the variety in situational sensitivity that seems to have been found here hold across different speech situations? Does it hold when data are collected orally rather than with a written production questionnaire? Does it hold in interaction? Does it hold when utterances are longer?
This study presents evidence that situational sensitivity varies; further research should focus on why it varies. Is situational sensitivity different in different kinds of speech situations? The focus here was on complaints: would the results have been different had the focus been purchasing transactions or requests?
The scatter that can be seen in all of the figures would seem to indicate that whatever is causing the variation in participants' reactions to the 12 prompts was not something that was measured in this study. (Recall the interaction effects from the G- and D-study results.) Different people were reacting to different situations in different ways, and the reactions, based on the evidence presented here, can only be called idiosyncratic. Idiosyncrasy, however, here simply functions as an admission that we do not know why people were reacting in different ways. Certainly this is an area that can be investigated further.
Pragmalinguistics research has heretofore depended on the assumption that, because situational sensitivity exists, it is important. This study has, we believe, given further evidence that situational sensitivity does indeed exist, but it raises doubts about whether that sensitivity should be assumed to exist to the same degree in all potential language users.

References

Atkinson, J. M., & Heritage, J. (1984/1999). Jefferson's transcript notation. In A. Jaworski & N. Coupland (Eds.), (1999). The discourse reader (pp. 158-166). London: Routledge.

Bardovi-Harlig, K., & Hartford, B. S. (1993). Refining the DCT: Comparing open questionnaires and dialogue completion tasks. In L. F. Bouton & Y. Kachru (Eds.), Pragmatics and language learning, Monograph series, Vol. 4 (pp. 143-165). Urbana-Champaign, IL: Division of English as an International Language, Intensive English Institute, University of Illinois at Urbana-Champaign.

Beebe, L. M., & Cummings, M. C. (1996). Natural speech act data versus written questionnaire data: How data collection method affects speech act performance. In S. M. Gass & J. Neu (Eds.), Speech acts across cultures: Challenges to communication in a second language (pp. 65-86). Berlin: Mouton de Gruyter.

Beebe, L. M., Takahashi, T., & Uliss-Weltz, R. (1985). Pragmatic transfer in ESL refusals. In R. C. Scarcella, E. S. Anderson, & S. D. Krashen (Eds.), Developing communicative competence in a second language (pp. 55-73). Boston: Heinle & Heinle.

Benjamin, G. R. (1992). Perceptions of speaker's age in natural conversation in Japanese and English. Language Sciences, 14, 77-87.

Billmyer, K., & Varghese, M. (2000). Investigating instrument-based pragmatic variability: Effects of enhancing discourse completion tests. Applied Linguistics, 21 (4), 517-552.

Blum-Kulka, S. (1991). Interlanguage pragmatics: The case of requests. In R. Phillipson, E. Kellerman, L. Selinker, M. Sharwood Smith, & M. Swain (Eds.), Foreign/second language pedagogy research (pp. 255-272). Clevedon, Avon: Multilingual Matters.

Blum-Kulka, S., House, J., & Kasper, G. (Eds.). (1989). Cross-cultural pragmatics: Requests and apologies (Advances in discourse processes, Vol. XXXI). Norwood, NJ: Ablex.


Bonikowska, M. P. (1988). The choice of opting out. Applied Linguistics, 9 (2), 169-181.

Brennan, R. L. (2001a). Genova version 3.1 (Computer software). Iowa City, IA: Author.

Brennan, R. L. (2001b). Generalizability theory. New York: Springer Verlag.

Brown, J.D. (1980). Relative merits of four methods for scoring cloze tests. Modern Language Journal, 64 (3), 311-317.

Cicourel, A. V. (1974/1999). Interpretive procedures. In A. Jaworski & N. Coupland (Eds.), (1999). The discourse reader (pp. 89-97). London: Routledge.

Clarke, D. D. (1983). Language and action: A structural model of behaviour. Oxford: Pergamon.

Eberhardt, L. (2003). A course in quantitative ecology (draft). Available online at http://nmml.afsc.noaa.gov/quantita.htm. [Accessed 14 November 2003].

Fidler, F. (2002). The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement, 62, 749-770.

Garfinkel, H. (1967). Studies in ethnomethodology. Oxford: Basil Blackwell Ltd./Polity Press.

Goldschmidt, M. (1996). From the addressee's perspective: Imposition in favor-asking. In S. M. Gass & J. Neu (Eds.), Speech acts across cultures: Challenges to communication in a second language (pp. 241-256). Berlin: Mouton de Gruyter.

Heritage, J. (1984). Garfinkel and ethnomethodology. Cambridge: Polity Press.

Hill, T. (1997). Pragmatic development in Japanese learners: A study of requestive directness level. Dokkyo Daigaku Eigo Kenkyu, 65-102.

Hood, G. M. (2003) PopTools version 2.5.9. Available online at http://www.cse.csiro.au/poptools. [Accessed 15 Jan. 2005.]

Houck, N., & Gass, S. M. (1996). Non-native refusals: A methodological perspective. In S. M. Gass & J. Neu (Eds.), Speech acts across cultures: Challenges to communication in a second language (pp. 45-64). Berlin: Mouton de Gruyter.

Kasper, G. (2001). Classroom research on interlanguage pragmatics. In K. R. Rose & G. Kasper (Eds.), Pragmatics in language teaching (pp. 33-60). Cambridge: Cambridge University Press.

Kasper, G., & Dahl, M. (1991). Research methods in interlanguage pragmatics. University of Hawai'i at Manoa: Second Language Teaching & Curriculum Center.

Kasper, G., & Rose, K. (Eds.) (2000). Research methods in interlanguage pragmatics. Mahwah, NJ: Erlbaum.

Kasper, G., & Rose, K. R. (1999). Pragmatics and SLA. Annual Review of Applied Linguistics, 19, 81-104.



Keselman, H. J., Cribbie, R., & Holland, B. (2002). Controlling the rate of Type I error over a large set of statistical tests. British Journal of Mathematical & Statistical Psychology, 55, 27-39.

Kuha, M. (2003). Perceived seriousness of offense: The ignored extraneous variable. Journal of Pragmatics, 35, 1803-1821.

Maeshiba, N., Yoshinaga, N., Kasper, G., & Ross, S. (1996). Transfer and proficiency in interlanguage apologizing. In S. M. Gass & J. Neu (Eds.), Speech acts across cultures: Challenges to communication in a second language (pp. 155-187). Berlin: Mouton de Gruyter.

Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data: A model comparison perspective. Belmont, CA: Wadsworth.

Microsoft Excel 2000 [Computer software]. (1999). Redmond, WA: Microsoft.

Molloy, H. P. L., & Shimura, M. (2002, September). Production and recognition difference in Japanese university students' English-language complaining. Paper presented at the JACET 41st Annual Convention, Tokyo.

Molloy, H. P. L., & Shimura, M. (2003, September 5). Approaches to a theory of complaint interactions. Paper presented at the JACET 42nd Annual Convention, Sendai, Japan.

Molloy, H. P. L. (2004, February). Approaches to reliability calculations in pragmatic acceptability tests. Paper presented at the Temple University Japan Research Colloquium, Tokyo.

Murphy, B., & Neu, J. (1996). My grade's too low: The speech act set of complaining. In S. M. Gass & J. Neu (Eds.), Speech acts across cultures: Challenges to communication in a second language (pp. 191-216). Berlin: Mouton de Gruyter.

Nakabachi, K. (1996). Pragmatic transfer in complaints: Strategies of complaining in English and Japanese by Japanese EFL speakers. JACET Journal, 27, 127-142.

Olshtain, E., & Weinbach, L. (1987). Complaints: A study of speech act behavior among native and nonnative speakers of Hebrew. In M. B. Papi & J. Verschueren (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference (pp. 195-208). Amsterdam: John Benjamins.

Olshtain, E., & Cohen, A. D. (1990). Teaching speech act behavior to nonnative speakers. In M. Celce-Murcia (Ed.), Teaching English as a second or foreign language (pp. 154-165). New York: Newbury House/HarperCollins.

Rose, K. R. (1992). Speech acts and questionnaires: The effect of hearer response. Journal of Pragmatics, 17, 49-82.

Sarle, W. S. (1995). Bootstrap confidence intervals. Available online at http://www.pitt.edu/~wpilib/statfaq/bootfaq.html. [Accessed 14 November 2003].

Sasaki, M. (1998). Investigating EFL students' production of speech acts: A comparison of production questionnaires and role plays. Journal of Pragmatics, 30, 457-484.

Shavelson, R. J., & Webb, N. M. (2003). Generalizability theory. In Encyclopedia of social measurement. New York: Academic Press. Available online at http://www.stanford.edu/dept/SUSE/SEAL/Reports_Papers/Generalizability%20Theory_ESM_Final.doc. [Accessed 11 March 2004].


Shimura, M., & Molloy, H. P. L. (2003a, November). Using traditional measures and bootstrap replication to calculate confidence intervals. Paper presented at the JALT National Meeting, Shizuoka, Japan.

Shimura, M., & Molloy, H. P. L. (2003b, November). The reality and realism of production questionnaire prompts. Paper presented at the JALT National Meeting, Shizuoka, Japan.

Shimura, M., & Molloy, H. P. L. (2004). Doing bootstrapping in Excel. Unpublished manuscript, Temple University Japan (Tokyo).

Takahashi, T., & Beebe, L. M. (1993). Cross-linguistic influence in the speech act of correction. In G. Kasper & S. Blum-Kulka (Eds.), Interlanguage pragmatics (pp. 138-157). Oxford: Oxford University Press.

Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31, 25-32.

Wainer, H. (1996). Depicting error. The American Statistician, 50, 101-111.

Wilkinson, L., & The APA Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.


