
CHAPTER 23

Multiple comparisons in ANOVA

Just where do the differences lie?

Overview

■ Generally speaking, analyses of variance are relatively easy to interpret if the independent variables all have just two different values.
■ Interpretation becomes difficult with greater numbers of values of the independent variables.
■ This is because the analysis does not stipulate which means are significantly different from each other. If there are only two values of each independent variable, then statistical significance means that those two values are significantly different.
■ Multiple comparison tests are available to indicate just where the differences lie.
■ These multiple comparison tests have built-in adjustment for the numbers of comparisons being made. Hence they are generally to be preferred over multiple comparisons using the t-test.
■ It is very difficult to know which multiple comparison tests are the most appropriate for any particular data or purpose. Consequently, it is reasonable advice to use several different tests. The only problem arises when the different tests yield different conclusions.
■ Some multiple comparison tests may be applied whether or not the ANOVA itself is statistically significant.

Preparation

You will need a working knowledge of Chapters 19, 20 and 21 on the analysis of variance. Chapter 14 introduces the problem of multiple comparisons in the context of partitioning chi-square tables.

23.1 Introduction

When research involves more than two levels of an independent variable, it is not always obvious where the differences between conditions lie. There is no problem when you have only two groups of scores to compare in a one-way or a 2 × 2 ANOVA. However, if there are three or more different levels of any independent variable, the interpretation problems multiply. Take, for example, Table 23.1 of means for a one-way analysis of variance.

Table 23.1  Sample means in a one-way ANOVA

            Group 1    Group 2    Group 3
Mean          5.6        5.7       12.9

Although the analysis of variance for the data summarised in this table may well be statistically significant, there remains a very obvious problem. Groups 1 and 2 have virtually identical means and it is group 3 which has the exceptionally large scores. Quite simply, we would be tempted to assume that group 1 and group 2 do not differ significantly and that any differences are due to group 3. Our eyes are telling us that only parts of the data are contributing to the significant ANOVA.

Although the above example is very clear, matters become a little more fraught if the data are less clear-cut than this (Table 23.2). In this case, it may well be that all three groups differ from each other. Just by looking at the means we cannot know for certain, since they may merely reflect sampling differences.

Table 23.2  Sample means in another one-way ANOVA

            Group 1    Group 2    Group 3
Mean          5.6        7.3       12.9
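To make the problem concrete, here is a minimal sketch of a one-way ANOVA in Python. The scores are hypothetical, invented only to echo the pattern of means in Table 23.2; the point is that a significant overall F still leaves open which pairs of groups differ.

```python
# Hypothetical scores roughly matching Table 23.2's pattern of means
# (not the book's data). A significant omnibus F tells us the groups
# differ somewhere, but not where.
from scipy import stats

group1 = [4, 5, 6, 6, 7]         # mean 5.6
group2 = [6, 7, 7, 8, 8]         # mean 7.2, close to Table 23.2's 7.3
group3 = [11, 12, 13, 14, 14]    # mean 12.8, close to Table 23.2's 12.9

f_value, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_value:.2f}, p = {p_value:.4f}")
# Even if p < .05 here, we still do not know whether groups 1 and 2 differ.
```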
Box 23.1  Focus on

Does it matter that the F-ratio is not significant?

Traditionally, the advice to users of statistics was that unless the ANOVA itself is statistically significant, no further analyses should be carried out. That is, a significant ANOVA was seen as a prerequisite for multiple comparison testing. Perhaps this was sound advice before the sophisticated modern multiple range tests were developed, but it is now a fairly controversial topic which makes straightforward advice difficult. Some multiple range tests are deemed by some authorities to be permissible even where the ANOVA is not significant. With post hoc testing, depending on which multiple comparison test is being contemplated, you do not need a significant ANOVA first. Of course, if the ANOVA is statistically significant then any multiple comparison test is appropriate. Among the multiple comparison tests which can be applied irrespective of overall significance are the Newman–Keuls test and Duncan's new multiple range test. If one is operating within the strictures of a priori (planned) specific comparisons, then the concerns which apply to post hoc tests simply do not apply, as explained elsewhere.

Obviously it is essential to test the significance of the differences between the means for all three possible pairs of sample means from the three groups. These are:

group 1 with group 2
group 1 with group 3
group 2 with group 3

If there had been four groups then the pairs of comparisons would be:

group 1 with group 2
group 1 with group 3
group 1 with group 4
group 2 with group 3
group 2 with group 4
group 3 with group 4

This is getting to be a lot of comparisons!

23.2 Methods

There are a number of different procedures which you could employ to deal with this problem. One traditional approach involves comparing each pair of groups using a t-test (or a one-way analysis of variance for two groups). So for the four-group experiment there would be six separate t-tests to calculate (group 1 with group 2, group 1 with group 3, etc.). The problem with this procedure (which is not so bad really) is the number of separate comparisons being made: the more comparisons you make between pairs of means, the more likely a significant difference is merely due to chance (always the risk in inferential statistics). Similar considerations apply to the multifactorial (two-way, etc.) analysis of variance. You can compare different levels of any of the main effects simply by comparing their means using a t-test or the equivalent. However, the multiple comparison difficulty remains unless you make an adjustment.

To cope with this problem a relatively simple procedure, the Bonferroni method, is used. It assumes that the significance level should be shared between the number of comparisons made. So, if you are making four comparisons (i.e. conducting four separate t-tests) then the appropriate significance level for the individual tests is as follows:

significance level for each test = overall significance level / number of comparisons
                                 = 5% / 4
                                 = 1.25%

In other words, a comparison actually needs to be significant at the 1.25% level according to the significance tables before we accept that it is significant at the equivalent of the 5% level. This essentially compensates for our generosity in making many comparisons and reduces the risk of inadvertently capitalising on chance differences. (We adopted this procedure for chi-square in Chapter 14.) Although this is the proper thing to do, we have often seen examples of analyses which fail to make this adjustment. Some researchers stick with the regular 5% level per comparison no matter how many comparisons they make, although sometimes they point out the dangers of multiple comparisons without making an appropriate adjustment. So long as you adjust your critical values to allow for the number of comparisons made, there is nothing much wrong with using multiple t-tests. Indeed, this procedure, properly applied, is a slightly 'conservative' one in that it errs in favour of the null hypothesis.
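As a sketch of how the Bonferroni adjustment works in practice, the following Python fragment (reusing the hypothetical groups from the earlier sketch) runs every pairwise t-test and judges each one against the shared significance level:

```python
# Bonferroni-adjusted pairwise t-tests on hypothetical data.
# Each test is judged against alpha divided by the number of comparisons.
from itertools import combinations
from scipy import stats

groups = {
    1: [4, 5, 6, 6, 7],
    2: [6, 7, 7, 8, 8],
    3: [11, 12, 13, 14, 14],
}

pairs = list(combinations(groups, 2))   # (1, 2), (1, 3), (2, 3)
alpha_per_test = 0.05 / len(pairs)      # 0.05 / 3, about 0.0167

for a, b in pairs:
    t, p = stats.ttest_ind(groups[a], groups[b])
    verdict = "significant" if p < alpha_per_test else "not significant"
    print(f"group {a} vs group {b}: t = {t:.2f}, p = {p:.4f} -> {verdict}")
```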
However, there are better procedures for making multiple comparisons which are especially convenient when using a computer. These include the Scheffé test and the Duncan multiple range test. Their advantage is that they directly report significance levels adjusted for the number of comparisons being made.

Appendix K contains a table of t-values for use when a number of comparisons are being made (i.e. multiple comparisons). Say you wished to test the statistical significance of the differences between pairs of groups in a three-group one-way analysis of variance. This gives three different comparisons between the pairs. The significant t-test values are found under the column for three comparisons.

23.3 Planned versus a posteriori (post hoc) comparisons

In the fantasy world of statisticians, there is a belief that researchers are meticulous in planning the last detail of their statistical analysis in advance of doing the research. Such an ideal researcher would have planned in advance precisely which pairs of cells or conditions in the research are to be compared. These choices are based on the hypotheses and other considerations; in other words, they are planned comparisons. More usual, in our experience, is that the details of the statistical analysis are decided upon after the data have been collected. Psychological theory is often not so strong that we can predict from it the precise pattern of outcomes we expect. Comparisons decided upon after the data have been collected and tabulated are called a posteriori or post hoc comparisons.

Since properly planned comparisons are not the norm in psychological research, for simplicity we will just consider the more casual situation in which comparisons are made as the data are inspected. (Basically, if your number of planned comparisons is smaller than your number of experimental conditions, then they can be tested by the multiple t-test without adjusting the critical values.) There are a number of tests which deal with the situation in which multiple comparisons are being made. These include Dunnett's test, Duncan's test and others. The Scheffé test will serve as a model of the sorts of things achieved by many of these tests and is probably as good as any other similar test for general application. Some other tests are not quite so stringent in ensuring that the appropriate level of significance is achieved.

23.4 The Scheffé test for one-way ANOVA

Although it can be computed by hand without too much difficulty, the computer output of the Scheffé test is particularly useful as it gives subsets of the groups (or conditions) in your experiment which do not differ significantly from each other. For example, take a look at the following:

Scheffé(a)

                    Subset for alpha = .05
Condition                     1
3                           4.00
1                           5.60
2                           7.00
Sig.                        0.424

a  Uses Harmonic Mean Sample Size = 3.000.

This indicates that groups 1, 2 and 3 are not significantly different from each other, since they all belong in the same subset. The right-hand column shows that all three conditions are in the same subset – it also gives the means involved. If you had significant differences between all three groups then you would have three subsets (subset 1, subset 2 and subset 3), each containing just one group. If groups 1 and 3 did not differ from each other but both differed from group 2, then you would obtain something like the following:

Scheffé(a)

                    Subset for alpha = .05
Condition               1            2
3                     4.00
1                     5.60
2                                  7.00
Sig.                  0.975        1.000

a  Uses Harmonic Mean Sample Size = 3.000.
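Before turning to the hand calculation, the logic of the Scheffé test can be sketched in a few lines of Python. The data below are hypothetical (not those of Calculation 20.1); the sketch uses the standard Scheffé criterion, under which a pairwise comparison is significant when its F-ratio exceeds (k − 1) times the critical F for the ANOVA's error degrees of freedom.

```python
# A sketch of the Scheffé criterion for one pairwise comparison
# (hypothetical data, not the book's worked example).
import numpy as np
from scipy import stats

groups = [np.array([4, 5, 6, 6, 7]),
          np.array([6, 7, 7, 8, 8]),
          np.array([11, 12, 13, 14, 14])]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)
df_error = n_total - k

# Within-groups (error) mean square, as in the ANOVA summary table:
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_error = ss_error / df_error

# Compare groups 1 and 2: the F for this pair is judged against
# (k - 1) times the critical F with (k - 1, df_error) degrees of freedom.
g1, g2 = groups[0], groups[1]
f_pair = (g1.mean() - g2.mean()) ** 2 / (ms_error * (1 / len(g1) + 1 / len(g2)))
f_criterion = (k - 1) * stats.f.ppf(0.95, k - 1, df_error)

print(f"F for groups 1 vs 2 = {f_pair:.2f}, Scheffé criterion = {f_criterion:.2f}")
print("significant" if f_pair > f_criterion else "not significant")
```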
Calculation 23.1

Multiple comparisons: the Scheffé test

The calculation of the Scheffé test is straightforward once you have carried out an analysis of variance and have the summary table. The test tells you whether two group means in an ANOVA differ significantly from each other. Obviously the calculation has to be repeated for every pair of groups you wish to compare, but no adjustments are necessary for the number of pairs of groups being compared. The following worked example is based on the data in Calculation 20.1. Table 23.3 reminds us about the data and Table 23.4 is the analysis of variance summary table for that calculation.