techniques. Allalouf et al. (1999) and Budgell et al. (1995) are other fine examples of this methodology in the literature.

Exploratory, Replicatory Factor Analysis

Many psychological tests, especially personality measures, have been subjected to factor analysis, a technique that has often been used in psychology in an exploratory fashion to identify dimensions or consistencies among the items composing a measure (Anastasi & Urbina, 1997). To establish that the internal relationships of items or test components hold across different language versions of a test, a factor analysis of the translated version is performed. A factor analysis normally begins with the correlation matrix of all the items composing the measure and looks for patterns of consistency, or factors, among the items. There are many forms of factor analysis (e.g., Gorsuch, 1983), and techniques differ in many conceptual ways. Among the important decisions made in any factor analysis are determining the number of factors, deciding whether these factors are permitted to be correlated (oblique) or forced to be uncorrelated (orthogonal), and interpreting the resultant factors. A component of the factor analysis is called rotation, whereby the dimensions are changed mathematically to increase interpretability.

The exploratory factor analysis that bears upon the construct equivalence of two measures has been called replicatory factor analysis (RFA; Ben-Porath, 1990) and is a form of cross-validation. In this instance, the analysis is constrained to yield the same number of factors as the original test, and the factors are set as orthogonal or oblique to match the original solution. In addition, a rotation of the factors is made to attempt to maximally replicate the original solution; this technique is called target rotation. Once these procedures have been performed, the analysts can estimate how similar the factors are across solutions. van de Vijver and Leung (1997) provide indices that may be used for this judgment (e.g., the coefficient of proportionality). Although RFA has probably been the most used technique for estimating congruence (van de Vijver & Leung), it does suffer from a number of problems. One of these is simply that newer techniques, especially confirmatory factor analysis, can now perform a similar analysis while also testing whether the similarity is statistically significant through hypothesis testing. A second problem is that different researchers have not employed standard procedures and do not always rotate their factors to a target solution (van de Vijver & Leung). Finally, many studies do not compute indices of factor similarity across the two solutions and make this discernment only judgmentally (van de Vijver & Leung). Nevertheless, a number of outstanding researchers (e.g., Ben-Porath, 1990; Butcher, 1996) have recommended the use of RFA to establish equivalence, and this technique has been widely used, especially in validation efforts for various adaptations of the frequently translated MMPI and the Eysenck Personality Questionnaire.
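For readers who want to see the mechanics, the following is a minimal sketch of the target-rotation step and a congruence check, using hypothetical loading matrices and Tucker's congruence coefficient (an index often identified with the coefficient of proportionality). The data, variable names, and the 0.95 rule of thumb cited in the comments are illustrative assumptions, not part of the chapter.

```python
# A minimal sketch of an RFA congruence check: rotate the adapted-test
# loadings toward the original-test loadings (target rotation), then
# compute Tucker's congruence coefficient per factor. The loading
# matrices below are hypothetical placeholders.
import numpy as np
from scipy.linalg import orthogonal_procrustes

def tucker_phi(a, b):
    """Tucker's congruence coefficient between two loading vectors."""
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

# Hypothetical item-by-factor loading matrices (10 items, 2 factors):
# one from the original-language test, one from the adapted test.
rng = np.random.default_rng(0)
original = rng.normal(size=(10, 2))
adapted = original + rng.normal(scale=0.1, size=(10, 2))

# Orthogonal Procrustes rotation of the adapted solution toward the
# original solution (the "target").
rotation, _ = orthogonal_procrustes(adapted, original)
adapted_rotated = adapted @ rotation

# Congruence per factor; values near 1.0 suggest the factor replicated
# (0.95 is a commonly cited rule of thumb for factor similarity).
for j in range(original.shape[1]):
    phi = tucker_phi(adapted_rotated[:, j], original[:, j])
    print(f"Factor {j + 1}: phi = {phi:.3f}")
```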
Regression

Regression approaches are generally used to establish the relationships between the newly translated measure and measures with which it has traditionally correlated in the original culture. The new test can be correlated statistically with other measures, and the resulting correlation coefficients may be compared statistically with similar correlation coefficients found in the original population. There may be one or more such correlated variables. When there is more than one independent variable, the technique is called multiple regression. In this case, the adapted test serves as the dependent variable and the other measures as the independent variables. When multiple regression is used, the independent variables are weighted mathematically to optimally predict the adapted test scores. The regression equation for the original test in the original culture may be compared with that for the adapted test; where there are differences between the two regression lines, whether in the slope, in the intercept, or in some other manner, bias in the testing is often presumed.

If the scoring of the original- and target-language measures is the same, it is also possible to include cultural group membership in a multiple regression equation. Such a nominal variable is added as what has been called a dummy-coded variable. In such an instance, if the dummy-coded variable is assigned a weighting as part of the multiple regression equation, indicating that it predicts test scores, evidence of cultural differences across either the two measures or the two cultures may be presumed (van de Vijver & Leung, 1997).
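As a concrete illustration, the following is a minimal sketch of the dummy-coding approach just described. The data, variable names, and effect sizes are fabricated for illustration; a real analysis would formally test the significance of the group weight rather than simply inspect it.

```python
# A minimal sketch: regress adapted-test scores on a criterion measure
# plus a 0/1 dummy code for cultural group membership. A substantial,
# reliable weight on the group code suggests cultural differences
# across the two measures or the two cultures.
import numpy as np

rng = np.random.default_rng(1)
n = 200
criterion = rng.normal(50, 10, size=n)   # a traditionally correlated measure
group = np.repeat([0, 1], n // 2)        # 0 = original culture, 1 = target culture
# Hypothetical adapted-test scores with a small group effect built in.
test_score = 0.8 * criterion + 3.0 * group + rng.normal(0, 5, size=n)

# Design matrix: intercept, criterion, dummy-coded group membership.
X = np.column_stack([np.ones(n), criterion, group])
coef, *_ = np.linalg.lstsq(X, test_score, rcond=None)
print(f"intercept = {coef[0]:.2f}, criterion slope = {coef[1]:.2f}, "
      f"group weight = {coef[2]:.2f}")
```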
Structural Equation Modeling, Including Confirmatory Factor Analysis

Structural equation modeling (SEM; Byrne, 1994; Loehlin, 1992) is a more general and statistically sophisticated procedure that encompasses both factor analysis and regression analysis, and does so in a manner that permits elegant hypothesis testing. When SEM is used to perform factor analysis, it is typically called a confirmatory factor analysis, which is defined by van de Vijver and Leung (1997) as "an extension of classical exploratory factor analysis. Specific to confirmatory factor analysis is the testing of a priori specified hypotheses about the underlying structure, such as the number of factors, loadings of variables on factors, and factor correlations" (p. 99). Essentially, the results of factor-analytic studies of the measure in the original language are constrained upon the adapted measure, the data from the adapted measure are analyzed, and a goodness-of-fit statistical test is performed.

Regression approaches to relationships among a number of tests can also be studied with SEM. Elaborate models of relationships among other tests, measuring variables hypothesized and found through previous research to be related to the construct measured by the adapted test, also may be tested using SEM. In such an analysis, it is possible for a researcher to approximate the kind of nomological net conceptualized by Cronbach and Meehl (1955) and to test whether the structure holds in the target culture as it does in the original culture. Such a test should be the ideal to be sought in establishing the construct equivalence of tests across languages and cultures.

Item-Response Theory

Item-response theory (IRT) is an alternative to classical psychometric true-score theory as a method for analyzing test data. Allen and Walsh (2000) and van de Vijver and Leung (1997) provide descriptions of the way that IRT may be used to compare items across two forms of a measure that differ by language. Although a detailed description of IRT is beyond the scope of this chapter, the briefest of explanations may provide a conceptual understanding of how the procedure is used, especially for cognitive tests. An item characteristic curve (ICC) is computed for each item. This curve has as its x axis the overall ability level of test takers and as its y axis the probability of answering the question correctly. Different IRT models have different numbers of parameters, with one-, two-, and three-parameter models most common. These parameters correspond to difficulty, discrimination, and the ability to get the answer correct by chance, respectively. The ICCs are plotted as normal ogive curves. When a test is adapted, each translated item may be compared across languages graphically, by overlaying the two ICCs, as well as by comparing the item parameters mathematically. If there are differences, these may be considered conceptually. This method, too, may be considered one technique for identifying item bias.
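To make the graphical comparison concrete, the following is a minimal sketch that overlays two hypothetical ICCs for a single translated item under a three-parameter logistic model (the logistic curve is the standard close approximation to the normal ogive mentioned above). All parameter values are invented for illustration; in practice they would be estimated from each group's response data after the forms are placed on a common scale.

```python
# A minimal sketch comparing one translated item's ICCs across the two
# language forms under a three-parameter logistic (3PL) model.
import numpy as np
import matplotlib.pyplot as plt

def icc_3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta:
    a = discrimination, b = difficulty, c = pseudo-guessing."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-4, 4, 200)  # ability scale

# Hypothetical parameters for the same item in each language form.
p_original = icc_3pl(theta, a=1.2, b=0.0, c=0.20)
p_adapted = icc_3pl(theta, a=1.2, b=0.6, c=0.20)  # item appears harder

plt.plot(theta, p_original, label="Original-language form")
plt.plot(theta, p_adapted, linestyle="--", label="Adapted form")
plt.xlabel("Ability (theta)")
plt.ylabel("P(correct)")
plt.legend()
plt.title("Overlaid ICCs for one translated item")
plt.show()

# The gap between the curves (here driven by the difficulty shift,
# b = 0.0 vs. b = 0.6) is the graphical signal of possible item bias.
```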
Methods to Establish Linkage of Scores

Once the conceptual equivalence of an adapted measure has been established, researchers and test developers often wish to provide measurement-unit and metric equivalence as well. For most measures, this requirement is met through the process of test equating. As noted throughout this chapter, merely translating a test from one language to another, even if cultural biases have been eliminated, does not ensure that the two different-language forms of a measure are equivalent. Conceptual or construct equivalence needs to be established first. Once such a step has been taken, then one can consider higher levels of equivalence. The mathematics of equating may be found in a variety of sources (e.g., Holland & Rubin, 1982; Kolen & Brennan, 1995), and Cook et al. (1999) provide an excellent integration of research designs and analysis for test adaptation; research designs for such studies are abstracted in the following paragraphs.

Sireci (1997) clarified three experimental designs that can be used to equate adapted forms to their original-language scoring systems and, perhaps, norms. He refers to them as (a) the separate-monolingual-groups design, (b) the bilingual-group design, and (c) the matched-monolingual-groups design. A brief description of each follows.

Separate-Monolingual-Groups Design

In the separate-monolingual-groups design, two different groups of test takers are involved, one from each language or cultural group. Although some items may simply be assumed to be equivalent across both tests, data can be used to support this assumption. These items serve as what is known in equating as anchor items. IRT methods are then generally used to calibrate the two tests to a common scale, most typically the one used by the original-language test (Angoff & Cook, 1988; O'Brien, 1992; Sireci, 1997); a brief sketch of such a linking follows these design descriptions. Translated items must then be evaluated for invariance across the two different-language test forms; that is, they are assessed to determine whether their difficulty differs across forms. This design does not work effectively if the two groups actually differ, on average, on the characteristic that is assessed (Sireci); in fact, in such a situation, one cannot disentangle differences in the ability measured from differences in the two measures. The method also assumes that the construct measured is based on a single, unidimensional factor. Measures of complex constructs, then, are not good prospects for this method.

Bilingual-Group Design

In the bilingual-group design, a single group of bilingual individuals takes both forms of the test in counterbalanced order. An assumption of this method is that the individuals in the group are all equally bilingual, that is, equally proficient in each language. In Maldonado and Geisinger (in press), all participants first were tested for both Spanish and English competence to gain entry into the study. Even under such restrictive circumstances, however, a ceiling effect made a true assessment of equality impossible. The problem of finding equally bilingual test takers is almost insurmountable. Also, if knowledge of what is on the test in one language affects performance on the other test, it is possible to use two randomly assigned groups of bilingual individuals (where their level of language skill is equated via randomization). In such an instance, it is possible either to give each group one of the tests or to give each group one-half of the items (counterbalanced) from each test in a nonoverlapping manner (Sireci, 1997). Finally, one must question how representative the equally bilingual individuals are of the target population; thus the external validity of the sample may be questioned.

Matched-Monolingual-Groups Design

This design is conceptually similar to the separate-monolingual-groups design, except that in this case the study participants are matched on the basis of some variable expected to correlate highly with the construct measured. By being matched in this way, the two groups are made more equal, which reduces error. "There are not many examples of the matched monolingual group linking design, probably due to the obvious problem of finding relevant and available matching criteria" (Sireci, 1997, p. 17). The design is nevertheless an extremely powerful one.
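As a concrete illustration of the anchor-item calibration mentioned under the separate-monolingual-groups design, the following is a minimal sketch of mean/sigma linking, one classical method for placing IRT difficulty parameters from the adapted form onto the original form's scale. The parameter values are hypothetical, and operational equating would rely on the fuller methods in sources such as Kolen and Brennan (1995).

```python
# A minimal sketch of mean/sigma IRT linking on anchor items. The two
# sets of difficulty (b) parameters are hypothetical estimates of the
# SAME anchor items, calibrated separately in each monolingual group.
import numpy as np

# Anchor-item difficulties from the two separate calibrations.
b_original = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_adapted = np.array([-1.0, -0.1, 0.4, 1.2, 1.9])

# Mean/sigma method: find the linear transformation b* = A*b + B that
# matches the adapted anchors' mean and spread to the original scale.
# (Discrimination parameters, if used, transform as a* = a / A.)
A = b_original.std(ddof=1) / b_adapted.std(ddof=1)
B = b_original.mean() - A * b_adapted.mean()
b_linked = A * b_adapted + B

print(f"A = {A:.3f}, B = {B:.3f}")
# Large residuals after linking flag anchor items whose difficulty is
# not invariant across language forms (candidates for item bias).
print("residuals:", np.round(b_original - b_linked, 3))
```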
CONCLUSION

Psychology has been critiqued as having a Euro-American orientation (Moreland, 1996; Padilla & Medina, 1996). Moreland wrote,

Koch (1981) suggests that American psychologists . . . are trained in scientific attitudes that Kimble (1984) has characterized as emphasizing objectivity, data, elementism, concrete mechanisms, nomothesis, determinism, and scientific values. Dana (1993) holds that multicultural research and practice should emanate from a human science perspective characterized by the opposite of the foregoing terms: intuitive theory, holism, abstract concepts, idiography, indeterminism, and humanistic values. (p. 53)

Moreland believed that this dichotomy was a false one. Nevertheless, he argued that a balance of the two approaches was needed to understand cultural issues more completely. One of the advantages of cross-cultural psychology is that it challenges many of our preconceptions of psychology. It is often said that one learns much about one's own language when learning a foreign tongue. The analogy for psychology is clear.

Assessment in cross-cultural psychology emphasizes an understanding of the context in which assessment occurs. The notion that traditional understandings of testing and assessment have focused solely on the individual can be tested in this discipline. Cross-cultural and multicultural testing help us focus upon the broader systems of which the individual is but a part. Hambleton (1994) stated,

The common error is to be rather casual about the test adaptation process, and then interpret the score differences among the samples or populations as if they were real. This mindless disregard of test translation problems and the need to validate instruments in the cultures where they are used has seriously undermined the results from many cross cultural studies. (p. 242)

This chapter has shown that tests that are adapted for use in different languages and cultures need to be studied for equivalence. There are a variety of types of equivalence: linguistic equivalence, functional equivalence, conceptual or construct equivalence, and metric equivalence. Linguistic equivalence requires sophisticated translation techniques and an evaluation of the effectiveness of the translation. Functional equivalence requires that those translating the test be aware of cultural issues in the original test, in the construct, in the target culture, and in the resultant target test. Conceptual equivalence requires a relentless adherence to a construct-validation perspective and the conduct of research using data from both original and target tests. Metric equivalence, too, involves careful analyses of the test data. The requirements of metric equivalence may not be met in many situations, regardless of how much we would like to use scoring scales from the original test with the target test.

If equivalence is one side of the coin, then bias is the other. Construct bias, method bias, and item bias can all influence the usefulness of a test adaptation in detrimental ways. The need for construct-validation research on adapted measures is reiterated; there is no more critical point in this chapter. In addition, however, it is important to replicate the construct validation that had been found in the original culture with the original test. Factor analysis, multiple regression, and structural equation modeling permit researchers to assess whether conceptual equivalence is achieved.

The future holds much promise for cross-cultural psychology and for testing and assessment within that subdiscipline of psychology. There will be an increase in the use of different forms of tests in both the research and the practice of psychology. In a shrinking world, it is clearer that many psychological constructs are likely to hold for individuals around the world, or at least throughout much of it. Knowledge of research from foreign settings and in foreign languages is much more accessible than in the recent past. Thus, researchers may take advantage of theoretical understandings, constructs, and their measurement from leaders all over the world. In applied settings, companies such as Microsoft are already fostering a world in which tests (such as for software literacy) are available in dozens of languages. Costs of test development are so high that adaptation and translation of assessment materials can make professional assessment cost-effective even in developing nations, where the benefits of psychological testing are likely to be highest.

Computer translations of language are advancing rapidly. In some future chapter such as this one, the author may direct that the first step is to have a computer perform the first translation of the test materials. As this sentence is being written, we are not yet there; human review for cultural and language appropriateness continues to be needed. Yet in the time it will take for these pages to be printed and read, these words may have already become an anachronism.

The search for psychological universals will continue, as will the search for cultural and language limitations on these characteristics. Psychological constructs, both of major import and of more minor significance, will continue to be found that do not generalize to different cultures. The fact that the world is shrinking because of advances in travel and communications does not mean we should assume it is necessarily becoming more Western, more American. To do so is, at best, pejorative. These times are exciting, both historically and psychometrically.
The costs in time and money to develop new tests in each culture are often prohibitive. Determination of those aspects of a construct that are universal and those that are culturally specific is critical. These are new concepts for many psychologists; we have not defined cultural and racial concepts carefully and effectively, and we have not always incorporated these concepts into our theories (Betancourt & López, 1993; Helms, 1992). Good procedures for adapting tests are available, and the results of these efforts can be evaluated. Testing can help society, and there is no reason for any country to hoard good assessment devices. Through the adaptation procedures discussed in this chapter they can be shared.

APPENDIX

Guidelines of the International Test Commission for Adapting Tests (van de Vijver & Leung, 1997, and Hambleton, 1999)

The initial guidelines relate to the testing context, as follows.

1. Effects of cultural differences that are not relevant or important to the main purposes of the study should be minimized to the extent possible.

2. The amount of overlap in the constructs in the populations of interest should be assessed.

The following guidelines relate to test translation or test adaptation.

3. Instrument developers/publishers should ensure that the translation/adaptation process takes full account of linguistic and cultural differences among the populations for whom the translated/adapted versions of the instrument are intended.

4. Instrument developers/publishers should provide evidence that the language used in the directions, rubrics, and items themselves as well as in the handbook [is] appropriate for all cultural and language populations for whom the instrument is intended.

5. Instrument developers/publishers should provide evidence that the testing techniques, item formats, test conventions, and procedures are familiar to all intended populations.

6. Instrument developers/publishers should provide evidence that item content and stimulus materials are familiar to all intended populations.

7. Instrument developers/publishers should implement systematic judgmental evidence, both linguistic and psychological, to improve the accuracy of the translation/adaptation process and compile evidence on the equivalence of all language versions.

8. Instrument developers/publishers should ensure that the data collection design permits the use of appropriate statistical techniques to establish item equivalence between the different language versions of the instrument.

9. Instrument developers/publishers should apply appropriate statistical techniques to (a) establish the equivalence of the different versions of the instrument and (b) identify problematic components or aspects of the instrument which may be inadequate to one or more of the intended populations.

10. Instrument developers/publishers should provide information on the evaluation of validity in all target populations for whom the translated/adapted versions are intended.

11. Instrument developers/publishers should provide statistical evidence of the equivalence of questions for all intended populations.

12. Nonequivalent questions between versions intended for different populations should not be used in preparing a common scale or in comparing these populations. However, they may be useful in enhancing content validity of scores reported for each population separately. [emphasis in original]

The following guidelines relate to test administration.
13. Instrument developers and administrators should try to anticipate the types of problems that can be expected and take appropriate actions to remedy these problems through the preparation of appropriate materials and instructions.

14. Instrument administrators should be sensitive to a number of factors related to the stimulus materials, administration procedures, and response modes that can moderate the validity of the inferences drawn from the scores.

15. Those aspects of the environment that influence the administration of an instrument should be made as similar as possible across populations for whom the instrument is intended.

16. Instrument administration instructions should be in the source and target languages to minimize the influence of unwanted sources of variation across populations.

17. The instrument manual should specify all aspects of the instrument and its administration that require scrutiny in the application of the instrument in a new cultural context.

18. The administration should be unobtrusive, and the examiner-examinee interaction should be minimized. Explicit rules that are described in the manual for the instrument should be followed.

The final grouping of guidelines relates to documentation that is suggested or required of the test publisher or user.

19. When an instrument is translated/adapted for use in another population, documentation of the changes should be provided, along with evidence of the equivalence.

20. Score differences among samples of populations administered the instrument should not be taken at face value. The researcher has the responsibility to substantiate the differences with other empirical evidence. [emphasis in original]

21. Comparisons across populations can only be made at the level of invariance that has been established for the scale on which scores are reported.

22. The instrument developer should provide specific information on the ways in which the sociocultural and ecological contexts of the populations might affect performance on the instrument and should suggest procedures to account for these effects in the interpretation of results.

REFERENCES

Allalouf, A., Hambleton, R. K., & Sireci, S. G. (1999). Identifying the causes of DIF in translated verbal items. Journal of Educational Measurement, 36, 185–198.

Allen, J., & Walsh, J. A. (2000). A construct-based approach to equivalence: Methodologies to cross-cultural/multicultural personality assessment research. In R. H. Dana (Ed.), Handbook of cross-cultural and multicultural personality assessment (pp. 63–85). Mahwah, NJ: Erlbaum.

Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.

Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508–600). Washington, DC: American Council on Education.

Angoff, W. H., & Cook, L. L. (1988). Equating the scores of the "Prueba de Aptitud Academica" and the "Scholastic Aptitude Test" (Report No. 88-2). New York: College Entrance Examination Board.

Ben-Porath, Y. S. (1990). Cross-cultural assessment of personality: The case of replicatory factor analysis. In J. N. Butcher & C. D. Spielberger (Eds.), Advances in personality assessment (Vol. 8, pp. 27–48). Hillsdale, NJ: Erlbaum.

Berry, J. W. (1980). Introduction to methodology. In H. C. Triandis & J. W. Berry (Eds.), Handbook of cross-cultural psychology (Vol. 2, pp. 1–28). Boston: Allyn and Bacon.
Betancourt, H., & López, S. R. (1993). The study of culture, ethnicity, and race in American psychology. American Psychologist, 48, 629–637.

Bracken, B. A., & Barona, A. (1991). State of the art procedures for translating, validating, and using psychoeducational tests in cross-cultural assessment. School Psychology International, 12, 119–132.

Bracken, B. A., Naglieri, J., & Bardos, A. (1999, May). Nonverbal assessment of intelligence: An alternative to test translation and adaptation. Paper presented at the International Conference on Test Adaptation, Washington, DC.

Brislin, R. W. (1970). Back translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1, 185–216.

Brislin, R. W. (1980). Translation and content analysis of oral and written material. In H. C. Triandis & J. W. Berry (Eds.), Handbook of cross-cultural psychology, Vol. 2: Methodology (pp. 389–444). Needham Heights, MA: Allyn and Bacon.

Brislin, R. W. (1986). The wording and translation of research instruments. In W. J. Lonner & J. W. Berry (Eds.), Field methods in cross-cultural research (pp. 137–164). Newbury Park, CA: Sage.

Brislin, R. W. (1993). Understanding culture's influence on behavior. New York: Harcourt Brace.

Budgell, G. R., Raju, N. S., & Quartetti, D. A. (1995). Analysis of differential item functioning in translated assessment instruments. Applied Psychological Measurement, 19, 309–321.

...