Lee Jussim, Thomas R. Cain, Jarret T. Crawford, Kent Harber, Florette Cohen, 2009.
ARE STEREOTYPES EMPIRICALLY INACCURATE?
Stereotype Accuracy and Levels of Analysis
The following statement summarizes a class of criticisms of stereotype accuracy that has periodically appeared in the social psychological literature (e.g., APA, 1991; Fiske, 1998; Nelson, 2002; Schneider, 2004; Stangor, 1995):
Even if it can be successfully shown that perceivers accurately judge two groups to differ on some attribute: (a) perceivers should not assume that their stereotypes of the group automatically fit all members of the group; (b) if perceivers do apply their beliefs about the group when judging individuals, they are likely to be wrong much of the time because few members perfectly fit the stereotype.
… Assessments of any individual based solely on stereotypes will generally be lacking. However, this logic implies nothing about stereotype accuracy. Instead, it is a claim about the accuracy of applying stereotypes of groups to specific group members.
… First, how accurate are people’s beliefs about groups? Just as a person might not accurately remember how many games Roger Clemens won in 2000 (inaccuracy in person perception) and still remember that the Yankees won the World Series that year (accurate belief about Clemens’s group), inappropriate application of a stereotype does not mean that the stereotype is itself inaccurate. A person may correctly know that, on average, women earn about 70% of what men earn, but have no accurate knowledge whatsoever about how much Nancy earns.
Second, does people’s use or disuse of stereotypes in judging individuals increase or reduce the accuracy with which they perceive differences between small groups of individuals with whom they have personally come into contact? This is the accuracy version of the “stereotypes and person perception” question. Do, for example, general stereotypes of male superiority in athletics lead the coach of a soccer team to erroneously view the particular boys on the team as better than the particular girls on the team, when they really have equal skill? […]
Correspondence With Real Differences: High Accuracy
How much correspondence should be considered “accurate”? … we advocate holding people to a high standard – the same standards to which social scientists hold themselves.
J. Cohen (1988), in his classic statistical treatise imploring social scientists to examine the size of the effects they obtained in their studies and not just the “statistical significance” of the results, suggested that effect sizes above .8 could be considered “large.” Such an effect size translates into a correlation of about .4 … . By this standard, correlations of .4 and higher could be considered accurate because they represent a “large” correspondence between stereotype and reality.
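The step from an effect size of .8 to a correlation of "about .4" can be checked with a short calculation. The sketch below assumes two groups of equal size, for which the usual conversion is r = d / sqrt(d^2 + 4); the function name `d_to_r` is ours and does not come from the chapter.

```python
import math


def d_to_r(d: float) -> float:
    """Convert Cohen's d to a correlation, assuming two equal-sized groups."""
    return d / math.sqrt(d * d + 4.0)


# Cohen's conventional threshold for a "large" effect.
r_large = d_to_r(0.8)
print(round(r_large, 2))  # roughly .37, i.e. "about .4"
```

Under the equal-group-size assumption the result is slightly below .4, so the .4 cutoff is, if anything, a little stricter than Cohen's "large".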
This standard has been supported by two recent studies that have examined the typical effect sizes found in clinical and social psychological research. One recent review of more than 300 meta-analyses – which included more than 25,000 studies and over 8 million human participants – found that mean and median effect sizes in social psychological research were both about .2 (Richard et al., 2003). Only 24% of social psychological effects exceeded .3. A similar pattern has been found for the phenomena studied by clinical psychologists (Hemphill, 2003). Psychological research rarely obtains effect sizes exceeding correlations of .3. Effect sizes of .4 and higher, therefore, constitute a strong standard for accuracy. Last, according to Rosenthal’s (1991) binomial effect size display, a correlation of at least .4 roughly translates into people being right at least 70% of the time. This means they are right more than twice as often as they are wrong. That seems like an appropriate cutoff for considering a stereotype reasonably accurate.
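The arithmetic behind the 70% figure can be made explicit. In the binomial effect size display, a correlation r is re-expressed as a "success" rate of .50 + r/2 against a "failure" rate of .50 - r/2. The following is a minimal sketch of that conversion; the function name `besd` is ours.

```python
def besd(r):
    """Return (hit_rate, miss_rate) under the binomial effect size display."""
    return 0.5 + r / 2.0, 0.5 - r / 2.0


hit, miss = besd(0.4)
# Prints the two rates and how many times more often people are right than wrong.
print(f"right {hit:.0%}, wrong {miss:.0%}, ratio {hit / miss:.2f}")
```

For r = .4 this gives 70% right against 30% wrong, which is the "more than twice as often" stated in the text.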
ACCURACY OF ETHNIC AND RACIAL STEREOTYPES
Table 10.1 summarizes the results of all studies assessing the accuracy of racial and ethnic stereotypes that met our criteria for inclusion. We review the most noteworthy of their results here. First, consensual stereotype discrepancies are a mix of accurate and inaccurate beliefs. Nonetheless, most judgments were either accurate or near misses. Only a minority were more than 20% off.
Second, people’s consensual stereotype discrepancies for between-group differences are consistently more accurate than are their consensual stereotype discrepancies for characteristics within groups. For example, in the Ryan (1996) study, Whites’ consensual stereotypes regarding Whites and regarding African Americans each were accurate only 5 of 17 times (10 of 34, total). However, their judgments of differences between Whites and African Americans were accurate 9 times out of 17. A similar pattern occurred in the McCauley and Stitt (1978) study (see Table 10.1).
Third, these results provide little support for the idea that stereotypes typically exaggerate real differences. Exaggeration occurred, but it occurred no more often than did underestimation, with one exception. The only study to assess the accuracy of personal discrepancies found a plurality of people to underestimate real differences (Ashton & Esses, 1999, summarized in Table 10.1).
Fourth, the extent to which people’s stereotypes corresponded with reality was strikingly high. Consensual stereotype accuracy correlations ranged from .53 to .93. Personal stereotype accuracy correlations were somewhat lower, but still quite high by any standard, ranging from .36 to .69.
ACCURACY OF GENDER STEREOTYPES
Table 10.2 summarizes the results of all studies of gender stereotypes that met our criteria for inclusion. Results are broadly consistent with those for ethnic and racial stereotypes. In most cases, at least a plurality of judgments was accurate, and accurate plus near-miss judgments predominated in every study. Inaccuracy constituted a minority of results. Again, some results showed that people exaggerated real differences. There was, however, no support for the hypothesis that stereotypes generally lead people to exaggerate real differences. As with race, underestimations counterbalanced exaggerations.
Again, consensual stereotype accuracy correlations were quite high, ranging from .34 to .98, with most falling between .66 and .80. The results for personal stereotypes were more variable. In one case they were inaccurate, with a near-zero correlation with criteria (Beyer, 1999, perceptions of female targets). In general, though, they were at least moderately, and sometimes highly, accurate (most correlations ranged from .40 to .60; see Table 10.2).
STRENGTHS AND WEAKNESSES OF RESEARCH ON THE ACCURACY OF RACIAL, ETHNIC, AND GENDER STEREOTYPES
Several methodological aspects of these studies are worth noting because they bear on the generalizability of the results. First, although most of the studies only assessed the accuracy of undergraduates’ stereotypes, several assessed the accuracy of samples of adults (McCauley & Stitt, 1978; McCauley & Thangavelu, 1991; McCauley, Thangavelu, & Rozin, 1988). Some of the highest levels of accuracy occurred with these adult samples, suggesting that the levels of accuracy obtained do not represent some artifact resulting from the disproportionate study of undergraduate samples. Nonetheless, additional research on the accuracy of noncollege samples is still needed.
Second, the studies used a wide variety of criteria: U.S. Census data, self-reports, Board of Education data, nationally representative surveys, locally representative surveys, government reports, and so on. The consistency of the results across studies, therefore, does not reflect some artifact resulting from use of any particular criteria.
Third, the studies examined a wide range of stereotypes: beliefs about demographic characteristics (McCauley & Stitt, 1978; Wolsko, Park, Judd, & Wittenbrink, 2000), academic achievement (Ashton & Esses, 1999; McCauley & Stitt, 1978; Wolsko et al., 2000), and personality and behavior (Ryan, 1996; Wolsko et al., 2000). The consistency of the results across studies, therefore, does not reflect some artifact resulting from the study of a particular type of stereotype.
Fourth, personal discrepancies were the least studied of the four types of accuracy. Thus, the studies do not provide much information about the extent to which individual people’s stereotypes deviate from perfection.
Despite the impressive and surprising evidence of the accuracy of stereotypes, there is some consistent evidence of inaccuracy in stereotypes. In the United States, political stereotypes tend to have little accuracy (e.g., Judd & Park, 1993). Many people in the United States seem to have little knowledge or understanding of the beliefs, attitudes, and policy positions of Democrats and Republicans.
A recent large-scale study conducted in scores of countries found that there is also little evidence of accuracy in national stereotypes regarding personality (Terracciano et al., 2005). It is probably not surprising that people on different continents have little accurate knowledge about one another’s personality (e.g., that Indonesians do not know much about, say, Canadians, is not very surprising). However, somewhat more surprising is that people from cultures with a great deal of contact (various Western European countries; Britain and the United States) also have highly inaccurate beliefs about one another’s personality characteristics.
Although the Terracciano et al. (2005) study was impressive in scope and innovative in topic, it suffers from one of the limitations that excluded several studies from this review. Specifically, the criteria samples were haphazard samples of convenience, rather than random samples obtained from target populations. The extent to which this explains their low level of accuracy is unknown until research is conducted on the same topic that obtains criteria from random samples. In general, why some stereotypes have such high levels of accuracy and others such low levels is currently unclear and is an important area of future research.
What Should People Do With Useful but Not Definitive Individuating Information?
Alaska Versus New York
You get one piece of information about each location. You learn that Jane, a lifelong resident of Anchorage, considers it “cold” today and Jan, a lifelong resident of New York, considers it “cold” today. Note that the “information” that you have is identical regarding the two places. Should you, therefore, predict that they have identical temperatures?
That would be silly. It ignores the wealth of information you already bring to bear on the situation: (a) It is usually much colder in Anchorage; (b) “cold” can mean lots of different things in different contexts; and (c) people usually adapt to their conditions, so, if it is usually 40 degrees in your neighborhood, you would probably judge 20 degrees as cold; but if it is usually 60 degrees in your neighborhood, 40 might be seen as quite cold. To ignore all this would be foolish, and, most of the time, doing so will lead you to an inaccurate conclusion about the weather in the two places.
In other words, in this situation, to the extent that your beliefs about the general characteristics of Alaska, Alaskans, New York, and New Yorkers are reasonably accurate, they should influence your interpretation of “cold” and your prediction regarding the weather in each place.
Stereotypes and Person Perception
The logic here is identical. Consider stereotypes of peace activists and al Qaeda members. You hear the same thing about an individual from each group: They have “attacked” the United States. Should you interpret this to mean that they engaged in identical behaviors? Not likely. The attack perpetrated by the peace activist is most likely a verbal “attack” on U.S. war policies; the al Qaeda attack is probably something far more lethal.
The same principles hold regardless of whether the stereotypes involve groups for whom stereotypes are deemed acceptable (e.g., peace activists or al Qaeda) or groups for whom stereotypes are deemed socially unacceptable (e.g., genders, nationalities, races, social classes, religions, ethnicities, etc.). For example, if we learn both Bob and Barb are regarded as “tall,” should we conclude that they are exactly equal in height? Of course not. Undoubtedly, Bob is tall for a man, and Barb is tall for a woman, and, because men are, on average, taller than women, tall means different objective heights for men and women (implicit acceptance of these “shifting standards” has been thoroughly demonstrated; e.g., Biernat, 1995).
What about judgments about more socially charged attributes, such as intelligence, motivation, assertiveness, social skill, hostility, and so on? The same principles apply. If the stereotype is accurate and one only has a small bit of ambiguous information about an individual, using the stereotype as a basis for judging the person will likely enhance accuracy. For the statistically inclined, this is a very basic application of Bayes’s theorem (e.g., McCauley, Stitt, & Segal, 1980) and principles of regression (Jussim, 1991). Let’s assume for a moment that 30% of motorcycle gang members are arrested for violent behavior at some point in their lives, and 0.3% of ballerinas are arrested for violent behavior at some point in their lives. People who know this are being completely reasonable and rational if, on dark streets or at lonely train stations, they avoid the bikers more than ballerinas, in the absence of much other individuating information about them.
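The Bayesian logic can be illustrated with the base rates from the hypothetical above. The sketch uses Bayes's theorem in odds form (posterior odds = prior odds × likelihood ratio). The 30% and 0.3% base rates come from the text; the likelihood ratio of 3 for the "small bit of ambiguous information" is purely an assumption chosen for illustration, and the function name `posterior` is ours.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes's theorem in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Base rates taken from the text's hypothetical example.
BASE_RATES = {"motorcycle gang member": 0.30, "ballerina": 0.003}

# ASSUMPTION (illustrative only): the same ambiguous cue is three times as
# likely to be observed for a violent person as for a nonviolent person.
LIKELIHOOD_RATIO = 3.0

for group, prior in BASE_RATES.items():
    print(group, round(posterior(prior, LIKELIHOOD_RATIO), 3))
```

With identical individuating information, the two posteriors remain far apart (roughly .56 versus .009 under these assumed numbers), because weak evidence moves each estimate only a short distance from its base rate.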
In all of these cases, the stereotype “biases” the subsequent judgments. At least, that is how such influences have nearly always been interpreted in empirical social psychological research on stereotypes (see, e.g., Devine, 1995; Fiske & Neuberg, 1990; Gilbert, 1995; Jones, 1986). It is probably more appropriate, however, to characterize such phenomena as stereotypes “influencing” or “informing” judgments. Such effects mean that people are appropriately using their knowledge about groups to reach as informed a judgment as possible under difficult and information-poor circumstances. If their knowledge is reasonably accurate, relying on the stereotype will usually increase, rather than decrease, the accuracy of those judgments (see also Jussim, 1991, 2005).
No Individuating Information
Alaska and New York
If you are given absolutely no information, and are asked to predict today’s high temperature in Anchorage and New York, what should you do? If you know anything about the climate in the two places, you will predict that it will be warmer in New York. Indeed, you should predict this every time you are asked to do so. Would this mean your beliefs about climate are somehow irrationally and rigidly resistant to change? Of course not. All it means is that you recognize that, when two regions systematically differ and you are asked to predict the day’s temperature, and are given no other information, it will always be better to guess that the place with the higher average temperature is warmer than the place with the lower average temperature.
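The claim that guessing the place with the higher average is the better bet can be checked numerically. The sketch below assumes, purely for illustration, that daily highs in each city are normally distributed and independent of each other; the means and standard deviations are made-up values, not climate data.

```python
from statistics import NormalDist

# ASSUMPTION: illustrative distributions of daily highs (degrees Fahrenheit).
# These parameters are invented for the example, not real climate figures.
new_york = NormalDist(mu=62, sigma=18)
anchorage = NormalDist(mu=44, sigma=18)

# Distribution of (New York high - Anchorage high), treating the two as independent.
diff = new_york - anchorage

# Probability that New York is the warmer of the two on a randomly chosen day.
p_ny_warmer = 1.0 - diff.cdf(0.0)
print(round(p_ny_warmer, 2))
```

Whenever the means differ, this probability exceeds .50, so a forecaster with no other information is right more often by always naming the place with the higher average. The guess will still be wrong on some days, which is a statement about individual days, not about the accuracy of the belief about the averages.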
Stereotypes and Person Perception
If you are given no information other than race, and you are asked to predict the income of Bill, who is African American, and George, who is White, what should you do? If you know about the average incomes of African Americans and Whites in the United States, you will predict that George is richer. Indeed, you should predict this every time you are asked to make a prediction about the income of an African American and White target about whom you have no other information. Would this mean your beliefs about racial differences in income are somehow irrationally and rigidly resistant to change? Of course not. All it means is that you recognize that, when the average income of two racial groups differs and you are asked to predict the income of an individual from those groups, and are given no other information, it will always be better to guess that the person from the group with the higher average income has more income.
What Do People Do When They Judge Individuals?
People should primarily use individuating information, when it is available, rather than stereotypes when judging others. Do they? This area of research has been highly controversial, with many researchers emphasizing the power of stereotypes to bias judgments (Devine, 1995; Fiske & Neuberg, 1990; Fiske & Taylor, 1991; Jones, 1986; Jost & Kruglanski, 2002) and others emphasizing the relatively modest influence of stereotypes and the relatively large role of individuating information (Jussim, Eccles, & Madon, 1996; Kunda & Thagard, 1996).
Fortunately, literally hundreds of studies have now been performed that address this issue, and, even more fortunately, multiple meta-analyses have been performed summarizing their results. Table 10.3 presents the results from meta-analyses of studies assessing stereotype bias in many contexts. It shows that the effects of stereotypes on person judgments, averaged over hundreds of experiments, range from 0 to .25. The simple arithmetic mean of the effect sizes is .10, which is an overestimate, because the meta-analyses with more studies yielded systematically lower effect sizes (r = -.43 between effect size and number of studies). The few naturalistic studies of the role of stereotypes in biasing person perception have yielded similarly small effects (e.g., Clarke & Campbell, 1955; Jussim et al., 1996; Madon et al., 1998).
How small is an effect of r = .10? It is small according to J. Cohen’s (1988) heuristic categorization of effect sizes. It is among the smallest effects found in social psychology (Richard et al., 2003). An overall effect of .10 means that expectancies substantially influence social perceptions about 5% of the time (as per Rosenthal’s (1991) binomial effect size display). This means that stereotypes do not influence perceptions 95% of the time. […]
Accuracy in Perception of Small Group Differences
Madon et al. (1998) examined the accuracy of seventh-grade teachers’ perceptions of their students’ performance, talent, and effort at math about 1 month into the school year. Madon et al. assessed accuracy in the following manner. First they identified the teachers’ perceptions of group differences by correlating teachers’ perceptions of individual students with the students’ race, sex, and social class. This correlation indicated the extent to which teachers systematically evaluated individuals from one group more favorably than individuals from another group. Next, Madon et al. assessed actual group differences in performance, talent, and effort by correlating individual students’ final grades the prior year (before teachers knew the students), standardized test scores, and self-reported motivation and effort with students’ race, sex, and social class. The teachers’ accuracy was assessed by correlating the teachers’ perceived differences between groups with the groups’ actual differences.
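The three-step procedure just described can be sketched on synthetic data. Everything numeric below is invented for illustration: the sample size, the 0/1 coding of each demographic variable, the group effects, and the noise in teacher ratings are assumptions, and the sketch uses a single criterion per attribute rather than the grades, test scores, and self-reports used in the study. The names `pearson`, `GROUPS`, and `ATTRS` are ours.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


random.seed(0)
N = 500                                   # ASSUMPTION: synthetic sample size
GROUPS = ["race", "sex", "social_class"]  # each coded 0/1 here for simplicity
ATTRS = ["performance", "talent", "effort"]

students = []
for _ in range(N):
    s = {g: random.randint(0, 1) for g in GROUPS}
    for a in ATTRS:
        # ASSUMPTION: small, made-up group effects on the criterion.
        actual = random.gauss(0, 1) + 0.3 * s["sex"] - 0.2 * s["social_class"]
        s["actual_" + a] = actual
        # ASSUMPTION: teacher ratings track the criterion with added noise.
        s["perceived_" + a] = actual + random.gauss(0, 0.5)
    students.append(s)

perceived_diffs, actual_diffs = [], []
for g in GROUPS:
    codes = [s[g] for s in students]
    for a in ATTRS:
        # Step 1: perceived group difference (teacher rating vs. group membership).
        perceived_diffs.append(pearson(codes, [s["perceived_" + a] for s in students]))
        # Step 2: actual group difference (criterion vs. group membership).
        actual_diffs.append(pearson(codes, [s["actual_" + a] for s in students]))

# Step 3: accuracy is the correlation between perceived and actual differences.
accuracy = pearson(perceived_diffs, actual_diffs)
print(round(accuracy, 2))
```

Note that the unit of analysis in the final step is the group difference (nine of them in this sketch), not the individual student, which is why a single badly misjudged difference can move the accuracy correlation substantially.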
Madon et al. (1998) found that teachers were mostly accurate. The correlation between teachers’ perceived group differences and actual group differences was r = .71. The teachers’ perceptions of sex differences in effort, however, were highly inaccurate: they believed girls exerted more effort than boys, but there was no sex difference in self-reported motivation and effort. When this outlier was removed, the correlation between perceived and actual group differences increased to r = .96.
We are aware of only two other studies that have addressed whether people systematically and unjustifiably favor or disparage individuals belonging to certain groups (Clarke & Campbell, 1955; Jussim et al., 1996). Both yielded evidence of accuracy accompanied by small bias. […]
Does Relying on a Stereotype Increase or Reduce Accuracy in Person Perception?
Occupational Stereotypes: C. E. Cohen (1981)
C. E. Cohen (1981) examined whether people more easily remember behaviors and attributes that are consistent with a stereotype than those that are inconsistent with that stereotype. Perceivers in her study viewed a videotape of a dinner conversation between a husband and wife (they were actually husband and wife, but they were also experimental confederates trained by Cohen). Half of the time, this conversation led perceivers to believe the woman was a waitress; half of the time the conversation led perceivers to believe the woman was a librarian. The remainder of the conversation conveyed an equal mix of librarian-like and waitress-like attributes and behaviors.
Perceivers were then given a series of choices regarding objective aspects of the woman in the videotape (e.g., wore glasses . . . did not wear glasses). Their task was to select the correct description. Perceivers consistently remembered 5% to 10% more behaviors or features that were consistent with the woman’s supposed occupation than behaviors or features that were inconsistent with her supposed occupation. For example, they were more likely to accurately remember that the “librarian” wore glasses and liked classical music, whereas they more accurately remembered that the “waitress” had a beer and no artwork in her house (even though the tape was identical, showing the woman wearing glasses, liking classical music, having a beer, and not having artwork). This pattern occurred across two studies and regardless of whether the memory test occurred immediately after the videotape or up to 7 days later. Thus, it appeared that people selectively remembered stereotype-consistent information better than they remembered stereotype-inconsistent information.
C. E. Cohen (1981) also reported results regarding the accuracy of her perceivers’ memories. Across the two studies, accuracy levels were quite high – ranging from a low of 57% to a high of 88% and averaging about 75% in the first study and about 66% in the second study. Overall, therefore, she found high (about 70%) accuracy and small (about 5%-10%) but real bias.
The results from her second study were particularly relevant with respect to understanding whether the stereotype increased or reduced accuracy. In this study, half of the perceivers learned of the woman’s supposed occupation before viewing the tape. In comparison to receiving the label after viewing the tape, when people received the label first, they more accurately remembered both stereotype-consistent and stereotype-inconsistent information. On average they correctly remembered 70% of the target’s attributes (regardless of their degree of stereotype consistency) when they received the label first; they correctly remembered only about 63% of the target’s attributes when they received the label last. The upshot here, therefore, is that, although the label biased memory in such a manner as to favor stereotype-consistent information, having the label up front also increased overall accuracy.
Why? Most likely, the label provided some sort of organizing scheme for perceivers, which facilitated their understanding and interpretation of both stereotype-consistent and stereotype-inconsistent attributes. Stereotypes may “bias” perception and, simultaneously, increase accuracy.
Residence Hall Stereotypes: Brodt and Ross (1998)
The utility of an accurate stereotype was also demonstrated by Brodt and Ross (1998). College students made predictions about the behaviors and preferences of other college students who lived in one of two dormitories. The students in the “preppie” dorm were widely seen as politically conservative, wealthy, and conventional. The students in the “hippie” dorm were widely seen as politically left wing with unconventional practices and preferences. Perceivers (other students who did not live in either dorm) viewed photographs of individual targets, were informed of each target’s dorm, and then made predictions about each target’s behaviors and attitudes. Perceivers’ predictions were then compared to the targets’ self-reports on these same preferences and attitudes.
When perceivers predicted targets to be consistent with their dorm (for a preppie dorm resident to have preppie attributes or for a hippie dorm resident to have hippie attributes), 66% of their predictions were correct (they matched the targets’ self-reports). When perceivers jettisoned their dorm stereotypes, and predicted targets to be inconsistent with their dorm, 43% of their predictions were correct. Relying on the preppie-hippie dorm stereotypes enhanced the accuracy of person perception predictions. […]
Sex Stereotypes: Jussim et al. (1996) and Madon et al. (1998)
Both Jussim et al. (1996) and Madon et al. (1998) examined the accuracy of teacher expectations. (Madon et al., 1998, was described previously; Jussim et al., 1996, was similar, except that it was conducted in sixth grade rather than seventh grade, and it did not examine the accuracy of perceived differences between students from different demographic groups.) Both found that, when controlling for individuating information (motivation, achievement, etc.), student social class and race or ethnicity had little or no effect on teacher expectations. Thus, teachers essentially jettisoned their social class and ethnic stereotypes when judging differences between children from different social class and ethnic backgrounds. Although this finding is in many ways laudable, teachers relying entirely on individuating information does not help address the question of whether relying on a stereotype increases or reduces accuracy.
Both studies, however, found that sex stereotypes biased teachers’ perceptions of boys’ and girls’ performance (standardized regression coefficients of .09 and .10 for performance, and .16 and .19 for effort, for Madon et al. and Jussim et al., respectively). In both studies, teachers perceived girls as performing higher and exerting more effort than boys. Because these effects occurred in the context of models controlling for individuating information, they are best interpreted as stereotypes influencing teacher perceptions – bias effects, in traditional social psychological parlance.
Did these sex stereotyping bias effects increase or reduce the accuracy of teachers’ perceptions? They did both. In the case of performance, the sex stereotype effect increased teacher accuracy. The real performance difference, as indicated by final grades the prior year, was r = .08 and r = .10 (for the 1996 and 1998 studies, respectively; girls received slightly higher grades). The regression model producing the “biasing” effect of stereotypes yielded a “bias” that was virtually identical to the real difference. In other words:
The small independent effect of student sex on teacher perceptions (of performance) accounted for most of the small correlation between sex and teacher perceptions (of performance). This means that teachers apparently stereotyped girls as performing slightly higher than boys, independent of the actual slight difference in performance. However, the extent to which teachers did so corresponded reasonably well with the small sex difference in performance. In other words, teachers’ perceptions of differences between boys and girls were accurate because the teachers relied on an accurate stereotype. (Jussim et al., 1996, p. 348)
SUMMARY AND CRITICAL EVALUATION
What This Research Does Show
… Table 10.4 compares the frequency with which social psychological research produces effects exceeding correlations of r = .30 and r = .50, with the frequency with which the correlations reflecting the extent to which people’s stereotypes correspond to criteria exceed r = .30 and r = .50. Only 24% of social psychological effects exceed correlations of r = .30 and only 5% exceed r = .50. In contrast, all 18 of the aggregate and consensual stereotype accuracy correlations shown in Table 10.1 and Table 10.2 exceed r = .30, and all but two exceed r = .50. Furthermore, 9 of 11 personal stereotype accuracy correlations exceeded r = .30, and 4 of 11 exceeded r = .50.
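Putting the counts above on a common percentage scale makes the comparison easier to read. The sketch below only restates figures given in this paragraph; the variable names are ours.

```python
# Share of results exceeding each cutoff, using only the figures reported in the text.
shares = {
    "social psychological effects": {"r > .30": 0.24, "r > .50": 0.05},
    "consensual stereotype accuracy": {"r > .30": 18 / 18, "r > .50": 16 / 18},
    "personal stereotype accuracy": {"r > .30": 9 / 11, "r > .50": 4 / 11},
}

for label, by_cutoff in shares.items():
    cells = ", ".join(f"{cutoff}: {share:.0%}" for cutoff, share in by_cutoff.items())
    print(f"{label}: {cells}")
```

Expressed this way, 100% and about 89% of consensual accuracy correlations clear the two cutoffs, and about 82% and 36% of personal accuracy correlations do, against 24% and 5% of social psychological effects generally.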
This is doubly important. First, it is yet another way to convey the impressive level of accuracy in laypeople’s stereotypes. Second, it is surprising that so many scholars in psychology and the social sciences are either unaware of this state of affairs, unjustifiably dismissive of the evidence, or choose to ignore it (see reviews by Funder, 1987, 1995; Jussim, 1991, 2005; Ryan, 2002). When introductory texts teach about social psychology, they typically teach about phenomena such as the mere exposure effect (people like novel stimuli more after repeated exposure to them, r = .26), the weapons effect (they become more aggressive after exposure to a weapon, r = .16), more credible speakers are more persuasive (r = .10), and self-serving attributions (people take more responsibility for successes than failures, r = .19; correlations all obtained from Richard et al., 2003). How much time and space is typically spent in such texts reviewing and documenting the much stronger evidence of the accuracy of people’s stereotypes? Typically, none at all. For a field that aspires to be scientific, this is a troubling state of affairs. Some might even say unbearable.
… First, the accuracy of two of the other major types of stereotypes – religion and social class – has, as far as we know, never been examined. Although we can think of no reason why patterns of accuracy should differ for these types of groups, we will never know until the research is actually conducted.
Second, the existing research has overwhelmingly examined the stereotypes held by college students, largely because those samples are convenient. Is this important? Maybe. Research by McCauley and colleagues and by Clabaugh and Morling (2004) suggests that it may not be that important, showing that the accuracy of noncollege groups is nearly identical to that of college students. Nonetheless, more research with noncollege samples is needed.
Third, there are many different types and aspects of accuracy, and few studies report results addressing all of them. Ideally, more research in the future will provide more comprehensive assessments of the various types of stereotype accuracy.
Fourth, most of the research on stereotype accuracy to date has been conducted in the United States and Canada. Perhaps stereotypes in other countries are less (or more) accurate.