
Testing, The Bell Curve, and the social construction of intelligence. (Richard J. Herrnstein and Charles Murray's book)

Hanson, F. Allan. "Testing, The Bell Curve, and the social construction of intelligence." Tikkun. 10.n1 (Jan-Feb 1995): 22(6). 9 Sept. 2008
Abstract:

Richard J. Herrnstein and Charles Murray wrote a controversial book titled The Bell Curve to establish a connection between societal problems and low intelligence. The duo also claimed that intelligence was race-dependent, with blacks generally being less intelligent than whites.

At some gut level, many middle- and upper-class white Americans apparently harbor the conviction that they are more intelligent than people of the lower class and ethnic minorities (especially of African descent). While its obviously racist and anti-democratic connotations are sufficient to keep this attitude under wraps most of the time, periodically works grounded in psychometrics (the branch of psychology devoted to measuring differences among people) encourage this sentiment to re-emerge with an apparent mantle of scientific respectability. The most notorious of several recent eruptions has been sparked by the publication of Richard J. Herrnstein and Charles Murray's book, The Bell Curve.

Herrnstein and Murray are only the latest in a 125-year-long succession of social scientists dedicated to the enterprise of postulating intellectual differences between classes and races. Important predecessors include Francis Galton, H. H. Goddard, Lewis Terman, and Arthur Jensen. Herrnstein himself articulated the main theme of The Bell Curve twenty-one years ago, when he predicted that the importance of inherited mental abilities for achieving high income and prestige in our society would inexorably open a social rift between an intellectually gifted ruling class and a dull underclass.

Despite a bulky 800-plus pages bulging with statistics and charts, The Bell Curve conveys a very simple message: The ills of society--poverty, unemployment, unmarried parenthood, crime--are causally connected to the low intelligence of the people who manifest them. From this put-down of the lower class, Herrnstein and Murray go on to make the highly controversial claim that races differ in intelligence, and particularly that Blacks are significantly less intellectually endowed than whites.

Indisputably, intelligence test scores vary directly with socioeconomic status. If people are sorted into groups defined by $10,000 increments in annual family income, average intelligence test scores increase with each step up the income ladder. It is also true that, on average, Ashkenazi Jews score between a half and a full standard deviation (about 7 to 15 I.Q. points) higher on intelligence tests than other whites, and whites average a full standard deviation higher than Blacks.

The interpretation of these facts forms the heart of the debate over The Bell Curve. Herrnstein and Murray insist that they mean that upper-class people, on average, are more intelligent than lower-class people, and that this difference goes a long way in explaining the affluence of the one group and the chronic crime, dependence on welfare, unmarried parenthood, and other social problems of the other.

In fact, Herrnstein and Murray's thesis about different levels of mental ability between rich and poor, white and Black, is wholly predicated on the notion that intelligence is an independently existing human characteristic that is accurately measured by intelligence tests. But that fundamentally misconstrues the nature of intelligence tests and what they measure, throwing the entire argument of The Bell Curve off track. Some reflections on the nature of tests in general, and intelligence tests in particular, will make this clear.

Tests are always indirect measures. What we wish to know from a test--the target information--and what we learn directly from it--the test result--are never identical. Test results represent target information. This is particularly obvious in a lie detector test, in which the test result consists of information about certain physiological changes--in blood pressure, pulse and respiration rates, and so on--as measured by a polygraph machine. The target information is whether or not the subject is telling the truth. The assumption underlying the test is that the test result tells us something about the target information: that certain physiological perturbations, when associated with responses to certain questions, signify deception.

The moment it is recognized that a gap exists between test result and target information, it becomes clear that the relation between the two is not one-to-one. Other variables may intervene. Thus, one of the biggest debates about polygraph tests is whether it is possible to weed out "false positives"--honest individuals whose physiological responses manifest the same pattern as that associated with deception, but for other reasons.

There is a similar gap in intelligence testing. The point of an intelligence test is not to learn if the subject knows the meaning of the exact words found on that test, or can solve the particular mathematical problems or identify the patterns among the specific numbers and shapes that appear there. Performance on the test is assumed to represent the subject's ability to define words or solve problems of these types, and that, in turn, is taken to represent the subject's level of some largely inherited capacity called "general intelligence." Thus, the gap between test results (right and wrong answers on a particular test) and target information (general intelligence, or I.Q.) is wide indeed, and dependent upon many variables. The most important of them is learning.

Literally every answer on an intelligence test depends on what a person has learned. Vocabulary and reading comprehension questions probe how well the individual has learned to read; quantitative questions explore how much mathematics the individual has learned; questions in logic have to do with how well the person has learned to think systematically; spatial relations problems concern how well the subject has learned to visualize shapes, compare, and mentally transpose them. Of course, intelligence or aptitude tests aim to tap learning that is more broadly applicable and loosely specified than "achievement" tests such as, say, tests limited to addition and subtraction of fractions or American colonial history. It remains true, however, that any mental test, including the most general intelligence test, is inevitably limited to measuring what the test-taker has learned.

Recognizing this, it becomes apparent that the score on an intelligence test indicates much more than an individual's innate intelligence. What the individual has learned may reflect inherited abilities to some degree, but other factors are critical, such as opportunities and motivation to learn. These depend on a variety of considerations such as the rewards and encouragements the individual has received for learning, personal relationships with parents and teachers, if and when the individual was exposed to subject matter that stimulated interest, and how much time and how many facilities, books, instruments, and other resources have been made available for learning.

Herrnstein and Murray acknowledge that intelligence is not entirely inherited; they attribute about 40 per cent of it to environmental factors. So they would be likely to chalk up what has just been said to the environmental component in intelligence. Their recognition of an environmental factor is mitigated, however, by their claim that intelligence remains stable throughout life after about age ten. Thus, they imply that the environmental impact on intelligence occurs only in relatively early childhood.

Given that tests can only measure what has been learned, it seems appropriate to reverse the direction of causality that Herrnstein and Murray propose. Far from differences in intelligence (as indicated by intelligence test scores) causing class differences, it is more likely that class membership (with all the discrepancies in opportunities to learn that that entails) causes differences in intelligence test scores.

This way of phrasing causal connection intentionally begs the question of the relation between intelligence test scores and intelligence, since that topic requires careful, separate consideration. The most basic flaw in The Bell Curve is its concept of intelligence as a single, independently existing human trait that is accurately measured by intelligence tests. Intelligence is better understood as an artifact of intelligence testing. This is not to say that there is no such thing as intelligence. It exists, but it has been brought into being by intelligence tests.

Consider a thought experiment that constructs a new test and imagines its consequences. We will call our test the New Intelligence Test, or NIT. It is intended to surpass current tests by sampling more widely from the full range of cognitive ability, and particularly its practical applications in everyday life. The NIT consists of eight sections:

* A name recall scale tests ability to remember the names of persons to whom the subject has just been introduced;

* A mathematics section tests the subject's ability to solve problems of arithmetic and algebra;

* In the exposition of ideas section, the subject is given five minutes to read a complex idea--such as a page from Rousseau describing his distinction between self-love (amour de soi) and selfishness (amour-propre)--and thirty minutes to present a clear and accurate written account of it, with original examples;

* The small-talk scale evaluates subjects' ability to carry on an interesting conversation with someone they have just met;

* In the follow-the-directions scale, the subject is told once, at the speed of ordinary conversation, to do a task consisting of six distinct steps, and is evaluated on how well the task is accomplished;

* A bullshitting scale assesses skill at participating in a discussion with two other people on a topic about which the subject knows nothing;

* The adult sports scale evaluates the subject's ability to play golf or tennis, with suitable adjustments for male and female subjects;

* The presiding scale assesses ability to run a business meeting, including matters such as maintaining focus of discussion, building consensus, and finishing on time.

The test result is reported as a composite score generated from the outcomes of the NIT's eight sections.

The ability or human capacity tested by the NIT is certainly nothing inconsequential. If the appropriate studies were done, it would doubtless turn out that high NIT scores correlate positively (probably more positively than I.Q. scores) with desirable social outcomes such as success in the university, in business or professional life, high income, and election to public office. But it is also obvious that what the NIT tests is not a single quality or capacity of persons. It is rather a set of distinct qualities, which have been measured by the several sections of the NIT and combined into a single score for convenience in reporting NIT results.

But assume now that the NIT were to catch on in a big way--that it came, for example, to be widely used for college and graduate admissions and for hiring and promotion purposes by law firms, government, and corporations. In such an event, the different abilities measured by the NIT would not remain static. People would spare no effort in preparing for the test, in the hope of achieving the rewards awaiting those who excel on it. They would bone up on arithmetic and algebra, master techniques for remembering the names of strangers, hone skills of bullshitting, take golf and tennis lessons, learn how to run successful business meetings. School curricula would shift in the direction of more training in the areas covered by the NIT. (If they did not, irate parents would demand to know why their children were not being taught something useful.) Kaplan and Princeton Review would explode into the marketplace with courses that promise dramatic improvement in one's NIT scores.

All of this dedicated effort would have a palpable effect. Although the NIT obviously measures several quite different abilities, people would knit them together as they strive to improve them all in order to raise their NIT scores. Because NIT scores are reported as simple numbers, they would begin to imagine these several abilities to be one. They would name it...perhaps "NITwit." Given its relevance for success in life, it would be valued as a thing of great importance. People would worry about how much of it they possess; they would look for promising signs of it in their children and envy evidence of its abundance in other people's offspring.

Not only would a new mental category swim into the social consciousness. The empirical amount of it possessed by individuals would literally increase as, in preparing for the NIT, they got better at following directions, playing golf, expounding on ideas, engaging in small talk, and the rest of it. And, of course, as individuals increase these skills, NIT scores would go up. There would be rejoicing in the land as today's average NIT scores exceed those achieved in the past or by test-takers in other countries...until, perhaps, an apogee is passed and national consternation about declining NIT scores sets in. Given all these transformations and developments, it is fair to say that NITwit would become a new, singular, personal trait--an objective reality literally constructed by NIT testing. Perhaps the ultimate development (and the ultimate absurdity, but it unquestionably would happen) would be the marketing of rival tests that claim to measure NITwit faster, cheaper, or more accurately than the NIT.

The foregoing discussion of the NIT has, of course, its facetious moments. But its purpose is entirely serious. It demonstrates two fundamental characteristics of all mental testing. One is that test results inevitably reflect what test-takers have learned. The other is that, when a given test becomes sufficiently important, whatever that test tests gets reified as a single quality or thing. This has been the experience of "intelligence" in the real world. Because of intelligence tests, several different abilities (to solve mathematical problems, to comprehend texts, to compare shapes, to sort ideas or objects into classes, to define words, to remember historical events, and to do all of these things rapidly) have been welded together to form a new, unitary mental characteristic called "intelligence." People place great emphasis on it because intelligence tests serve as the basis for offering or denying educational and career opportunities and other social rewards. Precisely as with NITwit in our thought experiment, intelligence has been fashioned into an objectively real personal trait by the practice of intelligence testing.

We have distinguished two very different ways of understanding the relation between intelligence, intelligence tests, race, and social class. They agree that intelligence test scores increase as one goes up the social ladder, but disagree as to why. The sociological explanation I support is that, on average, opportunities to learn increase with higher socioeconomic status. (The Black/white difference in average intelligence test scores is then attributable to the fact that the socioeconomic status distribution of Blacks is lower than that of whites.) To say that intelligence increases with socioeconomic class is true in a sense, but it is potentially misleading because of the common tendency to think of intelligence as an independently existing human trait. On the contrary, general intelligence is nothing but a social construct produced by intelligence testing. "Intelligence" rises with socioeconomic status only because intelligence test scores do. And that is explicable not by any difference in largely innate cognitive potential between people of different classes (and races), but by different opportunities to learn.

In contrast, The Bell Curve and other works in its genre take intelligence test scores to be accurate measures of an independently existing human trait called general intelligence. The fact that scores are lower for certain ethnic minorities and for the lower class indicates that people in those conditions are, on average, less intelligent than others. Moreover, low intelligence is an explanatory factor for poverty, crime, welfare dependence, and other social problems that tend to cluster in the lower class, as well as for the disproportionate representation of certain ethnic minorities (especially Blacks) in the lower class.

This point of view is perverse as well as erroneous. Its endorsement of the idea that ethnic groups and social classes differ in intellectual capacity fuels a combination of smug condescension and hostility toward minorities and those who live in poverty. This has closed opportunities for millions and has driven racism, class discrimination, and eugenic programs such as immigration quotas and enforced sterilization.

Over the last century, each eruption of the discriminatory idea that some races and classes are less intelligent than others has been met with vigorous opposition, as have the contentions of The Bell Curve. It appears that this time, as in the past, after a relatively brief popular infatuation with the idea, scholarly counter-arguments will ultimately succeed in beating it back. But, if history is any indicator, in twenty years or so the issue will pop up again.

Perhaps it has not been defeated decisively because the critiques dealing with challenges to statistics and alternative explanations have not struck at its roots. Those consist not of ideas or propositions but of a social practice: intelligence testing. The practice of intelligence testing itself has produced both the concept of intelligence as a single thing and test results indicating that that thing systematically varies among ethnic groups and social classes. Thus, the most effective way to lay this spurious and socially disruptive issue to rest once and for all is to change the practice of intelligence testing.

While it may be inconvenient to do without the efficiency and economy achieved by large-scale intelligence testing, some institutions show signs that they are beginning to wean themselves from it. Antioch, Bard, Hampshire, and Union colleges, together with some two dozen others, no longer require applicants to submit SAT or ACT scores, and Harvard Business School has dropped the GMAT (Graduate Management Admission Test) as an application requirement. These schools make their selections on the basis of academic records, written statements by applicants, and letters of recommendation, and they manage to operate their admissions programs effectively without intelligence tests.

A movement is afoot in the primary and secondary schools to assess children not in terms of standardized intelligence tests, but according to portfolios they develop with examples of their best work in a variety of subjects. If this becomes widespread, the conventional notion of intelligence as a single, quantifiable entity will begin to fade, as people focus on children's different talents--in such areas as music, visual art, the use of language, mathematical skills, athletics, and interpersonal relations.

Developments of this sort would not signal the end of all testing. In addition to other evidence of accomplishments (such as portfolios), tests will doubtless continue to play a role in decisions about school promotions and graduation as well as competition among aspirants for scholarships, admission to selective colleges and training programs, or employment in desirable jobs. The tests, however, would not be designed to measure anything like "general intelligence." They would aim to assess how well individuals have succeeded in mastering knowledge or skills that have been presented to them in academic courses or in technical or artistic training programs. Different individuals would, of course, perform at different levels on these tests, and this would be taken into account along with other qualifications in deciding who will receive scarce rewards and opportunities.

To implement practices such as these would not require sea changes in attitudes about assessment. The alternative perspective is already well established. Consider how evaluation works in a typical American college course. Depending on the discipline, students are usually graded on the basis of some combination of the following: problems or questions to be completed and handed in at regular intervals, laboratory reports, term papers, performance in discussion groups, and tests. The notion of general intelligence plays no role in the process. When students do not perform adequately and one wishes to understand why, the first questions have to do with how much interest they have in the subject matter and how much effort they put into it. If it is clear that they are interested and are trying hard, investigation turns next to their preparation. Have they developed good study habits? Do they have the requisite background for this course? Have they learned the particular modes of thinking and analysis that are used in this discipline?

Academic advisers account for the great majority of cases of unsuccessful course performance in terms of one or another of these lines of investigation. Only for the few cases that remain does the question of sheer ability or "intelligence" come up. And even then, the matter is posed in terms of the particular talents appropriate for a specific subject matter (ability to do mathematics, to draw, to interpret literature, and so on) rather than general intelligence.

If the attitudes represented in this process were to become commonplace, it is likely that we would lose the habit of thinking of general intelligence as an all-important, single trait that is distributed unequally among the population. Instead, we would evaluate quality of performance in terms of a variety of factors, only one of which is native ability in that particular area. Such a change in thinking would drastically curtail the destructive view that some people are irredeemably inferior to others by birth, and perhaps even by race. It would place primary responsibility for achievement squarely on the individual's effort and hold out the promise that, if given a fair opportunity, the degree of one's own determination is the major factor in achieving one's goals.

The model of the college classroom does not apply to larger evaluation programs in one crucial regard. It is a given that all of the students enrolled in a single course have the opportunity to receive the same instruction. This, of course, does not hold when large numbers from different localities and backgrounds are being assessed. They will have been exposed to a variety of different experiences and curricula in schools that are anything but uniform in the quality of education they provide. The question is how to achieve a fair evaluation of how well people have acquired academic, technical, artistic, or other skills when some of them have had much richer opportunities to acquire them than others. No simple answer exists. The only satisfactory long-range solution is to provide all primary- and secondary-school children with equal educational opportunities. And that will require much more than just fixing the schools. It also involves fostering supportive home and community environments.

Whether or not those ends will ever be achieved, doing away with testing for general intelligence would preclude the periodic eruption of facile explanations such as that crime, poverty, and other social problems are attributable to low intelligence. It would spare us from fighting the battle of The Bell Curve all over again in twenty years' time, and would help fix attention on our real challenges to eradicate race and class discrimination and to enrich environments, providing all Americans with an equal opportunity to develop their talents.

F. Allan Hanson is professor of anthropology at the University of Kansas. His most recent book is Testing Testing: Social Consequences of the Examined Life (University of California Press, 1993).