Charles Murray, one of the authors of The Bell Curve, has broken his decade-long silence on the subject of group differences in IQ. Shockingly, he’s come out of his shell to defend Larry Summers. This has prompted a number of predictable and not very interesting reactions across the web, but friends of ours from the Alspaugh group have been having an out-of-band discussion on this topic and thought it might be of interest to readers of the OpenScience blog.
What follows is a three-way conversation between Dan Gezelter, Geoff Davis, and Lanier Anderson. Dan is an associate professor of chemistry at Notre Dame and directs the OpenScience project. Geoff Davis is a mathematician by training, the director of PhDs.org, and author of the survey of post-doctoral scholars at Sigma Xi. Lanier is an associate professor of philosophy at Stanford. Each of our contributions will be offset below as a block quote with attributions above each quote.
Let’s start with Dan Gezelter:
Although Murray makes some idiotic points in his essay, there is a valid (but statistically trivial) observation in it about Gaussian distributions with larger standard deviations and slightly offset means that becomes interesting when applied to human populations.
The outliers of a distribution are the ones that get the most attention from humans. Basketball teams are made up largely of people who are a few standard deviations in height away from the mean. Imagine two sub-populations of the entire population. These two groups could have overlapping height distributions. Group A has a mean of 5’5″ and a standard deviation of 10″. Group B has a mean of 5’6″ and a standard deviation of 11″. Not that different, right? You can find tall As and short Bs and most people don’t notice a difference between the populations in everyday life.
However, given those parameters, the fraction of people in Group A who are greater than 7′ tall is about 2.9%. In Group B, that fraction is about 5.1%, nearly twice as large. If you were making a basketball team up of tall people, and if Group A and Group B were uniformly distributed throughout the population, you’d see a disproportionately large number of members of Group B on your basketball team, because basketball selects for an extreme characteristic. Even if the means were identical, the larger standard deviation of Group B would favor B-dominated basketball teams.
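The tail arithmetic is easy to check with a few lines of Python. This sketch uses the hypothetical group parameters from the text (heights in inches: means of 65″ and 66″, standard deviations of 10″ and 11″, threshold 7′ = 84″):

```python
from math import erf, sqrt

def tail_fraction(mean, sd, threshold):
    """Fraction of a normal distribution lying above `threshold`."""
    z = (threshold - mean) / sd
    return 0.5 * (1 - erf(z / sqrt(2)))  # P(X > threshold)

# Hypothetical height distributions from the text, in inches
a = tail_fraction(65, 10, 84)  # Group A: mean 5'5", SD 10", above 7'
b = tail_fraction(66, 11, 84)  # Group B: mean 5'6", SD 11", above 7'

print(f"Group A above 7': {a:.2%}")  # 2.87%
print(f"Group B above 7': {b:.2%}")  # 5.09%
print(f"Ratio B/A: {b / a:.2f}")     # 1.77
```

Small shifts in mean and spread produce a much larger relative difference out in the tail, which is the whole of the statistical point.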
That’s the three-paragraph summary of what’s really a sophomoric argument applied to politically and emotionally charged topics like intelligence, gender, and race. Murray’s article is intellectually sloppy, and he’s working in a pseudo-scientific field.
To which Geoff Davis replied:
I don’t know anything about Murray. He does sound like a crackpot. I disagree about the “pseudo-science”, though.
IQ tests and the like have a checkered past, and they have definitely been abused. I do, however, think there are reasonable ways to measure some types of basic cognitive ability through standardized tests (e.g., short-term memory, spatial reasoning). Many, if not most, of the people working in this area are doing so for legitimate scientific reasons — better understanding cognitive development, studying the effects of various types of drugs on cognition, and so on — and dismissing the entire endeavor as pseudo-science comes across as a way of avoiding a line of inquiry that at times raises inconvenient questions.
My understanding is that a wide range of standardized tests show sex-related differences in variances. I think it is legitimate science to ask why this is so.
One possibility is that some of the variance differential arises from structural brain differences. As I am sure you have read, there have been some interesting recent findings of some fairly big sounding qualitative differences in men’s and women’s brains (different distributions of white and gray matter, different connection densities in different regions of the brain, etc). The study of sex-related test differences may provide some insight into the functions of some of these structural features. That’s science.
An alternative explanation is that the differences in variances arise from different test-taking strategies. Perhaps men are more likely than women to guess answers when they are unsure. (Earlier this year I tracked down a researcher who did work on SAT scores and suggested some ways to test this hypothesis, but I couldn’t seem to get the idea across.) Anyway, if this explanation holds, there are implications for test scoring and perhaps for how people are educated. More science, and some public policy to boot.
IQ tests have historically been justified on the grounds that they have predictive power for future “success”. I think that looking for ways to estimate potential is a useful goal (though I don’t think that IQ tests have been all that successful in realizing this goal). Such measurements can help with the allocation of scarce resources (special classes, scholarships, etc). Given that SATs and the like are already being used for such purposes, it is worthwhile to try to improve them.
If the features being measured are plastic, cognitive measures provide a way to evaluate interventions. A particular form of training (or free lunch program or nutritional supplement or whatever) can be assessed by looking at its ability to alter “ability” scores.
Daniel Goleman at Harvard (the EQ guy) has done some of the more interesting recent work I have seen in this area. He has been working on measuring people’s emotional competencies. His findings are fascinating: EQ appears to be a better predictor than IQ of future success (for some reasonable measure of success). On average women fare rather better than men in EQ tests.
I think that science will sort all this out eventually; in the meantime, I like Steven Pinker’s take: “Equality is not the empirical claim that all groups of humans are interchangeable; it is the moral principle that individuals should not be judged or constrained by the average properties of their group.”
Dan Gezelter replies:
Investigating structural differences in the brain would be science, no question. And even investigating the role this plays in different measures of cognitive function is science. Even an investigation into cross-cultural but sex-linked test-taking strategies could be done scientifically.
You start to cross the line into non-science when “predictive power” is brought into it. Success depends sensitively on career and life choices and on a huge number of environmental factors that can’t possibly be tested for in any realistic way.
Regardless of the inability to control for future choices or environmental factors, from everything I’ve read, IQ and other standardized tests don’t have any substantive predictive power. My colleague, Dennis Jacobs, has done some small studies here (around 4,000 subjects) on the predictive power of entering SAT scores for grades received in freshman chemistry and for retention in the sciences after four years, and he finds essentially noise. We now use a topical entrance exam to sort students who need extra help into a special freshman chemistry section that has extra tutorials and our best TAs.
The point is that the topical exam tests preparation and/or retention and not some fuzzy cognitive ability. It is a far better predictor of success and a far better way to allocate scarce resources than the SAT has been. I think what I’m saying is that the SATs and the like have had decades to become predictive tools, and I’ve yet to see any study showing them to be predictive in any meaningful way. That qualifies them as non-scientific measures of ability.
Geoff Davis replied:
Wikipedia has some interesting numbers on this.
This response of the APA to Murray’s book, The Bell Curve, also has some interesting validation data.
IQ does appear to have modest utility at predicting success in a number of areas. But I think we both agree that the idea that it has strong predictive powers or that it is the only predictor of success is pernicious.
Regarding your colleague’s study at Notre Dame, I think this kind of investigation is great. But it’s not surprising that you’re seeing noise on the SAT data’s ability to make predictions. Your students are probably all a few standard deviations above the mean, so the differences in scores that you would be looking at are probably relatively modest. If you have any halfway decent admissions process, the students with lower SAT scores have shown other evidence of potential. It’s sort of like looking at the performance of professional basketball players and saying that height and stamina are irrelevant in basketball. I would be very surprised if you did not see a fair amount of predictive power from the SAT if you drew from a population that was not already screened by SAT score. Say, all students in Indiana.
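Geoff’s screening argument is what statisticians call restriction of range, and it is easy to demonstrate with a small simulation. The numbers below are invented purely for illustration (an assumed true score–performance correlation of 0.5 and a two-standard-deviation admissions cutoff), not taken from any real admissions data:

```python
import random

random.seed(0)
rho = 0.5  # assumed true score-performance correlation in the full population

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Simulate (test score, later performance) pairs with correlation rho
pairs = []
for _ in range(100_000):
    x = random.gauss(0, 1)
    y = rho * x + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1)
    pairs.append((x, y))

full = pearson([p[0] for p in pairs], [p[1] for p in pairs])

# Keep only applicants roughly two SDs above the mean, as a selective
# admissions process might
admitted = [p for p in pairs if p[0] > 2.0]
screened = pearson([p[0] for p in admitted], [p[1] for p in admitted])

print(f"correlation, full population: {full:.2f}")     # close to 0.50
print(f"correlation, admitted only:   {screened:.2f}")  # much smaller (~0.2)
```

A correlation that is quite real in the general population nearly vanishes once you look only at people who cleared a cutoff on the predictor, which is exactly the situation at a selective university.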
I agree that lots of different factors come into play in determining success. IQ is one. Good teachers are another. Here’s a third: Martin Seligman (at U Penn) has done a lot of work on something he calls “optimism”. It’s basically a measure of how people deal cognitively with adverse circumstances. His optimism test has shown rather striking ability to predict retention in a number of areas. Check it out.
It has nothing to do with IQ, but I would bet money that a variant of that test would be a good predictor of retention of freshmen in tough classes. I can point you to some background materials if you’d like.
Regardless, the fact that there is a lot of noise doesn’t mean that in the aggregate you can’t pull out signal.
At this point, Lanier Anderson stuck his head in to comment:
I can believe that there are tests that can get some kind of measure on relatively specific cognitive differences like short term memory and spatial reasoning. And I am quite sure from personal experience that there are tests capable of measuring mastery of specific subject areas, like the ones that Dan mentioned.
But isn’t Murray’s point actually dependent on a more ambitious claim? Doesn’t he have to be claiming that there is such a property as general intelligence, which explains high achievement among humans across a very wide range of fields, including philosophy, music, natural science, mathematics, political leadership, market timing, etc. etc.? And also that this property is measured by the IQ test du jour? How else are we supposed to understand the supposed force of his claims about lack of achievement by women in philosophy, composing, etc.?
I myself am deeply skeptical that there is any such property at all. And my experience with efforts to track any relation between GRE scores and eventual professional success among philosophy graduate students makes me pretty sure that if there were any such property, current standardized testing is not tracking it in any useful way.
And just because I can’t resist, here is a short list of 20th c. women who count as counterexamples to Murray’s claim that no woman has ever been a “significant original thinker” in philosophy (and I must admit I do think I know more about this, at least, than Mr. Murray…). These are just off the top of my head:
Simone de Beauvoir
Ruth Barcan Marcus
Judith Jarvis Thomson
all of whose work is of quite a bit more intrinsic intellectual interest than, say, Murray’s.
Dan, responding to Geoff, wrote:
OK, you’ve convinced me that pre-selected groups like freshmen at Notre Dame may all have SAT scores so close together that the differences are meaningless. This means that psychometric theories of intelligence are about as predictive as quantum mechanics. (I.e., if I prepare identical experimental conditions, I have little predictive capacity for an individual experiment, but I can know a great deal about average values. But if I prepare wildly different experimental conditions, I may have much stronger predictive ability.) There are times when this property convinces me that quantum mechanics isn’t particularly “scientific” also…
And even if an aggregate “signal” can be pulled out of the noise, psychometrics (as used in modern educational institutions) isn’t just applied in aggregate. Standardized tests are used for individual decision making, and they just aren’t good enough predictors for that purpose. (Lanier’s point about the GREs is also valid in my field. Our department appears to have a weak inverse correlation between Chemistry GRE score and probability of finishing the Ph.D.)
Geoff Davis responding to Lanier:
I agree that Murray is a wacko. I also agree that his broad claim about general intelligence is certainly false.
My understanding is that current theories posit several different forms of intelligence, IQ being only one of them (and the only one currently measured, however imperfectly, with current methods). Howard Gardner (Multiple Intelligences) is one of the big names. Quantification of the other forms is only just starting to happen. The EQ work I mentioned earlier is in keeping with Gardner’s notion of personal intelligence, and recent attempts to quantify it have been pretty interesting. A lot of the things current EQ tests measure have to do with how effectively one can channel one’s other skills (motivation, etc) and how well one can function in society — I’m sure you know plenty of intellectually gifted people of both sexes who will never achieve on the level of, say, a Simone de Beauvoir because they lack the social skills or emotional stability that they need to function effectively. Certainly in the world of mathematicians that is the case.
I doubt that there is any such single general intelligence, but I would speculate that there is an ensemble of properties, not too many in number (maybe a few dozen?), that probably determine much of achievement after you factor out environment and luck. I think that we all have a pretty good intuitive sense of what they are: intelligence matters, of course, but so do things like creativity, self-discipline, physical strength, various kinds of artistic ability, and so on. I would further speculate that there are sex-related differences in distributions of these various qualities, but that neither men nor women have any kind of across the board superiority over the entire ensemble. My understanding is that the early EQ research bears this idea out.
Regarding your point on GRE scores and performance in graduate school, my guess is that the GREs have modest predictive power, but your (and most people’s) experiences are structured in such a way that you can’t see it — the screening of people that happens before you see them gives you too strongly biased a personal sample. The people you run into are likely almost entirely high scorers; I doubt very much that the GREs can distinguish between such people very effectively. But I would guess that the people who get top scores do, on average, better professionally than the people at the bottom.
You can’t really run that experiment, though, since I would guess that the low scorers typically don’t get in to grad schools, or if they do, they end up at less prestigious places where they don’t get the benefit of working with pre-eminent Nietzsche scholars.
It’s a shame that so much of the public discourse on this stuff ends up being about the superiority of one group over another, because there is a lot of interesting scholarship that has nothing to do with any of that.
Geoff replies to Dan:
I think your quantum analogy is not bad. Think like a Bayesian: for an individual about whom you know nothing, there will be some distribution of likely grades s/he will get in your chemistry class. For an individual for whom you know, say, an SAT score, there will be a different posterior distribution. If it’s a high SAT score, the posterior distribution will probably have a higher mean and a lower variance, but it’s still a distribution with uncertainty attached.
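If scores and grades are treated as jointly normal, this posterior has a simple closed form: conditioning on a score shifts the mean of the grade distribution and shrinks its standard deviation by a factor of √(1 − ρ²). The sketch below uses made-up numbers (a score–grade correlation of 0.4, grades on a 0–100 scale) purely for illustration:

```python
from math import sqrt

# Illustrative, invented parameters: SAT score and course grade treated
# as jointly normal with correlation rho.
mu_sat, sd_sat = 1000, 200
mu_grade, sd_grade = 75, 10   # grade on a 0-100 scale
rho = 0.4                      # assumed score-grade correlation

def posterior_grade(sat):
    """Mean and SD of the grade distribution after observing an SAT score."""
    mean = mu_grade + rho * (sd_grade / sd_sat) * (sat - mu_sat)
    sd = sd_grade * sqrt(1 - rho ** 2)
    return mean, sd

print(posterior_grade(1400))  # mean 83.0, SD about 9.2
```

Note that even a score two standard deviations above the mean leaves the grade SD at about 92% of its prior value: the posterior is shifted but still wide, which is Geoff’s point about uncertainty remaining attached.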
The problem, I think, is that most people don’t have a good intuitive sense of what that Bayesian notion of reduced posterior uncertainty means. I think it’s usually mangled into a hard binary proposition, something like “high SAT (or IQ or whatever) implies success”. People then either don’t question it, which I am sure causes plenty of problems with self-fulfilling negative prophecies, or they decide that because they know some low-SAT people who succeed, there is no relationship (which is also probably false).
I’m surprised you rated as pessimistic on the EQ scale. I’m pretty much off the charts optimistic (which one might guess from my Panglossian tendencies).
Seligman’s group has done a lot of interesting research with variants of that optimism test. It’s proven to be a pretty strong predictor of attrition in a number of other difficult jobs and is strongly linked to depression. One thing I’ve wanted to do for a while is to get a department with a high attrition rate to have incoming first-year students take that test. I think it would be a pretty good predictor of who will be around in year 2. The cool thing is that there are relatively simple measures that can raise scores on the test and that have corresponding real-life benefits. If, as I speculate, that test is a good predictor of attrition, there are likely simple things a department could do to increase retention. If you want to be a guinea pig, I have some ideas on where to go for funding.
And Lanier gets the last word:
I guess what I would speculate is this: what you are here calling intelligence, which we tend to assume is some one cognitive capacity (or a small, tightly interconnected group of capacities), commonly referred to in American English as “raw smartness”, is probably not one property of a cognitive agent at all, but rather a relatively large (or at least middle-sized) cluster of abilities, each of which varies with some independence across individuals. Some of those abilities, like spatial reasoning or particular linguistic abilities, can probably be picked out and measured. But the general-intelligence idea is that these things all travel together in a group, because they are explained by the underlying capacity of raw smartness. And I kind of doubt there is any such thing. Just a hunch, but it would explain why standardized tests predict as little as they do.
That sort of picture would be very compatible with the sort of EQ research you were citing, I take it?
Regarding the predictive power of GREs: We’ve admitted students with scores in the 200’s (!!) on one component (right, that’s the old scoring scale, so 200’s out of 800), who did just fine. But it’s true that applications getting serious consideration tend to be mid-500’s and up. What is disturbing, though, is that you can’t tell anything from the difference between 650 and 800. And if 80% of your applicant pool is 570 and up, then all the differences that could reasonably come into play are essentially useless. Just my opinion, again. I do have colleagues who take the GRE as very serious evidence, so those of us who think it’s hooey do have to go into the admissions meeting with an account of why someone is worth admitting in spite of a low score, if there is one. But really, once the students get here, I have been unable to see any sustained difference among students that tracks those scores. Whereas, e.g., when I had a worry about someone’s writing sample or transcript, that is much more likely to correspond to worries I end up having about them as graduate students.