The kids are all right, but they can’t interpret a graph

I have not posted here in a while, mostly because I have a job that is both engaging and demanding. I started this blog as a way to blow off steam, but I realized this mostly meant ranting about those fools at the academy! Of whom there are indeed plenty. These are reality-based rants, but I’ve got better things to do.

As it happens, I’ve come down with a bug that keeps me at home and leaves just enough energy to read and type, but little else. This is an excellent recipe for inciting a rant. Reading the Washington Post article on delayed gratification in children brings it on.

It is not really the article that gets me, let alone the scholarly paper on which it is based. I have not read the latter, and have no intention of doing so. I hope its author has thought through the interpretation better than is implied by what I see in the WaPo article. That is easy for me to believe; my own experience is that what academics say to the press has little to do with what eventually appears in the press – sometimes even inverting its meaning outright. (At one point I was quoted as saying that dark matter experimentalists should give up, when what I had said was that it was important to pursue these experiments to their logical conclusion, but that we also needed to think about what would constitute a logical conclusion if dark matter remains undetected.)

So I am at pains to say that my ire is not directed at the published academic article. In this case it isn’t even directed at the article in the WaPo, regardless of whether it is a fair representation of the academic work or not. My ire is directed entirely at the interpretation of a single graph, which I am going to eviscerate.

The graph in question shows the delay time measured in psychology experiments over the years. It is an attempt to measure self-control in children. When presented with a marshmallow but told they may have two marshmallows if they wait for it, how long can they hold out? This delayed gratification is thought to be a measure of self-control that correlates positively with all manners of subsequent development. Which may indeed be true. But what can we learn from this particular graph?

marshmallow_test-1

The graph plots the time delay measured from different experiments against the date of the experiment. Every point (plotted as a marshmallow – cute! I don’t object to that) represents an average over many children tested at that time. Apparently they have been “corrected” to account for the age of the children (one gets better at delayed gratification as one matures) which is certainly necessary, but it also raises a flag. How was the correction made? Such details can matter.

However, my primary concern is more basic. Do the data, as shown, actually demonstrate a trend?

To answer this question for yourself, the first thing you have to be able to do is mentally remove the line. That big black bold line that so nicely connects the dots. Perhaps it is a legitimate statistical fit of some sort. Or perhaps it is boldface to [mis]guide the eye. Doesn’t matter. Ignore it. Look at the data.

The first thing I notice about the data are the outliers – in this case, 3 points at very high delay times. These do not follow the advertised trend, or any trend. Indeed, they seem in no way related to the other data. It is as if a different experiment had been conducted.

When confronted with outlying data, one has a couple of choices. If we accept that these data are correct and from the same experiment, then there is no trend: the time of delayed gratification could be pretty much anything from a minute to half an hour. However, the rest of the data do clump together, so the other option is that these outliers are not really representing the same thing as the rest of the data, and should be ignored, or at least treated with less weight.

The outliers may be the most striking part of the data set, but they are usually the least important. There are all sorts of statistical measures by which to deal with them. I do not know which, if any, have been applied. There are no error bars, no boxes representing quartiles or some other percentage spanned by the data each point represents. Just marshmallows. Now I’m a little grumpy about the cutesy marshmallows. All marshmallows are portrayed as equal, but are some marshmallows more equal than others? This graph provides no information on this critical point.

In the absence of any knowledge about the accuracy of each marshmallow, one is forced to use one’s brain. This is called judgement. This can be good or bad. It is possible to train the brain to be a good judge of these things – a skill that seems to be in decline these days.

What I see in the data are several clumps of points (disregarding the outliers). In the past decade there are over a dozen points all clumped together around an average of 8 minutes. That seems like a pretty consistent measure of the delayed gratification of the current generation of children.

Before 2007, the data are more sparse. There are a half a dozen points on either side of 1997. These have a similar average of 7 or 8 minutes.

Before that there are very few data. What there is goes back to the sixties. One could choose to see that as two clumps of three points, or one clump of six points. If one does the latter, the mean is around 5 minutes. So we had a “trend” of 5 minutes circa 1970, 7 minutes circa 1997, and 8 minutes circa 2010. That is an increase over time, but it is also a tiny trend – much less persuasive than the heavy solid line in the graph implies.

If we treat the two clumps of three separately – as I think we should, since they sit well apart from each other – then we have to choose which to believe. They aren’t consistent. The delay time in 1968 looks to have an average of two minutes; in 1970 it looks to be 8 minutes. So which is it?

According to the line in the graph, we should believe the 1968 data and not the 1970 data. That is, the 1968 data fall nicely on the line, while the 1970 data fall well off it. In percentage terms, the 1970 data are as far from the trend as the highest 2010 point that we rejected as an outlier.

When fitting a line, the slope of the line can be strongly influenced by the points at its ends. In this case, the earliest and the latest data. The latest data seem pretty consistent, but the earliest data are split. So the slope depends entirely on which clump of three early points you choose to believe.

If we choose to believe the 1970 clump, then the “trend” becomes 8 minutes in 1970, 7 minutes in 1997, 8 minutes in 2010. Which is to say, no trend at all. Try disregarding the first three (1968) points and draw your own line on this graph. Without them, it is pretty flat. In the absence of error bars and credible statistics, I would conclude that there is no meaningful trend present in the data at all. Maybe a formal fit gives a non-zero slope, but I find it hard to believe it is meaningfully non-zero.
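If you would rather not trust my eyeball, here is a minimal sketch of the exercise. It assumes nothing but the clump averages I read off the graph above (2 minutes in 1968, 8 in 1970, 7 in 1997, 8 in 2010), gives every clump equal weight, and fits an ordinary least-squares line. These are not the study's data, and `fit_line` is just an illustrative helper, not anybody's published method.

```python
def fit_line(points):
    """Return the ordinary least-squares (slope, intercept) for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Eyeballed clump averages (year, minutes) as described in the text.
# These are NOT the study's data, and each clump is given equal weight.
clumps = [(1968, 2.0), (1970, 8.0), (1997, 7.0), (2010, 8.0)]

slope_all, _ = fit_line(clumps)
slope_without_1968, _ = fit_line(clumps[1:])
slope_without_1970, _ = fit_line([clumps[0]] + clumps[2:])

for label, slope in [
    ("all four clumps", slope_all),
    ("dropping 1968", slope_without_1968),
    ("dropping 1970", slope_without_1970),
]:
    print(f"{label}: {slope * 10:+.2f} minutes per decade")
```

With these rough numbers, dropping the 1968 clump leaves a slope that is close to flat, while dropping the 1970 clump roughly doubles the slope obtained from all four. The answer hinges on which early clump you keep.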

None of this happens in a vacuum. Let’s step back and apply some external knowledge. Have people changed over the 5 decades of my life?

The contention of the WaPo article is that they have. Specifically, contrary to the perception that iPhones and video games have created a generation with a cripplingly short attention span (congrats if you made it this far!), in fact the data show the opposite. The ability of children to delay gratification has improved over the time these experiments have been conducted.

What does the claimed trend imply? If we take it literally, then extrapolating back in time, the delay time goes to zero around 1917. People in the past must have been completely incapable of delaying gratification for even an instant. This was a power our species only developed in the past century.

I hope that sounds implausible. If there is no trend, which is what the data actually show, then children a half century ago were much the same as children a generation ago, who are much the same as the children of today. So the more conservative interpretation of the graph would be that human nature is rather invariant, at least as indicated by the measure of delayed gratification in children.

Sadly, null results are dull. There well may be a published study reporting no trend, but it doesn’t get picked up by the Washington Post. Imagine the headline: “Children today are much the same as they’ve always been!” Who’s gonna click on that? In this fashion, even reputable news sources contribute to the scourge of misleading science and fake news that currently pollutes our public discourse.

ghostbusters-columbia
They expect results!

This sort of over-interpretation of weak trends is rife in many fields. My own, for example. This is why I’m good at spotting them. Fortunately, screwing up in Astronomy seldom threatens life and limb.

Then there is Medicine. My mother was a medical librarian; I occasionally browsed their journals when waiting for her at work. Graphs for the efficacy of treatments that looked like the marshmallow graph were very common. Which is to say, no effect was in evidence, but it was often portrayed as a positive trend. They seem to be getting better lately (which is to say, at some point in the not distant past some medical researchers were exposed to basic statistics), but there is an obvious pressure to provide a treatment, even if the effect of the available course of treatment is tiny. Couple that to the aggressive marketing of drugs in the US, and it would not surprise me if many drugs have been prescribed based on efficacy trends weaker than seen in the marshmallow graph. See! There is a line with a positive slope! It must be doing some good!

Another problem with data interpretation is in the corrections applied. In the case of marshmallows, one must correct for the age of the subject: an eight year old can usually hold out longer than a toddler. No doubt there are other corrections. The way these are usually made is to fit some sort of function to whatever trend is seen with age in a particular experiment. While that trend may be real, it also has scatter (I’ve known eight year olds who couldn’t outwait a toddler), which makes it dodgy to apply. Do all experiments see the same trend? Is it safe to apply the same correction to all of them? Worse, it is often necessary to extrapolate these corrections beyond where they are constrained by data. This is known to be dangerous, as the correction can become overlarge upon extrapolation.

It would not surprise me if the abnormally low points around 1968 were over-corrected in some way. But then, it was the sixties. Children may not have changed much since then, but the practice of psychology certainly has. Let’s consider the implications that has for comparing 1968 data to 2017 data.

The sixties were a good time for psychological research. The field had grown enormously since the time of Freud and was widely respected. However, this was also the time when many experimental psychologists thought psychotropic drugs were a good idea. Influential people praised the virtues of LSD.

My father was a grad student in psychology in the sixties. He worked with swans. One group of hatchlings imprinted on him. When they grew up, they thought they should mate with people – that’s what their mom looked like, after all. So they’d make aggressive displays towards any person (they could not distinguish human gender) who ventured too close.

He related the anecdote of a colleague who became interested in the effect of LSD on animals. The field was so respected at the time that this chap was able to talk the local zoo into letting him inject an elephant with LSD. What could go wrong?

Perhaps you’ve heard the expression “That would have killed a horse! Fortunately, you’re not a horse.” Well, the fellow in question figured elephants were a lot bigger than people. So he scaled up the dose by the ratio of body mass. Not, say, the ratio of brain size, or whatever aspect of the metabolism deals with LSD.

That’s enough LSD to kill an elephant.

Sad as that was for the elephant, who is reputed to have been struck dead pretty much instantly – no tripping rampage preceded its demise – my point here is that these were the same people conducting the experiments in 1968. Standards were a little different. The difference seen in the graph may have more to do with differences in the field than with differences in the subjects.

That is not to say we should simply disregard old data. The date on which an observation is made has no bearing on its reliability. The practice of the field at that time does.

The 1968 delay times are absurdly low. All three are under four minutes. Such low delay times are not reproduced in any of the subsequent experiments. They would be more credible if the same result were even occasionally reproduced. It ain’t.

Another way to look at this is that there should be a comparable number of outliers on either side of the correct trend. That isn’t necessarily true – sometimes systematic errors push in a single direction – but in the absence of knowledge of such effects, one would expect outliers on both the high side and the low side.

In the marshmallow graph, with the trend as drawn, there are lots of outliers on the high side. There are none on the low side. [By outlier, I mean points well away from the trend, not just scattered a little to one side or the other.]

If instead we draw a flat line at 7 or 8 minutes, then there are three outliers on both sides. The three very high points, and the three very low points, which happen to occur around 1968. It is entirely because the three outliers on the low side happen at the earliest time that we get even the hint of a trend. Spread them out, and they would immediately be dismissed as outliers – which is probably what they are. Without them, there is no significant trend. This would be the more conservative interpretation of the marshmallow graph.

Perhaps those kids in 1968 were different in other ways. The experiments were presumably conducted in psychology departments on university campuses in the late sixties. It was OK to smoke inside back then, and not everybody restricted themselves to tobacco in those days. Who knows how much secondhand marijuana smoke was inhaled just getting to the test site? I jest, but the 1968 numbers might just measure the impact on delayed gratification when the subject gets the munchies.

ancient-aliens
Marshmallows.

 

Solution Aversion

I have had the misfortune to encounter many terms for psychological dysfunction in many venues. Cognitive dissonance, confirmation bias, the Dunning-Kruger effect – I have witnessed them all, all too often, both in the context of science and elsewhere. Those of us who are trained as scientists are still human: though we fancy ourselves immune, we are still subject to the same cognitive foibles as everyone else. Generally our training suffices only to get us past the oft-repeated ones.

Solution aversion is the knee-jerk reaction we have to deny the legitimacy of a problem when we don’t like the solution admitting said problem would entail. An obvious example in the modern era is climate change. People who deny the existence of this problem are usually averse to its solution.

Let me give an example from my own experience. To give some context requires some circuitous story-telling. We’ll start with climate change, but eventually get to cosmology.

Recently I encountered a lot of yakking on social media about an encounter between Bill Nye (the science guy) and Will Happer in a dispute about climate change. The basic gist of most of the posts was that of people (mostly scientists, mostly young enough to have watched Bill Nye growing up) cheering on Nye as he “eviscerated” Happer’s denialism. I did not watch any of the exchange, so I cannot evaluate the relative merits of their arguments. However, there is a more important issue at stake here: credibility.

Bill Nye has done wonderful work promoting science. Younger scientists often seem to revere him as a sort of Mr. Rogers of science. Which is great. But he is a science-themed entertainer, not an actual scientist. His show demonstrates basic, well known phenomena at a really, well, juvenile level. That’s a good thing – it clearly helped motivate a lot of talented people to become scientists. But recapitulating well-known results is very different from doing the cutting edge science that establishes new results that will become the fodder of future textbooks.

Will Happer is a serious scientist. He has made numerous fundamental contributions to physics. For example, he pointed out that the sodium layer in the upper atmosphere could be excited by a laser to create artificial guide stars for adaptive optics, enabling ground-based telescopes to achieve resolutions comparable to that of the Hubble space telescope. I suspect his work for the JASON advisory group led to the implementation of adaptive optics on Air Force telescopes long before us astronomers were doing it. (This is speculation on my part: I wouldn’t know; it’s classified.)

My point is that, contrary to the wishful thinking on social media, Nye has no more standing to debate Happer than Mickey Mouse has to debate Einstein. Nye, like Mickey Mouse, is an entertainer. Einstein is a scientist. If you think that comparison is extreme, that’s because there aren’t many famous scientists whose name I can expect everyone to know. A better analogy might be comparing Jon Hirschtick (a successful mechanical engineer, Nye’s field) to I.I. Rabi (a prominent atomic physicist like Happer), but you’re less likely to know who those people are. Most serious scientists do not cultivate public fame, and the modern examples I can think of all gave up doing real science for the limelight of their roles as science entertainers.

Another important contribution Happer made was to the study and technology of spin polarized nuclei. If you place an alkali element and a noble gas together in vapor, they may form weak van der Waals molecules. An alkali is basically a noble gas with a spare electron, so the two can become loosely bound, sharing the unwanted electron between them. It turns out – as Happer found and explained – that the wavefunction of the spare electron overlaps with the nucleus of the noble. By spin polarizing the electron through the well known process of optical pumping with a laser, it is possible to transfer the spin polarization to the nucleus. In this way, one can create large quantities of polarized nuclei, an amazing feat. This has found use in medical imaging technology. Noble gases are chemically inert, so safe to inhale. By doing so, one can light up lung tissue that is otherwise invisible to MRI and other imaging technologies.

I know this because I worked on it with Happer in the mid-80s. I was a first year graduate student in physics at Princeton where he was a professor. I did not appreciate the importance of what we were doing at the time. Will was a nice guy, but he was also my boss and though I respected him I did not much like him. I was a high-strung, highly stressed, 21 year old graduate student displaced from friends and familiar settings, so he may not have liked me much, or simply despaired of me amounting to anything. Mostly I blame the toxic arrogance of the physics department we were both in – Princeton is very much the Slytherin of science schools.

In this environment, there weren’t many opportunities for unguarded conversations. I do vividly recall some of the few that happened. In one instance, we had heard a talk about the potential for industrial activity to add enough carbon dioxide to the atmosphere to cause an imbalance in the climate. This was 1986, and it was the first I had heard of what is now commonly referred to as climate change. I was skeptical, and asked Will’s opinion. I was surprised by the sudden vehemence of his reaction:

“We can’t turn off the wheels of industry, and go back to living like cavemen.”

I hadn’t suggested any such thing. I don’t even recall expressing support for the speaker’s contention. In retrospect, this is a crystal clear example of solution aversion in action. Will is a brilliant guy. He leapt ahead of the problem at hand to see the solution being a future he did not want. Rejecting that unacceptable solution became intimately tied, psychologically, to the problem itself. This attitude has persisted to the present day, and Happer is now known as one of the most prominent scientists who is also a climate change denier.

Being brilliant never makes us foolproof against being wrong. If anything, it sets us up for making mistakes of enormous magnitude.

There is a difference between the problem and the solution. Before we debate the solution, we must first agree on the problem. That should, ideally, be done dispassionately and without reference to the solutions that might stem from it. Only after we agree on the problem can we hope to find a fitting solution.

In the case of climate change, it might be that we decide the problem is not so large as to require drastic action. Or we might hope that we can gradually wean ourselves away from fossil fuels. That is easier said than done, as many people do not seem to appreciate the magnitude of the energy budget that needs replacing. But does that mean we shouldn’t even try? That seems to be the psychological result of solution aversion.

Either way, we have to agree and accept that there is a problem before we can legitimately decide what to do about it. Which brings me back to cosmology. I did promise you a circuitous bit of story-telling.

Happer’s is just the first example I encountered of a brilliant person coming to a dubious conclusion because of solution aversion. I have had many colleagues who work on cosmology and galaxy formation say straight out to me that they would only consider MOND “as a last resort.” This is a glaring, if understandable, example of solution aversion. We don’t like MOND, so we’re only willing to consider it when all other options have failed.

I hope it is obvious from the above that this attitude is not a healthy one in science. In cosmology, it is doubly bad. Just when, exactly, do we reach the last resort?

We’ve already accepted that the universe is full of dark matter, some invisible form of mass that interacts gravitationally but not otherwise, has no place in the ridiculously well tested Standard Model of particle physics, and has yet to leave a single shred of credible evidence in dozens of super-sensitive laboratory experiments. On top of that, we’ve accepted that there is also a distinct dark energy that acts like antigravity to drive the apparent acceleration of the expansion rate of the universe, conserving energy by the magic trick of a sign error in the equation of state that any earlier generation of physicists would have immediately rejected as obviously unphysical. In accepting these dark denizens of cosmology we have granted ourselves essentially infinite freedom to fine-tune any solution that strikes our fancy. Just what could possibly constitute the last resort of that?

hammerandnails
When you have a supercomputer, every problem looks like a simulation in need of more parameters.

Being a brilliant scientist never precludes one from being wrong. At best, it lengthens the odds. All too often, it leads to a dangerous hubris: we’re so convinced by, and enamored of, our elaborate and beautiful theories that we see only the successes and turn a blind eye to the failures, or in true partisan fashion, try to paint them as successes. We can’t have a sensible discussion about what might be right until we’re willing to admit – seriously, deep-down-in-our-souls admit – that maybe ΛCDM is wrong.

I fear the field has gone beyond that, and is fissioning into multiple, distinct branches of science that use the same words to mean different things. Already “dark matter” means something different to particle physicists and astronomers, though they don’t usually realize it. Soon our languages may become unrecognizable dialects to one another; already communication across disciplinary boundaries is strained. I think Kuhn noted something about different scientists not recognizing what other scientists were doing as science, nor regarding the same evidence in the same way. Certainly we’ve got that far already, as successful predictions of the “other” theory are dismissed as so much fake news in a world unhinged from reality.

Critical Examination of the Impossible

It has been proposal season for the Hubble Space Telescope, so many astronomers have been busy with that. I am no exception. Talking to others, it is clear that there remain many more excellent Hubble projects than available observing time.

So I haven’t written here for a bit, and I have other tasks to get on with. I did get requests for a report on the last conference I went to, Beyond WIMPs: from Theory to Detection. They have posted video from the talks, so anyone who is interested may watch.

I think this is the worst talk I’ve given in 20 years. Maybe more. Made the classic mistake of trying to give the talk the organizers asked for rather than the one I wanted to give. Conference organizers mean well, but they usually only have a vague idea of what they imagine you’ll say. You should always ignore that and say what you think is important.

When speaking or writing, there are three rules: audience, audience, audience. I was unclear what the audience would be when I wrote the talk, and it turns out there were at least four identifiably distinct audiences in attendance. There were skeptics – particle physicists who were concerned with the state of their field and that of cosmology; there were the faithful – particle physicists who were not in the least concerned about this state of affairs; there were the innocent – grad students with little to no background in astronomy; and there were experts – astroparticle physicists who have a deep but rather narrow knowledge of relevant astronomical data. I don’t think it would have been possible to address the assigned topic (a “Critical Examination of the Existence of Dark Matter”) in a way that satisfied all of these distinct audiences, and certainly not in the time allotted (or even in an entire semester).

It is tempting to give an interruption by interruption breakdown of the sociology, but you may judge that for yourselves. The one thing I got right was what I said at the outset: Attitude Matters. You can see that on display throughout.

IMG_5460
This comic has been hanging on a colleague’s door for decades.

In science as in all matters, if you come to a problem sure that you already know the answer, you will leave with that conviction. No data nor argument will shake your faith. Only you can open your own mind.

Hubble constant redux

There is a new article in Science on the expansion rate of the universe, very much along the lines of my recent post. It is a good read that I recommend. It includes some of the human elements that influence the science.

When I started this blog, I recalled my experience in the ’80s moving from a theory-infused institution to a more observationally and empirically oriented one. At that time, the theory-infused cosmologists assured us that Sandage had to be correct: H0 = 50. As a young student, I bought into this. Big time. I had no reason not to; I was very certain of the transmitted lore. The reasons to believe it then seemed every bit as convincing as the reasons to believe ΛCDM today. When I encountered people actually making the measurement, like Greg Bothun, they said “looks to be about 80.”

This caused me a lot of cognitive dissonance. This couldn’t be true. The universe would be too young (at most ∼12 Gyr) to contain the oldest stars (thought to be ∼18 Gyr at that time). Worse, there was no way to reconcile this with Inflation, which demanded Ωm = 1. The large deceleration of the expansion caused by high Ωm greatly exacerbated the age problem (only ∼8 Gyr accounting for deceleration). Reconciling the age problem with Ωm = 1 was hard enough without raising the Hubble constant.
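The ages quoted above are back-of-the-envelope numbers, and the arithmetic is easy to check. Here is a minimal sketch, assuming a coasting universe has age 1/H0 and a matter-only universe with Ωm = 1 has age (2/3)/H0. The function names and rounded unit constants are mine, for illustration only.

```python
KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec
SECONDS_PER_GYR = 3.15576e16  # seconds in 10^9 Julian years


def hubble_time_gyr(h0):
    """1/H0 in Gyr for H0 given in km/s/Mpc: the age of a coasting universe."""
    return KM_PER_MPC / h0 / SECONDS_PER_GYR


def matter_only_age_gyr(h0):
    """Age in Gyr of a decelerating, matter-only universe with Omega_m = 1."""
    return (2.0 / 3.0) * hubble_time_gyr(h0)


for h0 in (50, 80):
    print(f"H0 = {h0}: 1/H0 = {hubble_time_gyr(h0):.1f} Gyr, "
          f"Omega_m = 1 age = {matter_only_age_gyr(h0):.1f} Gyr")
```

For H0 = 80 this gives roughly 12 Gyr without deceleration and roughly 8 Gyr with it, against roughly 20 and 13 Gyr for H0 = 50. That makes it easy to see why the lower number was so attractive when the oldest stars were thought to be ∼18 Gyr old.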

Presented with this dissonant information, I did what most of us humans do: I ignored it. Some of my first work involved computing the luminosity function of quasars. With the huge distance scale of H0 = 50, I remember noticing how more distant quasars got progressively brighter. By a lot. Yes, they’re the most luminous things in the early universe. But they weren’t just outshining a galaxy’s worth of stars; they were outshining a galaxy of galaxies.

That was a clue that the metric I was assuming was very wrong. And indeed, since that time, every number of cosmological significance that I was assured in confident tones by Great Men that I Had to Believe has changed by far more than its formal uncertainty. In struggling with this, I’ve learned not to be so presumptuous in my beliefs. The universe is there for us to explore and discover. We inevitably err when we try to dictate how it Must Be.

The amplitude of the discrepancy in the Hubble constant is smaller now, but the same attitudes are playing out. Individual attitudes vary, of course, but there are many in the cosmological community who take the attitude that the Planck data give H0 = 67.8 so that is the right number. All other data are irrelevant; or at best flawed until brought into concordance with the right number.

It is Known, Khaleesi. 

Often these are the same people who assured us we had to believe Ωm = 1 and H0 = 50 back in the day. This continues the tradition of arrogance about how things must be. This attitude remains rampant in cosmology, and is absorbed by new generations of students just as it was by me. They’re very certain of the transmitted lore. I’ve even been trolled by some who seem particularly eager to repeat the mistakes of the past.

From hard experience, I would advocate a little humility. Yes, Virginia, there is a real tension in the Hubble constant. And yes, it remains quite possible that essential elements of our cosmology may prove to be wrong. I personally have no doubt about the empirical pillars of the Big Bang – cosmic expansion, Big Bang Nucleosynthesis, and the primordial nature of the Cosmic Microwave Background. But Dark Matter and Dark Energy may well turn out to be mere proxies for some deeper cosmic truth. IF that is so, we will never recognize it if we proceed with the attitude that ΛCDM is Known, Khaleesi.

Ode to Vera

Vera Rubin passed away a few weeks ago. This was not surprising: she had lived a long, positive, and fruitful life, but had faced the usual health problems of those of us who make it to the upper 80s. Though news of her death was not surprising, it was deeply saddening. It affected me more than I had anticipated, even armed with the intellectual awareness that the inevitable must be approaching. It saddens me again now trying to write this, which must inevitably be an inadequate tribute.

In the days after Vera Rubin passed away, I received a number of inquiries from the press asking me to comment on her life and work for their various programs. I did not respond. I guess I understand the need to recognize and remark on the passing of a great scientist and human being, and I’m glad the press did in fact acknowledge her many accomplishments. But I wondered if, by responding, I would be providing a tribute to Vera, or merely feeding the needs of the never-ending hyperactive news cycle. Both, I guess. At any rate, I did not feel it was my place to comment. It did not seem right to air my voice where hers would never be heard again.

I knew Vera reasonably well, but there are plenty who knew her better and were her colleagues over a longer period of time. Also, at the back of my mind, I was a tiny bit afraid that no matter what I said, someone would read into it some sort of personal scientific agenda. My reticence did not preclude other scientists who knew her considerably less well from doing exactly that. Perhaps it is unavoidable: to speak of others, one must still use one’s own voice, and that inevitably is colored by our own perspective. I mention this because many of the things recently written about Vera do not do justice to her scientific opinions as I know them from conversations with her. This is important, because Vera was all about the science.

One thing I distinctly remember her saying to me, and I’m sure she repeated this advice to many other junior scientists, was that you had to do science because you had a need to Know. It was not something to be done for awards or professional advancement; you could not expect any sort of acknowledgement and would likely be disappointed if you did. You had to do it because you wanted to find out how things work, to have even a brief moment when you felt like you understood some tiny fraction of the wonders of the universe.

Despite this attitude, Vera was very well rewarded for her science. It came late in her career – she did devote a lot of energy to raising a large family; she and her husband Bob Rubin were true life partners in the ideal sense of the term: family came first, and they always supported each other. It was deeply saddening when Bob passed, and another blow to science when their daughter Judy passed away all too early. We all die, sometimes sooner rather than later, but few of us take it well.

Professionally, Vera was all about the science. Work was like breathing. Something you just did; doing it was its own reward. Vera always seemed to take great joy in it. Success, in terms of awards, came late, but it did come, and in many prestigious forms – membership in the National Academy of Sciences, the Gold Medal of the Royal Astronomical Society, and the National Medal of Science, to name a few of her well-deserved honors. Much has been made of the fact that this list does not include a Nobel Prize, but I never heard Vera express disappointment about that, or even aspiration to it. Quite the contrary, she, like most modest people, didn’t seem to consider it to be appropriate. I think part of the reason for this was that she self-identified as an astronomer, not as a physicist (as some publications mis-report). That distinction is worthy of an entire post so I’ll leave it for now.

Astronomer though she was, her work certainly had an outsized impact on physics. I have written before as to why she was deserving of a Nobel Prize, if for slightly different reasons than others give. But I do not dread that she died in any way disappointed by the lack of a Nobel Prize. It was not her nature to fret about such things.

Nevertheless, Vera was an obvious scientist to recognize with a Nobel Prize. No knowledgeable scientist would have disputed her as a choice. And yet the history of the physics Nobel prize is incredibly lacking in female laureates (see definition 4). Only two women have been recognized in the entire history of the award: Marie Curie (1903) and Maria Goeppert-Mayer (1963). She was an obvious woman to have honored in this way. It is hard to avoid the conclusion that the awarding of the prize is inherently sexist. Based on two data points, it has become more sexist over time, as there is a longer gap between now and the last award to a woman (63 years) than between the two awards (60 years).

Why should gender play any role in the search for knowledge? Or the recognition of discoveries made in that search? And yet women scientists face antiquated attitudes and absurd barriers all the time. Not just in the past. Now.

Vera was always a strong advocate of women in science. She has been an inspiration to many. A Nobel prize awarded to Vera Rubin would have been great for her, yes, but the greater tragedy of this missed opportunity is what it would have meant to all the women who are scientists now and who will be in the future.

Well, those are meta-issues raised by Vera’s passing. I don’t think it is inappropriate to raise them, because these were issues dear to her heart. I know the world is a better place for her efforts. But I hadn’t intended to go off on meta-tangents. Vera was a very real, warm, positive human being. So what I had meant to do was recollect a few personal anecdotes. These seem so inadequate: brief snippets in a long and expansive life. Worse, they are my memories, so I can’t see how to avoid making it at least somewhat about me when it should be entirely about her. Still. Here are a few of the memories I have of her.

I first met Vera in 1985 on Kitt Peak. In retrospect I can’t imagine a more appropriate setting. But at the time it was only my second observing run, and I had no clue as to what was normal or particularly who Vera Rubin was. She was just another astronomer at the dinner table before a night of observing.

A very curious astronomer. She kindly asked what I was working on, and followed up with a series of perceptive questions. She really wanted to know. Others have remarked on her ability to make junior people feel important, and she could indeed do that. But I don’t think she tried, in particular. She was just genuinely curious.

At the time, I was a senior about to graduate from MIT. I had to beg permission to take some finals late so I could attend this observing run. My advisor, X-ray astronomer George Whipple Clark, kindly bragged about how I had actually got my thesis in on time (most students took advantage of a default one-week grace period) in order to travel to Kitt Peak. Vera, ever curious, asked about my thesis, what galaxies were involved, how the data were obtained… all had been from a run the semester before. As this became clear, Vera got this bemused look and asked “What kind of thesis can be written from a single observing run?” “A senior thesis!” I volunteered: undergraduate observers were rare on the mountain in those days; up till that point I think she had assumed I was a grad student.

I encountered Vera occasionally over the following years, but only in passing. In 1995, she offered me a Carnegie fellowship at DTM. This was a reprieve in a tight job market. As it happened, we were both visiting the Kapteyn Institute, and Renzo Sancisi had invited us both to dinner, so she took the opportunity to explain that their initial hire had moved on to a faculty position so the fellowship was open again. She managed to do this without making me feel like an also-ran. I had recently become interested in MOND, and here was the queen of dark matter offering me a job I desperately needed. It seemed right to warn her, so I did: would she have a problem with a postdoc who worked on MOND? She was visibly shocked, but only for an instant. “Of course not,” she said. “As a Carnegie Fellow, you can work on whatever you want.”

Vera was very supportive throughout my time at DTM, and afterwards. We had many positive scientific interactions, but we didn’t really work together then. I tried to get her interested in the rotation curves of low surface brightness galaxies, but she had a full plate. It wasn’t until a couple of years after I left DTM that we started collaborating.

Figure made by Vera Rubin from her measurements of the rotation curves of low surface brightness galaxies. Published in McGaugh, Rubin, & de Blok (2001).

Vera loved to measure. The reason I chose the picture featured at top is that it shows her doing what she loved. By the time we collaborated, she had moved on to using a computer to measure line positions for velocities. But that is what she loved to do. She did all the measurements for the rotation curves we measured, like the ones shown above. As the junior person, I had expected to do all that work, but she wanted to do it. Then she handed it on to me to write up, with no expectation of credit. It was like she was working for me as a postdoc. Vera Rubin was an awesome postdoc!

She also loved to observe. Mostly that was a typically positive, fruitful experience. But she did have an intense edge that rarely peeked out. One night on Las Campanas, the telescope broke. This is not unusual, and we took it in stride. For a half hour or so. Then Vera started calmly but assertively asking the staff why we were not yet back up and working. Something was very wrong, and it involved calling in extra technicians who led us into the mechanical bowels of the du Pont telescope, replete with steel cables and unidentifiable steam-punk looking artifacts. Vera watched them like a hawk. She never said a negative word. But she silently, intently watched them. Tension mounted; time slowed to a crawl till it seemed that I could feel like a hard rain the impact of every photon that we weren’t collecting. She wanted those photons. Never said a negative word, but I’m sure the staff felt a wall of pressure that I was keenly aware of merely standing in its proximity. Perhaps like a field mouse under a raptor’s scrutiny.

Vera was not normally like that, but every good observer has in her that urgency to get on sky. This was the only time I saw it come out. Other typical instrumental gaffes she took in stride. This one took too long. But it did get fixed, and we were back on sky, and it was as if there had never been a problem in the world.

Ultimately, Vera loved the science. She was one of the most intrinsically curious souls I ever met. She wanted to know, to find out what was going on up there. But she was also content with what the universe chose to share, reveling in the little discoveries as much as the big ones. Why does the Hα emission extend so far out in UGC 2885? What is the kinematic major axis of DDO 154, anyway? Let’s put the slit in a few different positions and work it out. She kept a cheat sheet taped on her desk for how the rotation curve changed if the position angle were missed – which never happened, because she prepared so carefully for observing runs. She was both thorough and extremely good at what she did.

Vera was very positive about the discoveries of others. Like all good astronomers, she had a good BS detector. But she very rarely said a negative word. Rarely, not never. She was not a fan of Chandrasekhar, who was the editor of the ApJ when she submitted her dissertation paper there. Her advisor, Gamow, had posed the question to her, is there a length scale in the sky? Her answer would, in the modern parlance, be called the correlation length of galaxies. Chandrasekhar declined to consider publishing this work, explaining in a letter that he had a student working on the topic, and she should wait for the right answer. The clear implication was that this was a man’s job, and the work of a woman was not to be trusted. Ultimately her work was published in the proceedings of the National Academy, of which Gamow was a member. He had predicted that this is how Chandrasekhar would behave, afterwards sending her a postcard saying only “Told you so.”

On another occasion, in the mid-90s when “standard” CDM meant SCDM with Ωm = 1, not ΛCDM, she confided to me in hushed tones that the dark matter had to be baryonic. Other eminent dynamicists have said the same thing to me at times, always in the same hushed tones, lest the cosmologists overhear. As well they might. To my ears this was an absurdity, and I know well the derision it would bring. What about Big Bang Nucleosynthesis? This was the only time I recall hearing Vera scoff. “If I told the theorists today that I could prove Ωm = 1, tomorrow they would explain that away.”

I was unconvinced. But it made clear to me that I put a lot of faith in Big Bang Nucleosynthesis, and this need not be true for all intelligent scientists. Vera – and the others I allude to, who still live so I won’t name – had good reasons for her assertion. She had already recognized that there was a connection between the baryon distribution and the dynamics of galaxies, and that this made a lot more sense if the dark and luminous components were closely related – for example, if the dark matter – or at least some important fraction of it in galaxies – were itself baryonic. Even if we believe in Big Bang Nucleosynthesis, we’re still missing a lot of baryons.

The proper interpretation of this evidence is still debated today. What I learned from this was to be more open to the possibility that things I thought I knew for sure might turn out to be wrong. After all, that pretty much sums up the history of cosmology.

It was widely reported that Vera discovered dark matter or “proved” or “confirmed” its existence. I don’t think Vera would agree with this assessment, nor would many of her colleagues at DTM. I know this because we talked about it. A lot.

To my mind, what Vera discovered is both more specific and more profound than the dark matter paradigm it helped to create. What she discovered observationally is that rotation curves are very nearly flat, and continue to be so to indefinitely large radius. Over and over again, for every galaxy in the sky. It is a law of nature for galaxies, akin to Kepler’s laws for planets. Dark matter is an inference, a subsidiary result. It is just one possible interpretation, a subset of amazing and seemingly unlikely possibilities opened up by her discovery.

The discovery itself is amazing enough without conflating it with dark matter or MOND or any other flavor of interpretation of which the reader might be fond. Like many great discoveries, it has many parents. I would give a lot of credit to Albert Bosma, but there are also others who had early results, like Mort Roberts and Seth Shostak. But it was Vera whose persistence overcame the knee-jerk conservatism of cosmologists like Sandage, who she said dismissed her early flat rotation curve of M31 (obtained in collaboration with Roberts) as “the effect of looking at a bright galaxy.” “What does that even mean?” she asked me rhetorically. She also recalled Jim Gunn gasping “But… that would mean most of the mass is dark!” Indeed. It takes time to wrap our heads around these things. She obtained rotation curve after rotation curve in excess of a hundred to ensure we realized we had to do so.

Vera realized the interpretation was never as settled as the data. Her attitude (and that of many of us, including myself) is nicely summarized by her exchange with Tohline at the end of her 1982 talk at IAU 100. One starts with the most conservative – or at least, least outrageous – possibility, which at that time was a mere factor of two in hidden mass, which could easily have been baryonic. Yet much more recently, at the last conference I attended with her (in 2009), she reminded the audience (to some visible consternation) that it was still “early days” for dark matter, and we should not be surprised to be surprised – up to, and including, how gravity works.

At this juncture, I expect some readers will accuse me of what I warned about above: using this for my own agenda. I have found it is impossible to avoid having an agenda imputed to me by people who don’t like what they imagine my agenda to be, whether they imagine right or not – usually not. But I can’t not say these things if I want to set the record straight – these were Vera’s words. She remained concerned all along that it might be gravity to blame rather than dark matter. Not convinced, nor even giving either the benefit of the doubt. There was, and remains, so much to figure out.

“Early days.”

I suppose, in the telling, it is often more interesting to relate matters of conflict and disagreement than feelings of goodwill. In that regard, some of the above anecdotes are atypical: Vera was a very positive person. It just isn’t compelling to relate episodes like her gushing praise for Rodrigo Ibata’s discovery of the Sagittarius dwarf satellite galaxy. I probably only remember that myself because I had, like Rodrigo, encountered considerable difficulty in convincing some at Cambridge that there could be lots of undiscovered low surface brightness galaxies out there, even in the Local Group. Some of these same people now seem to take for granted that there are a lot more in the Local Group than I find plausible.

I have been fortunate in my life to have known many talented scientists. I have met many people from many nations, most of them warm, wonderful human beings. Vera was the best of the best, both as a scientist and as a human being. The world is a better place for having had her in it, for a time.

Crater 2: the Bullet Cluster of LCDM


Recently I have been complaining about the low standards to which science has sunk. It has become normal to be surprised by an observation, express doubt about the data, blame the observers, slowly let it sink in, bicker and argue for a while, construct an unsatisfactory model that sort-of, kind-of explains the surprising data but not really, call it natural, then pretend like that’s what we expected all along. This has been going on for so long that younger scientists might be forgiven if they think this is how science is supposed to work. It is not.

At the root of the scientific method is hypothesis testing through prediction and subsequent observation. Ideally, the prediction comes before the experiment. The highest standard is a prediction made before the fact in ignorance of the ultimate result. This is incontrovertibly superior to post-hoc fits and hand-waving explanations: it is how we’re supposed to avoid playing favorites.

I predicted the velocity dispersion of Crater 2 in advance of the observation, for both ΛCDM and MOND. The prediction for MOND is reasonably straightforward. That for ΛCDM is fraught. There is no agreed method by which to do this, and it may be that the real prediction is that this sort of thing is not possible to predict.

The reason it is difficult to predict the velocity dispersions of specific, individual dwarf satellite galaxies in ΛCDM is that the stellar mass-halo mass relation must be strongly non-linear to reconcile the steep mass function of dark matter sub-halos with their small observed numbers. This is closely related to the M*-Mhalo relation found by abundance matching. The consequence is that the luminosity of dwarf satellites can change a lot for tiny changes in halo mass.

Fig. 11 from Tollerud et al. (2011, ApJ, 726, 108). The width of the bands illustrates the minimal scatter expected between dark halo and measurable properties. A dwarf of a given luminosity could reside in dark halos differing by two decades in mass, with a corresponding effect on the velocity dispersion.

Long story short, the nominal expectation for ΛCDM is a lot of scatter. Photometrically identical dwarfs can live in halos with very different velocity dispersions. The trend between mass, luminosity, and velocity dispersion is so weak that it might barely be perceptible. The photometric data should not be predictive of the velocity dispersion.

It is hard to get even a ballpark answer that doesn’t make reference to other measurements. Empirically, there is some correlation between size and velocity dispersion. This “predicts” σ = 17 km/s. That is not a true theoretical prediction; it is just the application of data to anticipate other data.

Abundance matching relations provide a highly uncertain estimate. The first time I tried to do this, I got unphysical answers (σ = 0.1 km/s, which is less than the stars alone would cause without dark matter – about 0.5 km/s). The application of abundance matching requires extrapolation of fits to data at high mass to very low mass. Extrapolating the M*-Mhalo relation over many decades in mass is very sensitive to the low mass slope of the fitted relation, so it depends on which one you pick.


Since my first pick did not work, let’s go with the value suggested to me by James Bullock: σ = 11 km/s. That is the mid-value (the blue lines in the figure above); the true value could easily scatter higher or lower. Very hard to predict with any precision. But given the luminosity and size of Crater 2, we expect numbers like 11 or 17 km/s.

The measured velocity dispersion is σ = 2.7 ± 0.3 km/s.

This is incredibly low. Shockingly so, considering the enormous size of the system (1 kpc half light radius). The NFW halos predicted by ΛCDM don’t do that.

To illustrate how far off this is, I have adapted this figure from Boylan-Kolchin et al. (2012).

Fig. 1 of MNRAS, 422, 1203 illustrating the “too big to fail” problem: observed dwarfs have lower velocity dispersions than sub-halos that must exist and should host similar or even more luminous dwarfs that apparently do not exist. I have had to extend the range of the original graph to lower velocities in order to include Crater 2.

Basically, NFW halos, including the sub-halos imagined to host dwarf satellite galaxies, have rotation curves that rise rapidly and stay high in proportion to the cube root of the halo mass. This property makes it very challenging to explain a low velocity at a large radius: exactly the properties observed in Crater 2.

Let’s not fail to appreciate how extremely wrong this is. The original version of the graph above stopped at 5 km/s. It didn’t extend to lower values because they were absurd. There was no reason to imagine that this would be possible. Indeed, the point of their paper was that the observed dwarf velocity dispersions were already too low. To get to lower velocity, you need an absurdly low mass sub-halo – around 10^7 M☉. In contrast, the usual inference of masses for sub-halos containing dwarfs of similar luminosity is around 10^9 M☉ to 10^10 M☉. So the low observed velocity dispersion – especially at such a large radius – seems nigh on impossible.
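To put rough numbers on this, here is a minimal sketch that takes the cube-root scaling mentioned above at face value (velocity proportional to the cube root of halo mass) and asks how much less massive a sub-halo must be to drop from the nominal expectations of 11 or 17 km/s to the observed 2.7 km/s. The function name is made up for illustration, and real sub-halos are messier than a pure power law, so treat this as back-of-the-envelope arithmetic rather than a halo model.

```python
def mass_ratio_for_velocity_drop(v_expected, v_observed):
    """Factor by which halo mass must shrink if velocity scales as mass**(1/3).

    Assumption: a pure power law V ~ M**(1/3), as quoted in the text.
    """
    return (v_expected / v_observed) ** 3


sigma_observed = 2.7  # km/s, measured dispersion quoted in the text

for sigma_expected in (11.0, 17.0):  # km/s, nominal expectations quoted in the text
    factor = mass_ratio_for_velocity_drop(sigma_expected, sigma_observed)
    print(f"{sigma_expected:.0f} km/s -> {sigma_observed} km/s "
          f"requires a halo roughly {factor:.0f} times less massive")
```

The answer is a reduction of very roughly two decades in mass, which is the same order as the jump from the usual 10^9 M☉ or more down to around 10^7 M☉.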

More generally, there is no way in ΛCDM to predict the velocity dispersions of particular individual dwarfs. There is too much intrinsic scatter in the highly non-linear relation between luminosity and halo mass. Given the photometry, all we can say is “somewhere in this ballpark.” Making an object-specific prediction is impossible.

Except that it is possible. I did it. In advance.

The predicted velocity dispersion is σ = 2.1 +0.9/-0.6 km/s.

I’m an equal opportunity scientist. In addition to ΛCDM, I also considered MOND. The successful prediction is that of MOND. (The quoted uncertainty reflects the uncertainty in the stellar mass-to-light ratio.) The difference is that MOND makes a specific prediction for every individual object. And it comes true. Again.
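As a quick sanity check on the numbers quoted above, the sketch below compares the predictions with the measurement. It handles the asymmetric uncertainty on the MOND prediction crudely, using the upper error bar because the measurement lies above the prediction, and adds it in quadrature with the measurement error. That treatment is a simplification for illustration, not a description of how the comparison was done in the paper.

```python
import math

sigma_obs, err_obs = 2.7, 0.3                 # km/s, measured value and uncertainty
sigma_mond, err_up, err_down = 2.1, 0.9, 0.6  # km/s, MOND prediction
lcdm_expectations = (11.0, 17.0)              # km/s, nominal LCDM numbers from the text


def tension(predicted, observed, err_predicted, err_observed):
    """Difference in units of the combined (quadrature) uncertainty."""
    return abs(observed - predicted) / math.hypot(err_predicted, err_observed)


# Use the error bar on the side of the prediction that faces the measurement.
err_pred = err_up if sigma_obs > sigma_mond else err_down
mond_tension = tension(sigma_mond, sigma_obs, err_pred, err_obs)
print(f"MOND prediction vs. measurement: {mond_tension:.2f} combined sigma apart")

# No uncertainty is quoted for the LCDM numbers, so just report the ratio.
for expected in lcdm_expectations:
    print(f"LCDM expectation {expected:.0f} km/s is "
          f"{expected / sigma_obs:.1f} times the measured dispersion")
```

Under these assumptions the MOND prediction sits well within one combined sigma of the measurement, while the nominal ΛCDM expectations are high by factors of about four to six.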

MOND is a funny theory. The amplitude of the mass discrepancy it induces depends on how low the acceleration of a system is. If Crater 2 were off by itself in the middle of intergalactic space, MOND would predict it should have a velocity dispersion of about 4 km/s.
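For readers who want to see where a number like 4 km/s comes from, here is a minimal sketch using the deep-MOND estimator for an isolated, isotropic system, σ = (4/81 · G M a0)^(1/4). The inputs are assumptions for illustration, not values stated in this post: a V-band luminosity of about 1.6 × 10^5 L☉ for Crater 2, a stellar mass-to-light ratio of 2 in solar units, and a0 = 1.2 × 10^-10 m/s².

```python
G = 4.30091e-6         # Newton's constant in kpc (km/s)^2 / Msun
KM_PER_KPC = 3.0857e16
A0_SI = 1.2e-10        # m/s^2, assumed value of the MOND acceleration scale
A0 = A0_SI * 1e-3 * KM_PER_KPC  # converted to (km/s)^2 / kpc, about 3700


def sigma_isolated(mass_msun):
    """Deep-MOND velocity dispersion (km/s) of an isolated, isotropic system."""
    return (4.0 / 81.0 * G * mass_msun * A0) ** 0.25


luminosity_lsun = 1.6e5      # assumed V-band luminosity of Crater 2
mass_to_light = 2.0          # assumed stellar mass-to-light ratio (solar units)
stellar_mass = mass_to_light * luminosity_lsun

sigma_iso = sigma_isolated(stellar_mass)
print(f"Isolated deep-MOND dispersion: {sigma_iso:.1f} km/s")
```

Note the fourth-root dependence on mass: in the isolated case, even a factor of two error in the stellar mass moves the answer by less than 20%.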

But Crater 2 is not isolated. It is close enough to the Milky Way that there is an additional, external acceleration imposed by the Milky Way. The net result is that the acceleration isn’t quite as low as it would be were Crater 2 all by its lonesome. Consequently, the predicted velocity dispersion is a measly 2 km/s. As observed.

In MOND, this is called the External Field Effect (EFE). Theoretically, the EFE is rather disturbing, as it breaks the Strong Equivalence Principle. In particular, Local Position Invariance in gravitational experiments is violated: the velocity dispersion of a dwarf satellite depends on whether it is isolated from its host or not. Weak equivalence (the universality of free fall) and the Einstein Equivalence Principle (which excludes gravitational experiments) may still hold.

We identified several pairs of photometrically identical dwarfs around Andromeda. Some are subject to the EFE while others are not. We see the predicted effect of the EFE: isolated dwarfs have higher velocity dispersions than their twins afflicted by the EFE.

If it is just a matter of sub-halo mass, the current location of the dwarf should not matter. The velocity dispersion certainly should not depend on the bizarre MOND criterion for whether a dwarf is affected by the EFE or not. It isn’t a simple distance-dependency. It depends on the ratio of internal to external acceleration. A relatively dense dwarf might still behave as an isolated system close to its host, while a really diffuse one might be affected by the EFE even when very remote.
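The criterion can be sketched in a few lines. This is an order-of-magnitude illustration under stated assumptions, not the calculation from the paper: the internal acceleration is estimated as 3σ²/r at the half-light radius, the external one as V²/D for a host with a flat rotation speed, and the Milky Way speed of 200 km/s is an assumed round number. The Crater 2 inputs (isolated dispersion of about 4 km/s, half-light radius of about 1 kpc, distance of 120 kpc) are the ones quoted elsewhere in this post; the "dense dwarf" is entirely hypothetical.

```python
A0 = 3.7e3  # (km/s)^2 / kpc, roughly 1.2e-10 m/s^2 (assumed value)


def internal_acceleration(sigma_kms, r_half_kpc):
    """Rough internal acceleration at the half-light radius."""
    return 3.0 * sigma_kms ** 2 / r_half_kpc


def external_acceleration(v_host_kms, distance_kpc):
    """Rough acceleration imposed by a host with a flat rotation curve."""
    return v_host_kms ** 2 / distance_kpc


def mond_regime(g_in, g_ex, a0=A0):
    """Classify a dwarf by comparing internal and external accelerations."""
    if max(g_in, g_ex) >= a0:
        return "Newtonian"
    if g_in > g_ex:
        return "isolated"
    return "EFE"


V_MILKY_WAY = 200.0  # km/s, assumed

# Crater 2: very diffuse and remote.
crater2 = mond_regime(internal_acceleration(4.0, 1.0),
                      external_acceleration(V_MILKY_WAY, 120.0))

# Hypothetical dense dwarf: compact, and much closer to the host.
dense = mond_regime(internal_acceleration(8.0, 0.2),
                    external_acceleration(V_MILKY_WAY, 50.0))

print(f"Crater 2: {crater2}; hypothetical dense dwarf at 50 kpc: {dense}")
```

With these inputs the diffuse, distant object lands in the EFE regime while the compact, nearby one still behaves as isolated, which is why the criterion is not a simple function of distance.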

When Crater 2 was first discovered, I ground through the math and tweeted the prediction. I didn’t want to write a paper for just one object. However, I eventually did so because I realized that Crater 2 is important as an extreme example of a dwarf so diffuse that it is affected by the EFE despite being very remote (120 kpc from the Milky Way). This is not easy to reproduce any other way. Indeed, MOND with the EFE is the only way that I am aware of whereby it is possible to predict, in advance, the velocity dispersion of this particular dwarf.

If I put my ΛCDM hat back on, it gives me pause that any method can make this prediction. As discussed above, this shouldn’t be possible. There is too much intrinsic scatter in the halo mass-luminosity relation.

If we cook up an explanation for the radial acceleration relation, we still can’t make this prediction. The RAR fit we obtained empirically predicts 4 km/s. This is indistinguishable from MOND for isolated objects. But the RAR itself is just an empirical law – it provides no reason to expect deviations, nor how to predict them. MOND does both, does it right, and has done so before, repeatedly. In contrast, the acceleration of Crater 2 is below the minimum allowed in ΛCDM according to Navarro et al.

For these reasons I consider Crater 2 to be the bullet cluster of ΛCDM. Just as the bullet cluster seems like a straight-up contradiction to MOND, so too does Crater 2 for ΛCDM. It is something ΛCDM really can’t do. The difference is that you can just look at the bullet cluster. With Crater 2 you actually have to understand MOND as well as ΛCDM, and think it through.

So what can we do to save ΛCDM?

Whatever it takes, per usual.

One possibility is that Crater 2 may represent the “bright” tip of the extremely low surface brightness “stealth” fossils predicted by Bovill & Ricotti. Their predictions are encouraging for getting the size and surface brightness in the right ballpark. But I see no reason in this context to expect such a low velocity dispersion. They anticipate dispersions consistent with the ΛCDM discussion above, and correspondingly high mass-to-light ratios that are greater than observed for Crater 2 (M/L ≈ 10^4 rather than ~50).

A plausible suggestion I heard was from James Bullock. While noting that reionization should preclude the existence of galaxies in halos below 5 km/s, as we need for Crater 2, he suggested that tidal stripping could reduce an initially larger sub-halo to this point. I am dubious about this, as my impression from the simulations of Penarrubia was that the outer regions of the sub-halo were stripped first while leaving the inner regions (where the NFW cusp predicts high velocity dispersions) largely intact until near complete dissolution. In this context, it is important to bear in mind that the low velocity dispersion of Crater 2 is observed at large radii (1 kpc, not tens of pc). Still, I can imagine ways in which this might be made to work in this particular case, depending on its orbit. Tony Sohn has an HST program to measure the proper motion; this should constrain whether the object has ever passed close enough to the center of the Milky Way to have been tidally disrupted.

Josh Bland-Hawthorn pointed out to me that he made simulations that suggest a halo with a mass as low as 10^7 M☉ could make stars before reionization and retain them. This contradicts much of the conventional wisdom outlined above because they find a much lower (and in my opinion, more realistic) feedback efficiency for supernova feedback than assumed in most other simulations. If this is correct (as it may well be!) then it might explain Crater 2, but it would wreck all the feedback-based explanations given for all sorts of other things in ΛCDM, like the missing satellite problem and the cusp-core problem. We can’t have it both ways.

Without super-efficient supernova feedback, the Local Group would be filled with a million billion ultrafaint dwarf galaxies!

I’m sure people will come up with other clever ideas. These will inevitably be ad hoc suggestions cooked up in response to a previously inconceivable situation. This will ring hollow to me until we explain why MOND can predict anything right at all.

In the case of Crater 2, it isn’t just a matter of retrospectively explaining the radial acceleration relation. One also has to explain why exceptions to the RAR occur following the very specific, bizarre, and unique EFE formulation of MOND. If I could do that, I would have done so a long time ago.

No matter what we come up with, the best we can hope to do is a post facto explanation of something that MOND predicted correctly in advance. Can that be satisfactory?

Pulp Science



Vincent: Want to talk about MOND?

Jules: No man, I don’t consider MOND.

Vincent: Are you biased?

Jules: Nah, I ain’t biased, I just don’t dig MOND, that’s all.

Vincent: Why not?

Jules: MOND is an ugly theory. I don’t consider ugly theories.

Vincent: MOND makes predictions that come true. Fits galaxy data gooood.

Jules: Hey, MOND may fit every galaxy in the universe, but I’d never know ’cause I wouldn’t consider the ugly theory. MOND has no generally covariant extension. That’s an ugly theory. I ain’t considering nothin’ that ain’t got a proper cosmology.

Vincent: How about ΛCDM? ΛCDM has lots of small scale problems.

Jules: I don’t care about small scale problems.

Vincent: Yeah, but do you consider ΛCDM to be an ugly theory?

Jules: I wouldn’t go so far as to call ΛCDM ugly, but it’s definitely fine-tuned. But, ΛCDM’s got the CMB. The CMB goes a long way.

Vincent: Ah, so by that rationale, if a theory of modified dynamics fit the CMB, it would cease to be an ugly theory. Is that true?

Jules: Well, we’d have to be talkin’ about one charming eff’n theory of modified dynamics. I mean, it’d have to be ten times more charmin’ than MOND, you know what I’m sayin’?