The kids are all right, but they can’t interpret a graph

I have not posted here in a while. This is mostly due to the fact that I have a job that is both engaging and demanding. I started this blog as a way to blow off steam, but I realized this mostly meant ranting about those fools at the academy! – of whom there are indeed plenty. These are reality-based rants, but I’ve got better things to do.

As it happens, I’ve come down with a bug that keeps me at home but leaves just enough energy to read and type, but little else. This is an excellent recipe for inciting a rant. Reading the Washington Post article on delayed gratification in children brings it on.

It is not really the article that gets me, let alone the scholarly paper on which it is based. I have not read the latter, and have no intention of doing so. I hope its author has thought through the interpretation better than is implied by what I see in the WaPo article. That is easy for me to believe; my own experience is that what academics say to the press has little to do with what eventually appears in the press – sometimes even inverting its meaning outright. (At one point I was quoted as saying that dark matter experimentalists should give up, when what I had said was that it was important to pursue these experiments to their logical conclusion, but that we also needed to think about what would constitute a logical conclusion if dark matter remains undetected.)

So I am at pains to say that my ire is not directed at the published academic article. In this case it isn’t even directed at the article in the WaPo, regardless of whether it is a fair representation of the academic work or not. My ire is directed entirely at the interpretation of a single graph, which I am going to eviscerate.

The graph in question shows the delay time measured in psychology experiments over the years. It is an attempt to measure self-control in children. When presented with a marshmallow but told they may have two marshmallows if they wait for it, how long can they hold out? This delayed gratification is thought to be a measure of self-control that correlates positively with all manners of subsequent development. Which may indeed be true. But what can we learn from this particular graph?

[Figure: measured delay time versus year of experiment, each point plotted as a marshmallow]

The graph plots the time delay measured from different experiments against the date of the experiment. Every point (plotted as a marshmallow – cute! I don’t object to that) represents an average over many children tested at that time. Apparently they have been “corrected” to account for the age of the children (one gets better at delayed gratification as one matures) which is certainly necessary, but it also raises a flag. How was the correction made? Such details can matter.

However, my primary concern is more basic. Do the data, as shown, actually demonstrate a trend?

To answer this question for yourself, the first thing you have to be able to do is mentally remove the line. That big black bold line that so nicely connects the dots. Perhaps it is a legitimate statistical fit of some sort. Or perhaps it is boldface to [mis]guide the eye. Doesn’t matter. Ignore it. Look at the data.

The first thing I notice about the data is the outliers – in this case, three points at very high delay times. These do not follow the advertised trend, or any trend. Indeed, they seem in no way related to the other data. It is as if a different experiment had been conducted.

When confronted with outlying data, one has a couple of choices. If we accept that these data are correct and from the same experiment, then there is no trend: the time of delayed gratification could be pretty much anything from a minute to half an hour. However, the rest of the data do clump together, so the other option is that these outliers are not really representing the same thing as the rest of the data, and should be ignored, or at least treated with less weight.

The outliers may be the most striking part of the data set, but they are usually the least important. There are all sorts of statistical measures by which to deal with them. I do not know which, if any, have been applied. There are no error bars, no boxes representing quartiles or some other percentage spanned by the data each point represents. Just marshmallows. Now I’m a little grumpy about the cutesy marshmallows. All marshmallows are portrayed as equal, but are some marshmallows more equal than others? This graph provides no information on this critical point.

In the absence of any knowledge about the accuracy of each marshmallow, one is forced to use one’s brain. This is called judgement. This can be good or bad. It is possible to train the brain to be a good judge of these things – a skill that seems to be in decline these days.

What I see in the data are several clumps of points (disregarding the outliers). In the past decade there are over a dozen points all clumped together around an average of 8 minutes. That seems like a pretty consistent measure of the delayed gratification of the current generation of children.

Before 2007, the data are more sparse. There are half a dozen points on either side of 1997. These have a similar average of 7 or 8 minutes.

Before that, the data are very sparse. What little there is goes back to the sixties. One could choose to see that as two clumps of three points, or one clump of six points. If one does the latter, the mean is around 5 minutes. So we had a “trend” of 5 minutes circa 1970, 7 minutes circa 1997, and 8 minutes circa 2010. That is an increase over time, but it is also a tiny trend – much less persuasive than the heavy solid line in the graph implies.

If we treat the two clumps of three separately – as I think we should, since they sit well apart from each other – then we have to choose which to believe. They aren’t consistent. The delay time in 1968 looks to have an average of two minutes; in 1970 it looks to be 8 minutes. So which is it?

According to the line in the graph, we should believe the 1968 data and not the 1970 data. That is, the 1968 data fall nicely on the line, while the 1970 data fall well off it. In percentage terms, the 1970 data are as far from the trend as the highest 2010 point that we rejected as an outlier.

When fitting a line, the slope can be strongly influenced by the points at its ends – in this case, the earliest and the latest data. The latest data seem pretty consistent, but the earliest data are split. So the slope depends entirely on which clump of three early points you choose to believe.

If we choose to believe the 1970 clump, then the “trend” becomes 8 minutes in 1970, 7 minutes in 1997, 8 minutes in 2010. Which is to say, no trend at all. Try disregarding the first three (1968) points and draw your own line on this graph. Without them, it is pretty flat. In the absence of error bars and credible statistics, I would conclude that there is no meaningful trend present in the data at all. Maybe a formal fit gives a non-zero slope, but I find it hard to believe it is meaningfully non-zero.
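You can check this arithmetic for yourself. The numbers below are hypothetical, eyeballed from the clump averages described above – they are not the actual study data – but they make the point about endpoint leverage:

```python
import numpy as np

# Hypothetical delay times (minutes), eyeballed from the clumps described
# above -- NOT the actual study data: three low points circa 1968, three
# higher circa 1970, then the 1997 and recent clumps.
years = np.array([1968, 1968, 1968, 1970, 1970, 1970,
                  1996, 1997, 1998, 2008, 2010, 2012])
delay = np.array([2.0, 2.5, 3.0, 7.5, 8.0, 8.5,
                  7.0, 7.5, 8.0, 8.0, 8.0, 8.5])

slope_all, _ = np.polyfit(years, delay, 1)          # fit including the 1968 clump
slope_cut, _ = np.polyfit(years[3:], delay[3:], 1)  # fit without it

print(f"slope with 1968 points:    {slope_all * 10:.2f} min/decade")
print(f"slope without 1968 points: {slope_cut * 10:.2f} min/decade")
```

With the three 1968 points included, the fitted slope is a healthy fraction of a minute per decade; drop them and it collapses to essentially zero. The entire “trend” lives in one clump of three early points.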

None of this happens in a vacuum. Let’s step back and apply some external knowledge. Have people changed over the five decades of my life?

The contention of the WaPo article is that they have. Specifically, contrary to the perception that iPhones and video games have created a generation with a cripplingly short attention span (congrats if you made it this far!), in fact the data show the opposite. The ability of children to delay gratification has improved over the time these experiments have been conducted.

What does the claimed trend imply? If we take it literally, then extrapolating back in time, the delay time goes to zero around 1917. People in the past must have been completely incapable of delaying gratification for even an instant. This was a power our species only developed in the past century.
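The back-of-envelope version, using the illustrative clump averages read off the graph above (not the paper’s actual fit), lands in the same neighborhood:

```python
# Extrapolating the claimed trend back in time, using the rough clump
# averages discussed above (~5 min circa 1970, ~8 min circa 2010).
# These are eyeballed numbers, not the published fit.
slope = (8.0 - 5.0) / (2010 - 1970)  # ~0.075 minutes per year
zero_year = 2010 - 8.0 / slope       # year at which the line hits zero delay
print(round(zero_year))              # ~1903: nobody could wait at all
```

The exact zero-crossing year depends on the fit details, but any line this shallow passing through these points hits zero delay around the turn of the last century.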

I hope that sounds implausible. If there is no trend, which is what the data actually show, then children a half century ago were much the same as children a generation ago are much the same as the children of today. So the more conservative interpretation of the graph would be that human nature is rather invariant, at least as indicated by the measure of delayed gratification in children.

Sadly, null results are dull. There well may be a published study reporting no trend, but it doesn’t get picked up by the Washington Post. Imagine the headline: “Children today are much the same as they’ve always been!” Who’s gonna click on that? In this fashion, even reputable news sources contribute to the scourge of misleading science and fake news that currently pollutes our public discourse.

[Image: the Ghostbusters at Columbia]
They expect results!

This sort of over-interpretation of weak trends is rife in many fields. My own, for example. This is why I’m good at spotting them. Fortunately, screwing up in Astronomy seldom threatens life and limb.

Then there is Medicine. My mother was a medical librarian; I occasionally browsed their journals when waiting for her at work. Graphs for the efficacy of treatments that looked like the marshmallow graph were very common. Which is to say, no effect was in evidence, but it was often portrayed as a positive trend. They seem to be getting better lately (which is to say, at some point in the not distant past some medical researchers were exposed to basic statistics), but there is an obvious pressure to provide a treatment, even if the effect of the available course of treatment is tiny. Couple that to the aggressive marketing of drugs in the US, and it would not surprise me if many drugs have been prescribed based on efficacy trends weaker than seen in the marshmallow graph. See! There is a line with a positive slope! It must be doing some good!

Another problem with data interpretation is in the corrections applied. In the case of marshmallows, one must correct for the age of the subject: an eight year old can usually hold out longer than a toddler. No doubt there are other corrections. The way these are usually made is to fit some sort of function to whatever trend is seen with age in a particular experiment. While that trend may be real, it also has scatter (I’ve known eight year olds who couldn’t outwait a toddler), which makes it dodgy to apply. Do all experiments see the same trend? Is it safe to apply the same correction to all of them? Worse, it is often necessary to extrapolate these corrections beyond where they are constrained by data. This is known to be dangerous, as the correction can become overlarge upon extrapolation.
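A synthetic toy illustrates the danger. Suppose the true age trend saturates (everyone tops out eventually), but we only have data over a narrow age range and fit a straight line to it. The saturating curve and all its numbers below are made up for illustration:

```python
import numpy as np

# Made-up "true" age trend: delay saturates toward 10 minutes.
def true_delay(age):
    return 10.0 * (1.0 - np.exp(-age / 3.0))

# Fit a linear age correction over the range where we imagine having data...
ages = np.array([3.0, 4.0, 5.0, 6.0])
coeffs = np.polyfit(ages, true_delay(ages), 1)

# ...then apply it well outside that range.
extrapolated = np.polyval(coeffs, 12.0)
actual = true_delay(12.0)
print(f"extrapolated correction: {extrapolated:.1f} min, truth: {actual:.1f} min")
```

The extrapolated line sails right past the asymptote that the true curve can never exceed. A correction that is perfectly reasonable where it is constrained by data becomes overlarge the moment you extrapolate it.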

It would not surprise me if the abnormally low points around 1968 were over-corrected in some way. But then, it was the sixties. Children may not have changed much since then, but the practice of psychology certainly has. Let’s consider the implications that has for comparing 1968 data to 2017 data.

The sixties were a good time for psychological research. The field had grown enormously since the time of Freud and was widely respected. However, this was also the time when many experimental psychologists thought psychotropic drugs were a good idea. Influential people praised the virtues of LSD.

My father was a grad student in psychology in the sixties. He worked with swans. One group of hatchlings imprinted on him. When they grew up, they thought they should mate with people – that’s what their mom looked like, after all. So they’d make aggressive displays towards any person (they could not distinguish human gender) who ventured too close.

He related the anecdote of a colleague who became interested in the effect of LSD on animals. The field was so respected at the time that this chap was able to talk the local zoo into letting him inject an elephant with LSD. What could go wrong?

Perhaps you’ve heard the expression “That would have killed a horse! Fortunately, you’re not a horse.” Well, the fellow in question figured elephants were a lot bigger than people. So he scaled up the dose by the ratio of body mass. Not, say, the ratio of brain size, or whatever aspect of the metabolism deals with LSD.

That’s enough LSD to kill an elephant.

Sad as that was for the elephant, who is reputed to have been struck dead pretty much instantly – no tripping rampage preceded its demise – my point here is that these were the same people conducting the experiments in 1968. Standards were a little different. The difference seen in the graph may have more to do with differences in the field than with differences in the subjects.

That is not to say we should simply disregard old data. The date on which an observation is made has no bearing on its reliability. The practice of the field at that time does.

The 1968 delay times are absurdly low. All three are under four minutes. Such low delay times are not reproduced in any of the subsequent experiments. They would be more credible if the same result were even occasionally reproduced. It ain’t.

Another way to look at this is that there should be a comparable number of outliers on either side of the correct trend. That isn’t necessarily true – sometimes systematic errors push in a single direction – but in the absence of knowledge of such effects, one would expect outliers on both the high side and the low side.

In the marshmallow graph, with the trend as drawn, there are lots of outliers on the high side. There are none on the low side. [By outlier, I mean points well away from the trend, not just scattered a little to one side or the other.]

If instead we draw a flat line at 7 or 8 minutes, then there are three outliers on both sides. The three very high points, and the three very low points, which happen to occur around 1968. It is entirely because the three outliers on the low side happen at the earliest time that we get even the hint of a trend. Spread them out, and they would immediately be dismissed as outliers – which is probably what they are. Without them, there is no significant trend. This would be the more conservative interpretation of the marshmallow graph.

Perhaps those kids in 1968 were different in other ways. The experiments were presumably conducted in psychology departments on university campuses in the late sixties. It was OK to smoke inside back then, and not everybody restricted themselves to tobacco in those days. Who knows how much second-hand marijuana smoke was inhaled just getting to the test site? I jest, but the 1968 numbers might just measure the impact on delayed gratification when the subject gets the munchies.

[Image: Ancient Aliens meme]
Marshmallows.


Ain’t no cusps here

It has been twenty years since we coined the phrase NFW halo to describe the cuspy halos that emerge from dark matter simulations of structure formation. Since that time, observations have persistently contradicted this fundamental prediction of the cold dark matter cosmogony. There have, of course, been some theorists who cling to the false hope that somehow it is the data to blame and not a shortcoming of the model.

That this false hope has persisted in some corners for so long is a tribute to the power of ideas over facts and the influence that strident personalities wield over the sort of objective evaluation we allegedly value in science. This history is a bit like this skit by Arsenio Hall. Hall is pestered by someone calling, demanding Thelma. Just substitute “cusps” for “Thelma” and that pretty much sums it up.

All during this time, I have never questioned the results of the simulations. While it is a logical possibility that they screwed something up, I don’t think that is likely. Moreover, it is inappropriate to pour derision on one’s scientific colleagues just because you disagree. Such disagreements are part and parcel of the scientific method. We don’t need to be jerks about it.

But some people are jerks about it. There are some – and merely some, certainly not all – theorists who make a habit of pouring scorn on the data for not showing what they want it to show. And that’s what it really boils down to. They’re so sure that their models are right that any disagreement with data must be the fault of the data.

This has been going on for so long that already in 1996 George Efstathiou was making light of this tendency in his colleagues, in the form of the Frenk Principle:

“If the Cold Dark Matter Model does not agree with observations, there must be physical processes, no matter how bizarre or unlikely, that can explain the discrepancy.”

There are even different flavors of the Strong Frenk Principle:

1: “The physical processes must be the most bizarre and unlikely.”
2: “If we are incapable of finding any physical processes to explain the discrepancy between CDM models and observations, then observations are wrong.”

In the late ’90s, blame was frequently placed on beam smearing. The resolution of 21 cm data cubes at that time was typically 13 to 30 arcseconds, which made it challenging to resolve the shape of some rotation curves. Some but not all. Nevertheless, beam smearing became the default excuse to pretend the observations were wrong.

This persisted for a number of years, until we obtained better data – long slit optical spectra with 1 or 2 arcsecond resolution. These data did turn up a few cases where beam smearing had been a legitimate concern. They also confirmed the rotation curves of many other galaxies where it had not been.

So they made up a different systematic error. Beam smearing was no longer an issue, but longslit data only gave a slice along the major axis, not the whole velocity field. So it was imagined that we observers had placed the slits in the wrong place, thereby missing the signature of the cusps.

This was obviously wrong from the start. It boiled down to an assertion that Vera Rubin didn’t know how to measure rotation curves. If that were true, we wouldn’t have dark matter in the first place. The real lesson of this episode was to never underestimate the power of cognitive dissonance. People believed one thing about the data quality when it agreed with their preconceptions (rotation curves prove dark matter!) and another when it didn’t (rotation curves don’t constrain cusps!).

[Image: what we say to theorists]

So, back to the telescope. Now we obtained 2D velocity fields at optical resolution (a few arcseconds). When you do this, there is nowhere for a cusp to hide. Such a dense concentration makes a pronounced mark on the velocity field.

[Figure: model velocity fields]
Velocity fields of the inner parts of zero stellar mass disks embedded in an NFW halo (left panel) and a pseudo-isothermal (ISO) halo (right panel). The velocity field is seen at an inclination of 60° and a position angle of 90°. The boxes measure 5 × 5 kpc². The vertical minor-axis contour is 0 km s−1, increasing in steps of 10 km s−1 outwards. The NFW halo parameters are c = 8.6 and V200 = 100 km s−1; the ISO parameters are RC = 1 kpc and V = 100 km s−1. From de Blok et al. 2003, MNRAS, 340, 657 (Fig. 3).

To give a real world example (O’Neil et al. 2000; yes, we could already do this in the previous millennium), here is a galaxy with a cusp and one without:

[Figure: observed velocity fields of UGC 12687 and UGC 12695]
The velocity field of UGC 12687, which shows the signature of a cusp (left), and UGC 12695, which does not (right). Both galaxies are observed in the same 21 cm cube with the same sensitivity, same resolution, etc.

It is easy to see the signature of a cusp in a 2D velocity field. You can’t miss it. It stands out like a sore thumb.
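Why is it so obvious? The two halos predict very different inner rotation speeds. A minimal sketch, using the halo parameters quoted in the model figure caption above (NFW: c = 8.6, V200 = 100 km/s; ISO: Rc = 1 kpc, asymptotic V = 100 km/s) and the standard approximate scaling R200 ≈ V200/h kpc for h ≈ 0.7:

```python
import numpy as np

# Circular velocity of an NFW halo: V^2(r) = V200^2 * mu(c*x) / (x * mu(c)),
# with x = r/R200 and mu(y) = ln(1+y) - y/(1+y).
def v_nfw(r_kpc, c=8.6, v200=100.0, h=0.7):
    r200 = v200 / h                       # ~143 kpc for these parameters
    x = r_kpc / r200
    mu = lambda y: np.log(1 + y) - y / (1 + y)
    return v200 * np.sqrt(mu(c * x) / (x * mu(c)))

# Circular velocity of a pseudo-isothermal halo:
# V^2(r) = Vinf^2 * [1 - (Rc/r) * arctan(r/Rc)].
def v_iso(r_kpc, rc=1.0, vinf=100.0):
    return vinf * np.sqrt(1 - (rc / r_kpc) * np.arctan(r_kpc / rc))

r = 0.2  # kpc: deep in the inner region, where the cusp lives
print(f"V(0.2 kpc): NFW {v_nfw(r):.0f} km/s, ISO {v_iso(r):.0f} km/s")
```

At small radii the NFW velocity rises as the square root of radius while the cored ISO halo rises linearly (solid body), so the cusp carries substantially higher velocities in the inner kiloparsec. That steep inner rise is exactly the pinch in the iso-velocity contours that you can’t miss in a 2D velocity field.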

The absence of cusps is typical of dwarf and low surface brightness galaxies. In the vast majority of these, we see approximately solid body rotation, as in UGC 12695. This is incredibly reproducible. See, for example, the case of UGC 4325 (Fig. 3 of Bosma 2004), where six independent observations employing three distinct observational techniques all obtain the same result.

There are cases where we do see a cusp. These are inevitably associated with a dense concentration of stars, like a bulge component. There is no need to invoke dark matter cusps when the luminous matter makes the same prediction. Worse, it becomes ambiguous: you can certainly fit a cuspy halo by reducing the fractional contribution of the stars. But this only succeeds by having the dark matter mimic the light distribution. Maybe such galaxies do have cuspy halos, but the data do not require it.

All this was settled a decade ago. Most of the field has moved on, with many theorists trying to simulate the effects of baryonic feedback. An emerging consensus is that such feedback can transform cusps into cores on scales that matter to real galaxies. The problem then moves to finding observational tests of feedback: does it work in the real universe as it must do in the simulations in order to get the “right” result?

Not everyone has kept up with the times. A recent preprint tries to spin the story that non-circular motions make it hard to obtain the true circular velocity curve, and therefore we can still get away with cusps. Like all good misinformation, there is a grain of truth to this. It can indeed be challenging to get the precisely correct 1D rotation curve V(R) in a way that properly accounts for non-circular motions. Challenging but not impossible. Some of the most intense arguments I’ve had have been over how to do this right. But these were arguments among perfectionists about details. We agreed on the basic result.

[Image: Arsenio Hall]
There ain’t no cusp here!

High quality data paint a clear and compelling picture. The data show an incredible amount of order in the form of Renzo’s rule, the Baryonic Tully-Fisher relation, and the Radial Acceleration Relation. Such order cannot emerge from a series of systematic errors. Models that fail to reproduce these observed relations can be immediately dismissed as incorrect.

The high degree of order in the data has been known for decades, and yet many modeling papers simply ignore these inconvenient facts. Perhaps the authors of such papers are simply unaware of them. Worse, some seem to be fooling themselves through the liberal application of the Frenk Principle. This places a notional belief system (dark matter halos must have cusps) above observational reality. This attitude has more in common with religious faith than with the scientific method.

Dwarf Galaxies on the Shoulders of Giants

The week of June 5, 2017, we held a workshop on dwarf galaxies and the dark matter problem. The workshop was attended by many leaders in the field – giants of dwarf galaxy research. It was held on the campus of Case Western Reserve University and supported by the John Templeton Foundation. It resulted in many fascinating discussions which I can’t possibly begin to share in full here, but I’ll say a few words.

Dwarf galaxies are among the most dark matter dominated objects in the universe. Or, stated more properly, they exhibit the largest mass discrepancies. This makes them great places to test theories of dark matter and modified gravity. By the end, we had come up with a few important tests for both ΛCDM and MOND. A few of these we managed to put on a white board. These are hardly a complete list, but provide a basis for discussion.

First, ΛCDM.

[Image: whiteboard from the workshop]
A few issues for ΛCDM identified during the workshop.

UFDs in field: Over the past few years, a number of extremely tiny dwarf galaxies have been identified as satellites of the Milky Way galaxy. These “ultrafaint dwarfs” are vaguely defined as being fainter than 100,000 solar luminosities, with the smallest examples having only a few hundred stars. This is absurdly small by galactic standards, having the stellar content of individual star clusters within the Milky Way. Indeed, it is not obvious to me that all of the ultrafaint dwarfs deserve to be recognized as dwarf galaxies, as some may merely be fragmentary portions of the Galactic stellar halo composed of stars coincident in phase space. Nevertheless, many may well be stellar systems external to the Milky Way that orbit it as dwarf satellites.

That multitudes of minuscule dark matter halos exist is a fundamental prediction of the ΛCDM cosmogony. These should often contain ultrafaint dwarf galaxies, and not only as satellites of giant galaxies like the Milky Way. Indeed, one expects to see many ultrafaints in the “field” beyond the orbital vicinity of the Milky Way where we have found them so far. These are predicted to exist in great numbers, and contain uniformly old stars. The “old stars” portion of the prediction stems from the reionization of the universe impeding star formation in the smallest dark matter halos. Upcoming surveys like LSST should provide a test of this prediction.

From an empirical perspective, I do expect that we will continue to discover galaxies of ever lower luminosity and surface brightness. In the field, I expect that these will be predominantly gas rich dwarfs like Leo P rather than gas-free, old stellar systems like the satellite ultrafaints. My expectation is an extrapolation of past experience, not a theory-specific prediction.

No Large Cores: Many of the simulators present at the workshop showed that if the energy released by supernovae was well directed, it could reshape the steep (‘cuspy’) interior density profiles of dark matter halos into something more like the shallow (‘cored’) interiors that are favored by data. I highlight the if because I remain skeptical that supernova energy couples as strongly as required and assumed (basically 100%). Even assuming favorable feedback, there seemed to be broad (if not unanimous) consensus among the simulators present that at sufficiently low masses, not enough stars would form to produce the requisite energy. Consequently, low mass halos should not have shallow cores, but instead retain their primordial density cusps. Hence a clear measurement of a large core in a low mass dwarf galaxy (stellar mass < 1 million solar masses) would be a serious problem. Unfortunately, I’m not clear that we quantified “large,” but something more than a few hundred parsecs should qualify.

Radial Orbit for Crater 2: Several speakers highlighted the importance of the recently discovered dwarf satellite Crater 2. This object has a velocity dispersion that is unexpectedly low in ΛCDM, but was predicted by MOND. The “fix” in ΛCDM is to imagine that Crater 2 has suffered a large amount of tidal stripping by a close passage of the Milky Way. Hence it is predicted to be on a radial orbit (one that basically just plunges in and out). This can be tested by measuring the proper motion of its stars with Hubble Space Telescope, for which there exists a recently approved program.

DM Substructures: As noted above, there must exist numerous low mass dark matter halos in the cold dark matter cosmogony. These may be detected as substructure in the halos of larger galaxies by means of their gravitational lensing even if they do not contain dwarf galaxies. Basically, a lumpy dark matter halo bends light in subtly but detectably different ways from a smooth halo.

No Wide Binaries in UFDs: As a consequence of dynamical friction against the background dark matter, binary stars cannot remain at large separations over a Hubble time: their orbits should decay. In the absence of dark matter, this should not happen (it cannot if there is nowhere for the orbital energy to go, like into dark matter particles). Thus the detection of a population of widely separated binary stars would be problematic. Indeed, Pavel Kroupa argued that the apparent absence of strong dynamical friction already excludes particle dark matter as it is usually imagined.

Short dynamical times/common mergers: This is related to dynamical friction. In the hierarchical cosmogony of cold dark matter, mergers of halos (and the galaxies they contain) must be frequent and rapid. Dark matter halos are dynamically sticky, soaking up the orbital energy and angular momentum between colliding galaxies to allow them to stick and merge. Such mergers should go to completion on fairly short timescales (a mere few hundred million years).

MOND

A few distinctive predictions for MOND were also identified.

[Image: whiteboard listing a few distinctive MOND predictions]

Tangential Orbit for Crater 2: In contrast to ΛCDM, we expect that the ‘feeble giant’ Crater 2 could not survive a close encounter with the Milky Way. Even at its rather large distance of 120 kpc from the Milky Way, it is so feeble that it is not immune from the external field of its giant host. Consequently, we expect that Crater 2 must be on a more nearly circular orbit, not on a radial orbit as suggested in ΛCDM. The orbit does not need to be perfectly circular, of course, but it should be more tangential than radial.

This provides a nice test that distinguishes between the two theories. Either the orbit of Crater 2 is more radial or more tangential. Bear in mind that Crater 2 already constitutes a problem for ΛCDM. What we’re discussing here is how to close what is basically a loophole whereby we can excuse an otherwise unanticipated result in ΛCDM.

EFE: The External Field Effect is a unique prediction of MOND that breaks the strong equivalence principle. There is already clear if tentative evidence for the EFE in the dwarf satellite galaxies around Andromeda. There is no equivalent to the EFE in ΛCDM.
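To see when the EFE matters, compare a dwarf’s characteristic internal acceleration to the external field imposed by its host. The numbers below are assumed, representative values for a generic faint satellite – not measurements of any particular dwarf:

```python
# Rough internal vs external acceleration for a hypothetical dwarf satellite.
# All inputs are assumed, representative values: velocity dispersion ~3 km/s,
# half-light radius ~30 pc, host flat rotation speed ~200 km/s at ~50 kpc.
PC = 3.086e16            # meters per parsec

sigma = 3e3              # internal velocity dispersion [m/s]
r_half = 30 * PC         # half-light radius [m]
v_host = 2e5             # host circular speed [m/s]
d_host = 5e4 * PC        # distance to host [m]
a0 = 1.2e-10             # MOND acceleration scale [m/s^2]

g_int = sigma**2 / r_half    # characteristic internal acceleration
g_ext = v_host**2 / d_host   # external field from the host

print(f"g_int ~ {g_int:.1e}, g_ext ~ {g_ext:.1e}, a0 = {a0:.1e} m/s^2")
```

For numbers like these, both accelerations sit well below a0 and the external field exceeds the internal one, so the host’s field, rather than the dwarf’s own gravity, sets the internal dynamics. That is the regime in which the EFE prediction bites.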

I believe the question mark was added on the white board to permit the logical if unlikely possibility that one could write a MOND theory with an undetectably small EFE.

Position of UFDs on RAR: We chose to avoid making the radial acceleration relation (RAR) a focus of the meeting – there was quite enough to talk about as it was – but it certainly came up. The ultrafaint dwarfs sit “too high” on the RAR, an apparent problem for MOND. Indeed, when I first worked on this subject with Joe Wolf, I initially thought this was a fatal problem for MOND.

My initial thought was wrong. This is not a problem for MOND. The RAR applies to systems in dynamical equilibrium. There is a criterion in MOND to check whether this essential condition may be satisfied. Basically all of the ultrafaints flunk this test. There is no reason to think they are in dynamical equilibrium, so no reason to expect that they should be exactly on the RAR.

Some advocates of ΛCDM seemed to think this was a fudge, a lame excuse morally equivalent to the fudges made in ΛCDM that its critics complain about. This is a false equivalency that reminds me of this cartoon:

[Image: Bugs Bunny cartoon]
I dare ya to step over this line!

The ultrafaints are a handful of the least-well measured galaxies on the RAR. Before we obsess about these, it is necessary to provide a satisfactory explanation for the more numerous, much better measured galaxies that establish the RAR in the first place. MOND does this. ΛCDM does not. Holding one theory to account for the least reliable of measurements before holding another to account for everything up to that point is like, well, like the cartoon… I could put an NGC number to each of the lines Bugs draws in the sand.

Long dynamical times/less common mergers: Unlike ΛCDM, dynamical friction should be relatively ineffective in MOND. It lacks the large halos of dark matter that act as invisible catchers’ mitts to make galaxies stick and merge. Personally, I do not think this is a great test, because we are a long way from understanding dynamical friction in MOND.

Non-evolution with redshift: If the Baryonic Tully-Fisher relation and the RAR are indeed the consequence of MOND, then their form is fixed by the theory. Consequently, their slope shouldn’t evolve with time. Conceivably their normalization might (e.g., the value of a0 could in principle evolve). Some recent data for high redshift galaxies place constraints on such evolution, but reports on these data are greatly exaggerated.

These are just a few of the topics discussed at the workshop, and all of those are only a few of the issues that matter to the bigger picture. While the workshop was great in every respect, perhaps the best thing was that it got people from different fields/camps/perspectives talking. That is progress.

I am grateful for progress, but I must confess that to me it feels excruciatingly slow. Models of galaxy formation in the context of ΛCDM have made credible steps forward in addressing some of the phenomenological issues that concern me. Yet they still seem to me to be very far from where they need to be. In particular, there seems to be no engagement with the fundamental question I have posed here before, and that I posed at the beginning of the workshop: Why does MOND get any predictions right?

Solution Aversion

Solution Aversion

I have had the misfortune to encounter many terms for psychological dysfunction in many venues. Cognitive dissonance, confirmation bias, the Dunning-Kruger effect – I have witnessed them all, all too often, both in the context of science and elsewhere. Those of us who are trained as scientists are still human: though we fancy ourselves immune, we are still subject to the same cognitive foibles as everyone else. Generally our training only suffices to guard against the most oft-repeated of them.

Solution aversion is the knee-jerk reaction we have to deny the legitimacy of a problem when we don’t like the solution admitting said problem would entail. An obvious example in the modern era is climate change. People who deny the existence of this problem are usually averse to its solution.

Let me give an example from my own experience. To give some context requires some circuitous story-telling. We’ll start with climate change, but eventually get to cosmology.

Recently I encountered a lot of yakking on social media about an encounter between Bill Nye (the science guy) and Will Happer in a dispute about climate change. The basic gist of most of the posts was that of people (mostly scientists, mostly young enough to have watched Bill Nye growing up) cheering on Nye as he “eviscerated” Happer’s denialism. I did not watch any of the exchange, so I cannot evaluate the relative merits of their arguments. However, there is a more important issue at stake here: credibility.

Bill Nye has done wonderful work promoting science. Younger scientists often seem to revere him as a sort of Mr. Rogers of science. Which is great. But he is a science-themed entertainer, not an actual scientist. His show demonstrates basic, well known phenomena at a really, well, juvenile level. That’s a good thing – it clearly helped motivate a lot of talented people to become scientists. But recapitulating well-known results is very different from doing the cutting edge science that establishes new results that will become the fodder of future textbooks.

Will Happer is a serious scientist. He has made numerous fundamental contributions to physics. For example, he pointed out that the sodium layer in the upper atmosphere could be excited by a laser to create artificial guide stars for adaptive optics, enabling ground-based telescopes to achieve resolutions comparable to that of the Hubble space telescope. I suspect his work for the JASON advisory group led to the implementation of adaptive optics on Air Force telescopes long before us astronomers were doing it. (This is speculation on my part: I wouldn’t know; it’s classified.)

My point is that, contrary to the wishful thinking on social media, Nye has no more standing to debate Happer than Mickey Mouse has to debate Einstein. Nye, like Mickey Mouse, is an entertainer. Einstein is a scientist. If you think that comparison is extreme, that’s because there aren’t many famous scientists whose name I can expect everyone to know. A better analogy might be comparing Jon Hirschtick (a successful mechanical engineer, Nye’s field) to I.I. Rabi (a prominent atomic physicist like Happer), but you’re less likely to know who those people are. Most serious scientists do not cultivate public fame, and the modern examples I can think of all gave up doing real science for the limelight of their roles as science entertainers.

Another important contribution Happer made was to the study and technology of spin polarized nuclei. If you place an alkali element and a noble gas together in vapor, they may form weak van der Waals molecules. An alkali is basically a noble gas with a spare electron, so the two can become loosely bound, sharing the unwanted electron between them. It turns out – as Happer found and explained – that the wavefunction of the spare electron overlaps with the nucleus of the noble gas. By spin polarizing the electron through the well-known process of optical pumping with a laser, it is possible to transfer the spin polarization to the nucleus. In this way, one can create large quantities of polarized nuclei, an amazing feat. This has found use in medical imaging technology. Noble gases are chemically inert, so safe to inhale. By doing so, one can light up lung tissue that is otherwise invisible to MRI and other imaging technologies.

I know this because I worked on it with Happer in the mid-80s. I was a first year graduate student in physics at Princeton where he was a professor. I did not appreciate the importance of what we were doing at the time. Will was a nice guy, but he was also my boss and though I respected him I did not much like him. I was a high-strung, highly stressed, 21 year old graduate student displaced from friends and familiar settings, so he may not have liked me much, or simply despaired of me amounting to anything. Mostly I blame the toxic arrogance of the physics department we were both in – Princeton is very much the Slytherin of science schools.

In this environment, there weren’t many opportunities for unguarded conversations. I do vividly recall some of the few that happened. In one instance, we had heard a talk about the potential for industrial activity to add enough carbon dioxide to the atmosphere to cause an imbalance in the climate. This was 1986, and it was the first I had heard of what is now commonly referred to as climate change. I was skeptical, and asked Will’s opinion. I was surprised by the sudden vehemence of his reaction:

“We can’t turn off the wheels of industry, and go back to living like cavemen.”

I hadn’t suggested any such thing. I don’t even recall expressing support for the speaker’s contention. In retrospect, this is a crystal clear example of solution aversion in action. Will is a brilliant guy. He leapt ahead of the problem at hand to see the solution being a future he did not want. Rejecting that unacceptable solution became intimately tied, psychologically, to the problem itself. This attitude has persisted to the present day, and Happer is now known as one of the most prominent scientists who is also a climate change denier.

Being brilliant never makes us foolproof against being wrong. If anything, it sets us up for making mistakes of enormous magnitude.

There is a difference between the problem and the solution. Before we debate the solution, we must first agree on the problem. That should, ideally, be done dispassionately and without reference to the solutions that might stem from it. Only after we agree on the problem can we hope to find a fitting solution.

In the case of climate change, it might be that we decide the problem is not so large as to require drastic action. Or we might hope that we can gradually wean ourselves away from fossil fuels. That is easier said than done, as many people do not seem to appreciate the magnitude of the energy budget that needs replacing. But does that mean we shouldn't even try? That seems to be the psychological result of solution aversion.

Either way, we have to agree and accept that there is a problem before we can legitimately decide what to do about it. Which brings me back to cosmology. I did promise you a circuitous bit of story-telling.

Happer’s is just the first example I encountered of a brilliant person coming to a dubious conclusion because of solution aversion. I have had many colleagues who work on cosmology and galaxy formation say straight out to me that they would only consider MOND “as a last resort.” This is a glaring, if understandable, example of solution aversion. We don’t like MOND, so we’re only willing to consider it when all other options have failed.

I hope it is obvious from the above that this attitude is not a healthy one in science. In cosmology, it is doubly bad. Just when, exactly, do we reach the last resort?

We’ve already accepted that the universe is full of dark matter, some invisible form of mass that interacts gravitationally but not otherwise, has no place in the ridiculously well tested Standard Model of particle physics, and has yet to leave a single shred of credible evidence in dozens of super-sensitive laboratory experiments. On top of that, we’ve accepted that there is also a distinct dark energy that acts like antigravity to drive the apparent acceleration of the expansion rate of the universe, conserving energy by the magic trick of a sign error in the equation of state that any earlier generation of physicists would have immediately rejected as obviously unphysical. In accepting these dark denizens of cosmology we have granted ourselves essentially infinite freedom to fine-tune any solution that strikes our fancy. Just what could possibly constitute the last resort of that?

When you have a supercomputer, every problem looks like a simulation in need of more parameters.

Being a brilliant scientist never precludes one from being wrong. At best, it lengthens the odds. All too often, it leads to a dangerous hubris: we’re so convinced by, and enamored of, our elaborate and beautiful theories that we see only the successes and turn a blind eye to the failures, or in true partisan fashion, try to paint them as successes. We can’t have a sensible discussion about what might be right until we’re willing to admit – seriously, deep-down-in-our-souls admit – that maybe ΛCDM is wrong.

I fear the field has gone beyond that, and is fissioning into multiple, distinct branches of science that use the same words to mean different things. Already “dark matter” means something different to particle physicists and astronomers, though they don’t usually realize it. Soon our languages may become unrecognizable dialects to one another; already communication across disciplinary boundaries is strained. I think Kuhn noted something about different scientists not recognizing what other scientists were doing as science, nor regarding the same evidence in the same way. Certainly we’ve got that far already, as successful predictions of the “other” theory are dismissed as so much fake news in a world unhinged from reality.

Degenerating problemshift: a wedged paradigm in great tightness

Degenerating problemshift: a wedged paradigm in great tightness

Reading Merritt’s paper on the philosophy of cosmology, I was struck by a particular quote from Lakatos:

A research programme is said to be progressing as long as its theoretical growth anticipates its empirical growth, that is as long as it keeps predicting novel facts with some success (“progressive problemshift”); it is stagnating if its theoretical growth lags behind its empirical growth, that is as long as it gives only post-hoc explanations either of chance discoveries or of facts anticipated by, and discovered in, a rival programme (“degenerating problemshift”) (Lakatos, 1971, pp. 104–105).

The recent history of modern cosmology is rife with post-hoc explanations of unanticipated facts. The cusp-core problem and the missing satellites problem are prominent examples. These are explained after the fact by invoking feedback, a vague catch-all that many people agree solves these problems even though none of them agree on how it actually works.

Cartoon of the feedback explanation for the difference between the galaxy luminosity function (blue line) and the halo mass function (red line). From Silk & Mamon (2012).

There are plenty of other problems. To name just a few: satellite planes (unanticipated correlations in phase space), the emptiness of voids, and the early formation of structure (see section 4 of Famaey & McGaugh for a longer list and section 6 of Silk & Mamon for a positive spin on our list). Each problem is dealt with in a piecemeal fashion, often by invoking solutions that contradict each other while buggering the principle of parsimony.

It goes like this. A new observation is made that does not align with the concordance cosmology. Hands are wrung. Debate is had. Serious concern is expressed. A solution is put forward. Sometimes it is reasonable, sometimes it is not. In either case it is rapidly accepted so long as it saves the paradigm and prevents the need for serious thought. (“Oh, feedback does that.”) The observation is no longer considered a problem through familiarity and exhaustion of patience with the debate, regardless of how [un]satisfactory the proffered solution is. The details of the solution are generally forgotten (if ever learned). When the next problem appears the process repeats, with the new solution often contradicting the now-forgotten solution to the previous problem.

This has been going on for so long that many junior scientists now seem to think this is how science is supposed to work. It is all they've experienced. And despite our claims to be interested in fundamental issues, most of us are impatient with re-examining issues that were thought to be settled. All it takes is one bold assertion that everything is OK, and the problem is perceived to be solved whether it actually is or not.

“Is there any more?”

That is the process we apply to little problems. The Big Problems remain the post hoc elements of dark matter and dark energy. These are things we made up to explain unanticipated phenomena. That we need to invoke them immediately casts the paradigm into what Lakatos called degenerating problemshift. Once we’re there, it is hard to see how to get out, given our propensity to overindulge in the honey that is the infinity of free parameters in dark matter models.

Note that there is another aspect to what Lakatos said about facts anticipated by, and discovered in, a rival programme. Two examples spring immediately to mind: the Baryonic Tully-Fisher Relation and the Radial Acceleration Relation. These are predictions of MOND that were unanticipated in the conventional dark matter picture. Perhaps we can come up with post hoc explanations for them, but that is exactly what Lakatos would describe as degenerating problemshift. The rival programme beat us to it.

In my experience, this is a good description of what is going on. The field of dark matter has stagnated. Experimenters look harder and harder for the same thing, repeating the same experiments in hope of a different result. Theorists turn knobs on elaborate models, gifting themselves new free parameters every time they get stuck.

On the flip side, MOND keeps predicting novel facts with some success, so it remains in the stage of progressive problemshift. Unfortunately, MOND remains incomplete as a theory, and doesn’t address many basic issues in cosmology. This is a different kind of unsatisfactory.

In the mean time, I’m still waiting to hear a satisfactory answer to the question I’ve been posing for over two decades now. Why does MOND get any predictions right? It has had many a priori predictions come true. Why does this happen? It shouldn’t. Ever.

Cepheids & Gaia: No Systematic in the Hubble Constant

Cepheids & Gaia: No Systematic in the Hubble Constant

Casertano et al. have used Gaia to provide a small but important update in the debate over the value of the Hubble Constant. The ESA Gaia mission is measuring parallaxes for billions of stars. This is fundamental data that will advance astronomy in many ways, no doubt settling long standing problems but also raising new ones – or complicating existing ones.

Traditional measurements of H0 are built on the distance scale ladder, in which distances to nearby objects are used to bootstrap outwards to more distant ones. This works, but is also an invitation to the propagation of error. A mistake in the first step affects all others. This is a long-standing problem that informs the assumption that the tension between H0 = 67 km/s/Mpc from Planck and H0 = 73 km/s/Mpc from local measurements will be resolved by some systematic error – presumably in the calibration of the distance ladder.

Well, not so far. Gaia has now measured enough Cepheids in our own Milky Way to test the calibration used to measure the distances of external galaxies via Cepheids. This was one of the shaky steps where things seemed most likely to go off. But no – the scales are consistent at the 0.3% level. For now, direct measurement of the expansion rate remains H0 = 73 km/s/Mpc.
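The arithmetic behind "a mistake in the first step affects all others" is simple: H0 = v/d, so a common fractional error in the calibrated distances propagates one-for-one into H0. A minimal sketch of why a 0.3% calibration check leaves the tension intact (only the 67, 73, and 0.3% figures come from the text; the rest is illustrative):

```python
# Distance-ladder error propagation: H0 = v / d. If every calibrated
# distance is wrong by a common fraction eps (d -> d * (1 + eps)),
# the inferred H0 shifts by roughly a factor 1 / (1 + eps).

def h0_with_calibration_error(h0_measured, eps):
    """H0 after all distances are revised upward by fraction eps."""
    return h0_measured / (1 + eps)

H0_LOCAL = 73.0   # km/s/Mpc, local distance-ladder value
H0_PLANCK = 67.0  # km/s/Mpc, Planck value

# Gaia shows the Cepheid calibration is consistent at the 0.3% level,
# so the allowed shift in H0 is tiny:
print(h0_with_calibration_error(H0_LOCAL, 0.003))  # ~72.8, nowhere near 67

# How large a calibration error WOULD be needed to reconcile the two?
eps_needed = H0_LOCAL / H0_PLANCK - 1
print(f"{eps_needed:.1%}")  # ~9%, roughly thirty times the Gaia limit
```

The point of the sketch is the mismatch of scales: closing the gap requires all the Cepheid distances to be short by about 9%, while Gaia constrains the calibration to 0.3%.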

Critical Examination of the Impossible

Critical Examination of the Impossible

It has been proposal season for the Hubble Space Telescope, so many astronomers have been busy with that. I am no exception. Talking to others, it is clear that there remain many more excellent Hubble projects than available observing time.

So I haven’t written here for a bit, and I have other tasks to get on with. I did get requests for a report on the last conference I went to, Beyond WIMPs: from Theory to Detection. They have posted video from the talks, so anyone who is interested may watch.

I think this is the worst talk I’ve given in 20 years. Maybe more. Made the classic mistake of trying to give the talk the organizers asked for rather than the one I wanted to give. Conference organizers mean well, but they usually only have a vague idea of what they imagine you’ll say. You should always ignore that and say what you think is important.

When speaking or writing, there are three rules: audience, audience, audience. I was unclear what the audience would be when I wrote the talk, and it turns out there were at least four identifiably distinct audiences in attendance. There were skeptics – particle physicists who were concerned with the state of their field and that of cosmology, there were the faithful – particle physicists who were not in the least concerned about this state of affairs, there were the innocent – grad students with little to no background in astronomy, and there were experts – astroparticle physicists who have a deep but rather narrow knowledge of relevant astronomical data. I don’t think it would have been possible to address the assigned topic (a “Critical Examination of the Existence of Dark Matter”) in a way that satisfied all of these distinct audiences, and certainly not in the time allotted (or even in an entire semester).

It is tempting to give an interruption by interruption breakdown of the sociology, but you may judge that for yourselves. The one thing I got right was what I said at the outset: Attitude Matters. You can see that on display throughout.

This comic has been hanging on a colleague’s door for decades.

In science as in all matters, if you come to a problem sure that you already know the answer, you will leave with that conviction. No data nor argument will shake your faith. Only you can open your own mind.