It is not linear

I just got back from a visit to the Carnegie Institution of Washington where I gave a talk and saw some old friends. I was a postdoc at the Department of Terrestrial Magnetism (DTM) in the ’90s. DTM is so-named because in their early days they literally traveled the world mapping the magnetic field. When I was there, DTM+ had a small extragalactic astronomy group including Vera Rubin*, Francois Schweizer, and John Graham. Working there as a Carnegie Fellow gave me great latitude to pursue whatever science I wanted, with the benefit of discussions with these great astronomers. After my initial work on low surface brightness galaxies had brought MOND to my attention, much of the follow-up work checking all (and I do mean all) the other constraints was done there, ultimately resulting in the triptych of papers showing that the bulk of the evidence available at that time favored MOND over the dark matter interpretation.

When I joined the faculty at the University of Maryland in 1998, I saw the need to develop a graduate course on cosmology, which did not exist there at that time. I began to consider how cosmic structure might form in MOND, but was taken aback when Simon White asked me to referee a paper on the subject by Bob Sanders. He had found much of what I was finding: there was no way to avoid an early burst of speedy galaxy formation. I had been scooped!

It has taken a quarter century to test our predictions, so any concern about who said what first seems silly now. Indeed, the bigger problem is informing people that these predictions were made at all. I had a huge eye roll last month when Physics Magazine came out with

JWST Sees More Galaxies than Expected
NEWS FEATURE, February 9, 2024

The new JWST observatory is revealing far more bright galaxies in the early Universe than anyone predicted, and astrophysicists have more than one explanation for the puzzle.

Physics Magazine

Far more bright galaxies in the early Universe than anyone predicted! Who could have predicted it? I guess I am anyone.

Joking aside, this is a great illustration of the inefficiency of scientific communication. I wrote a series of papers on the subject. I wasn’t alone; so did others. I gave talks about it. I’ve emphasized it in scientific reviews. My papers are frequently cited, ranking among the top 2% across all sciences. They’re cited by prominent cosmologists. Heck, I’ve even blogged about it. And yet, it comes as such a surprise that people assume it couldn’t possibly have happened, to the extent that no one bothers to check what is in the literature. (There was a similar sociology around the prediction of the CMB second peak. It didn’t happen if we don’t look.)

So what did the Physics Magazine article talk about? More than one explanation, most of which are the conventionalist approaches we’ve talked about before – make star formation more efficient, or adjust the IMF (the mass spectrum with which stars form) to squeeze more UV photons out of fewer baryons. But there is also a paper by Sabti et al. that basically asserts “this can’t be happening!” which is exactly the point.

Sabti et al. ask whether one can boost the amplitude of structure formation in a way that satisfies both the new JWST observations and previous Hubble data. The answer is no:

We consider beyond-ΛCDM power-spectrum enhancements and show that any departure large enough to reproduce the abundance of ultramassive JWST candidates is in conflict with the HST data.

Sabti et al.

At first, this struck me as some form of reality denial, like an assertion that the luminosity density could not possibly exceed LCDM predictions, even though that is exactly what it is observed to do:

The integrated UV luminosity density as a function of redshift from Adams et al. (2023). The data exceed the expectation for z > 10, even with the goal posts in motion.

On a closer read, I realized my initial impression was wrong; they are making a much better argument. The star formation rate is what is really constrained by the UV luminosity, but if that is attributed to stellar mass, you can’t get there from here – even with some jiggering of structure formation. That appears to be correct, within the framework of their considerations. Yet an alteration of structure formation is exactly what led to the now-corroborated prediction of Sanders (1998), so something still seemed odd. Just how were they altering it?

It took a close read, but the issue is in their equation 3. They allow for more structure formation by increasing the amplitude. However, they maintain the usual linear growth rate. In effect, they boost the amplitude of the linear dashed line in the left panel below, while maintaining its shape:

The growth rate of structure in CDM (linear, at left) and MOND (nonlinear, at right).

This is strongly constrained at both higher and lower redshifts, so only a little boost in amplitude is possible, assuming linear growth. So what they’ve correctly shown is that the usual linear growth rate of LCDM cannot do what needs to be done. That just emphasizes my point: to get the rapid growth we observe in the narrow time range available above redshift ten, the rate of growth needs to be nonlinear.
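To make the logic explicit (in my own schematic notation, not theirs): if the amplitude is boosted by some factor B while the growth factor D(z) keeps its standard, linear shape, then the density contrast evolves as

\[
\delta(z) = B \, D(z)\, \delta_i \quad \Longrightarrow \quad \frac{\delta(z_{\rm HST})}{\delta(z_{\rm JWST})} = \frac{D(z_{\rm HST})}{D(z_{\rm JWST})},
\]

independent of B. Any boost large enough to produce the ultramassive candidates at z > 10 then overproduces structure at the z ≈ 7–10 epochs already constrained by Hubble, and at lower redshift as well. The only way out is to change the shape of D(z) itself – that is, the growth rate.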

“It’s not linear” – from Star Trek DS9.

Nonlinearity is unavoidable in MOND – hence the prediction of big galaxies at high redshift. Nonlinearity is a bear to calculate, which is part of the reason nobody wants to go there. Tough noogies. They teach us in grad school that the early universe is simple. It is a mantra to many who work in the field. I’m sorry, did God promise this? I understand the reasons why the early universe should be simple in standard FLRW cosmology, but what if the universe we live in isn’t that? No one has standing to promise that the early universe is as simple as expected. That’s just a fairy tale cosmologists tell their young so they can sleep at night.


+DTM has since been merged with the Geophysical Laboratory to become the Earth and Planets Laboratory. These departments shared the Broad Branch Road campus but maintained a friendly rivalry in the soccer Mud Cup, so named because the first Mud Cup was played on a field that was such a quagmire that we all became completely covered in mud. It was great fun.

*Vera was always adamant that she was not a physicist, and yet a search returns a thumbnail that calls her one, even though the Wikipedia article itself does not (at present) make this spurious “and physicist” assertion.

The evolution of the luminosity density

The results from the high redshift universe keep pouring in from JWST. It is a full time job, and then some, just to keep track. One intriguing aspect is the luminosity density of the universe at z > 10. I had not thought this to be problematic for LCDM, as it only depends on the overall number density of stars, not whether they’re in big or small galaxies. I checked this a couple of years ago, and it was fine. At that point we were limited to z < 10, so what about higher redshift?

It helps to have in mind the contrasting predictions of distinct hypotheses, so a quick reminder. LCDM predicts a gradual build up of the dark matter halo mass function that should presumably be tracked by the galaxies within these halos. MOND predicts that galaxies of a wide range of masses form abruptly, including the biggest ones. The big distinction I’ve focused on is the formation epoch of the most massive galaxies. These take a long time to build up in LCDM: typically half a Hubble time (~7 billion years; z < 1) for a giant elliptical to assemble half its final stellar mass. Baryonic mass assembly is considerably more rapid in MOND, so this benchmark can be attained much earlier, even within the first billion years after the Big Bang (z > 5).

In both theories, astrophysics plays a role. How does gas condense into galaxies, and then form into stars? Gravity just tells us when we can assemble the mass, not how it becomes luminous. So the critical question is whether the high redshift galaxies JWST sees are indeed massive. They’re much brighter than had been predicted by LCDM, and in line with the simplest evolutionary models one can build in MOND, so the latter is the more natural interpretation. However, it is much harder to predict how many galaxies form in MOND; it is straightforward to show that they should form fast but much harder to figure out how many do so – i.e., how many baryons get incorporated into collapsed objects, and how many get left behind, stranded in the intergalactic medium? Consequently, the luminosity density – the total number of stars, regardless of what size galaxies they’re in – did not seem like a straight-up test the way the masses of individual galaxies are.

It is not difficult to produce lots of stars at high redshift in LCDM. But those stars should be in many protogalactic fragments, not individually massive galaxies. As a reminder, here is the merger tree for a galaxy that becomes a bright elliptical at low redshift:

Merger tree from De Lucia & Blaizot 2007 showing the hierarchical build-up of massive galaxies from many protogalactic fragments.

At large lookback times, i.e., high redshift, galaxies are small protogalactic fragments that have not yet assembled into a large island universe. This happens much faster in MOND, so we expect that for many (not necessarily all) galaxies, this process is basically complete after a mere billion years or so, often less. In both theories, your mileage will vary: each galaxy will have its own unique formation history. Nevertheless, that’s the basic difference: big galaxies form quickly in MOND while they should still be little chunks at high z in LCDM.

The hierarchical formation of structure is a fundamental prediction of LCDM, so this is in principle a place it can break. That is why many people are following the usual script of blaming astrophysics, i.e., how stars form, not how mass assembles. The latter is fundamental while the former is fungible.

Gradual mass assembly is so fundamental that its failure would break LCDM. Indeed, it is so deeply embedded in the mental framework of people working on it that it doesn’t seem to occur to most of them to consider the possibility that it could work any other way. It simply has to work that way; we were taught so in grad school!

Here is a sketch of how structures grow over time under the influence of cold dark matter (left, from Schramm 1992) and MOND (right, from Sanders & McGaugh 2002; see also this further discussion). The slow linear growth of CDM (long-dashed line, left panel) is replaced by a rapid, nonlinear growth in MOND (solid lines at right; numbers correspond to different scales). Nonlinear growth moderates after cosmic expansion begins to accelerate (dashed vertical line in right panel).

A principal result in perturbation theory applied to density fluctuations in an expanding universe governed by General Relativity is that the growth rate of these proto-objects is proportional to the expansion rate of the universe – hence the linear long-dashed line in the left diagram. The baryons cannot match the observations by themselves because the universe has “only” expanded by a factor of a thousand since recombination while structure has grown by a factor of a hundred thousand. This was one of the primary motivations for inventing cold dark matter in the first place: it can grow at the theory-specified rate without obliterating the observed isotropy% of the microwave background. The skeletal structure of the cosmic web grows in cold dark matter first; the baryons fall in afterwards (short-dashed line in left panel).
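To put numbers on this standard back-of-the-envelope argument: in the matter-dominated era linear perturbations grow in proportion to the scale factor, so

\[
\delta \propto a \quad \Rightarrow \quad \frac{\delta_{\rm now}}{\delta_{\rm rec}} \approx 1 + z_{\rm rec} \approx 1100 .
\]

Baryonic fluctuations starting from the δ ~ 10^-5 level seen in the microwave background would therefore only reach δ ~ 10^-2 today – a couple of orders of magnitude short of the δ ≳ 1 needed for collapsed structures. Cold dark matter, which can start growing before recombination without dragging the photons along, is invoked to make up that shortfall.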

That’s how it works. Without dark matter, structure cannot form, so we needn’t consider MOND nor speak of it ever again forever and ever, amen.

Except, of course, that isn’t necessarily how structure formation works in MOND. Like every other inference of dark matter, the slow growth of perturbations assumes that gravity is normal. If we consider a different force law, then we have to revisit this basic result. Exactly how structure formation works in MOND is not a settled subject, but the panel at right illustrates how I think it might work. One seemingly unavoidable aspect is that MOND is nonlinear, so the growth rate becomes nonlinear at some point, which is rather early on if Milgrom’s constant a0 does not evolve. Rather than needing dark matter to achieve a growth factor of 10^5, the boost to the force law enables baryons to do it on their own. That, in a nutshell, is why MOND predicts the early formation of big galaxies.

The same nonlinearity that makes structure grow fast in MOND also makes it very hard to predict the mass function. My nominal expectation is that the present-day galaxy baryonic mass function is established early and galaxies mostly evolve as closed boxes after that. Not exclusively; mergers still occasionally happen, as might continued gas accretion. In addition to the big galaxies that form their stars rapidly and eventually become giant elliptical galaxies, there will also be a population for which gas accretion is gradual^ enough to settle into a preferred plane and evolve into a spiral galaxy. But that is all gas physics and hand waving; for the mass function I simply don’t know how to extract a prediction from a nonlinear version of the Press-Schechter formalism. Somebody smarter than me should try that.

We do know how to do it for LCDM, at least for the dark matter halos, so there is a testable prediction there. The observable test depends on the messy astrophysics of forming stars and the shape of the mass function. The total luminosity density integrates over the shape, so is a rather forgiving test, as it doesn’t distinguish between stars in lots of tiny galaxies or the same number in a few big ones. Consequently, I hadn’t put much stock in it. But it is also a more robustly measured quantity, so perhaps it is more interesting than I gave it credit for, at least once we get to such high redshift that there should be hardly any stars.

Here is a plot of the ultraviolet (UV) luminosity density from Adams et al. (2023):

Fig. 8 from Adams et al. (2023) showing the integrated UV luminosity density as a function of redshift. UV light is produced by short-lived, massive stars, so makes a good proxy for the star formation rate (right axis).

The lower line is one+ a priori prediction of LCDM. I checked this back when JWST was launched, and saw no issues up to z=10, which remains true. However, the data now available at higher redshift are systematically higher than the prediction. The reason for this is simple, and the same as we’ve discussed before: dark matter halos are just beginning to get big; they don’t have enough baryons in them to make that many stars – at least not for the usual assumptions, or even just from extrapolating what we know quasi-empirically. (I say “quasi” because the extrapolation requires a theory-dependent rate of mass growth.)

The dashed line is what I consider to be a reasonable adjustment of the a priori prediction. Putting on an LCDM hat, it is actually closer to what I would have predicted myself because it has a constant star formation efficiency, which is one of the knobs I prefer to fix empirically and then not touch. With that, everything is good up to z=10.5, maybe even to z=12 if we only believe* the data to within their uncertainties. But the bulk of the high redshift data sit well above the plausible expectation of LCDM, so grasping at the dangling ends of the biggest error bars seems unlikely to save us from a fall.

Ignoring the model lines, the data flatten out at z > 10, which is another way of saying that the UV luminosity function isn’t evolving when it should be. This redshift range does not correspond to much cosmic time, only a few hundred million years, so it makes the empiricist in me uncomfortable to invoke astrophysical causes. We have to imagine that the physical conditions change rapidly in the first sliver of cosmic time at just the right fine-tuned rate to make it look like there is no evolution at all, then settle down into a star formation efficiency that remains constant in perpetuity thereafter.
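To quantify “not much cosmic time,” here is a minimal sketch, assuming a generic flat LCDM background with round parameter values (H0 = 70 km/s/Mpc, Ωm = 0.3 – my illustrative inputs, not numbers from any of the papers discussed), of the age of the universe at these redshifts:

```python
import numpy as np
from scipy.integrate import quad

H0 = 70.0 * 1.0e3 / 3.086e22   # Hubble constant in s^-1 (70 km/s/Mpc)
Om, OL = 0.3, 0.7              # flat LCDM with round parameter values
Gyr = 3.156e16                 # seconds per Gyr

def age_at(z):
    """Cosmic time from the Big Bang to redshift z (in Gyr), flat LCDM."""
    integrand = lambda zp: 1.0 / ((1.0 + zp) * H0 * np.sqrt(Om * (1.0 + zp)**3 + OL))
    t, _ = quad(integrand, z, np.inf)
    return t / Gyr

print(age_at(10.0), age_at(14.0))   # ~0.47 Gyr and ~0.29 Gyr
```

The interval between z = 14 and z = 10 works out to less than 200 million years – the sliver of time over which any astrophysical fix has to do its fine-tuned work.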

Harikane et al. (2023) also come to the conclusion that there is too much star formation going on at high redshift (their Fig. 18 is like that of Adams above, but extending all the way to z=0). Like many, they appear to be unaware that the early onset of structure formation had been predicted, so discuss three conventional astrophysical solutions as if these were the only possibilities. Translating from their section 6, the astrophysical options are:

  • Star formation was more efficient early on
  • Active Galactic Nuclei (AGN)
  • A top heavy IMF

This is a pretty broad view of the things that are being considered currently, though I’m sure people will add to this list as time goes forward and entropy increases.

Taking these in reverse order, the idea of a top heavy IMF is that preferentially more massive stars form early on. These produce more light per unit mass, so one gets brighter galaxies than predicted with a normal IMF. This is an idea that recurs every so often; see, e.g., section 3.1.1 of McGaugh (2004) where I discuss it in the related context of trying to get LCDM models to reionize the universe early enough. Supermassive Population III stars were all the rage back then. Changing the mass spectrum& with which stars form is one of those uber-free parameters that good modelers refrain from twiddling because it gives too much freedom. It is not a single knob so much as a Pandora’s box full of knobs that invoke a thousand Salpeter’s demons to do nearly anything at the price of understanding nothing.

As it happens, the option of a grossly variable IMF is already disfavored by the existence of quenched galaxies at z~3 that formed a normal stellar population at much higher redshift (z~11). These galaxies are composed of stars that have the spectral signatures appropriate for a population that formed with a normal IMF and evolved as stars do. This is exactly what we expect for galaxies that form early and evolve passively. Adjusting the IMF to explain the obvious makes a mockery of Occam’s razor.

AGN is a catchall term for objects like quasars that are powered by supermassive black holes at the centers of galaxies. This is a light source that is non-stellar, so we’ll overestimate the stellar mass if we mistake some light from AGN# as being from stars. In addition, we know that AGN were more prolific in the early universe. That in itself is also a problem: just as forming galaxies early is hard, so too is it hard to form enough supermassive black holes that early. So this just becomes the same problem in a different guise. Besides, the resolution of JWST is good enough to see where the light is coming from, and it ain’t all from unresolved AGN. Harikane et al. estimate that the AGN contribution is only ~10%.

That leaves the star formation efficiency, which is certainly another knob to twiddle. On the one hand, this is a reasonable thing to do, since we don’t really know what the star formation efficiency in the early universe was. On the other, we expected the opposite: star formation should, if anything, be less efficient at high redshift when the metallicity was low so there were few ways for gas to cool, which is widely considered to be a prerequisite for initiating star formation. Indeed, inefficient cooling was an argument in favor of a top-heavy IMF (perhaps stars need to be more massive to overcome higher temperatures in the gas from which they form), so these two possibilities contradict one another: we can have one but not both.

To me, the star formation efficiency is the most obvious knob to twiddle, but it has to be rather fine-tuned. There isn’t much cosmic time over which the variation must occur, and yet it has to change rapidly and in such a way as to precisely balance the non-evolving UV luminosity function against a rapidly evolving dark matter halo mass function. Once again, we’re in the position of having to invoke astrophysics that we don’t understand to make up for a manifest deficit in the behavior of dark matter. Funny how those messy baryons always cover up for that clean, pure, simple dark matter.

I could go on about these possibilities at great length (and did in the 2004 paper cited above). I decline to do so any further: we keep digging this hole just to fill it again. These ideas only seem reasonable as knobs to turn if one doesn’t see any other way out, which is what happens if one has absolute faith in structure formation theory and is blissfully unaware of the predictions of MOND. So I can already see the community tromping down the familiar path of persuading ourselves that the unreasonable is reasonable, that what was not predicted is what we should have expected all along, that everything is fine with cosmology when it is anything but. We’ve done it so many times before.


Initially I had the cat-stuffed-back-in-the-bag image here, but that was really for a theoretical paper that I didn’t quite make it to in this post. You’ll see it again soon. The observations discussed here are by observers doing their best in the context they know, so the image doesn’t seem appropriate here.


%We were convinced of the need for non-baryonic dark matter before any fluctuations in the microwave background were detected; their absence at the level of one part in a thousand sufficed.

^The assembly of baryonic mass can and in most cases should be rapid. It is the settling of gas into a rotationally supported structure that takes time – this is influenced by gas physics, not just gravity. Regardless of gravity theory, gas needs to settle gently into a rotating disk in order for spiral galaxies to exist.

+There are other predictions that differ in detail, but this is a reasonable representative of the basic expectation.

*This is not necessarily unreasonable, as there is some proclivity to underestimate the uncertainties. That’s a general statement about the field; I have made no attempt to assess how reasonable these particular error bars are.

&Top-heavy refers to there being more than the usual complement of bright but short-lived (tens of millions of years) stars. These stars are individually high mass (bigger than the sun), while long-lived stars are low mass. Though individually low in mass, these faint stars are very numerous. When one integrates over the population, one finds that most of the total stellar mass resides in the faint, low mass stars while much of the light is produced by the high mass stars. So a top heavy IMF explains high redshift galaxies by making them out of the brightest stars that require little mass to build. However, these stars will explode and go away on a short time scale, leaving little behind. If we don’t outright truncate the mass function (so many knobs here!), there could be some longer-lived stars leftover, but they must be few enough for the whole galaxy to fade to invisibility or we haven’t gained anything. So it is surprising, from this perspective, to see massive galaxies that appear to have evolved normally without any of these knobs getting twiddled.
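The integration in question is easy to sketch with standard rough scalings – a Salpeter-like slope dN/dM ∝ M^-2.35 and a crude main-sequence mass–luminosity relation L ∝ M^3.5 (illustrative choices, not a fit to anything):

\[
\frac{dM_\star}{d\ln M} \propto M^{-0.35}, \qquad \frac{dL}{d\ln M} \propto M^{+2.15},
\]

so the stellar mass per logarithmic bin is weighted toward the faint, low mass stars while the light per logarithmic bin rises steeply toward the most massive stars. Tilting the IMF toward high masses therefore buys a lot of UV luminosity for very little stellar mass – and, as noted above, leaves very little behind once those stars explode.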

#Excess AGN were one possibility Jay Franck considered in his thesis as the explanation for what we then considered to be hyperluminous galaxies, but the known luminosity function of AGN up to z = 4 couldn’t explain the entire excess. With the clarity of hindsight, we were just seeing the same sorts of bright, early galaxies that JWST has brought into sharper focus.

Clusters of galaxies ruin everything

A common refrain I hear is that MOND works well in galaxies, but not in clusters of galaxies. The oft-unspoken but absolutely intended implication is that we can therefore dismiss MOND and never speak of it again. That’s silly.

Even if MOND is wrong, that it works as well as it does is surely telling us something. I would like to know why that is. Perhaps it has something to do with the nature of dark matter, but we need to engage with it to make sense of it. We will never make progress if we ignore it.

Like the seventeenth century cleric Paul Gerhardt, I’m a stickler for intellectual honesty:

“When a man lies, he murders some part of the world.”

Paul Gerhardt

I would extend this to ignoring facts. One should not only be truthful, but also as complete as possible. It does not suffice to be truthful about things that support a particular position while eliding unpleasant or unpopular facts* that point in another direction. By ignoring the successes of MOND, we murder a part of the world.

Clusters of galaxies are problematic in different ways for different paradigms. Here I’ll recap three ways in which they point in different directions.

1. Cluster baryon fractions

An unpleasant fact for MOND is that it does not suffice to explain the mass discrepancy in clusters of galaxies. When we apply Milgrom’s formula to galaxies, it explains the discrepancy that is conventionally attributed to dark matter. When we apply MOND to clusters, it comes up short. This has been known for a long time; here is a figure from the review Sanders & McGaugh (2002):

Figure 10 from Sanders & McGaugh (2002): (Left) the Newtonian dynamical mass of clusters of galaxies within an observed cutoff radius (rout) vs. the total observable mass in 93 X-ray-emitting clusters of galaxies (White et al. 1997). The solid line corresponds to Mdyn = Mobs (no discrepancy). (Right) the MOND dynamical mass within rout vs. the total observable mass for the same X-ray-emitting clusters. From Sanders (1999).

The Newtonian dynamical mass exceeds what is seen in baryons (left). There is a missing mass problem in clusters. The inference is that the difference is made up by dark matter – presumably the same non-baryonic cold dark matter that we need in cosmology.

When we apply MOND, the data do not fall on the line of equality as they should (right panel). There is still excess mass. MOND suffers a missing baryon problem in clusters.

The common line of reasoning is that MOND still needs dark matter in clusters, so why consider it further? The whole point of MOND is to do away with the need of dark matter, so it is terrible if we need both! Why not just have dark matter?

This attitude was reinforced by the discovery of the Bullet Cluster. You can “see” the dark matter.

An artistic rendition of data for the Bullet Cluster. Pink represents hot X-ray emitting gas, blue the mass concentration inferred through gravitational lensing, and the optical image shows many galaxies. There are two clumps of galaxies that collided and passed through one another, getting ahead of the gas which shocked on impact and lags behind as a result. The gas of the smaller “bullet” subcluster shows a distinctive shock wave.

Of course, we can’t really see the dark matter. What we see is that the mass required by gravitational lensing observations exceeds what we see in normal matter: this is the same discrepancy that Zwicky first noticed in the 1930s. The important thing about the Bullet Cluster is that the mass is associated with the location of the galaxies, not with the gas.

The baryons that we know about in clusters are mostly in the gas, which outweighs the stars by roughly an order of magnitude. So we might expect, in a modified gravity theory like MOND, that the lensing signal would peak up on the gas, not the stars. That would be true, if the gas we see were indeed the majority of the baryons. We already knew from the first plot above that this is not the case.

I use the term missing baryons above intentionally. If one already believes in dark matter, then it is perfectly reasonable to infer that the unseen mass in clusters is the non-baryonic cold dark matter. But there is nothing about the data for clusters that requires this. There is also no reason to expect every baryon to be detected. So the unseen mass in clusters could just be ordinary matter that does not happen to be in a form we can readily detect.

I do not like the missing baryon hypothesis for clusters in MOND. I struggle to imagine how we could hide the required amount of baryonic mass, which is comparable to or exceeds the gas mass. But we know from the first figure that such a component is indicated. Indeed, the Bullet Cluster falls at the top end of the plots above, being one of the most massive objects known. From that perspective, it is perfectly ordinary: it shows the same discrepancy every other cluster shows. So the discovery of the Bullet was neither here nor there to me; it was just another example of the same problem. Indeed, it would have been weird if it hadn’t shown the same discrepancy that every other cluster showed. That it does so in a nifty visual is, well, nifty, but so what? I’m more concerned that the entire population of clusters shows a discrepancy than that this one nifty case does so.

The one new thing that the Bullet Cluster did teach us is that whatever the missing mass is, it is collisionless. The gas shocked when it collided, and lags behind the galaxies. Whatever the unseen mass is, it passed through unscathed, just like the galaxies. Anything with mass separated by lots of space will do that: stars, galaxies, cold dark matter particles, hard-to-see baryonic objects like brown dwarfs or black holes, or even massive [potentially sterile] neutrinos. All of those are logical possibilities, though none of them make a heck of a lot of sense.

As much as I dislike the possibility of unseen baryons, it is important to keep the history of the subject in mind. When Zwicky discovered the need for dark matter in clusters, the discrepancy was huge: a factor of a thousand. Some of that was due to having the distance scale wrong, but most of it was due to seeing only stars. It wasn’t until 40 some years later that we started to recognize that there was intracluster gas, and that it outweighed the stars. So for a long time, the mass ratio of dark to luminous mass was around 70:1 (using a modern distance scale), and we didn’t worry much about the absurd size of this number; mostly we just cited it as evidence that there had to be something massive and non-baryonic out there.

Really there were two missing mass problems in clusters: a baryonic missing mass problem, and a dynamical missing mass problem. Most of the baryons turned out to be in the form of intracluster gas, not stars. So the 70:1 ratio changed to 7:1. That’s a big change! It brings the ratio down from a silly number to something that is temptingly close to the universal baryon fraction of cosmology. Consequently, it becomes reasonable to believe that clusters are fair samples of the universe. All the baryons have been detected, and the remaining discrepancy is entirely due to non-baryonic cold dark matter.

That’s a relatively recent realization. For decades, we didn’t recognize that most of the normal matter in clusters was in an as-yet unseen form. There had been two distinct missing mass problems. Could it happen again? Have we really detected all the baryons, or are there still more lurking there to be discovered? I think it unlikely, but fifty years ago I would also have thought it unlikely that there would have been more mass in intracluster gas than in stars in galaxies. I was ten years old then, but it is clear from the literature that no one else was seriously worried about this at the time. Heck, when I first read Milgrom’s original paper on clusters, I thought he was engaging in wishful thinking to invoke the X-ray gas as possibly containing a lot of the mass. Turns out he was right; it just isn’t quite enough.

All that said, I nevertheless think the residual missing baryon problem MOND suffers in clusters is a serious one. I do not see a reasonable solution. Unfortunately, as I’ve discussed before, LCDM suffers an analogous missing baryon problem in galaxies, so pick your poison.

It is reasonable to imagine in LCDM that some of the missing baryons on galaxy scales are present in the form of warm/hot circum-galactic gas. We’ve been looking for that for a while, and have had some success – at least for bright galaxies where the discrepancy is modest. But the problem gets progressively worse for lower mass galaxies, so it is a bold presumption that the check-sum will work out. There is no indication (beyond faith) that it will, and the fact that it gets progressively worse for lower masses is a direct consequence of the data for galaxies looking like MOND rather than LCDM.

Consequently, both paradigms suffer a residual missing baryon problem. One is seen as fatal while the other is barely seen.

2. Cluster collision speeds

A novel thing the Bullet Cluster provides is a way to estimate the speed at which its subclusters collided. You can see the shock front in the X-ray gas in the picture above. The morphology of this feature is sensitive to the speed and other details of the collision. In order to reproduce it, the two subclusters had to collide head-on, in the plane of the sky (practically all the motion is transverse), and fast. I mean, really fast: nominally 4700 km/s. That is more than the virial speed of either cluster, and more than you would expect from dropping one object onto the other. How likely is this to happen?

There is now an enormous literature on this subject, which I won’t attempt to review. It was recognized early on that the high apparent collision speed was unlikely in LCDM. The chances of observing the bullet cluster even once in an LCDM universe range from merely unlikely (~10%) to completely absurd (< 3 x 10^-9). Answers this varied follow from what aspects of both observation and theory are considered, and the annoying fact that the distribution of collision speed probabilities plummets like a stone, so that slightly different estimates of the “true” collision speed make a big difference to the inferred probability. The “true” gravitationally induced collision speed is itself somewhat uncertain because the hydrodynamics of the gas plays a role in shaping the shock morphology. There is a long debate about this which bores me; it boils down to it being easy to explain a few hundred extra km/s but hard to get up to the extra 1000 km/s that is needed.

At its simplest, we can imagine the two subclusters forming in the early universe, initially expanding apart along with the Hubble flow like everything else. At some point, their mutual attraction overcomes the expansion, and the two start to fall together. How fast can they get going in the time allotted?

The Bullet Cluster is one of the most massive systems in the universe, so there is lots of dark mass to accelerate the subclusters towards each other. The object is less massive in MOND, even spotting it some unseen baryons, but the long-range force is stronger. Which effect wins?

Gary Angus wrote a code to address this simple question both conventionally and in MOND. Turns out, the longer range force wins this race. MOND is good at making things go fast. While the collision speed of the Bullet Cluster is problematic for LCDM, it is rather natural in MOND. Here is a comparison:

A reasonable answer falls out of MOND with no fuss and no muss. There is room for some hydrodynamical+ high jinks, but it isn’t needed, and the amount that is reasonable makes an already reasonable result more reasonable, boosting the collision speed from the edge of the observed band to pretty much smack in the middle. This is the sort of thing that keeps me puzzled: much as I’d like to go with the flow and just accept that it has to be dark matter that’s correct, it seems like every time there is a big surprise in LCDM, MOND just does it. Why? This must be telling us something.

3. Cluster formation times

Structure is predicted to form earlier in MOND than in LCDM. This is true for both galaxies and clusters of galaxies. In his thesis, Jay Franck found lots of candidate clusters at redshifts higher than expected. Even groups of clusters:

Figure 7 from Franck & McGaugh (2016). A group of four protocluster candidates at z = 3.5 that are proximate in space. The left panel is the sky association of the candidates, while the right panel shows their galaxy distribution along the LOS. The ellipses/boxes show the search volume boundaries (Rsearch = 20 cMpc, Δz ± 20 cMpc). Three of these (CCPC-z34-005, CCPC-z34-006, CCPC-z35-003) exist in a chain along the LOS stretching ≤120 cMpc. This may become a supercluster-sized structure at z = 0.

The cluster candidates at high redshift that Jay found are more common in the real universe than seen with mock observations made using the same techniques within the Millennium simulation. Their velocity dispersions are also larger than comparable simulated objects. This implies that the amount of mass that has assembled is larger than expected at that time in LCDM, or that speeds are boosted by something like MOND, or nothing has settled into anything like equilibrium yet. The last option seems most likely to me, but that doesn’t reconcile matters with LCDM, as we don’t see the same effect in the simulation.

MOND also predicts the early emergence of the cosmic web, which would explain the early appearance of very extended structures like the “big ring.” While some of these very large scale structures are probably not real, there seem to be too many such things being noted for all of them to be an illusion. The knee-jerk denials of all such structures remind me of the shock cosmologists expressed at seeing quasars at redshifts as high as 4 (even 4.9! how can it be so?) or clusters at redshift 2, or the original CfA stickman, which surprised the bejeepers out of everybody in 1987. So many times I’ve been told that a thing can’t be true because it violates theoreticians’ preconceptions, only for them to prove to be true, ultimately to be something the theorists expected all along.

Well, which is it?

So, as the title says, clusters ruin everything. The residual missing baryon problem that MOND suffers in clusters is both pernicious and persistent. It isn’t the outright falsification that many people presume it to be, but it sure doesn’t sit right. On the other hand, both the collision speeds of clusters (there are more examples now than just the Bullet Cluster) and the early appearance of clusters at high redshift are considerably more natural in MOND than in LCDM. So the data for clusters cut both ways. Taking the most obvious interpretation of the Bullet Cluster data, this one object falsifies both LCDM and MOND.

As always, the conclusion one draws depends on how one weighs the different lines of evidence. This is always an invitation to the bane of cognitive dissonance, accepting that which supports our pre-existing world view and rejecting the validity of evidence that calls it into question. That’s why we have the scientific method. It was application of the scientific method that caused me to change my mind: maybe I was wrong to be so sure of the existence of cold dark matter? Maybe I’m wrong now to take MOND seriously? That’s why I’ve set criteria by which I would change my mind. What are yours?


*In the discussion associated with a debate held at KITP in 2018, one particle physicist said “We should just stop talking about rotation curves.” Straight-up said it out loud! No notes, no irony, no recognition that the dark matter paradigm faces problems beyond rotation curves.

+There are now multiple examples of colliding cluster systems known. They’re a mess (Abell 520 is also called “the train wreck cluster”), so I won’t attempt to describe them all. In Angus & McGaugh (2008) we did note that MOND predicted that high collision speeds would be more frequent than in LCDM, and I have seen nothing to make me doubt that. Indeed, Xavier Hernandez pointed out to me that supersonic shocks like that of the Bullet Cluster are often observed, but basically never occur in cosmological simulations.

Discussion of Dark Matter and Modified Gravity

To start the new year, I provide a link to a discussion I had with Simon White on Phil Halper’s YouTube channel:

In this post I’ll say little that we don’t talk about, but will add some background and mildly amusing anecdotes. I’ll also try addressing the one point of factual disagreement. For the most part, Simon & I entirely agree about the relevant facts; what we’re discussing is the interpretation of those facts. It was a perfectly civil conversation, and I hope it can provide an example for how it is possible to have a positive discussion about a controversial topic+ without personal animus.

First, I’ll comment on the title, in particular the “vs.” This is not really Simon vs. me. This is a discussion between two scientists who are trying to understand how the universe works (no small ask!). We’ve been asked to advocate for different viewpoints, so one might call it “Dark Matter vs. MOND.” I expect Simon and I could swap sides and have an equally interesting discussion. One needs to be able to do that in order to not simply be a partisan hack. It’s not like MOND is my theory – I falsified my own hypothesis long ago, and got dragged reluctantly into this business for honestly reporting that Milgrom got right what I got wrong.

For those who don’t know, Simon White is one of the preeminent scholars working on cosmological computer simulations, having done important work on galaxy formation and structure formation, the baryon fraction in clusters, and the structure of dark matter halos (Simon is the W in NFW halos). He was a Reader at the Institute of Astronomy at the University of Cambridge where we overlapped (it was my first postdoc) before he moved on to become the director of the Max Planck Institute for Astrophysics where he was mentor to many people now working in the field.

That’s a very short summary of a long and distinguished career; Simon has done lots of other things. I highlight these works because they came up at some point in our discussion. Davis, Efstathiou, Frenk, & White are the “gang of four” that was mentioned; around Cambridge I also occasionally heard them referred to as the Cold Dark Mafia. The baryon fraction of clusters was one of the key observations that led from SCDM to LCDM.

The subject of galaxy formation runs throughout our discussion. It is always a fraught issue how things form in astronomy. It is one thing to understand how stars evolve, once made; making them in the first place is another matter. Hard as that is to do in simulations, galaxy formation involves the extra element of dark matter in an expanding universe. Understanding how galaxies come to be is essential to predicting anything about what they are now, at least in the context of LCDM*. Both Simon and I have worked on this subject our entire careers, in very much the same framework if from different perspectives – by which I mean he is a theorist who does some observational work while I’m an observer who does some theory, not LCDM vs. MOND.

When Simon moved to Max Planck, the center of galaxy formation work moved as well – it seemed like he took half of Cambridge astronomy with him. This included my then-office mate, Houjun Mo. At one point I refer to the paper Mo & I wrote on the clustering of low surface brightness galaxies and how I expected them to reside in late-forming dark matter halos**. I often cite Mo, Mao, & White as a touchstone of galaxy formation theory in LCDM; they subsequently wrote an entire textbook about it. (I was already warning them then that I didn’t think their explanations of the Tully-Fisher relation were viable, at least not when combined with the effect we have subsequently named the diversity of rotation curve shapes.)

When I first began to worry that we were barking up the wrong tree with dark matter, I asked myself what could falsify it. It was hard to come up with good answers, and I worried it wasn’t falsifiable. So I started asking other people what would falsify cold dark matter. Most did not answer. They often had a shocked look like they’d never thought about it, and would rather not***. It’s a bind: no one wants it to be false, but most everyone accepts that for it to qualify as physical science it should be falsifiable. So it was a question that always provoked a record-scratch moment in which most scientists simply freeze up.

Simon was one of the first to give a straight answer to this question without hesitation, circa 1999. At that point it was clear that dark matter halos formed central density cusps in simulations; so those “cusps had to exist” in the centers of galaxies. At that point, we believed that to mean all galaxies. The question was complicated by the large dynamical contribution of stars in high surface brightness galaxies, but low surface brightness galaxies were dark matter dominated down to small radii. So we thought these were the ideal place to test the cusp hypothesis.

We no longer believe that. After many attempts at evasion, cold dark matter failed this test; feedback was invoked, and the goalposts started to move. There is now a consensus among simulators that feedback in intermediate mass galaxies can alter the inner mass distribution of dark matter halos. Exactly how this happens depends on who you ask, but it is at least possible to explain the absence of the predicted cusps. This goes in the right direction to explain some data, but by itself does not suffice to address the thornier question of why the distribution of baryons is predictive of the kinematics even when the mass is dominated by dark matter. This is why the discussion focused on the lowest mass galaxies where there hasn’t been enough star formation to drive the feedback necessary to alter cusps. Some of these galaxies can be described as having cusps, but probably not all. Thinking only in those terms elides the fact that MOND has a better record of predictive success. I want to know why this happens; it must surely be telling us something important about how the universe works.

The one point of factual disagreement we encountered had to do with the mass profile of galaxies at large radii as traced by gravitational lensing. It is always necessary to agree on the facts before debating their interpretation, so we didn’t press this far. Afterwards, Simon sent a citation to what he was talking about: this paper by Wang et al. (2016). In particular, look at their Fig. 4:

Fig. 4 of Wang et al. (2016). The excess surface density inferred from gravitational lensing for galaxies in different mass bins (data points) compared to mock observations of the same quantity made from within a simulation (lines). Looks like excellent agreement.

This plot quantifies the mass distribution around isolated galaxies to very large scales. There is good agreement between the lensing observations and the mock observations made within a simulation. Indeed, one can see an initial downward bend corresponding to the outer part of an NFW halo (the “one-halo term”), then an inflection to different behavior due to the presence of surrounding dark matter halos (the “two-halo term”). This is what Simon was talking about when he said gravitational lensing was in good agreement with LCDM.

I was thinking of a different, closely related result. I had in mind the work of Brouwer et al. (2021), which I discussed previously. Very recently, Dr. Tobias Mistele has made a revised analysis of these data. That’s worthy of its own post, so I’ll leave out the details, which can be found in this preprint. The bottom line is in Fig. 2, which shows the radial acceleration relation derived from gravitational lensing around isolated galaxies:

The radial acceleration relation from weak gravitational lensing (colored points) extending existing kinematic data (grey points) to lower acceleration corresponding to very large radii (~ 1 Mpc). The dashed line is the prediction of MOND. Looks like excellent agreement.

This plot quantifies the radial acceleration due to the gravitational potential of isolated galaxies to very low accelerations. There is good agreement between the lensing observations and the extrapolation of the radial acceleration relation predicted by MOND. There are no features until extremely low acceleration where there may be a hint of the external field effect. This is what I was talking about when I said gravitational lensing was in good agreement with MOND, and that the data indicated a single halo with an r^-2 density profile that extends far out where we ought to see the r^-3 behavior of NFW.

The two plots above use the same method applied to the same kind of data. They should be consistent, yet they seem to tell a different story. This is the point of factual disagreement Simon and I had, so we let it be. No point in arguing about the interpretation when you can’t agree on the facts.

I do not know why these results differ, and I’m not going to attempt to solve it here. I suspect it has something to do with sample selection. Both studies rely on isolated galaxies, but how do we define that? How well do we achieve the goal of identifying isolated galaxies? No galaxy is an island; at some level, there is always a neighbor. But is it massive enough to perturb the lensing signal, or can we successfully define samples of galaxies that are effectively isolated, so that we’re only looking at the gravitational potential of that galaxy and not that of it plus some neighbors? Looks like there is some work left to do to sort this out.

Stepping back from that, we agreed on pretty much everything else. MOND as a fundamental theory remains incomplete. LCDM requires us to believe that 95% of the mass-energy content of the universe is something unknown and perhaps unknowable. Dark matter has become familiar as a term but remains a mystery so long as it goes undetected in the laboratory. Perhaps it exists and cannot be detected – this is a logical possibility – but that would be the least satisfactory result possible: we might as well resume counting angels on the head of a pin.

The community has been working on these issues for a long time. I have been working on this for a long time. It is a big problem. There is lots left to do.


+I get a lot of kill-the-messenger from people who are not capable of discussing controversial topics without personal animus. A lot of it comes, inevitably, from people who assume they know more about the subject than I do but actually know much less. It is really amazing how many scientists equate me as a person with MOND as a theory without bothering to do any fact-checking. This is logical fallacy 101.

*The predictions of MOND are insensitive to the details of galaxy formation. Though of course an interesting question, we don’t need that in order to make predictions. All we need is the mass distribution that the kinematics respond to – we don’t need to know how it got that way. This is like the solar system, where it suffices to know Newton’s laws to compute orbits; we don’t need to know how the sun and planets formed. In contrast, one needs to know how a galaxy was assembled in LCDM to have any hope of predicting what its distribution of dark matter is and then using that to predict kinematics.

**The ideas Mo & I discussed thirty years ago have reappeared in the literature under the designation “assembly bias.”

***It was often accompanied by “why would you even ask that?” followed by a pained, constipated expression when they realized that every physical theory has to answer that question.

A post in which some value judgements are made about the situation with wide binaries

I have tried very hard to remain objective and even handed, but I find that I weary of the wide binary debate. I don’t know what the right answer will turn out to be. But I do have opinions.

For starters, it is a big Galaxy. There is just too much to know. When I wrote about the Milky Way earlier this year, the idea was to set up an expectation value for wide binaries in the solar neighborhood. That devolved into at least eight other posts on the Milky Way itself, because our Galaxy is too damn interesting, and has its own controversies. So it occurs to me that I never really got on with the regularly scheduled program.

In my assessment, the radial acceleration at the solar circle is 2.2 x 10^-10 m/s/s, which in terms of the MOND acceleration scale is 1.8 a0. We live on the Newtonian side of the transition to the MOND regime. The ideal place to test MOND with wide binaries would be the deep MOND regime, well below a0. That is in a part of the Galaxy that is far, far away, and not currently accessible to us. What is accessible are wide binaries in the solar neighborhood (within 250 pc, about 1% of the Galaxy’s radius) as mapped by Gaia. Locally, the MOND effect is modest, but nonzero. We’re close enough to the transition for there to be a small, detectable effect.
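As a sanity check on that number, using commonly quoted round values for the solar circle (Vc ≈ 233 km/s and R0 ≈ 8.1 kpc – my illustrative inputs, not necessarily those behind the assessment above):

\[
g_\odot = \frac{V_c^2}{R_0} \approx \frac{(2.33\times 10^{5}\ \mathrm{m\,s^{-1}})^2}{2.5\times 10^{20}\ \mathrm{m}} \approx 2.2\times 10^{-10}\ \mathrm{m\,s^{-2}} \approx 1.8\, a_0
\]

for a0 = 1.2 x 10^-10 m/s/s.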

Local binaries that are widely separated enough for their internal acceleration to drop below a0 find themselves in the regime dominated by the field of the rest of the Galaxy and subject to the so-called External Field Effect (EFE). This situation is illustrated in the lower right panel below.

Mass estimators in different regimes of acceleration. The top row illustrates pure Newtonian (left) and MOND (right) regimes. The bottom row illustrates the case of small systems embedded in larger systems. A low acceleration system embedded in a Newtonian external field is Newtonian (left) while a very low acceleration system embedded in a merely low acceleration system is quasi-Newtonian (right). Wide binaries fall in the last category.

Intriguingly, orbits in the EFE regime remain Keplerian. The rotation curves of nearby binaries, if you could map them, are not expected to be flat in MOND. They should, however, experience enhanced speeds, with a boost to the effective value of Newton’s constant: G → γG. The value of γ depends on the sum of internal and external acceleration as well as the shape of the interpolation function when near a0. That’s one reason to prefer to do this experiment in the deep MOND regime, where the shape of the interpolation function doesn’t matter. But that’s not where we live. For Galactic data and viable possibilities* for the interpolation function, a reasonable expectation value is γ = 1.4 ± 0.1. This is what the wide binary papers attempt to measure. So, what do they find?
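Before getting to the results, a rough illustration of where a number like γ ≈ 1.4 comes from. In the EFE-dominated regime, the internal dynamics look Newtonian with an effective gravitational constant of roughly G/μ, where μ is the interpolation function evaluated at the dominant (external) acceleration. The sketch below is a deliberately crude, one-dimensional estimate that ignores the vector nature of the external field; it is not the calculation used in any of the papers discussed here.

```python
# Crude 1D estimate of the quasi-Newtonian boost gamma ~ 1/mu(x),
# evaluated at the local Galactic field x = g_ext/a0 ~ 1.8.
# Illustrative only: the published analyses do a proper EFE calculation.

def mu_simple(x):
    """The 'simple' interpolation function."""
    return x / (1.0 + x)

def mu_standard(x):
    """The 'standard' interpolation function."""
    return x / (1.0 + x**2) ** 0.5

x_ext = 1.8  # external (Galactic) field in units of a0 at the solar circle

for name, mu in [("simple", mu_simple), ("standard", mu_standard)]:
    print(f"{name:8s} gamma ~ {1.0 / mu(x_ext):.2f}")
# simple   gamma ~ 1.56
# standard gamma ~ 1.14
```

Even this crude estimate brackets the quoted γ = 1.4 ± 0.1, and it shows why the expected boost depends on the shape of the interpolation function near a0 rather than on the deep-MOND limit – which is exactly why the deep MOND regime would be the cleaner place to do the experiment.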

Hernandez et al. find γ = 1.0±0.1 for 466 close binaries with 2D separations less than 0.01 pc (about 2000 AU) and γ = 1.5±0.2 for 108 wide binaries with 2D separations greater than 0.01 pc. A purely Newtonian result (γ = 1) is recovered in the high acceleration regime of relatively close binaries where this is expected to be the case. For wider binaries, one finds a boost value consistent with the prediction of MOND and differing from Newton with modest significance (2.6σ).

Chae reports+ γ = 1.49(+0.21/-0.19) for 2,463 “pure” binaries in the low acceleration regime, consistent with his earlier result γ = 1.43±0.06 for 26,615 wide binaries. The larger numbers make the formal error smaller, hence a formally more significant departure from Newton. Many of these binaries are impure in the sense of being triples with one member being itself a close binary as discussed previously, an effect that has to be modeled in large samples. The point of the smaller samples is to select true binaries so that this modeling is unnecessary. For his smaller pure binary sample, Chae finds a smooth transition from γ ≈ 1 at high acceleration (10^-8 m/s/s ≈ 100 a0) through γ ≈ 1.11 around 7a0 to γ ≈ 1.49 at local Galactic saturation (1.8 a0).

Banik et al. use a slightly different language. Translating, they find γ = 1 at high confidence (16σ)$ from 8,611 wide binaries with separations from 2,000 to 30,000 AU. Newtonian behavior persists at all scales and accelerations; they find no significant deviations from γ = 1 anywhere. Note that despite going out very far, to 30,000 AU, they do not reach especially low accelerations because the EFE of the Galaxy is effectively constant in the solar neighborhood. There is no getting away from the Galaxy’s 1.8 a0. They also do not reach particularly high, purely Newtonian accelerations: 2,000 AU is in the transition regime where MOND effects are perceptible.

Here is Fig. 11 from Banik et al., the key figure Dr. Banik was advocating in his comments to the previous post:

Fig. 11 of Banik et al. shows the median dimensionless characteristic velocity as a function of dimensionless binary separation for several bins of the data (solid lines). The predicted MOND effect increases with separation until saturating in the Galactic field (dashed lines). This is not seen in most of the data, with only a hint in the highest velocity bin that represents only a few percent of the data.

A flat line in this plot indicates no boost in velocity with diminishing acceleration, so one can clearly see the source of the claim that Newton works better than MOND. There is little indication that the velocity increases at wider separations. Effectively, γ ≈ 1 pretty much everywhere.

Both Chae and Hernandez have pointed out that the lack of a constraint on the high acceleration Newtonian regime is problematic. Orbits are Keplerian in the quasi-Newtonian regime, so the behavior looks Newtonian. Lacking an anchor in the high acceleration regime, it is conceivable that the analysis of Banik is detecting the predicted MOND quasi-Newtonian behavior and defining it to be purely Newtonian. It’s just a modest offset in qualitatively similar behavior. In this context, it is worth noting that Chae and Hernandez independently measure γ ≈ 1 at high acceleration as well as γ ≈ 1.5 at low acceleration: they [claim to] detect the difference between these regimes in a way Banik does not probe.

Now let’s look at the plot that gave me the heebie-jeebies with data on it, Banik et al.’s Fig. 12:

The left two panels of Fig. 12 from Banik et al. showing the probability of observing a particular dimensionless velocity in two bins of radial separation: 2,000 to 3,000 AU (top) and 5,000 to 12,000 AU (bottom). The histograms are the Gaia data. The black lines show the Newtonian prediction while the blue lines show that of MOND. These predictions depend on many things besides the underlying theory, sampling over many astrophysical complications like the distribution of stellar masses, orbital orientations to the line of sight, orbital phase, orbital eccentricity, the close binary fraction, and probably other things that I don’t instantly recall.

One can see the basis of the concern. At high acceleration, the prediction of Newton and MOND are identical. The top bin is the closest we get to that, yet there is a clear difference in the predictions. This bin is in the transition region; there is no bin at sufficiently high acceleration for the predictions to align and provide the self-calibration that both Chae and Hernandez independently exploit.

Looking at the data in the top panel, it clearly agrees better with the Newtonian prediction. I can believe that; what concerns me is the lack of grounding at still higher acceleration where the black and blue lines should coincide. I do not have a sufficiently clear understanding of all the machinations (and their inevitable foibles) that go into the predicted lines to trust that this constitutes a definitive test.

Looking at the data in the bottom panel, it clearly agrees better with the prediction of MOND. The histogram of the data follows the blue line of MOND more closely than the Newtonian black line. This is so obvious that I wondered if the colors were wrong – maybe there had been some inadvertent switcheroo in the line color in the plotting code. Apparently not, as this point is addressed in the text of Banik et al.: [in the bottom panel,] “MOND performs somewhat better in a handful of pixels around the peak region…” Yes. Yes it does. That’s… a really weird way of putting it. They go on to say “…though given the uncertainties, the Newtonian model is not that far off.” One could just as well say that about MOND in the top panel. Just looking at this figure, one might conclude that they have detected MOND at large separations.

One of the things that gives me the heebie-jeebies about this figure is that there isn’t much difference in the location of the peaks of the distributions. Some, yes, but not much: Newton and MOND apparently predict very nearly the same typical velocity. Yet that is what Fig. 11 traces: the typical (median) normalized velocity. That only tells a tiny bit of the story that is in Fig. 12, and does not appear to be a particularly sensitive indicator of the effect we’re testing for.
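A toy numerical illustration of why the median can be an insensitive statistic, with entirely made-up distributions: two samples whose medians nearly coincide can still differ substantially in their high-velocity tails, which is where the discrimination in Fig. 12 lives.

```python
import numpy as np

rng = np.random.default_rng(42)
# made-up "scaled velocity" samples: the same core, one with a modestly fatter tail
newton_like = rng.rayleigh(scale=0.45, size=100_000)
mond_like = np.where(rng.random(100_000) < 0.92,
                     rng.rayleigh(scale=0.45, size=100_000),
                     rng.rayleigh(scale=0.90, size=100_000))

for name, v in (("Newton-like", newton_like), ("MOND-like", mond_like)):
    print(f"{name:12s} median = {np.median(v):.2f}, "
          f"99th percentile = {np.percentile(v, 99):.2f}")
# The medians differ by only a few percent, while the upper tail separates
# much more clearly; a median-vs-separation plot compresses away most of
# the information that distinguishes the two distributions.
```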

Returning to the matter of statistics, the attentive reader might have noted that I have not said much about the number of binaries included in each analysis. These range from a few hundred to many thousands to tens of thousands. More is better, right?

In this case, I think not. There is always a tension between data quality and quantity. Quantity helps with the statistics, but only so long as there is a signal to be dug out. At some point, it becomes a matter of garbage in, garbage out. In this respect, I am inclined to agree with Ernest Rutherford:

If your experiment requires statistics, you ought to have done a better experiment.

Ernest Rutherford$

I suspect that we’re squeezing the stone of statistics too hard here. When we do this, we get the appearance of a signal when really we’re just grinding metal. I am reminded that any time we do a big experiment like this (dark matter searches are a great example), the first thing we learn about are all the false signals we didn’t anticipate. That happens no matter how well we construct the experiment, and I give Banik et al. credit for planning this out ahead of time. That doesn’t guarantee that everything comes out right on the first attempt. There is just so much junk that the universe can and does throw at us that it is easy to imagine that the samples with large numbers of binaries bring with them too much junk (e.g., false binaries). If the fraction of junk is high, then it will look like junk at all scales – there will be no trend with increasing separation even if there is a signal buried in junk.

Consequently, I am at present inclined to trust more the super-clean sample of Hernandez – the high quality binaries where there is a chance that we’re actually measuring what we want to measure. There are only a few hundred such binaries, so the statistical confidence is modest (2.6σ). I worry that in setting the highly restrictive standards necessary to select the best binaries, we might unintentionally omit objects that could change the answer. But at least there is some confidence that these are real binaries, a signal that stands above the gratuitously enormous amount of junk the universe has to throw our way.

I hope the principal scientists can come to agreement about what the data show and not just wind up having the same argument over and over forever more. That’s what usually happens. I ask all parties to remember that it is important to retain the ability to change one’s mind. All of them have demonstrated the ability to do this previously, and somebody will need to do it again.


*In principle, one might also hope to distinguish between specific theories of MOND. Examples of modified gravity like AQUAL and QUMOND give slightly different predictions. To be able to do this seems… optimistic at this point.

In modified inertia theories, the interpolation function is a chimera that depends on each orbital trajectory, so its effective realization may differ between the nearly circular motions of rotation curves and eccentric wide binaries. If so, the interpolation function defined by external galaxies may not be relevant to the problem. Ultimately, we need a theory that automatically results in the MONDian phenomenology in galaxies. Whatever it is wide binaries are doing should help inform this theory development as well as test alternatives and hopefully exclude some of them.

+The work of Chae has gone through some revisions in response to a referee, but the basic findings are unchanged. Having read an earlier version, I appreciate the clarity provided by the additions: this is a case where the refereeing process was beneficial. (I was not the referee of this or any of these papers. Editors keep me busy enough as it is, thank you very much.)

$Outside of long-established observations like the value of Gauss’s constant, there is no such thing as 16σ confidence in astronomy. The tails of real probability distributions are never as tiny as a pure Gaussian. This is a remarkably naive assertion.

Full speed in reverse!


People have been asking me about comments in a recent video by Sabine Hossenfelder. I have not watched it, but the quote I’m asked about is “the higher the uncertainty of the data, the better MOND seems to work” with the implication that this might mean that MOND is a systematic artifact of data interpretation. I believe, because they consulted me about it, that the origin of this claim emerged from recent work by Sabine’s student Maria Khelashvili on fitting the SPARC data.

Let me address the point about data interpretation first. Fitting the SPARC data had exactly nothing to do with attracting my attention to MOND. Detailed MOND fits to these data are not particularly important in the overall scheme of things, as I’ll discuss in excruciating detail below. Indeed, these data didn’t even exist until relatively recently.

It may, at this juncture in time, surprise some readers to learn that I was once a strong advocate for cold dark matter. I was, like many of its current advocates, rather derisive of alternatives, the most prominent at the time being baryonic dark matter. What attracted my attention to MOND was that it made a priori predictions that were corroborated, quite unexpectedly, in my data for low surface brightness galaxies. These results were surprising in terms of dark matter then and to this day remain difficult to understand. After a lot of struggle to save dark matter, I realized that the best we could hope to do with dark matter was to contrive a model that reproduced after the fact what MOND had predicted a priori. That can never be satisfactory.

So – I changed my mind. I admitted that I had been wrong to be so completely sure that the solution to the missing mass problem had to be some new form of non-baryonic dark matter. It was not easy to accept this possibility. It required lengthy and tremendous effort to admit that Milgrom had got right something that the rest of us had got wrong. But he had – his predictions came true, so what was I supposed to say? That he was wrong?

Perhaps I am wrong to take MOND seriously? I would love to be able to honestly say it is wrong so I can stop having this argument over and over. I’ve stipulated the conditions whereby I would change my mind to again believe that dark matter is indeed the better option. These conditions have not been met. Few dark matter advocates have answered the challenge to stipulate what could change their minds.

People seem to have become obsessed with making fits to data. That’s great, but it is not fundamental. Making a priori predictions is fundamental, and has nothing to do with fitting data. By construction, the prediction comes before the data. Perhaps this is one way to distinguish between incremental and revolutionary science. Fitting data is incremental science that seeks the best version of an accepted paradigm. Successful predictions are the hallmark of revolutionary science that make one take notice and say, hey, maybe something entirely different is going on.

One of the predictions of MOND is that the RAR should exist. It was not expected in dark matter. As a quick review of the history, here is the RAR as it was known in 2004 and now (as of 2016):

The radial acceleration relation constructed from data available in 2004 and that from 2016.

The big improvement provided by SPARC was a uniform estimate of the stellar mass surface density of galaxies based on Spitzer near-infrared data. These are what are used to construct the x-axis: gbar is what Newton predicts for the observed mass distribution. SPARC was a vast improvement over the optical data we had previously, to the point that the intrinsic scatter is negligibly small: the observed scatter can be attributed to the various uncertainties and the expected scatter in stellar mass-to-light ratios. The latter never goes away, but did turn out to be at the low end of the range we expected. It could easily have looked worse, as it did in 2004, even if the underlying physical relation was perfect.
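For concreteness, here is a minimal sketch of how each point on such a plot is constructed from SPARC-style mass models: the baryonic velocity components are combined into Vbar, and the accelerations follow from centripetal balance, gbar = Vbar²/R and gobs = Vobs²/R. The mass-to-light ratios used here are illustrative assumptions, not the fitted values.

```python
KPC = 3.086e19  # m per kpc
KMS = 1.0e3     # m per km/s

def rar_point(R_kpc, Vobs, Vgas, Vdisk, Vbul, ml_disk=0.5, ml_bul=0.7):
    """Return (gbar, gobs) in m/s^2 for one point of a rotation curve.
    Vobs, Vgas, Vdisk, Vbul are in km/s; Vdisk and Vbul are the baryonic
    contributions for a mass-to-light ratio of 1, so their squares get
    scaled by the assumed M*/L (the values here are just illustrative)."""
    Vbar2 = Vgas**2 + ml_disk * Vdisk**2 + ml_bul * Vbul**2   # (km/s)^2
    gbar = Vbar2 * KMS**2 / (R_kpc * KPC)
    gobs = (Vobs * KMS) ** 2 / (R_kpc * KPC)
    return gbar, gobs

# one made-up point, purely to show the bookkeeping:
print(rar_point(R_kpc=10.0, Vobs=120.0, Vgas=40.0, Vdisk=90.0, Vbul=0.0))
```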

Negligibly small intrinsic scatter is the best one can hope to find. The issue now is the fit quality to individual galaxies (not just the group plot above). We already know MOND fits rotation curve data. The claim that appears in Dr. Hossenfelder’s video boils down to dark matter providing better fits. This would be important if it told us something about nature. It does not. All it teaches us about is the hazards of fitting data for which the errors are not well behaved.

While SPARC provides a robust estimate of gbar, gobs is based on a heterogeneous set of rotation curves drawn from a literature spanning decades. The error bars on these rotation curves have not been estimated in a uniform way, so we cannot blindly fit the data with our favorite software tool and expect that to teach us something about physical reality. I find myself having to say this to physicists over and over and over and over and over again: you cannot trust astronomical error bars to behave as Gaussian random variables the way one would like and expect in a controlled laboratory setting.

Astronomy is not conducted in a controlled laboratory. It is an observational science. We cannot put the entire universe in a box and control all the variables. We can hope to improve the data and approach this ideal, but right now we’re nowhere near it. These fitting analyses assume that we are.

Screw it. I really am sick of explaining this over and over, so I’m just going to cut & paste verbatim what I told Hossenfelder & Khelashvili by email when they asked. This is not the first time I’ve written an email like this, and I’m sure it won’t be the last.


Excruciating details: what I said to Hossenfelder & Khelashvili about the perils of rotation curve fitting on 22 September 2023 in response to their request for comments on the draft of the relevant paper:

First, the work of Desmond is a good place to look for an opinion independent of mine. 

Second, in my experience, the fit quality you find is what I’ve found before: DM halos with a constant density core consistently give the best fits in terms of chi^2, then MOND, then NFW. The success of cored DM halos happens because it is an extremely flexible fitting function: the core radius and core density can be traded off to fit any dog’s leg, and is highly degenerate with the stellar M*/L. NFW works less well because it has a less flexible shape. But both work because they have more parameters [than MOND].

Third, statistics will not save us here. I once hoped that the BIC would sort this out, but having gone down that road, I believe the BIC does not penalize models sufficiently for adding free parameters. You allude to this at the end of section 3.2. When you go from MOND (with fixed a0 it has only one parameter, M*/L, to fit to account for everything) to a dark matter halo (which has at a minimum 3 parameters: M*/L plus two to describe the halo), you gain an enormous amount of freedom – the volume of possible parameter space grows enormously. But the BIC just says if you had 20 degrees of freedom before, now you have 22. That does not remotely capture the amount of flexibility gained: some free parameters are more equal than others. MOND fits and DM halo fits are not the same beast; we can’t compare them this way any more than we can compare apples and snails.
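To make the complaint about the BIC concrete, here is a toy sketch of how the penalty is computed. The numbers are invented; the point is only that the penalty for going from one free parameter to three is a fixed k·ln(n) term that knows nothing about how much extra flexibility those parameters actually buy.

```python
import numpy as np

def bic(chi2, k, n):
    """Bayesian Information Criterion for Gaussian errors:
    BIC = chi^2 + k*ln(n), with k free parameters and n data points.
    (Constant terms common to all models are dropped.)"""
    return chi2 + k * np.log(n)

n = 20                            # rotation curve points (invented)
# invented chi^2 values: the 3-parameter halo fit is a bit better
print(bic(chi2=25.0, k=1, n=n))   # MOND-like: only M*/L is free
print(bic(chi2=18.0, k=3, n=n))   # halo-like: M*/L plus two halo parameters
# The halo "wins" because its penalty is only 2*ln(20) ~ 6 extra,
# regardless of how enormous the added parameter volume really is.
```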

Worse, to do this right requires that the uncertainties be real random errors. They are not. SPARC provides homogeneous mass models based on near-IR observations of the stellar mass distribution. Those should be OK to the extent that near-IR light == stellar mass. That is a decent mapping, but not perfect. Consequently, we expect the occasional galaxy to misbehave. UGC 128 is a case where the MOND fit was great with optical data then became terrible with near-IR data. The absolute difference in the data is not great, but in terms of the formal chi^2 it is. So is that a failure of the model, or of the data to represent what we want it to represent?

This happens all the time in astronomy. Here, we want to know the circular velocity of a test particle in the gravitational potential predicted by the baryonic mass distribution. We never measure either of those quantities. What we measure is the (i) stellar light distribution and the (ii) Doppler velocities of gas. We assume we can map stellar light to stellar mass and Doppler velocity to orbital speed, but no mass model is perfect, nor is any patch of observed gas guaranteed to be on a purely circular orbit. These are known unknowns: uncertainties that we know are real but we cannot easily quantify. These assumptions that we have to make to do the analysis dominate over the random errors in many cases. We also assume that galaxies are in dynamical equilibrium, but 20% of spirals show gross side-to-side asymmetries, and at least 50% mild ones. So what is the circular motion in those cases? (F579-1 is a good example)

While SPARC is homogeneous in its photometry, it is extremely heterogeneous in its rotation curve measurements. We’re working on fixing that, but it’ll take a while. Consequently, as you note, some galaxies have little constraining power while others appear to have lots. That’s because many of the rotation curve velocity uncertainties are either grossly over or underestimated. To see this, plot the cumulative distribution of chi^2 for any of your models (or see the CDF published by Li et al 2018 for the RAR and Li et al 2020 for dark matter halos of many flavors. So many, I can’t recall how many CDF we published.) Anyway, for a good model, chi^2 is always close to one, so the CDF should go up sharply and reach one quickly – there shouldn’t be many cases with very low chi^2 or very high chi^2. Unfortunately, rotation curve data do not do this for any type of model. There are always way too many cases with chi^2 << 1 and also too many with chi^2 >> 1. One might conclude that all models are unacceptable – or that the error bars are Messed Up. I think the second option is the case. If so, then this sort of analysis will always have the power to mislead. 
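Here is a minimal sketch of the diagnostic just described: compare the empirical cumulative distribution of reduced chi^2 across a sample of fits to what well-behaved Gaussian errors would give. The reduced chi^2 values below are placeholders; with real fits, a CDF that rises much too early (many chi^2 << 1) and finishes with a long tail (many chi^2 >> 1) points at mis-estimated error bars rather than at any particular model.

```python
import numpy as np
from scipy import stats

# placeholder reduced chi^2 values, one per galaxy (substitute real fit results)
chi2_red = np.array([0.05, 0.2, 0.4, 0.8, 1.0, 1.3, 2.5, 6.0, 15.0, 40.0])
nu = 18  # typical degrees of freedom per galaxy (assumed)

x = np.sort(chi2_red)
empirical_cdf = np.arange(1, len(x) + 1) / len(x)
# expected CDF if errors were honest Gaussians: reduced chi^2 ~ chi^2_nu / nu
expected_cdf = stats.chi2.cdf(x * nu, df=nu)

for xi, e, t in zip(x, empirical_cdf, expected_cdf):
    print(f"chi2_red = {xi:6.2f}   empirical CDF = {e:.2f}   expected = {t:.2f}")
# A good model with correct error bars would track the expected column;
# too much weight at both extremes says the error bars are Messed Up.
```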

I insert Fig. 1 from Li et al. (2020) so you don’t have to go look it up. The CDF of a statistically good model would rise sharply, being an almost vertical line at chi^2 = 1. No model of any flavor does that. That’s in large part because the uncertainties on some rotation curves are too large, while those on others are too small. The greater flexibility of dark matter models makes them incrementally better than MOND for the cases with error bars that are too small – hence the corollary statement that “the higher the uncertainty of the data, the better MOND seems to work.” This happens because dark matter models are allowed to chase bogus outliers with tiny error bars in a way that MOND cannot. That doesn’t make dark matter better, it just makes it easier to fool.

A key thing to watch out for is the outsized effects of a few points with tiny error bars. Among galaxies with high chi^2, what often happens is that there is one point with a tiny error bar that does not agree with any of the rest of the data for any smoothly continuous rotation curve. Fitting programs penalize a model for missing this point by many sigma, so will do anything they can to make it better. So what happens is that if you let a0 vary with a flat prior, it will go to some very silly values in order to buy a tiny improvement in chi^2. Formally, that’s a better fit, so you say OK, a0 has to vary. But if you plot the fitted RCs with fixed and variable a0, you will be hard pressed to see the difference. Chi^2 is different, sure, but both will have chi^2 >> 1, so a lousy fit either way, and we haven’t really gained anything meaningful from allowing for the greater fitting freedom. Really it is just that one point that is Wrong even though it has a tiny error bar – which you can see relative to the other points, never mind the model. Dark matter halos have more flexibility from the beginning, so this is less obvious for them even though the same thing happens.
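A toy illustration of the single-point problem, with made-up numbers: one datum with an unrealistically small error bar contributes more to chi^2 than the rest of the rotation curve combined, so any extra freedom (a floating a0, an extra halo parameter) will be spent chasing it.

```python
import numpy as np

# invented residuals (model - data) in km/s and their quoted errors
residuals = np.array([2.0, -3.0, 1.5, -2.5, 2.0, 8.0])
errors    = np.array([3.0,  3.0, 3.0,  3.0, 3.0, 0.5])   # last point: tiny error bar

chi2_terms = (residuals / errors) ** 2
print(chi2_terms)          # [0.44 1.0 0.25 0.69 0.44 256.0]
print(chi2_terms.sum())    # ~259: dominated entirely by the one "precise" outlier
```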

So that’s another big point – what is the prior for a dark matter halo? [Your] Table 1 allows V200 and C200 to be pretty much anything. So yes, you will find a fit from that range. For Burkert halos, there is no prior, since these do not emerge from any theory – they’re just a flexible French curve. For NFW halos, there is a prior from cosmology – see McGaugh et al (2007) among a zillion other possible references, including Li et al (2020). In any [L]CDM cosmology, the parameters V200 and C200 correlate – they are not independent. So a reasonable prior would be a Gaussian in log(C200) at a given V200 as specified by some simulation (Macciò et al; see Li et al 2020). Another prior is how V200 (or M200) relates to the observed baryonic mass (or stellar mass). This one is pretty dodgy. Originally, we expected a fixed ratio between baryonic and dark mass. So when I did this kind of analysis in the ’90s, I found NFW flunked hard compared to MOND. (I didn’t know about the BIC then.) Galaxy DM halos simply do not look like NFW halos that form in LCDM and host galaxies with a few percent of their mass in the luminous disk even though this was the standard model for many years (Mo, Mao, & White 1998). If we drop the assumption that luminous galaxies are always a fixed fraction of their dark matter halos, then better fits can be obtained. I suspect your uniform prior fits have halo masses all over the place; they probably don’t correlate well with the baryonic mass, nor are their C and V200 parameters likely to correlate as they are predicted to do. You could apply the expected mass-concentration and stellar mass-halo mass relations as priors; then NFW will come off worse in your analysis because you’ve restricted them to where they ought to live.
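A sketch of what “a Gaussian prior in log(C200) at a given V200” might look like in practice. The mean relation and scatter below are placeholders standing in for whatever mass-concentration relation one adopts from simulations (the Macciò-style relations discussed in Li et al. 2020, for example); the point is only the structure of the prior, not the particular coefficients.

```python
import numpy as np

def log_prior_c200(c200, m200, slope=-0.10, norm=0.9, pivot=1.0e12, scatter_dex=0.11):
    """ln prior for an NFW concentration, Gaussian in log10(c200) about an assumed
    mass-concentration relation log10(c) = norm + slope*log10(M200/pivot).
    All coefficients here are illustrative placeholders, not a fit to any simulation."""
    logc_expected = norm + slope * np.log10(m200 / pivot)
    return -0.5 * ((np.log10(c200) - logc_expected) / scatter_dex) ** 2

# a fit wandering off to c200 = 2 at M200 = 1e11 Msun is heavily penalized,
# while c200 = 12 at the same mass is near the expected relation:
print(log_prior_c200(c200=2.0, m200=1e11))
print(log_prior_c200(c200=12.0, m200=1e11))
```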

So, as you say – it all comes down to the prior.

Even applying a stellar mass-halo mass relation from abundance matching isn’t really independent information, though that’s the best you can hope to do. I was saying 20+ years ago that fixed mass ratios wouldn’t work, but nobody then wanted to abandon that obvious assumption. Since then, they’ve been forced to do so. There is no good physical reason for it (feedback is the deus ex machina of all problems in the field); what happened is that the data forced us to drop the obvious assumption. Data including kinematic data (McGaugh et al 2010). So adopting a modern stellar mass-halo mass relation will give you a stronger prior than a uniform prior, but that choice has already been informed by the kinematic data that you’re trying to fit. How do we properly penalize the model for cheating about its “prior” by peeking at past data?

So, as you say – it all comes down to the prior. I think it would be important here to better constrain the priors on the DM halo fits. Li et al (2020) discuss this. Even then we’re not done, because galaxy formation modifies the form of the halo function we’re fitting. They shouldn’t end up as NFW even if they start out that way – see Li et al 2022a & b. Those papers consider the inevitable effects of adiabatic compression, but not of feedback. If feedback really has the effects on DM halos that are frequently advertised, then neither NFW nor Burkert is an appropriate fitting function – they’re not what LCDM+feedback predicts. Good luck extracting a legitimate prediction from simulations, though. So we’re stuck doing what you’re trying to do: adopt some functional form to represent the DM halo, and see what fits. What you’ve done here agrees with my experience: cored DM halos work best. But they don’t represent an LCDM prediction, or any other broader theory, so – so what?

Another detail to be wary of – the radial range over which the RC data constrain the DM halo fit is often rather limited compared to the size of the halo. To complicate matters further, the inner regions are often star-dominated, so there is not much of a handle on DM from where the data are best, at least beyond many galaxies preferring not to have a cusp since the stars already get the job done at small R. So, one ends up with V_DM(R) constrained from 3% to 10% of the virial radius, or something like that. V200 and C200 are defined at the notional virial radius, so there are many combinations of these parameters that might adequately fit the observed range while being quite different elsewhere. Even worse, NFW halos are pretty self-similar – there are combinations of (C200,V200) that are highly degenerate, so you can’t really tell the difference between them even with excellent data – the confidence contours look like bananas in C200-V200 space, with low C/high V often being as good as high C/low V. Even even even worse is that the observed V_DM(R) is often approximately a straight line. Any function looks like a straight line if you stretch it out enough. Consequently, the fits to LSB galaxies often tend to absurdly low C and high V200: NFW never looks like a straight line, but it does if you blow it up enough. So one ends up inferring that the halo masses of tiny galaxies are nearly as big as those of huge galaxies, or more so! My favorite example was NGC 3109, a tiny dwarf on the edge of the Local Group. A straight NFW fit suggests that the halo of this one little galaxy weighs more than the entire Local Group, M31 + MW + everything else combined. This is the sort of absurd result that comes from fitting the NFW halo form to a limited radial range of data. 
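To put a number on that last point, here is a toy sketch: suppose the data only demand a single velocity at a single radius (the values below are invented, dwarf-ish numbers), and solve for the V200 an NFW halo needs at different concentrations. The relation r200 = V200/(10 H0) is the usual schematic one for an overdensity of 200 relative to critical.

```python
import numpy as np
from scipy.optimize import brentq

H0 = 0.07  # km/s per kpc (i.e., 70 km/s/Mpc)

def v_nfw(r_kpc, v200, c):
    """NFW circular velocity (km/s) at radius r (kpc) for a halo with V200 (km/s)
    and concentration c, using the schematic relation r200 = V200/(10*H0)."""
    r200 = v200 / (10.0 * H0)
    x = r_kpc / r200
    return v200 * np.sqrt((np.log(1 + c * x) - c * x / (1 + c * x))
                          / (x * (np.log(1 + c) - c / (1 + c))))

def m200_msun(v200):
    """Halo mass implied by V200: M200 = V200^2 * r200 / G."""
    G = 4.30e-6  # kpc (km/s)^2 / Msun
    return v200 ** 2 * (v200 / (10.0 * H0)) / G

# Suppose the data only demand V ~ 60 km/s at R = 5 kpc (invented numbers).
target, r_obs = 60.0, 5.0
for c in (12.0, 5.0, 2.0):
    v200 = brentq(lambda v: v_nfw(r_obs, v, c) - target, 10.0, 5000.0)
    print(f"c = {c:4.1f}:  V200 = {v200:6.1f} km/s,  M200 ~ {m200_msun(v200):.1e} Msun")
# Lowering the concentration forces V200 (and the implied halo mass) to explode
# in order to reproduce the same inner velocity, which is how the NFW fit to a
# tiny dwarf can end up heavier than the whole Local Group.
```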

I don’t know that this helps you much, but you see a few of the concerns. 

Wide binary debate heats up again


One of the most interesting and contentious results concerning MOND this year has been the dynamics of wide binaries. When last I wrote on this topic, way back at the end of August, Chae (2023) and Hernandez (2023) both had new papers finding evidence for MONDian behavior in wide binaries. Since that time, they each have written additional papers on the subject. These independent efforts both report strong evidence for MONDian behavior in wide binaries, so for all of October it seemed like Game Over for conventional* dark matter.

I refrained from writing a post then because I was still waiting to see if there would be a contradictory paper. Now there is. And boy, is it contradictory! Where Hernandez et al. find 2.6σ evidence for non-Newtonian behavior and Chae finds ~5σ evidence for non-Newtonian behavior, both consistent with MOND, Banik et al. find purely Newtonian behavior and claim to exclude MOND at 19σ. That’s pretty high confidence!

Well, which is it, young feller? You got proof of non-Newtonian dynamics, or you want to insist that’s impossible?

After the latest results appeared, a red-hot debate [re]ignited on e-mail, largely along the lines of what was discussed at the conference in St. Andrews. Banik et al say that they can reproduce the MOND-like signal of Chae, but that it goes away when the data quality restriction is applied to physical velocity uncertainties (arguing that this is what you want to know) rather than to raw observational uncertainties. Chae and Hernandez counter that the method Banik et al. apply is not grounded in the Newtonian regime where everyone agrees on what should happen, so they could be calibrating the signal away. This is one thing I had the impression everyone had agreed to work on in St. Andrews, but it doesn’t appear that we’re there yet.

Banik et al. do a carefully planned Bayesian analysis. This approach in principle allows one to separate many effects simultaneously, one of which is close binaries (CB**). I look at the impact that close binaries have on the analysis, and it gives me the heebie-jeebies:

One panel from Fig. 10 of Banik et al.

This figure illustrates the probability of measuring a characteristic velocity in MOND for the noted range of projected sky separation. If it is just wide binaries (WB), you get the blue line. If there are some close binaries, the expected distribution changes dramatically. This change is rather larger than the signal expected from the nominal difference in gravity. You can in principle fit for everything simultaneously, but extracting the right small signal when there is a big competing signal can be tricky. Bayesian analyses can help, but they are also a double-sided sledge-hammer: a powerful tool with which to pound the data, but also a tool that can bounce back and smack you in the face. Having done such analyses, and been smacked around a few times (and having seen others get smacked around), looking at this plot really does give me the heebie-jeebies. There are lots of ways in which this can go wrong – or even just overstate the confidence of a correct result.

Everyone uses Bayesian methods these days.***

I expect people are expecting me to comment on this hot mess. Some have already asked me to do so. I really don’t want to. I’ve already said more than I should.

There are very earnest, respectable people doing this work; I don’t think anyone is being intentionally misleading. Somebody must be wrong, but it isn’t my job to sort out who. Moreover, these are long and involved analyses; it will take me time to read all the papers and make sense of them. Maybe once I do, I’ll have something more cogent to say.

I make no promises.


*By conventional dark matter, I mean new particles that only communicate with baryons via gravity.

**CB: In principle, some of the wide binaries detected by Gaia will also be close binaries, in the sense that one of the two widely separated stars is itself not a single star but an unrecognized close binary. We know this happens in nature: the nearest star system, α Centauri, is an example. The main A & B components compose a close binary, with Proxima Centauri being widely separated. Modeling how often this happens in the Gaia data gives me the willies.

***To paraphrase Churchill: Many forms of statistics have been tried, and will be tried in this science of sin and woe. No one pretends+ that Bayes is perfect or all-wise. Indeed it has been said that Bayes is the worst form of statistics except for all those other forms that have been tried from time to time.

+Lots of people pretend that Bayes is perfect and all-wise.

How things go mostly right or badly wrong


People often ask me how “perfect” MOND has to be. The short answer is that it agrees with galaxy data as “perfectly” as we can perceive – i.e., the scatter in the credible data is accounted for entirely by known errors and the expected scatter in stellar mass-to-light ratios. Sometimes it nevertheless looks to go badly wrong. That’s often because we need to know both the mass distribution and the kinematics perfectly. Here I’ll use the Milky Way as an example of how easily things can look bad when they aren’t.

First, an update. I had hoped to stop talking about the Milky Way after the recent series of posts. But it is in the news, and there is always more to say. A new realization of the rotation curve from the Gaia DR3 data has appeared, so let’s look at all the DR3 data together:

Gaia DR3 realizations of the Milky Way rotation curve. The most recent version of these data from Poder et al (2023) are shown as blue squares over the range 5 < R < 13 kpc. Other Gaia DR3 realizations include Ou et al. (2023, green circles), Wang et al. (2023, magenta downward pointing triangles), and Zhou et al. (2023, purple triangles).

The new Gaia realization does not go very far out, and has larger uncertainties. That doesn’t mean it is worse; it might simply be more conservative in estimating uncertainties, and not making a claim where the data don’t substantiate it. Neither does that mean the other realizations are wrong: these differences are what happens in different analyses. Indeed, all the independent realizations of the Gaia data are pretty consistent, despite the different stellar selection criteria and analysis techniques. This is especially true for R < 17 kpc where there are lots of stars informing the measurements. Even beyond that, I would say they are consistent at the level we’d expect for astronomy.

Zooming out to compare with other results:

The Milky Way rotation curve. The model line from McGaugh (2018) is shown with data from various sources. The abscissa switches from linear to logarithmic at 10 kpc to wedge it all in. The location of the Large Magellanic Cloud at 50 kpc is noted. Gaia DR3 data (Poder et al., Ou et al., Wang et al., and Zhou et al.) are shown as in the plot above. The small black squares are the Gaia DR2 realization of Eilers et al. (2019) reanalyzed to include the effect of bumps and wiggles by McGaugh (2019). Non-Gaia data include blue horizontal branch stars (light blue squares) and red giants (red squares) in the stellar halo (Bird et al. 2022), globular clusters (Watkins et al. 2019, pink triangles), VVV stars (Portail et al. 2017, dark grey squares at R < 2.2 kpc), and terminal velocities (McClure-Griffiths & Dickey 2007, 2016, light grey points from 3 < R < 8 kpc). These terminal velocities are the only data that inform the model line; everything else follows.

Overall, I would say the data paint a pretty consistent picture. The biggest tension amongst the data illustrated here is between the outermost Gaia points around R = 25 kpc and the corresponding results from halo stars. One is consistent with the model line and the other is not. We shouldn’t allow the model to inform our interpretation; the important point is that the independent data disagree with each other. This happens all the time in astronomy. Sometimes it boils down to different assumptions; sometimes it is a real discrepancy. Either way, one has to learn* to cope.

The sharp-eyed will also notice an apparent tension between the DR2 data (black squares) and DR3 around 6 and 7 kpc. This is not real – it is an artifact of different treatments of the term in the Jeans equation for the logarithmic derivative of the density profile of the tracer particles. That’s a choice made in the analysis. The data are entirely consistent when treated consistently.

Putting on an empiricist’s hat, I will say that the kink in the slope of the Gaia data around R = 18 kpc looks unnatural. That doesn’t happen in other galaxies. Rather than belabor the point further, I’ll simply say that this is how things mostly go right but also a little wrong. This is as good as we can hope for in [extra]galactic astronomy.

In contrast, it is easy to go very wrong. To give an example, here is a model of the Milky Way that was built to approximately match the rotation curve of Sofue (2020).


Fig. 1 from Dai et al. (2022). Note the logarithmic abscissa. Their caption: The rotation curve of the Milky Way. The data (solid dark circles with error bars) for r < 100 kpc come from [22], while for r > 100 kpc from [23]. The solid, dashed and doted lines describe the contribution from the bulge, stellar disk and dark matter halo respectively, within a ΛCDM model of the galaxy. The dashed-dot line is the total contribution of all three components. The parameters of each component are taken from [24]. For comparison, the Milky way rotation curve from Gaia DR2 is shown in color. The red dots are data from [34], the blue upward-pointing triangles are from [35], while the cyan downward-pointing triangles are from [36].

This realization of the rotation curve is very different from that seen above. Note that the rotation curve (black points) is very different from that of Gaia (red points) over the same radial range. These independent data are inconsistent; at least one of them is wrong. The data extend to very large radii, encompassing not only the LMC but also Andromeda (780 kpc away). I am already concerned about the effects of the LMC at 50 kpc; Andromeda is twice the baryonic mass of the Milky Way so anything beyond 260 kpc is more Andromeda’s territory than ours – depending on which side we’re talking about. The uncertainties are so big out there they provide no constraining power anyway.

In terms of MOND-required perfection, things fall apart for the Dai model already at very small radii. Dai et al. (2022) chose to fit their bulge component to the high amplitude terminal velocities of Sofue. That’s a reasonable thing to do, if we think the terminal velocities represent circular motion. Because of the non-circular motions that sustain the Galactic bar, they almost certainly do not – that’s why I restricted use of terminal velocities to larger radii. We also know something about the light distribution:

The inner 3 kpc of the Milky Way. The circles are the terminal velocities of Sofue (2020); the squares are the equivalent circular velocity of the potential reconstructed from the kinematics of stars in the VVV survey (Portail et al. 2017). The line is the bulge-bar model of McGaugh (2008) based on the light distribution reported by Binney et al (1997).

This is essentially the same graph as I showed before, but showing only the Newtonian bulge-bar component, and on a logarithmic abscissa for comparison with the plot of Dai et al. The two bulge models are very different. That of Dai et al. is more massive and more compact, as required to match the terminal velocities. There may be galaxies out there that look like this, but the Milky Way is not one of them.

Indeed, Newton’s prediction for the rotation curve of the bulge-bar component – the line labeled bulge/bar based on what the Milky Way looks like – is in good agreement with the effective circular speed curve obtained from stellar data. It is not consistent with the terminal velocities. We could increase the amplitude of the Newtonian prediction by increasing the mass-to-light ratio of the stars (I have adopted the value I expect for stellar populations), but the shape would still be wrong. This does not come as a surprise to most Galactic astronomers, because we know there is a bar in the center of the Milky Way and we know that bars induce non-circular motions, so we do not expect the terminal velocities to be a fair tracer of the rotation curve in this region. That’s why Portail et al. had to go to great lengths in their analysis to reconstruct the equivalent circular velocity, as did I just to build the bulge-bar model.

The thing about predicting rotation curves from the observed mass, as MOND does, is that you have to get both the kinematic data and the mass distribution right. The velocity predicted at any radius depends on the mass enclosed by that radius. So if we get the bulge badly wrong, everything spirals down the drain from there.

Dai et al. (2022) compare their model to the acceleration residuals predicted by MOND for their mass model. If all is well, the data should scatter around the constant line at zero in this graph:

Fig. 4 from Dai et al. (2022). Their caption: [The radial acceleration relation] recast as a comparison between the total acceleration, a, and the MOND prediction, aM, as a function of the acceleration due to baryons aB. The solid horizontal line is a = aM. The circles and squares with error bars represent the Milky Way and M31 data, while the gray dots are from the EAGLE simulation of ΛCDM in [1]. For aB > 10⁻¹⁰ m/s² any difference between a and aM is unclear. However, once aB drops well below 10⁻¹¹ m/s², the discrepancy emerges. The short-dashed line is the ΛCDM fitting curve of the MW. The dash-dot line is the ΛCDM fitting curve of M31. The mass range** of galaxies in EAGLE’s data is chosen to be between 5 × 10¹⁰ M⊙ to 5 × 10¹¹ M⊙. For comparison, the Milky way rotation curve from GAIA data release II is shown in color. The red dots are data from [34], the blue triangles are from [35], while the cyan down triangles are from [36]. While the EAGLE simulation does not match the data perfectly, these plots indicate that it is much easier to accommodate a systematic downward trend with the ΛCDM model than with MOND.

Things are not well.

The interpretation that is offered (right in the figure caption) is that MOND is wrong and the LCDM-based EAGLE simulation does a better if not perfect job of explaining things. We already know that’s not right. The alternate interpretation is that this is not a valid representation of the prediction of MOND, because their mass model does not follow from the observed distribution of light. They get neither the baryonic mass distribution and its predicted acceleration aB nor the total acceleration a right in the plot above.

In terms of dark matter, the model of Dai et al. may appear viable. In terms of MOND, it is way off, not just a little off. The residuals are only zero, as they should be, for a narrow range of accelerations, 2 to 3 × 10⁻¹⁰ m/s/s. That’s more Newton than MOND, and appears to correspond to the limited range in radii over which their model matches the rotation curve data in their Fig. 1 (roughly 4 to 6 kpc). It doesn’t really fit the data elsewhere, and the restrictions on a MOND fit are considerably more stringent than on the sort of dark matter model they construct: there’s no reason to expect their model to behave like MOND in the first place.

And, hoo boy, does it ever not behave like MOND. Look at how far those red points – the Gaia DR2 data – deviate from zero in their Fig. 4. Those are the exact same data that agree well with the model line I show above – the data that were correctly predicted in advance. This model is a reasonable representation of the radial force predicted by MOND, with the blue line in my plot being equivalent to the zero line in theirs.

This is how things can go badly wrong. To properly apply MOND, we need to measure both the kinematics and baryonic mass distribution correctly. If we screw either up, as is easy to do in astronomy, then the result will look very wrong, even if it shouldn’t. Combine this with the eagerness many people have to dismiss MOND outright, and you wind up with lots of articles claiming that MOND is wrong – even when that’s not really the story the data tell. Happens over and over again, so the field remains stagnant.


*This is a large part of the cultural difference between physics and astronomy. Physicists are spoiled by laboratory experiments done in controlled conditions in which one can measure to the sixth place of decimals. In contrast, astronomy is an observational rather than experimental science. We can’t put the universe in a box and control all the systematics – measuring most quantities to 1% is a tall order. Consequently, astronomers are used to being wrong. While I wouldn’t say that astronomers cope with it gracefully, they’re well aware that it happens, that it has happened a lot historically, and will continue to happen in the future. It is a risk we all take in trying to understand a universe so much vaster than ourselves. This makes astronomers rather more tolerant of surprising results – results where the first response is “that can’t be right!” but also informed by the experience that “we’ve been wrong before!” Physicists coming to the field generally lack this experience and take the error bars way too seriously. I notice this attitude is creeping into the younger generation of astronomers; people who’ve received their data from distant observatories and performed CPU-intensive MCMC error analyses, so want to believe them, but often lack the experience of dozens of nights spent at the observatory sweating a thousand ill-controlled but consequential details, like walking out to a beautiful sunrise decorated by wisps of cirrus clouds. When did those arrive?!?


**The data that define the radial acceleration relation come from galaxies spanning six decades in stellar mass, so this one decade range from the simulations is tiny – it is literally comparing a factor of ten to a factor of a million. What happens outside the illustrated mass range? Are lower masses even resolved?

Recent Developments Concerning the Gravitational Potential of the Milky Way. III. A Closer Look at the RAR Model


I am primarily an extragalactic astronomer – someone who studies galaxies outside our own. Our home Galaxy is a subject in its own right. Naturally, I became curious how the Milky Way appeared in the light of the systematic behaviors we have learned from external galaxies. I first wrote a paper about it in 2008; in the process I realized that I could use the RAR to infer the distribution of stellar mass from the terminal velocities observed in interstellar gas. That’s not necessary in external galaxies, where we can measure the light distribution, but we don’t get a view of the whole Galaxy from our location within it. Still, it wasn’t my field, so it wasn’t until 2015/16 that I did the exercise in detail. Shortly after that, the folks who study the supermassive black hole at the center of the Galaxy provided a very precise constraint on the distance there. That was the one big systematic uncertainty in my own work up to that point, but I had guessed well enough, so it didn’t make a big change. Still, I updated the model to the new distance in 2018, and provided its details on my model page so anyone could use it. Then Gaia data started to pour in, which was overwhelming, but I found I really didn’t need to do any updating: the second data release indicated a declining rotation curve at exactly the rate the model predicted: -1.7 km/s/kpc. So far so good.

I call it the RAR model because it only involves the radial force. All I did was assume that the Milky Way was a typical spiral galaxy that followed the RAR, and ask what the mass distribution of the stars needed to be to match the observed terminal velocities. This is a purely empirical exercise that should work regardless of the underlying cause of the RAR, be it MOND or something else. Of course, MOND is the only theory that explicitly predicted the RAR ahead of time, but we’ve gone to great lengths to establish that the RAR is present empirically whether we know about MOND or not. If we accept that the cause of the RAR is MOND, which is the natural interpretation, then MOND over-predicts the vertical motions by a bit. That may be an important clue, either into how MOND works (it doesn’t necessarily follow the most naive assumption) or how something else might cause the observed MONDian phenomenology, or it could just be another systematic uncertainty of the sort that always plagues astronomy. Here I will focus on the RAR model, highlighting specific radial ranges where the details of the RAR model provide insight that can’t be obtained in other ways.
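Here is a minimal sketch of the inversion at the heart of that exercise, assuming the RAR has the functional form published for external galaxies, gobs = gbar/(1 − exp(−√(gbar/a0))). Given an observed velocity interpreted as circular motion, one gets gobs = V²/R, inverts the relation numerically for gbar, and that Newtonian acceleration is what constrains the enclosed baryonic mass. The last line uses a spherical shorthand purely for illustration; the actual model uses a proper disk geometry.

```python
import numpy as np
from scipy.optimize import brentq

A0 = 1.2e-10            # m/s^2
KPC, KMS = 3.086e19, 1.0e3
G, MSUN = 6.674e-11, 1.989e30

def rar(gbar):
    """RAR fitting function for external galaxies (McGaugh, Lelli, & Schombert 2016)."""
    return gbar / (1.0 - np.exp(-np.sqrt(gbar / A0)))

def gbar_from_gobs(gobs):
    """Numerically invert the RAR: find the Newtonian gbar that maps to gobs."""
    return brentq(lambda g: rar(g) - gobs, 1e-15, 1e-7)

# e.g. V = 220 km/s observed at R = 8 kpc (illustrative numbers):
R, V = 8.0 * KPC, 220.0 * KMS
gobs = V**2 / R
gbar = gbar_from_gobs(gobs)
# spherical shorthand for the implied enclosed baryonic mass (illustrative only):
M_enclosed = gbar * R**2 / G / MSUN
print(f"gobs = {gobs:.2e}, gbar = {gbar:.2e} m/s^2, M(<R) ~ {M_enclosed:.2e} Msun")
```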

The RAR Milky Way model was fit to the terminal velocity data (in grey) over the radial range 3 < R < 8 kpc. Everything outside of that range is a prediction. It is not a prediction limited to that skinny blue line, as I have to extrapolate the mass distribution of the Milky Way to arbitrarily large radii. If there is a gradient in the mass-to-light ratio, or even if I guess a little wrong in the extrapolation, it’ll go off at some point. It shouldn’t be far off, as V(R) is mostly fixed by the enclosed mass. Mostly. If there is something else out there, it’ll be higher (like the cyan line including an estimate of the coronal gas in the plot that goes out to 130 kpc). If there is a bit less than the extrapolation, it’ll be lower.

The RAR model Milky Way (blue line) together with the terminal velocities to which it was fit (light grey points), VVV data in the inner 2.2 kpc (dark grey squares), and the Zhou et al. (2023) realization of the Gaia DR3 data. Also shown are the number of stars per bin from Gaia (right axis).

From 8 to 19 kpc, the Gaia data as realized by Zhou et al. fall bang on the model. They evince exactly the slowly declining rotation curve that was predicted. That’s pretty good for an extrapolation from R < 8 kpc. I’m not aware of any other model that did this well in advance of the observation. Indeed, I can’t think of a way to even make a prediction with a dark matter model. I’ve tried this – a lot – and it is as easy to come up with a model whose rotation curve is rising as one that is falling. There’s nothing in the dark matter paradigm that is predictive at this level of detail.

Beyond R > 19 kpc, the match of the model and Zhou et al. realization of the data is not perfect. It is still pretty damn good by astronomical standards, and better than the Keplerian dotted line. Cosmologists would be wetting themselves with excitement if they could come this close to predicting anything. Heck, they’re known to do that even when they’re obviously wrong*.

If the difference between the outermost data and the blue line is correct, then all it means is that we have to tweak the model to have a bit less mass than assumed in the extrapolation. I call it a tweak because it would be exactly that: a small change to an assumption I was obliged to make in order to do the calculation. I could have assumed something else, and almost did: there is discussion in the literature that the disk of the Milky Way is truncated at 20 kpc. I considered using a mass model with such a feature, but one can’t make it a sharp edge as that introduces numerical artifacts when solving the Poisson equation numerically, as this procedure depends on derivatives that blow up when they encounter sharp features. Presumably the physical truncation isn’t unphysically sharp anyway, rather being a transition to a steeper exponential decline as we sometimes see in other galaxies. However, despite indications of such an effect, there wasn’t enough data to constrain it in a way useful for my model. So rather than introduce a bunch of extra, unconstrained freedom into the model, I made a straight extrapolation from what I had all the way to infinity in the full knowledge that this had to be wrong at some level. Perhaps we’ve found that level.

That said, I’m happy with the agreement of the data with the model as is. The data become very sparse where there is even a hint of disagreement. Where there are thousands of stars per bin in the well-fit portion of the rotation curve, there are only tens per bin outside 20 kpc. When the numbers get that small, one has to start to worry that there are not enough independent samples of phase space. A sizeable fraction of those tens of stars could be part of the same stellar stream, which would bias the results to that particular unrepresentative orbit. I don’t know if that’s the case, which is the point: it is just one of the many potential systematic uncertainties that are not represented in the formal error bars. Missing those last five points by two sigma is as likely to be an indication that the error bars have been underestimated as it is to be an indication that the model is inadequate. Trying to account for this sort of thing is why the error bars of Jiao et al. are so much bigger than the formal uncertainties in the three realization papers.

That’s the outer regions. The place where the RAR model disagrees the most with the Gaia data is from 5 < R < 8 kpc, which is in the range where it was fit! So what’s going on there?

Again, the data disagree with the data. The stellar data from Gaia disagree with the terminal velocity data from interstellar gas at high significance. The RAR model was fit to the latter, so it must perforce disagree with the former. It is tempting to dismiss one or the other as wrong, but do they really disagree?

Adapted from Fig. 4 of McGaugh (2019). Grey points are the first and fourth quadrant terminal velocity data to which the model (blue line) was matched. The red squares are the stellar rotation curve estimated with Gaia DR2 (DR3 is indistinguishable). The black squares are the stellar rotation curve after adjustment to be consistent with a mass profile that includes spiral arms. This adjustment for self-consistency remedies the apparent discrepancy between gas and stellar data.

In order to build the model depicted above, I chose to split the difference between the first and fourth quadrant terminal velocity data. I fit them separately in McGaugh (2016), where I made the additional point that the apparent difference between the two quadrants is what we expect from an m=2 mode – i.e., a galaxy with spiral arms. That means these velocities are not exactly circular as commonly assumed, and as I must perforce assume to build the model. So I split the difference above in the full knowledge that this is not the exact circular velocity curve of the Galaxy; it’s just the best I can do at present. This is another example of the systematic uncertainties we encounter: the difference between the first and fourth quadrant is real and is telling us that the galaxy is not azimuthally symmetric – as anyone can tell by looking at any spiral galaxy, but it is a detail we’d like to ignore so we can talk about disk+dark matter halo models in the convenient limit of axisymmetry.

Though not perfect – no model is – the RAR model Milky Way is a lot better than models that ignore spiral structure entirely, which is basically all of them. The standard procedure assumes an exponential disk and some form of dark matter halo. Allowance is usually made for a central bulge component, but it is relatively rare to bother to include the interstellar gas, much less consider deviations from a pure exponential disk. Having adopted the approximation of an exponential disk, one inevitably gets a smooth rotation curve like the dashed line below:

Fig. 1 from McGaugh (2019). Red points are the binned fourth quadrant molecular hydrogen terminal velocities to which the model (blue line) has been fit. The dotted lines show the corresponding Newtonian rotation curve of the baryons. The dashed line is the model of Bovy & Rix (2013) built assuming an exponential disk. The inset shows residuals of the models from the data. The exponential model does not and cannot fit these data.

The common assumption of an exponential disk precludes the possibility of fitting the bumps and wiggles observed in the terminal velocities. These occur because of deviations from a pure exponential profile caused by features like spiral arms. By making this assumption, the variations in mass due to spiral arms are artificially smoothed over. They are not there by assumption, and there is no way to recover them in a dark matter fit that doesn’t know about the RAR.

Depending on what one is trying to accomplish, an exponential model may suffice. The Bovy & Rix model shown above is perfectly reasonable for what they were trying to do, which involved the vertical motions of stars, not the bumps and wiggles in the rotation curve. I would say that the result they obtain is in reasonable agreement with the rotation curve, given what they were doing and in full knowledge that we can’t expect to hit every error bar of every datum of every sort. But for the benefit of the chi-square enthusiasts who are concerned about missing a few data points at large radii, the reduced chi-squared of the Bovy & Rix model is 14.35 while that of the RAR model is 0.6. A good fit is around 1, so the RAR model is a good fit while the smooth exponential is terrible – as one can see by eye in the residual inset: the smooth exponential model gets the overall amplitude about right, but hits none of the data. That’s the starting point for every dark matter model that assumes an exponential disk; even if they do a marginally better job of fitting the alleged Keplerian downturn, they’re still a lot worse if we consider the terminal velocity data, the details of which are usually ignored.

If instead we pay attention to the details of the terminal velocity data, we discover that the broad features seen therein are pretty much what we expect for the kinematic signatures of photometrically known spiral arms. That is, the mass density variations inferred by fitting the RAR correspond to spiral arms that are independently known from star counts. We’ve discussed this before.

Spiral structure in the Milky Way (left) as traced by HII regions and Giant Molecular Clouds (GMCs). These correspond to bumps in the surface density profile inferred from kinematics with the RAR (right).

If we accept that the bumps and wiggles in the terminal velocities are tracers of bumps and wiggles in the stellar mass profiles, as seen in external galaxies, then we can return to examining the apparent discrepancy between them and the stellar rotation curve from Gaia. The latter follow from an application of the Jeans equation, which helps us sort out the circular motion from the mildly eccentric orbits of many stars. It includes a term that depends on the gradient of the density profile of the stars that trace the gravitational potential. If we assume an exponential disk, then that term is easily calculated. It is slowly and smoothly varying, and has little impact on the outcome. One can explore variations of the assumed scale length of the disk, and these likewise have little impact, leading us to infer that we don’t need to worry about it. The trouble with this inference is that it is predicated on the assumption of a smooth exponential disk. We are implicitly assuming that there are no bumps and wiggles.
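To make the role of that gradient term concrete, here is a toy sketch. It uses the asymmetric-drift form of the radial Jeans equation with the tilt term neglected, vc² ≈ ⟨vφ⟩² + σφ² − σR²[1 + dln(ν σR²)/dlnR], and simply compares the correction obtained assuming a smooth exponential tracer profile to one where the local density falls more steeply, as it would just outside a spiral arm. All the numbers are invented for illustration; the real calculation uses the actual profiles, but the sign and rough size of the shift are the point.

```python
import numpy as np

# invented, illustrative numbers near R = 7 kpc
R, Rd = 7.0, 2.5            # kpc: radius and assumed exponential scale length
vphi_mean, sig_phi, sig_R = 222.0, 20.0, 35.0   # km/s
dln_sigR2_dlnR = -0.3       # assumed gentle decline of sigma_R^2 with radius

def vc(dln_nu_dlnR):
    """Asymmetric-drift estimate of the circular speed, neglecting the tilt term:
    vc^2 = <vphi>^2 + sig_phi^2 - sig_R^2 * [1 + dln(nu*sig_R^2)/dlnR]."""
    bracket = 1.0 + dln_nu_dlnR + dln_sigR2_dlnR
    return np.sqrt(vphi_mean**2 + sig_phi**2 - sig_R**2 * bracket)

# smooth exponential tracer: dln(nu)/dlnR = -R/Rd
print(vc(-R / Rd))
# same place, but with a locally steeper density gradient than the smooth
# exponential (the extra -1.5 is invented purely for illustration)
print(vc(-R / Rd - 1.5))
# the inferred circular speed goes up by a few km/s, comparable to the
# differences between the various Gaia realizations
```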

The bumps and wiggles are explicitly part of the RAR model. Consequently, the gradient term in the Jeans equation has a modest but important impact on the result. Applying it to the Gaia data, I get the black points:

The red squares are the Gaia DR2 data. The black squares are the same data after including in the Jeans equation the effect of variations in the tracer gradient. This term dominates the uncertainties.

The velocities of the Gaia data in the range illustrated all go up. This systematic effect reconciles the apparent discrepancy between the stellar and gas rotation curves. The red points are highly discrepant from the gray points, but the black points are not. All it took was to drop the assumption of a smooth exponential profile and calculate the density gradient numerically from the data. This difference has a more pronounced impact on rotation curve fits than any of the differences between the various realizations of the Gaia DR3 data – hence my cavalier attitude towards their error bars. Those are not the important uncertainties.
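
As a toy illustration of why this matters (a synthetic profile only, not the actual Galactic data), here is how a single spiral-arm-like bump changes the logarithmic density gradient that feeds into the Jeans equation:

```python
import numpy as np

# Toy sketch: an exponential disk plus one spiral-arm-like bump,
# to show how a local feature perturbs d ln(nu)/d ln(R).
Rd = 2.5                                   # assumed disk scale length [kpc]
R = np.linspace(4.0, 20.0, 200)            # Galactocentric radius [kpc]
Sigma_smooth = np.exp(-R / Rd)             # smooth exponential profile
bump = 1.0 + 0.3 * np.exp(-0.5 * ((R - 10.0) / 0.5) ** 2)
Sigma_bumpy = Sigma_smooth * bump          # same disk with a local overdensity

# Logarithmic gradient computed numerically from the profile itself
grad_smooth = np.gradient(np.log(Sigma_smooth), np.log(R))
grad_bumpy = np.gradient(np.log(Sigma_bumpy), np.log(R))

# The smooth case is simply -R/Rd; the bumpy case swings well away from
# that value near the feature, which is what shifts the inferred velocities.
i = np.argmin(np.abs(R - 10.5))            # just past the bump peak
print(grad_smooth[i], -R[i] / Rd, grad_bumpy[i])
```

Nothing exotic is going on in the correction: it simply uses the gradient the data imply instead of the one an exponential disk assumes.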

Indeed, I caution that we still don’t know what the effective circular velocity of the potential is. I’ve made my best guess by splitting the difference between the first and fourth quadrant terminal velocity data, but I’ve surely not got it perfectly right. One might view the difference between the quadrants as the level at which this quantity is practically unknowable. I don’t think it is quite that bad, but I hope I have at least given the reader some flavor of the hidden systematic uncertainties that we struggle with in astronomy.

It gets worse! At small radii, there is good reason to be wary of the extent to which terminal velocities represent circular motion. Our Galaxy hosts a strong bar, as artistically depicted here:

Artist’s rendition of the Milky Way. Image credit: NASA/JPL-Caltech.

Bars are a rich topic in their own right. They are supported by non-circular orbits that maintain their pattern. Consequently, one does not expect gas in the region where the bar is to be on circular orbits. It is not entirely clear how long the bar in our Galaxy is, but it is at least 3 kpc – which is why I have not attempted to fit data interior to that. I do, however, have to account for the mass in that region. So I built a model based on the observed light distribution. It’s a nifty bit of math to work out the equivalent circular velocity corresponding to a triaxial bar structure, so having done it once I’ve not been keen to do it again. This fixes the shape of the rotation curve in the inner region, though the amplitude may shift up and down with the mass-to-light ratio of the stars, which dominate the gravitational potential at small radii. This deserves its own close up:

Colored points are terminal velocities from Marasco et al. (2017), from both molecular (red) and atomic (green) gas. Light gray circles are from Sofue (2020). These are plotted assuming they represent circular motions, which they do not. Dark gray squares are the equivalent circular velocity inferred from stars in the VVV survey. The black line is the Newtonian mass model for the central bar and disk, and the blue line is the corresponding RAR model as seen above.

Here is another place where the terminal velocities disagree with the stellar data. This time, it is because the terminal velocities do not trace circular motion. If we assume they do, then we get what is depicted above, and for many years, that was thought to be the Galactic rotation curve, complete with a pronounced classical bulge. Many decades later, we know the center of the Galaxy is not dominated by a bulge but rather by a bar, with concomitant non-circular motions – motions that have been observed in the stars and carefully used to reconstruct the equivalent circular velocity curve by Portail et al. (2017). This is exactly what we need to compare to the RAR model.

Note that 2008, when the bar model was constructed, predates 2017 (or the 2016 appearance of the preprint). While it would have been fair to tweak the model as the data improved, this did not prove necessary. The RAR model effectively predicted the inner rotation curve a priori. That’s a considerably more impressive feat than getting the outer slope right, but the model manages both sans effort.

No dark matter model can make an equivalent boast. Indeed, it is not obvious how to do this at all; usually people just make a crude assumption with some convenient approximation like the Hernquist potential and call it a day without bothering to fit the inner data. The obvious prediction for a dark matter model overshoots the inner rotation curve, as there is no room for the cusp predicted in cold dark matter halos – stars dominate the central potential. One can of course invoke feedback to fix this, but it is a post hoc kludge rather than a prediction, and one that isn’t supposed to apply in galaxies as massive as the Milky Way. Unless it needs to, of course.

So, let’s see – the RAR model of the Milky Way reconciles the tension between stellar and interstellar velocity data, indicates density bumps that are in the right location to correspond to actual spiral arms, matches the effective circular velocity curve determined for stars in the Galactic bar, correctly predicted the slope of the rotation curve outside the solar circle out to at least 19 kpc, and is consistent with the bulk of the data at much larger radii. That’s a pretty successful model. Some realizations of the Gaia DR3 data are a bit lower than predicted, but others are not. Hopefully our knowledge of the outer rotation curve will continue to improve. Maybe the day will come when the data have improved to the point where the model needs to be tweaked a little bit, but it is not this day.


*To give one example, the BICEP II experiment infamously claimed in March of 2014 to have detected the Inflationary signal of primordial gravitational waves in their polarization data. They held a huge press conference to announce the result in clear anticipation of earning a Nobel prize. They did this before releasing the science paper, much less hearing back from a referee. When they did release the science paper, it was immediately obvious on inspection that they had incorrectly estimated the dust foreground. Their signal was just that – excess foreground emission. I could see that in a quick glance at the relevant figure as soon as the paper was made available. Literally – I picked it up, scanned through it, saw the relevant figure, and could immediately spot where they had gone wrong. And yet this huge group of scientists all signed their name to the submitted paper and hyped it as the cosmic “discovery of the century”. Pfft.

Recent Developments Concerning the Gravitational Potential of the Milky Way. II. A Closer Look at the Data

Continuing from last time, let’s compare recent rotation curve determinations from Gaia DR3:

Fig. 1 from Jiao et al. comparing three different realizations of the Galactic rotation curve from Gaia DR3. The vertical lines* mark the range of the Ou et al. data considered by Chan & Chung Law (2023).

These are different analyses of the same dataset. The Gaia data release is immense, with billions of stars. There are gazillions of ways to parse these data. So it is reasonable to have multiple realizations, and we shouldn’t expect them to necessarily agree perfectly: do we look exclusively at K giants? A stars? Only stars with proper motion and/or parallax data more accurate than some limit? etc. Of course we want to understand any differences, but that’s not going to happen here.

My first observation is that the various analyses are broadly consistent. They all show a steady decline over a large range of radii. Nothing shocking there; it is fairly typical for bright, compact galaxies like the Milky Way to have somewhat declining rotation curves. The issue here, of course, is how much, and what does it mean?

Looking more closely, not all of the data agree with each other, or even with themselves. There are offsets between the three at radii around the sun (we live just outside R = 8 kpc), where you’d naively think they would agree best. They’re very consistent over 13 < R < 17 kpc, then they start to diverge a little. The Ou data have a curious uptick right around R = 17 kpc, which I wouldn’t put much stock in; weird kinks like that sometimes happen in astronomical data. A kink like that can’t be consistent with a continuous mass distribution, and it will come up again for other reasons.

As an astronomer, I’m happy with the level of agreement I see here. It is not perfect: there are places where the error bars of points from one data set do not overlap with those of another. That’s normal in astronomy, and one of the reasons that we can never entirely trust the stated uncertainties. Jiao et al. make a thorough and yet still incomplete assessment of the systematic uncertainties, winding up with larger error bars on the Wang et al. realization of the data.

For example, one – just one of the issues we have to contend with – is the distance to each star in the sample. Distances to individual objects are hard, and subject to systematic uncertainties. The reason to choose A stars or K giants is that you think you know their luminosity, so you can estimate their distance. That works, but those estimates aren’t necessarily consistent (let alone correct) among the different groups. That by itself could be the source of the modest difference we see between data sets.

Chan & Chung Law use the Ou et al. realization of the data to make some strong claims. One is that the gradient of the rotation curve is -5 km/s/kpc, which they claim excludes MOND at high confidence. Here is their plot.

You will notice that, as they say, these are the data of Ou et al., identical to the same points in the plot from Jiao et al. above – provided you only look in the range between the lines, 17 < R < 23 kpc. This is where the kink at R = 17 kpc comes in. They appear to have truncated the data right where it needs to be truncated to ignore the point with a noticeably lower velocity, which would surely affect the determination of the slope and reduce its confidence level. They also exclude the point with a really big error bar that nominally is within their radial range. That’s OK, as it has little significance: its large error bar means it contributes little to the constraint. That is not the case for the datum just inside of R = 17 kpc, or the rest of the data at smaller radii for that matter. These have a manifestly shallower slope. Looking at the line boundaries added to Jiao’s plot, it appears that they selected the range of the data with the steepest gradient. This is called cherry-picking.
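
How much the fitted slope depends on the chosen window is easy to demonstrate with a toy example (synthetic points standing in for a binned rotation curve, not the actual Ou et al. measurements):

```python
import numpy as np

# Toy rotation curve: a gentle decline that steepens beyond 17 kpc,
# mimicking the qualitative behavior described above.
R = np.arange(8.0, 24.0, 1.0)                     # radius [kpc]
V = 225.0 - 1.5 * (R - 8.0)                       # gentle decline [km/s]
V = np.where(R >= 17.0, V - 3.0 * (R - 17.0), V)  # steeper stretch at large R

def fitted_slope(rmin, rmax):
    """Linear dV/dR [km/s/kpc] fit over a chosen radial window."""
    m = (R >= rmin) & (R <= rmax)
    return np.polyfit(R[m], V[m], 1)[0]

print(fitted_slope(8.0, 23.0))    # full range: modest average decline
print(fitted_slope(17.0, 23.0))   # restricted window: much steeper slope
```

Restricting the fit to the steepest stretch necessarily returns the steepest slope; that is a statement about the window, not about the Galaxy.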

It is a strange form of cherry-picking, as there is no physical reason to expect a linear fit to be appropriate. A Keplerian downturn has the velocity decline as the inverse square root of radius (see the dotted line above). These data, over this limited range, may be consistent with a Keplerian downturn, but they certainly do not establish that it is required.
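
For concreteness, a Keplerian decline around an enclosed mass M goes as

\[
v(R) = \sqrt{\frac{G M}{R}} \propto R^{-1/2}, \qquad \frac{dv}{dR} = -\frac{v}{2R},
\]

so its local slope changes with radius rather than being a single number. Over a narrow enough window a straight line can approximate almost any gentle decline, which is why a linear fit to six kiloparsecs of data cannot, by itself, establish that the downturn is Keplerian.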

Contrast the statements of Chan & Chung Law with the more measured statement from the paper where the data analysis is actually performed:

… a low mass for the Galaxy is driven by the functional forms tested, given that it probes beyond our measurements. It is found to be in tension with mass measurements from globular clusters, dwarf satellites, and streams.

Ou et al. (2023)

What this means is that the data do not go far enough out to measure the total mass. The low mass that is inferred from the data is a result of fitting some specific choice of halo form to it. They note that the result disagrees with other data, as I discussed last time.

Rather than cherry-pick the data, we should look at all of it. Let’s see, I’ve done that before. We looked at the Wang et al. (2023) data via Jiao et al. previously, and just discussed the Ou et al. data. That leaves the new Zhou et al. data, so let’s look at those:

Milky Way rotation curve with the RAR model (blue line, from 2018) and the Gaia DR3 data as realized by Zhou et al. (2023; purple triangles). The dashed line shows the number of stars (right axis) informing each datum.

These data were the last of the current crop that I looked at. They look… pretty good in comparison with the pre-existing RAR model. Not exactly the falsification I had been led to expect.

So – the three different realizations of the Gaia DR3 data are largely consistent, yet one is being portrayed as a falsification of MOND while another is in good agreement with its prediction.

This is why you have to take astronomical error bars with a grain of salt. Three different groups are using data from the same source to obtain very nearly the same result. It isn’t quite the same result, as some of the data disagree at the formal limits of their uncertainty. No big deal – that’s what happens in astronomy. The number of stars per bin helps illustrate one reason why: we go from thousands of stars per bin near the sun to tens of stars in wider bins at R > 20 kpc. That’s not necessarily problematic, but it is emblematic of what we’re dealing with: great gobs of data up close, but only scarce scratches of it far away where systematic effects are more pernicious.
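
The scaling at work is the usual counting statistics: if a bin averages N stars whose velocities scatter by σ_v, the statistical uncertainty on the mean is roughly

\[
\sigma_{\bar{v}} \approx \frac{\sigma_v}{\sqrt{N}},
\]

so going from thousands of stars per bin to tens inflates the statistical error bars by an order of magnitude, before any of the systematic effects even enter.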

In the meantime, one realization of these data is being portrayed as a death knell for a theory that successfully predicts another realization of the same data. Well, which is it?


*Thanks to Moti Milgrom for pointing out the restricted range of radii considered by Chan & Chung Law and adding the vertical lines to this figure.