A Blog About the Science and Sociology of Cosmology and Dark Matter
Stacy McGaugh is an astrophysicist and cosmologist who studies galaxies, dark matter, and theories of modified gravity. He is an expert on low surface brightness galaxies, a class of objects in which the stars are spread thin compared to bright galaxies like our own Milky Way. He demonstrated that these dim galaxies appear to be dark matter dominated, providing unique tests of theories of galaxy formation and modified gravity.
Professor McGaugh is currently the chair of the Department of Astronomy at Case Western Reserve University in Cleveland, Ohio, and director of the Warner and Swasey Observatory. Previously he was a member of the faculty at the University of Maryland, having also held research fellowships at Rutgers, the Department of Terrestrial Magnetism of the Carnegie Institution of Washington, and the Institute of Astronomy at the University of Cambridge after earning his Ph.D. from the University of Michigan.
One experience I’ve frequently had in Astronomy is that there is no result so obvious that someone won’t claim the exact opposite. Indeed, the more obvious the result, the louder the claim to contradict it.
There is a very obvious acceleration scale in galaxies. It can be seen in several ways. Here I describe a nice way that is completely independent of any statistics or model fitting: no need to argue over how to set priors.
Simple dimensional analysis shows that a galaxy with a flat rotation curve has a characteristic acceleration
g† = 0.8 Vf4/(G Mb)
where Vf is the flat rotation speed, Mb is the baryonic mass, and G is Newton’s constant. The factor 0.8 arises from the disk geometry of rotating galaxies, which are not spherical cows. This is first year grad school material: see Binney & Tremaine. I include it here merely to place the characteristic acceleration g† on the same scale as Milgrom’s acceleration constant a0.
These are all known numbers or measurable quantities. There are no free parameters: nothing to fiddle; nothing to fit. The only slightly tricky quantity is the baryonic mass, which is the sum of stars and gas. For the stars, we measure the light but need the mass, so we must adopt a mass-to-light ratio, M*/L. Here I adopt the simple model used to construct the radial acceleration relation: a constant 0.5 M⊙/L⊙ at 3.6 microns for galaxy disks, and 0.7 M⊙/L⊙ for bulges. This is the best present choice from stellar population models; the basic story does not change with plausible variations.
This is all it takes to compute the characteristic acceleration of galaxies. Here is the resulting histogram for SPARCgalaxies:
Do you see the acceleration scale? It’s right there in the data.
I first employed this method in 2011, where I found <g†> = 1.24 ± 0.14 Å s-2 for a sample of gas rich galaxies that predates and is largely independent of the SPARC data. This is consistent with the SPARC result <g†> = 1.20 ± 0.02 Å s-2. This consistency provides some reassurance that the mass-to-light scale is near to correct since the gas rich galaxies are not sensitive to the choice of M*/L. Indeed, the value of Milgrom’s constant has not changed meaningfully since Begeman, Broeils, & Sanders (1991).
The width of the acceleration histogram is dominated by measurement uncertainties and scatter in M*/L. We have assumed that M*/L is constant here, but this cannot be exactly true. It is a good approximation in the near-infrared, but there must be some variation from galaxy to galaxy, as each galaxy has its own unique star formation history. Intrinsic scatter in M*/L due to population difference broadens the distribution. The intrinsic distribution of characteristic accelerations must be smaller.
I have computed the scatter budget many times. It always comes up the same: known uncertainties and scatter in M*/L gobble up the entire budget. There is very little room left for intrinsic variation in <g†>. The upper limit is < 0.06 dex, an absurdly tiny number by the standards of extragalactic astronomy. The data are consistent with negligible intrinsic scatter, i.e., a universal acceleration scale. Apparently a fundamental acceleration scale is present in galaxies.
The radial acceleration relation connects what we see in visible mass with what we get in galaxy dynamics. This is true in a statistical sense, with remarkably little scatter. The SPARC data are consistent with a single, universal force law in galaxies. One that appears to be sourced by the baryons alone.
This was not expected with dark matter. Indeed, it would be hard to imagine a less natural result. We can only salvage the dark matter picture by tweaking it to make it mimic its chief rival. This is not a healthy situation for a theory.
On the other hand, if these results really do indicate the action of a single universal force law, then it should be possible to fit each individual galaxy. This has been done manytimesbefore, with surprisingly positive results. Does it work for the entirety of SPARC?
For the impatient, the answer is yes. Graduate student Pengfei Li has addressed this issue in a paper in press at A&A. There are some inevitable goofballs; this is astronomy after all. But by and large, it works much better than I expected – the goof rate is only about 10%, and the worst goofs are for the worst data.
Fig. 1 from the paper gives the example of NGC 2841. This case has been historically problematic for MOND, but a good fit falls out of the Bayesian MCMC procedure employed. We marginalize over the nuisance parameters (distance and inclination) in addition to the stellar mass-to-light ratio of disk and bulge. These come out a tad high in this case, but everything is within the uncertainties. A long standing historical problem is easily solved by application of Bayesian statistics.
Another example is provided by the low surface brightness (LSB) dwarf galaxy IC 2574. Note that like all LSB galaxies, it lies at the low acceleration end of the RAR. This is what attracted my attention to the problem a long time ago: the mass discrepancy is large everywhere, so conventionally dark matter dominates. And yet, the luminous matter tells you everything you need to know to predict the rotation curve. This makes no physical sense whatsoever: it is as if the baryonic tail wags the dark matter dog.
In this case, the mass-to-light ratio of the stars comes out a bit low. LSB galaxies like IC 2574 are gas rich; the stellar mass is pretty much an afterthought to the fitting process. That’s good: there is very little freedom; the rotation curve has to follow almost directly from the observed gas distribution. If it doesn’t, there’s nothing to be done to fix it. But it is also bad: since the stars contribute little to the total mass budget, their mass-to-light ratio is not well constrained by the fit – changing it a lot makes little overall difference. This renders the formal uncertainty on the mass-to-light ratio highly dubious. The quoted number is correct for the data as presented, but it does not reflect the inevitable systematic errors that afflict astronomical observations in a variety of subtle ways. In this case, a small change in the innermost velocity measurements (as happens in the THINGS data) could change the mass-to-light ratio by a huge factor (and well outside the stated error) without doing squat to the overall fit.
We can address statistically how [un]reasonable the required fit parameters are. Short answer: they’re pretty darn reasonable. Here is the distribution of 3.6 micron band mass-to-light ratios.
From a stellar population perspective, we expect roughly constant mass-to-light ratios in the near-infrared, with some scatter. The fits to the rotation curves give just that. There is no guarantee that this should work out. It could be a meaningless fit parameter with no connection to stellar astrophysics. Instead, it reproduces the normalization, color dependence, and scatter expected from completely independent stellar population models.
The stellar mass-to-light ratio is practically inaccessible in the context of dark matter fits to rotation curves, as it is horribly degenerate with the parameters of the dark matter halo. That MOND returns reasonable mass-to-light ratios is one of those important details that keeps me wondering. It seems like there must be something to it.
Unsurprisingly, once we fit the mass-to-light ratio and the nuisance parameters, the scatter in the RAR itself practically vanishes. It does not entirely go away, as we fit only one mass-to-light ratio per galaxy (two in the handful of cases with a bulge). The scatter in the individual velocity measurements has been minimized, but some remains. The amount that remains is tiny (0.06 dex) and consistent with what we’d expect from measurement errors and mild asymmetries (non-circular motions).
For those unfamiliar with extragalactic astronomy, it is common for “correlations” to be weak and have enormous intrinsic scatter. Early versions of the Tully-Fisher relation were considered spooky-tight with a mere 0.4 mag. of scatter. In the RAR we have a relation as near to perfect as we’re likely to get. The data are consistent with a single, universal force law – at least in the radial direction in rotating galaxies.
That’s a strong statement. It is hard to understand in the context of dark matter. If you think you do, you are not thinking clearly.
So how strong is this statement? Very. We tried fits allowing additional freedom. None is necessary. One can of course introduce more parameters, but we find that no more are needed. The bare minimum is the mass-to-light ratio (plus the nuisance parameters of distance and inclination); these entirely suffice to describe the data. Allowing more freedom does not meaningfully improve the fits.
For example, I have often seen it asserted that MOND fits require variation in the acceleration constant of the theory. If this were true, I would have zero interest in the theory. So we checked.
Here we learn something important about the role of priors in Bayesian fits. If we allow the critical acceleration g† to vary from galaxy to galaxy with a flat prior, it does indeed do so: it flops around all over the place. Aha! So g† is not constant! MOND is falsified!
Well, no. Flat priors are often problematic, as they have no physical motivation. By allowing for a wide variation in g†, one is inviting covariance with other parameters. As g† goes wild, so too does the mass-to-light ratio. This wrecks the stellar mass Tully-Fisher relation by introducing a lot of unnecessary variation in the mass-to-light ratio: luminosity correlates nicely with rotation speed, but stellar mass picks up a lot of extraneous scatter. Worse, all this variation in both g† and the mass-to-light ratio does very little to improve the fits. It does a tiny bit – χ2 gets infinitesimally better, so the fitting program takes it. But the improvement is not statistically meaningful.
In contrast, with a Gaussian prior, we get essentially the same fits, but with practically zero variation in g†. wee The reduced χ2 actually gets a bit worse thanks to the extra, unnecessary, degree of freedom. This demonstrates that for these data, g† is consistent with a single, universal value. For whatever reason it may occur physically, this number is in the data.
We have made the SPARC data public, so anyone who wants to reproduce these results may easily do so. Just mind your priors, and don’t take every individual error bar too seriously. There is a long tail to high χ2 that persists for any type of model. If you get a bad fit with the RAR, you will almost certainly get a bad fit with your favorite dark matter halo model as well. This is astronomy, fergodssake.
A recently discovered dwarf galaxy designated NGC1052-DF2 has been in the news lately. Apparently a satellite of the giant elliptical NGC 1052, DF2 (as I’ll call it from here on out) is remarkable for having a surprisingly low velocity dispersion for a galaxy of its type. These results were reported in Nature last week by van Dokkum et al., and have caused a bit of a stir.
It is common for giant galaxies to have some dwarf satellite galaxies. As can be seen from the image published by van Dokkum et al., there are a number of galaxies in the neighborhood of NGC 1052. Whether these are associated physically into a group of galaxies or are chance projections on the sky depends on the distance to each galaxy.
NGC 1052 is listed by the NASA Extragalactic Database (NED) as having a recession velocity of 1510 km/s and a distance of 20.6 Mpc. The next nearest big beastie is NGC 1042, at 1371 km/s. The difference of 139 km/s is not much different from 115 km/s, which is the velocity that Andromeda is heading towards the Milky Way, so one could imagine that this is a group similar to the Local Group. Except that NED says the distance to NGC 1042 is 7.8 Mpc, so apparently it is a foreground object seen in projection.
Van Dokkum et al. assume DF2 and NGC 1052 are both about 20 Mpc distant. They offer two independent estimates of the distance, one consistent with the distance to NGC 1052 and the other more consistent with the distance to NGC 1042. Rather than wring our hands over this, I will trust their judgement and simply note, as they do, that the nearer distance would change many of their conclusions. The redshift is 1803 km/s, larger than either of the giants. It could still be a satellite of NGC 1052, as ~300 km/s is not unreasonable for an orbital velocity.
So why the big fuss? Unlike most galaxies in the universe, DF2 appears not to require dark matter. This is inferred from the measured velocity dispersion of ten globular clusters, which is 8.4 km/s. That’s fast to you and me, but rather sluggish on the scale of galaxies. Spread over a few kiloparsecs, that adds up to a dynamical mass about equal to what we expect for the stars, leaving little room for the otherwise ubiquitous dark matter.
This is important. If the universe is composed of dark matter, it should on occasion be possible to segregate the dark from the light. Tidal interactions between galaxies can in principle do this, so a galaxy devoid of dark matter would be good evidence that this happened. It would also be evidence against a modified gravity interpretation of the missing mass problem, because the force law is always on: you can’t strip it from the luminous matter the way you can dark matter. So ironically, the occasional galaxy lacking dark matter would constitute evidence that dark matter does indeed exist!
DF2 appears to be such a case. But how weird is it? Morphologically, it resembles the dwarf spheroidal satellite galaxies of the Local Group. I have a handy compilation of those (from Lelli et al.), so we can compute the mass-to-light ratio for all of these beasties in the same fashion, shown in the figure below. It is customary to refer quantities to the radius that contains half of the total light, which is 2.2 kpc for DF2.
Perhaps the most obvious respect in which DF2 is a bit unusual relative to the dwarfs of the Local Group is that it is big and bright. Most nearby dwarfs have half light radii well below 1 kpc. After DF2, the next most luminous dwarfs is Fornax, which is a factor of 5 lower in luminosity.
DF2 is called an ultradiffuse galaxy (UDG), which is apparently newspeak for low surface brightness (LSB) galaxy. I’ve been working on LSB galaxies my entire career. While DF2 is indeed low surface brightness – the stars are spread thin – I wouldn’t call it ultra diffuse. It is actually one of the higher surface brightness objects of this type. Crater 2 and And XIX (the leftmost points in the right panel) are ultradiffuse.
Astronomers love vague terminology, and as a result often reinvent terms that already exist. Dwarf, LSB, UDG, have all been used interchangeably and with considerable slop. I was sufficiently put out by this that I tried to define some categories is the mid-90s. This didn’t catch on, but by my definition, DF2 is VLSB – very LSB, but only by a little – it is much closer to regular LSB than to extremely (ELSB). Crater 2 and And XIX, now they’re ELSB, being more diffuse than DF2 by 2 orders of magnitude.
Whatever you call it, DF2 is low surface brightness, and LSB galaxies are always dark matter dominated. Always, at least among disk galaxies: here is the analogous figure for galaxies that rotate:
Pressure supported dwarfs generally evince large mass discrepancies as well. So in this regard, DF2 is indeed very unusual. So what gives?
Perhaps DF2 formed that way, without dark matter. This is anathema to everything we know about galaxy formation in ΛCDM cosmology. Dark halos have to form first, with baryons following.
Perhaps DF2 suffered one or more tidal interactions with NGC 1052. Sub-halos in simulations are often seen to be on highly radial orbits; perhaps DF2 has had its dark matter halo stripped away by repeated close passages. Since the stars reside deep in the center of the subhalo, they’re the last thing to be stripped away. So perhaps we’ve caught this one at that special time when the dark matter has been removed but the stars still remain.
This is improbable, but ought to happen once in a while. The bigger problem I see is that one cannot simply remove the dark matter halo like yanking a tablecloth and leaving the plates. The stars must respond to the change in the gravitational potential; they too must diffuse away. That might be a good way to make the galaxy diffuse, ultimately perhaps even ultradiffuse, but the observed motions are then not representative of an equilibrium situation. This is critical to the mass estimate, which must perforce assume an equilibrium in which the gravitational potential well of the galaxy is balanced against the kinetic motion of its contents. Yank away the dark matter halo, and the assumption underlying the mass estimate gets yanked with it. While such a situation may arise, it makes it very difficult to interpret the velocities: all tests are off. This is doubly true in MOND, in which dwarfs are even more susceptible to disruption.
Then there are the data themselves. Blaming the data should be avoided, but it does happen once in a while that some observation is misleading. In this case, I am made queasy by the fact that the velocity dispersion is estimated from only ten tracers. I’ve seen plenty of cases where the velocity dispersion changes in important ways when more data are obtained, even starting from more than 10 tracers. Andromeda II comes to mind as an example. Indeed, several people have pointed out that if we did the same exercise with Fornax, using its globular clusters as the velocity tracers, we’d get a similar answer to what we find in DF2. But we also have measurements of many hundreds of stars in Fornax, so we know that answer is wrong. Perhaps the same thing is happening with DF2? The fact that DF2 is an outlier from everything else we know empirically suggests caution.
Throwing caution and fact-checking to the wind, many people have been predictably eager to cite DF2 as a falsification of MOND. Van Dokkum et al. point out the the velocity dispersion predicted for this object by MOND is 20 km/s, more than a factor of two above their measured value. They make the MOND prediction for the case of an isolated object. DF2 is not isolated, so one must consider the external field effect (EFE).
The criterion by which to judge isolation in MOND is whether the acceleration due to the mutual self-gravity of the stars is less than the acceleration from an external source, in this case the host NGC 1052. Following the method outlined by McGaugh & Milgrom, and based on the stellar mass (adopting M/L=2 as both we and van Dokkum assume), I estimate an internal acceleration of DF2 to be gin = 0.15 a0. Here a0 is the critical acceleration scale in MOND, 1.2 x 10-10 m/s/s. Using this number and treating DF2 as isolated, I get the same 20 km/s van Dokkum et al. estimate.
Estimating the external field is more challenging. It depends on the mass of NGC 1052, and the separation between it and DF2. The projected separation at the assumed distance is 80 kpc. That is well within the range that the EFE is commonly observed to matter in the Local Group. It could be a bit further granted some distance along the line of sight, but if this becomes too large then the distance by association with NGC 1052 has to be questioned, and all bets are off. The mass of NGC 1052 is also rather uncertain, or at least I have heard wildly different values quoted in discussions about this object. Here I adopt 1011 M☉ as estimated by SLUGGS. To get the acceleration, I estimate the asymptotic rotation velocity we’d expect in MOND, V4 = a0GM. This gives 200 km/s, which is conservative relative to the ~300 km/s quoted by van Dokkum et al. At a distance of 80 kpc, the corresponding external acceleration gex = 0.14 a0. This is very uncertain, but taken at face value is indistinguishable from the internal acceleration. Consequently, it cannot be ignored: the calculation published by van Dokkum et al. is not the correct prediction for MOND.
The velocity dispersion estimator in MOND differs when gex < gin and gex > gin (see equations 2 and 3 of McGaugh & Milgrom). Strictly speaking, these apply in the limits where one or the other field dominates. When they are comparable, the math gets more involved (see equation 59 of Famaey & McGaugh). The input data are too uncertain to warrant an elaborate calculation for a blog, so I note simply that the amplitude of the mass discrepancy in MOND depends on how deep in the MOND regime a system is. That is, how far below the critical acceleration scale it is. The lower the acceleration, the larger the discrepancy. This is why LSB galaxies appear to be dark matter dominated; their low surface densities result in low accelerations.
For DF2, the absolute magnitude of the acceleration is approximately doubled by the presence of the external field. It is not as deep in the MOND regime as assumed in the isolated case, so the mass discrepancy is smaller, decreasing the MOND-predicted velocity dispersion by roughly the square root of 2. For a factor of 2 range in the stellar mass-to-light ratio (as in McGaugh & Milgrom), this crude MOND prediction becomes
σ = 14 ± 4 km/s.
Like any erstwhile theorist, I reserve the right to modify this prediction granted more elaborate calculations, or new input data, especially given the uncertainties in the distance and mass of the host. Indeed, we should consider the possibility of tidal disruption, which can happen in MOND more readily than with dark matter. Indeed, at one point I came very close to declaring MOND dead because the velocity dispersions of the ultrafaint dwarf galaxies were off, only realizing late in the day that MOND actually predicts that these things should be getting tidally disrupted (as is also expected, albeit somewhat differently, in ΛCDM), so that the velocity dispersions might not reflect the equilibrium expectation.
In DF2, the external field almost certainly matters. Barring wild errors of the sort discussed or unforeseen, I find it hard to envision the MONDian velocity dispersion falling outside the range 10 – 18 km/s. This is not as high as the 20 km/s predicted by van Dokkum et al. for an isolated object, nor as small as they measure for DF2 (8.4 km/s). They quote a 90% confidence upper limit of 10 km/s, which is marginally consistent with the lower end of the prediction (corresponding to M/L = 1). So we cannot exclude MOND based on these data.
That said, the agreement is marginal. Still, 90% is not very high confidence by scientific standards. Based on experience with such data, this likely overstates how well we know the velocity dispersion of DF2. Put another way, I am 90% confident that when better data are obtained, the measured velocity dispersion will increase above the 10 km/s threshold.
More generally, experience has taught me three things:
In matters of particle physics, do not bet against the Standard Model.
In matters cosmological, do not bet against ΛCDM.
In matters of galaxy dynamics, do not bet against MOND.
The astute reader will realize that these three assertions are mutually exclusive. The dark matter of ΛCDM is a bet that there are new particles beyond the Standard Model. MOND is a bet that what we call dark matter is really the manifestation of physics beyond General Relativity, on which cosmology is based. Which is all to say, there is still some interesting physics to be discovered.
A subject of long-standing interest in extragalactic astronomy is how stars form in galaxies. Some galaxies are “red and dead” – most of their stars formed long ago, and have evolved as stars will: the massive stars live bright but short lives, leaving the less massive ones to linger longer, producing relatively little light until they swell up to become red giants as they too near the end of their lives. Other galaxies, including our own Milky Way, made some stars in the ancient past and are still actively forming stars today. So what’s the difference?
The difference between star forming galaxies and those that are red and dead turns out to be both simple and complicated. For one, star forming galaxies have a supply of cold gas in their interstellar media, the fuel from which stars form. Dead galaxies have very little in the way of cold gas. So that’s simple: star forming galaxies have the fuel to make stars, dead galaxies don’t. But why that difference? That’s a more complicated question I’m not going to begin to touch in this post.
One can see current star formation in galaxies in a variety of ways. These usually relate to the ultraviolet (UV) photons produced by short-lived stars. Only O stars are hot enough to produce the ionizing radiation that powers the emission of HII (pronounced `H-two’) regions – regions of ionized gas that are like cosmic neon lights. O stars power HII regions but live less than 10 million years. That’s a blink of the eye on the cosmic timescale, so if you see HII regions, you know stars have formed recently enough that the short-lived O stars are still around.
Measuring the intensity of the Hα Balmer line emission provides a proxy for the number of UV photons that ionize the gas, which in turn basically counts the number of O stars that produce the ionizing radiation. This number, divided by the short life-spans of O stars, measures the current star formation rate (SFR).
There are many uncertainties in the calibration of this SFR: how many UV photons do O stars emit? Over what time span? How many of these ionizing photons are converted into Hα, and how many are absorbed by dust or manage to escape into intergalactic space? For every O star that comes and goes, how many smaller stars are born along with it? This latter question is especially pernicious, as most stellar mass resides in small stars. The O stars are only the tip of the iceberg; we are using the tip to extrapolate the size of the entire iceberg.
Astronomers have obsessed over these and related questions for a long time. See, for example, the review by Kennicutt & Evans. Suffice it to say we have a surprisingly decent handle on it, and yet the systematic uncertainties remain substantial. Different methods give the same answer to within an order of magnitude, but often differ by a factor of a few. The difference is often in the mass spectrum of stars that is assumed, but even rationalizing that to the same scale, the same data can be interpreted to give different answers, based on how much UV we estimate to be absorbed by dust.
In addition to the current SFR, one can also measure the stellar mass. This follows from the total luminosity measured from starlight. Many of the same concerns apply, but are somewhat less severe because more of the iceberg is being measured. For a long time we weren’t sure we could do better than a factor of two, but this work has advanced to the point where the integrated stellar masses of galaxies can be estimated to ~20% accuracy.
A diagram that has become popular in the last decade or so is the so-called star forming main sequence. This name is made in analogy with the main sequence of stars, the physics of which is well understood. Whether this is an appropriate analogy is debatable, but the terminology seems to have stuck. In the case of galaxies, the main sequence of star forming galaxies is a plot of star formation rate against stellar mass.
The star forming main sequence is shown in the graph below. It is constructed from data from the SINGS survey (red points) and our own work on dwarf low surface brightness (LSB) galaxies (blue points). Each point represents one galaxy. Its stellar mass is determined by adding up the light emitted by all the stars, while the SFR is estimated from the Hα emission that traces the ionizing UV radiation of the O stars.
The data show a nice correlation, albeit with plenty of intrinsic scatter. This is hardly surprising, as the two axes are not physically independent. They are measuring different quantities that trace the same underlying property: star formation over different time scales. The y-axis is a measure of the quasi-instantaneous star formation rate; the x-axis is the SFR integrated over the age of the galaxy.
Since the stellar mass is the time integral of the SFR, one expects the slope of the star forming main sequence (SFMS) to be one. This is illustrated by the diagonal line marked “Hubble time.” A galaxy forming stars at a constant rate for the age of the universe will fall on this line.
The data for LSB galaxies scatter about a line with slope unity. The best-fit line has a normalization a bit less than that of a constant SFR for a Hubble time. This might mean that the galaxies are somewhat younger than the universe (a little must be true, but need not be much), have a slowly declining SFR (an exponential decline with an e-folding time of a Hubble time works well), or it could just be an error in the calibration of one or both axes. The systematic errors discussed above are easily large enough to account for the difference.
To first order, the SFR in LSB galaxies is constant when averaged over billions of years. On the millions of years timescale appropriate to O stars, the instantaneous SFR bounces up and down. Looks pretty stochastic: galaxies form stars at a steady average rate that varies up and down on short timescales.
Short-term fluctuations in the SFR explain the data with current SFR higher than the past average. These are the points that stray into the gray region of the plot, which becomes increasingly forbidden towards the top left. This is because galaxies that form stars so fast for too long will build up their entire stellar mass in the blink of a cosmic eye. This is illustrated by the lines marked as 0.1 and 0.01 of a Hubble time. A galaxy above these lines would make all their stars in < 2 Gyr; it would have had to be born yesterday. No galaxies reside in this part of the diagram. Those that approach it are called “starbursts:” they’re forming stars at a high specific rate (relative to their mass) but this is presumably a brief-lived phenomenon.
Note that the most massive of the SINGS galaxies all fall below the extrapolation of the line fit to the LSB galaxies (dotted line). The are forming a lot of stars in an absolute sense, simply because they are giant galaxies. But the current SFR is lower than the past average, as if they were winding down. This “quenching” seems to be a mass-dependent phenomenon: more massive galaxies evolve faster, burning through their gas supply before dwarfs do. Red and dead galaxies have already completed this process; the massive spirals of today are weary giants that may join the red and dead galaxy population in the future.
One consequence of mass-dependent quenching is that it skews attempts to fit relations to the SFMS. There are very many such attempts in the literature; these usually have a slope less than one. The dashed line in the plot above gives one specific example. There are many others.
If one looks only at the most massive SINGS galaxies, the slope is indeed shallower than one. Selection effects bias galaxy catalogs strongly in favor of the biggest and brightest, so most work has been done on massive galaxies with M* > 1010 M☉. That only covers the top one tenth of the area of this graph. If that’s what you’ve got to work with, you get a shallow slope like the dashed line.
The dashed line does a lousy job of extrapolating to low mass. This is obvious from the dwarf galaxy data. It is also obvious from the simple mathematical considerations outlined above. Low mass galaxies could only fall on the dashed line if they were born yesterday. Otherwise, their high specific star formation rates would over-produce their observed stellar mass.
Despite this simple physical limit, fits to the SFMS that stray into the forbidden zone are ubiquitous in the literature. In addition to selection effects, I suspect the calibrations of both SFR and stellar mass are in part to blame. Galaxies will stray into the forbidden zone if the stellar mass is underestimated or the SFR is overestimated, or some combination of the two. Probably both are going on at some level. I suspect the larger problem is in the SFR. In particular, it appears that many measurements of the SFR have been over-corrected for the effects of dust. Such a correction certainly has to be made, but since extinction corrections are exponential, it is easy to over-do. Indeed, I suspect this is why the dashed line overshoots even the bright galaxies from SINGS.
This brings us back to the terminology of the main sequence. Among stars, the main sequence is defined by low mass stars that evolve slowly. There is a turn-off point, and an associated mass, where stars transition from the main sequence to the sub giant branch. They then ascend the red giant branch as they evolve.
If we project this terminology onto galaxies, the main sequence should be defined by the low mass dwarfs. These are nowhere near to exhausting their gas supplies, so can continue to form stars far into the future. They establish a star forming main sequence of slope unity because that’s what the math says they must do.
Most of the literature on this subject refers to massive star forming galaxies. These are not the main sequence. They are the turn-off population. Massive spirals are near to exhausting their gas supply. Star formation is winding down as the fuel runs out.
Red and dead galaxies are the next stage, once star formation has stopped entirely. I suppose these are the red giants in this strained analogy to individual stars. That is appropriate insofar as most of the light from red and dead galaxies is produced by red giant stars. But is this really they right way to think about it? Or are we letting our terminology get the best of us?
Virginia, your little friends are wrong. They have been affected by the skepticism of a skeptical age. They do not believe except they see. They think that nothing can be which is not comprehensible by their little minds. All minds, Virginia, whether they be men’s or children’s, are little. In this great universe of ours man is a mere insect, an ant, in his intellect, as compared with the boundless world about him, as measured by the intelligence capable of grasping the whole of truth and knowledge.
Yes, Virginia, there is a Dark Matter. It exists as certainly as squarks and sleptons and Higgsinos exist, and you know that they abound and give to your life its highest beauty and joy. Alas! how dreary would be the world if there were no Dark Matter. It would be as dreary as if there were no supersymmetry. There would be no childlike faith then, no papers, no grants to make tolerable this existence. We should have no enjoyment, except in observation and experiment. The eternal light with which childhood fills the world would be extinguished.
Not believe in Dark Matter! You might as well not believe in Dark Energy! You might get the DOE to hire men to watch in all the underground laboratories to catch Dark Matter, but even if they did not see Dark Matter coming down, what would that prove? Nobody sees Dark Matter, but that is no sign that there is no Dark Matter. The most real things in the world are those that neither children nor men can see. Did you ever see fairies dancing on the lawn? Of course not, but that’s no proof that they are not there. Nobody can conceive or imagine all the wonders there are unseen and unseeable in the world.
You may tear apart the baby’s rattle and see what makes the noise inside, but there is a veil covering the unseen world which not the best experiment, nor even the united efforts of all the keenest experiments ever conducted, could tear apart. Only faith, fancy, poetry, love, romance, can push aside that curtain and view and picture the supernal beauty and glory beyond. Is it all real? Ah, Virginia, in all this world there is nothing else real and abiding.
No Dark Matter! Thank God! It exists, and it exists forever. A thousand years from now, Virginia, nay, ten times ten thousand years from now, it will continue to make glad the coffers of science.
I have not posted here in a while. This is mostly due to the fact that I have a job that is both engaging and demanding. I started this blog as a way to blow off steam, but I realized this mostly meant ranting about those fools at the academy! of whom there are indeed plenty. These are reality based rants, but I’ve got better things to do.
As it happens, I’ve come down with a bug that keeps me at home but leaves just enough energy to read and type, but little else. This is an excellent recipe for inciting a rant. Reading the Washington Post article on delayed gratification in children brings it on.
It is not really the article that gets me, let alone the scholarly paper on which it is based. I have not read the latter, and have no intention of doing so. I hope its author has thought through the interpretation better than is implied by what I see in the WaPo article. That is easy for me to believe; my own experience is that what academics say to the press has little to do with what eventually appears in the press – sometimes even inverting its meaning outright. (At one point I was quoted as saying that dark matter experimentalists should give up, when what I had said was that it was important to pursue these experiments to their logical conclusion, but that we also needed to think about what would constitute a logical conclusion if dark matter remains undetected.)
So I am at pains to say that my ire is not directed at the published academic article. In this case it isn’t even directed at the article in the WaPo, regardless of whether it is a fair representation of the academic work or not. My ire is directed entirely at the interpretation of a single graph, which I am going to eviscerate.
The graph in question shows the delay time measured in psychology experiments over the years. It is an attempt to measure self-control in children. When presented with a marshmallow but told they may have two marshmallows if they wait for it, how long can they hold out? This delayed gratification is thought to be a measure of self-control that correlates positively with all manners of subsequent development. Which may indeed be true. But what can we learn from this particular graph?
The graph plots the time delay measured from different experiments against the date of the experiment. Every point (plotted as a marshmallow – cute! I don’t object to that) represents an average over many children tested at that time. Apparently they have been “corrected” to account for the age of the children (one gets better at delayed gratification as one matures) which is certainly necessary, but it also raises a flag. How was the correction made? Such details can matter.
However, my primary concern is more basic. Do the data, as shown, actually demonstrate a trend?
To answer this question for yourself, the first thing you have to be able to do is mentally remove the line. That big black bold line that so nicely connects the dots. Perhaps it is a legitimate statistical fit of some sort. Or perhaps it is boldface to [mis]guide the eye. Doesn’t matter. Ignore it. Look at the data.
The first thing I notice about the data are the outliers – in this case, 3 points at very high delay times. These do not follow the advertised trend, or any trend. Indeed, they seem in no way related to the other data. It is as if a different experiment had been conducted.
When confronted with outlying data, one has a couple of choices. If we accept that these data are correct and from the same experiment, then there is no trend: the time of delayed gratification could be pretty much anything from a minute to half an hour. However, the rest of the data do clump together, so the other option is that these outliers are not really representing the same thing as the rest of the data, and should be ignored, or at least treated with less weight.
The outliers may be the most striking part of the data set, but they are usually the least important. There are all sorts of statistical measures by which to deal with them. I do not know which, if any, have been applied. There are no error bars, no boxes representing quartiles or some other percentage spanned by the data each point represents. Just marshmallows. Now I’m a little grumpy about the cutesy marshmallows. All marshmallows are portrayed as equal, but are some marshmallows more equal than others? This graph provides no information on this critical point.
In the absence of any knowledge about the accuracy of each marshmallow, one is forced to use one’s brain. This is called judgement. This can be good or bad. It is possible to train the brain to be a good judge of these things – a skill that seems to be in decline these days.
What I see in the data are several clumps of points (disregarding the outliers). In the past decade there are over a dozen points all clumped together around an average of 8 minutes. That seems like a pretty consistent measure of the delayed gratification of the current generation of children.
Before 2007, the data are more sparse. There are a half a dozen points on either side of 1997. These have a similar average of 7 or 8 minutes.
Before that there are very little data. What there is goes back to the sixties. One could choose to see that as two clumps of three points, or one clump of six points. If one does the latter, the mean is around 5 minutes. So we had a “trend” of 5 minutes circa 1970, 7 minutes circa 1997, and 8 minutes circa 2010. That is an increase over time, but it is also a tiny trend – much less persuasive than the heavy solid line in the graph implies.
If we treat the two clumps of three separately – as I think we should, since they sit well apart from each other – then we have to choose which to believe. They aren’t consistent. The delay time in 1968 looks to have an average of two minutes; in 1970 it looks to be 8 minutes. So which is it?
According to the line in the graph, we should believe the 1968 data and not the 1970 data. That is, the 1968 data fall nicely on the line, while the 1970 data fall well off it. In percentage terms, the 1970 data are as far from the trend as the highest 2010 point that we rejected as an outlier.
When fitting a line, the slope of the line can be strongly influence by the points at its ends. In this case, the earliest and the latest data. The latest data seem pretty consistent, but the earliest data are split. So the slope depends entirely on which clump of three early points you choose to believe.
If we choose to believe the 1970 clump, then the “trend” becomes 8 minutes in 1970, 7 minutes in 1997, 8 minutes in 2010. Which is to say, no trend at all. Try disregarding the first three (1968) points and draw your own line on this graph. Without them, it is pretty flat. In the absence of error bars and credible statistics, I would conclude that there is no meaningful trend present in the data at all. Maybe a formal fit gives a non-zero slope, but I find it hard to believe it is meaningfully non-zero.
None of this happens in a vacuum. Lets step back and apply some external knowledge. Have people changed over the 5 decades of my life?
The contention of the WaPo article is that they have. Specifically, contrary to the perception that iPhones and video games have created a generation with a cripplingly short attention span (congrats if you made it this far!), in fact the data show the opposite. The ability of children to delay gratification has improved over the time these experiments have been conducted.
What does the claimed trend imply? If we take it literally, then extrapolating back in time, the delay time goes to zero around 1917. People in the past must have been completely incapable of delaying gratification for even an instant. This was a power our species only developed in the past century.
I hope that sounds implausible. If there is no trend, which is what the data actually show, then children a half century ago were much the same as children a generation ago are much the same as the children of today. So the more conservative interpretation of the graph would be that human nature is rather invariant, at least as indicated by the measure of delayed gratification in children.
Sadly, null results are dull. There well may be a published study reporting no trend, but it doesn’t get picked up by the Washington Post. Imagine the headline: “Children today are much the same as they’ve always been!” Who’s gonna click on that? In this fashion, even reputable news sources contribute to the scourge of misleading science and fake news that currently pollutes our public discourse.
This sort of over-interpretation of weak trends is rife in many fields. My own, for example. This is why I’m good at spotting them. Fortunately, screwing up in Astronomy seldom threatens life and limb.
Then there is Medicine. My mother was a medical librarian; I occasionally browsed their journals when waiting for her at work. Graphs for the efficacy of treatments that looked like the marshmallow graph were very common. Which is to say, no effect was in evidence, but it was often portrayed as a positive trend. They seem to be getting better lately (which is to say, at some point in the not distant past some medical researchers were exposed to basic statistics), but there is an obvious pressure to provide a treatment, even if the effect of the available course of treatment is tiny. Couple that to the aggressive marketing of drugs in the US, and it would not surprise me if many drugs have been prescribed based on efficacy trends weaker than seen in the marshmallow graph. See! There is a line with a positive slope! It must be doing some good!
Another problem with data interpretation is in the corrections applied. In the case of marshmallows, one must correct for the age of the subject: an eight year old can usually hold out longer than a toddler. No doubt there are other corrections. The way these are usually made is to fit some sort of function to whatever trend is seen with age in a particular experiment. While that trend may be real, it also has scatter (I’ve known eight year olds who couldn’t out wait a toddler), which makes it dodgy to apply. Do all experiments see the same trend? It is safe to apply the same correction to all of them? Worse, it is often necessary to extrapolate these corrections beyond where they are constrained by data. This is known to be dangerous, as the correction can become overlarge upon extrapolation.
It would not surprise me if the abnormally low points around 1968 were over-corrected in some way. But then, it was the sixties. Children may have not changed much since then, but the practice of psychology certainly has. Lets consider the implications that has for comparing 1968 data to 2017 data.
The sixties were a good time for psychological research. The field had grown enormously since the time of Freud and was widely respected. However, this was also the time when many experimental psychologists thought psychotropic drugs were a good idea. Influential people praised the virtues of LSD.
My father was a grad student in psychology in the sixties. He worked with swans. One group of hatchlings imprinted on him. When they grew up, they thought they should mate with people – that’s what their mom looked like, after all. So they’d and make aggressive displays towards any person (they could not distinguish human gender) who ventured too close.
He related the anecdote of a colleague who became interested in the effect of LSD on animals. The field was so respected at the time that this chap was able to talk the local zoo into letting him inject an elephant with LSD. What could go wrong?
Perhaps you’ve heard the expression “That would have killed a horse! Fortunately, you’re not a horse.” Well, the fellow in question figured elephants were a lot bigger than people. So he scaled up the dose by the ratio of body mass. Not, say, the ratio of brain size, or whatever aspect of the metabolism deals with LSD.
That’s enough LSD to kill an elephant.
Sad as that was for the elephant, who is reputed to have been struck dead pretty much instantly – no tripping rampage preceded its demise – my point here is that these were the same people conducting the experiments in 1968. Standards were a little different. The difference seen in the graph may have more to do with differences in the field than with differences in the subjects.
That is not to say we should simply disregard old data. The date on which an observation is made has no bearing on its reliability. The practice of the field at that time does.
The 1968 delay times are absurdly low. All three are under four minutes. Such low delay times are not reproduced in any of the subsequent experiments. They would be more credible if the same result were even occasionally reproduced. It ain’t.
Another way to look at this is that there should be a comparable number of outliers on either side of the correct trend. That isn’t necessarily true – sometimes systematic errors push in a single direction – but in the absence of knowledge of such effects, one would expect outliers on both the high side and the low side.
In the marshmallow graph, with the trend as drawn, there are lots of outliers on the high side. There are none on the low side. [By outlier, I mean points well away from the trend, not just scattered a little to one side or the other.]
If instead we draw a flat line at 7 or 8 minutes, then there are three outliers on both sides. The three very high points, and the three very low points, which happen to occur around 1968. It is entirely because the three outliers on the low side happen at the earliest time that we get even the hint of a trend. Spread them out, and they would immediately be dismissed as outliers – which is probably what they are. Without them, there is no significant trend. This would be the more conservative interpretation of the marshmallow graph.
Perhaps those kids in 1968 were different in other ways. The experiments were presumably conducted in psychology departments on university campuses in the late sixties. It was OK to smoke inside back then, and not everybody restricted themselves to tobacco in those days. Who knows how much second hand marijuana smoke was inhaled just to getting to the test site? I jest, but the 1968 numbers might just measure the impact on delayed gratification when the subject gets the munchies.
It has been twenty years since we coined the phrase NFW halo to describe the cuspy halos that emerge from dark matter simulations of structure formation. Since that time, observations have persistently contradicted this fundamental prediction of the cold dark matter cosmogony. There have, of course, been some theorists who cling to the false hope that somehow it is the data to blame and not a shortcoming of the model.
That this false hope has persisted in some corners for so long is a tribute to the power of ideas over facts and the influence that strident personalities wield over the sort objective evaluation we allegedly value in science. This history is a bit like this skit by Arsenio Hall. Hall is pestered by someone calling, demanding Thelma. Just substitute “cusps” for “Thelma” and that pretty much sums it up.
All during this time, I have never questioned the results of the simulations. While it is a logical possibility that they screwed something up, I don’t think that is likely. Moreover, it is inappropriate to pour derision on one’s scientific colleagues just because you disagree. Such disagreements are part and parcel of the scientific method. We don’t need to be jerks about it.
But some people are jerks about it. There are some – and merely some, certainly not all – theorists who make a habit of pouring scorn on the data for not showing what they want it to show. And that’s what it really boils down to. They’re so sure that their models are right that any disagreement with data must be the fault of the data.
This has been going on so long that in 1996, George Efstathiou was already making light of it in his colleagues, in the form of the Frenk Principle:
“If the Cold Dark Matter Model does not agree with observations, there must be physical processes, no matter how bizarre or unlikely, that can explain the discrepancy.”
There are even different flavors of the Strong Frenk Principle:
1: “The physical processes must be the most bizarre and unlikely.”
2: “If we are incapable of finding any physical processes to explain the discrepancy between CDM models and observations, then observations are wrong.”
In the late ’90s, blame was frequently placed on beam smearing. The resolution of 21 cm data cubes at that time was typically 13 to 30 arcseconds, which made it challenging to resolve the shape of some rotation curves. Some but not all. Nevertheless, beam smearing became the default excuse to pretend the observations were wrong.
This persisted for a number of years, until we obtained better data – long slit optical spectra with 1 or 2 arcsecond resolution. These data did show up a few cases where beam smearing had been a legitimate concern. It also confirmed the rotation curves of many other galaxies where it had not been.
So they made up a different systematic error. Beam smearing was no longer an issue, but longslit data only gave a slice along the major axis, not the whole velocity field. So it was imagined that we observers had placed the slits in the wrong place, thereby missing the signature of the cusps.
This was obviously wrong from the start. It boiled down to an assertion that Vera Rubin didn’t know how to measure rotation curves. If that were true, we wouldn’t have dark matter in the first place. The real lesson of this episode was to never underestimate the power of cognitive dissonance. People believed one thing about the data quality when it agreed with their preconceptions (rotation curves prove dark matter!) and another when it didn’t (rotation curves don’t constrain cusps!)
So, back to the telescope. Now we obtained 2D velocity fields at optical resolution (a few arcseconds). When you do this, there is no where for a cusp to hide. Such a dense concentration makes a pronounced mark on the velocity field.
To give a real world example (O’Neil et. al 2000; yes, we could already do this in the previous millennium), here is a galaxy with a cusp and one without:
It is easy to see the signature of a cusp in a 2D velocity field. You can’t miss it. It stands out like a sore thumb.
The absence of cusps is typical of dwarf and low surface brightness galaxies. In the vast majority of these, we see approximately solid body rotation, as in UGC 12695. This is incredibly reproducible. See, for example, the case of UGC 4325 (Fig. 3 of Bosma 2004), where six independent observations employing three distinct observational techniques all obtain the same result.
There are cases where we do see a cusp. These are inevitably associated with a dense concentration of stars, like a bulge component. There is no need to invoke dark matter cusps when the luminous matter makes the same prediction. Worse, it becomes ambiguous: you can certainly fit a cuspy halo by reducing the fractional contribution of the stars. But this only succeeds by having the dark matter mimic the light distribution. Maybe such galaxies do have cuspy halos, but the data do not require it.
All this was settled a decade ago. Most of the field has moved on, with many theorists trying to simulate the effects of baryonic feedback. An emerging consensus is that such feedback can transform cusps into cores on scales that matter to real galaxies. The problem then moves to finding observational tests of feedback: does it work in the real universe as it must do in the simulations in order to get the “right” result?
Not everyone has kept up with the times. A recent preprint tries to spin the story that non-circular motions make it hard to obtain the true circular velocity curve, and therefore we can still get away with cusps. Like all good misinformation, there is a grain of truth to this. It can indeed be challenging to get the precisely correct 1D rotation curve V(R) in a way that properly accounts for non-circular motions. Challenging but not impossible. Some of the most intense arguments I’ve had have been over how to do this right. But these were arguments among perfectionists about details. We agreed on the basic result.
High quality data paint a clear and compelling picture. The data show an incredible amount of order in the form of Renzo’s rule, the Baryonic Tully-Fisher relation, and the Radial Acceleration Relation. Such order cannot emerge from a series of systematic errors. Models that fail to reproduce these observed relations can be immediately dismissed as incorrect.
The high degree of order in the data has been known for decades, and yet many modeling papers simply ignore these inconvenient facts. Perhaps the authors of such papers are simply unaware of them. Worse, some seem to be fooling themselves through the liberal application of the Frenk’s Principle. This places a notional belief system (dark matter halos must have cusps) above observational reality. This attitude has more in common with religious faith than with the scientific method.