

  • Research article
  • Open Access

On trends and patterns in macroevolution: Williston’s law and the branchiostegal series of extant and extinct osteichthyans

BMC Evolutionary Biology 2019, 19:117

  • Received: 6 November 2017
  • Accepted: 13 May 2019
  • Published:



The branchiostegal series consists of an alignment of bony elements in the posterior portion of the skull of osteichthyan vertebrates. We trace the evolution of the number of elements in a comprehensive survey that includes 440 extant and 66 extinct species. Using a newly updated actinopterygian tree in combination with phylogenetic comparative analyses, we test whether osteichthyan branchiostegals follow an evolutionary trend under ‘Williston’s law’, which postulates that osteichthyan lineages experienced a reduction of bony elements over time.


We detected no overall macroevolutionary trend in branchiostegal numbers, providing no support for ‘Williston’s law’. This result is robust to the subsampling of palaeontological data, but the estimation of the model parameters is much more ambiguous.


We find substantial evidence for a macroevolutionary dynamic favouring an ‘early burst’ of trait evolution over alternative models. Our study highlights the challenges of accurately reconstructing macroevolutionary dynamics even with large amounts of data about extant and extinct taxa.

Keywords


  • Phylogeny
  • Palaeontology
  • Williston’s law
  • Evolutionary trend
  • Early burst

Background


The evolution of the number of skull bones has been postulated to follow a general trend towards a reduction in the number of individual parts, resulting from losses and fusions of bones. This simplification trend is known as “Williston’s law” [1], and it has recently been studied most intensively in tetrapod dermal skull bones [2–4]. For instance, a study of tetrapod skulls documented the systematic evolutionary loss of bones connected to few other bones [3], emphasizing the importance of networks during growth and adult geometry [4]. In synapsid stem-mammalian lineages, a pattern of reduction in the number of skull and lower jaw bones (through either loss or fusion) over approximately 150 million years has also been described [2].

Here, we evaluate the hypothesis that the elements of a meristic series of skull bones of osteichthyans, the branchiostegal ray series (BRS), followed Williston’s law. This hypothesis was first postulated by McAllister, based on a comprehensive study of the variation of the BRS in osteichthyans [5]. The BRS consists of long struts of dermal bone that form a series of elements covering the gills together with the opercular bone series ([6]; Fig. 1). The shape, relative size and number of elements in the BRS are highly variable across osteichthyans ([7–9]; Fig. 2), with fusions and losses documented in both extant and extinct species (e.g. [10–12]). While the BRS is absent in the extant species of sarcopterygians that we surveyed, the structures are well characterized in actinopterygians or ray-finned fishes, an extraordinarily diverse group that comprises roughly half of the extant vertebrate diversity. The branchiostegal rays are mostly linked to ventilatory function, with a more prominent suction pump being coupled with a larger number of rays [13, 14]. The BRS is thus part of the buccal pump, a structure that is thought to have played a major role in the evolutionary radiation of actinopterygians [15, 16]. Branchiostegal rays can be highly variable in number and shape within species (Fig. 3; [5]), as is the case with Oncorhynchus nerka, where documented variation is in the range of 10 to 20 elements. High intraspecific variation is also observed in some fossils such as Discoserra pectinodon [26, 27]. It has been suggested that stress can induce intraspecific variation of the BRS [28], and asymmetries in left and right BRS counts have been observed in some species (e.g., the bonefish Albula vulpes; see [5], p. 36). Variation in the number of rays, however, is not uniformly distributed and clade-specific patterns have been documented [29, 30].
Fig. 1

Skull of the Devonian actinopterygian Cheirolepis trailli in lateral (a), anterior (b), and ventral (c) view (after [59]). Opercular/branchiostegal series in red outlines with branchiostegal rays in light red fill. The pattern of these bones in this stem-actinopterygian may be considered the basic actinopterygian pattern. The elements in this succession include the operculum, suboperculum, branchiostegal rays, and gulars. Some authors also include the dermohyale (Dh), accessory operculum (aOp), preoperculum (Pop), and other bones absent here in the series, while others exclude the gulars from it (for references see text). Any of these elements may be missing, hence the synonymous names ‘opercular series’, ‘branchiostegal series’, ‘operculo-branchiostegal series’, and ‘operculo-gular series’. Op, operculum; Sop, suboperculum; Br, branchiostegal rays; lG, lateral gular; mG, medial gular

Fig. 2

Diversity of the opercular/branchiostegal series (in red outlines; branchiostegal rays in light red fill) in osteichthyans (skulls in left lateral view). a Dialipina (Devonian; [60]). The region of the cheek and the gill cover is studded with multiple bony plates that make it impossible to delineate an opercular/branchiostegal series [56]. b Guiyu (Silurian; [32]) showing the “standard pattern” of the opercular/branchiostegal series, including (from dorsal to ventral) operculum, suboperculum, a number of branchiostegal rays, and gular. c The recent paddlefish Polyodon ([5]) without operculum, the larger bone being the suboperculum and the smaller one a single branchiostegal ray. d Saurichthys (Triassic; [11]) with a single element, the suboperculum. e The gar Lepisosteus ([5]) with operculum and suboperculum and three branchiostegals. f The zebrafish Danio rerio ([61]); as in all cypriniforms its opercular series consists of three elements. g The salmon Salmo ([6]), with a variable number of branchiostegal rays (9–13), even within the same species. h The Australian lungfish Neoceratodus ([5]), with a small suboperculum and no branchiostegal rays. Elements from the opercular series may be missing (e.g. the operculum and the gulars in paddlefish, the branchiostegals in lungfish, and all elements in saccopharyngiforms [62], not shown in Fig. 2)

Fig. 3

Phylogenetic distribution of mean branchiostegal ray numbers (left), and histograms of the species mean and range of branchiostegal numbers (right). The nodal ancestral values were reconstructed under the EB model, and interpolated along branches using the contMap function of the R package phytools v. 0.6 [63, 64]. Branch lengths are proportional to time. The silhouettes show the approximate position of selected clades. The age of the root is 443 Ma. Note that the intraspecific variation in the number of branchiostegals is probably underreported

Major advances in resolving the phylogeny of actinopterygians [17, 18], and the development of statistical approaches for hypothesis-testing and model-fitting in comparative biology [19–21] provide an avenue to examine BRS count evolution. Comparative approaches have the potential to deliver major insights into the origin of species diversity and morphological disparity or evolutionary patterns in general, particularly when data from fossils and extant species are analysed in a unified phylogenetic framework [22–25]. Much of the information on the anatomy of osteichthyans is richly documented in works that can now be mined to examine how character complexes have evolved.

To test whether the evolution of the BRS conforms to Williston’s law, we used an integrative phylogenetic framework including placement for about one thousand eight hundred extant and extinct species coupled with osteological data from about five hundred species. Two phylogenetic comparative approaches were implemented. The first approach consists of a comparison of the fit of models that can incorporate a tendency towards reduction in the number of branchiostegal rays (i.e., Brownian Motion with a trend and Ornstein-Uhlenbeck) with that of other macroevolutionary models for continuous traits (e.g., Brownian Motion without trend, Early Burst, and white noise). The second approach comprises likelihood ratio tests between models that assume either symmetrical or asymmetrical transition rates across discrete character states. We further explore the effects of paleontological data on model fitting through taxon subsampling tests. Previous studies on evolutionary patterns have shown that neontological studies find the early burst model to be rare [31], whereas the palaeontological literature suggests this model may be more common [25]. Using a solid phylogenetic framework that integrates neontological and paleontological data, our study offers a rigorous statistical analysis of the effects of fossil data on the fitting of macroevolutionary models.

Methods


Compilation of BRS data

We collected branchiostegal count data for 600 taxa (mostly at the species level), 506 of which are represented in our reference tree. Most of the anatomical data were taken from McAllister [5], although several other sources were also consulted (see Additional file 1). In cases where the number of branchiostegals was not stated in written form, we relied on figures, but only if the branchiostegal series was fully labelled with distinct and countable elements. For taxa in which ranges of branchiostegal numbers are reported, we used the mean value rounded up to the nearest integer for the discrete Markov models (see below). For many fossils, the original sources reported the branchiostegal ray counts as a minimum value with an unknown maximum, or as a point estimate with an unspecified uncertainty range (e.g. “around 10”). To accommodate those uncertainties, we arbitrarily assigned a range of variation of ±1 to taxa lacking data on intraspecific variation (higher than the range observed in most taxa for which polymorphism is reported; Fig. 4), defining “soft minima” and “soft maxima” for branchiostegal count values.
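The coding rules just described can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `encode_count`, the `pad` argument, and the clamping of the soft minimum at zero are assumptions made for this example. The Salmo range (9–13) is the one quoted in the caption of Fig. 2.

```python
import math

def encode_count(minimum, maximum=None, pad=1):
    """Return (soft_min, mean, soft_max, discrete_state) for one taxon.

    minimum, maximum: reported branchiostegal counts. When only a single
    value is available (maximum is None), a range of +/- pad is assigned,
    following the +/-1 rule described in the text.
    """
    if maximum is None:
        # No intraspecific range reported: assign an arbitrary +/- pad range.
        # Clamping at zero is an assumption of this sketch.
        soft_min = max(0, minimum - pad)
        soft_max = minimum + pad
        mean = float(minimum)
    else:
        soft_min, soft_max = minimum, maximum
        mean = (minimum + maximum) / 2
    # The discrete Markov models need an integer state: round the mean up.
    discrete_state = math.ceil(mean)
    return soft_min, mean, soft_max, discrete_state

# Range reported for Salmo in the caption of Fig. 2 (9-13 rays).
salmo = encode_count(9, 13)
# A point estimate such as "around 10" with no reported range.
point_estimate = encode_count(10)
# A range whose mean is not an integer is rounded up for the discrete models.
half_step = encode_count(2, 3)
```

With these rules, a reported range is kept as is, while a single reported value receives a soft minimum and maximum one element below and above it.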
Fig. 4

Effect of fossil sample size on model support and parameter estimates. Random subsampling of the fossil data shows that model support (Akaike weight) for EB becomes overwhelming with just 7 sampled fossils (extinct taxa), but the relative support of other models only stabilizes at some point between 40 and 47 sampled fossils (left side). In contrast, the model parameter estimates (right side) do not seem to approach an asymptote as more fossils are added, except the adaptive optimum (θ1) of the OU model and, to a lesser extent, the rate of exponential decay (β) of the EB model. Note that we introduced a small horizontal displacement in the points in order to visually separate the various model series; the analyses were performed with the fossil sample sizes labelled on the horizontal scale, with no intermediate values. Also, the Brownian diffusion rate (σ²) is shown log-transformed in order to better accommodate the large range of values. The lines connect the medians between sample size categories; θ0 is the reconstructed number of branchiostegals at the root of the tree. The white noise model is not shown here because it is strongly rejected by our results (see Table 1)

An anonymous referee suggested that there might be errors of homology in our data sources. While we did not attempt to validate the homology statements in every consulted publication, we see no reason to expect scattered homology errors to introduce systematic biases in our analyses. Throughout the paper we use the word “loss” of branchiostegal rays in the broad sense to refer to both losses, in the strict sense, as well as fusions. Gregory’s original formulation [1] and more recent works [2, 3] have considered the reduction in number described by Williston’s law to involve both types of processes.

Phylogeny with extant and extinct taxa

The phylogeny used for the analyses is based on the time-scaled supertree of 1841 species (1582 extant and 259 extinct species) [23]. For a few taxa that were not represented in the tree, we used the paleontological literature for phylogenetic placement. These include extinct sarcopterygians, such as Guiyu oneiros [32], Onychodus jandemarrai [33], Osteolepis macrolepidotus [33], Diplocercides [34], Rhabdoderma [34], Gyroptychius milleri [35], and Eusthenopteron foordi [35]. In cases where our sources gave BRS count data at the genus or family level, we assigned the BRS count value to one of the species in the corresponding clade in our reference phylogeny. This is a potential source of error due to taxonomic instability, but we expect such errors to have a limited effect considering the higher-level phylogenetic scale of our study.

Divergence times between the added extant taxa that were not sufficiently constrained by the fossil record were set according to recent molecular studies. This concerned chiefly extant dipnoans, following Heinicke et al. [36]. The inclusion of data on extinct taxa serves both to increase statistical power and to reveal details about the dynamics of trait evolution of the clades in question [22, 37–39]. To maximize the inclusion of species for which we had information on branchiostegal counts in the phylogeny, we swapped tip values for 20 extinct congeneric species by assuming the monophyly of subtending genera. We also rescaled the tree, following a procedure similar to that of Betancur-R. et al. [23], in order to account for the stratigraphic ranges of the extinct species examined. For tree rescaling, we used the R package paleotree v. 2.7 [40], imposing a “minimum branch length” stratigraphic fit [41] of 1 My while maintaining the node ages estimated from the molecular clock analysis [42]. To take into account uncertainty in fossil ages, we generated 50 rescaled trees by randomly sampling the fossil tip ages from the span of the stratigraphic intervals in which the fossils were found (we found that sampling more than 50 trees made the analyses prohibitively slow). The 50 rescaled trees were pruned down from 1841 to the 506 species for which BRS data are available.
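The tip-age sampling step can be sketched as follows. This is only an illustration in Python, not the paleotree workflow used in the study: the uniform sampling distribution is an assumption (the text says only that ages were sampled randomly within each interval), and the taxon names and interval bounds are hypothetical placeholders.

```python
import random

def sample_tip_ages(intervals, n_replicates=50, seed=0):
    """Draw one age per fossil taxon for each replicate tree.

    intervals: dict mapping taxon name -> (oldest_Ma, youngest_Ma).
    Returns a list of n_replicates dicts mapping taxon name -> sampled age.
    Ages are drawn uniformly within each interval (an assumption).
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        ages = {
            taxon: rng.uniform(youngest, oldest)
            for taxon, (oldest, youngest) in intervals.items()
        }
        replicates.append(ages)
    return replicates

# Hypothetical stratigraphic intervals (in Ma), for illustration only.
example_intervals = {
    "fossil_A": (380.0, 370.0),
    "fossil_B": (200.0, 190.0),
}
age_sets = sample_tip_ages(example_intervals)
```

Each of the 50 age sets would then be used to rescale one copy of the tree, so that downstream results can be summarised as a mean and standard deviation across trees.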

Macroevolutionary analyses and model fitting

By treating the number of branchiostegals as a continuous trait, we first assessed the relative fit of models of trait evolution using the R packages mvMORPH v. 1.0.7 [43] and geiger v. 2.0.6 [44]. We fitted five models (Additional file 3: Figure S1): Brownian Motion (BM), Ornstein-Uhlenbeck (OU), Early Burst (EB), BM with a trend (called “drift” in geiger), and white noise. In the simplest BM model (Additional file 3: Figure S1a), the trait evolves stochastically along the branches of the tree, where the length of the branches represents evolutionary time. At any given time, the trait value can increase or decrease following a normal probability distribution centred on 0 with variance σ². The variance of the trait increases indefinitely, but because the normal distribution is symmetrical, the mean trait value of all the species in the tree will oscillate around the initial value at the root of the tree (θ0). Because BM describes no overall trend in the value of the trait, traits evolving according to Williston’s law are expected to fit BM more poorly than the trend or OU models (see below).

The OU model (Additional file 3: Figure S1b) is a modified BM process where the trait is drawn towards an optimum value (θ1). As the trait drifts farther away from the optimum value θ1, it experiences a stronger pull back to it. The magnitude of that pull is controlled by an attractor strength parameter α. If the trait value at the root θ0 is different from θ1, the mean trait value of all the lineages will tend to increase or decrease over time, eventually coming to oscillate around θ1 (unlike BM with a trend, where the trait value increases or decreases indefinitely). Thus, the OU model can describe a Williston’s law scenario when θ0 > θ1 and α is not very small.

Early burst (EB) is another variation of BM in which the rate of evolution decreases exponentially over time, adding an extra rate change parameter β (Additional file 3: Figure S1c). Such an exponential decrease in the rate of evolution is consistent with an evolutionary radiation, in which a clade diversifies quickly to occupy vacant niches, and then slows down as niche space is filled. The EB model describes no trend in the value of the trait.

The Brownian motion model with a trend (Additional file 3: Figure S1d; “trend model” hereafter) is a slight variation of the BM model, in which the mean of the normal distribution that describes the evolution of the trait is shifted by some amount corresponding to a trend parameter [38, 45]. Negative trend parameters describe an overall decrease in the mean trait value relative to the root, as stipulated by Williston’s law for the number of bony elements. Conversely, a positive trend parameter represents an overall increase and therefore could be interpreted as evidence against Williston’s law. In the trend model the mean trait value increases or decreases indefinitely.

Finally, the “white noise” model (Additional file 3: Figure S1e) is similar to the BM model, except that it ignores phylogenetic structure. In all the species the trait has the same common initial value at the root, but it then evolves independently for each species, ignoring phylogenetic covariance. Under this model, similar trait values in closely related species are purely coincidental. As with the BM model, the white noise model does not describe any trends. A trait with very low phylogenetic signal (e.g., due to extreme rates of evolution, or strong environmental or developmental effects) is expected to fit the white noise model better than any of the other models tested.

In order to facilitate the numerical approximations of the likelihood computations, we rescaled the branches to have a total tree height of 1. Model parameter estimates are reported on the same scale throughout the paper. OU fitting did not converge using geiger; for the other models, the likelihood scores and model parameter estimates obtained through mvMORPH and geiger were virtually identical (at least down to the third decimal). Therefore, we consider the outputs of these two programs to be readily comparable, but report the results from geiger for the “white noise” model and from mvMORPH for all the others (“white noise” is not available in mvMORPH).
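To make the differences between the models described above concrete, the following Python sketch computes the expected trait value (or, for EB, the rate) at a given time under each model. The formulas are the standard textbook forms and are assumed here rather than taken from the paper; in particular, the EB rate is assumed to change as σ²·exp(βt). The function names are invented for this example, the inputs are the mean estimates reported in Table 1, and t = 1 corresponds to the present on a tree rescaled to a height of 1.

```python
import math

def expected_bm(theta0, t):
    """BM: the expected trait value stays at the root value."""
    return theta0

def expected_trend(theta0, trend, t):
    """BM with a trend: the expectation shifts linearly with time."""
    return theta0 + trend * t

def expected_ou(theta0, theta1, alpha, t):
    """OU: the expectation decays exponentially from theta0 towards theta1."""
    return theta1 + (theta0 - theta1) * math.exp(-alpha * t)

def eb_rate(sigma2_root, beta, t):
    """EB: the evolutionary rate changes exponentially (beta < 0 is a slowdown)."""
    return sigma2_root * math.exp(beta * t)

# Mean parameter estimates from Table 1; t = 1 is the present.
trend_now = expected_trend(5.60, -0.75, 1.0)   # about 4.85 rays
ou_now = expected_ou(5.18, 7.49, 2.98, 1.0)    # about 7.37 rays
eb_rate_now = eb_rate(4708.20, -5.04, 1.0)     # about 30.48
```

Under these assumptions, the trend model predicts a slightly lower expected count at the present than at the root, the OU model predicts a higher one, and the EB rate at the present is less than 1% of its value at the root.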

In order to determine the impact of range polymorphisms, model fitting was also performed with both soft minima and maxima of branchiostegal counts. We compared the relative fit of the models using Akaike weights. The Akaike information criterion (AIC) is a heuristic founded on information theory that balances the goodness of fit of the data (likelihood scores) and the number of parameters in the model. The AIC expressed as Akaike weights indicates the relative support for each of the models being compared; models with greater Akaike weights are preferred because they offer a better trade-off between goodness of fit and number of parameters. By definition, the Akaike weights of all the models considered add up to 1.
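As a worked illustration of how Akaike weights follow from log-likelihoods, the Python sketch below applies the usual definitions (AIC = 2k − 2 lnL; weight proportional to exp(−ΔAIC/2)) to the mean log-likelihoods reported in Table 1. The parameter counts k are an assumption inferred from the parameters listed for each model, and the function name is invented for this example. Weights computed from mean log-likelihoods need not equal the mean weights over the 50 trees exactly.

```python
import math

def akaike_weights(models):
    """Compute Akaike weights.

    models: dict mapping model name -> (lnL, k), where k is the number of
    free parameters. Returns a dict mapping model name -> Akaike weight.
    """
    aic = {name: 2 * k - 2 * lnl for name, (lnl, k) in models.items()}
    best = min(aic.values())
    # Relative likelihood of each model given its AIC difference to the best.
    rel = {name: math.exp(-0.5 * (a - best)) for name, a in aic.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}

# Mean lnL values from Table 1; parameter counts k are assumed.
table1 = {
    "BM": (-1374.41, 2),           # theta0, sigma2
    "EB": (-1252.25, 3),           # theta0, sigma2, beta
    "OU": (-1350.80, 4),           # theta0, theta1, sigma2, alpha
    "trend": (-1374.35, 3),        # theta0, sigma2, trend
    "white noise": (-1449.92, 2),  # mean, variance
}
weights = akaike_weights(table1)
```

With these inputs, EB receives a weight indistinguishable from 1, while the weights of BM and OU come out on the order of 10⁻⁵³ and 10⁻⁴⁴ respectively, in line with the values reported in Table 1.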

In an alternative model-fitting framework, we also tested for the presence of an overall bias towards the gain or loss of branchiostegals, treating the number of branchiostegal elements as a discrete trait. This was done using symmetric (MkS) and asymmetric (MkA) Markov models of discrete state transitions (e.g., [47]). Because the number of branchiostegals is a meristic trait, both models are ordered; i.e., the only state transitions allowed are gains or losses of a single branchiostegal ray at a time. In the asymmetric model, there are two parameters: the rate of branchiostegal gain and the rate of branchiostegal loss. The symmetric model has a single parameter, with both gains and losses sharing the same rate. The Markov models were fitted using the ace function of the R package ape [48], and the best-fitting model was determined via likelihood-ratio tests.
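The structure of the two ordered Markov models, and the likelihood-ratio test used to compare them, can be sketched in Python as follows. This is an illustration under stated assumptions, not the ape implementation: the function names are invented, the test statistic is assumed to follow a chi-square distribution with one degree of freedom (the difference in parameter number between the two models), and the log-likelihoods in the example are hypothetical.

```python
import math

def ordered_rate_matrix(n_states, gain, loss):
    """Build the rate matrix Q of an ordered Markov model.

    Only transitions between adjacent states are allowed: i -> i + 1 at rate
    `gain` and i -> i - 1 at rate `loss`. Setting gain == loss gives the
    symmetric model. Each row sums to zero.
    """
    q = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            q[i][i + 1] = gain
        if i - 1 >= 0:
            q[i][i - 1] = loss
        # The diagonal is still 0 here, so the row sum is the total exit rate.
        q[i][i] = -sum(q[i])
    return q

def likelihood_ratio_test(lnl_symmetric, lnl_asymmetric):
    """Return (statistic, p-value) for nested models differing by 1 parameter."""
    stat = max(2.0 * (lnl_asymmetric - lnl_symmetric), 0.0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

q_sym = ordered_rate_matrix(4, gain=0.5, loss=0.5)
q_asym = ordered_rate_matrix(4, gain=0.2, loss=0.8)
# Hypothetical log-likelihoods, for illustration only.
stat, p_value = likelihood_ratio_test(-100.0, -98.08)
```

A bias towards loss, as expected under Williston’s law, would appear as a loss rate significantly greater than the gain rate, that is, as a significant improvement in fit of the asymmetric over the symmetric model.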

Assessing the robustness of model parameters: post-predictive simulations and jackknifing

The adequacy of the BM and EB models was further explored via post-predictive simulations with the R package arbutus v. 0.1 [46]. The arbutus package takes the phylogeny and the model parameters estimated from the original dataset and uses them to generate hundreds or thousands of simulated traits. The simulations are then used to compute a set of statistics that are compared to the empirical data. Finding statistics in the empirical data that are outliers in the distribution of simulated data indicates that the model represents certain features of the data poorly.

We also used jackknifing to assess the impact of the size of the fossil and extant species samples in model support and parameter estimation (R code scripts are provided in an additional file [Additional file 2]). We performed two suites of jackknife analyses, one that incrementally removed all 66 fossils from the tree, and a second that incrementally removed up to 66 of the extant species examined (15% of the total 440 extant species in the tree). The incremental removals were done in 10 steps, and at each step we repeated the analysis 500 times randomly selecting different sets of species for removal.
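The subsampling design described above can be sketched in Python as follows. This is not the R code provided in Additional file 2: the function name, the even spacing of the removal steps, and the placeholder taxon names are assumptions made for illustration, and the exact sample sizes used in the study may differ.

```python
import random

def jackknife_removal_sets(taxa, max_removed, n_steps=10, n_replicates=500, seed=0):
    """Generate random sets of taxa to remove at each jackknife step.

    Returns a dict mapping the number of taxa removed -> list of
    n_replicates lists of taxon names. Steps are spaced evenly up to
    max_removed (an assumption of this sketch).
    """
    rng = random.Random(seed)
    design = {}
    for step in range(1, n_steps + 1):
        n_removed = round(max_removed * step / n_steps)
        # Sampling without replacement: each set holds distinct taxa.
        design[n_removed] = [rng.sample(taxa, n_removed) for _ in range(n_replicates)]
    return design

# 66 placeholder names standing in for the fossil taxa in the tree.
fossils = [f"fossil_{i}" for i in range(66)]
fossil_design = jackknife_removal_sets(fossils, max_removed=66)
```

For each removal set, the listed taxa would be pruned from the tree and the models refitted, giving a distribution of Akaike weights and parameter estimates for each sample size.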


Results

The evolution of the number of branchiostegal rays is best described by an ‘early burst’ pattern. This model has a mean Akaike weight of nearly 1, whereas all the other models combined have Akaike weights < 10⁻⁴³ over the 50 rescaled trees (Table 1). This indicates that the dominant feature in the macroevolutionary dynamics of the trait consists of variation in “tempo” (rate of evolution), more than biases in the directionality of the changes of the trait value (“mode”) that are the focus of this study. Among the other models, the “trend” model had much lower support than BM (2.3–2.8x lower), and OU had by far the strongest AIC support (10¹⁴ times greater Akaike weight than BM). The relative support for the “white noise” model was practically null, with an Akaike weight over 30 orders of magnitude smaller than that of other models, corroborating the presence of significant phylogenetic signal in the data.
Table 1

Model parameters and support estimated on 50 rescaled trees (mean value ± standard deviation). White noise was fitted with geiger, OU was fitted with mvMORPH. For all the other models, the log-likelihood (lnL) and parameter estimates were practically identical between geiger and mvMORPH; the mvMORPH results are shown here. θ0, root state; trend, trend parameter. See the Methods section for an explanation of the other parameters

| Model | θ0 | θ1 | σ² | α | β | trend | lnL | Akaike weight |
|---|---|---|---|---|---|---|---|---|
| BM | 5.55 ± 0.09 | | 100.73 ± 7.23 | | | | −1374.41 ± 17.90 | 2.45 × 10⁻⁵³ |
| EB | 5.80 ± 0.09 | | 4708.20 ± 1453.58 | | −5.04 ± 0.30 | | −1252.25 ± 16.82 | ≈ 1 |
| OU | 5.18 ± 0.14 | 7.49 ± 0.17 | 141.84 ± 15.30 | 2.98 ± 0.44 | | | −1350.80 ± 12.42 | 5.80 × 10⁻⁴⁴ |
| Trend | 5.60 ± 0.11 | | 100.71 ± 7.24 | | | −0.75 ± 0.62 | −1374.35 ± 17.85 | 9.40 × 10⁻⁵⁴ |
| White noise | 7.48 ± 0.00 | | 18.05 ± 0.00 | | | | −1449.92 ± 0.00 | 3.97 × 10⁻⁸⁶ |

Among the models studied, both “trend” and OU can be indicative of the evolution of the BRS under “Williston’s law.” The “trend” model allows an unbounded decrease in the expected trait value over time, while the OU model can also represent changing selective regimes, in which a trait is initially subject to directional selection and then gradually shifts to stabilising selection as the trait value approaches its adaptive optimum (θ1). Such a shift in selective regime would conceivably be more realistic than the “trend” model, as the number of branchiostegals has a natural lower bound of zero. In our results, while the trend parameter of the “trend” model has a negative sign, indicating that branchiostegal evolution shows a tendency toward the loss of elements, the fit of the better-supported OU model fails to corroborate such a pattern. Under OU, the adaptive optimum for the number of branchiostegals (θ1) is two or three elements more than the root state (θ0). However, the size of our fossil sample seems insufficient to allow us to estimate all parameters of the models reliably (see below). By and large, we failed to obtain strong relative support for a Williston-like dynamic by fitting continuous trait models; instead we discovered strong support for an early burst pattern of BRS evolution.

Mean Akaike weights and model parameter estimates were virtually identical whether we used the minimum, mean, or maximum counts of branchiostegal rays. However, these results were not as robust to jackknifing. As expected, our results show that fossil sample size (Fig. 4) had a greater effect than extant sample size (Additional file 4: Figure S2) on Akaike weights and model parameter estimations. The relative support of the models remains stable with the removal of up to 19 extinct taxa (29% of the total fossil sample). By contrast, jackknifing indicates that most model parameters do not seem to be near convergence as the size of the fossil subsample approaches 100%. Only the “adaptive optimum” θ1 parameter of the OU model, and to a lesser extent the exponential rate change parameter (β) of the EB model, seem to have reached a plateau with increasing numbers of sampled extinct taxa. More seriously, the Brownian diffusion parameter σ² of the EB model and the attractor strength α of OU have clearly not approached the values that they would have with an exhaustive sample of extinct taxa (Fig. 4), which is contrary to what we expected based on similar analyses in previous studies (e.g., [49]: Fig. 4). From this, we conclude that although we can determine which models of evolution are better supported, our sampling of fossils is not comprehensive enough to allow a reliable characterisation of model parameters.

Taxa with more branchiostegals evolve faster, as shown by the S_asr test on the EB fit [46], which consists of regressing the absolute values of the phylogenetic independent contrasts against the corresponding nodal values. The correlation observed remains significant using either raw values or square-root transformed data (p < 0.00001), indicating that the effect observed is not artefactual. This means that it is easier to gain or lose one branchiostegal when several are present than when only one or two are present, which seems intuitive. Note that although we use the nomenclature from Pennell et al. [46], this test was implemented long before in the PDAP:PDTREE module of Mesquite [50].
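The regression underlying this test can be illustrated with a short Python sketch. It assumes that the statistic is the ordinary least-squares slope of the absolute contrasts on the nodal values, and it omits the comparison against simulated data that arbutus performs. The function names and the toy numbers are invented for illustration and do not come from the dataset.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def s_asr(contrasts, nodal_values):
    """Slope of |contrast| against the reconstructed value at each node."""
    return ols_slope(nodal_values, [abs(c) for c in contrasts])

# Toy values: contrasts grow in magnitude with the nodal value.
toy_nodal_values = [2.0, 4.0, 6.0, 8.0, 10.0]
toy_contrasts = [0.5, -1.0, 1.5, -2.0, 2.5]
toy_slope = s_asr(toy_contrasts, toy_nodal_values)  # 0.25
```

A positive slope, as in this toy example, corresponds to the pattern reported here: larger changes at nodes with higher reconstructed branchiostegal counts.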

Among the departures from the early burst model, there is a highly significant skew to the right, as shown by the D_cdf test on the EB fit (p < 0.0001). This test consists of performing a Kolmogorov-Smirnov test comparing the distribution of phylogenetic independent contrasts to a normal distribution with a mean of 0 and a standard deviation equal to the square root of the mean squared contrasts [46]. This suggests that a few contrasts are much larger than the rest, likely reflecting a “jump-diffusion” process, in which occasional bursts of evolution occur. In our dataset, one such large contrast is found between the Late Carboniferous Tegeolepis clarki, which has 30 branchiostegals, and the slightly older Howqualepis rostridens, which has only 13. Another large contrast is between the Jurassic stem-teleost Pachycormus, with 40 branchiostegals, and its sister-group (crown teleosts), whose ancestral state under BM is reconstructed at about 13 branchiostegals. The former example of abrupt evolutionary change might reflect a suboptimal choice of branch lengths subtending extinct taxa, but the latter would be more difficult to explain by this factor given that Pachycormus has a long branch spanning the Early Permian to Middle Jurassic, and that its sister-group is an extant clade with a reasonably well-constrained age. It is important to note, however, that the results of the D_cdf test remain unaffected after the removal of these few extinct taxa (T. clarki, H. rostridens, and Pachycormus), suggesting that the outcome of the test does not depend on those taxa alone.
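The statistic described above can be sketched in Python as a one-sample Kolmogorov-Smirnov distance. This is an illustration, not the arbutus implementation: the function names are invented, no p-value is computed, and the two-value input is a toy example rather than real contrasts.

```python
import math

def normal_cdf(x, sd):
    """Cumulative distribution function of a normal with mean 0 and given sd."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))

def d_cdf(contrasts):
    """Kolmogorov-Smirnov distance between the contrasts and N(0, sd^2).

    sd is the square root of the mean squared contrasts, as in the text.
    """
    n = len(contrasts)
    sd = math.sqrt(sum(c * c for c in contrasts) / n)
    d = 0.0
    for i, c in enumerate(sorted(contrasts)):
        f = normal_cdf(c, sd)
        # Compare the model CDF with the empirical CDF just after and just
        # before each observed value.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Toy input: two contrasts of equal magnitude and opposite sign.
toy_d = d_cdf([-1.0, 1.0])
```

Large values of this distance indicate that the contrasts depart from the normal distribution expected under the fitted model, for instance because a few contrasts are much larger than the rest.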

The results from the jackknife analyses indicate that the root value, the adaptive optimum (θ1), and the trend parameter as functions of the number of sampled extinct taxa, all have a non-monotonic behaviour. For these three parameters (out of six studied here), the value obtained from the full fossil sample is closer to the estimate without fossils than it is to estimates based on reduced fossil samples (e.g. 7–27). In fact, the adaptive optimum estimated with few fossils is negative, which is nonsensical, as it is impossible to have a negative number of branchiostegals. These anomalous estimates may be due, in part, to the uneven distribution of sampled fossils across the phylogeny. Most of the fossils (68%) are concentrated in the region of the tree that spans early cladogenetic events (from sarcopterygians to stem teleosts). Given that the jackknife analyses were exhaustive (including 500 replicates for each number of extinct taxa removed), the results obtained were unexpected and are difficult to interpret (i.e., the procedure should have filtered out much of the random variation associated with subsampling).

Finally, when treating the number of branchiostegal rays as a discrete trait by fitting ordered Markov models, the results are in agreement with our previous analyses: a likelihood-ratio test fails to favour an asymmetric model over a symmetric one. This means that no evidence was found for a significant bias toward either losses or gains of branchiostegal elements (Additional file 5).

Discussion


We found strong support for the EB model to describe the macroevolutionary pattern of branchiostegal ray numbers. Statistical support for this evolutionary model in empirical studies was initially elusive [31]. Harmon et al. [31] assessed the fit of three models (EB, BM, and Ornstein-Uhlenbeck) to body size (49 clades) and shape (39 clades) data. Of these, size and shape were hypothesized to have evolved according to the EB model in only a single clade, but this conclusion was weakly supported: the Akaike weight of the EB model was greater than that of the two other models, but less than 0.95. In contrast, the support for the EB model in our dataset is overwhelming; the other models tested have negligible support (Akaike weight < 10⁻⁴³). A few previous studies have documented an EB pattern [51], but these did not incorporate information from the fossil record. Recent simulations showed a strong decrease in parameter estimation error when fossils are incorporated into such analyses [45, 52], but our results suggest that the amount and perhaps the distribution of the fossil data, as well as possible empirical deviations from the models, are also important. In this respect, our dataset is not ideal, though the Akaike weights show that our results represent some of the strongest support reported for the EB relative to BM and OU models in empirical studies.

We found no support in our dataset for Williston’s law, a trend towards the reduction in the number of bony elements over time. Our results were robust to analyses treating the number of branchiostegal elements either as a continuous character or as a discrete character (via Markov models). The “trend” model was only the fourth best-supported model, far behind the EB and slightly behind the BM and OU models. The interpretation of the comparatively higher support for the OU model (relative to the other non-EB models) is more challenging. The fact that the estimated adaptive optimum for this model is higher than the root value (by two or three branchiostegals) is in direct contradiction with the pattern expected under Williston’s law (i.e., it shows a slight increase over time rather than a reduction). However, it should be noted that while the higher support for OU over BM can be suggestive of the presence of biases in the mode of evolution, the OU model as fitted here is unrealistic. This results from the imposition of a uniform adaptive regime, with a single adaptive optimum over the entire osteichthyan phylogeny, an assumption that is likely violated given the extraordinary diversity of osteichthyans in terms of form, function, and ecology. It seems more likely that trends toward the reduction of skeletal elements may characterize some groups of osteichthyans, but not the clade as a whole. Indeed, while McAllister [5] stated that branchiostegal ray evolution follows Williston’s law as a “teleostome”-wide phenomenon, he also noted that the apparent trend towards reduction was more evident in certain groups (e.g., “palaeoniscids”, a paraphyletic group of early actinopterygians) and was associated with deep-sea habitats, the morphology of the buccal apparatus, and a broad attachment of the branchiostegal membrane (see also [14, 30, 53]). In addition, Hubbs [30] suggested that low numbers of branchiostegals were associated with freshwater environments. Unfortunately, we did not sample fossils for many of the clades needed to test those associations, and the results of the jackknife analyses cast significant doubt on conclusions that could be drawn from low fossil sample sizes.
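The argument about the OU optimum can be illustrated with the standard expectation of a single-optimum OU process, E[X(t)] = θ + (x₀ − θ)·exp(−αt), where x₀ is the root value, θ the optimum, and α the strength of attraction. The sketch below is only an illustration of that formula; the function name and all parameter values are assumptions made for this example, not estimates from our analyses.

```python
import math

def ou_expected_value(root_value, optimum, alpha, elapsed_time):
    """Expected trait value under a single-optimum OU process after elapsed_time."""
    return optimum + (root_value - optimum) * math.exp(-alpha * elapsed_time)

# Hypothetical parameter values for illustration only (NOT estimates from the study).
root_value = 10.0  # assumed ancestral number of branchiostegals
optimum = 12.0     # assumed optimum, two elements above the root value
alpha = 0.01       # assumed strength of attraction, per million years

for t in (0, 100, 200, 400):  # elapsed time since the root, in million years
    print(t, round(ou_expected_value(root_value, optimum, alpha, t), 2))
```

Whenever the optimum exceeds the root value, the expected trait value rises monotonically towards the optimum, which is the opposite of the reduction predicted by Williston’s law.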

Our study shows the importance of fossils in documenting evolutionary patterns that would be poorly constrained based solely on neontological data [22, 45, 52, 54, 55]. This is most evident from our jackknifing analyses, which show that model support and parameter estimates are strongly affected by the inclusion of fossils (Fig. 4). This phenomenon does not correspond to a simple increase in total sample size, as performing the same analyses removing extant species instead of fossils has a comparatively negligible effect (see Additional file 3). Another example is that of the distribution of the trait among extant sarcopterygians and chondrosteans, which fails to capture the much greater diversity of their closely related extinct forms, as explained above. Extant chondrosteans and sarcopterygians have very few rays (0–3), whereas their ancient relatives had a greater and more variable number of elements (0–17 among the Paleozoic sarcopterygians, and up to 30 among the Paleozoic actinopterygians included in our dataset). These data exemplify why the EB model was strongly supported.

The fossil record is rich enough to provide a significantly more reliable reconstruction of trait evolution and model fitting than analyses that are exclusively neontological [45, 52, 54]. Our study further illustrates this phenomenon, but it is also an example of how a large paleontological sample may still be insufficient to reliably determine model parameters (Fig. 4). Given that other studies are sometimes performed with fewer fossils, a jackknifing analysis, such as the one we performed, seems advisable if conclusions are to be drawn from the specific values of these parameters. In addition to the fossil sample size, the temporal and phylogenetic distribution of the sampled fossils could have a significant impact. The ability to detect trends depends on the reconstruction of the root value, on which older fossils will have a greater effect (e.g., [45]). Also, uneven sampling of fossils across clades could give greater weight to some clades over others in the detection of a model that is intended to fit the entire tree. Our own data present such an uneven sampling, with the majority of fossils sampled for sarcopterygians and non-neopterygian actinopterygians. However, the sheer diversity in branchiostegal number of early neopterygians sampled seems to buffer against a potential bias in model selection.
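The logic of such a jackknifing analysis can be sketched as follows: all extant taxa are retained, a random subset of the fossil taxa is kept, and the statistic of interest is re-estimated for each replicate. This is a simplified, non-phylogenetic illustration and not the procedure used in this study; the function `jackknife_fossils`, the placeholder estimator (a plain variance standing in for a fitted model parameter), and the trait values are all hypothetical.

```python
import random
import statistics

def jackknife_fossils(taxa, estimator, n_fossils_kept, n_replicates, seed=0):
    """Re-estimate a statistic after randomly dropping fossil taxa.

    taxa: list of (trait_value, is_fossil) tuples.
    estimator: function mapping a list of trait values to a number
               (a stand-in for a model-fitting step).
    """
    rng = random.Random(seed)
    extant = [value for value, is_fossil in taxa if not is_fossil]
    fossils = [value for value, is_fossil in taxa if is_fossil]
    estimates = []
    for _ in range(n_replicates):
        # Keep every extant taxon, but only a random subset of the fossils.
        kept = rng.sample(fossils, n_fossils_kept)
        estimates.append(estimator(extant + kept))
    return estimates

# Hypothetical data for illustration only: 20 extant taxa and 8 fossil taxa.
taxa = [(3, False)] * 20 + [(v, True) for v in (0, 5, 9, 12, 13, 17, 17, 30)]
for kept in (2, 4, 8):
    est = jackknife_fossils(taxa, statistics.pvariance, kept, n_replicates=200)
    print(kept, round(min(est), 2), round(max(est), 2))
```

If the range of estimates has not narrowed by the time all fossils are included in the subsamples below the full set, the parameter has probably not reached its asymptotic value, and conclusions based on its specific value deserve caution.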

Part of the failure of several model parameters to approach the asymptotic value in the jackknife analysis may be linked to the inability of candidate models to closely describe the data. In particular, the various clades in the tree appear to show diverse modes of evolution. A visual examination of the data suggests that there are clade-specific changes in the evolutionary rate, and our quantitative analyses strongly support this conclusion. This was also shown by Pennell et al.’s [46] Cvar test on the EB fit, which compares the coefficient of variation of the contrasts on the empirical data to that of a population of datasets generated with similar parameters (p < 0.0001). For instance, Paleozoic sarcopterygians have a fairly variable but often high number of branchiostegals, ranging from 0 in Diplocercides to 17 in Eusthenopteron foordi and Gyroptychius milleri, but extant sarcopterygians lack branchiostegals (Latimeria chalumnae, Neoceratodus forsteri, and Lepidosiren paradoxa). Similarly, the Paleozoic actinopterygians generally had more branchiostegals (e.g., 12 in Cheirolepis trailli, 13 in Cheirolepis schultzei, and 17 in Osorioichthys marginis) than extant basal members of the clade (none in extant polypterids, a single one in Polyodon, and two in Acipenser). However, no such trend is obvious among teleosts, which form the bulk of our extant sample (although our sample of extinct taxa is less dense among teleosts than in other parts of the tree, so this conclusion is not very robust). A corollary of this heterogeneity in evolutionary rates is that in large clades (e.g., salmonids), the number of elements between closely related groups may vary substantially, whereas in other clades the number of elements remains rather constant (e.g., all 22 sampled species of cypriniforms have three branchiostegals [5]).

Our study indirectly addresses the question of whether micromery (i.e., a dermal skeleton composed of small elements, often capped by a single odontode) or macromery (i.e., a dermal skeleton composed of a few large elements, each of which is typically capped by several odontodes, if these are present) is the primitive state for osteichthyans. This question has long been debated [65] but remains unresolved (e.g., [57]). If micromery were primitive, we would expect a decrease in the number of skeletal elements over time, at least early in osteichthyan history. By contrast, macromery implies the reverse prediction. Among early osteichthyans, there are examples of both mainly micromeric (e.g., Dialipina [56]: Fig. 1a; Cheirolepis [57]) and mainly macromeric (e.g., Guiyu; Fig. 1b [32]) taxa, so this polarity is currently unclear [58]. Given that we found no support for a “trend” model, our study does not allow discriminating strongly between these hypotheses, though the fact that the optimal value of the OU model is inferred to be slightly greater than the root condition provides some (weak) support for the macromery hypothesis.



We are deeply indebted to Leonhard Schmid for the collection of the majority of the branchiostegal data used in this study. We also thank Alexandra Wegmann and Linda Frey for technical help, Julien Clavel for help with the use of mvMORPH, and two anonymous reviewers for their suggestions to improve this manuscript.


RBR’s work was supported by the National Science Foundation (NSF) grants (DEB-147184/DEB-1932759, DEB-1541491/DEB-1929248) and MRSV’s by the University of Zurich.

Availability of data and materials

All data generated or analysed in this study are included in this published article, its supplementary information files, and additional files.

Authors’ contributions

ML, MRSV, and EA designed the study; EA collected and synthesized additional data; RBR provided information on the phylogenetic framework; EA and ML conducted the analysis; ML, MRSV, and EA wrote the paper. All authors edited and approved the final manuscript.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

Authors’ Affiliations

Department of Geosciences, University of Fribourg, Chemin du Musée 4, 1700 Fribourg, Switzerland
Paläontologisches Institut und Museum, Universität Zürich, Karl-Schmid-Strasse 4, 8006 Zürich, Switzerland
Department of Biology, University of Oklahoma, Norman, 73019, USA
CR2P, UMR 7207 (CNRS/MNHN/Sorbonne Université), Muséum National d’Histoire Naturelle, Bâtiment de Géologie, Case postale 48, 43 rue Buffon, F-75231, cedex 05 Paris, France


  1. Gregory WK, Roigneau M, Burr E, Evans G, Hellman E, Jackson F, et al. ‘Williston’s law’ relating to the evolution of skull bones in the vertebrates. Am J Phys Anthropol. 1935;20:123–52.
  2. Sidor CA. Simplification as a trend in synapsid cranial evolution. Evolution. 2001;55:1419–42.
  3. Esteve-Altava B, Marugán-Lobón J, Botella H, Rasskin-Gutman D. Structural constraints in the evolution of the tetrapod skull complexity: Williston’s law revisited using network models. Evol Biol. 2013;40:209–19.
  4. Esteve-Altava B, Rasskin-Gutman D. Theoretical morphology of tetrapod skull networks. Comptes Rendus Palevol. 2014;13:41–50.
  5. McAllister DE. Evolution of branchiostegals and classification of teleostome fishes. Nat Mus Can Biol Ser. 1968;221:1–239.
  6. Gregory W. Fish skulls: a study of the evolution of natural mechanism. 2002, reprint. Malabar, Florida: the American philosophical Society; 1933.
  7. Lindsey C. Factors controlling meristic variation. In: Hoar WS, Randall DJ, editors. Fish physiology volume XI: the physiology of developing fish part B, viviparity and posthatching juveniles. New York and London: Academic; 1988. p. 197–274.
  8. Helfman G, Collette B, Facey D, Bowen B. The diversity of fishes: biology, evolution, and ecology. 2nd ed. Oxford: Wiley-Blackwell; 2009.
  9. Kimmel CB, Walker MB, Miller CT. Morphing the hyomandibular skeleton in development and evolution. J Exp Zool Part B. 2007;308:609–24.
  10. Jarvik E. On the morphology and taxonomy of the middle Devonian osteolepid fishes of Scotland. Kungl Sv vet akademiens handlingar. 1948;25:1–301.
  11. Rieppel OC. Die Triasfauna der Tessiner Kalkalpen XXV. Die Gattung Saurichthys (Pisces, Actinopterygii) aus der mittleren Trias des Monte San Giorgio, Kanton Tessin. Schweiz Paläontol Abh. 1985;108:1–103.
  12. Mickle KE, Lund R, Grogan ED. Three new palaeoniscoid fishes from the bear gulch limestone (Serpukhovian, Mississippian) of Montana (USA) and the relationships of lower actinopterygians. Geodiversitas. 2009;31:623–68.
  13. Hughes G. A comparative study of gill ventilation in marine teleosts. J Exp Biol. 1960;37:28–45.
  14. Farina SC, Near TJ, Bemis WE. Evolution of the branchiostegal membrane and restricted gill openings in actinopterygian fishes. J Morphol. 2015;276:681–94.
  15. Lauder G. Aquatic feeding in lower vertebrates. In: Hildebrand M, Bramble DM, Liem KF, Wake DB, editors. Functional vertebrate morphology. Cambridge: The Belknap Press of Harvard University Press; 1985. p. 210–29.
  16. Wainwright PC, McGee MD, Longo SJ, Hernandez LP. Origins, innovations, and diversification of suction feeding in vertebrates. Integr Comp Biol. 2015;55:134–45.
  17. Betancur-R R, Orti G, Stein AM, Marceniuk AP, Pyron RA. Apparent signal of competition limiting diversification after ecological transitions from marine to freshwater habitats. Ecol Lett. 2012;15:822–30.
  18. Near TJ, Eytan RI, Dornburg A, Kuhn KL, Moore JA, Davis MP, Wainwright PC, Friedman M, Smith WL. Resolution of ray-finned fish phylogeny and timing of diversification. Proc Natl Acad Sci. 2012;109:13698–703.
  19. Felsenstein J. A comparative method for both discrete and continuous characters using the threshold model. Am Nat. 2012;179:145–56.
  20. Hernandez CE, Rodríguez-Serrano E, Avaria-Llautureo J, Inostroza-Michael O, Morales-Pallero B, Boric-Bargetto D, Canales-Aguirre CB, Marquet PA, Meade A. Using phylogenetic information and the comparative method to evaluate hypotheses in macroecology. Methods Ecol Evol. 2013;4:401–15.
  21. Pennell MW, Harmon LJ. An integrative view of phylogenetic comparative methods: connections to population genetics, community ecology, and paleobiology. Ann N Y Acad Sci. 2013;1289:90–105.
  22. Laurin M. Assessment of the relative merits of a few methods to detect evolutionary trends. Syst Biol. 2010;59:689–704.
  23. Betancur-R R, Ortí G, Pyron RA. Fossil-based comparative analyses reveal ancient marine ancestry erased by extinction in ray-finned fishes. Ecol Lett. 2015;18:441–50.
  24. Guinot G, Cavin L. ‘Fish’ (Actinopterygii and Elasmobranchii) diversification patterns through deep time. Biol Rev. 2016;91:950–81.
  25. Benton MJ, Forth J, Langer MC. Models for the rise of the dinosaurs. Curr Biol. 2014;24:R87–95.
  26. Lund R. The new actinopterygian order Guildayichthyiformes from the lower carboniferous of Montana (USA). Geodiversitas. 2000;22:171–206.
  27. Lund R, Poplin C. The rhadinichthyids (paleoniscoid actinopterygians) from the bear gulch limestone of Montana (USA, lower carboniferous). J Vertebr Paleontol. 1997;17:466–86.
  28. Campbell WB. Assessing developmental errors in branchiostegal rays as indicators of chronic stress in two species of Pacific salmon. Can J Zool. 2003;81:1876–84.
  29. Ridewood W. On the cranial osteology of the fishes of the families Mormyridae, Notopteridae and Hyodontidae. Zool J Linn Soc-Lond. 1904;29:188–217.
  30. Hubbs CL. A comparative study of the bones forming the opercular series of fishes. J Morphol. 1919;33:60–71.
  31. Harmon LJ, Losos JB, Jonathan Davies T, Gillespie RG, Gittleman JL, Bryan Jennings W, Kozak KH, McPeek MA, Moreno-Roark F, Near TJ, et al. Early bursts of body size and shape evolution are rare in comparative data. Evolution. 2010;64:2385–96.
  32. Zhu M, Zhao W, Jia L, Lu J, Qiao T, Qu Q. The oldest articulated osteichthyan reveals mosaic gnathostome characters. Nature. 2009;458:469–74.
  33. Friedman M. Styloichthys as the oldest coelacanth: implications for early osteichthyan interrelationships. J Syst Palaeontol. 2007;5:289–343.
  34. Zhu M, Yu X, Lu J, Qiao T, Zhao W, Jia L. Earliest known coelacanth skull extends the range of anatomically modern coelacanths to the early Devonian. Nat Commun. 2012;3:772.
  35. Swartz B. A marine stem-tetrapod from the Devonian of western North America. PLoS One. 2012;7:e33683.
  36. Heinicke M, Sander J, Hedges S. Lungfishes (Dipnoi). In: Hedges S, Kumar S, editors. The Timetree of life. New York: Oxford University Press; 2009. p. 348–50.
  37. Polly PD. Paleontology and the comparative method: ancestral node reconstructions versus observed node values. Am Nat. 2001;157:596–609.
  38. Hunt G. Fitting and comparing models of phyletic evolution: random walks and beyond. Paleobiology. 2006;32:578–601.
  39. Ruta M, Wagner PJ, Coates MI. Evolutionary patterns in early tetrapods. I. Rapid initial diversification followed by decrease in rates of character change. Proc R Soc Lond B Biol Sci. 2006;273:2107–11.
  40. Bapst DW. Paleotree: an R package for paleontological and phylogenetic analyses of evolution. Methods Ecol Evol. 2012;3:803–7.
  41. Laurin M. The evolution of body size, Cope's rule and the origin of amniotes. Syst Biol. 2004;53:594–622.
  42. Betancur-R R, Broughton RE, Wiley EO, Carpenter K, López JA, Li C, Holcroft NI, Arcila D, Sanciangco M, Cureton Ii JC, et al. The tree of life and a new classification of bony fishes. PLoS Currents. 2013;5:ecurrents.tol.53ba26640df0ccaee75bb165c8c26288.
  43. Clavel J, Escarguel G, Merceron G. mvMORPH: an R package for fitting multivariate evolutionary models to morphometric data. Methods Ecol Evol. 2015;6:1311–9.
  44. Harmon LJ, Weir JT, Brock CD, Glor RE, Challenger W. GEIGER: investigating evolutionary radiations. Bioinformatics. 2008;24:129–31.
  45. Slater GJ, Harmon LJ, Wegmann D, Joyce P, Revell LJ, Alfaro ME. Fitting models of continuous trait evolution to incompletely sampled comparative data using approximate Bayesian computation. Evolution. 2012;66:752–62.
  46. Pennell MW, FitzJohn RG, Cornwell WK, Harmon LJ. Model adequacy and the macroevolution of angiosperm functional traits. Am Nat. 2015;186:E33–50.
  47. Adamowicz SJ, Purvis A. From more to fewer? Testing an allegedly pervasive trend in the evolution of morphological structure. Evolution. 2006;60:1402–16.
  48. Paradis E, Claude J, Strimmer K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 2004;20:289–90.
  49. Bokma F, Godinot M, Maridet O, Ladevèze S, Costeur L, Solé F, Gheerbrant E, Peigné S, Jacques F, Laurin M. Testing for Depéret's rule (body size increase) in mammals using combined extinct and extant data. Syst Biol. 2016;65:98–108.
  50. Midford PE, Garland T Jr, Maddison WP. PDAP package for Mesquite. 1.16 2010.
  51. Chira AM, Thomas GH. The impact of rate heterogeneity on inference of phylogenetic models of trait evolution. J Evolution Biol. 2016;29:2502–18.
  52. Didier G, Fau M, Laurin M. Likelihood of tree topologies with fossils and diversification rate estimation. Syst Biol. 2017;66(6):964–87.
  53. Gosline WA. Reduction in branchiostegal ray number. Copeia. 1967;1967:237–9.
  54. Finarelli JA, Goswami A. Potential pitfalls of reconstructing deep time evolutionary history with only extant data, a case study using the Canidae (Mammalia, Carnivora). Evolution. 2013;67:3678–85.
  55. Hunt G, Slater G. Integrating paleontological and phylogenetic approaches to macroevolution. Annu Rev Ecol Evol Syst. 2016;47:189–213.
  56. Schultze H-P, Cumbaa SL. Dialipina and the characters of basal actinopterygians. In: Ahlberg P, editor. Major events in early vertebrate evolution. London and New York: Taylor and Francis; 2001. p. 315–32.
  57. Zylberberg L, Meunier FJ, Laurin M. A microanatomical and histological study of the postcranial dermal skeleton of the Devonian actinopterygian Cheirolepis canadensis. Acta Palaeontol Pol. 2016;61:363–76.
  58. Young GC. Placoderms (armored fish): dominant vertebrates of the Devonian period. Annu Rev Earth Planet Sci. 2010;38:523–50.
  59. Pearson DM, Westoll TS. The Devonian actinopterygian Cheirolepis Agassiz. Earth Environ Sci Trans R Soc Edinb. 1979;70:337–99.
  60. Janvier P. Living primitive fishes and fishes from deep time. In: McKenzie D, Farrell A, Brauner C, editors. Fish physiology: primitive fishes. New York and London: Academic; 2007. p. 1–51.
  61. Digital Morphology library at The University of Texas at Austin. Accessed 19 July 2014.
  62. Nelson JS. Fishes of the world. Hoboken (NJ): Wiley; 2006.
  63. Revell LJ. Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol. 2012;3:217–23.
  64. Revell LJ. Two new graphical methods for mapping trait evolution on phylogenies. Methods Ecol Evol. 2013;4:754–9.
  65. Pearson D. Primitive bony fishes, with especial reference to Cheirolepis and palaeonisciform actinopterygians. Zool J Linnean Soc. 1982;74(1):35–67.


© The Author(s). 2019