A supertree approach to shorebird phylogeny
© Thomas et al. 2004
Received: 14 May 2004
Accepted: 24 August 2004
Published: 24 August 2004
Order Charadriiformes (shorebirds) is an ideal model group in which to study a wide range of behavioural, ecological and macroevolutionary processes across species. However, comparative studies depend on phylogeny to control for the effects of shared evolutionary history. Although numerous hypotheses have been presented for subsets of the Charadriiformes, none to date includes all recognised species. Here we use the matrix representation with parsimony method to produce the first fully inclusive supertree of Charadriiformes. We also provide preliminary estimates of ages for all nodes in the tree.
Three main lineages are revealed: i) the plovers and allies; ii) the gulls and allies; and iii) the sandpipers and allies. The relative position of these clades is unresolved in the strict consensus tree but a 50% majority-rule consensus tree indicates that the sandpiper clade is sister group to the gulls and allies whilst the plover group is placed at the base of the tree. The overall topology is highly consistent with recent molecular hypotheses of shorebird phylogeny.
The supertree hypothesis presented herein is (to our knowledge) the only complete phylogenetic hypothesis of all extant shorebirds. Despite concerns over the robustness of supertrees (see Discussion), we believe that it provides a valuable framework for testing numerous evolutionary hypotheses relating to the diversity of behaviour, ecology and life-history of the Charadriiformes.
The shorebirds and allies (Aves: Charadriiformes) present an exceptional group for studying numerous evolutionary hypotheses. Their remarkable diversity of social mating system, parental care, sexual dimorphism, ecology and life-history makes them an ideal group for unravelling the mechanisms of, for example, sexual selection and sexual conflict. Previous comparative studies have made significant contributions to our understanding of the evolution of mating systems, parental care [3, 4], sexual size dimorphism [5–7], locomotion and morphology, migratory behaviour, egg size, and plumage colouration. The importance of phylogeny in cross-species comparative studies is well documented [12–14]. Large and well-resolved phylogenies that incorporate divergence times provide powerful tests of a wide range of hypotheses whilst accounting for the effects of shared evolutionary history [13, 15]. However, the shorebird studies listed above were limited by the lack of a complete phylogeny for the group. Most of these studies are based on derivations of the seminal work of Sibley and Ahlquist, yet this study included less than a quarter of extant and recently extinct shorebird species. Recently extinct taxa (according to Monroe and Sibley) are: the Tahitian sandpiper Prosobonia leucoptera, the Canary Islands oystercatcher Haematopus meadewaldoi, and the great auk Pinguinus impennis.
Numerous methods and types of data can be used to infer phylogeny. Frequently, as in Charadriiformes, a single analysis incorporating all taxa of interest is absent. Under the principle of total evidence , all sources of phylogenetic information should be combined to maximize their explanatory power. Eernisse and Kluge  define total evidence as a method for seeking the best fitting phylogenetic hypothesis for an unpartitioned set of synapomorphies (shared derived characters) using character congruence (characters combined in a supermatrix). Hence, this method combines the primary data (molecular, morphological and behavioural characters) into a single analysis. The approach is powerful because weak signals in the partitioned data sets may be enhanced when combined, and previously obscured relationships may be revealed .
The total evidence approach has both practical and theoretical problems. First, only certain types of data can be combined. For example, nucleotide sequences and morphological traits can be readily assessed together as characters, but it is not generally possible to include nucleotide sequences and genetic distance data in a single analysis . We acknowledge that Lapointe et al.  suggest a distance based approach to combine otherwise incompatible data in a total evidence analysis, although this method has not been tested beyond a single application. The consequence is that it is rarely possible to combine all sources of data in practice and the lack of overlap in combinable data sets may result in a reduction of the number of taxa included. Second, Miyamoto and Fitch  contend that combining data sets is rarely justified because partitions of phylogenetic data are real and unequivocal. They argue that several partitions producing similar topologies provide multiple lines of independent evidence supporting that topology.
Theoretical arguments over the benefits of total evidence will undoubtedly continue, but perhaps the major barriers to its use are the often very high computational demands of large matrices, and the a priori exclusion of certain data types. This is particularly true of Charadriiformes phylogeny, where one of the most significant contributions to the field – DNA-DNA hybridisation – cannot be included. An alternative set of techniques, collectively termed supertrees (e.g., Matrix Representation with Parsimony, MRP; [37, 38]), enables combination of trees (rather than raw data) from otherwise incompatible sources. MRP methods code source phylogenies based on the presence and absence of taxa at each node of the tree [37–39] and are thus one step removed from the primary data. It is important to recognise that supertrees should not be regarded as a replacement for exhaustive phylogenetic studies of the primary data and there are drawbacks to the methods (see Discussion). However, they do enable very large phylogenies to be constructed rapidly . Supertrees have been constructed successfully for a wide variety of taxa including carnivores , primates , seabirds , dinosaurs , and grasses .
Shorebirds are particularly well suited for supertree treatment, since there are numerous incomplete phylogenies available and a broader phylogeny is desirable to facilitate powerful analyses of numerous evolutionary hypotheses (see above). Here, we present the first complete composite phylogeny of extant and recently extinct shorebirds using the MRP approach. We are therefore combining data on tree topologies, and not conducting a simultaneous analysis on the original data. We also use fossil and molecular data to estimate divergence times (see Methods). The combination of complete taxonomic coverage and the inclusion of branch lengths provides the basis for future comparative analyses of Charadriiformes evolution. In addition, conflicting and unresolved areas of Charadriiformes phylogeny are revealed.
The majority of unresolved nodes in the shorebird supertree are located towards the tips of the phylogeny. For example, the genus Gallinago forms a monophyletic clade, but of its 14 species only two pairs are resolved in the majority-rule tree (G. megala and G. nigripennis; G. macrodactyla and G. media). Only the latter relationship remains in the strict consensus tree. In addition, clades including the genera Charadrius and Vanellus, Calidris and Tringa, Sterna, and Scolopax are poorly resolved. This may reflect a bias in phylogenetic studies of shorebirds. For instance, we found six source trees for Alcinae [43–48] but none devoted to Scolopax or Gallinago. Thomas et al. indicate that this may be a problem for shorebird studies in general and reported a strong skew favouring research on northern hemisphere species.
In contrast to the within-genus relationships, the generic and family levels are generally well resolved. The supertree indicates three monophyletic Charadriiformes lineages (figure 2). Family and subfamily resolution within each lineage is high; however, the relative position of each group is unresolved in the strict consensus tree. This is an important point because the deepest relationships of shorebird phylogeny are contentious. The 50% majority-rule consensus tree indicates that the gulls and allies (Larini, Sternini, Rynchopini, Stercorariini, Dromas, Alcinae, and Glareolidae) are sister to the sandpipers and allies (Scolopacidae, Jacanidae, Rostratulidae, Thinocoridae, Pedionomidae). The most basal lineage includes the plovers and allies (Charadriinae, Pluvianellidae, Chionidae, Burhinidae, Haematopodini and Recurvirostrini). The gulls and allies clade is most consistent with DNA-DNA hybridisation, indicating that Larini are sister to Sternini and that Rynchopini are sister to this group. This conflicts with morphology-based topologies where Stercorariini are sister to Larini and Sternini with Rynchopini basal to both. Indeed, the position of Stercorariini remains controversial and most recently they were placed as sister to Alcinae [20–22]. In contrast, morphological evidence [18, 19] places Alcinae at the base of the whole Charadriiformes tree with Stercorariini sister to Larini. Thus, the position of Alcinae is uncertain and appears to be dependent on the type of data, with fundamental differences between molecular-based analyses and morphological analyses. The taxon sampling of previous morphological and molecular studies varies considerably and it may be this, rather than genuine differences in the phylogenetic signal of different data types, that is the cause of conflict in resolving the phylogenetic position of Alcinae. However, it is encouraging that van Tuinen et al. suggested that new unpublished osteological data are consistent with the more derived position indicated by molecular data. The supertree resolves Glareolidae outside the Larini, Sternini, Rynchopini, Stercorariini, Dromas, Alcinae clade. This is also the case with recent molecular and previous DNA-DNA hybridisation studies. Morphological studies have failed to resolve the position of Glareolidae, placing the family in a large polytomy with all other major groups except Alcinae and the sandpipers and allies (fig. 1). A novel development in shorebird phylogeny is the placement of the black-rumped buttonquail Turnix hottentotta as a sister to the gulls and allies (Larini, Sternini, Rynchopini, Stercorariini, Dromas, Alcinae, and Glareolidae) based on the nuclear RAG-1 gene. We did not include this species in the supertree because to date Paton et al. remains the only study to reveal an apparently robust relationship. More diverse sampling of the buttonquails (Turnicidae) is essential to corroborate the general affinities of this family.
The relationships within the plover clade appear to be reasonably stable. Morphological, molecular, and DNA-DNA hybridisation studies all place Charadriinae as sister to Haematopodini and Recurvirostrini; our supertree is consistent with these relationships. However, it is not clear whether Burhinidae and Chionidae are sister to each other [20–22] or whether Chionidae are sister to a Charadriinae, Haematopodini, Recurvirostrini, and Burhinidae clade. Our supertree also included Pluvianellidae, a family consisting of only one species (Magellanic plover Pluvianellus socialis), and places this as sister to Chionidae. If Pluvianellidae are excluded, the supertree is consistent with the sister group relationship of Burhinidae and Chionidae.
The sister group relationship of Jacanidae to Rostratulidae is well established [16, 18–22] and is found in our supertree. The supertree resolves the Thinocoridae and Pedionomidae as sister taxa and this group is sister to the Jacanidae and Rostratulidae. The large Scolopacidae clade is at the base of the sandpiper clade consistent with recent molecular studies [20–22] and the DNA-DNA hybridisation tapestry .
Taken together, it is evident that the supertree is generally more consistent with molecular data (both recent sequence studies and DNA-DNA hybridisation) than with analyses based on morphology. However, it is of course possible that this reflects the greater number of molecular source trees available rather than indicating that molecular data are actually better at resolving shorebird phylogeny. We included several large morphological phylogenies (e.g., [18, 19, 26, 30, 43]) but the majority of source trees (29 out of 51) were based on molecular evidence (see additional file 5).
The higher resolution of the majority-rule tree means it is more likely to be of use in comparative studies. We therefore estimated node ages for this topology only (see additional files 1 and 2). We stress that our estimates of node dates are a first attempt at dating the whole tree and have several limitations. First, the fossils used to calibrate seven nodes in the tree are unlikely to be the earliest members of their respective families; thus these dates will be underestimates. Second, we assumed that the fossils are grouped with the extant members of the family but this requires formal testing in a phylogenetic framework. Third, the pure birth model assumes that no extinction occurs but this may be unrealistic and it is likely that extinction processes have reduced the representation of older lineages. Furthermore, this model is derived from the topological structure of the tree so errors in tree reconstruction will likely lead to errors in branch length estimation. However, this approach has been employed previously in supertrees of primates and carnivores explicitly to facilitate comparative analyses. Despite these caveats, simulation studies have demonstrated that comparative methods such as independent contrasts are robust to errors in branch length and no viable alternative for dating supertrees has been proposed. Nonetheless, we urge that alternative branch length assumptions are explored if the shorebird supertree is used in future comparative studies. At present, the calibrated RAG-1 tree of Paton et al. remains arguably the most thorough and reliable measure of divergence times for Charadriiformes.
A fuller understanding of the phylogenetic affinities of fossil shorebirds will probably improve estimates of node ages for the group. For example, the extinct form Graculavidae is represented by fossils from the Maastrichtian of New Jersey and Cretaceous of Wyoming but its position within the shorebird clade is unclear. Feduccia suggests that it may be basal and a formal corroboration of this would support proposals for a late Cretaceous origin of shorebirds. The difficulties in dating the shorebird tree are further illustrated by fossil representatives of Recurvirostrini and Burhinidae which are much older than current estimates suggest. The earliest record of the Recurvirostrini is estimated to be over 50 million years old whilst recent discoveries of a possible member of the Burhinidae are dated to around 70 mya [55, 56]. There is clearly a need for an integrated phylogenetic study including both extinct and extant shorebirds.
Supertrees are still at an early stage of development and many aspects of MRP, and supertree methods in general, are not yet clearly understood. Steps can be taken to ensure that the supertree includes the most appropriate sets of source trees, such as only using trees from explicitly phylogenetic studies. This is not always straightforward and could result in the exclusion of important information. For instance, in our shorebird supertree, we included Sibley and Ahlquist's DNA-DNA hybridisation tapestry although this is based on distance measures rather than more rigorous phylogenetic methods. Even if very strict tree selection criteria are applied, there are still likely to be biases in the data set. For example, not all source trees are equally well supported, yet in most supertree analyses each tree is treated equally. This is a problem for supertree construction because whilst it is theoretically possible, and indeed beneficial, to weight source trees based on support values, it is rarely possible in practice. Many source trees do not have support values and those that do may use different methods (e.g., bootstrapping or decay indices) which cannot be directly compared with each other. An additional problem that has not been fully resolved relates to correlations between source trees. Several source trees based on the same data set may unduly increase the influence of that data set on the supertree analysis. However, there is no formal way of determining how much overlap to allow and the choice of source trees that go into supertree construction inevitably involves some degree of subjective reasoning. For the shorebird supertree we used strict Reduced Cladistic Consensus trees to summarise potential source trees that were from the same data set but based on different methods. For example, Thomas et al. based their phylogeny on cytochrome b but used a range of methods including parsimony and Bayesian analyses. We therefore combined these trees to minimise bias. In contrast, Ericson et al. used two types of data: sequences from the nuclear RAG-1 gene and sequences from the myoglobin intron II. They carried out three analyses: each gene separately and then the two combined in a single analysis. In this case, we used three source trees. It could be argued that the combined analysis of Ericson et al. should be excluded because of the possible overlap with the individual analyses. However, under the principle of total evidence, the combined data set may result in novel relationships being revealed [31, 33] and therefore could contribute important information to the supertree. Simulation and empirical studies are required to fully understand these and other possible biases in supertree construction (e.g., the influence of source tree size and shape) and formal protocols for the selection of source trees are desirable. For transparency, we include a summary of the source trees used, data type, and the main taxa included in the study (additional file 5).
Our shorebird supertree is highly consistent with recent advances in the molecular phylogenetics of Charadriiformes. However, we urge caution when using the tree in comparative analyses and encourage the additional use of alternative phylogenies and branch length assumptions. It is particularly important to note that the position of some groups such as the Alcinae remains controversial and that although the majority-rule tree is consistent with recent molecular studies, the strict consensus tree fails to resolve the deepest nodes.
The supertree presented here is, to our knowledge, the first attempt to reconstruct the phylogeny of the entire order Charadriiformes. Overall, the supertree is highly consistent with recent molecular hypotheses of shorebird phylogeny. However, it is apparent that fresh attempts to resolve both the phylogeny and estimates of age will be dependent on further gene sequencing and new fossil discoveries. The affinities of the Alcinae and the relationships between the three major shorebird clades require further corroboration, and studies of several genera such as Gallinago and Vanellus are desirable. Furthermore, additional work is required to establish the true affinities of the Turnicidae. Nonetheless, it appears that shorebird phylogeny is gradually approaching a consensus view. The broad taxonomic scope and consistency of the supertree mean that it is of potentially great value to future comparative studies (accepting the caveats discussed above) of the behaviour, life-history, ecology and conservation of this diverse group.
Possible source trees were identified from online searches of Web of Science http://wos.mimas.ac.uk covering the years 1981 to 2004. We used the single key strings phylogen*, cladistic*, clado*, classif*, systematic*, and taxonom* (where the asterisks allow variations such as "phylogeny" or "phylogenetics") in the topic field, in conjunction with a major Charadriiformes taxon name (scientific or common). As supertree methods have been criticized for being biased towards historical trends, we preferred those studies that explicitly set out to derive a phylogenetic hypothesis and so exclude purely (and typically older) descriptive taxonomic works. The Sibley and Ahlquist  DNA-DNA hybridisation tapestry may be viewed as non-cladistic, but it was clearly the authors' intention to reconstruct the phylogeny of birds. Furthermore, it provided a vital catalyst for subsequent studies of avian (including shorebird) phylogeny. We therefore included the DNA-DNA hybridisation hypothesis as a source tree in our analyses. Simulation studies have demonstrated that the performance of supertree methods is improved by including at least one taxonomically complete (or near complete) source tree . We therefore make an exception to our self-imposed rule, and in addition use the taxonomic hierarchy of Monroe and Sibley  as a source tree as this includes all extant Charadriiformes species. We acknowledge that this taxonomy is based largely on Sibley and Ahlquist's  DNA-DNA hybridisation tapestry.
The initial search identified 78 source trees from 44 publications. Each source tree was typed as a text file in Nexus format . We coded trees to the species level with species names taken from Monroe and Sibley , but note that contra Monroe and Sibley , we use Charadriiformes not Charadrii to refer to the whole group. Several studies included the gull Larus thayeri [26, 60–63] either as a subspecies of Larus glaucoides (Larus glaucoides thayeri in Monroe and Sibley ) or a species in its own right. In recognition of this, we included Larus glaucoides thayeri as the only subspecies in our data set thus increasing the total taxa to 366. Monroe and Sibley  include 16 species of the family Pteroclidae within the Charadriiformes. However, the relationship of this family to the Charadriiformes is uncertain and they have recently been placed in their own order . We include the Pteroclidae in our analyses only as a means of rooting the tree. Where there were multiple most parsimonious trees (MPTs), or where source trees had been derived from predominantly overlapping data (e.g., from the same data but using alternative methods), we used RadCon  to produce strict Reduced Cladistic Consensus trees (RCC [66, 67]). The output is in the form of a reduced consensus profile and from this we selected the tree with the highest Cladistic Information Content (CIC) [65, 68]. This resulted in a total of 51 source trees from which our supertree is derived and these are summarised in additional file 5.
We produced an MRP matrix of the 51 Nexus  source trees in RadCon  (see additional file 6 for the MRP file). We used the original MRP coding method of Baum  and Ragan . Weighting source trees based on node support such as bootstrapping improves the accuracy of MRP supertrees . However, this is only possible if all source trees can be weighted on the same criteria . The absence of branch support measures in many of the shorebird source trees precludes this approach from the present study; hence, subsequent analyses were conducted using equally weighted parsimony.
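The Baum and Ragan coding described above can be illustrated with a minimal Python sketch. It is not the RadCon implementation: the function name `mrp_matrix`, the input layout (each source tree given as its taxon set plus a list of clades) and the toy taxa A to E are all assumptions made for illustration. Each informative node of each source tree becomes one matrix column, scored 1 for taxa descended from that node, 0 for other taxa in the same source tree, and ? for taxa absent from that source tree.

```python
def mrp_matrix(source_trees, all_taxa):
    """Baum-Ragan matrix representation (illustrative sketch).

    source_trees: list of (taxa_in_tree, clades) pairs, where clades is a
    list of sets, one per informative internal node of that source tree.
    Returns a dict mapping each taxon to its row of '1', '0' and '?' codes.
    """
    rows = {taxon: [] for taxon in all_taxa}
    for tree_taxa, clades in source_trees:
        for clade in clades:              # one matrix column per node
            for taxon in all_taxa:
                if taxon not in tree_taxa:
                    rows[taxon].append("?")   # taxon missing from this tree
                elif taxon in clade:
                    rows[taxon].append("1")   # descendant of this node
                else:
                    rows[taxon].append("0")   # in the tree, outside the node
    return {taxon: "".join(codes) for taxon, codes in rows.items()}


# Hypothetical toy input: two small, partially overlapping source trees.
tree1 = ({"A", "B", "C", "D"}, [{"A", "B"}, {"A", "B", "C"}])
tree2 = ({"A", "B", "E"}, [{"B", "E"}])
matrix = mrp_matrix([tree1, tree2], ["A", "B", "C", "D", "E"])

for taxon, row in matrix.items():
    print(taxon, row)
```

In a full analysis an all-zero outgroup row is usually added to root the tree, and the matrix is then analysed with parsimony; neither step is shown here.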
The tendency of large data sets to produce many sub-optimal trees that are close in length and topology to the shortest tree is a serious problem in phylogenetics. Standard heuristic searches frequently are trapped searching within globally sub-optimal "islands" and the tree search is often aborted before completion. Nixon  proposed a new method to avoid this problem. The "Parsimony Ratchet" reweights a random set of characters from the data set. This may result in the tree island no longer representing a local optimum and the heuristic search continues until a new optimum is reached. The algorithm then reverts to the original weighting and the search continues. Nixon  demonstrated the efficacy of the method on a 500-taxon data set, where the ratchet-based search found a tree two steps shorter than standard heuristic searches. We used PAUPRat  to implement a parsimony ratchet in PAUP* . The default settings of 200 iterations and 15% perturbation of characters for reweighting were used and we carried out 20 replicates. Equally parsimonious trees were summarized using both strict and 50% majority-rule consensus methods.
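The difference between the two consensus methods can be sketched in a few lines of Python. This is only an illustration, not the PAUP* implementation: trees are represented simply as sets of clades, the function names are invented here, and the three toy trees are hypothetical rather than the actual most parsimonious trees from our analysis.

```python
from collections import Counter


def clade_counts(trees):
    """Count how many trees contain each clade (a frozenset of taxon names)."""
    return Counter(clade for tree in trees for clade in tree)


def strict_consensus(trees):
    """Keep only the clades that are present in every tree."""
    return {c for c, n in clade_counts(trees).items() if n == len(trees)}


def majority_rule_consensus(trees):
    """Keep the clades that are present in more than 50% of the trees."""
    return {c for c, n in clade_counts(trees).items() if 2 * n > len(trees)}


# Hypothetical toy set of three equally parsimonious rooted trees.
ingroup = frozenset({"gulls", "sandpipers", "plovers"})
gulls_sandpipers = frozenset({"gulls", "sandpipers"})
sandpipers_plovers = frozenset({"sandpipers", "plovers"})

trees = [
    {ingroup, gulls_sandpipers},
    {ingroup, gulls_sandpipers},
    {ingroup, sandpipers_plovers},
]

strict = strict_consensus(trees)
majority = majority_rule_consensus(trees)
```

In this toy case the strict consensus retains only the ingroup clade, whereas the majority-rule consensus also retains the gulls plus sandpipers grouping because it occurs in two of the three trees.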
We did not calculate any measures of branch support for two reasons. First, their validity and meaning are questionable in MRP supertrees. Second, the number of taxa included in our data set is too large to allow practical calculation of any branch support indices (e.g., decay indices) on a desktop computer.
Following Purvis and Bininda-Emonds et al. we dated the supertree using both absolute and relative dates. We used data from the Fossil Record 2 as the source of fossil-based absolute dates. This yielded estimates for Jacanidae (Nupharanassa tolutaria, Rupelian), Phalaropus (Phalaropus elenorae, Middle Pliocene), Burhinidae (Burhinus lucorum, Lower Miocene), Glareolidae (Paractiornis perpusillus, Lower Miocene), Alcinae (Petralca austrica, Rupelian), Stercorariini (Stercorarius sp., Middle Miocene), and Larini (undetermined, Rupelian). We took the midpoint of the range from the Fossil Record 2 as our date estimate. More recent publications of fossil Charadriiformes were not included because they either represent specimens that are younger or have not been assigned to families that are represented amongst the extant Charadriiformes (such as Turnipacidae). We assumed that fossil dates represent the earliest occurrence for each group, which inevitably resulted in underestimates of clade age. The fossil record of Charadriiformes is amongst the best of the modern bird groups in terms of the numbers of taxa, but many specimens are fragmentary and reliable estimates of divergence dates are dependent on a limited number of exceptional specimens. The phylogenetic affinities of the fossil shorebirds in relation to their extant relatives have not yet been fully established, hence we have implicitly assumed that fossil representatives of extant groups would be resolved amongst their living relatives.
Source trees may include estimates of relative branch lengths (e.g., genetic distances). This allows further dating of the supertree but is problematic because different relative estimates are not comparable and cannot be applied directly to the supertree. However, where a source tree shares a node that has an absolute date in the supertree (a node dated from fossil evidence), the relative branch lengths can easily be converted to estimates of age. All taxa in our supertree are either extant, or very recently extinct; hence, the tips of the calibrated supertree should be equidistant from the root of the tree. In source trees where the relative branch lengths are not equidistant from the root, we followed the protocol of Purvis [p. 407–8]. We estimated relative dates using the local molecular clock logic as implemented by Purvis and Bininda-Emonds et al. For example, consider three taxa A, B, and C where A and B are sister taxa and C is sister to A and B. The root is dated to 10 million years (myr) from fossil evidence, and independent molecular data provide estimates of divergence based on the number of substitutions per site. The molecular estimates of branch lengths are as follows: A, 6 substitutions; B, 8 substitutions; C, 20 substitutions; the node uniting A and B is 11 substitutions from the root. A and B are therefore separated from their common node by a mean of 7 substitutions. The total length from A and B to the root is thus 18 substitutions compared to 20 for C (a mean of 19). This can be converted to date estimates such that 19 substitutions are equivalent to 10 myr, which dates the A and B node to 7/19 × 10 = 3.68 myr. The dates of the tree are then: ((A : 3.68, B : 3.68), C : 10). There were no cases where multiple source trees with molecular divergence dates were able to provide estimates for the same node. We estimated relative dates from multiple nodes rather than a single dated node to minimise correlative errors in estimates.
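The three-taxon example above can be reproduced with a short Python sketch. The function name and argument names are ours and purely illustrative; the sketch covers only this simple case of one sister pair plus a single outgroup branch, not the general protocol of Purvis.

```python
def sister_pair_node_age(len_a, len_b, stem_len, len_c, root_age):
    """Date the node uniting sister taxa A and B under a local clock.

    len_a, len_b: branch lengths (substitutions) from the A+B node to each tip
    stem_len:     branch length from the root to the A+B node
    len_c:        branch length from the root to the tip of C
    root_age:     fossil-calibrated age of the root (myr)
    """
    mean_tip = (len_a + len_b) / 2              # mean node-to-tip length
    path_ab = mean_tip + stem_len               # root-to-tip length via A+B
    mean_root_to_tip = (path_ab + len_c) / 2    # averaged over both lineages
    subs_per_myr = mean_root_to_tip / root_age  # local rate calibration
    return mean_tip / subs_per_myr              # age of the A+B node in myr


# Values taken from the worked example in the text.
node_age = sister_pair_node_age(len_a=6, len_b=8, stem_len=11,
                                len_c=20, root_age=10)
print(round(node_age, 2))
```

With the values from the text the mean node-to-tip length is 7, the mean root-to-tip length is 19, and the A and B node is dated to 7/19 of the 10 myr root age.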
To provide date estimates for all nodes in the tree we employed a pure birth model to date nodes for which absolute and relative dates could not be attained . Pure birth models infer that a clade's age is proportional to the logarithm of the number of species within the clade:
date of daughter = date of ancestor *(log daughter clade size/log parent clade size)
For example, the age of a daughter node that subtends 12 taxa, estimated from its immediate ancestor dated to 20 myr and which subtends 19 taxa is:
20*(log(12)/log(19)) = 16.879
We applied this approach to estimate the ages of daughter nodes based on dates (absolute or calibrated) of ancestral nodes. We had no ancestral node on which to base estimates of the most basal clade. In this case, we rearranged the pure birth formula and calculated the age of the ancestral node from its two daughter nodes, taking the mean as our "best estimate". Finally, to estimate the ages of nodes between daughter and ancestor nodes of known age we spaced the nodes evenly along the branch length.
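The three dating steps described above can be sketched in Python as follows. The function names are hypothetical and the sketch is a plain restatement of the formula given in the text, not the code used in the original analysis. It assumes clade sizes greater than one, because log(1) = 0 makes the rearranged formula undefined.

```python
import math


def daughter_age(ancestor_age, daughter_size, parent_size):
    """Pure birth estimate: ancestor age scaled by the ratio of log clade sizes."""
    return ancestor_age * math.log(daughter_size) / math.log(parent_size)


def ancestor_age_from_daughters(daughters, parent_size):
    """Rearranged formula: estimate the ancestor's age from each dated
    daughter node (age, clade_size) and return the mean of the estimates."""
    estimates = [age * math.log(parent_size) / math.log(size)
                 for age, size in daughters]
    return sum(estimates) / len(estimates)


def evenly_spaced_ages(older_age, younger_age, n_intermediate):
    """Space undated nodes evenly between two nodes of known age."""
    step = (older_age - younger_age) / (n_intermediate + 1)
    return [older_age - step * i for i in range(1, n_intermediate + 1)]


# Worked example from the text: a 12-taxon daughter of a 19-taxon node
# dated to 20 myr.
example_age = daughter_age(20, 12, 19)
print(round(example_age, 3))
```

Because the second function is simply the inverse of the first, feeding the worked example back in, `ancestor_age_from_daughters([(example_age, 12)], 19)`, recovers the 20 myr ancestor; with two daughter nodes the function returns the mean of the two estimates, as described for the most basal clade.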
We thank Davide Pisani for stimulating discussion of supertree methods during the early planning of this work, Gareth Dyke for information on recent shorebird fossil discoveries, and three anonymous reviewers for helpful comments on an earlier version of the manuscript. Gavin Thomas was supported by a University of Bath Research Studentship.
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.