Algorithmic approaches to aid species' delimitation in multidimensional morphospace
- Thomas HG Ezard^{1}Email author,
- Paul N Pearson^{2} and
- Andy Purvis^{1}
DOI: 10.1186/1471-2148-10-175
© Ezard et al; licensee BioMed Central Ltd. 2010
Received: 30 April 2010
Accepted: 11 June 2010
Published: 11 June 2010
Abstract
Background
The species is a fundamental unit of biological pattern and process, but its delimitation has proven a ready source of argument and disagreement. Here, we discuss four key steps that utilize statistical thresholds to describe the morphological variability within a sample and hence assess whether there is evidence for one or multiple species. Once the initial set of biologically relevant traits on comparable individuals has been identified, there is no need for the investigator to hypothesise how specimens might be divided among groups, nor the traits on which groups might be separated.
Results
Principal components are obtained using robust covariance estimates and retained only if they exceed threshold amounts of explanatory power, before model-based clustering is performed on the dimension-reduced space. We apply these steps in an attempt to resolve ongoing debates among taxonomists working on the extinct Eocene planktonic foraminifera Turborotalia, providing statistical evidence for two species shortly before the lineage's extinction near the Eocene/Oligocene boundary.
Conclusion
By estimating variance robustly (samples containing incipient species are unlikely to be scaled optimally by means and standard deviations) and identifying thresholds relevant to a particular system rather than universal standards, the steps of the framework aim to optimize the chances of delineation without imposing pre-conceived patterns onto estimates of species limits.
Background
Whilst the fundamental importance of the species remains paramount in systematic and evolutionary biology, debates on species concepts continue [1–3]. There is, however, a growing consensus that conceptualizing what species ought to be is different from delimiting them in practice [2, 4, 5]. Operationally, species delimitation pivots on the assumption that, once sexual dimorphism, ontogeny and other causes of group difference have been taken into account to leave directly comparable individuals for analysis (semaphoronts [6]), expression of genotypes or phenotypes [7] or both [8] should be more similar in two individuals of species X than between one individual of species X and one individual of species Y. Hence individuals within species are expected to cluster in genotypic or phenotypic space, with sparsely inhabited or empty space between them. The range of intra-species variation can surpass the range of inter-species variation [9] however, which is problematic for morphological species concepts which rely critically on the assumption that gaps between species do exist [1–3].
The development of molecular diagnostics to identify analogous gaps has been dramatic [10], yielding automated approaches to test specific evolutionary hypotheses and enabling rapid delimitation of previously-undescribed species [10, 11] across taxa and biogeographical region [12]. Morphological data are nevertheless useful in delimiting species [13] and strong arguments exist to refine non-molecular approaches [14, 15]. Morphological data are often the only sort available. Elsewhere, they can be integrated with genetic data to augment evidence for species delimitation [8] or provoke new hypotheses given a lack of congruence [16]. Morphological traits are the outcome of multi-locus variation, and thus constitute a more thorough reflection of variation among individuals than a particular, pre-determined section of the DNA sequence [14]. Furthermore, if divergent selection is the main driver of speciation, then adaptive (morphological) traits may provide the best insights into species' limits [17, 18]. Whilst various approaches have been developed for delimiting species morphometrically using an a priori definition [19], development of hypothesis-driven morphometric techniques when there is no guide as to whether an individual in a genus belongs to species X or Y has been much slower than for equivalent molecular tests [10, 20].
Any threshold measure effectively aims to identify how much "noise" can be discarded. In structured data, the identification of additional clusters increases within-group cohesion until, in the limit, each individual sits proudly as an isolated cluster [26] and no noise is discarded. Hence the problem becomes finding the optimal number of clusters to describe the structure parsimoniously. The simplest structure where an additional cluster is not warranted can be identified sequentially [e.g., are 3 clusters better than 2? [26, 27]] or simultaneously [e.g., are 3 clusters better than 1,2,4,5, and so on? [28, 29]]. These clusters may vary in their size (i.e., number of included individuals), shape and orientation [30]. Quantifying how much noise to discard relies heavily on the traits used to characterize differences among individuals. Any method to calculate variability within and between species should aim to neglect redundant measurements, whilst retaining essential differences. Accurate identification of the dimension and orientation of any dimension-reduced space is therefore a key problem.
In high-dimensional data, there often exists a lower-dimensional space that describes the majority of the observed variation, i.e., a smaller "fundamental set of independent variables ... which determine the values of the original data" [[31], cited in [32]]. Principal Components Analysis (PCA) finds a dimension-reduced space using variable covariance to rotate the original data to a new orientation that emphasizes similarities and differences among variables. Use of covariance for rotation assumes a multivariate normal distribution, which is often violated in taxonomic studies where data consist of a mixture of ratio, ordinal, binary and continuous variables, i.e. mixed mode, and might contain multiple species distributed in clumps across space. Many workers seek to circumvent this problem by using the correlation matrix rather than the covariance matrix as the basis for the PCA [ch. 11 in [32]], but outliers can still skew the orientation of the rotated axes markedly [33, 34]. An alternative is to estimate variance robustly, using, for example, the median absolute deviation to detect the most variable direction [35]. The approach has already been employed in environmetrics [36] but not, to our knowledge, until now in systematic biology where it is relevant because samples containing incipient species are unlikely to be scaled optimally by means and standard deviations (Fig. 1).
Our approach to species delimitation consists of four steps: (1) obtaining orthogonal axes with robust covariance estimators [37]; (2) reducing the dimensionality of the orthogonal axes to only those with significant explanatory power [38]; (3) identifying the optimal number, shape and orientation of groups within the rotated, dimension-reduced data [28]; and (4) performing model diagnostics to assess the impact of unusual or extreme individuals on the dimension-reduced space and cluster model [39]. At each step, we use heuristic thresholds to retain or remove additional complexity, thereby minimising the scope for subjective choices. The details of application likely vary from question to question, but we argue here that they are key steps when splitting continuous morphological variation into discrete species. We first introduce the case study that motivated development, before going on to discuss application in this contentious area.
Case Study - Turborotalia cerroazulensislineage
Statistical Thresholds used to Delimit Species (STUDS)
The Orientation of the Dimension-Reduced Space
The restrictive assumption of a multivariate normal distribution induced by use of covariance fails in many applied contexts, prompting refinement of the original PCA [see ch. 11 in [32]]. Using correlation matrices as an alternative does not always account for the potentially large effect that outliers can have on the orientation of the principal components, which aims to remove interdependence among components but can be biased by unusual observations [33, 37]. As an illustration, Croux & Ruiz-Gazen [33] compared a PCA on the correlation matrix with a PCA using robust variance estimators, finding a correlation between the two "independent" components obtained from the correlation matrix argued to be a result of two outlying points. If scaled and centered data are not well-approximated by a multivariate normal distribution (multiple clusters within the data can also violate the normality assumption), rotated axes are more likely to pass through the centre of the multivariate data cloud when obtained using robust covariance estimators to identify the most variable traits using alternative metrics [35]. As the mixed mode data we analyze here are poorly described by the assumption of normality, each variable was centered on the median and scaled by percentile variability prior to rotation. Note that rotation is based upon the robust covariance estimates and pays no attention to the ultimate goal of finding cohesive clusters of comparable individuals - the implicit assumption is that the key distinctions among groups are present and identifiable in the traits used for analysis.
Dimension Reduction
Retaining too few axes risks neglecting an influential one whereas retaining too many factors can deliver attention to relatively unimportant components (or, possibly, measurement error and inaccuracy [47]): either can generate bias, although the former is more serious [48]. Each retained variable should provide a significant improvement in explanatory power, whilst the improvement of each discarded variable should be minor: the threshold is located where the morphological trait information changes from being useful to irrelevant noise. Many stopping criteria have been proposed to determine where the threshold for dimension retention lies [see reviews in [38, 47]] and no single measure outperforms all others [47]. Our case study contains multiple traits that are reasonably highly correlated. In such instances, adapting Horn's parallel analysis [49] to test whether the observed eigenvalues in the data are greater than the centiles of 10,000 Monte Carlo estimates is one of two most promising stopping criteria [[47], Additional file 1, online appendix 1 contains a comparison of multiple criteria and simulations documenting why dimension reduction is important in this context].
Identification of well-separated clusters
The dimension-reduced sample can be split into groups using cluster analysis, the optimal delimitation of which has low intra- and high inter-group variability. K-means approaches [50, 51] generate clusters with equal lengths in all dimensions (e.g., circles, spheres,....), which is not always representative of individuals in species clusters [30]. The flexibility in the shape and orientation of clusters can be incorporated using Gaussian mixture models and a Bayesian approach to estimate the support for particular arrangements of clusters using iterative Expectation-Maximization methods for maximum-likelihood [28]. The volume and shape can be equal or variable among axes to assess, for example, whether elliptical clusters fit better than round ones. The choice between competing models is made through the Bayesian Information Criterion (BIC, also known as the Schwarz Information Criterion), which is similar to the Akaike Information Criterion (AIC) and provides a compromise between explained variation and the number of parameters used [52]. The difference is that the penalty term per parameter for the BIC is log(n), where n is the number of observations, rather than 2 for the AIC. Consequently, BIC favours complex models more than AIC if n is at least 8, but simpler ones otherwise [52]. We present BIC values and also model weights, which sum to 1 and can be interpreted as the probability that a particular model provides the best description of the data structure among the set of candidate models [52].
Model Diagnostics
Unusual or extreme individuals can exert great influence on the orientation of principal components [34, 39]. The distances between individuals, when transformed to approach a chi-squared distribution, can be used to identify unusual observations at a desired level of significance [34]. Even if an individual is significantly unusual at a given level, there is still no guarantee that it unduly affects interpretation [39]; small groups of such points might rather be an under-sampled group that is diverging away from the main population.
Implementation
All calculations were performed in R version 2.9.1 [53] and used the pcaPP [54], mvoutlier [34, 54] and mclust [28] packages. A self-contained example of our code is included in Additional file 1, online appendix 1.
Results
Rotated and dimension-reduced morphospaces
Based on stopping criteria, two components were retained in both samples (see Additional file 1, online appendix 1 for justification of stopping criteria and dimension reduction).
Component | Middle Eocene | Upper Eocene | ||
---|---|---|---|---|
Eigenvalue | Cumulative Variance Explained | Eigenvalue | Cumulative Variance Explained | |
1 | 3.47 | 0.29 | 5.97 | 0.42 |
2 | 2.33 | 0.49 | 3.24 | 0.64 |
3 | 1.43 | 0.61 | 1.44 | 0.74 |
4 | 1.27 | 0.72 | 0.95 | 0.81 |
5 | 1.15 | 0.81 | 0.74 | 0.86 |
6 | 0.75 | 0.88 | 0.68 | 0.91 |
7 | 0.5 | 0.92 | 0.5 | 0.94 |
8 | 0.42 | 0.95 | 0.31 | 0.97 |
9 | 0.42 | 0.99 | 0.31 | 0.99 |
10 | 0.09 | 1 | 0.11 | 1 |
The loadings onto the robust principal components; larger absolute values indicate more influential traits on the dimension-reduced space.
Trait | Middle Eocene | Upper Eocene | ||
---|---|---|---|---|
Component 1 | Component 2 | Component 1 | Component 2 | |
Area | 0.111 | -0.355 | 0.289 | -0.364 |
Filled | -0.508 | 0.239 | 0.382 | 0.317 |
Chamber Aspect Ratio | -0.395 | -0.438 | -0.418 | 0.453 |
Chamber Inflation | -0.390 | -0.396 | -0.274 | 0.062 |
Aperture Aspect Ratio | -0.176 | -0.026 | -0.249 | 0.148 |
Test Height | -0.261 | 0.374 | 0.470 | 0.220 |
Test Expansion | -0.505 | 0.228 | -0.112 | 0.469 |
Umbilical Angle | 0.103 | 0.522 | -0.361 | -0.047 |
Chirality | 0.043 | -0.037 | 0.122 | 0.192 |
Chamber Number | 0.233 | -0.048 | -0.284 | -0.478 |
Sample Structure
Bayesian Information Criterion values and, in brackets, model weights for the model-based clustering based on robust principal components for the two samples.
Distribution | Model | Middle Eocene | Upper Eocene | ||||||
---|---|---|---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 1 | 2 | 3 | 4 | ||
Spherical | E-- | 1574 (0.010) | 1575.5 (0.004) | ||||||
Spherical | V-- | 1574 (0.010) | 1649 (0.002) | ||||||
Diagonal | EE- | 1567 (0.227) | |||||||
Diagonal | VE- | 1567 (0.227) | |||||||
Diagonal | EV- | 1567 (0.227) | |||||||
Diagonal | VV- | 1567 (0.227) | |||||||
Ellipsoidal | EEE | 1573 (0.017) | |||||||
Ellipsoidal | EEV | 1573 (0.017) | 1639 (0.224) | 1648 (0.003) | |||||
Ellipsoidal | VEV | 1573 (0.017) | 1637 (0.611) | 1641 (0.101) | |||||
Ellipsoidal | VVV | 1573 (0.017) | 1642 (0.059) | 1649 (0.002) |
Discussion
Whilst automation at the expense of all else is undesirable [55], the use of taxonomically informative data should enable species' limits to be readily visualized [21]. We set out to test the null hypothesis that the species lineage of the Eocene planktonic foraminifer Turborotalia is homogeneous, i.e. one that contains an insensibly continuous series of intergrading populations [44] despite comprehensive description of multiple morphospecies regularly used in biostratigraphy [40]. In the middle Eocene sample, there was no statistical evidence to split the sample; this does not mean that multiple species are not present, just that there is no statistical evidence to delimit them. In the upper Eocene sample, the statistical evidence supported two species strongly (model weights ~0.9), whereas support for the existence of other numbers of species was very low. Delimitation is subtle enough to be difficult to determine by eye and requires investigation of large populations using multivariate data analysis. Crucially, we neither made a priori assumptions about the assignment of individuals to clusters nor employed prior knowledge of which traits determined those differences. Our use of robust variance estimators [37] is particularly appropriate for samples containing incipient species diverging in opposing directions (Fig. 1). Consequently, our approach is more general than this particular study: it can be used if a test is sought for the null hypothesis of multivariate homogeneity without specifying a particular alternative.
The increased adoption of geometric morphometrics - which utilizes distances among functionally-important characters referred to as landmarks - has been underpinned by a desire to quantify shape precisely across diverse questions, often to understand changes in shape independent of size [56] and can delimit previously-undetected species [57]. Size, however, often has a decisive role in diversification [58] and there is no guarantee that separating shape from it is biologically reasonable [59]. Our aim was to aid 'classical' morphometric treatments, where taxonomists have established decisive and informative functional traits. The steps discussed here serve to increase repeatability of taxonomic decisions, through the quantitative element that morphometric treatment adds to descriptions of sample variability. The connectivity of samples to be delimited can be achieved in many ways using, for example, Fourier analysis of distances between neighbours [60], neural networks [61] or, as here, by encompassing clusters of similar individuals using polygons [28]. The details of application may change on a case-by-case basis as there is rarely a one-size-fits-all recipe, but the steps of the framework have widespread utility in aiding species delimitation that might otherwise be obscured through mathematical conflation.
Distributions of biological traits are rarely described optimally when centered and scaled by means and standard deviations and rotated using covariance; such data are treated more appropriately using robust covariance estimates (Fig. 1). The use of robust covariance estimators to rotate raw data onto principal components is not widespread and yields a different dimension-reduced space from correlation- or covariance-based approaches [37]. There are alternatives to obtain appropriate scaling, notably principal co-ordinates analysis [62] but that method does not lend itself readily to threshold criteria, which are important to ensure parsimonious description of the untransformed traits in orthogonal space. Data can be, and often are, transformed prior to scaling, centering and rotation, but there is often no biologically meaningful transformation [63]. Whilst each variable could be transformed independently to obtain approximately symmetric, normally distributed distributions, doing so can inhibit interpretation. An advantage of robust approaches is that they de-emphasize extreme values: medians and median absolute deviations are less affected by long-tailed or asymmetric distributions than means and standard deviations. Both of these circumstances may arise during the early stages of divergence, when a population is splitting into incipient species. Inadequate scaling can skew the orientation of the principal components [35], hence obscuring potentially critical differences. The notion of being 'well-separated' applies to extreme data points as well as distinct clusters, meaning that identification of genuine outliers can be problematic. In the case studies we present here, we have no biological or methodological reason to remove the outliers, which might correspond to one or more undersampled or hidden groups. Without a priori grounds to eliminate them, results are reported here from analyses that retained these data and we note that the use of robust variance estimates means that manual removal is less likely to alter conclusions than standard, covariance-based approaches.
If species are argued to be morphologically distinct [1–3], then threshold-stopping criteria can be used to aid delimitation in morphospace. The failure to use prior knowledge of which individuals belong to which species and which traits are the key distinctions between species does not maximise power, but is based upon the argument that rejecting the null hypothesis (homogeneous data) in a contemporaneous sample implies two or more species. Analogous approaches have been applied to the problem of species' delimitation from alternative perspectives. Pons et al. [11] used changes in per-lineage branching rate to cluster tips on a phylogenetic tree, with the threshold between intra- and inter-species variation being the point where branching rates underwent a striking increase. Clusters in contemporaneous samples of comparable individuals are argued to be putative species [1–3]. The approach we present provides a useful analogue to methods that cluster tips on a phylogeny [11], from a bottom-up (samples into multiple clusters) rather than top-down (merging phylogenetic and phylogeographic methods) perspective for estimating sample structure in, amongst others, genotypic or phenotypic [7] or geographical [64] space. Application is likely to be particularly appropriate for species delimitation questions in the fossil record [57], or to test for congruence between morphological and genetic differentiation without resorting to using the other evidence as a prior hypothesis [8].
Conclusion
Taxonomic decisions should ideally be taken by aligning state-of-the-art statistics with taxonomic expertise. Any technique to delimit species pivots on its ability to quantify the heterogeneity contained within species: "even an improved taxonomy still imposes structure on macroevolutionary investigation" [pp. 371, [60]] and the success of any statistical approach pivots on the use of biologically relevant data. The details of application will differ from question to question and from group to group, but the four steps discussed here decrease the potential for subjective decisions in species delimitation once biologically relevant traits have been identified. By scaling and centering appropriately, estimating variance robustly and identifying the thresholds that are relevant to the particular question and data set rather than a universal guide, the objective is to minimize the extent to which pre-conceived pattern is forced onto data.
Methods
Raw data were obtained from Hole 865B of the Ocean Drilling Program (equatorial mid-Pacific Ocean). The entire data set consists of 51 time slices through a stratigraphic interval spanning around 13 million years. We focus on two samples here: one middle Eocene and one upper Eocene; a future contribution will analyze many more time slices and discuss the pattern of evolution in detail. Samples of 10 cm^{3} were taken from the sediment and washed over a 63-micron sieve to remove fine particles (mainly coccoliths). The sieved residue is >99% planktonic foraminifer shells. All specimens of the Turborotalia cerroazulensis group of morphospecies were identified by eye using the taxonomic criteria of Pearson et al. [40] from multispecies assemblages and picked without further reference to species designation. Most other groups of foraminifera are easily distinguished, although rejection of specimens belonging to T. altispiroides and T. ampliapertura required a greater degree of expert discrimination [40]. The first 200 specimens encountered were manually separated and mounted on cardboard slides in a standard orientation (edge on, aperture facing). For each specimen, fine adjustments were made using a universal stage to achieve as consistent a standard orientation as possible. The choice of orientation in side view and measurements were carefully designed to capture the greatest range of morphological variability in the group, including the characters that are used in qualitative discrimination of the morphospecies by working taxonomists [40].
Measurements were made from photographs of each individual using Image Pro+ (Image Software, UK). The following morphological traits were incorporated in analysis: area, 'filled' (the proportion of a circle of an individual's radius filled by that individual), final chamber inflation (chamber width scaled by length), final chamber and aperture aspect ratio (the height: width ratio of the final chamber and aperture, respectively), test height (axis/radius), test expansion (diameter/radius), umbilical angle, chamber number and chirality. See fig. 2 for more information.
Declarations
Acknowledgements
We would like to thank the authors of the packages we use as well as Tim Barraclough, Norman MacLeod, Albert Phillimore and four anonymous reviewers for insightful suggestions that improved the manuscript, Catherine Watling for the SEM photograph and Matthew Carroll for repeating all morphometric measurements. The Natural Environment Research Council (grant NE/E015956/1 to AP & PNP) provided financial support.
Authors’ Affiliations
References
- Cracraft J: Species Concepts and the Ontology of Evolution. Biol Philosoph. 1987, 2 (3): 329-346. 10.1007/BF00128837.View ArticleGoogle Scholar
- De Queiroz K: Species concepts and species delimitation. Syst Biol. 2007, 56 (6): 879-886. 10.1080/10635150701701083.View ArticlePubMedGoogle Scholar
- Claridge MF, Dawah AH, Wilson MR, (eds): Species: The Units of Biodiversity. 1997, New York: SpringerGoogle Scholar
- Hey J, Waples RS, Arnold ML, Butlin RK, Harrison RG: Understanding and confronting species uncertainty in biology and conservation. Trends Ecol Evol. 2003, 18 (11): 597-603. 10.1016/j.tree.2003.08.014.View ArticleGoogle Scholar
- Mayden RL: A hierarchy of species concepts: the denouement in the saga of the species problem. Species: The Units of Biodiversity. Edited by: Claridge MF, Dawah AH, Wilson MR. 1997, New York: Springer, 381-424.Google Scholar
- Hennig WL: Phylogenetic systematics. 1966, Urbana, IL: University of Illinois PressGoogle Scholar
- Sites JW, Marshall JC: Operational criteria for delimiting species. Ann Rev Ecol Evol Syst. 2004, 35: 199-227. 10.1146/annurev.ecolsys.35.112202.130128.View ArticleGoogle Scholar
- Fontaneto D, Herniou EA, Boschetti C, Caprioli M, Melone G, Ricci C, Barraclough TG: Independently evolving species in asexual bdelloid rotifers. PLoS Biol. 2007, 5 (4): 914-921. 10.1371/journal.pbio.0050087.View ArticleGoogle Scholar
- Tabachnick RE, Bookstein FL: The Structure of Individual Variation in Miocene Globorotalia. Evolution. 1990, 44 (2): 416-434. 10.2307/2409418.View ArticleGoogle Scholar
- Vogler AP, Monaghan MT: Recent advances in DNA taxonomy. J Zool Syst Evol Res. 2007, 45 (1): 1-10. 10.1111/j.1439-0469.2006.00384.x.View ArticleGoogle Scholar
- Pons J, Barraclough TG, Gomez-Zurita J, Cardoso A, Duran DP, Hazell S, Kamoun S, Sumlin WD, Vogler AP: Sequence-based species delimitation for the DNA taxonomy of undescribed insects. Syst Biol. 2006, 55 (4): 595-609. 10.1080/10635150600852011.View ArticlePubMedGoogle Scholar
- Pfenninger M, Schwenk K: Cryptic animal species are homogeneously distributed among taxa and biogeographical regions. BMC Evol Biol. 2007, 7: 121-127. 10.1186/1471-2148-7-121.PubMed CentralView ArticlePubMedGoogle Scholar
- Wiens JJ: The role of morphological data in phylogeny reconstruction. Syst Biol. 2004, 53 (4): 653-661. 10.1080/10635150490472959.View ArticlePubMedGoogle Scholar
- Will KW, Mishler BD, Wheeler QD: The perils of DNA barcoding and the need for integrative taxonomy. Syst Biol. 2005, 54 (5): 844-851. 10.1080/10635150500354878.View ArticlePubMedGoogle Scholar
- Dayrat B: Towards integrative taxonomy. Biol J Linn Soc. 2005, 85 (3): 407-415. 10.1111/j.1095-8312.2005.00503.x.View ArticleGoogle Scholar
- Cardoso A, Serrano A, Vogler AP: Morphological and molecular variation in tiger beetles of the Cicindela hyrbida complex: is an 'integrative taxonomy' possible?. Mol Ecol. 2009, 18 (4): 648-664. 10.1111/j.1365-294X.2008.04048.x.View ArticlePubMedGoogle Scholar
- Rundle HD, Nosil P: Ecological speciation. Ecol Lett. 2005, 8: 336-352. 10.1111/j.1461-0248.2004.00715.x.View ArticleGoogle Scholar
- Schluter D: Ecological character displacement in adaptive radiation. Am Nat. 2000, 156: S4-S16. 10.1086/303412.View ArticleGoogle Scholar
- MacLeod N, Forey PL: Morphology, shape and phylogeny. 2002, Boca Raton, FL: CRC PressView ArticleGoogle Scholar
- Wiens JJ: Species delimitation: New approaches for discovering diversity. Syst Biol. 2007, 56 (6): 875-878. 10.1080/10635150701748506.View ArticlePubMedGoogle Scholar
- Sneath PHA, Sokal RR: Numerical Taxonomy. 1973, San Francisco: W.H. FreemanGoogle Scholar
- Wright S: An analysis of variability in the number of digits in an inbred strain of guinea pigs. Genetics. 1934, 19 (6): 506-534.PubMed CentralPubMedGoogle Scholar
- Simpson GG, Roe A, Lewontin RC: Quantitative Zoology. 1960, New York: Harcourt, Bruce & CoGoogle Scholar
- Wiens JJ, Servedio MR: Species delimitation in systematics: inferring diagnostic differences between species. Proc Roy Soc Lond B. 2000, 267 (1444): 631-636. 10.1098/rspb.2000.1049.View ArticleGoogle Scholar
- Templeton AR: Using phylogenetic analyses of gene trees to test species status and processes. Mol Ecol. 2001, 10 (3): 779-791. 10.1046/j.1365-294x.2001.01199.x.View ArticlePubMedGoogle Scholar
- Tibshirani R, Walther G, Hastie T: Estimating the number of data clusters via the gap statistic. J Roy Stat Soc B Met. 2001, 63 (2): 411-423. 10.1111/1467-9868.00293.View ArticleGoogle Scholar
- Yan M, Ye K: Determining the number of clusters using the weighted gap statistic. Biometrics. 2007, 63 (4): 1031-1037.View ArticlePubMedGoogle Scholar
- Fraley C, Raftery AE: Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc. 2002, 97 (458): 611-631. 10.1198/016214502760047131.View ArticleGoogle Scholar
- Oh MS, Raftery AE: Model-Based Clustering with Dissimilarities: A Bayesian Approach. J Comput Graph Stat. 2007, 16 (3): 559-585. 10.1198/106186007X236127.View ArticleGoogle Scholar
- Hohenlohe PA, Arnold SJ: MIPoD: A Hypothesis-Testing Framework for Microevolutionary Inference from Patterns of Divergence. Am Nat. 2008, 171 (3): 366-385. 10.1086/527498.PubMed CentralView ArticlePubMedGoogle Scholar
- Hotelling H: Analysis of a complex of statistical variables into principal components. J Educ Psychol. 1933, 24 (6): 417-441. 10.1037/h0071325. 498-500View ArticleGoogle Scholar
- Jolliffe IT: Principal Components Analysis. 2002, New York, Springer, 2Google Scholar
- Croux C, Ruiz-Gazen A: High Breakdown Estimators for Principal Components: The Projection-pursuit Approach Revisited. J Multivariate Anal. 2005, 95 (1): 206-226. 10.1016/j.jmva.2004.08.002.View ArticleGoogle Scholar
- Filzmoser P, Maronna R, Werner M: Outlier identification in high dimensions. Comp Stat Data Anal. 2008, 52 (3): 1694-1711. 10.1016/j.csda.2007.05.018.View ArticleGoogle Scholar
- Li G, Chen Z: Projection-pursuit Approach to Robust Dispersion Matrices and Principal Components: Primary Theory and Monte Carlo. J Am Stat Assoc. 1985, 80 (391): 759-766. 10.2307/2288497.View ArticleGoogle Scholar
- Filzmoser P: Robust Principal Component and Factor Analysis in the Geostatistical Treatment of Environmental Data. Environmetrics. 1999, 10: 363-375. 10.1002/(SICI)1099-095X(199907/08)10:4<363::AID-ENV362>3.0.CO;2-0.View ArticleGoogle Scholar
- Croux C, Filzmoser P, Oliveira M: Algorithms for Projection-Pursuit Robust Principal Component Analysis. Chemometr Intell Lab. 2007, 87 (2): 218-225. 10.1016/j.chemolab.2007.01.004.View ArticleGoogle Scholar
- Jackson DA: Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches. Ecology. 1993, 74 (8): 2204-2214. 10.2307/1939574.View ArticleGoogle Scholar
- Brooks SP: Diagnostics for Principal Components: Influence Functions as Diagnostic Tools. The Statistician. 1994, 43 (4): 483-494. 10.2307/2348133.View ArticleGoogle Scholar
- Pearson PN, Olsson Premec-Fucek V, Premoli Silva IRK: Taxonomy, biostratigraphy, and phylogeny of Eocene Turborotalia. 2006 In. Atlas of Eocene Planktonic Foraminifera. Edited by: Pearson PN, Olsson RK, Huber BT, Hemleben C, Berggren WA. 2006, Frederiksberg, VA: Cushman Foundation for Foraminiferal Research, 433-460.Google Scholar
- Bolli HM: Planktonic foraminifera from the Eocene Navet and San Fernando formations of Trinidad, B.W.I. Studies in Foraminifera: United States National Museum Bulletin. Edited by: Loeblich Jr A, Tappan, H. 1957, 215: 97-124.Google Scholar
- Blow WH, Banner FT: The Mid-Tertiary (Upper Eocene to Acquitanian) Globigerinaceae. Fundamentals of Mid-Tertiary Stratigraphical Correlation. Edited by: Eames FE, Banner, FT, Blow, WH, Clarke WJ. 1962, Cambridge: Cambridge University Press, 2: 287-331.Google Scholar
- Toumarkine M, Bolli HM: Evolution de Globorotalia cerroazulensis (Cole) dans l'Eocene moyen et superieur de Possagno (Italie). Revue de Micropaleontologie. 1970, 13: 131-145.Google Scholar
- Bolli HM, Saunders JM: Oligocene to Holocene low latitude planktic foraminifers. Plankton Stratigraphy. Edited by: Bolli HM, Saunders JM, Perch-Nielsen K. 1985, Cambridge: Cambridge University Press, 155-1621.Google Scholar
- Toumarkine M: Middle and Late Eocene planktonic foraminifera from the northwestern Pacific Ocean, Leg 32 of the Deep Sea Drilling Project. Initial Reports of the Deep Sea Drilling Project. Edited by: Larson RL, Moberley R. 1975, Washington D.C.: U.S. Government Printing Office, 40: 679-721.Google Scholar
- Toumarkine M: Planktonic Foraminiferal Biostratigraphy of the Paleogene of Sites 360 to 364 and the Neogene of Sites 362A, 363, and 364 Leg 40. Initial Results of the Deep Sea Drilling Project. Edited by: Bolli HM, Ryan WBF. 1978, 40:Google Scholar
- Peres-Neto PR, Jackson DA, Somers KM: How many principal components? stopping rules for determining the number of non-trivial axes revisited. Comp Stat Data Anal. 2005, 49 (4): 974-997. 10.1016/j.csda.2004.06.015.View ArticleGoogle Scholar
- Hayton JC, Allen DG, Scarpello V: Factor Retention Decisions in Exploratory Factor Analysis: a Tutorial on Parallel Analysis. Organ Res Methods. 2004, 7 (2): 191-205. 10.1177/1094428104263675.View ArticleGoogle Scholar
- Horn JL: A rationale and a test for the number of factors in factor analysis. Psychometrika. 1965, 30 (2): 179-185. 10.1007/BF02289447.View ArticlePubMedGoogle Scholar
- Flury BA: Principal points. Biometrika. 1990, 77 (1): 33-41. 10.1093/biomet/77.1.33.View ArticleGoogle Scholar
- Klingenberg CP, Froese R: A multivariate comparison of allometric growth patterns. Syst Biol. 1991, 40 (4): 410-419.View ArticleGoogle Scholar
- Burnham KP, Anderson DR: Model selection and multimodel inference. a practical information-theoretical Approach. 2002, New York.: Springer-VerlagGoogle Scholar
- R Development Core Team: R: A language and environment for statistical computing. 2009, R Foundation for Statistical Computing, Vienna, Austria, [http://www.R-project.org]Google Scholar
- Filzmoser P, Garrett RG, Reimann C: Multivariate outlier detection in exploration geochemistry. Comput Geosci. 2005, 31: 579-587. 10.1016/j.cageo.2004.11.013.View ArticleGoogle Scholar
- Crisci JV: One-Dimensional Systematist: Perils in a Time of Steady Progress. Syst Bot. 2006, 31 (1): 217-221. 10.1600/036364406775971859.View ArticleGoogle Scholar
- Zelditch ML, Swiderski DL, Sheets DH, Fink WL: Geometric Morphometrics for Biologists: A Primer. 2004, San Diego, CA: Academic PressGoogle Scholar
- Hull PM, Norris RD: Evidence for abrupt speciation in a classic case of gradual evolution. Proc Natl Acad USA. 2009, 106 (50): 21224-21229. 10.1073/pnas.0902887106.View ArticleGoogle Scholar
- Stanley SM: An Explanation for Cope's Rule. Evolution. 1973, 27 (1): 1-26. 10.2307/2407115.View ArticleGoogle Scholar
- Klingenberg CP: Heterochrony and allometry: the analysis of evolutionary change in ontogeny. Biol Rev. 1998, 73 (1): 79-123. 10.1017/S000632319800512X.View ArticlePubMedGoogle Scholar
- Foote MJ: Nearest-neighbor analysis of trilobite morphosapce. Syst Zool. 1990, 39 (4): 371-382. 10.2307/2992357.View ArticleGoogle Scholar
- MacKay DJC: Information Theory, Inference, and Learning Algorithms. 2003, Cambridge, UK: Cambridge University PressGoogle Scholar
- Gower JC: Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika. 1966, 53 (3-4): 325-338. 10.1093/biomet/53.3-4.325.View ArticleGoogle Scholar
- Box GEP, Cox DR: An analysis of transformations (with discussion). J Roy Stat Soc B B. 1964, 26 (2): 211-252.Google Scholar
- Dépraz A, Hausser J, Pfenninger M: A species delimitation approach in the Trochulus sericeus/hispidus complex reveals two cryptic species within a sharp contact zone. BMC Evol Biol. 2009, 9: 171-180. 10.1186/1471-2148-9-171.PubMed CentralView ArticlePubMedGoogle Scholar
- Venables WN, Ripley BD: Modern Applied Statistics with S-PLUS. 1999, New York: Springer-VerlagView ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.