The use of multiple molecular data sources enables the comparison of the evolutionary history of each gene and potentially more accurate estimation of the evolutionary history of the focal organisms (e.g. ). In the case of B. harrisoni, the combination of sequence data from four genes identified which genes are useful for identifying lineages with this level of divergence (e.g. COI and PEPCK) and which genes should be avoided in future studies (e.g. 16S and EF1α). Furthermore, the combined sequence data resulted in a more resolved and well-supported phylogenetic hypothesis than when each of the individual datasets were analysed alone (Figures 2 and 3). This result is highlighted by the partitioned Bremer support values which show little incongruence (i.e. absence of negative and positive Bremer support values on a particular node) and, when the Bremer values are combined, provide moderate to strong support for the relationships of the major lineages (Figure 2), suggesting that single-gene datasets should be supplemented wherever possible [59, 60]. Thus, the results presented here provide substantiate the approach of previous studies of the genus Baetis[15, 16] and suggest means of refining them.
Gene tree incongruence
The results produced from the 16S gene were incongruent (but not significantly so) with the COI and nuclear (PEPCK) results, due to the recovery of a paraphyletic clade (SA East + MAL) (Table 2, Figure 3a). Possible explanations for incongruence include a lack of appropriately informative sites, substitution saturation, introgression, paralogy and/or incomplete lineage sorting. The 16S gene is second to COI in its informative value (Table 1), although the data suffer from higher levels of homoplasy relative to the COI data (Table 1). Plots of transitions and transversions against genetic distance (not illustrated) showed that the 16S data are not saturated. The mitochondrion is inherited as a single unit, thus the incongruence between the 16S and COI data cannot be due to introgression, but may be due to incomplete lineage sorting of the 16S gene.
The results produced from the EF1α gene were significantly incongruent with both the mitochondrial (COI and 16S) and nuclear (PEPCK) results. Possible explanations for the observed pattern are limited to introgression and/or incomplete lineage sorting [61, 62]. Although EF1α is known to occur in two copies in some insects [63, 64], the lack of multiple PCR bands and the low between-clade divergence (0.03) estimated between the two clades would imply that only one copy has been amplified in this study. Although limited introgression is possible between a few samples in this study, it is not a viable explanation for the geographically distant samples, which were genetically indistinguishable (Figure 3d). Introgression in this case is also unlikely as there would have to have been multiple separate hybridisation events to result in the five distinct clades obtained from the COI and PEPCK data. Incomplete lineage sorting is the most likely explanation for the incongruence obtained between gene trees (Figure 3a-d), resulting from the random retention and extinction of alleles between species .
Potential causes of lineage diversification
Although the dispersal of mayflies is thought to be limited to nearby water bodies because they are weak fliers with short adult lifespans , it has been shown that long-distance dispersal is more prevalent than previously thought . The lack of within-clade isolation by distance apparent in this study (when excluding geographically isolated samples from Malawi and Zambia) would suggest high levels of gene flow within the three lineages in South Africa.
The role of catchments and their corresponding watersheds in structuring the genetic history of organisms in southern Africa is well documented (e.g. redfin barbs: [66, 67]; atyid shrimps: [68, 69]; and cicadas: [70, 71]), although the impact of different catchments on invertebrates with aquatic stages is varied, with evidence both for (e.g. [72–74]) and against (e.g. [75, 76]) a catchment effect, dependent primarily on the dispersal ability of the organism . Although the SA East lineage is not found in the same catchments as the SA West lineage, the combination of the widespread distribution of the SA East and SA West lineages over multiple primary catchments and the range overlap between the SA Ced and SA West lineages would suggest that there is little-to-no effect of catchments influencing population structuring.
The tentative dating analyses are based solely on two estimates of the COI substitution rate and thus are to be interpreted with caution, but they provide plausible first estimates for the group. The estimated time to most recent common ancestor (TMRCA) for each clade is not markedly affected by the tree prior (Tables 3 & 4; Additional file 2: Figure S1), but the between-clade divergence estimates depend on the choice of tree prior with a Yule prior favouring younger estimates (Tables 3 & 4; Additional file 2: Figure S1). In this case the combination of Bayes Factor support for the Yule prior and the proposed status of each clade as recently diverged but distinct phylogenetic species support the use of the between-species Yule prior over the within-species Coalescent prior and the more complex between-species with extinction Birth-Death prior.
Between clade divergence estimates under the Yule prior suggest that the major cladogenic events within the group took place during the Pliocene to mid-Miocene (Tables 3 & 4; Additional file 2: Figure S1), a pattern in common with previous studies in the region, focussing on aquatic or semi-aquatic freshwater invertebrates [78–82] and fish , strongly suggesting that common processes (outlined below) may have been responsible.
The TMRCA estimated within each lineage ranged from the youngest (MAL: 0.5–0.7 mya) to the oldest (SA East: 1.5–1.7 mya) when using the faster COI substitution rate, suggesting population fragmentation within the Plio-Pleistocene. Within the Plio-Pleistocene, aridity and seasonality in southern Africa increased with the intensification of the Benguela current and the formation of the winter rainfall zone [83, 84] and global climate fluctuations in response to Milankovitch oscillations  which resulted in glacial cycles throughout the Pleistocene. These glacial cycles, and the associated aridity linked to glacial periods , probably resulted in the repeated fragmentation and bottlenecking of Baetis populations within southern Africa and have been previously cited as a population isolating mechanism within the region [71, 87].
The lineages sampled in this study correspond to environmentally identified (pH-based) and summer versus winter rainfall variation. The SA Ced and SA West lineages comprise samples from streams that flow through acidic Table Mountain Sandstone geologies whereas the SA East, MAL and ZAM lineages comprise samples from streams with an alkaline pH. The difference in the pH range of the rivers is primarily a result of the local geology [14, 25], so river pH is unlikely to have changed within the timeframe (Mio-Pliocene) required to result in the isolation of lineages. The response of many aquatic invertebrates to pH is well studied and it is known that the pH of a river affects its community composition [88, 89]. Thus the lineages discussed in this study are each likely to have a restricted pH tolerance and this lineage-specific pH tolerance is the most likely mechanism for the continued genetic isolation between the parapatric SA West and SA East lineages of B. harrisoni. The Krom River (SA East, Figure 1, see Additional file 1: Table S1) and Elands River (SA West, Figure 1, see Additional file 1: Table S1) are approximately 30 km apart, yet the shift in geological composition results in a drastic change in pH over the short distance. The Elands River flows through areas of Table Mountain Sandstone (resulting in poorly-buffered acid waters) and the Krom River flows through Bokkeveld (mostly shale and sandstone composition which results in well-buffered neutral-to-alkaline waters) . This mechanism requires experimental verification and cannot be invoked to explain the continued isolation between the sympatric SA West and SA Ced lineages.
Taxonomic status of clades
Although the South African material used in this study has been well sampled (Figure 1), the GMYC results must be interpreted with caution as there are large sampling gaps between South Africa and Malawi / Zambia and the GMYC model is sensitive to sampling strategies, which may result in artificial clustering due to a lack of intermediate haplotypes. Within South Africa the lineages are found both in sympatry (Figure 1: SA Ced and SA West) and parapatry (Figure 1: SA West and SA East) and are estimated to share a common ancestor at least 2.4 mya (Tables 3 & 4), suggesting five species under the phylogenetic species concept. The individual GMYC analyses resulted in the recognition of between 5 and 7 maximum likelihood (ML) entities, with the combined data favouring six ML entities. In analyses based on the COI, 16S and the combined data, the SA East lineage was divided into two ML entities. As these two ML entities were not well supported monophyletic clades in all three (COI, 16S and PEPCK) datasets, we have chosen a more conservative estimate of five monophyletic, well supported clades, corresponding to five phylogenetic species, here highlighted as: SA Ced, SA West, SA East, ZAM and MAL (Figure 2).
Furthermore the between-clade genetic distances estimated here are comparable to those found between distinct species both in this study (i.e. outgroup vs. ingroup) and in previous studies of Baetis species [1, 15, 16], providing strong evidence for the recognition of five species corresponding to each of the lineages mentioned using the phylogenetic species concept. Although subtle morphological variations have been observed in B. harrisoni, these findings require a thorough morphological investigation to assess whether there are diagnostic morphological characters and observable physiological adaptations to acid or alkaline pH conditions which are consistent and congruent with the molecular lineages.
The Afrotropical region is undoubtedly under-collected and material collected is often insufficiently studied. This may explain the low diversity of African Baetis species despite the high generic diversity in the Afrotropical Baetidae . Further sampling within Africa is needed to determine the geographical extent of each of the lineages within B. harrisoni sensu lato and to assess the diversity of the other Baetis species in the region (e.g. B. parvulus Crass, B. monikae Kopelke, B. permultus Kopelke and B. pseudogemellus Soldán).