Leucetta chagosensis specimens were collected by SCUBA diving on subtidal reef slopes. Specimens from the East Coast of Australia, the Coral Sea, Vanuatu, the Red Sea and Japan were collected by Gert Wörheide (GW); other specimens were received from various colleagues. Due to this fact, and because this taxon can be quite rare (e.g. in the Gulf of Aqaba and the Red Sea), sample sizes were limited in some localities, varying from two in Fiji to 10 in the Philippines individuals per site. Voucher sample numbers and localities are given in Additional file 1. Immediately after collection on deck, macroinvertebrate commensals were carefully removed, if present, and parts of the choanosome were cut into small pieces. Sometimes, small specimens were preserved whole. Samples were preserved in >90% ethanol or in silica-gel after being cut into small pieces, and were stored at -20°C or room temperature until extraction (see also [15, 53]). The external surface was avoided to minimize potential contamination. Voucher samples have been deposited at the Queensland Museum, Brisbane, Australia (indicated by the prefix "QMG" in Additional Table 1 see Additional File 1) or are held by G. Wörheide.
DNA extraction and PCR amplification
Genomic DNA was extracted from the choanosomal tissue using the DNeasy-Tissue Kit (Qiagen) according to the manufacturer's instructions. Internal transcribed spacer (ITS) and partial 28S (C2-D2 region ) rDNA sequences were generated according to previously published protocols [15, 21, 55].
The second intron of the ATP synthetase beta subunit gene (ATPSb -iII) was amplified using the following strategy. Initially, degenerate primers (ATPSbf1: 5'-CGT GAG GGH AAY GAT TTH TAC CAT GAG ATG AT-3'; ATPSbr1: 5'-CGG GCA CGG GCR CCD GGN GGT TCG TTC AT-3'; ) were used for PCR amplification of four samples from disparate geographic areas (Red Sea, Taiwan, GBR, Tuamotu). BLAST searches  were run to confirm that the amplified sequences were of poriferan origin. Exon/intron boundaries were determined by comparison with ATPS beta gene sequences from Genbank. The intron was determined to start about 2.3 kb downstream from the 5'-end of Exon 1 in Drosophila melanogaster (Position 3416; GenBank accession no. X86015). From an initial alignment, nested species-specific primers were designed (ATPSbf2: 5'-TTG TCT TGG ACA AGG AGG GG-3'; ATPSbr2: 5'-TCG TTC ATT TGA CCG TAC AC-3'), which were still located in the exon but were about 10–15 bp closer to the exon/intron boundary. These primers were then used for subsequent PCR amplification of the larger data set.
PCR reactions consisted of 2 mM MgCl2, 16 mM (NH4)2SO4, 67 mM Tris-Cl (pH 8.8 at 25°C), and 0.01% Tween-20. Each 25 μ l reaction included 1.0 μ l of DNeasy-extracted DNA of various concentration, 0.625 μl dNTPs (10 mM each), 1.5 mM of each primer, and 0.25 U Bio- Taq DNA polymerase (Bioline, Luckenwalde). PCR cycling conditions included an initial denaturation of 2 min at 94°C; 38 cycles of 94°C for 20 s, 54°C for 60 s, and 72°C for 50 s; and a final extension step at 72°C for 10 min. Amplicons were purified from Agarose gels using a silica-based method . PCR products were sequenced directly with ABI BigDye Terminator Cycle Sequencing (Version 3.1) on an ABI 3100, and some PCR products were subcloned into pGEM-T (Promega) according to the manufacturer's instructions. A minimum of three positive clones were sequenced in both directions using M13 vector primers to determine if more than two alleles were present, indicating paralogous copies. Amplicons were directly sequenced as above. Because the specific primers were very close to the exon/intron boundaries, the first 42 bases of the 5'- end and the first 53 bases of the 3'- end of the intron were not included in the final alignment.
Sequence assembly and alignment
Double-stranded sequences of novel sequences generated in this study were assembled with CodonCode Aligner , automatically aligned using ClustalW , and manually inspected and optimized using Se-Al v2.0a11 . Existing rDNA ITS sequence types of Leucetta chagosensis from previous studies [15, 21] (Genbank accession nos. AF458852–AF458870) were added to the alignment. Polymorphic sites in heterozygotes were detected using the 'find mutations' option in the CodonCode Aligner, and coded using IUPAC  ambiguous DNA characters. Alleles of heterozygotes that showed no length variation within individuals were resolved manually using a parsimony approach, in an attempt to minimize the number of alleles observed in the data set . Intragenomic polymorphisms in rDNA were resolved into two different sequence types per individual using the same approach, primarily to enable estimation of migration rates in MIGRATE (see below). Due to the low intraspecific diversity of the C2-D2 fragment, rDNA alignments were concatenated for subsequent phylogeographic analyses. All sequences from this study were submitted to the EMBL Nucleotide Sequence Database and can be accessed under the following accession numbers: [EMBL: AM850255–AM850677].
Intron heterozygous indel detection
We attempted to resolve length variant heterozygotes (LVH), which often cause problems when analyzing intron sequences , using Single-Stranded Conformational Polymorphism (SSCP) analyses , as described in . However, the intron was too long for alleles to be resolved using this method. Therefore, to obtain an overview of the distribution of LVH alleles on the estimated phylogenies, i.e. whether one heterozygote individual harbours alleles distributed in different larger clades on the estimated phylogeny, amplicons of several such heterozygotes from each larger clade were subcloned as above, sequenced, and both alleles were included in the alignment. Due to restricted resources, it was not possible to subclone and sequence both alleles of all heterozygous individuals.
Indels and minisatellite repeats were treated as missing data, and were not recoded for phylogenetic analyses because they represented autapomorphies for regional populations, and as such did not contribute to resolving phylogeographic structure.
All sequences were assembled and aligned as described above. Uncorrected p-distances were calculated using Paup*4 . Nucleotide diversities and GC contents were calculated, and tests of neutrality using Tajima's D  were carried out in DNAsp v4.1 . Nuclear loci are subject to recombination, potentially confounding phylogeny estimation . Consequently, recombination rates and events were estimated using RDP 2.0 , employing the default options with all methods implemented in the program suite (RDP, GENECONV, Bootscan, Chimera, SciScan). The program MODELTEST version 3.7  was used to find the model of DNA substitution that best fit the rDNA and intron data. The best-fit model selected for each sequenced region by hierarchical Akaike information criterion (AIC)  (ITS rDNA: TIM+G, partial 28S rDNA: HKY+I, intron: TrN+G) was used for a subsequent Bayesian phylogeny inference (BI) using MrBayes 3.1.2  with default priors and mixed models.
Phylogenies were estimated from the combined ITS/partial 28S rDNA alignment, the intron alignment, and a combined ITS/28S/intron alignment containing only specimens from which all three fragments could be obtained. For the latter, consensus sequences of ATPSb -iII alleles were used and concatenated with both rDNA fragments into one alignment. Alignments were collapsed to contain only unique sequence types/alleles in COLLAPSE 1.2  for phylogeny estimation, and intron sequences were additionally analyzed using the full number of detected alleles.
Three independent runs with one cold and seven heated Markov chains each per analysis were performed simultaneously in MrBayes 3.1.2 until the average standard deviation of split frequencies between the three runs dropped below 0.005 (lowered from the default value of 0.01 to improve chain convergence). Analyses were carried out with the MPI-enabled parallel version of MrBayes  on a 64-node Linux-cluster at the Gesellschaft für wissenschaftliche Datenverarbeitung Göttingen (GWDG), using one processor for each of the 24 Markov chains per analysis. Batch files are available upon request. All MrBayes analyses were run at least twice to check for the consistency of results. The second analysis was allowed to run longer without a stop value, until the wall time of 48 hours was reached, to check if convergence of chains could be further improved. Trees were sampled every 1000th cycle, with a burn-in of 25% of sampled trees. The appropriateness of burn-in values was determined after the graphical display of likelihood values, and the convergence of chains was evaluated using AWTY . The remaining trees after burn-in were used to generate a 50% majority rule consensus tree, where posterior probabilities for internal branches were indicated by their sample frequency.
For comparison, Maximum Likelihood bootstrap analyses were conducted with GARLI 0.94  using a heuristic search with the default option, i.e. under the GTR model of nucleotide substitution, with gamma distributed rate heterogeneity, a proportion of invariant sites, and 100 bootstrap replicates. Phylogenies estimated from the intron-only and combined rDNA/intron data set were rooted with the Indian Ocean clades, as they represent a different biogeographic region (e.g. ). Outgroup rooting with intron sequences from a closely related species was not possible because such sequences were unalignable due to high evolutionary rates, and because the same intron in Pericharax heteroraphis, for example, is about 60% shorter .
Due to the low divergence of rDNA sequences, we estimated a statistical parsimony sequence type network  using the program TCS 1.21  with the default options. The maximum number of mutational steps that constitute a parsimonious connection between two sequence types was calculated with 95% confidence. An attempt at constructing a nested clade design  was not successful because no unambiguous nested design could be constructed due to a central loop that could not be resolved with confidence.
To obtain an overview of the level of ambiguity and phylogeographic congruency within and among loci and to visually explore the different signals contained in the data, the concatenated alignment was divided into rDNA and ATPSb -iII partitions. Each partition contained samples from 92 specimens from which both loci could be sequenced. Both loci were analyzed separately using Neighbor-Net , as implemented in SplitsTree4 , using uncorrected p-distances.
In brief, Neighbor-Net is a distance-based method that provides an effective tool to visualize and detect conflicting signals or alternative phylogenetic histories. First, a collection of weighted splits is constructed from a distance matrix, and then these splits are represented using a splits graph, where a totally compatible collection of splits would be precisely represented as a tree, but incompatible splits as cycles or boxes. Such incompatibilities might represent, among other things, hybridization, recombination, gene duplication, or in general, conflicting signals in the data. In general, the more (and larger) cycles/boxes connecting operational taxonomic units (OTUs) in a split graph, the more incompatibilities of splits and incongruences exist in the data.
Population genetic analyses
Analysis of molecular variance (AMOVA)  was conducted to estimate the significance of population structure at several hierarchical levels. Pairwise fixation indices (F
st)  and their significance were calculated for intron alleles only to estimate population differentiation because true heterozygotes cannot be distinguished from intragenomic variation in the rDNA cistron (see ), however, an AMOVA was carried out using the rDNA sequence types. Calculations were carried out in ARLEQUIN 3.1  using 10,100 random permutations for significance tests. All analyses were run at least twice to check for the consistency of results. Two sets of analyses were run for each data set separately, because Arlequin does not allow for unequal sample sizes among loci. First, sample localities were pooled into 15 geographic populations. Pooling of sample localities into populations on the Great Barrier Reef (GBR) followed the sections of the Great Barrier Reef Marine Park: N'GBR, Central GBR, Capricorn (GBR), Brisbane, Queensland Plateau; Taiwan, Guam, Okinawa, Philippines, Indonesia; Maldives, Red Sea; and PNG, Samoa/Fiji/Vanuatu, Polynesia. The 15 populations were grouped into four regional groups (separated by semicolons in the previous list: Australia, NW Pacific, S'Pacific, and Indian Ocean). Due to low sample sizes from some archipelagos in the SW Pacific (Fiji, Samoa) in the intron data set, those localities were pooled with samples from Vanuatu as one population in order to increase the statistical power. To enable comparison between the two loci, the same geographic population structure was employed for the rDNA data set, where more sequences were available from those archipelagos. A second analysis included only eight populations from the SW Pacific (Northern GBR, Central GBR, Capricorn, Brisbane, Queensland Plateau, PNG, Samoa/Fiji/Vanuatu, and Polynesia), grouped into two groups (Australia, S'Pacific), where sampling was more comprehensive both in terms of geography and sample sizes. A Mantel test was carried out using the Isolation by Distance Web Service  to test for correlations between spatial and genetic (F
st) distances , also using gene flow M calculated as (1/F
ST-1)/4 . The significance of the slope of the reduced major axis (RMA) regression was assessed by 30,000 randomizations.
Because of the acknowledged difficulties of using F
ST's to estimate gene flow , we also took a coalescent-based approach to estimate the parameters of populations , such as theta (θ = 4 N
e μ; where N
e is the effective number of individuals and μ is the mutation rate in mutations per generation), migration rates (M), and the number of effective migrants per generation (N
em). We used the coalescent-based Markov-Chain Monte Carlo method implemented in MIGRATE 2.1.7, which explicitly takes into account historical processes and asymmetrical migration/gene flow . The parallelized version of MIGRATE was compiled to run on the LINUX-cluster of the GWDG (see above), requesting several processors for each run depending on the numbers of replicates. Several independent runs were conducted to check for convergence, the consistency of results, and the shape of posterior distributions. The Bayesian search strategy was optimized for the following settings (parm files available on request): recorded genealogies [a]: 100,000; increment (record every x genealogy [b]: 500; and the number of concurrent chains [c]: 4, resulting in 2*108 visited (sampled) genealogies [a*b*c] with 10,000 discarded trees per chain (burn-in). For MIGRATE analyses, each diploid sponge individual was represented by two alleles/sequence types in the input file, whether it was homozygous or heterozygous. rDNA sequences showing IGPs were resolved into two sequence types using the parsimony approach outlined above, since MIGRATE does not allow ambiguously (IUPAC) coded nucleotide sites.
Due to the large geographic distances between some populations without intermediate sample localities, (e.g. Red Sea-Maldives-IWP) and the low sample sizes of some, we focussed our attention on estimating the migration rates among populations along the East-Australian coast, Papua New Guinea, and the southern Pacific only.