Genome-wide evidence for positive selection and recombination in Actinobacillus pleuropneumoniae
© Xu et al; licensee BioMed Central Ltd. 2011
Received: 9 December 2010
Accepted: 13 July 2011
Published: 13 July 2011
Skip to main content
© Xu et al; licensee BioMed Central Ltd. 2011
Received: 9 December 2010
Accepted: 13 July 2011
Published: 13 July 2011
Actinobacillus pleuropneumoniae is an economically important animal pathogen that causes contagious pleuropneumonia in pigs. Currently, the molecular evolutionary trajectories for this pathogenic bacterium remain to require a better elucidation under the help of comparative genomics data. For this reason, we employed a comparative phylogenomic approach to obtain a comprehensive understanding of roles of natural selective pressure and homologous recombination during adaptation of this pathogen to its swine host.
In this study, 12 A. pleuropneumoniae genomes were used to carry out a phylogenomic analyses. We identified 1,587 orthologous core genes as an initial data set for the estimation of genetic recombination and positive selection. Based on the analyses of four recombination tests, 23% of the core genome of A. pleuropneumoniae showed strong signals for intragenic homologous recombination. Furthermore, the selection analyses indicated that 57 genes were undergoing significant positive selection. Extensive function properties underlying these positively selected genes demonstrated that genes coding for products relevant to bacterial surface structures and pathogenesis are prone to natural selective pressure, presumably due to their potential roles in the avoidance of the porcine immune system.
Overall, substantial genetic evidence was shown to indicate that recombination and positive selection indeed play a crucial role in the adaptive evolution of A. pleuropneumoniae. The genome-wide profile of positively selected genes and/or amino acid residues will provide valuable targets for further research into the mechanisms of immune evasion and host-pathogen interactions for this serious swine pathogen.
In the evolutionary history of many microorganisms, positive selection and homologous recombination are two indispensable driving forces for adaptation to new niches. Both of them contribute to the genetic variations that might influence the population diversification and adaptation of pathogenic microorganisms [1, 2]. Recent studies on the genome-wide evolutionary dynamics have highlighted the important roles of selection and recombination in the molecular evolution of bacterial pathogens, including Escherichia coli , Listeria monocytogenes , Salmonella spp. , Streptococcus spp. , and Campylobacter spp. . These analyses have revealed that a certain number of protein-coding genes subject to natural selection pressure are usually involved in the dynamical interactions between host and pathogen, especially in the immune and defense-associated functions . Diversifying selection operating on these genes may be caused by pathogen-host co-evolutionary arms race [7, 8].
In the present study, d N/d S-based methods were applied to detect evidence of genome-wide positive Darwinian selection. Estimating the ratio (ω) of the rate of nonsynonymous nucleotide substitutions (d N) to that of synonymous substitutions (d S) is a powerful approach for measuring selective pressure on the protein-coding level: ω = 1, < 1, > 1 indicate neutral evolution, purifying (negative) selection, and positive (adaptive) selection, respectively [9, 10]. The codon models further developed by Nielsen and Yang allow variation in ω among sites , which have an extensive capability to find evidence for adaptive evolution in most functional genes where only a small fraction of amino acid sites are subject to strong positive selective pressure . Thus far this approach has been widely used for genome-wide selection analyses in pathogenic viruses, bacteria, and eukaryotes [9, 13]. A substantial number of genes encoding highly variable antigens are identified to undergo adaptive selection particularly on some functional sites for evasion of host immunity [[1, 4, 14]].
Actinobacillus pleuropneumoniae, a Gram-negative coccobacillus belonging to the Actinobacillus genus of Pasteurellaceae family, is a strictly swine pathogen and colonizes in the upper respiratory tract of porcine . This pathogen has caused an economically severe disease characterized by pulmonary lesions, pleuritis, and pericarditis in pigs . According to the differences in capsular polysaccharides, A. pleuropneumoniae has been divided into 15 serovars . The recent comparative genomics studies through both high-throughput approaches of genome sequencing and microarray have depicted the compositions of the pan-genome and confirmed the contribution of genes loss or gain to the diversity in virulence and serovar of A. pleuropneumoniae [18, 19]. However, besides large genetic variations resulting from DNA acquisition and genome reduction, small sequence differences occurring in the conserved genes, including point mutations, insertions/deletions (indels), and intragenic recombination, may also play a crucial part in the alteration of antibiotic resistance, pathogenicity and immunogenicity [20, 21]. But to date, no research pays enough attention to the linkages between genetic alterations and putative functional roles in intraspecies conserved genes of A. pleuropneumoniae at the whole genome level.
In order to further trace evolutionary trajectories on the core genome of pathogenic bacterium A. pleuropneumoniae, we employed a genome-wide analyses approach to investigate the effects of natural selection and homologous recombination operating on the coding genes. Our analyses focused on the evolutionary characterizations of core genome genes that are shared by 12 A. pleuropneumoniae genomes. Many genes were shown to be under strong positive selective pressure and primarily associated with the fitness and immunogenic properties of this swine pathogen.
Genome sequences of A. pleuropneumoniae used in this study
GenBank accession no.
Genome size (Mbp)
No. of CDS (> 49 codons)
Frameshift mutations (indels of a number of nucleotides not divisible by three) can lead to high nonsynonymous substitution rates, resulting in more false positive results when positive selection was estimated based on d N/d S ratios . To avoid incorrect indels in the codon alignments, multiple sequence alignments were initially performed with amino acid sequences from each gene cluster, followed by conversion to the corresponding codon alignments using custom Perl scripts. The coding sequences located at the beginning or end of the contigs appeared to be more prone to frameshift sequencing errors. Therefore, we further assessed the quality for each alignment through obtaining the following information: overall identity, and identity in the first 30 nt and last 30 nt per alignment. The codon alignment sequences that contain frameshift mutations were checked and edited manually in the software MEGA4  if identity is low.
According to the method as defined by Nei and Gojobori , the number of synonymous nucleotide substitutions per synonymous site (d S) and the number of nonsynonymous nucleotide substitutions per nonsynonymous site (d N) were estimated for the resulting gene alignments using the program SNAP . Gene-by-gene number of informative sites and genetic diversity were obtained from the output of the PhiPack program . The analyses for the codon usage variation was performed by computing the effective number of codons (Nc), which is a general measure of bias from equal codon usage in a gene. The Nc value ranges from 20 for the strongest bias (where only one codon is used for each amino acid) to 61 for no bias [27, 28]. The calculation of Nc were implemented in the program CodonW 1.4 (http://codonw.sourceforge.net/).
Since recombining fragments among aligned codon sequences have a profound effect on the detection of the positively selective evidence , we first tested for recombination signals between sequences in the alignment of orthologous genes. Four statistical procedures GENECONV , pairwise homoplasy index (PHI) , maximum χ 2  and neighbor similarity score (NSS)  were applied to discover the homologous recombination signals. Besides GENECONV version 1.81, the other three methods were implemented in the PhiPack package . For the analyses of GENECONV, the parameter g-scale was set to 1, which allows mismatches within a recombining fragment. The p-values for inner fragments using 10,000 random permutations were used to indicate the significance of putative recombinant regions. For maximum χ 2, a fixed window-size of 2/3 the number of polymorphic sites was used. For PHI, the window size was set to 100 nucleotides. Simulated p-values were estimated based on 1,000 permutations for PHI, maximum χ 2 and NSS.
Maximum likelihood (ML) phylogenetic trees were reconstructed for each gene in the dataset of the core genome genes using the PhyML program . A general time-reversible (GTR) model of nucleotide substitution with the ML estimates for gamma distributed rate heterogeneity of four categories (Г4) and a proportion of invariable sites were used in all tree reconstruction methods. The resulting topologies of ML trees were applied to the subsequent selection analyses.
To detect selective pressure acting on each coding gene, the rates of synonymous and nonsynonymous substitutions were estimated site-by-site using the codeml program from the PAML 4.2b package . According to the topology of the resulting ML tree per gene alignment, two site-specific models that allow variable nonsynonymous (d N) and synonymous (d S) rate ratios (ω = d N/d S) among codons were applied to analyze our data set: M1a (NearlyNeutral) and M2a (PositiveSelection). Null hypothesis model M1a was nested with alternative selection model M2a. The latter model adds an extra site class for a fraction of positively selected amino acid sites with ω > 1; whereas models M1a only allows site classes with ω varying between 0 and 1 [10, 35]. A likelihood ratio test (LRT) was carried out to infer the occurrence of sites subject to positive selective pressure through comparing M1a against M2a. Three replicates were run with codeml and the maximum likelihood values for each model were used in the LRT. The LRT statistic (twice the log-likelihood difference between the null and the alternative models) was compared with the χ 2 distribution with two degrees of freedom. The Bayes empirical Bayes approach was employed to identify positively selected sites under the likelihood framework .
The web server PSORTb v3.0 was used to predict bacterial protein subcellular localization . Integral beta-barrel outer membrane proteins were predicted by BOMP . The three dimensional structure model of the protein encoded by the gene that showed evidence for positive Darwinian selection was modeled using the Phyre server . The sites subject to positive selective pressure were mapped onto the structure and visualized by PyMol (http://www.pymol.org/).
Multiple testing correction was performed to control for Type I errors according to the approach presented by Benjamini & Hochberg . For all genes tested for recombination and positive selection, q-values were calculated from p-values using the R package qvalue with the proportion of true null hypothesis set to 1 (π0 = 1) . A false discovery rate (FDR) of 10% was used for the recombination analyses. As the tests used for detecting positive selection was conservative , an FDR of 20% was set.
The non-parametric Mann-Whitney U-test was employed to determine the significance level for the differences among the selected continuous variables (i.e., d N, d S, codon bias and nucleotide diversity) between a given COG functional categories and all other categories. Binomial test was used to estimate association between each COG category and evolutionary forces (i.e. positive selection and/or homologous recombination); Bonferroni corrections for multiple comparisons were performed according to the number of one-sided tests. The significance level was set to 5% in this study. All statistic analyses were carried out using Perl scripts and R 2.11.1 .
In our recent work , 2,531 orthologous genes and 772 strain-specific genes have been identified in the pan-genome of 12 A. pleuropneumoniae strains using BlastClust. The above data set was used to further decode phylogenomic characterizations of pathogenic A. pleuropneumoniae. The evidence for homologous recombination and natural selection pressure whether operate on the conserved coding genes was estimated at the present study. After manually editing the aligned gene sequences and removing the low quality ones, a data set of sequence alignments of 1,960 orthologs was selected out, 81% (n = 1,587) of which were core genes that are present one copy per genome and the remaining (n = 373) were distributed genes present in at least four genomes.
The codon bias for each orthologous gene was measured by the effective number of codons (Nc value) calculated by CodonW . The reduction in Nc indicates strong bias that significantly correlates with high gene expressivity . A. pleuropneumoniae genes in the COG functional categories "Energy production and conversion", "Translation", "Amino acid transport and metabolism", "Nucleotide transport and metabolism", and "Carbohydrate transport and metabolism" were evident to have higher codon usage bias (P < 0.001, P < 0.001, P = 0.003, P = 0.001, and P < 0.001, respectively; one-tailed U-test) compared with genes in other COG categories. As is well known, genes bearing stronger codon bias are likely to be highly expressed and have housekeeping features [3, 45]. So, high codon bias of genes present in the five COG categories is likely to elucidate the necessity of relevant coding products for implementing fundamental life cycle and essential physiological activities of A. pleuropneumoniae.
The rates of synonymous (d S) and nonsynonymous (d N) nucleotide substitutions among different functional categories for A. pleuropneumoniae genes
Number of genes analyzed
dS (± se)b
dN (± se)c
Energy production and conversion
44.2 × 10-3 (± 5.2 × 10-3)
2.7 × 10-3 (± 0.5 × 10-3)
Cell cycle control and cell division
59.7 × 10-3 (± 13.0 × 10-3)
3.3 × 10-3 (± 0.8 × 10-3)
Amino acid transport and metabolism
80.1 × 10-3 (± 6.6 × 10-3)
4.1 × 10-3 (± 0.4 × 10-3)
Nucleotide transport and metabolism
53.7 × 10-3 (± 8.3 × 10-3)
2.5 × 10-3 (± 0.4 × 10-3)
Carbohydrate transport and metabolism
56.8 × 10-3 (± 6.5 × 10-3)
3.2 × 10-3 (± 0.6 × 10-3)
Coenzyme transport and metabolism
59.5 × 10-3 (± 7.7 × 10-3)
4.7 × 10-3 (± 0.7 × 10-3)
Lipid transport and metabolism
61.4 × 10-3 (± 11.3 × 10-3)
3.7 × 10-3 (± 0.9 × 10-3)
42.9 × 10-3 (± 4.5 × 10-3)
2.1 × 10-3 (± 0.3 × 10-3)
69.4 × 10-3 (± 11.8 × 10-3)
3.8 × 10-3 (± 0.8 × 10-3)
Replication, recombination and repair
83.3 × 10-3 (± 11.8 × 10-3)
6.2 × 10-3 (± 1.3 × 10-3)
Cell membrane and envelope biogenesis
63.3 × 10-3 (± 10.8 × 10-3)
4.8 × 10-3 (± 0.8 × 10-3)
Posttranslational modification, protein turnover, chaperones
55.3 × 10-3 (± 7.2 × 10-3)
3.6 × 10-3 (± 0.9 × 10-3)
Inorganic ion transport and metabolism
63.5 × 10-3 (± 6.1 × 10-3)
3.9 × 10-3 (± 0.4 × 10-3)
Secondary metabolites biosynthesis, transport and catabolism
34.3 × 10-3 (± 11.4 × 10-3)
2.7 × 10-3 (± 0.9 × 10-3)
General function prediction only
65.7 × 10-3 (± 6.1 × 10-3)
4.6 × 10-3 (± 0.5 × 10-3)
73.6 × 10-3 (± 9.1 × 10-3)
4.7 × 10-3 (± 0.6 × 10-3)
Signal transduction mechanisms
39.5 × 10-3 (± 8.6 × 10-3)
2.0 × 10-3 (± 0.4 × 10-3)
Intracellular trafficking, secretion and vesicular transport
66.4 × 10-3 (± 16.5 × 10-3)
7.0 × 10-3 (± 3.5 × 10-3)
63.6 × 10-3 (± 20.1 × 10-3)
3.9 × 10-3 (± 1.5 × 10-3)
Not in COGs
56.1 × 10-3 (± 5.9 × 10-3)
6.6 × 10-3 (± 0.8 × 10-3)
Among the 1,587 orthologous core genes, 2% (29 genes) had no occurrence of nucleotide substitutions and thus were not further investigated for evidence of homologous recombination. Furthermore, among the remaining genes, 197 gene alignments that contain few informative sites less than two could not be analyzed with programs in PhiPack and were removed from the ortholog sets. Finally, 86% of total core genes were selected to conduct the subsequent recombination analyses through four approaches. The evolutionarily conserved core genes (n = 226) were summarized (Additional file 1) and the biological functions carried out by their coding products may be essential for the survival of A. pleuropneumoniae. Notably, conserved genes were significantly enriched in the COG category "Translation" with a low Bonferroni corrected p-value (P < 0.001; Binomial test); this result was consistent with low d S and d N values mentioned before. These translation-associated protein-coding genes are generally involved in the fundamental cellular activity and thus hardly have any changes at the amino acid level as a result of functional constraints. Overall, among 12 A. pleuropneumoniae genomes, 822 orthologous core genes (52% of all 1,587 core genome genes) were identified to have significant evidence for recombination (FDR < 10%) that was detected by at least one of the four tests (Additional file 2). A total of 493, 675, 659 and 559 orthologs were identified to have recombination signals using GENECONV, Maximum χ 2, NSS and PHI, respectively. Additionally, a total of 149, 148, 160, and 365 orthologs exhibiting recombination signals were identified by using one, two, three, and all four recombination tests, respectively.
Genes that show evidence for positive Darwinian selection
Cluster ID a
Putative function of coding products
Positively selected sites f
Periplasmic nitrate reductase
Anaerobic glycerol-3-phosphate dehydrogenase subunit A
Transporter, gluconate/H+ symporter (GntP) family
Gamma-glutamyl phosphate reductase
Arginine transport ATP-binding protein
3-isopropylmalate dehydratase small subunit
9, 99, 554
13, 27, 392
Oxygen-independent coproporphyrinogen-III oxidase
38, 136, 428
23S rRNA (uracil-5-)-methyltransferase
Transcriptional accessory protein
Transcriptional regulatory protein
284, 287, 292
DNA topoisomerase 4 subunit A
188, 204, 234
Capsule polysaccharide modification protein
Capsule polysaccharide modification protein
13, 136, 366
Outer membrane protein P2-like protein
306, 317, 320
Large-conductance mechanosensitive channel
43, 44, 50, 91, 617, 909
17, 190, 514, 551
Formate-dependent nitrite reductase complex subunit
Tellurite resistance protein
Sulfate transport system permease protein
Paraquat-inducible protein B
259, 318, 384
Major facilitator transporter
195, 238, 416, 417
Thiazole biosynthesis protein
N-acetylmuramic acid 6-phosphate etherase
54, 99, 293
Nucleoside-diphosphate sugar epimerase
Hydrolases of the HAD superfamily
Membrane protein affecting hemolysin expression
82, 127, 546
395, 567, 569
ABC transporter, periplasmic binding protein
97, 104, 134, 355
199, 275, 283
TDP-Fuc4NAc:lipid II Fuc4NAc transferase
70, 92, 182, 183, 289
Notably, there was no obvious discrepancy for the values of d S between genes under positive selection and the remaining genes; whereas the d N values together with the number of informative sites and genetic diversity were significantly higher in the positively selected genes (P < 0.001, P < 0.001, P = 0.023; one-sided U-test). No association between positive selection and COG categories was observed, as the number of positively selected genes is low in each role category.
Among 57 positively selected genes, 24 genes also showed significant evidence for homologous recombination detected by all four recombination tests. Furthermore, 41 genes under positive selection pressure showed evidence for recombination identified by at least one test. It indicates that positive selection should be associated with intragenic recombination, as recombination can lead to phylogenic incongruence and highly false positives when selective pressure on protein coding sequences was estimated [3, 29].
Gene acquisitions and losses that contribute to the virulence and serotypic diversification of A. pleuropneumoniae have been depicted in detail [18, 19], but our understanding on small genetic variations caused by positive selection and homologous recombination, which also factually influence the evolutionary trajectories of protein coding genes, has not been well considered for this swine pathogen so far. In this report, we chose 12 genomes of A. pleuropneumoniae to study the evolutionary driving forces acting on the core genome of this animal pathogen using a comparative phylogenomic approach.
Tests for intragenic homologous recombination and positive selection were performed with 1,587 orthologous genes present in the core genomes of twelve strains of A. pleuropneumoniae. Overall, our results indicated that about a quarter of the genes in A. pleuropneumoniae core genome showed significant evidence for intragenic recombination. In comparison, core-genome recombination was also evident in both species of the genus Streptococcus, as 18% and 37% of the core genome for S. agalactiae and S. pyogenes, respectively, showed evidence for homologous recombination . Notably, in A. pleuropneumoniae, two COG categories "Replication, recombination and repair" and "Amino acid transport and metabolism", which both presented high values of d S and d N, were favored by intragenic recombination.
On the other hand, 57 A. pleuropneumoniae genes, accounting for approximately 3.6% of the core genome, were identified to be undergoing positive selection. Another similar study on the identification of genes under positive selection in E. coli reported that 0.7% of 3,505 genes found in six E. coli genomes showed evidence for positive selection and no evidence for recombination . Like other pathogenic bacteria, a substantial number of positively selected genes in A. pleuropneumoniae encode protein products involved in the biogenesis and structural components of bacterial cell wall and/or outer membrane. These genes are likely to be associated with co-evolutionary arms races between pathogenic microorganisms and hosts. To further decipher the roles of evolutionary pressure operating on the core genome of A. pleuropneumoniae, we analyzed the functional properties of the positively selected genes and potentially important residues subject to positive selection.
We found that many protein products encoded by the positively selected genes were exposed on the cell surface or involved in structural constituents of bacterial cell wall. Some of these proteins have been reported to be important virulence factors associated with bacterial adherence, colonization and persistence. Therefore, it suggests that the genes under diversifying selection may dynamically interact with the host immune and defense systems.
Bacterial surface polysaccharides, which are often involved in adherence and colonization, may be directly exposed to the host immune pressure. Three A. pleuropneumoniae genes (hcsA, hcsB, and wecF) participated in biogenesis of surface polysaccharides showed significant evidence for positive selection. The products of selected genes hcsA and hcsB code for capsule polysaccharide modification proteins that share 63% and 64% identity with Haemophilus influenzae HcsA and HcsB, respectively, which facilitate transport of capsular polysaccharide across outer membrane and are essential for bacterial virulence . Besides, the positively selected gene wecF codes for a 4-alpha-L-fucosyltransferase and is located at a wec locus which has highly conserved colinearity in all A. pleuropneumoniae genomes. The products of wecF together with other wec genes exhibit high similarity to the E. coli K12 homologues that are involved in the assembly of a cell surface glycolipid .
The other gene apaA encoding an antigenic membrane lipoprotein that could provide cross-protection against heterologous A. pleuropneumoniae serovars , was also under strong positive selection (q-value = 0.062). The above analyses strongly demonstrated that the positively selected genes involved in the biosynthesis and structural composition of cell surface/wall have undergone adaptive functional changes, perhaps allowing bacterial pathogens to escape recognition by the host immune system and phages. Such phenomena have already been proposed by the previous studies of natural selection on the E. coli genome [1, 14].
The proteases of A. pleuropneumoniae have been reviewed to be one of important virulence factors and contribute to pathogenesis . Overall, 4 protease genes (i.e., ptrA, lonH, sppA and tldD) showed significant evidence for positive selection. The precise function of these protease genes identified here, to our knowledge, was not well understood for this pathogen. However, proteolytic enzymes are pivotal to the secretion processes of Gram-negative pathogens and several of them have been described as attractive drug targets in other pathogens, e.g. ClpP  and Lon .
Our findings indicated that intragenic homologous recombination and positive Darwinian selection, unsurprisingly, indeed play crucial roles in the evolution of pathogenic A. pleuropneumoniae. In genes with extensive functional classification we found genes involved in the formation of cell surface/membrane are favored by the positive selective pressure. The adaptive changes in these positively selected genes and/or residues likely attribute to dynamic interaction caused by the host immune and defense systems. Of course, the diversifying selective forces of genes encoding metabolic functions may be also advantage for improving bacterial fitness in response to a variety of environmental signals. More experimental works are required for verifying the functions of these adaptive genes in future. Overall, the genetic evidence of positive selection will provide promising targets for further researches in the mechanisms of immune evasion and the host-pathogen interaction in A. pleuropneumoniae.
This work was supported by the National Natural Science Foundation of China (30901075).
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.