Evolution of competence and DNA uptake specificity in the Pasteurellaceae

Background Many bacteria can take up DNA, but the evolutionary history and function of natural competence and transformation remain obscure. The sporadic distribution of competence suggests it is frequently lost and/or gained, but this has not been examined in an explicitly phylogenetic context. Additional insight may come from the sequence specificity of uptake by species such as Haemophilus influenzae, where a 9 bp uptake signal sequence (USS) repeat is both highly overrepresented in the genome and needed for efficient DNA uptake. We used the distribution of competence genes and DNA uptake specificity in H. influenzae's family, the Pasteurellaceae, to examine the ancestry of competence. Results A phylogeny of the Pasteurellaceae based on 12 protein coding genes from species with sequenced genomes shows two strongly supported subclades: the Hin subclade (H. influenzae, Actinobacillus actinomycetemcomitans, Pasteurella multocida, Mannheimia succiniciproducens, and H. somnus), and the Apl subclade (A. pleuropneumoniae, M. haemolytica, and H. ducreyi). All species contained homologues of all known H. influenzae competence genes, consistent with an ancestral origin of competence. Competence gene defects were identified in three species (H. somnus, H. ducreyi and M. haemolytica); each appeared to be of recent origin. The assumption that USS arise by mutation rather than copying was first confirmed using alignments of H. influenzae proteins with distant homologues. Abundant USS-like repeats were found in all eight Pasteurellacean genomes; the repeat consensuses of species in the Hin subclade were identical to that of H. influenzae (AAGTGCGGT), whereas members of the Apl subclade shared the consensus ACAAGCGGT. All species' USSs had the strong consensus and flanking AT-rich repeats of H. influenzae USSs. DNA uptake and competition experiments demonstrated that the Apl-type repeat is a true USS distinct from the Hin-type USS: A. pleuropneumoniae preferentially takes up DNA fragments containing the Apl-type USS over both H. influenzae and unrelated DNAs, and H. influenzae prefers its own USS over the Apl type. Conclusion Competence and DNA uptake specificity are ancestral properties of the Pasteurellaceae, with divergent USSs and uptake specificity distinguishing only the two major subclades. The conservation of most competence genes over the ~350 million year history of the family suggests that lineages that lose competence may be evolutionary dead ends.


Background
Many bacteria are able to take up DNA from the environment [1]. DNA provides these naturally competent cells with nutrients (nucleotides, N and P), while recombination of incoming DNA with the cell's genome can also provide new genetic information. However, many aspects of the evolution of competence remain unclear.
Competence is widely distributed among bacteria, and some of the genes required for DNA uptake are shared even by distant relatives, suggesting an ancient common origin for competence. For example, the Gram-positive bacteria Bacillus subtilis and Streptococcus pneumoniae and the Gram-negative Neisseria gonorrhoeae and Haemophilus influenzae all require homologues of type IV pilus proteins and of the ComEC/Rec2 membrane channel [1]. However, the regulatory processes controlling expression of these competence genes differ greatly between these organisms [2]. Furthermore, the distribution of natural competence is surprisingly sporadic; most naturally competent bacteria have many relatives, including other strains of the same species, that cannot be transformed under laboratory conditions (for examples see [3-6]). Two explanations seem equally plausible. First, competence might be ancestral to most major lineages but frequently lost (and possibly regained, under different regulation). Alternatively, competence might be frequently gained in independent lineages, e.g. if the genetic requirements for DNA uptake are simple and readily met by laterally transferred genes or by mutation of genes with related functions, such as those associated with type IV pili.
The uptake specificity of some naturally competent bacteria can also guide inferences about the evolution of competence. Although many naturally competent bacteria will take up DNA fragments from any source with equal efficiency, members of some Gram-negative families take up DNA fragments from their own species much more efficiently than unrelated DNA. In the Pasteurellaceae and Neisseriaceae the molecular basis of this specificity is preferential binding of the uptake machinery to short DNA sequences present in thousands of copies in each species' genome. Such sequences are referred to as uptake signal sequences (USSs) in the Pasteurellaceae and DNA uptake sequences (DUSs) in the Neisseriaceae; they are not known in other naturally transformable bacteria [7,8].
The best-characterized uptake sequences are those of Haemophilus influenzae and of Neisseria meningitidis and N. gonorrhoeae. The preferred sequences themselves appear to have little in common: the core H. influenzae USS is 5'-AAGTGCGGT (5'-ACCGCACTT in the reverse orientation), with two AT-rich motifs on the 3' side of the standard orientation [9], whereas the Neisseria DUS is GCCGTCTGAA with no flanking motifs [10]. However, similarities in their genomic frequencies and distributions suggest that they have arisen by similar processes. Both USSs and DUSs are present in their respective genomes at frequencies close to one copy per kb, and neither shows a significant orientation bias. Both are distributed somewhat more regularly around their genomes than expected for randomly located repeats, but both also have some copies occurring in closely spaced, oppositely oriented pairs [7]. Both are preferentially found in non-coding DNA, but both also have many copies in coding sequences. Both are also overrepresented in the genomes of at least some other members of their genus or family [4,7-9,11-13].
One puzzling attribute shared by USSs and DUSs is an unusually strong consensus, with each genome containing many more copies that perfectly match its consensus core sequence than singly mismatched copies. This pattern is typical of young transposons and other genetic elements that multiply by copying, but very different from the more relaxed consensus typical of sequences that function as binding sites for regulatory proteins, which arise by point mutation of pre-existing sequences. Because USSs are thought to function by binding to DNA-receptor proteins at the cell surface [14,15], their very strong consensus is anomalous. The explanation might be that the DNA uptake machinery at the cell surface binds DNA with much higher specificity than do intracellular DNA-binding proteins. However, the possibility that USSs arise by a copying process has not been excluded.
Previous analysis has found that copies of the H. influenzae USS are abundant in the genomes of several other members of the Pasteurellaceae (Actinobacillus actinomycetemcomitans, Pasteurella multocida and H. somnus), and comparison of homologous genes in H. influenzae and P. multocida has shown that individual USSs can be stable over hundreds of millions of years [13]. However, preliminary examinations of the sequenced genomes of the Pasteurellaceans Mannheimia haemolytica (by Sarah Highlander) and Actinobacillus pleuropneumoniae (by us) found that the H. influenzae USS was much less abundant than a related sequence differing at several positions, suggesting that these genomes might contain a variant USS.
Insight into the evolution of competence will depend on an improved understanding of Pasteurellacean phylogeny. Almost all Pasteurellaceae (a family of gamma-proteobacteria) are commensals and/or pathogens of the mucosal surfaces of vertebrates, primarily birds and mammals, and several are important human pathogens. Although phylogenetic analysis based on 16S rRNA sequences has confirmed that the family is monophyletic, the relationships of its members remain poorly resolved. Two recently published phylogenies used small-subunit rRNA sequences from 83 Pasteurellacean taxa [16] and partial sequences of the housekeeping genes atpD, infB and rpoB from 28-36 strains [17], but their resolution was unsatisfactory, with many low bootstrap values and unresolved nodes.
Here we use the concatenated sequences of 12 proteins to construct a well-resolved phylogenetic tree for the species of Pasteurellaceae with genome sequences available. This tree then serves as a framework against which we characterize the long-term evolution of competence genes and of DNA uptake specificity.

A robust Pasteurellacean phylogeny
The amino acid sequences of 12 well-conserved genes were identified from the available published and draft sequences of Pasteurellacean genomes and used to infer the consensus phylogeny shown in Figure 1. Homologous E. coli genes were used as the outgroup. The chosen genes did not contain the H. influenzae USS, were distributed around the H. influenzae genome, and had strong homologues in the other genomes. Intracellular proteins were chosen to preclude the diversifying selection that can bias evolution of proteins exposed on the cell surface, and genes with base compositions typical of their species' genomes were used to preclude recent horizontal transfer.
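The tree described above is built from one concatenated (supermatrix) alignment of the 12 proteins. The Python sketch below shows one way to assemble such a supermatrix from per-gene alignments and to apply a simple base-composition screen. The function names, the dictionary layout (taxon name to aligned row) and the 5% GC tolerance are illustrative assumptions, not the criteria actually used in this study.

```python
from typing import Dict, List


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous (A/C/G/T) bases of a nucleotide sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def has_typical_composition(gene_seq: str, genome_gc: float, tolerance: float = 0.05) -> bool:
    """Hypothetical screen: accept a gene whose GC content lies within `tolerance`
    of its genome's GC content (a crude proxy for 'not recently transferred')."""
    return abs(gc_fraction(gene_seq) - genome_gc) <= tolerance


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene protein alignments (taxon -> aligned row) into one supermatrix.

    Every taxon must be present in every gene alignment, and all rows of one
    gene alignment must have the same length, so that columns stay homologous."""
    taxa = set(gene_alignments[0])
    for aln in gene_alignments:
        if set(aln) != taxa:
            raise ValueError("taxon sets differ between gene alignments")
        if len({len(row) for row in aln.values()}) != 1:
            raise ValueError("rows of a gene alignment have unequal lengths")
    return {taxon: "".join(aln[taxon] for aln in gene_alignments) for taxon in sorted(taxa)}


if __name__ == "__main__":
    genes = [
        {"H_influenzae": "MKV-L", "E_coli": "MKVQL"},
        {"H_influenzae": "GAT", "E_coli": "GST"},
    ]
    print(concatenate_alignments(genes))
    # prints {'E_coli': 'MKVQLGST', 'H_influenzae': 'MKV-LGAT'}
```

Checking taxon sets and row lengths before joining matters because a missing or misaligned gene would silently shift columns and corrupt every downstream site comparison.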
The resulting phylogenetic tree (Fig. 1) shows two strongly supported subclades: the Hin subclade (H. influenzae, A. actinomycetemcomitans, P. multocida, M. succiniciproducens and H. somnus) and the Apl subclade (A. pleuropneumoniae, M. haemolytica and H. ducreyi). This phylogeny is restricted to the eight species with sequenced genomes, but it is the first Pasteurellacean tree to have strong statistical support. It differs in many respects from both the 16S rRNA and protein phylogenies previously published for the Pasteurellaceae. In our view, however, these discrepancies arise because most branches of those earlier phylogenies have very poor bootstrap support, which makes them intrinsically unreliable, and so they should not be a cause for concern. The Apl subclade predicted by our tree was also supported by the protein tree of Christensen et al. [17]. Although the Apl subclade is not seen in Christensen et al.'s small-subunit rRNA tree, the resolution of that region of their tree is poor (best bootstraps are 69% and 75%) [16]. The topology of an earlier tree based on small-subunit rRNA sequences agrees with ours, although none of the relevant bootstrap values in that tree are significant [18]. The new tree also confirms what the previous, more detailed but less well-supported trees had predicted: within the Pasteurellaceae, true evolutionary relatedness is not well correlated with many of the features previously used to assign isolates to genera [17,19,20].
The genus assignment of M. succiniciproducens provides an example. This species, isolated from bovine rumen, was assigned to Mannheimia on the basis of a simple small-subunit rRNA tree with no bootstrap analysis [21]. Our phylogenetic analysis instead places the two Mannheimia species in separate subclades. The 80% bootstrap score supporting the placement of M. succiniciproducens in Fig. 1 is too low to rule out a closer affinity with P. multocida. Hong et al. compared the M. succiniciproducens genome sequence with those of both P. multocida and H. influenzae; more genes are shared with the former, but amino acid identities are higher with the latter [22,23]. In any case, it is striking to find a rumen bacterium as such a close relative of bacteria otherwise restricted to respiratory mucosa.

Figure 1. Phylogeny of 8 Pasteurellacean species.

Competence genes in Pasteurellacean genomes
Natural transformation has been demonstrated experimentally in only three of the eight sequenced species (H. influenzae, A. actinomycetemcomitans and A. pleuropneumoniae [12,24,25]). Only two other Pasteurellacean species have been shown to be naturally competent (Haemophilus parasuis [26] and Haemophilus parainfluenzae [4,27]). A number of other species have resisted multiple attempts at laboratory transformation, but their apparent nontransformability could be misleading, as cellular processes important in the natural environment may not be induced under laboratory culture conditions.
VanWagoner et al. identified homologues of several H. influenzae competence genes (HI0366, HI0938 and HI0939) in most sequenced Pasteurellacean genomes [28]. As we have recently identified the complete competence regulon of H. influenzae, we examined the genomes of all of the sequenced Pasteurellaceae for homologues of all of these genes [29]. Table 1 shows that all of the genomes contain recognizable homologues of all of the genes known to be required for competence in H. influenzae, as well as homologues of most other genes consistently occurring in the same operons.
However, not all of these genes appear to be functional in every sequenced genome. The H. ducreyi comA, comB and comM genes are interrupted by an internal stop codon (comA) and frameshifts (comB and comM). A deletion in the H. somnus genome fuses the 5' portion of comD to the 3' 67% of comE, which also contains a frameshift. A 17 kb insertion disrupts the comM gene of M. succiniciproducens, and most of pilB in M. haemolytica has been deleted. In some cases, comparison of genome sequences from different isolates revealed discrepancies; these may result from strain-specific variation or from the preliminary nature of some of the sequences used. Only the sequenced genomes of H. influenzae, A. actinomycetemcomitans, P. multocida and A. pleuropneumoniae retain fully intact sets of competence genes.
What inferences can be drawn about the evolution of competence? First, the most parsimonious explanation for the presence of all competence genes in all genomes is that the ancestral Pasteurellacean had functional copies of all these genes and was naturally competent. When and where did this ancestor live? Although dating bacterial divergences is highly problematic, the most recent common ancestor of H. influenzae and P. multocida, and thus of the Hin subclade in Fig. 1, has been estimated to have lived about 270 million years ago (mya), and the last common ancestor of the entire family must be older still [30,31]. Thus the origin of the Pasteurellaceae is likely to have long predated the origin of mammals (c. 195 mya) and may be contemporaneous with the origin of tetrapods about 360 mya. If so, these bacteria may have moved into the respiratory tract and used the abundant DNA found there [32] almost as soon as the first respiratory tracts evolved.
What then explains the sporadic distribution of competence among its descendants? Three of the five genomes from 'non-transformable' species we analyzed carry obvious genetic defects that would prevent DNA uptake. (Loss of comM in M. succiniciproducens would only prevent transformation, not uptake.) Each defect is unique and so must have arisen since the most recent divergence in its lineage. Furthermore, the substitution rate indicated by the scale bar in Fig. 1 allows estimation of the minimum number of chain-terminating and frameshift mutations expected to have accumulated since loss of a competence gene removed selection on the other competence-specific genes. The scarcity of such mutations in each of these strains (0, 1 or 2) suggests that competence was lost quite recently. This is also consistent with the high densities and strong consensuses of USSs in all genomes except that of H. ducreyi. Frequent recent losses of competence would also explain the reported variation in competence within populations [3-6].
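The argument above rests on a simple expectation: once competence is lost, the remaining competence genes evolve neutrally, so inactivating mutations should accumulate roughly in proportion to branch length. A minimal Python sketch of that calculation under a Poisson model follows. The branch length, the number of sites and the fraction of changes assumed to be inactivating are hypothetical inputs for illustration, not values taken from Fig. 1.

```python
import math


def expected_inactivating(subs_per_site: float, n_sites: int, frac_inactivating: float) -> float:
    """Expected number of chain-terminating or frameshift changes since competence was lost,
    assuming neutral accumulation: substitutions per site x sites x fraction that inactivate."""
    return subs_per_site * n_sites * frac_inactivating


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam); a small value means k observed mutations are 'too few'."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    lam = expected_inactivating(subs_per_site=0.05, n_sites=10_000, frac_inactivating=0.05)
    print(f"expected: {lam:.1f}, P(observe <= 2): {poisson_cdf(2, lam):.2e}")
```

With these made-up inputs about 25 inactivating changes would be expected, so observing two or fewer would be very unlikely unless the loss was recent; real estimates would need branch lengths in nucleotide (not amino acid) substitutions and a justified inactivating fraction.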

Uptake signal sequences (USS) are not insertions
One goal of this work was to use USS distribution to make inferences about the evolution of DNA uptake specificity. However, the anomalously strong consensuses of H. influenzae USSs (and other USSs) raised the concern that they might have been produced by insertion of a replicating element rather than by point mutations in pre-existing sequences. Fortunately the mode of USS origin makes a simple prediction about the positions of gaps in sequence alignments. If individual H. influenzae USSs have arisen by insertion, gaps should be seen when the segments containing these USS are aligned with homologous sequences from genomes that diverged before USS arose. We used this prediction to test whether the many H. influenzae USSs in protein coding sequences arose in an ancestral Pasteurellacean by insertion or by accumulation of point mutations in the ancestral genes.
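The gap prediction can be checked mechanically once a pairwise alignment exists. In this hedged Python sketch, `aligned_query` is the H. influenzae row and `aligned_subject` the homologue's row, with gaps written as '-'; the site (for example, the USS-encoded residues) is given in ungapped query coordinates. The function names and the 3-column flank are arbitrary assumptions.

```python
from typing import List


def query_columns(aligned_query: str) -> List[int]:
    """Alignment column of each ungapped residue of the query row."""
    return [col for col, ch in enumerate(aligned_query) if ch != "-"]


def gaps_near_site(aligned_query: str, aligned_subject: str,
                   site_start: int, site_len: int, flank: int = 3) -> bool:
    """True if either row has a gap in the columns covering the site plus `flank`
    columns on each side. An insertion origin predicts gaps here; point mutation does not."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned rows must have equal length")
    cols = query_columns(aligned_query)
    first = max(cols[site_start] - flank, 0)
    last = min(cols[site_start + site_len - 1] + flank, len(aligned_query) - 1)
    return any(aligned_query[c] == "-" or aligned_subject[c] == "-"
               for c in range(first, last + 1))


if __name__ == "__main__":
    print(gaps_near_site("MKCGLV", "MKAGLV", site_start=2, site_len=2))  # False
    print(gaps_near_site("MKCGLV", "MK--LV", site_start=2, site_len=2))  # True
```

Applied across many USS-centred segments, a predominance of gap-free windows would argue against insertion as the origin of individual USSs.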
Because of the evolutionary distance between H. influenzae and species with no USS, the alignments were done between predicted amino acid sequences rather than nucleotide sequences. Segments of well-conserved H. influenzae proteins, centred on USS-encoded amino acids, were aligned with homologous protein segments from Escherichia coli, Vibrio cholerae and Pseudomonas aeruginosa, whose genomes do not contain USS-like repeats. A typical alignment is shown in Fig. 2, along with a sketch (at right) of the phylogenetic relationships of these taxa [57].
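Centring each segment on the USS-encoded amino acids requires knowing which codons a USS overlaps. The Python sketch below finds the 9 bp H. influenzae USS core in either orientation within an in-frame coding sequence and reports the reading-frame offset and the peptide encoded by the overlapped codons. It assumes the standard genetic code and a CDS that starts at its first codon; the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))
HIN_USS = "AAGTGCGGT"


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(cds: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def uss_peptides(cds: str, uss: str = HIN_USS):
    """Return sorted (start, orientation, frame offset, peptide) tuples for every USS copy,
    where peptide is the amino acids of all codons the USS overlaps."""
    cds = cds.upper()
    protein = translate(cds)
    hits = []
    for orientation, motif in (("+", uss), ("-", revcomp(uss))):
        start = cds.find(motif)
        while start != -1:
            end = start + len(motif) - 1
            hits.append((start, orientation, start % 3, protein[start // 3:end // 3 + 1]))
            start = cds.find(motif, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    print(uss_peptides("ATGAAGTGCGGTTAA"))  # [(3, '+', 0, 'KCG')]
```

Because a 9 bp USS spans three or four codons depending on its frame, the peptide, not the nucleotide window, is what gets aligned against the distant homologues.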

All Pasteurellacean genomes contain USS-like repeats
The next step was to characterize the phylogenetic distribution of USSs. Bakkali et al. found that the only overrepresented short repeats in the P. multocida genome are variants of the 9 bp H. influenzae USS core [13]. They also found the H. influenzae USS core to be highly overrepresented in the H. somnus and A. actinomycetemcomitans genomes but did not survey other repeats. To avoid the bias of searching for a specific USS sequence, we extended this analysis by counting all 6-12 bp repeats in all eight genomes (Table 2) and calculating the number of each repeat expected in a random-sequence genome of the same size and base composition. Table 2 shows that all genomes had highly overrepresented repeats related to the H. influenzae USS. The most common 9-mer repeats in the genomes of A. actinomycetemcomitans, P. multocida, M. succiniciproducens and H. somnus are the H. influenzae USS core AAGTGCGGT and its reverse complement. All of the ten most abundant 8-mer, 9-mer and 10-mer repeats in these genomes also contain or closely overlap this 9-mer. We will refer to this as the Hin-type USS. However, the most frequent 9-mer repeats in the genomes of A. pleuropneumoniae and M. haemolytica differ from the Hin-type USS at the second, third and fourth positions (ACAAGCGGT rather than AAGTGCGGT); we will refer to this as the Apl-type USS. The most abundant repeats in the H. ducreyi genome were not recognizable USSs but simple palindromes and strings of As and Ts, so Table 2 also gives the frequencies of the most common USS-like 8-, 9- and 10-mers for this genome. These resembled the Apl-type USS, but their copy numbers were substantially lower than in the other genomes. (Although the 10-mer AATAAGCGGT was the most common USS-like 10-mer repeat, ATAAGCGGT and TAAGCGGT were not among the 50 most frequent 9-mers and 8-mers.) Each genome was also specifically checked for repeats of the other USS type.
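The repeat survey above amounts to counting every k-mer and comparing each count with a random-sequence expectation. A minimal Python sketch follows, assuming a zero-order (independent-base) null model in which A and T share the AT fraction and G and C share the GC fraction; the published analysis may have used a different model, and the function names are illustrative.

```python
from collections import Counter


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical(kmer: str) -> str:
    """Pool a k-mer with its reverse complement under one name."""
    return min(kmer, revcomp(kmer))


def count_kmers(genome: str, k: int) -> Counter:
    """Count k-mers on one strand, pooling each with its reverse complement (covers both strands)."""
    genome = genome.upper()
    counts = Counter()
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other ambiguity codes
            counts[canonical(kmer)] += 1
    return counts


def expected_count(kmer: str, genome_len: int, gc: float) -> float:
    """Expected copies of kmer in either orientation in a random genome of this length and GC."""
    p = {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "G": gc / 2, "C": gc / 2}
    prob = 1.0
    for base in kmer:
        prob *= p[base]
    orientations = 1 if kmer == revcomp(kmer) else 2  # a palindrome reads the same on both strands
    return orientations * prob * (genome_len - len(kmer) + 1)


def most_overrepresented(genome: str, k: int, top: int = 10):
    """Rank k-mers by observed / expected count; returns (ratio, kmer, observed) tuples."""
    genome = genome.upper()
    acgt = sum(genome.count(b) for b in "ACGT")
    gc = (genome.count("G") + genome.count("C")) / acgt
    counts = count_kmers(genome, k)
    ranked = sorted(((obs / expected_count(km, len(genome), gc), km, obs)
                     for km, obs in counts.items()), reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    # Random-sequence expectation for the Hin-type core in 1 Mb of 38% GC sequence.
    print(round(expected_count("AAGTGCGGT", 1_000_000, 0.38), 2))
```

For a genome of about 38% GC (roughly that of H. influenzae) this null model gives between 4 and 5 copies of AAGTGCGGT per Mb, in line with the random-sequence expectation quoted for Fig. 3A.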
The frequencies of both types of 9 bp USS per Mb of sequence in all eight genomes are shown in Fig. 3A. Only about 4 copies per Mb would be expected in random-sequence genomes of the same base compositions. Although the minority USS type (e.g. the Hin-type USS in A. pleuropneumoniae) is several-fold overrepresented in each genome, it is not significantly more frequent than other 9-mers sharing the global consensus ANNNGCGGT. Thus each genome appears to have a predominant, subclade-specific USS type. Fig. 3B shows, for each Pasteurellacean genome, the ratio of repeats perfectly matching each USS type to repeats with single mismatches to that type. Genomes with Hin-type USSs resemble H. influenzae in having more perfect than singly mismatched copies, despite the 27-fold greater number of possible singly mismatched sequences (9 positions × 3 alternative bases). The same excess of perfect copies is seen in genomes with Apl-type USSs; with the exception of H. ducreyi, the ratio is substantially higher for the subclade-specific USS type than for the other type. The consistency of this pattern suggests that USS accumulation is shaped by similar forces in the different genomes.
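The perfect-versus-single-mismatch comparison can be sketched as follows. This hedged Python version scans both strands for exact and one-mismatch copies of a core; dividing the single-mismatch count by the number of possible variants (3 per position, 27 for a 9-mer) gives the mean count per variant, which is the fair comparison with the perfect count. The function names are illustrative.

```python
def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mismatch_profile(genome: str, motif: str):
    """Count windows matching motif exactly and with exactly one mismatch, on both strands.
    Assumes motif and its reverse complement differ at more than two positions,
    so that no window is counted twice."""
    genome, motif = genome.upper(), motif.upper()
    k = len(motif)
    perfect = single = 0
    for target in {motif, revcomp(motif)}:  # a set, so a palindromic motif is scanned once
        for i in range(len(genome) - k + 1):
            mismatches = sum(a != b for a, b in zip(genome[i:i + k], target))
            if mismatches == 0:
                perfect += 1
            elif mismatches == 1:
                single += 1
    return perfect, single


def perfect_to_variant_ratio(genome: str, motif: str) -> float:
    """Perfect copies divided by the mean count per single-mismatch variant (3 * len(motif) variants)."""
    perfect, single = mismatch_profile(genome, motif)
    n_variants = 3 * len(motif)
    return perfect / (single / n_variants) if single else float("inf")


if __name__ == "__main__":
    toy = "AAGTGCGGT" + "TTTTT" + "AAGTGCGGA"  # one perfect and one singly mismatched copy
    print(mismatch_profile(toy, "AAGTGCGGT"))  # (1, 1)
```

A ratio far above 1 from `perfect_to_variant_ratio` is the signature described in the text: perfect copies outnumber each individual one-mismatch variant many times over.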

Detailed comparisons of USSs
As USSs are thought to function by binding to DNA receptors on the cell surface, bases at different positions in the USS core would be expected to show consensus strengths reflecting their differing contributions to this DNA-protein binding. Sequence logos were used to visualize the representation of each base at each position of the USS (Figs. 4 and 5) [33]. In these logos the relative heights of the A, G, C and T in each stack show the frequencies of the bases at that position, and the overall height of each stack reflects the strength of the consensus at that position (its information content). The stack height is especially sensitive to minor changes in the frequency of a very frequent base (e.g. if the frequency of the most common base falls from 1.0 to 0.9, the height falls from 2.0 to 1.6). The H. influenzae and A. actinomycetemcomitans USSs have also been shown to share conserved motifs (segments 2 and 3) on the 3' side of the USS core [9,12]. The importance of segment 2 was demonstrated experimentally by Danner and coworkers, who showed that USS-containing DNA fragments ethylated at bases in this region were not taken up by competent H. influenzae cells [15]. The functions of these positions in DNA uptake are not known; they may be additional sites of contact with the DNA receptor, or they may be involved in bending or kinking the DNA during uptake. In the Hin-type USS, segment 3 is a 6 nt motif consisting primarily of Ts, centred 12 positions to the right of segment 2. In A. pleuropneumoniae and M. haemolytica, segment 3 extends slightly farther to the left and substantially farther to the right, and has the more complex consensus AAAATTTTGCAAAT.
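The stack heights described above follow from Shannon entropy: a DNA column carries at most log2(4) = 2 bits, minus the entropy of its base frequencies. This Python sketch computes that information content and the per-letter heights for equal-length aligned DNA sites; it omits the small-sample correction that some logo programs apply, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def column_information(column: str) -> float:
    """Information content in bits: log2(4) minus the Shannon entropy of the base frequencies."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - entropy


def logo_heights(sites: List[str]) -> List[Dict[str, float]]:
    """For each position, the height of each letter = its frequency x the column's information."""
    heights = []
    for column in zip(*sites):
        col = "".join(column)
        info = column_information(col)
        counts = Counter(col)
        heights.append({base: info * c / len(col) for base, c in counts.items()})
    return heights


if __name__ == "__main__":
    print(column_information("AAAAAAAAAA"))          # 2.0
    print(round(column_information("A" * 9 + "G"), 2))
```

For a column where the most common base has frequency 0.9 and the remaining 0.1 is a single other base, `column_information` gives about 1.53 bits; the exact drop depends on how the remainder is distributed and on any small-sample correction.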
Although the H. ducreyi USS consensus in segments 2 and 3 resembles the Apl-type motifs, it is much weaker. Together with the lower frequency of USS in its genome, and the presence of inactivating mutations in three of its competence genes, this suggests a relatively ancient loss of ability to take up DNA.
The consensuses in segments 2 and 3 of the A. pleuropneumoniae and M. haemolytica USSs were particularly strong and extensive. To compare their strengths to that of the core USS we repeated the above analysis in reverse. We chose the nine bases making the strongest contribution to segment 2 and segment 3 (ATTTNNNNNNNNNTTTGC) or to segment 3 alone (TTTTGCAAA) and identified and aligned all A. pleuropneumoniae genomic segments containing them (560 and 454 segments respectively). The resulting logos (Fig. 5A and 5B) show that many of the segments bearing these motifs also contained all but the first two bases of the Apl-type USS core. The weak correlation of the first two positions of the core with the flanking segments may mean that these positions play a lesser role in USS function than the rest of the core, with its larger segment 3 making a greater contribution to the binding specificity. A logo using only the 164 sequences with 12 matches to segments 2 and 3 was even more effective, recovering the full core consensus (Fig. 5C). Taken together, these analyses suggest that, at least in A. pleuropneumoniae, the motifs in segments 2 and 3 may be as important for DNA uptake as the USS core.

USS in H. parasuis
A recent paper reported that H. parasuis has the core USS GAGTTCGGT, which differs from both the Hin and Apl types [26]. However, this conclusion was based on analysis of a single putative USS in a cloned 413 bp fragment. We have examined all the available H. parasuis sequences (86,701 nt, mainly in ORFs) and find, in addition to the one copy of this repeat described by Bigas et al., four copies of the Hin-type USS, fourteen copies of the Apl-type USS, and seventeen copies of sequences differing at single positions from the Apl-type USS. This suggests that H. parasuis has an Apl-type USS, which would be consistent both with previous phylogenetic analysis placing it in a strongly supported subclade with M. haemolytica and A. pleuropneumoniae [17,34] and with the ability of A. pleuropneumoniae DNA to efficiently transform H. parasuis [35].

H. influenzae and A. pleuropneumoniae recognize subclade-specific USSs
A. actinomycetemcomitans (Hin-type USS) has already been shown to preferentially take up its own and H. influenzae DNAs [12], but for most of the other species the role of the putative USS in DNA uptake could not be tested directly because no competent isolate has been identified. However, A. pleuropneumoniae strain HS143 (serotype 15) has recently been shown to be much more competent than other strains (J. Bossé, manuscript in preparation), allowing us to test its uptake specificity in three different experiments. Each confirmed that competent A. pleuropneumoniae cells preferentially take up DNA fragments containing the Apl-type USS.
The solid bars in Figure 6A and 6B show measurements of uptake by competent H. influenzae and A. pleuropneumoniae cells of radiolabelled 220 bp DNA fragments containing synthetic H. influenzae and A. pleuropneumoniae USSs. These USSs were designed to contain the most common base at each position of the extended USSs described above; a control fragment contained a randomized version of the H. influenzae USS sequence. As expected, H. influenzae took up about 1500-fold more DNA containing its USS than control DNA (Fig. 6A; note the log scale). The function of the Apl-type putative USS was confirmed; A. pleuropneumoniae took up about 17-fold more DNA with its USS than control DNA (Fig. 6B). Each species also took up substantially less DNA containing the heterologous USS type than its own type (only about twice as much as control DNA), confirming that the DNA uptake machinery discriminates between the two types.
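Designing a synthetic USS from the most common base at each position of the extended USSs is a column-wise consensus. A short Python sketch, assuming the USS-containing segments have already been aligned to equal length; ties are broken alphabetically, which is an arbitrary choice, and the function name is illustrative.

```python
from collections import Counter
from typing import List


def consensus(sites: List[str]) -> str:
    """Most common base at each position of equal-length aligned sites."""
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must all have the same length")
    result = []
    for column in zip(*sites):
        counts = Counter(column)
        # max() keeps the first maximum it meets, so iterating in sorted order breaks ties alphabetically
        result.append(max(sorted(counts), key=counts.__getitem__))
    return "".join(result)


if __name__ == "__main__":
    print(consensus(["AAGTGCGGT", "AAGTGCGGT", "ACAAGCGGT"]))  # AAGTGCGGT
```

A consensus built this way is a single idealized sequence; it may be rarer in the genome than many near-consensus copies, which is why the logos above are also useful when interpreting uptake results.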
Uptake of chromosomal DNA may provide a more biologically relevant measure of specificity. The dashed bars in Fig. 6A and 6B show uptake of radiolabelled chromosomal DNAs by competent H. influenzae and A. pleuropneumoniae cells. In this assay H. influenzae took up 50-fold more H. influenzae DNA than the control E. coli DNA (Fig. 6A), and A. pleuropneumoniae took up about 37-fold more A. pleuropneumoniae DNA than E. coli DNA (Fig. 6B). In both chromosomal and synthetic-USS uptake experiments A. pleuropneumoniae took up substantially less DNA than did H. influenzae, consistent with its lower transformation frequency. These results suggest that the A. pleuropneumoniae uptake machinery does indeed weakly recognize the Hin-type USS, and do not preclude a similar overlap in specificity by the H. influenzae uptake machinery.
Figure 7 shows the extent to which cells preferentially take up genetically marked conspecific DNA in the presence of competing DNA from their own strain or another species. This is a more sensitive measure of uptake bias than the uptake of pure DNAs tested above. Fig. 7A shows the results of uptake-competition assays using H. influenzae cells and a constant amount of H. influenzae chromosomal DNA carrying a novobiocin resistance allele. As expected, unmarked H. influenzae DNA competed strongly, but B. subtilis DNA, which does not contain overrepresented USS-like repeats, did not [36]. A. pleuropneumoniae DNA did not compete for uptake. Fig. 7B shows that A. pleuropneumoniae took up its own DNA in preference to both H. influenzae DNA and B. subtilis DNA. These results confirm that the DNA uptake machineries of both H. influenzae and A. pleuropneumoniae discriminate strongly in favour of DNAs containing their own USS type. H. parasuis DNA was also tested; it did not compete for uptake by H. influenzae (Fig. 7A), but competed with A. pleuropneumoniae DNA for uptake by A. pleuropneumoniae to an extent consistent with the density of Apl-type USS in its DNA.
We did not test whether cells could discriminate between DNAs from species in the same subclade. However, as an earlier measure of relatedness among the Pasteurellaceae, Albritton et al. examined the ability of DNAs from various species to compete with H. influenzae DNA for uptake by competent H. influenzae cells [37]. The ability to compete for uptake correctly predicted the USS distributions we have found: DNAs from A. actinomycetemcomitans and P. multocida (Hin subclade) competed strongly (54% and …). Although the other sequenced species were not tested, the competition shown by DNA of non-sequenced Pasteurellacean species is likely to predict the USS types they contain.
Figure 4. WebLogos for USSs and surrounding sequence in 8 genomes.
… [16,17]. The shared features of the Pasteurellacean USS types may reflect generalized features of the DNA uptake process. The 9 bp USS cores may match the size of the recognition domain of the as-yet-unidentified DNA receptor protein, and are similar in length to the 10 bp Neisseria core. The conservation of segments 2 and 3 in the Pasteurellaceae is intriguing, as conserved flanking motifs are not seen in Neisseria. It may be significant that the spacings between the USS core, segment 2 and segment 3 correspond roughly to single turns of the DNA helix. H. influenzae cells cannot initiate uptake by threading a DNA end through a membrane pore, because they efficiently take up covalently closed plasmids, which have no ends [38]. However, DNA molecules are too highly charged and too stiff (persistence length about 50 nm, or 150 bp) to simply pass sideways through the outer membrane. Together, the USS core and flanking motifs may allow the DNA to be sharply kinked (perhaps by strand separation), presenting a compact cross-section for membrane transit. Detailed understanding of the function of USSs will require more complete experimental studies of the binding and incorporation of target DNA sequences.
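The length arguments above can be checked with standard B-DNA geometry. This small Python calculation assumes textbook values of 0.34 nm rise per base pair and about 10.5 bp per helical turn; the constants are assumptions, not measurements from this study.

```python
RISE_NM_PER_BP = 0.34  # B-DNA rise per base pair (assumed textbook value)
BP_PER_TURN = 10.5     # B-DNA helical repeat in solution (assumed textbook value)


def nm_to_bp(length_nm: float) -> float:
    """Convert a contour length in nm to base pairs."""
    return length_nm / RISE_NM_PER_BP


def helical_turns(spacing_bp: float) -> float:
    """Number of helical turns spanned by a spacing given in bp."""
    return spacing_bp / BP_PER_TURN


if __name__ == "__main__":
    print(f"50 nm persistence length ~ {nm_to_bp(50):.0f} bp")  # ~147 bp
    print(f"12 bp spacing ~ {helical_turns(12):.2f} turns")
```

So a 50 nm persistence length corresponds to roughly 150 bp, and the 12-position offset given for the Hin-type segments 2 and 3 is just over one helical turn, which places those motifs on roughly the same face of the helix.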

Conclusion
The eight Pasteurellacean species we analyzed fall into two robust subclades. The genomes of all these bacteria contain homologues of all the H. influenzae genes known to be needed for DNA uptake, some of which have recently been inactivated by mutation. All of these genomes also contain high densities of genetically stable repeats, either the well-characterized H. influenzae USS or a related sequence, in each case comprising a 9 bp core and two adjacent AT-rich segments. The distribution of the Hin-type and Apl-type USSs corresponds to the two Pasteurellacean subclades. Competent members of these subclades discriminate between the two USS types, each preferring to take up DNA containing the USS typical of its own genome.
Taken together, these findings are consistent with the following model of the evolution of competence in the Pasteurellaceae: The ancestor of the sequenced Pasteurellaceae possessed a complete set of functional competence genes and was naturally competent, taking up DNA by a mechanism very similar to that used by H. influenzae today. The ancestral genome contained many USSs; these may or may not have been simpler than the USSs in its descendants, but likely had the common motif ANNNGCGGT in the USS core and included the AT-rich segment 2 and much of segment 3. During the initial diversification of the Pasteurellacean subclades the uptake specificity and USS consensus changed in parallel in one or both lineages. This divergence of genomic USSs may have been effectively complete before the divergence of the sequenced species within each subclade, with USS specificities remaining stable since then, although the existence of Pasteurellaceae with other diverged uptake specificities cannot be ruled out. Because the USS consensuses within each subclade are so similar, USS specificity will not enable competent bacteria to distinguish between DNAs derived from different species within their subclade. Because many of these DNAs are otherwise sufficiently diverged that recombination is not only inefficient but toxic [37,39], forces other than exclusion of non-self DNA may be responsible for uptake specificity.
What are the implications for other bacterial families? We suggest that the evolutionary history of competence often follows the pattern shown in Fig. 8. In this model, the ancestors of many bacterial families were naturally competent, but competence has been and continues to be frequently lost. Mutations causing loss of competence have not always been strongly selected against, and sometimes may have been actively favoured, so non-competent lineages often persist. Over the long term, however, non-competent lineages are selected against, so that all extant bacteria have recent ancestors that were competent. This hypothesis is consistent not only with our family-level analysis but also with the extensive evidence of sporadic distribution of competence within individual species [3][4][5][6]. The pattern is similar to that seen for the mismatch repair system, where mutants with defects in mutation prevention can experience a short-term advantage but are eventually eliminated by selection against accumulating deleterious mutations [40].
Many questions remain unanswered. How deep is the ancestry of competence? Are some bacterial families ancestrally not competent? Have some modern species completely lost competence? Do genes introduced by conjugation or transduction ever restore competence to non-competent lineages? Thanks to the ever-increasing availability of new genome sequences, answers to these questions will soon be within reach.
[46,47]. Open reading frames for genome sequences lacking annotation were identified from the draft sequences using the GLIMMER software package (now available at [48]).
Phylogenetic analysis of the concatenated alignment used the PHYLIP software package [51]. ProML analysis, using maximum likelihood with the JTT model and a gamma-plus-invariant-sites distribution of rates across sites, yielded a tree with estimated phylogenetic distances. SeqBoot was then used to generate 100 bootstrap-resampled datasets, each of which was analysed with ProML to produce a phylogenetic tree. A consensus of these trees was computed with the program Consense, and the resulting bootstrap values were added to the tree generated from the complete dataset.
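The bootstrap procedure described above can be sketched in Python. This is a minimal illustration, not PHYLIP's code: the hypothetical `bootstrap_alignment` mimics what SeqBoot does for a concatenated protein alignment (resampling alignment columns with replacement, using the same columns for every taxon), and the hypothetical `clade_support` tallies how often each clade recurs among the replicate trees, which is the quantity Consense reports as a bootstrap value. The dictionary and frozenset representations are our assumptions, and inferring a tree from each replicate (the ProML step) is not shown.

```python
import random
from collections import Counter

def bootstrap_alignment(alignment, rng):
    """One pseudo-replicate: alignment columns resampled with replacement.

    alignment maps taxon name -> aligned sequence; all sequences must have
    the same length. The same sampled columns are used for every taxon.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in cols) for name in names}

def clade_support(replicate_clades):
    """Percentage of replicate trees containing each clade.

    replicate_clades has one entry per tree; each entry is a set of clades,
    and each clade is a frozenset of taxon names.
    """
    n = len(replicate_clades)
    counts = Counter(clade for clades in replicate_clades for clade in clades)
    return {clade: 100.0 * c / n for clade, c in counts.items()}

# Hypothetical toy alignment; 100 replicates, as in the text.
rng = random.Random(42)
toy = {"H_influenzae": "MKVLAT", "A_pleuropneumoniae": "MKILAS", "H_ducreyi": "MRILAS"}
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
```

Resampling whole columns, rather than individual residues, keeps each site's pattern across taxa intact, which is what makes the replicate trees informative about support for each grouping.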

Homology of USS-encoded peptides
BLAST searches were used to identify those USS-containing H. influenzae genes that had homologues in all of E. coli, V. cholerae and P. aeruginosa. ClustalW was used to align the homologous protein sequences, with the default penalties of 10 for gap opening and 0.2 for gap extension. Analysis was restricted to 43-amino-acid segments, centred on the amino acids encoded by the USS core, that showed >50% amino acid identity across all homologues. All gaps within these alignments were tabulated.
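A sketch of the window-and-gap tabulation, assuming the ClustalW output has already been parsed into a dictionary of equal-length aligned sequences. The helper names, the use of the H. influenzae sequence as the identity reference, and the reading of ">50% identity across all homologues" as "every homologue is more than 50% identical to the H. influenzae segment" are our assumptions. `centre` is the alignment column of the residue encoded by the middle of the USS core; a half-width of 21 columns gives the 43-residue window.

```python
def window(aligned, centre, half_width=21):
    """Slice columns centre-half_width .. centre+half_width (43 columns by default)."""
    start = max(0, centre - half_width)
    end = centre + half_width + 1
    return {name: seq[start:end] for name, seq in aligned.items()}

def identity_to_reference(win, reference):
    """Percent identity of each homologue to the reference over the window.

    Columns where the reference has a gap never count as identical.
    """
    ref = win[reference]
    result = {}
    for name, seq in win.items():
        if name != reference:
            same = sum(1 for a, b in zip(ref, seq) if a == b and a != "-")
            result[name] = 100.0 * same / len(ref)
    return result

def gap_runs(seq):
    """(start, length) of each run of '-' characters in an aligned sequence."""
    runs, i = [], 0
    while i < len(seq):
        if seq[i] != "-":
            i += 1
            continue
        j = i
        while j < len(seq) and seq[j] == "-":
            j += 1
        runs.append((i, j - i))
        i = j
    return runs

def tabulate_gaps(aligned, centre, reference, half_width=21, min_identity=50.0):
    """Gap runs per sequence, or None if the window fails the identity filter."""
    win = window(aligned, centre, half_width)
    ids = identity_to_reference(win, reference)
    if not all(v > min_identity for v in ids.values()):
        return None
    return {name: gap_runs(seq) for name, seq in win.items()}

# Hypothetical toy alignment; a narrow window keeps the example short.
toy = {"Hin": "MKT-LAGHEV", "Eco": "MKTQLAGHEV", "Vch": "MRT--AGHEV"}
print(tabulate_gaps(toy, centre=5, reference="Hin", half_width=3))
# -> {'Hin': [(1, 1)], 'Eco': [], 'Vch': [(1, 2)]}
```

Recording gaps as (start, length) runs keeps each indel event as one entry, so a single three-residue deletion is not counted as three separate gaps.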

Repeat analysis
The Perl program repeat_finder was developed to search genome sequences for abundant short DNA sequences (code available at [52]). It was used to tabulate the occurrences of the 20 most abundant 6-, 7-, 8-, 9-, 10-, 11-, and 12-mers for each of the 8 Pasteurellacean genomes, along with the number of each expected for a random-sequence genome of that size and nucleotide composition.
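The repeat_finder code itself is available at [52]; the Python sketch below only illustrates the underlying calculation and is not a translation of it. It counts overlapping k-mers on one strand and compares each count with the number expected in a random sequence of the same length and base composition, assuming a zero-order model in which positions are independent (expected count = number of positions × product of the base frequencies). Counting a single strand and the independence model are both simplifying assumptions, and the function names are hypothetical.

```python
from collections import Counter
from math import prod

BASES = "ACGT"

def kmer_counts(seq, k):
    """Count every overlapping k-mer on the given strand, skipping ambiguous bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(b in BASES for b in kmer):
            counts[kmer] += 1
    return counts

def expected_count(kmer, seq):
    """Expected occurrences of kmer in a random sequence of the same length and
    base composition, treating positions as independent (zero-order model)."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in BASES)
    freq = {b: seq.count(b) / total for b in BASES}
    return (len(seq) - len(kmer) + 1) * prod(freq[b] for b in kmer)

def top_kmers(seq, k, n_top=20):
    """The n_top most abundant k-mers as (kmer, observed, expected) tuples."""
    counts = kmer_counts(seq, k)
    return [(kmer, n, expected_count(kmer, seq)) for kmer, n in counts.most_common(n_top)]

# Usage on a genome string `genome` (loading it is not shown):
# for k in range(6, 13):
#     for kmer, obs, exp in top_kmers(genome, k):
#         print(k, kmer, obs, round(exp, 2))
```

In an AT-rich genome a GC-containing 9-mer such as a USS core has a very small expected count, so an observed count in the thousands stands out sharply against this baseline.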
All occurrences of the 9 bp putative USS core for each species were identified, and 50 bp sequence segments containing the core plus 11 bases upstream and 30 bases downstream were aligned. The program WebLogo [7,53] was used to visualize the consensus of each species' USS. Similar analyses were done for each genome using all singly mismatched occurrences of the 9 bp core, and for A. pleuropneumoniae using consensus sequences derived from the two flanking regions.
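The core search and segment extraction can be sketched as follows. Function names are hypothetical, and the code only builds the per-position base frequencies that a sequence logo displays; drawing the logo is left to WebLogo. The sketch searches both strands, reorienting reverse-strand hits so that every segment reads with the core in the same direction, and `mismatches=1` returns the singly mismatched occurrences. Occurrences too close to a sequence end to yield a full 50 bp segment are skipped, which is our assumption about edge handling.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_core(seq, core, mismatches=0):
    """Start positions where seq differs from core at exactly `mismatches` sites."""
    k = len(core)
    return [i for i in range(len(seq) - k + 1)
            if sum(a != b for a, b in zip(seq[i:i + k], core)) == mismatches]

def segments(seq, starts, core_len, upstream=11, downstream=30):
    """Core plus flanks (11 + 9 + 30 = 50 bp for a 9 bp core); occurrences too
    close to a sequence end are skipped."""
    out = []
    for s in starts:
        a, b = s - upstream, s + core_len + downstream
        if a >= 0 and b <= len(seq):
            out.append(seq[a:b])
    return out

def uss_segments(genome, core, mismatches=0):
    """Segments from both strands, each oriented so the core reads 5' to 3'."""
    segs = []
    for strand in (genome, revcomp(genome)):
        segs += segments(strand, find_core(strand, core, mismatches), len(core))
    return segs

def frequency_matrix(segs):
    """Per-position base frequencies: the information a sequence logo displays."""
    matrix = []
    for i in range(len(segs[0])):
        column = Counter(s[i] for s in segs)
        matrix.append({b: column[b] / len(segs) for b in "ACGT"})
    return matrix
```

For example, `frequency_matrix(uss_segments(genome, "AAGTGCGGT"))` would give the position-by-position composition behind an H. influenzae-type logo, and passing `"ACAAGCGGT"` would do the same for the Apl-type core.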

Bacterial strains and culture conditions
A. pleuropneumoniae serotype 15 (strain HS143) and H. influenzae Rd (strain KW20) were grown in Brain Heart Infusion broth (Difco) supplemented with the recommended concentrations of NAD and hemin (H. influenzae only), and were made competent by transfer of exponentially growing cells to MIV starvation medium as described for H. influenzae [54]. Aliquots of competent cells were stored at -80°C and thawed immediately before use.

DNA uptake
Competent cells of H. influenzae strain KW20 and A. pleuropneumoniae strain HS143 (1.0 ml; ~1 × 10⁹ cfu) were incubated with 150 ng of labeled chromosomal DNA or 20 ng of labeled PCR fragment for 15 minutes at 37°C, followed by 5 minutes of incubation with DNase I at 1 μg/ml. Cells were then washed three times at room temperature by pelleting and resuspension in 1.0 ml of MIV, and the radioactivity of the pellets was counted.

Transformation-competition experiments
Competent cells of H. influenzae strain KW20 and A. pleuropneumoniae strain HS143 (0.2 ml) were incubated for 15 minutes at 37°C with 100 ng of genetically marked conspecific DNA (MAP7 DNA for H. influenzae [54] and sodC::Kan DNA for A. pleuropneumoniae [25]) mixed with 100, 300, or 900 ng of competing DNA (H. influenzae KW20, A. pleuropneumoniae HS143, B. subtilis or H. parasuis (strain Nagasaki) DNA). DNase I was then added at 1.0 μg/ml for a further five minutes, and cells were then diluted and plated on supplemented BHI plates containing 2.5 μg/ml novobiocin (H. influenzae) or 25 μg/ml kanamycin (A. pleuropneumoniae). Data were plotted using the double-reciprocal method of Sisco and Smith [56].
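Competition data of this kind can be reduced to a single relative-efficiency value under a simple competitive-uptake model, which we assume here purely for illustration. If donor DNA (D ng) and competitor DNA (C ng) compete for the same uptake sites, with the competitor r times as effective per ng as the donor, the number of transformants is T = T0·D/(D + r·C). Then 1/T is linear in the reciprocal donor fraction (D + C)/D, with slope r/T0 and intercept (1 - r)/T0, so r = slope/(slope + intercept). This sketches a double-reciprocal analysis of that general kind; it is not a reproduction of Sisco and Smith's exact procedure, and the function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def relative_efficiency(donor_ng, competitor_ng, transformants):
    """Competitor efficiency r relative to donor DNA (1 = as good as donor, 0 = no competition).

    Assumes T = T0 * D / (D + r*C), so 1/T is linear in (D + C)/D with
    slope r/T0 and intercept (1 - r)/T0; hence r = slope / (slope + intercept).
    """
    xs = [(donor_ng + c) / donor_ng for c in competitor_ng]  # reciprocal donor fraction
    ys = [1.0 / t for t in transformants]                     # reciprocal transformants
    slope, intercept = linear_fit(xs, ys)
    return slope / (slope + intercept)

# Synthetic numbers generated from the model (not data from this study);
# the 0 ng point is a hypothetical no-competitor control.
donor, competitor = 100.0, [0.0, 100.0, 300.0, 900.0]
counts = [1000.0 * donor / (donor + 0.25 * c) for c in competitor]
print(relative_efficiency(donor, competitor, counts))
```

Under this model, self DNA used as competitor gives a line through the origin (r close to 1), whereas a competitor that is not taken up at all gives a flat line (r close to 0), so the two USS types' specificities can be compared on one scale.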