BMC Evolutionary Biology. The Evolution of Brassica napus FLOWERING LOCUS T Paralogues in the Context of Inverted Chromosomal Duplication Blocks

Background: The gene FLOWERING LOCUS T (FT) and its orthologues play a central role in the integration of flowering signals within Arabidopsis and other diverse species. Multiple copies of FT, with different cis-intronic sequence, exist and appear to operate harmoniously within polyploid crop species such as Brassica napus (AACC), a member of the same plant family as Arabidopsis.


Background
Timing of the onset of flowering is an important agronomic trait affecting crop production. To meet the challenges of climate change, and the need to adapt crops to a wider range of growing environments, it is necessary to coordinate flowering within the context of seasonal variations in order to ensure the greatest possibility of pollination, and thus consistently high seed yield. The genetic basis of variation in flowering time is now well understood in Arabidopsis. Forward and reverse genetics have allowed identification of genes in the context of environmental and developmental cues that mediate the onset of flowering, and allowed detailed characterization of the photoperiod, vernalization, gibberellin and autonomous pathways. Major integrators of these pathways include FLOWERING LOCUS T (FT), along with SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 (SOC1) and LEAFY (LFY) [1][2][3][4][5]. FT induces flowering in response to long days and is a direct target of the nuclear protein CONSTANS (CO) in leaves [6][7][8]. Two proteins within the vernalization pathway, FLOWERING LOCUS C (FLC) and SHORT VEGETATIVE PHASE (SVP), bind to the CArG box within intron 1 and the promoter region of FT, respectively, to repress its expression [9,10]. FT is expressed in the phloem of leaves, with the small protein moving as a long-distance signal to the shoot apical meristem (SAM), where it interacts with FD, a bZIP transcription factor, to form an FT/FD heterodimer. This complex then activates the floral meristem identity genes APETALA 1 (AP1) and FRUITFULL (FUL) to promote flowering [7,8,11,12].
Although FT plays a central and indispensable role to induce flowering, interpreting the exact roles of different FT paralogues is difficult, owing to variation in the structure and number of FT family members across plant taxa. For example in rice, at least three FT-like genes (OsFTL1-OsFTL3) promote flowering, although there are thirteen such genes within the genome, corresponding to eight in the ancestral grass genome [13][14][15][16]. It has been shown that the protein encoded by Hd3a, corresponding to OsFTL2, is a mobile signal in rice, and moves from the leaf to the SAM to induce flowering [17]. A complex situation also exists in barley, where five FT-like genes are found, with HvFT1 being highly expressed under long day conditions at the time of transition from vegetative to reproductive growth, and HvFT2 and HvFT4 expressed later in development. HvFT3 is a candidate gene for a major flowering-time QTL, and the expression level of HvFT5 is very low in short days, corresponding to a predicted stop codon within the protein at residue 69 [18]. Four FT-like cDNAs have been cloned in poplar, of which only PtFT2 has been shown to shorten juvenile phase and promote seasonal flowering [19]. Although Brassica species share a common ancestor with Arabidopsis and diverged at 14.5 to 20.4 MYA [20][21][22], no FT paralogues had previously been identified, apart from 3 RFLP probes from Arabidopsis FT having been mapped to the A2 (N2) and C6 (N16) linkage groups of B. napus [23].
Although there are five homologues of FT in Arabidopsis, i.e. TERMINAL FLOWER 1 (TFL1), TWIN SISTER OF FT (TSF), ARABIDOPSIS THALIANA CENTRORADIALIS (ATC), BROTHER OF FT AND TFL1 (BFT), and MOTHER OF FT AND TFL1 (MFT), their functions have diverged considerably from that of FT, and they share only 56-82% identity with FT at the amino acid sequence level, with TSF having the highest similarity [38]. This information allows us to identify FT orthologues from the B. napus genome unambiguously. We sought to characterize the role that ancestral segmental genome duplication events had played in determining the number and function of FT orthologues within B. napus, and to establish how this had contributed to the ability to adapt to contrasting agricultural systems, such as those associated with spring and winter seasonal crop types. This paper presents the results of cloning and characterizing the BnFT paralogues, and discusses the evolution of BnFT paralogues within B. napus.

Isolation and genetic mapping of BnFT paralogues
Thirty-five potential FT orthologue-containing BAC clones were identified by Southern blot screening of the JBnB BAC library, which was developed from the doubled haploid B. napus cv. Tapidor [30]. Eleven of these were verified by PCR amplification. These BACs were initially grouped according to intron length (4 groups), and then grouped according to the polymorphism within intron 2 (6 groups) (Additional file 1). Full-length genomic sequences of these six BnFT paralogues were isolated from representative BACs by amplification and PCR walking, with primers designed from one copy of BrFT (designated as BrA2.FT) located in chromosome A2 of B. rapa ( Figure  1, Table 1).
The six BnFT paralogues were mapped using PCR-derived markers to four linkage groups of the TNDH genetic map, two in the A genome and two in the C genome ( Figure 2). The BnFT paralogues within B. napus were named as follows: BnA2.FT, BnA7.FT.a, BnA7.FT.b, BnC2.FT, BnC6.FT.a and BnC6.FT.b according to their mapped chromosomes. In silico mapping indicates that all of the genomic regions containing BnFT paralogues corresponded to block E in chromosome 1 of Arabidopsis [26], although block E was inversely duplicated on linkage groups A7 and C6, where it forms inverted duplication blocks (IDBs). Based on marker identity, the BnFT paralogues in the IDBs were close to the junctions of each duplicated block ( Figure 2).
According to marker identity, ten IDBs were detected within the TNDH linkage map, associated with eight chromosomes (Figure 3, Additional file 2). The IDBs covered a quarter of the total length of the B. napus linkage groups that could be aligned with the genome of Arabidopsis (Table 2). Interestingly, blocks located in chromosomes 3 and 5 of Arabidopsis [26,39] rarely corresponded with the IDBs in the genome of B. napus. The majority of IDBs were detected in the A genome and not in the corresponding regions of the C genome, since most markers had been developed from B. rapa BACs.

BnFT paralogues: candidate genes of QTL for flowering time?
Two BnFT paralogues, BnC6.FT.a and BnC6.FT.b, were mapped previously to a major QTL cluster region for flowering time, which was detected in all winter-cropped environments, with the Tapidor alleles contributing to accelerated flowering [39]. BnA2.FT was newly identified within the confidence interval of the QTL cluster on chromosome A2 in two winter-cropped environments, with the NY7 allele promoting flowering (Figure 4A). To test whether BnC6.FT.a and BnC6.FT.b represented candidate genes of QTL for flowering time, a DH line harbouring the complete C6 QTL cluster in Tapidor was backcrossed to NY7 for four generations, and two near isogenic lines (NILs) with different combinations of introgressed segments on C6, NI-5 and NI-9, were selected. Both NILs flowered much earlier than NY7 (Figure 4B). NI-5, which had a small additional introgressed fragment containing the two BnC6.FT alleles of Tapidor, flowered significantly earlier than NI-9, which lacked these Tapidor alleles (Figure 4B). To further confirm whether the BnFT candidates significantly affected flowering time in B. napus, BnC6.FT.a, BnC6.FT.b and BnA2.FT were subjected to association analysis with 35 spring and 20 winter cultivars (Additional file 3). The flowering time of the accessions was investigated in vernalization-free conditions, and the genotypes were tested with the PCR markers. The NY7 and Tapidor alleles of the three BnFT paralogues were significantly (P < 0.01) prevalent in the spring and winter type cultivars, respectively (Table 3). In particular, two paralogues, BnC6.FT.a and BnC6.FT.b, showed 95% similarity with Tapidor alleles in winter type cultivars (Table 3). The functional differences observed between spring and winter cultivars implied that the expression of the BnFT candidates was repressed prior to vernalization.

Characterization of BnFT paralogues
Figure 1 Distribution of primers used to amplify the BnFT fragments, with arrows indicating direction. The top primer pairs (AtFT1 and AtFT2) were used to amplify the probe for screening BnFT paralogues from BAC clones of the JBnB library. Lower primers were used to amplify different regions of BnFT paralogues from positive BACs: BrFT1/BrFT2 for exon1-exon2, BrFT2-3/BrFT4 for exon2-exon4; 5' and 3' UTR were amplified with BrFT1-5' and BrFT4-3', respectively, by PCR walking.

The canonical organization of FT into four exons and three introns as found in Arabidopsis (Figure 5B) is well conserved across all the FT-like genes identified in rice, poplar and barley genomes. However, some variation of intron size is observed, with the third intron of HvFT1 being absent. The number and size of the exons of all BnFT paralogues in B. napus were identical to those of their orthologue in Arabidopsis. However, the intron size of all these genes, especially intron 2, was found to differ from that in Arabidopsis (Figure 5B). Thus a larger ORF is found for BnA2.FT and a smaller ORF for BnC2.FT. In both cases a single base deletion is present within intron 1 that disrupts the CArG box, the binding motif for FLC protein to repress FT expression prior to vernalization in Arabidopsis [10].
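The effect of a single-base deletion on a CArG box can be illustrated with a short motif scan. The sketch below assumes the commonly cited CArG consensus CC[A/T]6GG; sites bound by FLC in vivo may deviate from this consensus, and the two sequences are invented toy strings, not BnFT intron 1 data. The function name `find_carg_boxes` is hypothetical.

```python
import re
from typing import List

# Assumed consensus: CC, six A/T bases, GG. Real binding sites may vary.
# The lookahead lets overlapping matches be reported as well.
CARG_BOX = re.compile(r"(?=CC[AT]{6}GG)")


def find_carg_boxes(sequence: str) -> List[int]:
    """Return 0-based start positions of CArG-box matches in a DNA sequence."""
    return [match.start() for match in CARG_BOX.finditer(sequence.upper())]


# Toy sequences (hypothetical): the second lacks one T inside the A/T core.
intact_intron = "GATTCCAAATTTGGCATA"
deleted_intron = "GATTCCAAATTGGCATA"

if __name__ == "__main__":
    print(find_carg_boxes(intact_intron))
    print(find_carg_boxes(deleted_intron))
```

Because the consensus requires exactly six A/T bases between CC and GG, removing one base from the core abolishes the match, which is the kind of disruption described for BnA2.FT and BnC2.FT.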
Over the coding sequence, the six BnFT paralogues had 85-87% identity with AtFT, 81-83% with AtTSF [GenBank: NM_118156], and 92-99% with each other (Table 4). Interestingly, the paralogues within homeologous regions of the B. napus A and C genomes, such as BnA2.FT and BnC2.FT, showed the highest nucleotide identity (99%), whilst the paralogues present within inverse duplicated regions, such as BnC6.FT.a and BnC6.FT.b, showed higher identity (97%) than among other pairs of paralogues, such as BnA2.FT and BnC6.FT.a (92%). The degree of amino acid identity showed the same relationships as the coding sequences (Table 4). The BnFT paralogues showed much higher similarity with AtFT than with AtTSF, indicating that conserved amino acids had been substituted at some sites, although the proteins are likely to perform similar functions.
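Pairwise identity values of the kind reported in Table 4 can be computed from aligned sequences as follows. This is a minimal sketch, not the procedure used in the study: it assumes the two sequences are already aligned to equal length and that columns containing a gap are excluded from the comparison, a convention the text does not specify. The function name `percent_identity` and the example strings are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments (invented), differing at one of ten positions.
    print(percent_identity("ATGTCTATAA", "ATGTCTCTAA"))
```

Counting gapped columns as mismatches instead would lower the values, so the convention should be stated whenever such percentages are compared across studies.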

Phylogenetic and evolutionary analysis of FT-like genes across plant taxa
The phylogenetic relationships among FT-like genes from B. napus, B. rapa, Arabidopsis, poplar, rice and barley were analyzed, with a phylogenetic tree constructed from the amino acid sequences of these genes. Four groups could be identified: Hd3a

Discussion
We isolated and mapped six BnFT paralogues from B. napus. Those regions containing BnFT spanned all E segments of Arabidopsis chromosome 1, as previously reported in B. napus [26,39,40]. Based on marker identity additional IDBs were identified in A7 and C6 [41,42]. However, our primary interest was to identify the BnFT paralogues within B. napus and to understand the evolutionary processes and functional consequences associated with their duplication.

Characterization of BnFT paralogues
Although the FT family members within Arabidopsis, Brassica, rice, barley and poplar have similar exon/intron structures (apart from barley HvFT1), the number of paralogous genes differs. The single FT gene in Arabidopsis has been ascribed a "florigen" function that responds to long days [6,12]. In contrast, rice, a short day plant, has thirteen FT-like genes, of which only the Hd3a protein has been shown to have a "florigen" function, with the roles of other paralogues largely unknown [13,14,17]. Five FT-like genes are found in barley and play different roles [18]. B. napus is an allotetraploid derived from interspecific hybridization of B. rapa and B. oleracea [43,44] and is more closely related to Arabidopsis. The six BnFT paralogues exhibit high levels of nucleotide identity to AtFT within coding sequences, which initially suggested that all BnFT may have a similar function contributing to the induction of flowering. It is of intrinsic and agronomic interest to determine how the presence of six BnFT paralogues may cooperate to regulate onset of flowering in B. napus. Based on the analysis of the cis and coding sequences of the closely related BnFT paralogues, it should be possible to dissect the relative timing and contribution of locus-specific paralogues via expression profiles. One may expect that the presence of multiple paralogous copies provides B. napus with additional capacity for exquisite tuning of the network of signals that are integrated from the pathways leading to flowering. This tuning will be mediated in different cultivars through differential expression of alleles at each paralogous locus, and such differentiation will be particularly pronounced between winter and spring type cultivars. More complex interactions with the growing environment may also arise from variation in the epigenetic status of the paralogous genes.

Primer table (columns: primer name, primer sequence). Primer groups: for amplifying the BnFT probe from the Tapidor genome (AtFT1); for isolating BnFT sequences from positive BACs (BrFT1).
Multiple pathways are integrated by FT to control flowering [4], with FT being the direct target of CO, FLC and SVP proteins in Arabidopsis [9,10,45]. In Brassica species, several homologues of Arabidopsis flowering pathway genes have been identified, such as four BnCO copies (BnCOa1, BnCOa9, BnCOb1 and BnCOb9) in B. napus [46], five BoFLC paralogues (BoFLC1, BoFLC2, BoFLC3, BoFLC4 and BoFLC5) in B. oleracea [47,48], and four BrFLC copies (BrFLC1, BrFLC2, BrFLC3 and BrFLC5) in B. rapa [49]. Based on the evidence from B. rapa and B. oleracea, between eight and ten BnFLC paralogues are anticipated within the whole genome of B. napus. It is tempting to speculate that distinct BnFT copies may be the targets of one or more BnCO and BnFLC paralogues in B. napus. A characteristic CArG box, which acts as the FLC protein binding site, was detected in the first intron of four BnFT paralogues. However, it was absent in BnA2.FT, BnC2.FT and BrA2.FT (Figure 5B). These differences in the structural features of the BnFT paralogues strongly suggest that they may have undergone functional differentiation and regulatory variation in the context of polyploidy. It has been suggested that redundancy may create subtle fitness advantages that might only be evident in particular stages of the life cycle, or under particular environmental conditions [50]. Force et al. [51] suggest that complementary degenerative mutations in different regulatory elements of duplicated genes can facilitate the preservation of both duplicates, thereby increasing the long-term opportunities for the evolution of new gene functions.
Interestingly, the CArG box was also not detected in PtFT2, Hd3a and HvFT1 (Figure 5B), indicating that polyploidy may provide such FT orthologues with the opportunity to explore sequence variation that enables more diverse or subtle functionality within the context of more complex regulatory mechanisms and pathways, harmonized within their own genome.

Figure caption: Genetic mapping of BnFT paralogues in the TNDH linkage map.
Earlier genetic mapping revealed that three RFLP markers probed with Arabidopsis FT mapped to the A2 (N2) and C6 (N16) linkage groups of B. napus [23], although this did not account for the full number of expected BnFT paralogues, nor the available sequence information. In this study, we established a preliminary allelic relationship between BnFT paralogues and flowering time QTL, and extended this interpretation through association mapping analysis. Firstly, NI-5, which harboured a small introgression fragment containing Tapidor BnFT alleles, flowered much earlier than NI-9 (Figure 4B). This indicated that Tapidor might possess alleles of BnFT with a much stronger flower-promoting effect than those of NY7. Secondly, association mapping with three BnFT candidates indicated that the alleles of NY7 and Tapidor were significantly prevalent in the spring and winter type rapeseeds, respectively (Table 3). The flowering time pathways in Arabidopsis are better defined than in B. napus, and it is well known that correct flowering time ensures the greatest chance of pollination, higher seed yield and oil content, and therefore reproduction of crop cultivars. Accurate predictive combination of flowering time QTLs, including those associated with a Brassica "florigen" function, will accelerate the selection of cultivars with appropriate flowering times for different regions, especially where there is variation in latitude and seasonal temperatures associated with winter environments. The different regulatory mechanisms and pathways of flowering in winter and spring types can now be analyzed further.

Brassica FT genes and genome evolution
Comparative mapping between Arabidopsis and Brassica species led Lagercrantz et al. [24] to be the first to propose an ancestral segmental triplication affecting the complete Brassica genome. Subsequently, this view has been substantiated by compelling evidence from genetic and cytogenetic studies across most of the Brassiceae taxa, which indicate this was achieved by a series of distinct duplication events [25,32,33]. However, the detailed mechanisms underlying the process of sequential duplications are not yet well characterized. Based on the sequences of four paralogous B. rapa BAC clones and the homologous 124-kb segment of A. thaliana chromosome 5, Yang et al. [22] deduced that three paralogous subgenomes of B. rapa emerged through duplications 13 to 17 MYA, very soon after the Arabidopsis and Brassica divergence occurred at 17 to 18 MYA. Using BAC-FISH techniques, three or six copies of the contig have been identified from 18 species of Brassicaceae [25], and the formation of the ancestral allohexaploid Brassica genome was further shown to have involved hybridization between ancient diploid and tetraploid genomes [27].
Here, we identified six BnFT paralogues in B. napus and determined that each of the A and C genomes contained three copies, which is consistent with the established view of  Figure 5A). Thus, we propose an evolutionary pathway of Brassica FT genes ( Figure 5C) which is in good agreement with the model of diploid Brassica genome evolution via hexaploidization proposed by Ziolkowski et al. [27]. It is not possible to define the evolutionary divergences accurately based solely upon FT family sequence comparison. However, the divergence time calculated by Ks values placed the likely evolutionary events within a distinct order which was in accordance with the established phylogenetic relationship of the FT genes.
Interestingly, the two chromosome fragments where Bx.FT-2a and Bx.FT-2b are located are within IDBs that appear to represent the most recent duplication event, as indicated by the high sequence identity observed between the orthologues. Moreover, the prevalence of IDBs throughout the genome of B. napus ( Figure 3, Table 2) suggests that IDBs have been a universal and efficient pathway in the evolutionary development of Brassica.
Like DNA duplications in general, IDBs and their associated dosage effects should generate raw genetic material for evolution that can subsequently be modified by natural selection [52], and occasional pairing between oppositely oriented duplicated segments during meiosis might introduce new variation. Homeologous IDBs on different chromosomes would increase the chance of homeologous reciprocal or nonreciprocal translocations, which have already been found in several Canadian and Australian cultivars of B. napus in which IDBs were exchanged between A7 and C6 [53,54]. Lysak et al. [25] found that the most frequent chromosome rearrangements involving the At4-b contig are inversions, whilst comparative mapping has indicated that 43% of Brassica genomic regions with homeology to Arabidopsis chromosomes involved inversions [35]. It has been suggested that rearrangements such as translocations or inversions might reduce or prevent undesirable pairing and recombination between homeologous chromosomes/chromosome regions and lead to reproductive isolation between populations, and eventually contribute to the speciation processes [55]. Our observation of duplication events leading to inverted duplication segments is in accordance with previous reports [41,42]. However, this differs from other reports where only two E segments of Arabidopsis chromosome 1 were found in the A and C genomes, respectively [40,56].

Conclusion
The six BnFT paralogues have very high identity between their coding sequences, but vary between their corresponding introns. The CArG box within intron 1 was absent from BnA2.FT and BnC2.FT, which may lead to functional divergence. However, BnA2.FT along with two paralogues on chromosome C6, BnC6.FT.a and BnC6.FT.b, was associated with two major QTL clusters for flowering time indicating that the "florigen" of B. napus may be functionally differentiated between winter and spring type cultivars. The BnFT paralogues share the same ancestral gene with the single FT of Arabidopsis, and have evolved via several duplications and divergence resulting from whole genome polyploidization and the formation of inverted duplication blocks. The characterization of the six BnFT paralogues in B. napus increases our understanding of Brassica genome evolutionary pathways involving genome triplication via multi-stage processes.

Plant materials
A B. napus doubled haploid [56] population, designated as TNDH and consisting of 202 lines, was generated from an F1 resulting from a cross between a Tapidor DH line (hereafter referred to as Tapidor), a European winter cultivar, and Ningyou7 DH (hereafter referred to as NY7), a Chinese semi-winter cultivar [57]. Near isogenic lines (NILs) were developed with NY7 as the recurrent parent and Tapidor as the donor parent. Fifty-five B. napus cultivars (in 2006) and two NILs (in 2008) were sown in a field plot located in Gansu Province, one of the spring rapeseed regions in China, with 10 plants maintained for each line. NY7 was grown as a control in both years. The period from sowing to the appearance of the first flower was recorded as the flowering time of each accession. Student's t-test (family-wise error rate P < 0.05) was used to test the significance of the variation in flowering time between the NILs and NY7. Winter cultivars did not flower under these field conditions.

BAC library screening and analysis of clones
A 692 bp FT genomic DNA sequence (exon 2 to exon 4), with 87% identity to Arabidopsis FT over the coding sequence, was isolated from Tapidor using the primer pair AtFT-1 and AtFT-2 (Table 1), designed from an mRNA sequence of FT (GenBank accession NM_105222). The 692 bp FT probe was used to screen the JBnB BAC library, which was constructed from genomic DNA of Tapidor by Dr. Ian Bancroft, John Innes Centre, UK [30]. Positive BACs were verified by PCR amplification using the AtFT-1/AtFT-2 primer pair. BnFT paralogues were isolated from six BACs using a set of primers (Figure 1, Table 1).

The TNDH linkage map containing 786 markers [58] was used to map the BnFT paralogues with the primers designed from the BnFT sequences of Tapidor (Table 1). The mapped BnFT paralogues were assigned names according to their linkage group, and where two copies were located in the same linkage group, they were distinguished by the suffixes "a" or "b", corresponding to the order of the markers.

Gene nomenclature
In this paper, we abbreviate the full gene nomenclature for Brassica genes as outlined by Ostergaard & King [59], so that Bna.FT becomes BnFT. In order to distinguish between copies of genes located on specific chromosomes we on occasion indicate this thus: BnC6.FT.a is on chromosome C6.
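The abbreviated naming scheme can be read mechanically. The sketch below parses names such as BnC6.FT.a or BrA2.FT into species prefix, chromosome, gene and copy suffix. The regular expression is an assumption inferred only from the names that appear in this paper, not from the full nomenclature of Ostergaard & King [59], and the function name `parse_gene_name` is hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed pattern, inferred from names used in this paper:
# two-letter species prefix, A or C genome chromosome, gene symbol,
# and an optional lowercase copy suffix.
GENE_NAME = re.compile(
    r"^(?P<species>B[a-z])"
    r"(?P<chromosome>[AC]\d+)"
    r"\.(?P<gene>[A-Za-z0-9]+)"
    r"(?:\.(?P<copy>[a-z]))?$"
)


def parse_gene_name(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a chromosome-specific gene name into its parts, or return None."""
    match = GENE_NAME.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in ("BnA2.FT", "BnC6.FT.a", "BrA2.FT", "BnFT"):
        print(name, parse_gene_name(name))
```

Names without a chromosome designation, such as the generic BnFT, deliberately fail to parse, since they do not identify a specific paralogue.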

Phylogenetic and evolutionary sequence analysis
The coding sequences of all the FT orthologues were aligned for further analysis. The fraction of synonymous substitutions (Ks) was obtained using K-Estimator version 6.1 [62]. To estimate the timing of evolution for the different duplication events, we used a median Ks value for each orthologous pair between two blocks. Calculations for the dating of the evolutionary events were carried out using a synonymous mutation rate of $1.4 \times 10^{-8}$ substitutions per synonymous site per year, which was established for the CHALCONE SYNTHASE gene in eudicots [63]. Divergence time (T) was estimated using the equation $T = K_s / (2 \times 1.4 \times 10^{-8})$ [22].
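The dating calculation can be expressed in a few lines. The sketch below applies the stated equation, with the median Ks taken over a set of gene pairs and the rate of 1.4 × 10^-8 substitutions per synonymous site per year; the factor of two reflects substitutions accumulating along both diverging lineages. The function name `divergence_time_mya` and the Ks values in the usage example are hypothetical and are not results from this study.

```python
from statistics import median
from typing import Iterable

# Synonymous substitution rate stated in the text
# (substitutions per synonymous site per year).
SYNONYMOUS_RATE = 1.4e-8


def divergence_time_mya(ks_values: Iterable[float],
                        rate: float = SYNONYMOUS_RATE) -> float:
    """Estimate divergence time in million years ago from Ks values.

    Uses T = Ks / (2 * rate) with the median Ks of the supplied pairs.
    """
    ks = median(ks_values)
    years = ks / (2.0 * rate)
    return years / 1e6


if __name__ == "__main__":
    # Invented Ks values for illustration only.
    print(round(divergence_time_mya([0.26, 0.28, 0.30]), 2))
```

Under these assumptions a median Ks of 0.28 corresponds to roughly 10 MYA, which shows how sensitive the estimate is to the assumed substitution rate: halving the rate doubles the inferred age.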
The blocks associated with inverted duplications within the B. napus genome were identified using the TNDH linkage map and confirmed by the colinear array of Arabidopsis loci based on marker identity in the linkage groups (http://www.arabidopsis.org/wublast/index2.jsp).
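The idea of detecting inverted duplication blocks from marker colinearity can be sketched as follows. This is a simplified heuristic under stated assumptions, not the procedure used in the study: markers are given in linkage-map order with the coordinate of their Arabidopsis homologue, a colinear run is a stretch of at least three markers whose Arabidopsis coordinates strictly increase or strictly decrease, and an inverted duplication is a pair of runs with opposite orientation that cover overlapping Arabidopsis intervals. Real maps contain noise and missing markers that this sketch ignores. All names and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]  # (first marker index, last marker index, +1 or -1)


def collinear_runs(at_positions: List[float], min_markers: int = 3) -> List[Run]:
    """Split map-ordered markers into strictly monotonic runs of Arabidopsis coordinates."""
    runs: List[Run] = []
    start, direction = 0, 0
    n = len(at_positions)
    for i in range(1, n + 1):
        step = 0  # 0 also serves as the end-of-list sentinel
        if i < n:
            diff = at_positions[i] - at_positions[i - 1]
            step = (diff > 0) - (diff < 0)
        if step != 0 and direction in (0, step):
            direction = step  # run continues, or receives its orientation
            continue
        # Orientation flips, coordinates tie, or markers run out: close the run.
        if direction != 0 and i - start >= min_markers:
            runs.append((start, i - 1, direction))
        start, direction = i, 0
    return runs


def inverted_duplications(at_positions: List[float],
                          runs: List[Run]) -> List[Tuple[Run, Run]]:
    """Pair runs of opposite orientation whose Arabidopsis intervals overlap."""
    pairs = []
    for a in range(len(runs)):
        for b in range(a + 1, len(runs)):
            s1, e1, d1 = runs[a]
            s2, e2, d2 = runs[b]
            if d1 == d2:
                continue
            lo1, hi1 = sorted((at_positions[s1], at_positions[e1]))
            lo2, hi2 = sorted((at_positions[s2], at_positions[e2]))
            if max(lo1, lo2) <= min(hi1, hi2):
                pairs.append((runs[a], runs[b]))
    return pairs


if __name__ == "__main__":
    # Invented Arabidopsis coordinates for seven markers in map order.
    positions = [10, 12, 15, 18, 17, 14, 11]
    runs = collinear_runs(positions)
    print(runs)
    print(inverted_duplications(positions, runs))
```

In the toy example the first four markers ascend along the Arabidopsis segment and the last three descend across the same interval, so the two runs are reported as one inverted duplication pair.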