Expansion of gene families by gene duplication is a common feature of evolutionary history and is expected to provide a major source of the novel genetic material needed to facilitate phenotypic evolution [1–5]. While most duplicates are rapidly lost from the genome, some are retained because of increased dosage requirements, the acquisition of new functions (i.e. neofunctionalization) or the splitting of the ancestral function between the duplicate copies (i.e. subfunctionalization) [5, 6]. The genetic variation provided by gene duplication may be as important for adaptive evolution as replacement substitutions or changes in regulatory DNA [2, 4]. Genomic-level comparisons that are now possible for closely related species in a few groups have provided fine-scale resolution of shifts in gene family sizes and revealed that rapid changes in gene family composition are pervasive [3, 7, 8]. The evolutionary pressures shaping the size and structure of gene families can vary substantially among lineages. For instance, an analysis of the 12 Drosophila genomes estimated that approximately 10% of all gene families are specific to a single lineage within the genus. Precise mapping of the phylogenetic pattern of gains and losses in gene family structure and organization is necessary to understand the evolutionary factors driving these changes.
One gene complex that appears to be specific to Drosophila relative to other insects and may play an important role in the evolution of this genus is the Enhancer of split complex (E(spl)-C). This complex spans a 45 kb region in Drosophila melanogaster and comprises seven basic helix-loop-helix (bHLH) transcription factors (mδ, mγ, mβ, m3, m5, m7, m8), four Bearded (Brd) class genes (mα, m2, m4, m6) and a single gene (m1) thought to act as a protease inhibitor. All the bHLH and Brd genes play a role in neurogenesis and function as negative regulators in the Notch signaling pathway [10–17]. Their primary role is to limit the number of progenitor cells during neural specification. For instance, in the formation of the adult peripheral nervous system, small clusters of cells acquire neural cell fate potential through the expression of proneural proteins such as Achaete and Scute. Only one of these cells, the Sensory Organ Precursor (SOP) cell, will develop into the components of the adult bristle. In response to Notch signaling, the E(spl)-C proteins specify the identity of the SOP by suppressing proneural protein expression in all cells adjacent to the SOP, a process known as lateral inhibition. Large deletions within the E(spl)-C produce excessive neuronal differentiation [14, 18], whereas elevated expression of the E(spl)-C proteins reduces the number of sensory organ cells.
Despite the neural hyperplasia resulting from large deletions, it has been difficult to identify phenotypic defects caused by fine-scale mutations within the complex, and deletion of an entire gene is rarely lethal [17, 19]. This pattern suggests strong functional redundancy among the genes. Two other lines of evidence, however, indicate unique functional roles for each of the E(spl)-C genes. First, individual genes exhibit strong gene-specific expression patterns, particularly in the imaginal discs [10, 21, 22]. Second, comparisons between D. melanogaster and D. hydei indicate there have been no gene losses within the complex since the common ancestor of these species, suggesting that all of the genes are functionally important and maintained by stabilizing selection. Therefore, the expansion of the gene family may have been driven by selection pressures for greater complexity and specificity of Notch signaling in different tissues.
With respect to regulatory structure, the E(spl)-C is one of the best-characterized loci in Drosophila. Although different members of the complex have distinct patterns of gene expression, they share many common features within their regulatory regions. The majority of genes in the cluster are regulated by Suppressor of Hairless (Su(H)) and several proneural genes. Numerous upstream cis-regulatory elements for these proteins have been identified [24, 25]. One regulatory feature in particular, an inverted pair of Su(H) elements separated by 17 base pairs (bp) and in close association with a proneural binding site, appears to have strong functional significance. This regulatory architecture (termed an SPS+A element: Su(H) Paired Sites + proneural bHLH Activator binding site) resides upstream of many genes in the complex and its relative location is strongly conserved among Drosophila species [24, 25]. SPS+A elements have also been found in non-dipteran insects and in genes unrelated to the Enhancer of split genes. Functional assays indicate the SPS+A element is a crucial component of the synergistic signaling response mediated by Su(H), proneural proteins, and several co-repressors and activators [26–29]. Regulation of the E(spl)-C is also affected post-transcriptionally by a series of 3' UTR motifs that are bound by microRNAs (miRNAs). Similar to the cis-regulatory elements, these motifs occur in the majority of the E(spl)-C genes and exhibit strong conservation among Drosophila species [12, 16, 31, 32].
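To make the SPS+A architecture described above concrete, the following is a minimal Python sketch of how such an element might be located in a DNA sequence. It is an illustration only, not the search procedure used in this study. Several details are assumptions that do not come from the text: the Su(H) site is modelled with the commonly cited consensus YGTGRGAA, the proneural site with the generic E-box CANNTG, the 17 bp is taken as the gap between the two Su(H) sites, and "close association" is approximated by a hypothetical 50 bp flanking window. The function name `find_sps_a` is likewise hypothetical.

```python
import re

# Assumed motif definitions (not taken from the text above):
SUH_FWD = "[CT]GTG[AG]GAA"      # Su(H) consensus YGTGRGAA, forward strand
SUH_REV = "TTC[CT]CAC[AG]"      # reverse complement of YGTGRGAA
EBOX = re.compile("CA[ACGT]{2}TG")  # generic E-box CANNTG


def find_sps_a(seq, spacer=17, ebox_window=50):
    """Return (start, end) spans of inverted Su(H) pairs with a nearby E-box.

    spacer      -- bases between the two Su(H) sites (assumed edge to edge)
    ebox_window -- hypothetical flank size searched for an E-box
    """
    seq = seq.upper()
    gap = "[ACGT]{%d}" % spacer
    # Inverted pair: one site on each strand, in either order.
    # The lookahead lets overlapping candidates be reported.
    pair = re.compile(
        "(?=(" + SUH_FWD + gap + SUH_REV + "|" + SUH_REV + gap + SUH_FWD + "))"
    )
    hits = []
    for match in pair.finditer(seq):
        start, end = match.start(1), match.end(1)
        left = seq[max(0, start - ebox_window):start]
        right = seq[end:end + ebox_window]
        # Keep the pair only if an E-box lies in either flank.
        if EBOX.search(left) or EBOX.search(right):
            hits.append((start, end))
    return hits


if __name__ == "__main__":
    # Synthetic sequence built to contain one SPS+A-like arrangement.
    demo = "A" * 10 + "CAGCTG" + "A" * 5 + "TGTGGGAA" + "A" * 17 + "TTCCCACA" + "A" * 10
    print(find_sps_a(demo))
```

Exact consensus matching of this kind is deliberately strict. A real survey would more likely tolerate mismatches or use position weight matrices, and would treat the spacing and window size as parameters to be justified from the experimental literature.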
The E(spl)-C is unusual among gene expansions in that it involves the coordinated duplication of two different types of genes that have no close paralogy, but have functional overlap and share common regulatory mechanisms. Several recent studies have examined the evolutionary history of the complex [25, 33, 34] but have focused primarily on non-dipteran taxa. The mosquito species Anopheles gambiae and Aedes aegypti each contain a single homolog of the bHLH and Brd genes, suggesting the expansion occurred after the split between Nematocera and Brachycera. Comparison across the Drosophila genomes indicates that the gene composition and much of the regulatory organization have remained stable since the emergence of this genus, approximately 40–60 MYA. However, little is known about the structure and regulatory content of the complex in dipteran species that represent intermediate evolutionary steps between Nematocera and Drosophila. This information is crucial to understanding the evolutionary history of the complex and the selection pressures influencing its expansion. Therefore, using sequence data from several fosmid clones from a genomic library, we reconstructed the entire complex in the acalyptrate stalk-eyed fly, Teleopsis dalmanni. In addition, we probed the recently sequenced genome of the tsetse fly, Glossina morsitans, along with other dipterans with a well-represented EST database, in order to reconstruct the history of the complex within schizophoran flies.