Evolution of eukaryotic gene structure: remarkable conservation of intron positions in plants and vertebrates and massive, lineage-specific loss of introns

 

Eugene V. Koonin

 

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD, USA

 

Most eukaryotic protein-coding genes contain multiple introns that are spliced out of the pre-mRNA by a distinct, large RNA-protein complex, the spliceosome, which is conserved throughout the eukaryotic world. Anecdotal observations indicate that the positions of some introns are conserved in orthologous genes from plants and animals. However, intron densities differ widely among eukaryotic species, and the locations of introns in orthologous genes do not always coincide even in closely related species. Likely cases of intron insertion and loss have been documented, so both processes clearly occur in eukaryotic evolution. It has been suggested that the proportion of shared intron positions decreases with increasing evolutionary distance, which could potentially make intron conservation a useful phylogenetic marker. However, the evolutionary history of introns and the selective forces that shape intron evolution remain major mysteries. In particular, it is unclear whether the genome of the common ancestor of animals, plants and fungi was intron-rich or intron-poor, how many ancestral introns are retained in extant genomes, and what the relative contributions of intron loss and intron insertion to the evolution of eukaryotic genes are. We addressed this problem by taking advantage of the recently constructed collection of clusters of orthologous eukaryotic genes (COGs). We analyzed intron positions in 1181 orthologous gene sets from six completely sequenced genomes of animals, plants and fungi and constructed parsimonious scenarios of evolution of the exon-intron structure for the respective genes. Paradoxically, humans share substantially more introns with the plant Arabidopsis thaliana than with fly or nematode. This observation can be explained by postulating that numerous introns were present in the common ancestor of animals, plants and fungi.
Many of these ancestral introns are conserved in vertebrates and plants, in which they comprise up to 25% of all introns, but have been lost in fungi, nematodes and arthropods.
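
The pairwise comparison behind the statement that humans share more introns with Arabidopsis than with fly can be illustrated with a minimal sketch. It assumes that each intron has already been mapped to a column of the protein alignment of its orthologous gene set, so that an intron position is a (gene set, column) pair. The function name, the data layout and the toy values are hypothetical and serve only as illustration; they are not the data or the procedure used in the study.

```python
from itertools import combinations


def shared_intron_counts(positions):
    """Count intron positions shared by each pair of species.

    `positions` maps a species name to a set of (gene_set_id, alignment_column)
    tuples, one tuple per intron position in the aligned orthologs.
    Returns a dict keyed by alphabetically ordered species pairs.
    """
    counts = {}
    for a, b in combinations(sorted(positions), 2):
        # An intron position is "shared" if both species have an intron
        # at the same alignment column of the same orthologous gene set.
        counts[(a, b)] = len(positions[a] & positions[b])
    return counts


# Toy, made-up intron positions (hypothetical, for illustration only).
toy = {
    "human": {("g1", 10), ("g1", 55), ("g2", 7), ("g2", 90)},
    "arabidopsis": {("g1", 10), ("g1", 55), ("g2", 7), ("g2", 120)},
    "fly": {("g1", 10), ("g2", 33)},
}

counts = shared_intron_counts(toy)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

In this toy input the human and Arabidopsis sets overlap more than either does with fly, which mirrors the qualitative pattern described above without reproducing any real counts.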