Orphan gene

Orphan genes (also called ORFans, especially in microbial literature)[1][2] are genes without homologues in other lineages.[2] Orphans are a subset of taxonomically-restricted genes (TRGs), which are unique to a specific taxonomic level (e.g. plant-specific).[3] In contrast to non-orphan TRGs, orphans are usually considered unique to a very narrow taxon, generally a species.

The classic model of evolution is based on duplication, rearrangement, and mutation of genes, together with the idea of common descent.[4][5] Orphan genes differ in that they are lineage-specific, with no known history of shared duplication and rearrangement outside their specific species or clade.[6] Orphan genes may arise through a variety of mechanisms, such as horizontal gene transfer, duplication and rapid divergence, and de novo origination,[2] and these mechanisms may act at different rates in insects, primates, and plants.[7] Despite their relatively recent origin, orphan genes may encode functionally important proteins.[8][9]

Definition

To be considered an orphan, a gene must encode a protein that lacks detectable homology to any predicted peptide from the genomes of related species.[7] Orphans are a subset of taxonomically-restricted genes (TRGs), which are unique to a specific taxonomic level (e.g. plant-specific).[3] In contrast to non-orphan TRGs, orphans are usually considered unique to a very narrow taxon, generally a species.

History of orphan genes

Orphan genes were first discovered when the yeast genome-sequencing project began in 1996.[2] Orphan genes accounted for an estimated 26% of the yeast genome, but it was believed that these genes could be classified once more genomes were sequenced.[3] Because an estimated 1 to 20 million animal species exist, most of them unsequenced, the discovery was largely ignored for some time.[10] However, the cumulative number of orphan genes in sequenced genomes did not level off as time passed.[11] In the sequencing of Schizosaccharomyces pombe and Saccharomyces cerevisiae in 2002, researchers found that 14 percent and 19 percent, respectively, of the protein-encoding genes were entirely unique to that species.[3] Unfortunately for the study of orphan genes, researchers were more interested in studying the similar gene sequences than the unknown regions.[3]

It was not until 2003 that orphan genes were studied directly. In a study of Caenorhabditis briggsae and related species, researchers compared over 2000 genes.[3] They proposed that the orphan genes must be evolving too quickly for their homologs to be detected and are consequently sites of very rapid evolution.[3] In 2005, Wilson examined 122 bacterial species to test whether the large number of orphan genes in many species was legitimate.[11] The study concluded that the orphan genes were legitimate and that they played a role in bacterial adaptation. The definition of taxonomically-restricted genes was introduced into the literature to make orphan genes seem less "mysterious."[11]

In 2009, an orphan gene was discovered to regulate an internal biological network: the orphan gene QQS, from Arabidopsis thaliana, modifies plant composition.[12] The QQS orphan protein interacts with a conserved transcription factor; these data explain the compositional changes (increased protein) that are induced when QQS is engineered into diverse species.[13] This finding has broad implications for how orphan genes can function and affect biological processes.

In 2009, a study explored "the dark matter of protein space" by analyzing 2,200 domains of unknown function and concluded that they facilitated the evolution of novel functions.[14] This was important because it recognized that orphan genes have a purpose at the level of proteins.

In 2011, a comprehensive genome-wide study of the extent and evolutionary origins of orphan genes in plants was conducted in the model plant Arabidopsis thaliana.[15]

Estimates of the percentage of genes that are orphans vary enormously between species and between studies; 10–30% is a commonly cited figure.[3]

How to identify orphan genes

Genes can be tentatively classified as orphans if no orthologous proteins can be found in nearby species.[7]

One method used to estimate nucleotide or protein sequence similarity is the Basic Local Alignment Search Tool (BLAST). BLAST allows query sequences to be rapidly searched against large sequence databases.[16][17] Simulations suggest that under certain conditions BLAST is suitable for detecting distant relatives of a gene.[18] However, genes that are short and evolve rapidly can easily be missed by BLAST.[19]
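As a rough illustration of the BLAST-based approach, the sketch below flags genes that have no significant hit outside the focal species. It assumes the search results are already available in BLAST's default 12-column tabular format. The `species|protein_id` subject naming, the function name `candidate_orphans`, the E-value cutoff, and the toy data are all hypothetical choices made for this example, not part of any cited study.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def candidate_orphans(all_genes, blast_tsv, focal_species, max_evalue=1e-3):
    """Return genes with no significant hit outside the focal species.

    Assumption: subject IDs look like 'species|protein_id' (hypothetical).
    The E-value cutoff is an arbitrary illustrative default.
    """
    has_foreign_hit = set()
    reader = csv.DictReader(blast_tsv, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        species = row["sseqid"].split("|", 1)[0]
        if species != focal_species and float(row["evalue"]) <= max_evalue:
            has_foreign_hit.add(row["qseqid"])
    # Genes that never appear with a significant foreign hit stay candidates.
    return sorted(set(all_genes) - has_foreign_hit)


# Toy hit table, invented for illustration only.
toy_hits = io.StringIO(
    "geneA\tspB|p1\t62.0\t210\t80\t0\t1\t210\t1\t210\t1e-40\t150\n"
    "geneB\tspA|p7\t100.0\t90\t0\t0\t1\t90\t1\t90\t1e-50\t180\n"
    "geneC\tspB|p9\t31.0\t40\t27\t1\t5\t44\t9\t48\t2.5\t22\n"
)
print(candidate_orphans(["geneA", "geneB", "geneC", "geneD"], toy_hits, "spA"))
```

Any gene returned this way is only a tentative orphan: a short or rapidly evolving gene can have a true homolog that falls below the chosen cutoff.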

Another method used to detect orphan genes is phylostratigraphy.[20] Phylostratigraphy places the genes of a focal species on a phylogenetic tree by assessing homology between each of its genes and the genes of other species. The earliest common ancestor in which a homolog of a gene can be traced determines the age, or phylostratum, of the gene. Although phylostratigraphy is typically used to infer gene or protein age, it can also be used to detect orphan genes. If an orphan gene is defined as being unique to one species, then orphan genes are those in the highest (youngest) phylostratum, i.e. those with homologs only in the focal species. If an orphan gene is instead defined as being unique to an entire clade (not just a single branch), then orphan genes include all genes in the range of phylostrata that make up the clade.
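The phylostratum assignment described above can be sketched in a few lines. This is a simplified illustration rather than the published method: the lineage is coarse, the `MRCA_INDEX` table and the function names `phylostratum` and `is_orphan` are hypothetical, and the set of species with detectable homologs is assumed to come from a separate homology search.

```python
# Lineage of the focal species, ordered from oldest node to youngest.
# Intermediate nodes are omitted to keep the example small.
LINEAGE = [
    "cellular organisms",
    "Eukaryota",
    "Viridiplantae",
    "Brassicaceae",
    "Arabidopsis thaliana",
]

# For each comparison species, the index in LINEAGE of the youngest listed
# node that it shares with the focal species (hypothetical lookup table).
MRCA_INDEX = {
    "Escherichia coli": 0,
    "Homo sapiens": 1,
    "Oryza sativa": 2,
    "Brassica rapa": 3,
}


def phylostratum(species_with_homologs):
    """Index of the oldest lineage node in which a homolog is detected.

    A gene with no homolog outside the focal species falls in the
    youngest stratum, the focal species itself.
    """
    indices = [MRCA_INDEX[s] for s in species_with_homologs if s in MRCA_INDEX]
    return min(indices, default=len(LINEAGE) - 1)


def is_orphan(species_with_homologs, clade="Arabidopsis thaliana"):
    """True if the gene is restricted to `clade` or a younger stratum."""
    return phylostratum(species_with_homologs) >= LINEAGE.index(clade)


print(LINEAGE[phylostratum({"Homo sapiens", "Oryza sativa"})])
print(is_orphan({"Brassica rapa"}))
print(is_orphan({"Brassica rapa"}, clade="Brassicaceae"))
```

Passing a broader `clade` corresponds to the clade-level definition of an orphan, while the default corresponds to the species-level definition.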

Where do orphan genes come from?

Orphan genes arise from multiple sources, predominantly through de novo origination, duplication and rapid divergence, and horizontal gene transfer.[2]

De Novo Origination

Novel orphan genes continually arise de novo from non-coding sequences.[21] These novel genes may be sufficiently beneficial to be swept to fixation by selection or, more likely, they will fade back into the non-genic background. That young genes are more likely to become extinct (become pseudogenes) has recently been confirmed in Drosophila.[22]

Duplication and Divergence

In the duplication and divergence model, a new gene arises from a duplication event and then undergoes a period of rapid evolution in which all detectable similarity to the original gene is lost.[2] While this explanation is consistent with current understanding of duplication mechanisms,[2] the number of mutations needed to lose detectable similarity is large enough that such loss should be rare,[2][18] and the evolutionary mechanism by which a gene duplicate could be sequestered and diverge so rapidly remains unclear.[2][23]

Horizontal Gene Transfer

Another explanation for how orphan genes arise is horizontal gene transfer, in which the gene derives from a separate, unknown lineage rather than from within the genome that now carries it.[2] This explanation for the origin of orphan genes is especially relevant in bacteria and archaea, where horizontal gene transfer is common. At least in bacteria, there is no correlation between organism complexity and the percentage of orphan genes.[24] Likewise, in bacteria, there is no correlation between orphan percentage and genome length.[24]

Protein characteristics

Orphan genes tend to be very short (about six times shorter than mature genes), and some are weakly expressed, tissue-specific, and simpler in codon usage and amino acid composition.[25] Orphan genes mostly encode intrinsically disordered proteins.[26] Of the tens of thousands of enzymes of primary or specialized metabolism that have been characterized to date, none are orphans, or even of restricted lineage; apparently, catalysis requires hundreds of millions of years of evolution.[27]

Biological functions

Some researchers have proposed that orphan genes drive morphological specification because they allow organisms to "adapt to constantly changing ecological conditions."[3] Orphan genes add to the variation available within a population, which can help it survive in its environment, particularly if it has recently experienced a bottleneck.[2]

References

  1. Fischer, D.; Eisenberg, D. (1999-10-01). "Erratum. Finding families for genomic ORFans". Bioinformatics 15 (10): 864–864. doi:10.1093/bioinformatics/15.10.864. ISSN 1367-4803.
  2. Tautz, D.; Domazet-Lošo, T. (2011). "The evolutionary origin of orphan genes". Nature Reviews Genetics 12: 692–702. doi:10.1038/nrg3053. PMID 21878963.
  3. Khalturin, K; Hemmrich, G; Fraune, S; Augustin, R; Bosch, TC (2009). "More than just orphans: are taxonomically-restricted genes important in evolution?". Trends in Genetics 25 (9): 404–413. doi:10.1016/j.tig.2009.07.006.
  4. Ohno, Susumu (2013-12-11). Evolution by Gene Duplication. Springer Science & Business Media. ISBN 9783642866593.
  5. Zhou, Qi; Zhang, Guojie; Zhang, Yue; Xu, Shiyu; Zhao, Ruoping; Zhan, Zubing; Li, Xin; Ding, Yun; Yang, Shuang (2008-09-01). "On the origin of new genes in Drosophila". Genome Research 18 (9): 1446–1455. doi:10.1101/gr.076588.108. ISSN 1088-9051. PMC 2527705. PMID 18550802.
  6. Toll-Riera, M.; Bosch, N.; Bellora, N.; Castelo, R.; Armengol, L.; Estivill, X.; Alba, M. M. (2009). "Origin of primate orphan genes: a comparative genomics approach". Molecular Biology and Evolution 26 (3): 603–612. doi:10.1093/molbev/msn281. PMID 19064677.
  7. Wissler, L.; Gadau, J.; Simola, D. F.; Helmkampf, M.; Bornberg-Bauer, E. (2013). "Mechanisms and Dynamics of Orphan Gene Emergence in Insect Genomes". Genome Biology and Evolution 5 (2): 439–455. doi:10.1093/gbe/evt009. PMC 3590893. PMID 23348040.
  8. Reinhardt, Josephine A.; Wanjiru, Betty M.; Brant, Alicia T.; Saelao, Perot; Begun, David J.; Jones, Corbin D. (2013-10-17). "De Novo ORFs in Drosophila Are Important to Organismal Fitness and Evolved Rapidly from Previously Non-coding Sequences". PLoS Genet 9 (10): e1003860. doi:10.1371/journal.pgen.1003860. PMC 3798262. PMID 24146629.
  9. Suenaga, Yusuke; Islam, S. M. Rafiqul; Alagu, Jennifer; Kaneko, Yoshiki; Kato, Mamoru; Tanaka, Yukichi; Kawana, Hidetada; Hossain, Shamim; Matsumoto, Daisuke (2014-01-02). "NCYM, a Cis-Antisense Gene of MYCN, Encodes a De Novo Evolved Protein That Inhibits GSK3β Resulting in the Stabilization of MYCN in Human Neuroblastomas". PLoS Genet 10 (1): e1003996. doi:10.1371/journal.pgen.1003996. PMC 3879166. PMID 24391509.
  10. Carroll, S. B., Grenier, J., & Weatherbee, S. 2009. From DNA to diversity: Molecular genetics and the evolution of animal design. Blackwell Publishing: Oxford.
  11. Wilson, G. A.; Bertrand, N.; Patel, Y.; Hughes, J. B.; Feil, E. J.; Field, D. (2005). "Orphans as taxonomically restricted and ecologically important genes". Microbiology 151 (8): 2499–2501. doi:10.1099/mic.0.28146-0.
  12. Li, L., Foster, C. M., Gan, Q., Nettleton, D., James, M. G., Myers, A. M. and Wurtele, E. S. (2009), Identification of the novel protein QQS as a component of the starch metabolic network in Arabidopsis leaves. The Plant Journal, 58: 485–498. doi:10.1111/j.1365-313X.2009.03793.x
  13. Li L, Zheng W, Zhu Y, Ye H, Tang B, Arendsee Z, Jones D, Li R, Ortiz D, Zhao X, Du C, Nettleton D, Scott P, Salas-Fernandez M, Yin Y, Wurtele ES. 2015. The QQS orphan gene regulates carbon and nitrogen partitioning across species via NF-YC interactions. Proc. Nat. Acad. Sci. ePub Nov 2015 doi:10.1073/pnas.1514670112
  14. Jaroszewski, L.; Li, Z.; Krishna, S. S.; Bakolitsa, C.; Wooley, J.; Deacon, A. M.; Godzik, A. (2009). "Exploration of uncharted regions of the protein universe". PLOS Biology 7 (9): 1–15. doi:10.1371/journal.pbio.1000205. PMC 2744874. PMID 19787035.
  15. Donoghue, M.T.A; Keshavaiah, C.; Swamidatta, S.H.; Spillane, C. (2011). "Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana". BMC Evolutionary Biology 11 (1): 47. doi:10.1186/1471-2148-11-47.
  16. Altschul, S. (1 September 1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Research 25 (17): 3389–3402. doi:10.1093/nar/25.17.3389. PMC 146917. PMID 9254694.
  17. "NCBI BLAST homepage".
  18. Alba, M; Castresana, J (2007). "On homology searches by protein BLAST and the characterization of the age of genes". BMC Evol. Biol. 7: 53. doi:10.1186/1471-2148-7-53.
  19. Moyers, B. A.; Zhang, J. (13 October 2014). "Phylostratigraphic Bias Creates Spurious Patterns of Genome Evolution". Molecular Biology and Evolution 32 (1): 258–267. doi:10.1093/molbev/msu286.
  20. Domazet-Lošo, Tomislav; Brajković, Josip; Tautz, Diethard (2007-01-11). "A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages". Trends in Genetics 23 (11): 533–539. doi:10.1016/j.tig.2007.08.014. ISSN 0168-9525. PMID 18029048.
  21. Carvunis, Anne-Ruxandra; Rolland, Thomas; Wapinski, Ilan; Calderwood, Michael A.; Yildirim, Muhammed A.; Simonis, Nicolas; Charloteaux, Benoit; Hidalgo, César A.; Barbette, Justin; Santhanam, Balaji; Brar, Gloria A.; Weissman, Jonathan S.; Regev, Aviv; Thierry-Mieg, Nicolas; Cusick, Michael E.; Vidal, Marc (24 June 2012). "Proto-genes and de novo gene birth". Nature 487 (7407): 370–374. doi:10.1038/nature11184. PMID 22722833.
  22. Palmieri, Nicola; Kosiol, Carolin; Schlötterer, Christian (19 February 2014). "The life cycle of orphan genes". eLife 3. doi:10.7554/eLife.01311.
  23. Lynch, Michael; Katju, Vaishali (2004-11-01). "The altered evolutionary trajectories of gene duplicates". Trends in genetics: TIG 20 (11): 544–549. doi:10.1016/j.tig.2004.09.001. ISSN 0168-9525. PMID 15475113.
  24. Fukuchi, S.; Nishikawa, K. (2004). "Estimation of the number of authentic orphan genes in bacterial genomes". DNA Research 11 (4): 219–231. doi:10.1093/dnares/11.4.311.
  25. Arendsee, Zebulun W.; Li, Ling; Wurtele, Eve Syrkin (November 2014). "Coming of age: orphan genes in plants". Trends in Plant Science 19 (11): 698–708. doi:10.1016/j.tplants.2014.07.003.
  26. Mukherjee, S.; Panda, A.; Ghosh, T.C. (June 2015). "Elucidating evolutionary features and functional implications of orphan genes in Leishmania major". Infection Genetics and Evolution 32: 330–337. doi:10.1016/j.meegid.2015.03.031.
  27. Arendsee, Zebulun W.; Li, Ling; Wurtele, Eve Syrkin (November 2014). "Coming of age: orphan genes in plants". Trends in Plant Science 19 (11): 698–708. doi:10.1016/j.tplants.2014.07.003.
  28. Li, L., Foster, C. M., Gan, Q., Nettleton, D., James, M. G., Myers, A. M. and Wurtele, E. S. (2009), Identification of the novel protein QQS as a component of the starch metabolic network in Arabidopsis leaves. The Plant Journal, 58: 485–498. doi:10.1111/j.1365-313X.2009.03793.x


This article is issued from Wikipedia - version of the Saturday, April 23, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.