Canonical sequence

A canonical sequence is a sequence of DNA, RNA, or amino acids that reflects the most common choice of base or amino acid at each position. Many databases use the canonical sequence as their reference, and some provide only that sequence. UniProtKB/Swiss-Prot, for example, describes all the protein products encoded by one gene in a single entry and selects the canonical sequence for that entry using the following criteria:[1]

  1. It is the most prevalent.
  2. It is the most similar to orthologous sequences found in other species.
  3. By virtue of its length or amino acid composition, it allows the clearest description of domains, isoforms, polymorphisms, post-translational modifications, etc.
  4. In the absence of any other information, the longest sequence is chosen.
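
The two mechanical ideas above, "most common choice at each position" and "longest sequence as a fallback", can be illustrated with a minimal Python sketch. The function names `consensus` and `longest_isoform` and the example sequences are hypothetical and are not taken from UniProt or any database tool. The sketch assumes the input sequences are already aligned to equal length. Criteria 1 to 3 depend on curated biological evidence and are not modeled here.

```python
from collections import Counter


def consensus(aligned):
    """Return the most common symbol in each column of equal-length sequences.

    Assumption: the sequences are already aligned. Ties go to the symbol
    that appears first in the column, which is an arbitrary choice made
    for this sketch.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) yields one tuple of symbols per alignment column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def longest_isoform(isoforms):
    """Fallback in the spirit of criterion 4: name of the longest sequence.

    `isoforms` is a hypothetical mapping from isoform name to sequence.
    """
    return max(isoforms, key=lambda name: len(isoforms[name]))


if __name__ == "__main__":
    print(consensus(["ATGCA", "ATGGA", "TTGCA"]))
    print(longest_isoform({"iso1": "MKT", "iso2": "MKTAYIAK"}))
```

For the three example DNA strings, the fourth column holds C, G, and C, so the majority symbol there is C. Curated databases apply expert judgment on top of rules like these, so the sketch should be read as an illustration of the definition rather than a description of any database's pipeline.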

This article is issued from Wikipedia, version of Tuesday, October 13, 2015. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply to the media files.