The primary data of sequencing projects are DNA sequences. These become truly valuable only through their annotation. Several layers of analysis with bioinformatics tools are necessary to get from a raw DNA sequence to an annotated protein sequence.
Some completely automated annotation systems have been developed (e.g., GENEQUIZ), which use a multitude of different programs and methods.
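As an illustration of one early layer on the way from raw DNA to a protein sequence, the following minimal Python sketch locates open reading frames (ORFs) and translates them with the standard genetic code. It is a simplification under stated assumptions: only the three forward-strand frames are scanned, introns are ignored, and the function name `find_orfs` and the `min_codons` threshold are hypothetical choices for this example, not part of any annotation system named in the text.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(dna, min_codons=3):
    """Return (start, end, protein) for ATG...stop ORFs on the forward strand.

    start/end are 0-based, end-exclusive coordinates that include the stop
    codon. ORFs with fewer than min_codons residues are skipped.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        protein = []
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            aa = CODON_TABLE.get(codon, "X")  # X marks ambiguous codons
            if start is None:
                if codon == "ATG":            # open a new ORF at the first ATG
                    start = i
                    protein = ["M"]
            elif aa == "*":                   # a stop codon closes the ORF
                if len(protein) >= min_codons:
                    orfs.append((start, i + 3, "".join(protein)))
                start = None
            else:
                protein.append(aa)
    return orfs


print(find_orfs("CCATGGCTAAATGACC"))
```

Real gene finders go well beyond this, for example by scanning the reverse strand and modelling splice sites and codon usage, so the output should be read as a list of candidates only.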
The protein sequences are further analysed to predict function. The function can often be inferred if a sequence of a homologous protein with known function can be found. Homology searches are the predominant bioinformatics application, and very efficient search methods have been developed. Distinguishing orthologous from paralogous sequences is often difficult, but it facilitates functional annotation when whole genomes are compared. Several methods detect glycosylation, myristoylation and other sites, and the prediction of signal peptides in the amino acid sequence gives valuable information about the subcellular location of a protein.
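Site detection of the kind mentioned above is often based on short sequence patterns. As a minimal sketch, the Python function below scans a protein for the widely used N-glycosylation consensus N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline). The function name and the example sequence are assumptions made for illustration; a match only marks a candidate site, since such short patterns occur frequently by chance.

```python
import re

# N-{P}-[ST]-{P} written as a regular expression. The lookahead lets
# overlapping candidate sites be reported as well.
N_GLYC_PATTERN = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(protein):
    """Return (1-based position, matched motif) for each candidate site."""
    protein = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in N_GLYC_PATTERN.finditer(protein)]


# Made-up example sequence: NGTA and NASL match, NPSW does not (proline).
print(find_n_glycosylation_sites("MKNGTAPNPSWNASL"))
```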
The ultimate goal of sequence annotation is to arrive at a complete functional description of all genes of an organism. However, function is an ill-defined concept. Thus, the simplified idea of "one gene - one protein - one structure - one function" cannot take into account proteins that have multiple functions depending on context (e.g., subcellular location and the presence of cofactors). Well-known cases of "moonlighting" proteins are lens crystallins and phosphoglucose isomerase. Currently, work on ontologies is under way to explicitly define a vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing.
Families of similar sequences contain information on sequence evolution in the form of specific conservation patterns at all sequence positions. Multiple sequence alignments are useful for
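The position-specific conservation patterns of a sequence family can be made concrete with a small calculation. The Python sketch below scores each column of a multiple sequence alignment by its Shannon entropy, rescaled so that 1.0 means fully conserved. It assumes the alignment is given as equal-length strings with "-" for gaps, ignores gaps when counting, and normalises by log2(20) for the 20 standard amino acids. The function name and toy alignment are hypothetical, and entropy is only one of several possible conservation measures.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one conservation score in [0, 1] per alignment column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)  # entropy of a uniform mix of 20 amino acids
    scores = []
    for col in range(length):
        residues = [row[col] for row in alignment if row[col] != "-"]
        if not residues:          # column made of gaps only
            scores.append(0.0)
            continue
        total = len(residues)
        entropy = -sum(
            (n / total) * math.log2(n / total)
            for n in Counter(residues).values()
        )
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Toy alignment: the first two columns are invariant, the last is variable.
toy = ["ACDE", "ACDF", "ACEG", "AC-H"]
print([round(s, 2) for s in column_conservation(toy)])
```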
Many complete genomes of microorganisms and a few of eukaryotes are available. By analysis of entire genome sequences a wealth of additional information can be obtained. The complete genomic sequence contains not only all protein sequences but also sequences regulating gene expression. A comparison of the genomes of genetically close organisms reveals genes responsible for specific properties of the organisms (e.g., infectivity). Protein interactions can be predicted from conservation of gene order or operon organisation in different genomes. The detection of gene fusion and gene fission events (i.e., one protein is split into two in another genome) also helps to deduce protein interactions.
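The gene fusion argument can be sketched in a few lines of Python. The function below assumes that similarity hits between the proteins of one genome and the proteins of another have already been computed and are supplied as tuples of (query, target, target start, target end). Two different query proteins that match largely non-overlapping regions of the same target are reported as a candidate interacting pair. The tuple layout, the function name, the `max_overlap` tolerance and the example identifiers are all assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def predict_fusion_links(hits, max_overlap=10):
    """Return sorted (protein_1, protein_2, fused_target) candidate links."""
    by_target = defaultdict(list)
    for query, target, start, end in hits:
        by_target[target].append((query, start, end))

    links = set()
    for target, regions in by_target.items():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            if q1 == q2:
                continue
            # Shared residues on the target; zero or negative means disjoint.
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap <= max_overlap:
                links.add((min(q1, q2), max(q1, q2), target))
    return sorted(links)


# Hypothetical hits: geneA and geneB cover separate halves of fusedX.
example_hits = [
    ("geneA", "fusedX", 1, 200),
    ("geneB", "fusedX", 210, 400),
    ("geneC", "fusedX", 150, 390),
    ("geneD", "otherY", 1, 100),
]
print(predict_fusion_links(example_hits))
```

In practice such predictions need filtering, because widespread domains that occur in many unrelated proteins produce spurious links.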