Properties of Genetic Code
The following properties of the genetic code have been established by definite experimental evidence: (i) the code is triplet, (ii) the code is degenerate, (iii) the code is non-overlapping, (iv) the code is commaless, (v) the code is non-ambiguous and (vi) the code is universal. Although it is not necessary to present here the experimental evidence that established these properties, it is useful to explain what each of the six properties means.
The code is triplet
As outlined earlier, singlet and doublet codes are not enough to code for 20 amino acids, so a triplet code is the minimum required. The code could, however, have been a quadruplet code or one of a higher order. In a triplet code of 64 codons there is an excess of 44 codons and, therefore, more than one codon is available for the same amino acid. This excess would be still greater if words of more than three letters were used: in a quadruplet code there would be 4 x 4 x 4 x 4 = 256 possible words.
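The arithmetic behind this argument can be checked with a short sketch. The names `BASES`, `AMINO_ACIDS` and `codon_count` are illustrative choices, not taken from the text.

```python
# Number of distinct code words of a given length over the four RNA bases.
BASES = "UCAG"
AMINO_ACIDS = 20  # amino acids that must be coded

def codon_count(word_length: int) -> int:
    """Return how many different words of `word_length` letters are possible."""
    return len(BASES) ** word_length

for n in range(1, 5):
    total = codon_count(n)
    # A negative surplus means the code has too few words for 20 amino acids.
    print(f"word length {n}: {total} words, surplus over 20 amino acids = {total - AMINO_ACIDS}")
```

For a word length of 3 this gives 64 words and a surplus of 44, and for a word length of 4 it gives 256 words, matching the figures quoted above.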
The code is degenerate
In a triplet code, as pointed out earlier, more than one word (synonyms) can be used for a particular amino acid. This phenomenon is described by saying that the code is degenerate. A non-degenerate code would be one with a one-to-one relationship between amino acids and codons, so that 44 of the 64 codons would be useless or nonsense codons. It has been definitely shown that there are no such meaningless codons: the codons which were earlier called nonsense codons are now known to act as stop signals.
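Degeneracy can be made concrete by counting the synonyms for each amino acid. The sketch below assumes the standard code in its usual compact form, with one-letter amino acid symbols listed in U, C, A, G order and '*' marking a stop codon. It is offered as an illustration and is not a reproduction of Fig. 30.4.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon, first base varying slowest; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the codon -> amino acid dictionary (UUU, UUC, UUA, UUG, UCU, ...).
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons (synonyms) exist for each amino acid or stop signal.
synonyms = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

for aa, n in sorted(synonyms.items(), key=lambda item: -item[1]):
    print(aa, n)
print("stop codons:", stop_codons)
```

In this table every one of the 64 codons has a meaning: 61 code for the 20 amino acids, with between one and six synonyms each, and the remaining three are stop signals.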
Fig. 30.2. Overlapping of codons due to one letter or two letters.
Fig. 30.3. Genetic code, without comma and with comma.
Fig. 30.4. Genetic code dictionary.
The code is non-overlapping
Non-overlapping code means that a base in an mRNA is not used for two different codons. Figure 30.2 shows that an overlapping code could code for four amino acids from six bases. In actual practice, six bases code for not more than two amino acids. However, in the overlapping genes described in 'Organization of Genetic Material 3. Split Genes, Overlapping Genes and Pseudogenes', the same base can be used for different codons, but only on different occasions in time and/or space, so that the same base cannot be used for two different codons during synthesis of the same protein. This is exactly what we mean by a non-overlapping code.
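The difference between overlapping and non-overlapping reading can be sketched by sliding a three-letter window along six bases with different step sizes. The function name `read_codons` and the example sequence are illustrative assumptions.

```python
def read_codons(bases: str, step: int) -> list[str]:
    """Return the complete triplets read from `bases`, advancing `step` bases each time."""
    return [bases[i:i + 3] for i in range(0, len(bases) - 2, step)]

message = "AUGCUA"  # six arbitrary bases, chosen only for illustration

non_overlapping = read_codons(message, 3)  # each base used once
overlap_one = read_codons(message, 2)      # neighbouring codons share one letter
overlap_two = read_codons(message, 1)      # neighbouring codons share two letters

print(non_overlapping)
print(overlap_one)
print(overlap_two)
```

With a step of three the six bases give two codons, which is what is observed in practice. With a step of one, the same six bases would give four codons.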
The code is commaless
A commaless code means that no punctuation is needed between any two words. In other words, after one amino acid is coded, the second amino acid is automatically coded by the next three letters, and no letters are wasted in signalling that one amino acid has been coded and that the next should now be coded (Fig. 30.3).
The code is non-ambiguous
Non-ambiguous code means that there is no ambiguity about a particular codon. A particular codon will always code for the same amino acid, wherever it is found. In an ambiguous code, the same codon could have different meanings; in other words, the same codon could code for two or more different amino acids. Such is not the case.
While the same amino acid can be coded by more than one codon (the code is degenerate), the same codon never codes for two different amino acids (the code is non-ambiguous). However, there is some element of ambiguity where AUG and GUG are concerned: both may code for methionine as initiating codons, although GUG otherwise codes for valine (Fig. 30.4). Moreover, a different code exists in the mitochondria of some eukaryotes, so that in cytoplasm and mitochondria the same codon may code for different amino acids. This need not mean ambiguity, since mitochondria have a separate genetic code which differs from the universal code in some essential respects. (Also see 'Recoding of the Genetic Code' later in this section.)
The code is universal
Although the code has been worked out using in vitro systems, usually prepared from micro-organisms, there is no doubt now that the same genetic code is used in all kinds of living organisms, micro or macro, plants or animals. However, as will be seen later in this section, a different and more primitive genetic code exists in the mitochondria of some organisms.
Codon Assignments
Although theoretical considerations in the 1950s had suggested that the genetic code should be triplet in nature, it was not possible to say which of the 64 possible codons should code for which of the 20 standard amino acids. The first clue to this problem came when M.W. Nirenberg (Nobel Prize winner with H.G. Khorana and R.W. Holley in 1968) used an in vitro system for the synthesis of a polypeptide, using an artificially synthesized mRNA molecule. The use of an artificial mRNA molecule has a definite advantage, since something is known about its structure.
Assignment of codons with unknown sequence
Polyuridylic acid or poly-U and other homopolymers. In the first place, M.W. Nirenberg and J.H. Matthaei synthesized RNA using only uracil, so that all along the length of the mRNA there was no other base and the only possible triplet was UUU. When such a poly-U RNA was used in cell-free synthesis of a polypeptide (using cell extracts from E. coli and supplying all the required components of the protein-synthesizing machinery), only polyphenylalanine was synthesized, meaning that the only amino acid coded was phenylalanine. It was, therefore, immediately concluded that the triplet UUU coded for the amino acid phenylalanine. The news that the genetic code had been cracked was splashed in newspapers all over the world in 1961. Subsequently, poly A gave polylysine and poly C gave polyproline. Therefore, AAA was assigned to lysine and CCC to proline. Similar experiments with poly G were not successful, because it attains secondary structure and thus cannot attach to ribosomes. The following three codons could thus be assigned without any difficulty, using homopolymers of RNA: UUU = phenylalanine; AAA = lysine; CCC = proline.
Copolymers. Having successfully used the homopolymers, Nirenberg and his co-workers used RNA synthesized by using two or more bases. For instance, if only A and C are used, poly AC will consist of eight possible codons namely AAA, AAC, ACA, CAA, CCA, CAC, ACC and CCC. The proportion of these eight codons in the synthetic RNA can be calculated if the known quantities of A and C are used for the synthesis of poly AC. For instance, if A : C = 5 : 1, (5/6 is A and 1/6 is C), the calculated relative proportions of eight codons on random basis would be as given in Table 30.1.
The calculated relative proportions of codons were compared with the proportions in which different amino acids were present in the polypeptides synthesized using poly AC. For instance, if an amino acid is incorporated at 1/5th the level of lysine (coded by AAA), we can say that it should be coded by one of the three possible 2A1C codons (AAC, ACA or CAA). Similar reasoning would allow assignments of 1A2C as well as 3C. However, using this technique, it was not possible to assign the three codons of the category 2A1C (i.e. AAC, ACA, CAA) to three amino acids, since these will be present in equal quantities. Therefore, the codons were initially assigned only with respect to base composition, ignoring the sequence of the bases in codons, as done in the above example. The assignments are given in Table 30.2.
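The expected codon proportions in a random copolymer follow from multiplying the base frequencies. The sketch below works this out for A : C = 5 : 1 using exact fractions. The function name `codon_frequencies` is an illustrative choice, and the calculation assumes that bases are incorporated independently and at random, as the text states.

```python
from fractions import Fraction
from itertools import product

def codon_frequencies(base_ratio: dict[str, int]) -> dict[str, Fraction]:
    """Expected frequency of every triplet in a random copolymer with the given base ratio."""
    total = sum(base_ratio.values())
    p = {base: Fraction(n, total) for base, n in base_ratio.items()}
    # The probability of a triplet is the product of its three base probabilities.
    return {"".join(c): p[c[0]] * p[c[1]] * p[c[2]] for c in product(p, repeat=3)}

freqs = codon_frequencies({"A": 5, "C": 1})

# Express each codon relative to AAA, the most frequent triplet.
relative = {codon: f / freqs["AAA"] for codon, f in freqs.items()}

for codon in sorted(relative, key=relative.get, reverse=True):
    print(codon, freqs[codon], relative[codon])
```

Relative to AAA, each 2A1C codon is expected at 1/5, each 1A2C codon at 1/25 and CCC at 1/125. The three codons within one composition class have identical expected frequencies, which is why this method cannot distinguish them.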
Assignment of codons with known base sequence
A large number of synthetic copolymers of unknown sequences was used for preparing a dictionary where codon compositions were assigned to amino acids. The next problem and a serious one was to sort out the three codons in one base composition e.g. 2A1C or 1A2C. In other words, we can say that the next problem was to work out the sequences of bases in codons, whose composition was already known. This was done mainly by the following methods.
Binding technique of Nirenberg and Leder. M.W. Nirenberg and P. Leder in 1964 found that if a synthetic trinucleotide of known sequence (with known bases at the 5' end and 3' end) is used with ribosomes and a particular aminoacyl-tRNA (a tRNA having its own specific amino acid attached), these will form a complex, provided the codon used codes for the amino acid attached to the given aminoacyl-tRNA.
Codon₁ + Ribosome + AA₁-tRNA₁ → Ribosome-Codon₁-AA₁-tRNA₁
In a process such as the above, if a given AA₁ is used with a given codon₁ and the formation of the complex is detected, this would prove that the given codon codes for the given amino acid.
It was also observed that while the free AA-tRNA passes through a nitrocellulose membrane, the ribosome-codon-AA-tRNA complex adsorbs on such a membrane. If in a particular mixture only one of the amino acids is made radioactive, then the presence or absence of radioactivity on the nitrocellulose membrane will show whether there is a relationship between the codon and the amino acid which was made radioactive.
For instance, 20 samples of a mixture of all 20 amino acids may be taken, and in each sample one amino acid is made radioactive, in such a manner that every amino acid is radioactive in one sample or the other and no two samples have the same radioactive amino acid. A particular sample would then be known by its radioactive amino acid. Now tRNAs and ribosomes are mixed with each sample and the same codon is used for complex formation in all 20 cases. When the mixture is poured on the nitrocellulose membrane, radioactivity on the membrane will be observed only when the radioactive amino acid is taking part in the formation of the complex.
Since in each sample the radioactive amino acid is known it would be possible to detect the amino acid coded by a given codon by the presence of radioactivity on the membrane. Such a treatment was given by Nirenberg and his co-workers to all the 64 synthetic codons, and their respective amino acids were identified. The binding of AA-tRNA was not equally efficient in all cases. Therefore, the sequences of bases in only about 45 codons could be worked out by this method.
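The logic of the binding assay can be sketched as a toy simulation, in which one sample per labelled amino acid is tested against the same trinucleotide. The lookup table here stands in for the real chemistry and contains only the three homopolymer assignments given earlier. The function names and the reduced sample list are hypothetical.

```python
# Stand-in for the biochemistry: codon assignments from the homopolymer results.
KNOWN_CODE = {"UUU": "phenylalanine", "AAA": "lysine", "CCC": "proline"}

def filter_is_radioactive(trinucleotide: str, labelled_amino_acid: str) -> bool:
    """True if the labelled aminoacyl-tRNA is held on the membrane with the ribosome and codon."""
    return KNOWN_CODE.get(trinucleotide) == labelled_amino_acid

def identify_amino_acid(trinucleotide: str, amino_acids: list[str]) -> list[str]:
    """Test one sample per labelled amino acid; return those giving a radioactive membrane."""
    return [aa for aa in amino_acids if filter_is_radioactive(trinucleotide, aa)]

# In the real experiment there are 20 samples; four are enough to show the idea.
samples = ["phenylalanine", "lysine", "proline", "valine"]

print(identify_amino_acid("UUU", samples))
print(identify_amino_acid("GGG", samples))  # not in the stand-in table: no sample scores
```

An empty result corresponds to a trinucleotide for which no labelled sample is retained on the membrane. In this sketch that only means the codon is missing from the stand-in table.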
Copolymers of repetitive sequences by H.G. Khorana. As outlined above, sufficient information regarding the sequences of bases in codons was available through the painstaking work of Nirenberg and co-workers. However, H.G. Khorana, almost at the same time, devised an ingenious technique for the same purpose. Using synthetic DNA, Khorana and his co-workers could prepare polyribonucleotides (RNA) with known repeating sequences. A repeating sequence means that, if C and U are the two bases, they will be present repeatedly throughout the length as follows: CUCUCUCUCUCUCU
In a similar manner, if A, C and U are the three bases, they will be present repeatedly as follows: ACUACUACUACU
Such copolymers will direct the incorporation of amino acids in a manner which can be theoretically predicted. For instance, in (CU)n = CUC/UCU/CUC/UCU, only two codons are possible, namely CUC and UCU. Moreover, these codons are present in alternating sequence. The result would be that the polypeptide formed would have only two amino acids in alternating sequence. These two amino acids can be assigned to the two codons (Table 30.3).
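The alternating pattern can be verified by reading a repeating dinucleotide in triplets. In this sketch the helper `codons_in_frame` is an illustrative name and the repeat count of six is arbitrary.

```python
def codons_in_frame(rna: str, offset: int = 0) -> list[str]:
    """Read complete, non-overlapping triplets from `rna`, starting at `offset`."""
    return [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]

rna = "CU" * 6  # CUCUCUCUCUCU

codons = codons_in_frame(rna)
shifted = codons_in_frame(rna, offset=1)

print(codons)
print(shifted)
```

Whichever base the reading starts from, only CUC and UCU appear and they strictly alternate. The starting point decides only which of the two comes first.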
We may similarly consider a repeating sequence of three bases, e.g. (ACG)n. Depending upon where the reading is started, three kinds of homopolypeptides are expected (Table 30.4). The actual codon assignment, i.e. finding out which of the three codons codes for which amino acid, would depend upon the previous information available regarding the composition of bases in different codons coding for different amino acids. This information was discussed earlier in this section.
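For a repeating trinucleotide, each starting position yields a single repeated codon, which is why three homopolypeptides are expected. A minimal sketch, with `frame_codons` as an illustrative name and an arbitrary repeat count of five:

```python
def frame_codons(unit: str, repeats: int, offset: int) -> set[str]:
    """Distinct triplets obtained when reading `unit` repeated `repeats` times from `offset`."""
    rna = unit * repeats
    return {rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)}

# One entry per possible starting position (reading frame).
frames = {offset: frame_codons("ACG", 5, offset) for offset in range(3)}

for offset, codons in frames.items():
    print(offset, codons)
```

Each frame contains exactly one codon (ACG, CGA or GAC), so each frame on its own would direct a polypeptide made of a single amino acid.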
On the basis of the above techniques, a complete genetic code dictionary could be prepared, which is given in Figure 30.4. It can easily be seen that two codons, namely AUG and GUG, are designated as initiation codons, and that three codons, UAA, UAG and UGA, are designated as termination codons.