Sequence Space and Fitness Landscapes
The concept of sequence space is used to illustrate the range of possible combinations
of amino acids that compose the polypeptide chain of a protein. Sequence
space is very large because any of 20 amino acids could occupy
each position of the polypeptide chain. Thus, for an average-sized protein composed
of 300 amino acids, there are 20^300 possible combinations of sequences. This
number is so large that one can only ever sample a minute fraction of total
sequence space. A corollary to this is that most of sequence space is devoid of
function (see Fig. 2.3A) because many combinations of amino acids will not fold
into stable structures. Active stable structures thus appear as islands among a
sea of inactivity. Another way of looking at sequence space is to consider the
fitness landscape (Fig. 2.3B). This shows three enzymatic activities α, β, and γ that
correspond to the three sequences shown in Panel A. As we move across sequence
space, we track across peaks of activity for α, then β, and then γ. Note that between
activities α and β, there is an area of overlap in which the enzyme is
bifunctional, but that between activities β and γ there is no overlapping
region. As noted above, because most enzymes evolve from existing enzymes,
it is common for newly evolved enzymes to be bifunctional, with somewhat
poorer activity for one or other of the catalyzed reactions. Also, because
duplicated genes tend to be excised if there is no selection pressure on
them, it is far more likely for a gene to convert from function α to β,
where there is always a function that can be selected for, than from α or β
to γ, where a functionless intermediate must be maintained.

FIGURE 2.3 (A) Sequence space; (B) Fitness landscape. α, β, and γ represent
enzymes with different activities.
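
The size estimate given earlier in this section can be checked with a few lines of arithmetic. The sketch below is only an illustration: the alphabet size (20) and chain length (300) come from the text, whereas the number of sampled variants (10^12) is an arbitrary assumption chosen to make the point, and the function names are hypothetical.

```python
import math

ALPHABET = 20   # amino acids available at each position (from the text)
LENGTH = 300    # residues in the "average-sized" protein (from the text)

# Hypothetical number of variants screened; an assumption for illustration only.
N_SAMPLED = 10**12


def sequence_space_size(length: int, alphabet: int = ALPHABET) -> int:
    """Exact number of distinct sequences of the given length."""
    return alphabet ** length


def log10_sequence_space(length: int, alphabet: int = ALPHABET) -> float:
    """log10 of the sequence-space size; avoids converting the huge integer to a float."""
    return length * math.log10(alphabet)


def log10_fraction_sampled(n_sampled: int, length: int, alphabet: int = ALPHABET) -> float:
    """log10 of the fraction of sequence space covered by n_sampled distinct sequences."""
    return math.log10(n_sampled) - log10_sequence_space(length, alphabet)


if __name__ == "__main__":
    size = sequence_space_size(LENGTH)
    print(f"20^300 has {len(str(size))} decimal digits")
    print(f"log10(20^300) = {log10_sequence_space(LENGTH):.3f}")
    print(f"log10(fraction sampled) = {log10_fraction_sampled(N_SAMPLED, LENGTH):.3f}")
```

Since log10(20) is about 1.301, 20^300 is roughly 2 x 10^390, a number with 391 digits. Under the assumed 10^12 sampled variants, the fraction of sequence space explored is on the order of 10^-378, which is what "a minute fraction" means in practice.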