Sequence Space and Fitness Landscapes

The concept of sequence space is used to illustrate the range of possible combinations of amino acids that compose the polypeptide chain of a protein. Sequence space is very large because any of 20 possible amino acids can occupy each position of the polypeptide chain. Thus, for an average-sized protein composed of 300 amino acids, there are 20^300 (roughly 10^390) possible sequences. This number is so large that one can only ever sample a minute fraction of total sequence space. In addition, most of sequence space is devoid of function (see Fig. 2.3A) because many combinations of amino acids will not fold into stable structures. Active, stable structures thus appear as islands in a sea of inactivity. Another way of looking at sequence space is to consider the fitness landscape (Fig. 2.3B). This shows three enzymatic activities, α, β, and γ, that correspond to the three sequences shown in Panel A. As we move across sequence space, we track across peaks of activity for α, then β, and then γ. Note that between activities α and β, there is an area of overlap in

FIGURE 2.3 (A) Sequence space; (B) Fitness landscape. α, β, and γ represent enzymes with different activities.

which the enzyme is bifunctional, but that between activities β and γ there is no overlapping region. As noted above, because most enzymes evolve from existing enzymes, it is common for newly evolved enzymes to be bifunctional, with somewhat poorer activity for one or the other of the catalyzed reactions. Also, because duplicated genes tend to be excised when there is no selection pressure on them, a gene is far more likely to convert from function α to β, where there is always a function that can be selected for, than from α or β to γ, where a functionless intermediate must be maintained.
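
To give a feel for the size of sequence space quoted at the start of this section, the arithmetic can be checked directly. The sketch below uses the figures from the text (20 amino acids, 300 positions); the sampling effort of 10^12 variants is a hypothetical number chosen only for illustration, not a value from the text.

```python
import math

AMINO_ACIDS = 20   # possible residues per position, as stated in the text
LENGTH = 300       # length of the "average-sized" protein used in the text

# Python integers have arbitrary precision, so this value is exact.
n_sequences = AMINO_ACIDS ** LENGTH

# Number of decimal digits and order of magnitude of the sequence count.
n_digits = len(str(n_sequences))
log10_size = LENGTH * math.log10(AMINO_ACIDS)

# Hypothetical sampling effort (an assumption for illustration only).
sampled = 10 ** 12
log10_fraction = math.log10(sampled) - log10_size

print(f"20^300 has {n_digits} digits (about 10^{log10_size:.1f})")
print(f"Sampling 10^12 variants covers about 10^{log10_fraction:.1f} of the space")
```

Working in logarithms for the fraction avoids floating-point underflow, since the fraction itself is far smaller than the smallest representable double.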