Recombination And/Or Introduction Ofsubsequent Mutations

Directed evolution experiments differ from traditional mutation-selection experiments in that they typically involve cycles of improvement. This can be done in a sequential fashion by identifying the most improved single variant and subjecting it to further cycles of mutagenesis and screening/selection until a variant that meets desired criteria is reached, or until further cycles fail to produce increases in the desired property. However, this method tends to be slow and laborious. A better method for recombining many improved variants is known as gene shuffling (Stemmer, 1994b). This method is a variation on the mutagenesis method described above in which genes are partially digested with the use of DNase and subsequently reassembled by primerless PCR, except that a pool of improved variants are used for the starting material rather than a single gene. The result of this procedure is to make a new library of variants in which mutations from different improved variants are recombined in many permutations and combinations. In some variants, different positive amino acid changes that independently improved the property of interest provide either additive or multiplicative improvements in performance. In other cases, positive mutations could be partially obscured by negative mutations, so the process of improvement involves both summation of positive mutations and, at the same time, elimination of negative mutations. The removal of negative mutations can also be achieved by backcross PCR. This technique is analogous to a traditional genetic backcross experiment, but in this case, the improved variant is recombined with a molar excess of parental gene, and the resulting variants screened for activity.
By performing several cycles of this procedure, typically 4–7 rounds of screening/ recombination of improved variants, improvements in performance of 101–104 have been documented. Examples of successes using this technique include conversion of a galactosidase into a fucosidase (Zhang et al., 1997), increasing the activity of a thermophylic enzyme at low temperatures (Merz et al., 2000), and the evolution of antibody-phage libraries (Crameri et al., 1996).

While single gene shuffling is capable of generating huge changes in activity with regard to specific substrates, it is still a fairly inefficient process. The reason for this is that the variability introduced into the initial gene is somewhat limited by the particular method of mutagenesis employed. A major advance in efficiency of gene shuffling was made by incorporating several genes rather than a single gene as a starting point, a process referred to as family shuffling (Crameri et al., 1998). In this method, several homologues (e.g., α, β, and γ of Fig. 2.3) with >50% sequence identity are shuffled together and the resulting variants screened for activity as described for single gene shuffling. An extension of this method is called synthetic shuffling in which information from sequence comparisons is used to generate oligonucleotides so that every amino acid from a set of parents is allowed to recombine independently (Ness et al., 2002).
Synthetic shuffling has the advantage that shuffling can be used to exploit sequence information and does not require all of the physical genes to be in hand to perform the experiment. Improvements employing these methods are shown to be more rapid and larger in magnitude. The rational for this is that each homologue represents a variant on the same protein fold and that during natural evolution each of the genes has accumulated different positive sequence attributes that contribute to the overall enzyme performance. During family shuffling, there is the potential to sum these positive attributes to produce rapid increases in performance. Another way to think of this is that a single gene shuffling experiment is essentially starting off at a single point and radiating from there in sequence space. For multigene shuffling, one starts with several independent points in sequence, space, and combinations of each of the genes cover a larger portion of sequence space than could be achieved from a single point (Fig. 2.3A).


FIGURE 2.3 (A) Sequence space; (B) Fitness landscape. α, β, and γ represent enzymes with different activities.
FIGURE 2.3 (A) Sequence space; (B) Fitness landscape. α, β, and γ represent enzymes with different activities.
Subsequent rounds of recombination and screening occur as before for single gene shuffling. An important and intriguing finding from family gene shuffling experiments is that when genes are shuffled, instead of getting activities that are intermediate between the members shuffled, new activities beyond the range of the individuals are identified. This is true for not only activities but also for qualitative parameters such as range of region specificities. This has far-reaching implications in that new diversity of biocatalysts can actually arise for parameters previously not found in nature. The necessary criteria for exploiting this phenomenon are to identify and recombine the optimal parental genes and to have in place a robust high throughput screen that has an excellent signal-to-noise ratio for the property of interest. In addition to DNaseI-based recombination techniques, there are other effective methods such as staggered extension process StEP PCR (Aguinaldo and Arnold, 2003).

An interesting variation on single and multiple gene shuffling is that of pathway and whole organism shuffling (Crameri et al., 1997; Zhang et al., 2002). These broader-scale methods allow changes in regulatory elements, in addition to changes in the coding regions to contribute to improved activity.