Introduction to Protein Folding

We are interested in protein structure and function at both a fundamental and a practical level. There is astounding beauty in the mastery with which nature has tailored molecules for specific functions, activity levels, regulatory properties, and integration into complex macromolecular assemblies. As will be discussed, in most cases these molecules spontaneously assume a final, stably folded structure. Thus, all of the information necessary for biological activity is contained in the simple sequence of amino acids encoded by the DNA. Practically speaking, the ability to predict protein structure, stability, and function from the primary sequence will open myriad opportunities in medicine (e.g., drug discovery and understanding the molecular basis of disease), industry and manufacturing (e.g., biocatalysis and bioprocessing), and the environment (e.g., bioremediation).

Proteins are linear polymers of amino acids joined through amide linkages, commonly called peptide bonds. The “backbone” atoms comprise the amide linkages, each separated by an α-carbon that carries any one of 20 common side chains. At neutral pH, the side chains may be grouped by their chemical nature as acidic, basic, hydrophobic, or uncharged hydrophilic. Thus, although the backbone of the peptide polymer is a repeating identical unit, the side chains and their distinct properties dictate the nature of the protein. Because a subset of the side chains is charged at neutral pH (acidic side chains are negative and basic ones positive), the protein polymer is a polyelectrolyte. The linear sequence of amino acids is called the primary structure of the protein (Fig. 1). The primary structure dictates the way in which the polypeptide folds into a functional protein, in most cases without instructions from other sources.
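To make the side-chain grouping and the polyelectrolyte point concrete, the sketch below counts the residues of a one-letter sequence in each class and estimates the net side-chain charge at neutral pH. The class assignments follow one common textbook scheme and are an assumption (schemes differ for G, C, Y, and H). The charge estimate is deliberately crude: histidine (pKa near 6) is treated as neutral and the chain termini are ignored. The function names are hypothetical.

```python
from collections import Counter

# One common textbook-style grouping (an assumption; schemes differ for G, C, Y, H).
GROUPS = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "hydrophobic": set("GAVLIMPFW"),
    "uncharged_hydrophilic": set("STCNQY"),
}

# Approximate side-chain charge at pH 7. His is treated as neutral and the
# N- and C-termini are ignored, so this is only a rough estimate.
CHARGE_AT_PH7 = {"D": -1, "E": -1, "K": +1, "R": +1}


def classify(sequence: str) -> Counter:
    """Count residues of a one-letter sequence in each side-chain class."""
    counts = Counter()
    for residue in sequence.upper():
        for group, members in GROUPS.items():
            if residue in members:
                counts[group] += 1
                break
        else:
            # No group matched: not one of the 20 common amino acids.
            raise ValueError(f"unknown residue code: {residue!r}")
    return counts


def approximate_net_charge(sequence: str) -> int:
    """Crude net side-chain charge at neutral pH (the polyelectrolyte character)."""
    return sum(CHARGE_AT_PH7.get(residue, 0) for residue in sequence.upper())
```

For the short hypothetical sequence "MKDEW", `classify` reports two hydrophobic, two acidic, and one basic residue, and `approximate_net_charge` returns -1 (one lysine against one aspartate and one glutamate).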

Protein families are groups of proteins related by structure or function. A protein family may be structurally diverse yet share a particular cluster of amino acids at the active site that defines the class by its catalytic function (e.g., dehydrogenases and kinases). Alternatively, proteins may share a structural motif that defines the class (e.g., the helix–loop–helix motif of the EF-hand calcium-binding proteins). Proteins with identical function in different organisms often have slightly different primary structures (see below). The presence of certain amino acids relative to others in primary sequences allows putative protein sequences, such as those from the Human Genome Project, to be classified into general protein families. Whether this initial classification is valid remains to be seen.

To discover the rules of protein folding, two major approaches have emerged: computational and empirical. The computational approach, often associated with proteomics, attempts to predict the structure of a protein from its sequence by defining a set of rules and criteria for their application. This topic is covered elsewhere in this series. The empirical approach defines global rules for folding based on lessons learned from particular proteins. The two methods are closely interwoven (3): hypotheses derived from one are testable through the other. In this paper, we discuss the empirical approach to studying protein folding.

The empirical approach to understanding protein folding has relied heavily on mutational analysis. As mentioned earlier, proteins with identical functions from different species may have slightly different amino acid sequences; the positions at which they differ can be viewed as natural mutations. These substitutions are often conservative, particularly at amino acids that are critical to the structure or function of the protein. Scientists compare the physical properties of such related proteins to gain insight into the role of individual amino acids in the local or global structure and function of the protein. Mutations are also often engineered deliberately into protein sequences using molecular biological techniques to test hypotheses about the roles of particular amino acids in structure or function. For example, selectively substituting tryptophan into a sequence places a convenient spectroscopic probe at a chosen position (see below).
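The bookkeeping behind mutational analysis can be sketched in a few lines. The example below uses the widely used point-mutation shorthand (wild-type residue, 1-based position, replacement, e.g. "F3W" for introducing a tryptophan probe) to build a mutant sequence. It also lists the substitutions between two homologous sequences. The function names are hypothetical, and the comparison assumes the two sequences have already been aligned to the same length; real homologs usually require a proper sequence alignment first.

```python
import re

# Point-mutation shorthand: wild-type residue, 1-based position, replacement.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Return the mutant sequence, checking that the stated wild-type residue matches."""
    match = MUTATION_PATTERN.match(mutation)
    if match is None:
        raise ValueError(f"not a point mutation: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # convert 1-based numbering to a Python index
    if not 0 <= index < len(sequence):
        raise IndexError(f"position {position} outside sequence of length {len(sequence)}")
    if sequence[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {sequence[index]}")
    return sequence[:index] + replacement + sequence[index + 1:]


def list_substitutions(reference: str, variant: str) -> list[str]:
    """List differences between two pre-aligned, equal-length sequences in shorthand."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{a}{i}{b}"
        for i, (a, b) in enumerate(zip(reference, variant), start=1)
        if a != b
    ]
```

With the hypothetical sequence "MKFLV", `apply_point_mutation("MKFLV", "F3W")` gives "MKWLV", and `list_substitutions("MKFLV", "MKWLI")` gives `["F3W", "V5I"]`. Checking the wild-type residue guards against numbering mistakes, a common source of error when engineering mutants.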

Although proteins are very diverse, almost all of them share one feature: they spontaneously adopt a unique and stable tertiary structure. This is a remarkable feat of nature given the complexity of these heterogeneous polymers. The study of protein folding focuses on understanding the rules that govern both the transition into this unique fold and its stability. The transition into the tertiary structure is studied by kinetic methods, which ask, “By what pathway is the final tertiary structure reached?” Equilibrium thermodynamic methods, in contrast, ask, “How stable is the final fold, and why?” Each of these approaches will be discussed individually.
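For the equilibrium question, stability is commonly expressed as the free energy of unfolding. The sketch below assumes a simple two-state model (native N ⇌ unfolded U, with no populated intermediates). This assumption is not stated in the text above and does not hold for every protein. Under it, K_U = f_U / (1 − f_U) and ΔG_U = −RT ln K_U, where f_U is the fraction of molecules unfolded. The function names are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def unfolding_free_energy(fraction_unfolded: float, temperature_k: float = 298.15) -> float:
    """Two-state model: dG_U = -RT ln K_U with K_U = f_U / (1 - f_U), in kJ/mol."""
    if not 0.0 < fraction_unfolded < 1.0:
        raise ValueError("fraction unfolded must lie strictly between 0 and 1")
    k_unfold = fraction_unfolded / (1.0 - fraction_unfolded)
    return -R * temperature_k * math.log(k_unfold)


def fraction_unfolded(delta_g_kj: float, temperature_k: float = 298.15) -> float:
    """Inverse relation: f_U = K_U / (1 + K_U), where K_U = exp(-dG_U / RT)."""
    k_unfold = math.exp(-delta_g_kj / (R * temperature_k))
    return k_unfold / (1.0 + k_unfold)
```

At the midpoint of an unfolding transition (f_U = 0.5), ΔG_U is zero. A protein that is 1% unfolded at 25 °C has ΔG_U of about 11.4 kJ/mol, which shows how modest the net stability of a folded protein can be. In practice, f_U is read from a spectroscopic signal under a two-state assumption, so the result is only as good as that assumption.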