This branch of bioinformatics is concerned with computational approaches to predict and analyse the spatial structure of proteins and nucleic acids. Although in many cases the primary sequence uniquely determines the three-dimensional (3D) structure, the rules by which it does so are not well understood, and the protein folding problem remains largely unsolved. Some aspects of protein structure can already be predicted from amino acid composition. Secondary structure can be predicted from the primary sequence using statistical methods or neural networks; when a multiple sequence alignment is used as input, the prediction accuracy exceeds 70%.
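As a rough illustration of statistics-based secondary structure prediction, the following Python sketch averages per-residue helix propensities over a sliding window, labels each residue as helix ('H') or other ('C'), and scores the result against a reference assignment as the fraction of correctly assigned residues (the same idea as the three-state Q3 score, here with only two states). The propensity values approximately follow the classic Chou-Fasman helix scale; the window size and the 1.03 cut-off are illustrative assumptions. This is a sketch, not any particular published predictor.

```python
# Approximate Chou-Fasman helix propensities (P_alpha); illustrative values only.
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def predict_helix(seq: str, half_window: int = 3, threshold: float = 1.03) -> str:
    """Label each residue 'H' (helix) or 'C' (other).

    A residue is called helical when the mean propensity of the window
    seq[i - half_window : i + half_window + 1] exceeds `threshold`
    (the window is truncated at the chain ends). Unknown letters count as 1.0.
    """
    labels = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half_window): i + half_window + 1]
        mean = sum(HELIX_PROPENSITY.get(aa, 1.0) for aa in segment) / len(segment)
        labels.append("H" if mean > threshold else "C")
    return "".join(labels)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the reference state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

For example, `predict_helix("AAAAAAAAGGGGGGGG")` returns `"HHHHHHHHCCCCCCCC"`. A single-sequence propensity method like this is considerably less accurate than the alignment-based methods mentioned above.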
3D models are most easily obtained when the 3D structure of a homologous protein is known (homology or comparative modelling). A homology model can only be as good as the underlying sequence alignment: although protein relationships can be detected at 20% sequence identity and below, obtaining a correct alignment at such low identity is very difficult, and the resulting model will be doubtful. At 40 to 50% identity and above, models are usually largely correct; however, two carefully designed protein sequences can share 50% identity yet adopt different topologies (the so-called JANUS protein). Remote relationships that are undetectable by sequence comparison may be detected by sequence-to-structure fitness (threading) approaches, in which the query sequence is systematically evaluated against all known protein structures. Ab initio prediction of protein 3D structure remains the major challenge; some progress has been made recently by combining statistical and force-field-based approaches.
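Because the reliability of a homology model depends so strongly on sequence identity, it helps to be explicit about how identity is computed. The sketch below computes percent identity from two pre-aligned sequences and maps it onto the rough bands given in the text. It assumes identity is taken over aligned columns where neither sequence has a gap; other conventions divide by the length of the shorter sequence and give different numbers. The band between 20% and 40% is not characterized in the text and is labelled here only as "intermediate".

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def template_zone(identity: float) -> str:
    """Rough interpretation bands; cut-offs follow the text, the middle band is an assumption."""
    if identity >= 40.0:
        return "model usually largely correct"
    if identity > 20.0:
        return "intermediate: check the alignment carefully"
    return "alignment unreliable; model doubtful"
```

For example, `percent_identity("ACDE-G", "ACDFKG")` ignores the gapped column and returns 80.0.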
Membrane proteins are important drug targets: membrane receptors are estimated to account for about 50% of all drug targets in pharmacological research. However, membrane proteins are underrepresented in the PDB structure database. Because they are usually excluded from structural genomics initiatives owing to technical difficulties, the prediction of transmembrane helices and solvent accessibility is particularly important. Modern methods can predict transmembrane helices with a reliability greater than 70%.
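A classic and much simpler precursor of modern transmembrane predictors is the Kyte-Doolittle hydropathy plot: a 19-residue window is slid along the sequence, and windows whose mean hydropathy exceeds about 1.6 are taken as candidate membrane-spanning helices. The sketch below implements that idea. It is not one of the modern methods whose reliability is quoted above, and merging overlapping windows into segments is a simplification chosen for this sketch.

```python
# Kyte-Doolittle hydropathy scale (standard one-letter codes only).
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of each full window; entry i is the window starting at residue i."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Return 0-based half-open (start, end) ranges covered by windows above threshold."""
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                # Overlaps the previous segment: extend it instead of starting a new one.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments
```

Because whole windows are merged, reported segments tend to extend a few residues beyond the actual hydrophobic stretch: for 10 lysines, 22 leucines and 10 lysines, the leucines occupy positions 10 to 31, but the reported segment is `(5, 37)`.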
Understanding the 3D structure of a macromolecule is crucial for understanding its function, and many properties of the 3D structure cannot be deduced directly from the primary sequence. A better understanding of protein function is the driving force behind structural genomics, which can thus be regarded as part of functional genomics. Similar structures can imply similar functions. General structure-to-function relationships can be derived by statistical approaches, for example by relating secondary structure to known protein function, or surface properties to subcellular location.
The increased speed of structure determination required by structural genomics projects makes independent validation of the structures (by comparison with expected properties) particularly important. Structure validation helps to correct obvious errors (e.g., in the covalent structure) and leads to a more standardized representation of structural data, e.g., by agreeing on a common atom nomenclature. Knowing the quality of a structure is a prerequisite for its further use, e.g., in molecular modelling or drug design.
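One simple check on the covalent structure: in a trans peptide, consecutive C-alpha atoms lie about 3.8 Å apart, so larger deviations can point to chain breaks, missing residues or coordinate errors. The sketch below reads C-alpha atoms from the fixed-column ATOM records of a PDB-format file and flags consecutive pairs outside a tolerance. The 0.3 Å tolerance is an assumption, and cis peptides (C-alpha distance about 2.9 Å) are legitimately flagged, so the output is a list to inspect rather than a list of errors.

```python
import math


def parse_ca_atoms(pdb_text: str) -> list:
    """Return (chain, residue number, (x, y, z)) for every CA atom in ATOM records.

    Uses the fixed PDB columns: atom name 13-16, chain 22, residue number 23-26,
    coordinates 31-54. Assumes no alternate conformations or insertion codes.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resseq = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((chain, resseq, xyz))
    return atoms


def flag_ca_distances(atoms: list, expected: float = 3.8, tolerance: float = 0.3) -> list:
    """Flag consecutive CA pairs in one chain whose distance deviates from `expected` (Å)."""
    flags = []
    for (c1, r1, p1), (c2, r2, p2) in zip(atoms, atoms[1:]):
        if c1 != c2:
            continue  # chain boundary: no peptide bond expected
        d = math.dist(p1, p2)
        if abs(d - expected) > tolerance:
            flags.append((c1, r1, r2, round(d, 2)))
    return flags
```

Real validation suites check many more properties (bond angles, chirality, clashes, backbone torsions); this shows only the general pattern of comparing a structure against expected values.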
To make as much data as possible on a structure and its determination available in databases, approaches for automated data harvesting are being developed. Structure classification schemes, as implemented for example in the SCOP, CATH, and FSSP databases, elucidate the relationship between protein folds and function and shed light on the evolution of protein domains.
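In CATH, each domain receives a four-part numeric code whose fields correspond to Class, Architecture, Topology (fold) and Homologous superfamily. The sketch below reports the deepest level at which two such codes agree; two domains that share a topology but not a superfamily have the same fold without being grouped as homologues. It only compares code strings (the codes in the example are illustrative) and does not query the CATH database.

```python
from typing import Optional

# The four levels of a CATH code, from most general to most specific.
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def shared_cath_level(code_a: str, code_b: str) -> Optional[str]:
    """Deepest CATH level at which two dotted codes (e.g. '3.40.50.300') agree, or None."""
    deepest = None
    for level, a, b in zip(CATH_LEVELS, code_a.split("."), code_b.split(".")):
        if a != b:
            break
        deepest = level
    return deepest
```

For example, `shared_cath_level("3.40.50.300", "3.40.50.720")` returns `"Topology"`.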
Combined analysis of structural and genomic data will certainly become more important in the near future. Protein folds can be analysed across whole genomes. Protein-protein interactions predicted at the sequence level can be studied in more detail at the structural level. Single nucleotide polymorphisms (SNPs) can be mapped onto 3D protein structures to elucidate the specific structural causes of disease.
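Mapping a coding SNP onto a structure starts by converting its position in the coding sequence to a residue number, then examining that residue's structural neighbourhood. The sketch below does both steps. For simplicity it assumes that structure residue numbers coincide with positions in the translated coding sequence; in practice a sequence-to-structure mapping (such as the one provided by SIFTS) is needed. The 8 Å C-alpha cut-off is an arbitrary illustrative choice.

```python
import math


def codon_to_residue(cds_position: int) -> int:
    """1-based nucleotide position in the coding sequence -> 1-based residue number."""
    return (cds_position - 1) // 3 + 1


def neighbours(residue: int, ca_coords: dict, cutoff: float = 8.0) -> list:
    """Residues whose CA lies within `cutoff` Å of the given residue's CA, sorted.

    `ca_coords` maps residue number -> (x, y, z); numbering is assumed to match
    the coding sequence (a simplification, see above).
    """
    centre = ca_coords[residue]
    return sorted(r for r, xyz in ca_coords.items()
                  if r != residue and math.dist(centre, xyz) <= cutoff)
```

For example, a variant at coding position 301 falls in residue `codon_to_residue(301) == 101`; its neighbours can then be checked for contacts with a binding site or the protein core.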
More detailed aspects of protein function can also be studied with force-field-based approaches. Protein function requires protein dynamics, yet no experimental technique can observe these motions directly at atomic resolution, so they have to be simulated with molecular dynamics (MD). Free energy differences (for example, between the binding energies of different ligands of a protein) can also be characterized by MD simulations. Molecular mechanics or molecular dynamics approaches are also needed for homology modelling and for structure refinement in X-ray crystallography and NMR structure determination.
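MD simulations advance atomic positions by numerically integrating Newton's equations of motion. A widely used integrator is velocity Verlet; the sketch below applies it to a single particle on a one-dimensional harmonic "bond" (force -kx with k = 1, mass 1, arbitrary units), a toy stand-in for a real force field acting on many atoms. Which integrator a given MD package uses is not specified in the text.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D with velocity Verlet; return [(x, v), ...]."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


def harmonic_energy(x, v, k=1.0, mass=1.0):
    """Kinetic plus potential energy of a harmonic 'bond' with spring constant k."""
    return 0.5 * mass * v * v + 0.5 * k * x * x
```

With a suitably small time step (here `velocity_verlet(1.0, 0.0, lambda x: -x, 1.0, 0.01, 1000)`), the total energy stays nearly constant over the run, which is a common sanity check on an MD setup.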
Drug design exploits knowledge of the 3D structure of the binding site (or of the structure of a complex with a ligand) to construct potential drugs, for example inhibitors of viral proteins or RNA. In addition to the 3D structure, a force field is needed to evaluate the interaction between the protein and a ligand and thus to predict binding energies. In virtual screening, a library of molecules is tested computationally for its ability to bind to the macromolecule.
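A minimal force-field interaction score sums a Lennard-Jones term and a Coulomb term over all protein-ligand atom pairs; virtual screening then ranks a library by that score. The sketch below assumes ligands are already placed in the binding site (there is no pose search, unlike real docking), uses Lorentz-Berthelot combining rules and a constant dielectric of 4, and takes the Coulomb conversion factor as approximately 332 kcal·Å/(mol·e²). Any atom parameters you supply are hypothetical, and the score is an interaction energy, not a binding free energy.

```python
import math

COULOMB_FACTOR = 332.06  # approximate conversion factor, kcal*Å/(mol*e^2)


def pair_energy(r, epsilon, sigma, q1, q2, dielectric=4.0):
    """Lennard-Jones 12-6 plus Coulomb energy (kcal/mol) for one atom pair at distance r (Å)."""
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_FACTOR * q1 * q2 / (dielectric * r)
    return lj + coulomb


def interaction_energy(ligand, protein, dielectric=4.0):
    """Sum pair energies; each atom is a tuple ((x, y, z), charge, epsilon, sigma)."""
    total = 0.0
    for xyz_l, q_l, eps_l, sig_l in ligand:
        for xyz_p, q_p, eps_p, sig_p in protein:
            r = math.dist(xyz_l, xyz_p)
            eps = math.sqrt(eps_l * eps_p)   # Lorentz-Berthelot: geometric mean of epsilon
            sig = 0.5 * (sig_l + sig_p)      # Lorentz-Berthelot: arithmetic mean of sigma
            total += pair_energy(r, eps, sig, q_l, q_p, dielectric)
    return total


def virtual_screen(library, protein):
    """Rank ligand names (dict name -> pre-placed atoms) by interaction energy, lowest first."""
    return sorted(library, key=lambda name: interaction_energy(library[name], protein))
```

Lower (more negative) scores rank first; real scoring functions add terms such as desolvation and entropy that this sketch omits.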