PROTEIN DESIGN
· Protein design is the rational design of new protein molecules to create novel activity, behavior, or purpose, and to advance basic understanding of protein function. Proteins can be designed from scratch (de novo design) or by making calculated variants of a known protein structure and its sequence (termed protein redesign). Rational protein design approaches predict protein sequences that will fold to specific structures. These predicted sequences can then be validated experimentally through methods such as peptide synthesis, site-directed mutagenesis, or artificial gene synthesis.
· Rational protein design dates back to the mid-1970s. In recent years, however, there have been numerous examples of successful rational design of water-soluble and even transmembrane peptides and proteins, in part due to a better understanding of the factors contributing to protein structure stability and the development of better computational methods.
· The goal in rational protein design is to predict amino acid sequences that will fold to a specific protein structure. Although the number of possible protein sequences is vast, growing exponentially with the size of the protein chain, only a subset of them will fold reliably and quickly to one native state. Protein design involves identifying novel sequences within this subset. The native state of a protein is the conformational free energy minimum for the chain. Thus, protein design is the search for sequences that have the chosen structure as a free energy minimum. In a sense, it is the reverse of protein structure prediction. In design, a tertiary structure is specified, and a sequence that will fold to it is identified. Hence, it is also termed inverse folding. Protein design is then an optimization problem: using some scoring criteria, an optimized sequence that will fold to the desired structure is chosen.
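· To make the scale of this search concrete: with the 20 canonical amino acids, a chain of n residues has 20^n possible sequences. The minimal sketch below just evaluates that count, assuming every position may take any of the 20 amino acids (real design runs usually restrict this). The function name is illustrative.

```python
def sequence_space_size(n_residues: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of length n over an alphabet (20 canonical amino acids by default)."""
    return alphabet_size ** n_residues


for n in (10, 50, 100):
    # Python ints are arbitrary precision, so the count is exact; only its order of magnitude is printed.
    size = sequence_space_size(n)
    print(f"{n:>3} residues: about 10^{len(str(size)) - 1} sequences")
```

A 100-residue chain already has about 10^130 possible sequences, which is why design methods never enumerate sequence space naively and instead restrict it and search it with specialized algorithms.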
· When the first proteins were rationally designed during the 1970s and 1980s, their sequences were optimized manually based on analyses of other known proteins, the sequence composition, amino acid charges, and the geometry of the desired structure. The first designed proteins are attributed to Bernd Gutte, who designed a reduced version of a known catalyst, bovine ribonuclease, as well as tertiary structures consisting of beta-sheets and alpha-helices, including a binder of DDT.
· The first protein designed completely de novo was reported by Stephen Mayo and coworkers in 1997, and shortly after, in 1999, Peter S. Kim and coworkers designed dimers, trimers, and tetramers of unnatural right-handed coiled coils. In 2003, David Baker's laboratory designed a full protein with a fold never seen before in nature. Later, in 2008, Baker's group computationally designed enzymes for two different reactions. In 2010, one of the most powerful broadly neutralizing antibodies was isolated from patient serum using a computationally designed protein probe.
Target structure
· Protein design programs use computer models of the molecular forces that drive proteins in in vivo environments. To make the problem tractable, protein design models simplify these forces. Although protein design programs vary greatly, they must address four main modeling questions: what is the target structure of the design, what flexibility is allowed on the target structure, which sequences are included in the search, and which force field will be used to score sequences and structures.
Sequence space
· In rational protein design, proteins can be redesigned from the sequence and structure of a known protein, or completely from scratch in de novo protein design. In protein redesign, most of the residues in the sequence are maintained as their wild-type amino-acid while a few are allowed to mutate. In de novo design, the entire sequence is designed anew, based on no prior sequence.
· Both de novo designs and protein redesigns can establish rules on the sequence space: the specific amino acids that are allowed at each mutable residue position.
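· The difference between redesign and de novo sequence spaces can be shown with a small sketch. It assumes a sequence space is represented as one string of allowed amino acids per position; the wild-type sequence "MKTAY" and the mutable positions are hypothetical examples.

```python
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def redesign_space(wild_type, mutable):
    """Wild type is kept everywhere except at the mutable positions.

    `mutable` maps a 0-based position to the string of amino acids allowed there.
    """
    return [mutable.get(i, aa) for i, aa in enumerate(wild_type)]


def de_novo_space(length):
    """Every position may take any of the 20 amino acids."""
    return [AMINO_ACIDS] * length


def count_sequences(space):
    """Size of the sequence space: product of the number of choices per position."""
    return prod(len(allowed) for allowed in space)


wt = "MKTAY"  # hypothetical wild-type sequence
space = redesign_space(wt, {1: "KRH", 3: "AVILM"})
print(count_sequences(space))                   # 3 * 5 = 15
print(count_sequences(de_novo_space(len(wt))))  # 20 ** 5 = 3200000
```

Restricting only two positions keeps the redesign space tiny, while the de novo space for the same five residues is already in the millions.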
Structural flexibility
· In protein design, the target structure (or structures) of the protein is known. However, a rational protein design approach must model some flexibility in the target structure in order to increase the number of sequences that can be designed for that structure and to minimize the chance of a sequence folding to a different structure.
· In the simplest models, the protein backbone is kept rigid while some of the protein side-chains are allowed to change conformations. However, side-chains can have many degrees of freedom in their bond lengths, bond angles, and χ dihedral angles. To simplify this space, protein design methods use rotamer libraries that assume ideal values for bond lengths and bond angles, while restricting χ dihedral angles to a few frequently observed low-energy conformations termed rotamers.
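· The rotamer simplification can be sketched by fixing bond lengths and angles and letting each χ dihedral take only a few discrete values. The sketch below assumes, for illustration only, that every χ angle takes one of the three staggered values (60°, 180°, -60°); real rotamer libraries are derived statistically and treat some χ angles (for example aromatic rings and amides) differently. The N_CHI table covers only a few residues.

```python
from itertools import product

# Number of side-chain chi dihedrals for a few residues (illustrative subset).
N_CHI = {"SER": 1, "VAL": 1, "LEU": 2, "MET": 3, "LYS": 4}

# Assumed idealized staggered values for every chi angle, in degrees.
STAGGERED = (60.0, 180.0, -60.0)


def idealized_rotamers(residue):
    """Enumerate rotamers as tuples of chi angles, each chi restricted to a staggered value.

    Bond lengths and bond angles are assumed ideal, so only the chi angles vary.
    """
    return list(product(STAGGERED, repeat=N_CHI[residue]))


print(len(idealized_rotamers("VAL")))  # 3
print(len(idealized_rotamers("LYS")))  # 81
```

Even with this coarse discretization the count grows as 3^k with the number of χ angles, which is why side-chain placement remains a hard combinatorial problem.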
· Rotamer libraries are derived from the statistical analysis of many protein structures. Backbone-independent rotamer libraries describe all rotamers.[10] Backbone-dependent rotamer libraries, in contrast, describe how likely each rotamer is to appear given the protein backbone arrangement around the side chain.[11] Most protein design programs use one conformation (e.g., the modal value for rotamer dihedrals in space) or several points in the region described by the rotamer; the OSPREY protein design program, in contrast, models the entire continuous region.
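· The choice between one conformation per rotamer and several points inside its region can be sketched as follows. The sketch assumes each rotamer is summarized by a central χ value and a standard deviation per angle, and samples the center plus or minus one standard deviation; the example values are hypothetical, not taken from any real library.

```python
from itertools import product


def wrap(angle):
    """Map an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def expand_rotamer(chi_centers, chi_sds, n_sd=(-1.0, 0.0, 1.0)):
    """Return discrete sample points inside a rotamer's region.

    Each chi angle is sampled at its center plus the given multiples of its standard
    deviation, so a rotamer with k chi angles yields len(n_sd) ** k sub-rotamers.
    Passing n_sd=(0.0,) reduces this to the single central conformation.
    """
    per_angle = [[wrap(c + k * sd) for k in n_sd] for c, sd in zip(chi_centers, chi_sds)]
    return list(product(*per_angle))


# Hypothetical two-chi rotamer.
samples = expand_rotamer((-65.0, 175.0), (10.0, 12.0))
print(len(samples))  # 9
```

Sampling more points per rotamer improves coverage of side-chain space at the cost of a larger search; modeling the continuous region, as OSPREY does, avoids the discretization altogether.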
· Although rational protein design must preserve the general backbone fold of a protein, allowing some backbone flexibility can significantly increase the number of sequences that fold to the structure while maintaining the general fold of the protein.[13] Backbone flexibility is especially important in protein redesign because sequence mutations often result in small changes to the backbone structure. Moreover, backbone flexibility can be essential for more advanced applications of protein design, such as binding prediction and enzyme design. Some models of backbone flexibility in protein design include small and continuous global backbone movements, discrete backbone samples around the target fold, backrub motions, and protein loop flexibility.
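· A backrub motion can be approximated geometrically as a rigid rotation of the atoms between two flanking Cα atoms about the axis that joins them. The sketch below implements only that primary rotation (using Rodrigues' rotation formula); the published backrub move also includes smaller counter-rotations of the peptide planes, which are omitted here. The coordinates in the example are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_about_axis(point, axis_start, axis_end, angle_deg):
    """Rodrigues rotation of `point` about the line through axis_start and axis_end."""
    k = sub(axis_end, axis_start)
    k = scale(k, 1.0 / math.sqrt(dot(k, k)))  # unit axis vector
    v = sub(point, axis_start)                # point relative to the axis
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    v_rot = add(add(scale(v, c), scale(cross(k, v), s)), scale(k, dot(k, v) * (1.0 - c)))
    return add(axis_start, v_rot)


def backrub(atoms, ca_prev, ca_next, angle_deg):
    """Primary backrub rotation: rigidly rotate `atoms` about the C-alpha(i-1) to C-alpha(i+1) axis.

    Atoms lying on the axis (including the flanking C-alphas) stay fixed.
    """
    return [rotate_about_axis(p, ca_prev, ca_next, angle_deg) for p in atoms]


moved = backrub([(3.0, 1.0, 0.0)], (0.0, 0.0, 0.0), (6.0, 0.0, 0.0), 90.0)
print(moved)  # approximately [(3.0, 0.0, 1.0)]
```

Because the flanking atoms lie on the rotation axis, the move perturbs the local backbone while leaving the rest of the chain in place, which is what makes it useful for small backbone adjustments.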
Energy function
· The most accurate energy functions are those based on quantum mechanical simulations. However, such simulations are too slow and typically impractical for protein design.
· Instead, many protein design algorithms use physics-based energy functions adapted from molecular mechanics simulation programs, knowledge-based energy functions, or a hybrid of both. The trend has been toward using more physics-based potential energy functions.
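· Two typical non-bonded terms of a physics-based, molecular-mechanics-style energy function are the 12-6 Lennard-Jones van der Waals term and Coulomb electrostatics. The sketch below shows only these two terms between a pair of atoms; the constant dielectric of 4 and all parameter values are assumptions for illustration, and real force fields add bonded terms, per-atom-type parameters, and solvation.

```python
COULOMB_K = 332.0636  # kcal*Å/(mol*e^2), approximate conversion constant


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones term: E = 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Coulomb energy between point charges q1, q2 (units of e) at distance r (Å)."""
    return COULOMB_K * q1 * q2 / (dielectric * r)


def pair_energy(r, epsilon, sigma, q1, q2, dielectric=4.0):
    """Sum of the two non-bonded terms; the default dielectric is an assumed constant."""
    return lennard_jones(r, epsilon, sigma) + coulomb(r, q1, q2, dielectric)


# Hypothetical parameters for one atom pair.
print(pair_energy(3.8, epsilon=0.1, sigma=3.5, q1=0.4, q2=-0.4))
```

The Lennard-Jones minimum lies at r = 2^(1/6)·σ with depth -ε, and oppositely charged atoms give a negative (favorable) Coulomb energy.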
As an optimization problem
· The goal of protein design is to find a protein sequence that will fold to a target structure. A protein design algorithm must therefore search all the conformations of each sequence with respect to the target fold, and rank sequences according to the lowest-energy conformation of each one, as determined by the protein design energy function. A typical input to a protein design algorithm is thus the target fold, the sequence space, the structural flexibility, and the energy function, while the output is one or more sequences that are predicted to fold stably to the target structure.
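· The optimization described above can be sketched by brute force on a toy problem: for every sequence in the sequence space, find its lowest-energy conformation over the allowed rotamers, then rank sequences by that energy. The sequence space, rotamer sets, and `toy_energy` function below are hypothetical stand-ins for a target fold and a real force field; exhaustive enumeration is only feasible because the problem is tiny.

```python
from itertools import product

# Hypothetical toy problem: allowed amino acids at two designable positions (sequence space)
# and discrete rotamer indices per amino acid (structural flexibility).
SEQUENCE_SPACE = [["A", "L"], ["E", "K"]]
ROTAMERS = {"A": [0], "L": [0, 1], "E": [0, 1], "K": [0, 1, 2]}


def toy_energy(seq, rots):
    """Made-up energy function standing in for a real protein design energy function."""
    self_terms = {"A": 0.0, "L": -1.0, "E": -0.5, "K": -0.5}
    energy = sum(self_terms[aa] + 0.1 * r for aa, r in zip(seq, rots))
    # Arbitrary favorable interaction, only when L and K both adopt rotamer 1.
    if seq == ("L", "K") and rots == (1, 1):
        energy -= 2.0
    return energy


def design(sequence_space, rotamers, energy_fn):
    """Rank every sequence by the energy of its lowest-energy conformation."""
    ranked = []
    for seq in product(*sequence_space):
        conformations = product(*(rotamers[aa] for aa in seq))
        best = min(energy_fn(seq, rots) for rots in conformations)
        ranked.append((best, "".join(seq)))
    return sorted(ranked)


for energy, seq in design(SEQUENCE_SPACE, ROTAMERS, toy_energy):
    print(f"{seq}: {energy:.2f}")
```

Here "LK" ranks first only because one specific conformation of it is strongly favored, which illustrates why each sequence must be scored by its best conformation rather than by an average.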
Algorithms
· Several algorithms have been developed specifically for the protein design problem. These algorithms can be divided into two broad classes: exact algorithms, such as dead-end elimination, that lack runtime guarantees but guarantee the quality of the solution; and heuristic algorithms, such as Monte Carlo, that are faster than exact algorithms but have no guarantees on the optimality of the results. Exact algorithms guarantee that the optimization process produces the optimal solution according to the protein design model. Thus, if the predictions of exact algorithms fail when they are experimentally validated, the source of error can be attributed to the energy function, the allowed flexibility, the sequence space, or the target structure (e.g., if it cannot be designed for).
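· As an example of the heuristic class, the sketch below runs Monte Carlo simulated annealing over rotamer assignments with the Metropolis acceptance rule. It assumes a pairwise-decomposable energy given as per-position self energies and per-pair tables; the example energies, cooling schedule, and step count are hypothetical choices, and nothing guarantees the returned assignment is the global minimum.

```python
import math
import random


def total_energy(assign, self_e, pair_e):
    """Self energies plus pairwise energies; pair_e[(i, j)][ri][rj] with i < j."""
    e = sum(self_e[i][r] for i, r in enumerate(assign))
    for (i, j), table in pair_e.items():
        e += table[assign[i]][assign[j]]
    return e


def monte_carlo(self_e, pair_e, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over rotamer assignments; returns the best assignment seen."""
    rng = random.Random(seed)
    n = len(self_e)
    current = [rng.randrange(len(self_e[i])) for i in range(n)]
    e_cur = total_energy(current, self_e, pair_e)
    best, e_best = list(current), e_cur
    for step in range(steps):
        # Geometric cooling from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        i = rng.randrange(n)  # mutate the rotamer at one random position
        trial = list(current)
        trial[i] = rng.randrange(len(self_e[i]))
        e_trial = total_energy(trial, self_e, pair_e)
        # Metropolis rule: accept downhill moves, and uphill moves with Boltzmann probability.
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / t):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best


# Hypothetical 3-position problem with 2 rotamers per position.
EXAMPLE_SELF = [[0.0, 0.5], [0.2, 0.0], [0.0, 0.3]]
EXAMPLE_PAIR = {(0, 1): [[0.0, -1.0], [0.5, 0.0]], (1, 2): [[0.0, 0.4], [-0.8, 0.0]]}
print(monte_carlo(EXAMPLE_SELF, EXAMPLE_PAIR))
```

Accepting occasional uphill moves at high temperature lets the search escape local minima, while the cooling schedule gradually turns it into a greedy descent.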
With mathematical guarantees
Dead-end elimination
· The dead-end elimination (DEE) algorithm reduces the search space of the problem iteratively by removing rotamers that can be provably shown not to be part of the global minimum energy conformation (GMEC). On each iteration, the dead-end elimination algorithm compares all possible pairs of rotamers at each residue position, and removes each rotamer r′i that can be shown to always be of higher energy than another rotamer ri and is therefore not part of the GMEC.
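· Assuming the energy decomposes into self terms E(r_i) (a rotamer interacting with the fixed backbone) and pairwise terms E(r_i, r_j), the original (Desmet) DEE criterion for eliminating r′_i in favor of r_i can be written as:

```latex
E(r'_i) + \sum_{j \neq i} \min_{r_j} E(r'_i, r_j)
\;>\;
E(r_i) + \sum_{j \neq i} \max_{r_j} E(r_i, r_j)
\quad\Longrightarrow\quad r'_i \notin \text{GMEC}
```

The left side is the lowest energy r′_i could possibly achieve with any choice of rotamers at the other positions, and the right side is the highest energy r_i could have; if even the best case for r′_i is worse than the worst case for r_i, swapping r′_i for r_i always lowers the energy, so r′_i can be discarded.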
· Powerful extensions to the dead-end elimination algorithm include the pairs elimination criterion and the generalized dead-end elimination criterion. The algorithm has also been extended to handle continuous rotamers with provable guarantees.
· Although the dead-end elimination algorithm runs in polynomial time on each iteration, it cannot guarantee convergence to a single conformation. If, after a certain number of iterations, the algorithm does not prune any more rotamers, then either rotamers have to be merged or another search algorithm must be used to search the remaining search space. In such cases, dead-end elimination acts as a pre-filtering algorithm to reduce the search space, while other algorithms, such as A*, Monte Carlo, linear programming, or FASTER, are used to search the remaining space.
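· The pre-filtering role of DEE can be sketched as follows: apply the single-rotamer (Desmet) criterion repeatedly until nothing more is pruned, then search the surviving rotamers. This minimal sketch assumes a pairwise-decomposable energy stored as self energies and per-pair tables keyed by (i, j) with i < j, and uses plain exhaustive enumeration of the survivors as a simple stand-in for A*, Monte Carlo, linear programming, or FASTER. The example energies are hypothetical.

```python
from itertools import product


def pair(pair_e, i, ri, j, rj):
    """Pairwise energy E(r_i, r_j); tables are stored once per (i, j) with i < j."""
    if i < j:
        return pair_e[(i, j)][ri][rj] if (i, j) in pair_e else 0.0
    return pair_e[(j, i)][rj][ri] if (j, i) in pair_e else 0.0


def dee_prune(self_e, pair_e):
    """Iterate the Desmet DEE criterion until no more rotamers can be removed."""
    n = len(self_e)
    alive = [set(range(len(self_e[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r_elim in list(alive[i]):
                # Best case for r_elim, over the rotamers still alive at other positions.
                lo = self_e[i][r_elim] + sum(
                    min(pair(pair_e, i, r_elim, j, rj) for rj in alive[j])
                    for j in range(n) if j != i)
                for r_keep in alive[i]:
                    if r_keep == r_elim:
                        continue
                    # Worst case for the competing rotamer r_keep.
                    hi = self_e[i][r_keep] + sum(
                        max(pair(pair_e, i, r_keep, j, rj) for rj in alive[j])
                        for j in range(n) if j != i)
                    if lo > hi:
                        alive[i].discard(r_elim)
                        changed = True
                        break  # stop iterating alive[i] right after modifying it
    return alive


def total_energy(assign, self_e, pair_e):
    n = len(assign)
    return sum(self_e[i][assign[i]] for i in range(n)) + sum(
        pair(pair_e, i, assign[i], j, assign[j]) for i in range(n) for j in range(i + 1, n))


def gmec(self_e, pair_e):
    """DEE as a pre-filter, then exhaustive search over the surviving rotamers."""
    alive = dee_prune(self_e, pair_e)
    best = min(product(*(sorted(a) for a in alive)),
               key=lambda a: total_energy(a, self_e, pair_e))
    return list(best), total_energy(best, self_e, pair_e), alive


EXAMPLE_SELF = [[0.0, 3.0, 0.5], [0.0, 0.2], [1.0, 0.0]]
EXAMPLE_PAIR = {
    (0, 1): [[0.0, -0.5], [0.1, 0.0], [-0.2, 0.3]],
    (1, 2): [[0.4, 0.0], [0.0, -0.3]],
    (0, 2): [[0.0, 0.1], [0.2, 0.0], [0.0, 0.0]],
}
print(gmec(EXAMPLE_SELF, EXAMPLE_PAIR))
```

Because eliminated rotamers provably cannot belong to the GMEC, the exhaustive search over the survivors returns the same optimum as searching the full space, only over fewer combinations.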
Vaccine design
Recombinant DNA technology
Rational
NGT
Epitope mapping
Adjuvant
Multi
Nucleic acid design
Thermodynamic model
Optimizing affinity
Specificity
Sequence symmetry minimization
Free-energy