Proteins: Structure, Function and Genetics 40, 662-674,
Recurrent oligomers in proteins -- an optimal scheme reconciling
accurate and concise backbone representations in automated folding and
design studies
Cristian Micheletti, Flavio Seno and Amos Maritan
ABSTRACT
A novel scheme is introduced to capture the spatial correlations of
consecutive amino acids in naturally occurring proteins. This
knowledge-based strategy carries out optimal, automated
subdivisions of protein fragments into similarity classes. The goal
is to provide the minimal set of protein oligomers (termed ``oligons''
for brevity) that is able to represent any other fragment. At variance
with previous studies where recurrent local motifs were classified,
our concern is to provide simplified protein representations that have
been optimised for use in automated folding and/or design attempts. In
such contexts it is paramount to limit the number of degrees of
freedom per amino acid without incurring a loss of accuracy in
structural representations. The suggested method finds, by
construction, the optimal compromise between these needs. Several
possible oligon lengths are considered. It is shown that meaningful
classifications cannot be made for lengths greater than 6 or smaller
than 4. Different contexts are considered in which oligons of length 5 or
6 are recommended. With only a few dozen oligons of such length,
virtually any protein can be reproduced within typical experimental
uncertainties. Structural data for the oligons is made publicly
available.
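
The idea of covering all fragments with a small set of representatives can be illustrated with a minimal sketch. It is not the optimal scheme of the paper: it assumes C-alpha coordinates given as (x, y, z) tuples, swaps in the distance RMSD (a comparison of internal pairwise distances that needs no superposition) as the similarity measure, and uses a simple greedy cover. The function names and the tolerance value are hypothetical.

```python
import math
from itertools import combinations


def fragments(coords, length):
    """Return all overlapping windows of `length` consecutive positions."""
    return [coords[i:i + length] for i in range(len(coords) - length + 1)]


def drmsd(a, b):
    """Distance RMSD between two equal-length fragments.

    Assumption: this stands in for the paper's similarity measure.
    It compares internal pairwise distances, so it is invariant under
    rotation and translation (but also under mirror reflection).
    """
    pairs = list(combinations(range(len(a)), 2))
    total = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs
    )
    return math.sqrt(total / len(pairs))


def greedy_representatives(frags, tol):
    """Greedy cover (hypothetical, not the paper's optimal scheme).

    A fragment becomes a new representative only if no representative
    chosen so far lies within `tol` of it.
    """
    reps = []
    for frag in frags:
        if all(drmsd(frag, rep) > tol for rep in reps):
            reps.append(frag)
    return reps


def nearest(frag, reps):
    """Index of the representative closest to `frag`."""
    return min(range(len(reps)), key=lambda k: drmsd(frag, reps[k]))


if __name__ == "__main__":
    # Toy chain: a straight segment followed by a right-angle turn,
    # with 3.8 Angstrom spacing between consecutive positions.
    chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
    chain += [(15.2, 3.8 * j, 0.0) for j in range(1, 5)]

    frags = fragments(chain, 5)
    reps = greedy_representatives(frags, tol=0.5)
    labels = [nearest(f, reps) for f in frags]
    print(len(frags), "fragments,", len(reps), "representatives")
    print("assignment:", labels)
```

The greedy pass depends on the order in which fragments are visited, so the number of representatives it returns is only an upper bound on what an optimised subdivision would need.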