E Sarti, S Zamuner, P Cossio, A Laio, F Seno, A Trovato,
BACHSCORE. A tool for evaluating efficiently and reliably the quality of large sets of protein structures
COMPUTER PHYSICS COMMUNICATIONS, 184, 2860 (2013)
In protein structure prediction it is of crucial importance, especially at the refinement stage, to score large sets of models efficiently and select the ones that are closest to the native state. We present a new computational tool, BACHSCORE, that allows its users to rank different structural models of the same protein according to their quality, evaluated using the BACH++ (Bayesian Analysis Conformation Hunt) scoring function. The original BACH statistical potential was already shown to discriminate the protein native state with very good reliability in large sets of misfolded models of the same protein. BACH++ features a novel upgrade in the solvation potential of the scoring function, now computed by adapting the LCPO (Linear Combination of Pairwise Overlaps) algorithm. This change further enhances the already good performance of the scoring function. BACHSCORE can be accessed directly through the web server: bachserver.pd.infn.it.
Program summary
Program title: BACHSCORE
Catalogue identifier: AEQD_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEQD_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU General Public License version 3
No. of lines in distributed program, including test data, etc.: 130159
No. of bytes in distributed program, including test data, etc.: 24 687 455
Distribution format: tar.gz
Programming language: C++.
Computer: Any computer capable of running an executable produced by a g++ compiler (version 4.6.3).
Operating system: Linux, Unix OSes.
RAM: 1 073 741 824 bytes
Classification: 3.
Nature of problem: Evaluate the quality of a protein structural model, taking into account possible "a priori" knowledge of a reference primary sequence that may differ from the amino-acid sequence of the model; the native protein structure should be recognized as the best model.
Solution method: Model quality is assessed through a scoring function defined as a linear combination of two statistical potentials: a pairwise residue-residue contact potential and a single-residue solvation potential. Potential parameters are determined with a probabilistic Bayesian analysis on a data set of protein native structures.
1. The contact potential scores the occurrence of any given type of residue pair in 5 possible contact classes (α-helical contact, parallel β-sheet contact, anti-parallel β-sheet contact, side-chain contact, no contact).
2. The solvation potential scores the occurrence of any residue type in 2 possible environments: buried and solvent exposed. Residue environment is assigned by adapting the LCPO algorithm.
3. Residues present in the reference primary sequence but absent from the model structure contribute to the model score as solvent exposed and as not in contact with any other residue.
Restrictions: Input files must follow the Protein Data Bank standard format.
Additional comments: Parameter values used in the scoring function can be found in the file /folder-to-bachscore/BACH/examples/bach_std.par.
Running time: Roughly one minute to score one hundred structures on a desktop PC, depending on their size.
(C) 2013 Elsevier B.V. All rights reserved.
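The solution method described above can be illustrated with a minimal Python sketch. It is not the BACHSCORE implementation (which is written in C++): the function name, the data layout, the `weight` parameter, and all numerical values are hypothetical and chosen only for illustration. The sketch assumes that contact classes and residue environments have already been assigned, and it shows only how the two potentials combine and how residues missing from the model are treated as solvent exposed and not in contact.

```python
from itertools import combinations

# The five contact classes and two environments named in the summary.
CONTACT_CLASSES = ("alpha", "parallel_beta", "antiparallel_beta",
                   "side_chain", "none")
ENVIRONMENTS = ("buried", "exposed")


def score_model(sequence, contacts, environments,
                pair_energy, solv_energy, weight=1.0):
    """Toy score: contact term + weight * solvation term.

    sequence     -- residue types of the reference primary sequence
    contacts     -- {(i, j): contact class} for i < j; pairs not listed
                    (including pairs involving a residue absent from
                    the model) count as "none"
    environments -- {i: "buried" | "exposed"}; residues absent from the
                    model count as "exposed"
    pair_energy  -- {(contact class, (type_a, type_b)): value}, with the
                    residue types in sorted order (hypothetical table)
    solv_energy  -- {(environment, type): value} (hypothetical table)
    weight       -- hypothetical relative weight of the solvation term
    """
    contact_term = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        cls = contacts.get((i, j), "none")
        pair = tuple(sorted((sequence[i], sequence[j])))
        contact_term += pair_energy[(cls, pair)]

    solvation_term = 0.0
    for i, residue in enumerate(sequence):
        env = environments.get(i, "exposed")
        solvation_term += solv_energy[(env, residue)]

    return contact_term + weight * solvation_term


# Toy example: residue 2 (GLY) is in the reference sequence but
# missing from the model, so it has no contacts and no environment.
sequence = ["ALA", "LEU", "GLY"]
contacts = {(0, 1): "side_chain"}
environments = {0: "buried", 1: "buried"}

# Made-up parameter values, not taken from bach_std.par.
pair_energy = {
    ("side_chain", ("ALA", "LEU")): -1.0,
    ("none", ("ALA", "GLY")): 0.25,
    ("none", ("GLY", "LEU")): 0.25,
}
solv_energy = {
    ("buried", "ALA"): -0.5,
    ("buried", "LEU"): -1.0,
    ("exposed", "GLY"): 0.5,
}

score = score_model(sequence, contacts, environments,
                    pair_energy, solv_energy, weight=0.5)
print(score)
```

In this sketch the missing residue needs no special handling: the default lookups make it solvent exposed and not in contact, as in point 3 above. Ranking a set of models would then amount to computing such a score for each one and sorting the results.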