Fold recognition benchmark


benchmark.tar.gz, the full benchmark in a single file

Summary

A fold recognition benchmark is used to evaluate fold recognition methods. It contains a list of protein pairs that have similar structures but only distant sequence homology. The more of these pairs a fold recognition method can pick up, the more sensitive it is.
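As a minimal sketch of how sensitivity could be scored against such a list, the snippet below counts the fraction of benchmark pairs that a method reports. The function name, the domain identifiers and the choice to treat pairs as unordered are assumptions made for illustration; the benchmark itself does not prescribe a scoring script.

```python
def sensitivity(benchmark_pairs, detected_pairs):
    """Return the fraction of benchmark pairs that a method recovered.

    Pairs are treated as unordered, so (a, b) and (b, a) count as the same
    pair. This is an assumption, not something the benchmark specifies.
    """
    benchmark = {frozenset(pair) for pair in benchmark_pairs}
    detected = {frozenset(pair) for pair in detected_pairs}
    if not benchmark:
        raise ValueError("benchmark must contain at least one pair")
    return len(benchmark & detected) / len(benchmark)


# Hypothetical domain identifiers, for illustration only.
benchmark = [("d1a", "d2b"), ("d1a", "d3c"), ("d2b", "d4d"), ("d3c", "d4d")]
detected = [("d2b", "d1a"), ("d3c", "d4d"), ("d9z", "d1a")]

print(sensitivity(benchmark, detected))  # 0.5
```

Pairs that a method reports but that are not in the benchmark, such as the last one above, do not affect this score; judging false positives would need a separate measure.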

Structural similarity can be divided into several levels according to the SCOP database: fold, superfamily and family. Generally, simple BLAST or PSI-BLAST can recognize similarity at the family and superfamily levels, so for new fold recognition methods only the prediction of fold-level similarity is of interest.
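To make the levels concrete, the sketch below compares two SCOP classification strings of the form class.fold.superfamily.family (for example "b.1.1.1") and reports the deepest level the two domains share. The function name and the example strings are illustrative assumptions, not part of the benchmark.

```python
LEVELS = ["class", "fold", "superfamily", "family"]


def similarity_level(sccs_a, sccs_b):
    """Return the deepest SCOP level shared by two classification strings.

    Each string is expected to look like "b.1.1.1", that is
    class.fold.superfamily.family. Returns None if even the class differs.
    """
    parts_a = sccs_a.split(".")
    parts_b = sccs_b.split(".")
    if len(parts_a) != 4 or len(parts_b) != 4:
        raise ValueError("expected class.fold.superfamily.family")
    deepest = None
    for level, a, b in zip(LEVELS, parts_a, parts_b):
        if a != b:
            break
        deepest = level
    return deepest


# Illustrative classification strings.
print(similarity_level("b.1.1.1", "b.1.1.2"))  # superfamily
print(similarity_level("b.1.1.1", "b.1.2.1"))  # fold
print(similarity_level("b.1.1.1", "c.1.1.1"))  # None
```

Under this reading, a result of "fold" marks the hardest case: two domains that share a fold but belong to different superfamilies.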

The benchmark provided here has been prepared from SCOP database version 1.53. The ASTRAL compendium provides all sequences of SCOP domains, clustered at different sequence identities. This benchmark is calculated from scopd30, which contains 2417 sequences longer than 40 amino acids.
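The length cutoff can be illustrated with a short sketch that reads FASTA-formatted records and keeps only sequences longer than 40 residues. The FASTA input format, the helper names and the toy records are assumptions; the actual files and tools used to build the benchmark are not described here.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) tuples from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the identifier.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longer_than(records, min_length=40):
    """Keep records whose sequence is strictly longer than min_length."""
    return [(name, seq) for name, seq in records if len(seq) > min_length]


# Toy records with hypothetical identifiers: 41 and 40 residues.
fasta_lines = [">d_long example", "A" * 30, "A" * 11, ">d_short", "A" * 40]
kept = longer_than(read_fasta(fasta_lines))
print([name for name, _ in kept])  # ['d_long']
```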

Preparation

An all-against-all BLAST comparison was performed for the sequences. Pairs of SCOP domains with the same SCOP fold type that could not be recognized by BLAST with an expect value better than 0.1 were collected. The current benchmark contains 11,853 remotely related domain pairs.
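The selection rule can be sketched as follows, assuming the fold of every domain and the BLAST expect values are already available as dictionaries. The function name, the data layout and the decision to take the better expect value of the two search directions are assumptions for illustration; they are not taken from the original preparation scripts.

```python
from itertools import combinations


def collect_remote_pairs(fold_of, blast_evalue, threshold=0.1):
    """Collect same-fold domain pairs that BLAST did not recognize.

    fold_of:      dict mapping domain identifier -> fold label
    blast_evalue: dict mapping (query, subject) -> expect value; a missing
                  key means BLAST reported no hit for that pair
    A pair counts as recognized if either search direction has an expect
    value below the threshold (an assumption).
    """
    no_hit = float("inf")
    pairs = []
    for a, b in combinations(sorted(fold_of), 2):
        if fold_of[a] != fold_of[b]:
            continue
        best = min(blast_evalue.get((a, b), no_hit),
                   blast_evalue.get((b, a), no_hit))
        if best >= threshold:
            pairs.append((a, b))
    return pairs


# Hypothetical domains, folds and BLAST results.
fold_of = {"d1": "a.1", "d2": "a.1", "d3": "a.1", "d4": "b.2"}
blast_evalue = {("d1", "d2"): 1e-5, ("d3", "d1"): 2.5}

print(collect_remote_pairs(fold_of, blast_evalue))
# [('d1', 'd3'), ('d2', 'd3')]
```

In this toy example the pair d1/d2 is dropped because BLAST finds it easily, while d1/d3 (a weak hit) and d2/d3 (no hit at all) are kept as remotely related pairs.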