Technical Highlight - July 2012
Short description: Membrane protein 3D folds are accurately predicted from evolutionary constraints derived from genomic sequencing.
Protein sequence families embody an evolutionary record of mutations that sustain protein structure and function over the course of species diversification. Sequence variation and functional integrity can both be achieved through correlated mutations, whereby sets of amino acids engaged in long-range contacts mutate in concert so that favorable interactions are retained in the functional mutant. In a study both predictive and practical, Marks and colleagues hypothesize that nature constrains sets of mutations to preserve contacts critical to protein structure and function. The authors examine whether meaningful evolutionary constraints can be identified from genomic sequencing data and used to predict three-dimensional (3D) structures of transmembrane proteins, which represent over 25% of all human proteins and over half of all drug targets.
The predictive algorithm, EVfold_membrane, uses a maximum entropy approach to derive evolutionary constraints from correlated mutations identified in multiple sequence alignments. The constraint set is supplemented with predicted secondary structure elements and filtered to remove contacts that conflict with transmembrane topology. The resulting distance constraints are imposed on extended polypeptide chains, which are then folded ab initio via distance geometry (DG) and simulated annealing. Because DG translates constraints directly into 3D coordinates, the protocol sidesteps the massive conformational search space that limits de novo protein folding strategies.
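To illustrate how distance geometry can turn distances directly into coordinates, here is a minimal sketch of classical metric-matrix embedding with NumPy. It is not the EVfold_membrane implementation: it assumes a complete, exact distance matrix, whereas the protocol described above works from a sparse set of predicted constraints and refines the result with simulated annealing. The function names and the toy helical chain are hypothetical and chosen only for illustration.

```python
import numpy as np


def embed_from_distances(dist, dim=3):
    """Recover coordinates (up to rotation/reflection) from a full distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double-centering converts squared distances into a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # eigh returns eigenvalues in ascending order; keep the `dim` largest.
    evals, evecs = np.linalg.eigh(gram)
    top = np.argsort(evals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by round-off before taking the root.
    scale = np.sqrt(np.clip(evals[top], 0.0, None))
    return evecs[:, top] * scale


def pairwise_distances(coords):
    """Euclidean distance between every pair of points."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Toy input (an assumption for illustration): 20 points on a helix-like path.
t = np.arange(20)
toy_chain = np.stack(
    [2.3 * np.cos(1.745 * t), 2.3 * np.sin(1.745 * t), 1.5 * t], axis=1
)
recovered = embed_from_distances(pairwise_distances(toy_chain))
```

The recovered coordinates reproduce the input distances but may differ from the original by a rigid rotation, translation, or mirror image, since distances alone do not fix handedness. No conformational search is needed, which is the property the paragraph above attributes to DG.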
Performance was benchmarked by computing 3D folds for 25 transmembrane proteins of known structure from diverse families. Comparison of predicted coordinates with crystal structures revealed unparalleled levels of agreement (template modeling scores > 0.5 in 22 cases). The method also showed strong predictive power for functionally relevant motifs: residues with multiple pair constraints were localized to substrate binding pockets and oligomeric interfaces, and/or were involved in conformational changes. When applied to sequence families representing transmembrane proteins of unknown structure (with up to 14 helices), several predicted structures shared 3D folds with sequence-distant yet functionally related proteins. Challenges remain, including distinguishing intra- from intermonomer contacts and separating couplings that arise from distinct conformations. Applications of the method include complementing experimental structure determination methods, guiding rational drug design and functional mutation experiments, and engineering proteins. Predicting protein structure from evolutionary constraints encrypted in sequence families promises to harness the potential of the genomic age.
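For readers unfamiliar with the template modeling (TM) score used in the benchmark, the following is a minimal sketch of the standard formula, TM = (1/L) Σ 1 / (1 + (d_i/d0)^2) with d0 = 1.24 (L - 15)^(1/3) - 1.8, where L is the length of the reference structure and d_i is the distance between the i-th pair of aligned residues. It assumes the two structures have already been superimposed and aligned; a full TM-score calculation also searches for the superposition that maximizes the score, which is not done here. The function names and the lower bound on d0 are assumptions made for this illustration.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if target_length <= 15:
        raise ValueError("d0 is undefined for chains of 15 residues or fewer")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Assumption: guard against very small d0 for short chains.
    return max(d0, 0.5)


def tm_score(aligned_distances, target_length):
    """TM-score for a fixed superposition.

    aligned_distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the reference structure.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    # Normalizing by the reference length means unaligned residues contribute zero.
    return total / target_length
```

Under this formula, a prediction whose aligned residues all lie exactly d0 from their reference positions scores 0.5, the threshold quoted above, and a perfect match scores 1.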