Structural Biology as a Data-Rich Science

Proteins mediate the lion's share of the functions necessary to maintain life. The diversity of functions that proteins can perform is remarkable, and it is by studying protein structures that we discover how these functions are carried out. The computational study of protein structures also yields evolutionary and biophysical insights. The exponential rise in the number and diversity of protein structures brought about by structural genomics efforts is enabling us to ask new questions about the function and evolution of proteins. Structural biology has just crossed the threshold of its own genomic revolution, and we need to approach it as a new science, with new questions and new tools. In the first part of my talk, I will discuss the functional richness of protein families and its implications for the Protein Structure Initiative (PSI). PSI is a broad enterprise of several centers aiming to provide complete coverage of protein structure space. Since it is not feasible to determine the structures of all proteins experimentally, it is generally agreed that the only viable strategy for achieving such coverage is to carefully select specific proteins (“targets”), determine their structures experimentally, and then use comparative modeling techniques to model the rest. We suggest that the structural genomics community, in addition to any adopted target selection strategy, should also take care to study representatives of families that are predicted to have significant functional variation within known structural fold groups. The reason is that, in modeling, a structural template may be wrongly taken to be a functional template as well. A function-centric approach to target selection will provide us with a more accurate and comprehensive view of the link between protein structure and function.

I will show how this function identification approach can help us view a protein family's functional landscape, much as the focus dial on a pair of binoculars is used to obtain a better view of a real landscape. Time permitting, in the second part of my talk I will discuss a novel method for fast searching of structural databases, using a representation of protein structures based on short fragments. I will show how these fragments allow us to describe secondary structures in a richer manner, and how fragment-based alignments may be used for high-throughput applications.