Structural Genomics: Target Selection and Assessment

The ultimate goal of structural genomics (SG) is to determine three-dimensional structures for all proteins encoded in genomes, either experimentally or computationally. This is currently not feasible due to technical limitations. Target selection therefore becomes a critical strategic issue for structural genomics projects. In the context of selecting targets for the Northeast Structural Genomics (NESG) Consortium, a major structural genomics center funded by the NIH, I have developed two domain-dissecting methods and a protein family clustering algorithm. Applications of these methods have led to over 300 protein structures solved by NESG, most of which provided novel structural information for large sequence families. To assess the impact of SG, I introduced a measure, named "novel leverage", to evaluate the potential of experimental structures for providing novel structural models. My analysis demonstrated that structures from SG accounted for an increasingly large portion of the novel leverage of the entire PDB, and that SG was much more cost-efficient in obtaining novel leverage than traditional structural biology.
In this talk, I will also cover a separate topic: using support vector machines to distinguish protein-coding RNAs from non-coding RNAs, a collaboration with the FANTOM Consortium for large-scale mouse cDNA sequencing and annotation.