Structural Genomics: Target Selection and Assessment
The ultimate goal of structural genomics (SG) is to solve
three-dimensional structures for all proteins encoded in genomes, either
experimentally or computationally. This is currently not feasible due to
technical limitations. Target selection, therefore, becomes a critical
strategic issue confronting structural genomics projects. While
selecting targets for the Northeast Structural Genomics (NESG)
Consortium, a major structural genomics center funded by the NIH, I have
developed two domain-dissecting methods and a protein family clustering
algorithm. Applications of these methods have led to over 300 protein
structures solved by NESG, most of which provided novel structural
information for large sequence families. To assess the impact of SG, I
introduced a measure, named "novel leverage", to evaluate the potential
of experimental structures for providing novel structural models. My
analysis demonstrated that structures from SG accounted for an
increasingly large portion of the novel leverage of the entire PDB, and
SG was much more cost-efficient in obtaining novel leverage than
traditional structural biology.
In this talk, I will also cover a separate topic: using support
vector machines to distinguish protein-coding RNAs from non-coding RNAs,
work done in collaboration with the FANTOM Consortium for large-scale
mouse cDNA sequencing and annotation.