My lab studies the folding of proteins via database modeling and simulation. Proteins contain in their primary sequence all of the information necessary to fold spontaneously into their respective three-dimensional structures. How is that information encoded? By applying data modeling techniques and machine learning, we have identified correlations between sequence patterns and 3D structural patterns (I-sites). These correlations can be used to predict a protein's three-dimensional structure, locally or globally, using the I-sites server.

Proteins fold according to a hidden set of rules that define the folding pathway, much like the rules that govern the arrangement of words in a sentence. We have developed a hidden Markov model to describe the grammatical structure of evolutionarily conserved sequence patterns within proteins in general. The grammatical model (HMMSTR) can be used to predict a protein's local three-dimensional structure and secondary structure, to identify protein-coding ORFs, or to design a sequence to fit a structure.
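To make the hidden Markov model idea concrete, here is a minimal sketch of Viterbi decoding on a toy two-state model. It is not HMMSTR: the state names, the four-letter alphabet, and every probability below are hypothetical values chosen only for illustration. The sketch shows how a model of this kind assigns a most probable hidden "structural" state to each residue of a sequence.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for an observation sequence."""
    # scores[t][s]: best log-probability of any path that ends in state s at position t
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        scores.append({})
        back.append({})
        for s in states:
            prev, best = max(
                ((p, scores[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            scores[t][s] = best + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: two structural states emitting four residue types.
STATES = ("helix", "loop")
START = {"helix": 0.5, "loop": 0.5}
TRANS = {
    "helix": {"helix": 0.9, "loop": 0.1},
    "loop": {"helix": 0.2, "loop": 0.8},
}
EMIT = {
    "helix": {"A": 0.5, "L": 0.4, "G": 0.05, "P": 0.05},
    "loop": {"A": 0.1, "L": 0.1, "G": 0.4, "P": 0.4},
}

if __name__ == "__main__":
    print(viterbi("ALLAGPGP", STATES, START, TRANS, EMIT))
```

Working in log space avoids numerical underflow on long sequences, which matters for any model applied to full-length proteins.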

Recently we have extended HMMSTR to two dimensions. HMMSTR-CM predicts the likelihood of pairwise inter-residue contacts, that is, a contact map. Contact maps can be projected into three dimensions using distance geometry methods such as those used to solve NMR structures. HMMSTR-CM contains a set of common-sense rules that describe how secondary structure units can arrange themselves in 3D.
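For readers unfamiliar with the term, the sketch below computes a contact map from known coordinates. It illustrates the object that HMMSTR-CM predicts from sequence, not how the prediction is made. The function name, the 8 Å cutoff, and the minimum sequence separation of 3 are assumptions made for this example; the source does not state which contact definition HMMSTR-CM uses.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=3):
    """Return a symmetric boolean matrix of residue-residue contacts.

    coords: one (x, y, z) point per residue, e.g. C-alpha positions in angstroms.
    cutoff and min_separation are illustrative assumptions, not HMMSTR-CM settings.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        # Skip pairs close in sequence; they are near in space trivially.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap


if __name__ == "__main__":
    # Hypothetical hairpin: two antiparallel three-residue strands 5 angstroms apart.
    hairpin = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0),
               (7.6, 5, 0), (3.8, 5, 0), (0, 5, 0)]
    for row in contact_map(hairpin):
        print("".join("#" if c else "." for c in row))
```

Going the other way, from a predicted map back to coordinates, is the distance geometry step mentioned above and is not shown here.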

A new program, SCALI, has recently been developed to find all ways of arranging secondary structure units in space, and to model them as HMMs. To do this, we had to find a computationally feasible way to perform non-sequential alignments of protein structures. SCALI gives better results than current methods that cannot find non-sequential similarities in proteins. The ability to predict the structure of a protein from its sequence would potentially permit the prediction of the functions of genes and thereby greatly enrich the information obtainable from the sequencing of genomes.

