The language that molecules in cells use to communicate with each other is diverse. In many cases, molecular communication is achieved through conformational complementarity between the communicating molecules along the chains of signaling pathways. To understand this communication, we study the structures of proteins involved in the signal transduction pathways associated with cell growth, the cell cycle, sensory perception, and chemotaxis. We are also interested in discovering and designing drugs that inhibit these proteins for therapeutic purposes.
Analysis of the genomic sequences of many organisms indicates that a large fraction of the encoded proteins cannot be assigned a particular molecular and/or cellular function based on the gene or protein sequence alone. The molecular (biochemical and biophysical) function of a protein is tightly coupled to its three-dimensional structure, and that structure, in combination with sequence information, may provide important insight into its molecular function. Thus, the structural study of the proteins encoded by an entire genome or involved in a cellular process (an approach often called “Structural Genomics” or “Structural Proteomics”) can provide an important foundation for understanding the biological processes of the whole organism. As one of the NIH-supported centers of the Protein Structure Initiative, the Berkeley Structural Genomics Center is working to determine a near-complete structural complement of the proteomes of two “minimal organisms,” Mycoplasma pneumoniae and Mycoplasma genitalium, which have fewer than 700 and 500 genes, respectively. Two of the objectives are to discover the “basis set” of protein architectures required to sustain life, and to understand how protein structures may have evolved from simple to complex architectures to accomplish the various tasks essential for a living cell.
As a computational counterpart of the Structural Genomics effort described above, we are pursuing five aspects of computational biology: (1) knowledge-based protein fold prediction, in which we apply rapid text-searching algorithms, developed by computer scientists, to protein structures to discover similarities and differences between two structures; (2) global mapping of the conformations of all proteins and nucleic acids, to understand the conformational “landscapes” of these two classes of molecules; (3) “global mapping of the protein universe,” to classify all proteins into protein fold families and to discover the evolutionary relationships among the families; (4) “remote homologue” detection, to discover how a pair of proteins with no sequence similarity can have the same or very similar structures and the same or related molecular functions; and (5) functional mapping of the protein structure universe, in which we map the molecular functions of proteins onto the protein universe map to generate a “dictionary” of protein structure vs. function. For remote homologue detection, we are developing computational methods that predict such homologues by combining several powerful text-searching algorithms applied to protein structural features, treating the protein structure (a “text”) as a collection of local structural features (“words”). If successful, this method will dramatically change the way functional annotation is done for all genes and proteins.
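The “text” and “words” idea can be illustrated with a small sketch. This is not the method under development here. It is a toy example that assumes each protein has already been reduced to a string over a hypothetical three-letter structural alphabet (H for helix, E for strand, C for coil), takes overlapping three-letter fragments as the “words,” and compares two proteins by the cosine similarity of their word counts. The function names, the alphabet, the word length, and the example strings are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def word_counts(structure_text: str, k: int = 3) -> Counter:
    """Count overlapping k-letter 'words' in a structural string."""
    return Counter(
        structure_text[i:i + k] for i in range(len(structure_text) - k + 1)
    )


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count profiles (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "texts": H = helix, E = strand, C = coil.
protein_a = "CCHHHHHHCCEEEEECCHHHHHHCC"  # helix / strand / helix
protein_b = "CHHHHHHHCCCEEEECCCHHHHHCC"  # similar arrangement, shifted boundaries
protein_c = "CCEEEEECCEEEEECCEEEEECCCC"  # all-strand arrangement

sim_ab = cosine_similarity(word_counts(protein_a), word_counts(protein_b))
sim_ac = cosine_similarity(word_counts(protein_a), word_counts(protein_c))

print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

In this toy setting, A and B share most of their words and score higher than A and C, even though no amino acid sequence is consulted at any point. A realistic approach would need a much richer description of local structural features than three secondary-structure letters.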
· Return to Professor Sung-Hou Kim's homepage.