Research Proposal Template 1

My research interests center on applying statistical data mining and machine learning techniques to systems biology. I am especially interested in developing and applying statistical learning algorithms to identify patterns in large amounts of high-dimensional data that reflect the states of the signal transduction system. As a pharmacologist, I have always been intrigued by cellular signal transduction pathways and by the complexity of the system. Before my transition to computational biology two years ago, my research as a pharmacologist focused mainly on individual pathways or protein molecules. It often occurred to me that biomedical research over the last few decades has accumulated a wealth of knowledge at the molecular level, and that it is time to take a step back and view the cellular signal transduction system as a whole forest, now that most of its leaves have been colorfully painted in. Advances in biological techniques such as DNA microarrays and high-throughput screening have produced large amounts of data on many aspects of the cell. These data give biologists new opportunities to study the cellular system, but they also pose challenges for biologists trained only in conventional experimental approaches. The transition from experimental to computational biologist was quite natural for me because of my long-standing interest and experience in scientific computing. Winning a National Library of Medicine training grant gave me an excellent opportunity to extend my research abilities in this direction. My study and research have also benefited greatly from the outstanding artificial intelligence and statistics community in the Pittsburgh area.

My current research in computational biology falls into two major areas, described below.

The first is the development of a latent variable generative model, the variational Bayesian cooperative vector quantizer (VBCVQ), to analyze DNA microarray data and model gene transcription regulation pathways. I have finished the mathematical derivation and implementation of the model. Beyond its potential biological applications, the model can be used in a wide range of other settings, e.g. image processing, image compression and content-based image retrieval. The structure of the model closely mirrors that of the gene expression regulation system. It overcomes some drawbacks of commonly used existing techniques and can address questions that other models cannot. In general, the model has the following advantages: (1) data dimension reduction; (2) identification of the key components of gene expression regulation pathways; (3) the ability to infer the states of these key components from new microarray data. Such information can be useful for further exploring the mechanisms of disease, drug effects or toxicity, and for constructing diagnostic tools. Full Bayesian learning of the model allows us to address questions such as ``what is the most efficient way to encode the information controlling gene transcription?'' or ``what are the key signal transduction components that control gene expression in a given kind of cell?'' Currently, I am testing the model on image encoding and mixed-image separation. Once this stage is finished, I will apply the model to microarray analysis.
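To make the structure of a cooperative vector quantizer concrete, here is a minimal Python sketch of the generative model only, not of the variational Bayesian learning described above. It assumes that each quantizer (standing in for one hidden regulator) selects exactly one codeword under a uniform prior, that an observation is the sum of the selected codewords plus isotropic Gaussian noise, and that states for new data are inferred by brute-force enumeration instead of a variational approximation. The function names `sample_cvq` and `infer_states` and the toy codebooks are hypothetical.

```python
import itertools
import random

def sample_cvq(codebooks, noise_sd, rng):
    """Draw one observation from a cooperative vector quantizer (CVQ).

    codebooks[k][m] is codeword m (a list of D floats) of quantizer k.
    Each quantizer picks one codeword uniformly at random; the observation
    is the sum of the picked codewords plus Gaussian noise.
    """
    states = [rng.randrange(len(book)) for book in codebooks]
    dim = len(codebooks[0][0])
    x = [0.0] * dim
    for book, s in zip(codebooks, states):
        for d in range(dim):
            x[d] += book[s][d]
    x = [v + rng.gauss(0.0, noise_sd) for v in x]
    return states, x

def infer_states(codebooks, x):
    """Most probable joint state for observation x, by exhaustive search.

    With uniform priors and isotropic Gaussian noise, the most probable
    joint state is the one with the smallest squared reconstruction error.
    """
    best, best_err = None, float("inf")
    for states in itertools.product(*(range(len(b)) for b in codebooks)):
        recon = [sum(codebooks[k][s][d] for k, s in enumerate(states))
                 for d in range(len(x))]
        err = sum((xi - ri) ** 2 for xi, ri in zip(x, recon))
        if err < best_err:
            best, best_err = list(states), err
    return best

# Toy example: two hypothetical regulators, each "off" (0) or "on" (1),
# jointly driving the expression of four genes.
codebooks = [
    [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0]],  # regulator A -> genes 0, 1
    [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 2.0]],  # regulator B -> genes 2, 3
]
true_states, x = sample_cvq(codebooks, noise_sd=0.1, rng=random.Random(0))
print(true_states, infer_states(codebooks, x))
```

Exhaustive enumeration grows as the product of the codebook sizes, so it is only practical for toy problems; this combinatorial cost is one motivation for approximate inference such as the variational approach.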

The second area is identifying and predicting the functions of protein motifs using data mining approaches. The Gene Ontology provides a set of terms, organized hierarchically, for annotating and describing biological systems. The current Gene Ontology database can also serve as a knowledge base to facilitate biological discovery, because it contains a large amount of information about the molecular functions, biological processes and cellular locations of proteins. To make effective use of such a knowledge base, a biologist would like to query it in the following fashion: ``what is the protein motif that encodes a given molecular function?'' or ``what is the potential function of a conserved motif we identified?'' However, the current Gene Ontology database cannot answer such queries, even though the information is actually available, because of the way the information is stored and the potential ambiguity of a conventional database query. Working with collaborators at the University of Pittsburgh and Carnegie Mellon University, I have developed a general method that addresses this issue using data mining approaches. We extracted a set of features that help disambiguate the associations between protein motifs and Gene Ontology terms. We then trained a statistical classifier to determine whether a Gene Ontology term should be assigned to a protein motif, using a probability to express the confidence or uncertainty of each assignment. The method performs well when tested on known protein motifs from PROSITE. I will extend this work in two directions: (1) developing a system based on the method and making it available to the scientific community for data mining; (2) studying the evolution of protein sequence motifs by further exploiting the knowledge in the Gene Ontology with hierarchical aspect models.
These studies will help identify the key residues within the motifs and allow us to address questions such as ``which amino acids play key roles in proteins that act as kinases or reductases/oxidases?''
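The classification step can be illustrated with a minimal sketch. The text does not name the classifier or list the features, so this uses plain logistic regression, trained by batch gradient descent, as a stand-in, applied to two hypothetical features per (motif, GO term) pair. The feature definitions, function names and toy values are invented for illustration and are not data or results from this work; `predict_proba` returns the probability that expresses confidence in assigning the term to the motif.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    """P(term applies to motif | features x); w holds weights, bias last."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log loss)/d(score)
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Hypothetical features for each (motif, GO term) pair:
#   [0] fraction of motif-bearing proteins annotated with the term
#   [1] specificity of the term (e.g. normalized depth in the GO hierarchy)
# Labels: 1 = term should be assigned to the motif, 0 = it should not.
X = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.2, 0.3], [0.1, 0.7], [0.3, 0.1]]
y = [1, 1, 1, 0, 0, 0]
w = train_logistic(X, y)
for xi in X:
    print(xi, round(predict_proba(w, xi), 3))
```

Logistic regression is used here only because its output is directly a probability; any probabilistic classifier could fill the same role.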

Overall, my training in both experimental and computational biology allows me to combine the knowledge of the two fields without a communication gap. I expect my research to proceed in two directions: computational method development and biological discovery. As a computational biologist, I will collaborate extensively with both experimental biologists and computer scientists to solve interesting biological problems. My short-term goal is to extend my current research as described above. In the long run, I will continue to learn, identify, develop and apply computational methods for drug discovery, drug toxicity prediction and the development of diagnostic tools based on biological data.