Graph-Based Protein Function Prediction
Participants: Hon Nian Chua, Zhihui Li,
Guimei Liu, Wing-Kin Sung, Limsoon Wong
Background
Although sequence similarity search has proven useful in many cases,
it has fundamental limitations. First, only a fraction of newly discovered
sequences have identifiable homologous genes in the current databases.
Second, the most prominent vertebrate organisms in GenBank have only a
fraction of their genomes present in finished sequences. New bioinformatics
methods allow inference of protein function using "associative analysis"
of functional properties to complement the traditional sequence
homology-based methods. Associative properties that have been used to
infer function not evident from sequence homology include: co-occurrence
of proteins in operons or genome context; proteins sharing common domains
in fusion proteins; proteins in the same pathway; proteins with correlated
gene expression patterns; etc.
In this project, we investigate and develop graph-based methods for
inferring protein functions without sequence homology. Most approaches
to predicting protein function from protein-protein interaction data
rely on the observation that a protein often shares functions with
the proteins that interact with it (its level-1 neighbors). However,
proteins that interact with the same proteins (i.e. level-2 neighbors)
may also have a greater likelihood of sharing similar physical or
biochemical characteristics. We want to find out how significant the
functional association between level-2 neighbors is and
how it can be exploited for protein function prediction. We will also
investigate how to integrate protein interaction information with other
types of information to improve the sensitivity and specificity of
protein function prediction, especially in the absence of sequence homology.
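The level-1 and level-2 neighbor notions described above can be illustrated with a small sketch. It is not the project's software; it assumes an interaction network given as a list of protein pairs and an annotation table mapping each protein to a set of function labels. The names `build_graph`, `level1_neighbors`, `level2_neighbors`, and `fraction_level2_only` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this sketch
        graph[a].add(b)
        graph[b].add(a)
    return graph


def level1_neighbors(graph, protein):
    """Proteins that interact directly with `protein`."""
    return set(graph.get(protein, set()))


def level2_neighbors(graph, protein):
    """Proteins that share at least one interaction partner with `protein`.

    The result may overlap with the level-1 neighbors; callers can subtract
    the level-1 set if they want strictly indirect neighbors.
    """
    level2 = set()
    for partner in graph.get(protein, set()):
        level2 |= graph[partner]
    level2.discard(protein)
    return level2


def shares_function(protein, others, annotations):
    """True if `protein` has at least one function label in common with `others`."""
    own = annotations.get(protein, set())
    return any(own & annotations.get(other, set()) for other in others)


def fraction_level2_only(graph, annotations):
    """Fraction of annotated proteins that share no function with any level-1
    neighbor but share one with a strictly level-2 neighbor."""
    annotated = [p for p in graph if annotations.get(p)]
    if not annotated:
        return 0.0
    count = 0
    for p in annotated:
        direct = level1_neighbors(graph, p)
        indirect = level2_neighbors(graph, p) - direct
        if not shares_function(p, direct, annotations) and shares_function(
            p, indirect, annotations
        ):
            count += 1
    return count / len(annotated)
```

Here "functional association" is simplified to "shares at least one annotation label", which is an assumption of the sketch; a real analysis would need to decide how to treat unannotated proteins and hierarchical annotation schemes.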
Objectives
In this project, we investigate and develop graph-based methods for
inferring protein functions without sequence homology. In particular,
- We find out how significant the functional association between
level-2 neighbors is. For example, what proportion of proteins
have no functional association with their immediate neighbors
but do have functional association with their level-2 neighbors?
- We investigate how level-2 neighbors can be exploited for protein
function prediction in a graph-based framework. For example, how well
do level-2 neighbors serve function prediction in simple
methods like majority voting? How much further improvement can
be made by more sophisticated methods that take into account
reliability information of protein interactions or protein
function annotations?
- We investigate how to integrate protein interaction
information with other types of information to improve the
sensitivity and specificity of protein function prediction,
in a graph-based framework, especially in the absence of sequence
homology. For example, how does reliability information of protein
interactions help? How does knowledge of proteins being co-localized
help? How does knowledge of the frequency of co-occurrence of proteins
in scientific literature help? How should these types of
information be incorporated?
At the end of the project, we expect to have developed a robust and
powerful system to predict protein functions, even in the absence of
sequence homology.
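To make the majority-voting question in the objectives concrete, the following is a minimal sketch of neighbor voting extended to level-2 neighbors, with optional edge reliability scores. It is an illustration under stated assumptions, not the FSWeight method: the functions `edge_reliability`, `neighbor_weights`, and `predict_functions`, the `level2_scale` parameter, and the choice of scoring an indirect neighbor by the most reliable two-step path are all hypothetical.

```python
from collections import Counter


def edge_reliability(reliability, a, b):
    """Look up a reliability score for the undirected edge (a, b).

    `reliability` maps frozenset({a, b}) to a score in [0, 1]; edges without
    a score, or a missing table, default to 1.0 (assumption of this sketch).
    """
    if reliability is None:
        return 1.0
    return reliability.get(frozenset((a, b)), 1.0)


def neighbor_weights(protein, graph, reliability=None, level2_scale=1.0):
    """Assign a voting weight to each level-1 and strictly level-2 neighbor."""
    weights = {}
    level1 = graph.get(protein, set())
    for v in level1:
        weights[v] = edge_reliability(reliability, protein, v)
    for v in level1:
        r_uv = edge_reliability(reliability, protein, v)
        for w in graph.get(v, set()):
            if w == protein or w in level1:
                continue
            # Score an indirect neighbor by its most reliable two-step path.
            path = level2_scale * r_uv * edge_reliability(reliability, v, w)
            weights[w] = max(weights.get(w, 0.0), path)
    return weights


def predict_functions(protein, graph, annotations, reliability=None,
                      level2_scale=1.0):
    """Rank candidate functions for `protein` by weighted neighbor votes."""
    votes = Counter()
    weights = neighbor_weights(protein, graph, reliability, level2_scale)
    for neighbor, weight in weights.items():
        if weight <= 0:
            continue  # neighbors with zero weight cast no vote
        for function in annotations.get(neighbor, ()):
            votes[function] += weight
    # Highest vote first; ties broken alphabetically for determinism.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))
```

Setting `level2_scale=0` reduces the sketch to plain level-1 majority voting, so the two settings can be compared on the same network to see how much the indirect neighbors contribute.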
Selected Publications
- Hon Nian Chua and Wing-Kin Sung.
A better gap penalty for pairwise SVM.
Proceedings of 3rd Asia-Pacific Bioinformatics Conference,
Singapore, pages 11-20, 17-21 January, 2005.
PDF
- Hon Nian Chua, Wing-Kin Sung, and Limsoon Wong.
Exploiting indirect neighbours and topological weight to
predict protein function from protein-protein interactions.
Bioinformatics, 22:1623-1630, 2006.
PDF,
FSWeight V1.0 Software
- Kang Ning, Hon Nian Chua.
Automated Identification of Protein Classification and
Detection of Annotation Errors in Protein Databases Using
Statistical Approaches.
LNBI 3886: Proceedings of PAKDD
2006 Workshop on Knowledge Discovery in Life Science
Literature (KDLL2006),
pages 123-138, Singapore, April 2006.
- Hon Nian Chua, Wing-Kin Sung, Limsoon Wong.
Using Indirect Protein Interactions for the Prediction of
Gene Ontology Functions.
BMC Bioinformatics, 8(Suppl 4):S8, May 2007.
PDF,
FSWeight V2.1 Software
- Hon Nian Chua, Wing-Kin Sung, Limsoon Wong.
An efficient strategy for extensive integration of diverse biological
data for protein function prediction.
Bioinformatics, 23(24):3364-3373, December 2007.
PDF,
Supplementary Info,
FSWeight V2.2 Software
- Hon Nian Chua, Limsoon Wong.
Predicting Protein Functions from Protein Interaction Networks.
Biological Data Mining in Protein Interaction Networks,
edited by See-Kiong Ng and Xiao-Li Li,
chapter XII, pages 204-223,
Medical Information Science Reference, May 2009.
PDF
Dissertations
- Hon Nian Chua,
Graph-based methods for protein function prediction.
PhD thesis, Graduate School for Integrative Sciences and Engineering,
National University of Singapore, Singapore, 2007.
- Zhihui Li,
Pubmed Abstract Processing for Protein Function Prediction.
Honours Year Project Report, Faculty of Science,
National University of Singapore, Singapore, 2008.
Selected Presentations
- Hon Nian Chua.
Function Prediction from Protein Interactions.
Invited talk at I2R-SOC Joint Lab Seminars.
NUS SOC, 16 August 2005.
- Limsoon Wong.
Protein Function Prediction From Protein Interactions.
Invited talk at 1st International Symposium on Languages
in Biology and Medicine,
KAIST, Daejeon, Korea, 24-26 November 2005.
- Limsoon Wong.
Protein Function Prediction From Protein Interactions.
Invited talk at "Figuring Out Life: NUS-Karolinska
Joint Symposium on Application of Mathematics in Biomedicine",
Institute for Mathematical Sciences, Singapore, 28-29 November 2005.
PPT
- Hon Nian Chua, Wing-Kin Sung, Limsoon Wong.
Exploiting Indirect Neighbours and Topological Weight to
Predict Protein Function from Protein-Protein Interactions.
Invited keynote at BioDM2006, Singapore, 9 April 2006.
Proc. PAKDD 2006 Workshop on Data Mining for Biomedical
Applications (BioDM2006), Singapore,
9 April 2006, page 1.
PPT
- Limsoon Wong.
Guilt by Association of Common Interaction Partners.
Invited talk at IMS Workshop on BioAlgorithmics,
Institute for Mathematical Sciences, Singapore,
12-14 July 2006.
PPT
- Hon Nian Chua.
A Graph-Based Approach to Inferring Protein Function From
Heterogeneous Data Sources.
Invited talk at IMS Workshop on BioAlgorithmics,
Institute for Mathematical Sciences, Singapore,
12-14 July 2006.
- Hon Nian Chua.
Guilt by Indirect Functional Association.
Plenary talk at Annual Meeting on Automated Function Prediction
(AFP2006), San Diego, CA,
30 August - 1 September 2006.
- Limsoon Wong.
Guilt by Association: A Tutorial on Protein Function Inference.
Tutorial at 5th Asia-Pacific Bioinformatics Conference (APBC2007),
Hong Kong, 15-17 January 2007.
PPT
- Limsoon Wong.
Protein Function Inference Enhanced by Text Mining.
Invited talk at Forum on Advanced NLP and Text Mining (T-FaNT),
Tokyo, Japan, 11-13 March 2007.
PPT
- Limsoon Wong.
Guilt by Association: A Tutorial on Data Mining Techniques for
Protein Function Inference.
Tutorial at 11th Pacific-Asia Conference on Knowledge Discovery
and Data Mining (PAKDD 2007),
Nanjing, China, 22-25 May 2007.
- Limsoon Wong.
Two Applications of Text Mining in Bioinformatics:
Enhancing Protein Function Prediction and
Enhancing Drug Pathway Inference.
Invited talk at 7th Korea-Singapore Workshop on Bioinformatics & NLP,
Seoul, Korea, 15 February 2008.
PPT
- Limsoon Wong.
Guilt by Association.
Invited keynote at 1st Japan-Taiwan Young Researchers Conference on
Computational and Systems Biology,
Hsinchu, Taiwan, 9-11 March 2008.
- Limsoon Wong.
Guilt by Association as a Search Principle.
Invited keynote at 31st Annual International ACM SIGIR Conference,
Singapore, 20-24 July 2008.
PPT
- Limsoon Wong.
Guilt by Association: A Tutorial on Data Mining Techniques
for Protein Function Inference.
Invited tutorial at IPM-NUS Workshop on Analysis and Application of
Protein Interaction Networks,
Shahid Beheshti University, Tehran, Iran, 17-18 November 2008.
PPT
- Limsoon Wong.
Guilt by Association of Common Interaction Partners.
Invited talk at IPM-NUS Workshop on Analysis and Application
of Protein Interaction Networks,
Shahid Beheshti University,
Tehran, Iran, 17-18 November 2008.
PPT
- Limsoon Wong.
"Guilt by Association" as a Search Principle.
Invited keynote at BioSearch08: HCSNet Next-Generation Search Workshop
on Search in Biomedical Information,
Queensland University of Technology, Brisbane, Australia, 30 November 2008.
- Limsoon Wong.
Challenges in Understanding Pathways, Predicting Complexes, &
Inferring Protein Function.
Invited keynote at First International Workshop on Neuroinformatics,
Bioinformatics, and Cognitive Science,
South China University of Technology, Guangzhou, China, 4-6 June 2009.
PPT
Acknowledgements
This project is supported in part by
an A*STAR AGS scholarship (Chua: 8/03 - 7/07), the
I2R-SOC Joint Lab on Knowledge Discovery
from Clinical Data (Liu, Sung, Wong: 7/03 - 6/07), and a
URC grant R-252-000-274-112 (Liu, Sung, Wong: 10/06 - 9/09).
Last updated: 9/6/09, Limsoon Wong.