Data Mining Algorithms for Pharmacogenomics
Participants:
Pauline Chen,
Coral Lai, Tze Yun Leong, Lin Li,
Guimei Liu, Yue Wang, Limsoon Wong.
Background
The human genome harbors millions of common single nucleotide polymorphisms
(SNPs) and other types of genetic variation. Understanding these variations
is important for explaining how they correlate with human diseases and with
the body's responses to prescribed drugs. The study of genetic factors that
contribute to variation in drug response, efficacy, and toxicity has come to
be known as pharmacogenomics. In this project, we explore several
pharmacogenomics-related applications of database and data mining
technologies.
Objectives
- We focus on SNPs and how they affect drug response, and we treat ethnic
diversity as an important aspect. We propose to integrate drug-enzyme
interaction data, enzyme-SNP data, and various HapMap-type data into
a web-based bioinformatics tool that allows users to search for
variations that may be significant in determining drug response.
We also aim to support searches for drug-enzyme relationships and to
use text mining to supplement currently incomplete databases.
- Genotyping all SNPs is very expensive.
Fortunately, adjacent SNPs are often not independent, so it is desirable
to select a subset of SNPs that is sufficient to infer all the other SNPs.
These selected SNPs are called tag SNPs.
We propose algorithms that select tag SNPs based on multi-marker
correlations. Compared with existing tag SNP selection algorithms, we aim
to run many times faster, consume much less memory, and select fewer tag
SNPs. We also develop techniques that use the tagging rules discovered
during tag SNP selection to impute untyped SNPs with significantly higher
accuracy and sensitivity than existing methods.
- The identification of disease gene locations is an important topic
with a significant impact on patient management decisions. The process of
finding disease gene locations by comparing marker allele frequencies
between disease chromosomes and control chromosomes is known as linkage
disequilibrium mapping.
We propose algorithms to infer disease gene locations.
We aim to achieve consistently good predictive accuracy under different
conditions, including extreme conditions where very few disease samples
carry the mutation of interest and the data are very noisy.
We also aim for these algorithms to be fast and model-free.
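To make the data-integration objective concrete, here is a minimal sketch of chaining drug-enzyme, enzyme-SNP, and per-population allele-frequency tables to list candidate variations for a drug. It is not the project's tool: every identifier and frequency below is a hypothetical toy value, and the ranking criterion (the spread of allele frequency across populations, as a crude proxy for ethnic diversity) and the function names `build_index` and `candidate_variants` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy tables; identifiers and numbers are made up for illustration.
DRUG_ENZYME = [("drugA", "ENZ1"), ("drugA", "ENZ2"), ("drugB", "ENZ3")]
ENZYME_SNP = [("ENZ1", "snp1"), ("ENZ1", "snp2"), ("ENZ2", "snp3"), ("ENZ3", "snp4")]
# SNP -> {population label: allele frequency} (HapMap-type data, assumed shape)
SNP_FREQ = {
    "snp1": {"POP1": 0.10, "POP2": 0.35, "POP3": 0.12},
    "snp2": {"POP1": 0.20, "POP2": 0.22, "POP3": 0.18},
    "snp3": {"POP1": 0.05, "POP2": 0.40, "POP3": 0.30},
    "snp4": {"POP1": 0.50, "POP2": 0.45, "POP3": 0.55},
}


def build_index(pairs):
    """Turn a list of (key, value) pairs into a key -> [values] lookup."""
    index = defaultdict(list)
    for key, value in pairs:
        index[key].append(value)
    return index


def candidate_variants(drug, min_spread=0.15):
    """Return (enzyme, snp, spread) for SNPs in enzymes linked to `drug`
    whose allele frequency differs across populations by >= min_spread
    (hypothetical criterion)."""
    drug_to_enzymes = build_index(DRUG_ENZYME)
    enzyme_to_snps = build_index(ENZYME_SNP)
    hits = []
    for enzyme in drug_to_enzymes.get(drug, []):
        for snp in enzyme_to_snps.get(enzyme, []):
            freqs = SNP_FREQ.get(snp)
            if not freqs:
                continue  # no population data for this SNP
            spread = max(freqs.values()) - min(freqs.values())
            if spread >= min_spread:
                hits.append((enzyme, snp, round(spread, 2)))
    return hits
```

For example, `candidate_variants("drugA")` follows drugA to ENZ1 and ENZ2 and keeps snp1 and snp3, whose toy frequencies vary most across populations. A real tool would back these lookups with database joins rather than in-memory lists.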
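The tag SNP objective rests on adjacent SNPs being correlated. The sketch below shows the idea with a standard greedy scheme based on pairwise r²: a SNP "covers" every SNP whose r² with it reaches a threshold, and tags are picked until every SNP is covered. This is deliberately simpler than the multi-marker approach described above and is not FastTagger. The 0/1 haplotype encoding, the 0.8 threshold, and the function names are assumptions.

```python
def r_squared(x, y):
    """Squared allelic correlation (r^2) between two SNPs, given as 0/1
    allele vectors over the same haplotypes."""
    n = len(x)
    px = sum(x) / n
    py = sum(y) / n
    pxy = sum(a & b for a, b in zip(x, y)) / n  # frequency of the 1-1 haplotype
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        return 0.0  # a monomorphic SNP has undefined r^2; treat as uncorrelated
    d = pxy - px * py  # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """haplotypes: list of rows, each a list of 0/1 alleles (one column per SNP).
    Returns the indices of the selected tag SNPs."""
    num_snps = len(haplotypes[0])
    columns = [[row[j] for row in haplotypes] for j in range(num_snps)]
    # covers[i]: SNPs that SNP i can stand in for (always includes i itself)
    covers = [
        {j for j in range(num_snps)
         if j == i or r_squared(columns[i], columns[j]) >= threshold}
        for i in range(num_snps)
    ]
    uncovered = set(range(num_snps))
    tags = []
    while uncovered:
        # pick the SNP that covers the most still-uncovered SNPs
        best = max(range(num_snps), key=lambda i: len(covers[i] & uncovered))
        tags.append(best)
        uncovered -= covers[best]
    return sorted(tags)
```

On the four haplotypes `[[0,0,1,1], [0,0,1,0], [1,1,0,1], [1,1,0,0]]`, SNPs 0, 1, and 2 are perfectly correlated with one another while SNP 3 is independent, so two tags suffice. Building all pairwise r² values is quadratic in the number of SNPs, which is one reason speed and memory matter at genome scale.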
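Imputing an untyped SNP from tagging rules can be illustrated with a rule table learned from reference haplotypes. Each observed combination of alleles at a set of tag SNPs maps to the target allele it predicts, and the table is then looked up for new samples. This is a hypothetical simplification, not the project's imputation method. The majority-vote rule, the `min_confidence` cutoff, and the names `learn_rules` and `impute` are assumptions.

```python
from collections import Counter, defaultdict


def learn_rules(reference, tag_idx, target, min_confidence=1.0):
    """Map each combination of alleles at positions `tag_idx` to the target
    allele it predicts. A combination becomes a rule only if its majority
    allele reaches `min_confidence` among the reference haplotypes."""
    counts = defaultdict(Counter)
    for row in reference:
        key = tuple(row[i] for i in tag_idx)
        counts[key][row[target]] += 1
    rules = {}
    for key, counter in counts.items():
        allele, hits = counter.most_common(1)[0]
        if hits / sum(counter.values()) >= min_confidence:
            rules[key] = allele
    return rules


def impute(rules, tag_idx, haplotype):
    """Return the imputed target allele, or None when no rule matches."""
    return rules.get(tuple(haplotype[i] for i in tag_idx))
```

With the toy reference `[[0,0,0], [0,1,1], [1,0,1], [1,1,0], [0,1,1]]`, the target (position 2) equals the XOR of positions 0 and 1. Neither tag alone predicts it, but the pair does, so `impute(learn_rules(ref, (0, 1), 2), (0, 1), [1, 1, None])` returns 0. This is the kind of case where multi-marker rules can outperform single-marker ones.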
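The basic comparison behind linkage disequilibrium mapping, comparing marker allele frequencies between disease and control chromosomes, can be sketched as a single-marker scan. This example computes a 2x2 allelic chi-square statistic at each marker and reports the position of the highest-scoring marker. It is a baseline for illustration only and not the pattern-based method of LinkageTracker. The 0/1 chromosome encoding, the marker positions, and the function names are assumptions.

```python
def allele_chi_square(case_alleles, control_alleles):
    """2x2 chi-square statistic comparing allele-1 counts on case (disease)
    chromosomes against control chromosomes at one marker."""
    a = sum(case_alleles)             # case chromosomes carrying allele 1
    b = len(case_alleles) - a         # case chromosomes carrying allele 0
    c = sum(control_alleles)          # control chromosomes carrying allele 1
    d = len(control_alleles) - c      # control chromosomes carrying allele 0
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # marker is uninformative (monomorphic or an empty group)
    return n * (a * d - b * c) ** 2 / denom


def predict_location(cases, controls, positions):
    """cases/controls: lists of chromosomes, each a list of 0/1 alleles with
    one entry per marker. Returns (predicted position, per-marker scores)."""
    scores = [
        allele_chi_square([h[j] for h in cases], [h[j] for h in controls])
        for j in range(len(positions))
    ]
    best = max(range(len(positions)), key=lambda j: scores[j])
    return positions[best], scores
```

A scan like this degrades when few disease chromosomes carry the mutation or the data are noisy, which are the extreme conditions the objective above targets.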
Selected Publications
- Li Lin, Limsoon Wong, Tzeyun Leong, Pohsan Lai.
LinkageTracker: A Discriminative Pattern Tracking Approach to
Linkage Disequilibrium Mapping.
Proceedings of 10th International Conference on Database Systems
for Advanced Applications,
pages 30--42, Beijing, China, April 2005.
- Li Lin, Limsoon Wong, Tzeyun Leong, Pohsan Lai.
ECTracker---An Efficient Algorithm for Haplotype Analysis and
Classification.
Proceedings of 12th Triennial Medinfo 2007 Congress,
pages 1270--1274, Brisbane, Australia, 21-24 August 2007.
- Guimei Liu, Yue Wang, Limsoon Wong.
FastTagger: An Efficient Algorithm for Genome-Wide Tag SNP Selection.
BMC Bioinformatics, 11:66, February 2010.
FastTagger V1.0
- Li Lin, Limsoon Wong, Tze-Yun Leong, Poh San Lai.
Efficient Mining of
Haplotype Patterns for Linkage Disequilibrium Mapping.
Journal of Bioinformatics and Computational Biology,
8(Suppl. 1):127--146, December 2010.
- Zhengkui Wang, Yue Wang, Kian-Lee Tan, Limsoon Wong, Divyakant Agrawal.
CEO: A Cloud Epistasis cOmputing model in GWAS.
Proceedings of 4th IEEE International Conference on
Bioinformatics & Biomedicine,
pages 85--90, Hong Kong, December 2010.
- Zhengkui Wang, Yue Wang, Kian-Lee Tan, Limsoon Wong, Divyakant Agrawal.
eCEO: An efficient Cloud Epistasis cOmputing model in genome-wide
association study.
Bioinformatics, 27(8):1045--1051, April 2011.
Supplementary Data,
eCEO V1.0.
- Yue Wang, Guimei Liu, Mengling Feng, Limsoon Wong.
An Empirical Comparison of Several Recent Epistatic Interaction
Detection Methods.
Bioinformatics, 27(21):2936--2943, November 2011.
Corrigendum.
Bioinformatics, 28(1):147--148, January 2012.
- Yue Wang, Wilson Goh, Limsoon Wong, Giovanni Montana.
Random forests on Hadoop for genome-wide studies of multivariate
neuroimaging phenotypes.
BMC Bioinformatics, 14(Suppl 16):S6, October 2013.
Dissertations
- Li Lin,
Efficient Mining of Haplotype Patterns for Disease Prediction.
PhD thesis, School of Computing, National University of Singapore, 2008.
Wang Yue,
Efficient Computational Techniques for Tag SNP Selection,
Epistasis Analysis, and Genome-Wide Association Study.
PhD thesis, NUS Graduate School of Integrative Sciences and Engineering,
National University of Singapore, 2012.
- Jieqi Pauline Chen,
SNP Data Integration and Analysis for Drug-Response Biomarker Discovery.
Honours Year Project Report, School of Computing,
National University of Singapore, 2009.
Selected Presentations
- Limsoon Wong.
Tag SNP Selection and Disease Gene Location Inference.
Invited talk at Hong Kong University, Hong Kong, 12 May 2009.
- Limsoon Wong.
A Few Simple Ideas for Efficient Tag SNP Selection.
Invited talk at Peking University,
Beijing, China, 22 July 2009.
- Limsoon Wong.
Epistasis Testing on the Cloud.
Invited talk at 22nd FAOBMB Conference,
Biopolis, Singapore, 5 October 2011.
Acknowledgements
This project is supported in part by
NUS ARF grant R-252-050-238-101/133 (Wong: 11/05 - 11/08),
SERC PSF grant 072-101-0016 (Liu, Wong: 8/07 - 12/10), and
an NGS scholarship (Wang).
Last updated: 13/8/13, Limsoon Wong.