Dealing with Confounders in Omics Analysis
Participants:
Wilson Goh, Wong Limsoon
Overview
Statistical feature selection on high-throughput omics data
(e.g., genomics, proteomics, and transcriptomics) is commonly
deployed to help understand the mechanism underpinning disease
onset and progression. In clinical practice, these features are
critical as biomarkers for diagnosis, treatment guidance,
and prognosis. Unlike monogenic disorders, many challenging diseases
(e.g., cancer) are polygenic, requiring multigenic signatures to
counteract etiology and human variability issues. Unfortunately, in
the course of analyzing omics data, we commonly encounter universality
and reproducibility problems that stem not only from etiology and human
variability, but also from batch effects, poor experiment design,
inappropriate sample size, and misapplied statistics.
Current literature mostly blames poor experiment design and overreliance
on the highly fluctuating P-value. In this project, we undertake a deeper
rethink of the mechanics of applying statistical tests (e.g., hypothesis
statement construction, null distribution appropriateness, and
test-statistic construction), and design analysis techniques that are
robust on omics data.
Selected Publications
- Wilson Wen Bin Goh, Limsoon Wong.
Why batch effects matter in omics data, and how to avoid them.
Trends in Biotechnology, 35(6):498--507, June 2017.
- Wei Zhong Toh, Kwok Pui Choi, Limsoon Wong.
Redhyte: A self-diagnosing, self-correcting, and helpful
hypothesis analysis platform.
Journal of Information and Telecommunication, 1(3):241--258, July 2017.
PDF
- Limsoon Wong.
Big data and a bewildered lay analyst.
Statistics and Probability Letters, 136:73--77, May 2018.
- Wilson Wen Bin Goh, Limsoon Wong.
Dealing with confounders in -omics analysis.
Trends in Biotechnology, 36(5):488--498, May 2018.
- Wilson Wen Bin Goh, Limsoon Wong.
Why breast cancer signatures are no better
than random signatures explained.
Drug Discovery Today, 23(11):1818--1823, November 2018.
- Wilson Wen Bin Goh, Limsoon Wong.
Turning straw into gold: Building robustness into
gene signature inference.
Drug Discovery Today, 24(1):31--36, January 2019.
- Wilson Wen Bin Goh, Limsoon Wong.
The birth of bio-data science: Trends, expectations, and applications.
Genomics, Proteomics, & Bioinformatics, 18(1):5--15, February 2020.
PDF
- Sung Yang Ho, Limsoon Wong, Wilson Wen Bin Goh.
Avoid oversimplifications in machine learning:
Going beyond the class-prediction accuracy.
Patterns, 1(2):100025, May 2020.
PDF
- Sung Yang Ho, Sophia Tan, Chun Chau Sze, Limsoon Wong, Wilson Wen Bin Goh.
What can Venn diagrams teach us about doing data science better?
International Journal of Data Science and Analytics,
11(1):1--10, January 2021.
PDF
- Yaxing Zhao, Limsoon Wong, Wilson Wen Bin Goh.
How to do quantile normalization correctly for
gene expression data analysis.
Scientific Reports, 10:15534, September 2020.
PDF
- Sung Yang Ho, Kimberly Phua, Limsoon Wong, Wilson Wen Bin Goh.
Extensions of the external validation for checking learned model
interpretability and generalizability.
Patterns, 1(8):100129, November 2020.
PDF
- Chern Han Yong, Shawn Hoon, Sanjay De Mel, Stacy Xu, Jonathan Adam Scolnick,
Xiaojing Huo, Michael Lovci, Wee Joo Chng, Limsoon Wong.
MapBatch: Conservative batch normalization for single cell RNA-sequencing
data enables discovery of rare cell populations in a multiple
myeloma cohort.
Blood, 138(Suppl. 1):2954, November 2021.
PDF
- Li Rong Wang, Limsoon Wong, Wilson Wen Bin Goh.
How Doppelganger effects in biomedical data confound machine learning.
Drug Discovery Today, 27(3):678--685, March 2022.
- Wilson Wen Bin Goh, Chern Han Yong, Limsoon Wong.
Are batch effects still relevant in the age of big data?
Trends in Biotechnology, 40(9):1029--1040, September 2022.
- Wilson Wen Bin Goh, Reuben Jyong Kiat Foo, Limsoon Wong.
What can scatterplots teach us about doing data science better?
International Journal of Data Science and Analytics,
17:111--125, 2024.
DOI:
https://doi.org/10.1007/s41060-022-00362-9.
- Wei Xin Chan, Limsoon Wong.
Accounting for treatment during the development or validation
of prediction models.
Journal of Bioinformatics and Computational Biology,
20(6):2271001, December 2022.
PDF
- Wei Xin Chan, Limsoon Wong.
Obstacles to effective model deployment in healthcare.
Journal of Bioinformatics and Computational Biology,
21(2):2371001, April 2023.
PDF
- Wilson Wen Bin Goh, Harvard Wai Hann Hui, Limsoon Wong.
How missing value imputation is confounded with batch effects and
what you can do about it.
Drug Discovery Today, 28(9):103661, September 2023.
PDF
- Lakshmi Alagappan, Jia En Chu, Joanna Huixin Chua, Jia Wen Ding,
Ronghui Xiao, Zhe Yu, Kun Pan, Untzizu Elejalde, Kevin Junliang Lim,
Limsoon Wong.
Class-specific correction and classification of NIR spectra of
edible oils.
Chemometrics and Intelligent Laboratory Systems,
241:104977, September 2023.
PDF
- Wilson Wen Bin Goh, Mohammad Neamul Kabir, Sehwan Yoo, Limsoon Wong.
Ten quick tips for ensuring machine learning model validity.
PLoS Computational Biology, 20(9):e1012402, September 2024.
PDF
- Xizi Luo, Andre Huikai Lin, Song Yi Amadeus Chi,
Limsoon Wong, Chowdhury Rafeed Rahman.
Benchmarking recent computational tools for DNA-binding
protein identification.
Briefings in Bioinformatics, accepted.
Selected Presentations
- Limsoon Wong.
Some issues that are often overlooked in big data analytics.
Invited talk at University of Malaya Symposium on Data Science,
Kuala Lumpur, Malaysia, 25 October 2016.
PPT
- Limsoon Wong.
A logician-engineer's adventures in data science and analytics.
Invited keynote at International Conference on Intelligent Computing,
Instrumentation & Control Technologies,
Vimal Jyothi Engineering College, Kannur, Kerala, India, 6 - 7 July 2017.
PPT
- Limsoon Wong.
Anna Karenina and the careless null hypothesis in omics data analysis.
Invited talk at IPM- Workshop on Systems Biology,
Institute for Research in Fundamental Sciences,
Tehran, Iran, 2 - 3 August 2017.
PPT
- Limsoon Wong.
Some simple tactics for deriving a deeper analysis of data.
Keynote at 4th International Conference on
Computational Science and Technology,
Kuala Lumpur, Malaysia, 29 - 30 November 2017.
PPT
- Limsoon Wong.
Big data and a bewildered lay analyst.
Talk at BUET-NUS Computer Science Workshop,
Dhaka, Bangladesh, 2 March 2018.
PPT
- Limsoon Wong.
From bewilderment to enlightenment: Logic in cancer research.
Plenary talk at 5th NCIS Annual Research Meeting (NCAM2018),
National University Hospital, Singapore, 3 August 2018.
PPT
- Limsoon Wong.
Dealing with confounders in omics data analysis.
Plenary talk at 14th International Conference on Intelligent Computing,
Wuhan, China, 15-18 August 2018.
- Limsoon Wong.
From bewilderment to enlightenment: Logic in cancer research.
Keynote address at International Multi-Conference on Engineering
and Technology Innovation (IMETI 2018),
Taoyuan, Taiwan, 2-6 November 2018.
PPT
- Limsoon Wong.
Dealing with confounders in omics data analysis.
Invited talk at 9th International Conference on Computational
Systems Biology and Bioinformatics (CSBio2018),
Bangkok, Thailand, December 2018.
PPT
- Limsoon Wong.
From bewilderment to enlightenment in cancer research... hopefully.
Invited talk at GeCo Workshop on Challenges in Data-Driven Genomic
Computing, Villa del Grumello, Como, Italy, 6-8 March 2019.
PPT
- Limsoon Wong.
Some opinion and advice on machine learning in population-based
genomic medicine.
Invited talk at NUS-Cambridge Joint Research Symposium on Population-based
Genomic Medicine,
National University of Singapore,
24-25 October 2019.
PPT
- Limsoon Wong.
Conservative batch-effect correction for single-cell RNA-seq data
enables discovery of rare cell populations.
Invited keynote at 1st International and 10th National Iranian
Conference on Bioinformatics (ICB10), Kish Island, Iran,
22-24 February 2022.
- Limsoon Wong.
Some bad practices in data analysis and machine learning.
Invited keynote at 9th IEEE International Conference on Data
Science and Advanced Analytics (DSAA2022),
Shenzhen, China, 13-16 October 2022.
PPT
- Limsoon Wong.
Single-cell RNA-seq dataset integration without loss of
unique rare cell populations.
Invited talk at 5th China-ASEAN Forum on Health Cooperation,
Nanning, Guangxi, China, 26 - 27 May 2023.
MP4
- Limsoon Wong.
The hidden truths of principal component analysis.
Distinguished lecture at Hong Kong Baptist University,
18 October 2024.
Acknowledgements
This project is supported in part by
a Kwan Im Thong Hood Cho Temple Chair Professorship, and in part by
two AI Singapore grants (AISG-100E-2019-027 and AISG-100E-2019-028)
and a Singapore Ministry of Education tier-2 grant (MOE2019-T2-1-042).
Last updated: 16 November 2024, Limsoon Wong.