Computational Biology

The modern biology researcher faces vast amounts of data produced by high-throughput analytical technologies for DNA, RNA, and proteins.​

This rich and complex mix of data is also confounded by a variety of biological and nonbiological factors, which makes it difficult, inefficient, and inaccurate to draw the right research conclusions.

By understanding and exploiting properties of the underlying biology, instruments, technologies, and experiment designs, we develop advanced methods to process and analyse these large amounts of complex, biological data – which help solve problems in biology, biotechnology, and medicine.

What We Do

Develop effective and efficient algorithmic techniques used in the acquisition, storage, analysis, and dissemination of biological data.
Design innovative and elegant computational approaches to decipher the structure, function, and behaviour of cells, and explore how changes in these affect phenotypes.

Sub Areas

Our Research Projects

Protein folding and protein structure prediction

ZHANG Yang

Deciphering the structure and function of proteins is pivotal in modern biology and medicine. Our work focuses on pioneering computational methods, integrating advanced AI techniques and physics-based force fields, to precisely model protein structure and function. A key objective is uncovering the fundamental interplay between protein sequence, structure, and function.

  • Bioinformatics Algorithms

AI-based protein design and drug discovery

ZHANG Yang

Nature proteins exhibit limited structural folds and functions, shaped over billions of years of evolution. This project seeks to leverage AI and deep learning to craft novel protein sequences, surpassing natural constraints. The computationally designed proteins and peptides hold promise as drugs, offering innovative treatments for diverse human diseases, including cancer and Alzheimer's.

  • Bioinformatics Algorithms

Enabling more sophisticated proteomic profile analysis

WONG Lim Soon

Quantitative comparison of samples is central to proteomics. However, biomarkers identified in one batch are quite often not consistent and not reproducible in another batch of samples. We developed techniques based on biological networks to more reproducibly and consistently identify biomarkers and achieve more reliable proteomic-based diagnosis.


From iteration on multiple collections in synchrony to fast general interval joins

WONG Lim Soon

Synchrony iterator captures a programming pattern for synchronized iterations. It is a conservative extension that enhances the repertoire of algorithms expressible in comprehension syntax. In particular, efficient general synchronized iterations, e.g. linear-time algorithms for low-selectivity database non-equijoins, become expressible naturally in comprehensinon syntax.

  • TRL 4

Recovering missing proteins based on biological complexes

WONG Lim Soon

We propose a novel ranking strategy for missing-protein recovery based on protein complexes. We postulate that protein complexes provide a good context for making inference of a protein's presence and its abundance. Notably, it is applicable for predicting whether a protein is present even when there is only one sample.

  • TRL 4

Dealing with confounders in omics analysis

WONG Lim Soon

Universality and reproducibility problems are commonly encountered in analyzing omics data due to etiology and human variability, but also batch effects, poor experiment design, inappropriate sample size, and misapplied statistics. Here, we explore a deeper rethink on the mechanics of applying statistical tests, and design analysis techniques that are robust on omics data.


Transcription factor interaction prediction and classification

WONG Lim Soon

Regulatory mechanisms often involve several transcription factors (TF), binding together and attaching to the DNA as a single complex. But only a fraction of the regulation partners of each TF is currently known. We developed techniques for predicting the physical interaction between TFs, as well as for predicting the nature of their interactions (i.e. co-operative, competitive, or others).


Our Research Groups