Director
Institute of Data Science, National University of Singapore
email: idsbox@ids.nus.edu.sg
Provost's Chair Professor
Department of Computer Science, School of Computing
email: whsu@comp.nus.edu.sg
fax: (65) 6779-4580
· AI in Health Grand Challenge
· Social Media Analytic
· Collaborative Machine Learning
· AI for Customer Service Automation
· GeoVisualization of Spatio-Temporal Disease Spread
· Flagship Project on Ocular Imaging
· SiRIAN: The SiRIAN programme, funded by A*STAR SBIC, focuses on
linking retinal image features with demographic and clinical data for risk
prediction. The project is a collaboration among the Centre for Eye Research
Australia (CERA), I2R and NUS.
Spatio-temporal applications have gained momentum in recent years. The
availability of spatio-temporal databases makes it possible to mine a new
class of rules that capture changes and movements. We have designed and
developed new spatio-temporal rule mining algorithms that capture the trends
and behavior of spatio-temporal data. Related publications can be found here.
The code for mining interval-based patterns is also available for download at
https://dl.dropboxusercontent.com/u/15522119/Sigmod_code.zip.
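To make the idea of a rule that captures movement concrete, here is a minimal sketch in Python. It is not the algorithm from the publications or the downloadable code. It only assumes that each object's trajectory is a time-ordered list of region labels, and it measures how often a hypothetical rule "in region src at time t, in region dst at time t+1" holds. The function name, the region labels and the sample data are all illustrative assumptions.

```python
from collections import Counter


def movement_rule_stats(trajectories, src, dst):
    """Return (support, confidence) of the rule 'src at t -> dst at t+1'.

    trajectories: list of region sequences, one per object, ordered by time.
    support    = fraction of all consecutive steps that go src -> dst
    confidence = fraction of steps starting in src that end in dst
    """
    transitions = Counter()
    for regions in trajectories:
        # Pair each region with the one visited at the next time step.
        transitions.update(zip(regions, regions[1:]))

    total = sum(transitions.values())
    from_src = sum(n for (a, _), n in transitions.items() if a == src)
    hits = transitions[(src, dst)]

    support = hits / total if total else 0.0
    confidence = hits / from_src if from_src else 0.0
    return support, confidence


# Hypothetical trajectories over three regions A, B and C.
trajectories = [
    ["A", "B", "C"],
    ["A", "B", "B"],
    ["A", "C", "B"],
    ["B", "C", "A"],
]
support, confidence = movement_rule_stats(trajectories, "A", "B")
```

A real spatio-temporal miner would search over many candidate rules and prune by thresholds on such measures; this sketch evaluates a single rule only.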
A retinal
image provides a window into what is happening inside the human body. In
particular, changes in the vascular structure of retinal images have been shown
to accurately reflect the cardiovascular state of the body. The project aims
to extract the vascular structure from 2-dimensional digital retinal images
and tag it with customized XML tags, so that physicians can query the changes
that have occurred in the retinal images. An automated spatio-temporal
miner will be designed to highlight the interesting changes that occur in these
vascular structures. Related publications can be found here.
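The tagging-and-querying idea can be sketched with Python's standard xml.etree.ElementTree module. The tag names (retina, vessel), the attributes (id, type, width_px) and the sample measurements are hypothetical, since the project's actual XML schema is not described here. The sketch assumes vessel segments have already been extracted and matched across two visits.

```python
import xml.etree.ElementTree as ET


def tag_vessels(image_id, segments):
    """Build an XML tree describing vessel segments extracted from one image.

    segments: list of dicts with hypothetical keys 'id', 'type', 'width_px'.
    """
    root = ET.Element("retina", {"image": image_id})
    for seg in segments:
        ET.SubElement(root, "vessel", {
            "id": seg["id"],
            "type": seg["type"],
            "width_px": str(seg["width_px"]),
        })
    return root


def width_changes(before, after, min_delta):
    """Return {vessel id: width change} where |change| >= min_delta."""
    old = {v.get("id"): float(v.get("width_px")) for v in before.iter("vessel")}
    changes = {}
    for v in after.iter("vessel"):
        vid = v.get("id")
        if vid in old:
            delta = float(v.get("width_px")) - old[vid]
            if abs(delta) >= min_delta:
                changes[vid] = delta
    return changes


# Hypothetical measurements from two visits of the same patient.
visit1 = tag_vessels("visit1", [
    {"id": "v1", "type": "arteriole", "width_px": 12.0},
    {"id": "v2", "type": "venule", "width_px": 15.0},
])
visit2 = tag_vessels("visit2", [
    {"id": "v1", "type": "arteriole", "width_px": 9.5},
    {"id": "v2", "type": "venule", "width_px": 15.5},
])
changes = width_changes(visit1, visit2, min_delta=2.0)
```

Here only vessel v1 is reported, because its width changed by more than the chosen threshold between the two visits.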
This
is an I2R-SoC joint research project, funded by A*STAR,
aimed at developing new knowledge discovery technologies for biological and
clinical data. A suite of "challenge" databases and knowledge discovery
systems for selected problems in biological and clinical data analysis has
been constructed. Among them, the work on protein-protein interaction network
reliability and motif finding, called IRAP, is available for free
download here.
Data
cleaning refers to a series of processes used to improve data quality. Existing
approaches to detecting and correcting defective data are highly manual,
tedious and incomplete, and focus primarily on a small subset of variables
within a database. In many biomedical applications, the linkages among various
data repositories, such as biobanks, clinical data,
risk factors, clinical outcomes and imaging data, provide a rich source of
knowledge for identifying likely erroneous data or records. This project will
adopt a holistic approach that leverages these data linkages to
identify data artifacts. We will use
data mining techniques to discover the context, trends and correlations in the
data. The objective is to improve data quality so that analysis is more
accurate and errors do not propagate.
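As a simple illustration of using linkages between repositories to spot likely errors, the sketch below compares fields that two linked sources should agree on and flags disagreements. It is an assumption-laden toy example, not the project's method: the record layout, the patient_id key and the field names are hypothetical, and a real system would also use learned context, trends and correlations rather than exact equality.

```python
def flag_inconsistent(clinical, imaging, key, fields):
    """Flag field values that disagree between linked records.

    Returns a list of (record key, field, clinical value, imaging value).
    """
    imaging_by_key = {rec[key]: rec for rec in imaging}
    flagged = []
    for rec in clinical:
        linked = imaging_by_key.get(rec[key])
        if linked is None:
            continue  # no linked record, so nothing to compare
        for field in fields:
            if field in rec and field in linked and rec[field] != linked[field]:
                flagged.append((rec[key], field, rec[field], linked[field]))
    return flagged


# Hypothetical records from two linked repositories.
clinical = [
    {"patient_id": "P1", "birth_year": 1960, "sex": "F"},
    {"patient_id": "P2", "birth_year": 1975, "sex": "M"},
    {"patient_id": "P3", "birth_year": 1982, "sex": "F"},
]
imaging = [
    {"patient_id": "P1", "birth_year": 1960, "sex": "F"},
    {"patient_id": "P2", "birth_year": 1957, "sex": "M"},
]
flagged = flag_inconsistent(clinical, imaging, "patient_id", ["birth_year", "sex"])
```

The check only says that the two sources disagree about P2's birth year; deciding which value is wrong needs further evidence, which is where additional linked sources help.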
RETINA
is a joint collaboration between the National Healthcare Group Polyclinics,
Images
are a powerful means of conveying information to humans. As a result, many
real-life applications involve processing and analyzing a large number of
images. Despite the widespread use of images, there are no effective
techniques for mining interesting patterns from them. In this project, we
investigate the unique characteristics of image data and design algorithms to
automatically discover interesting image patterns. This project is funded by
the Academic Research Fund at the National University of Singapore.
Data
mining has been recognized as an important technology for businesses
internationally. Locally, there are many companies in
My publications from the DBLP Bibliography Server
AY2021/2022
· CS6220 Advanced Topics in Data Mining (Causal Inference)
AY2019/2020
· CS6216 Advanced Topics in Machine Learning (Knowledge Graphs)
AY2019/2020
· CS6216 Advanced Topics in Machine Learning (Knowledge Representation)
· CS2309 CS Research Methodology
· CS1010 Introduction to Programming Methodology
· CS6208 Advanced Topics in Artificial Intelligence
· CS6220 Advanced Topics in Data Mining
· CS1010 Introduction to Programming Methodology
· CS6220 Advanced Topics in Data Mining
· CS1101Y Introduction to Programming Methodology
· CS5228 Knowledge Discovery in Databases