Gregory Kang Ruey Lau
Department of Computer Science, National University of Singapore
I am a PhD student in the School of Computing at NUS, advised by Bryan Kian Hsiang Low and supported by the AI Singapore-CNRS@Create Descartes Joint PhD Scholarship.
My research adopts a data-centric approach to tackling critical bottlenecks in the practical deployment of AI systems. I anchor my work around a basic question: what is the impact of each data point on model behavior? By developing principled methods in algorithmic data selection and data provenance, I aim to lay the data-centric foundations for autonomous AI systems capable of driving the next generation of scientific discovery.
Previously, I completed my Bachelor of Science in Physics and in Economics at MIT, where I worked with Wolfgang Ketterle, Eric Hudson and Dave Donaldson. I also obtained my Master of Finance at MIT Sloan and Master of Business Administration at Quantic. Before starting my PhD, I was a policymaker in the Singapore government, leading efforts in diverse areas such as data strategy, labour market policy, industry development, and social policy. I also spent some time as an entrepreneur, working on tech start-ups focused on education and career development.
Here is my CV. Please reach out if you are interested in collaborating!
news
| Date | Announcement |
|---|---|
| Jun 2, 2026 | My co-first authored paper, Watershed: A Unified Benchmark for End-to-End Data Provenance Evaluation, got accepted to the ICML 2026 AI4Good Workshop. |
| May 26, 2026 | My co-first authored paper, TIGER: Bridging the Multimodal Reasoning-Access Gap via Modality Counterfactuals, got accepted to the ICML 2026 Foundations of Deep Generative Models Workshop (FoGen). |
| May 26, 2026 | The paper Rethinking Bayesian Optimization for Co-Optimizing LLM Training Configurations, which I co-authored, has been accepted to the ICML 2026 Decision-making From Offline Datasets to Online Adaptation (DEMO) Workshop as an oral paper. |
| Jan 26, 2026 | My co-first authored paper WaterDrum: Watermark-based Data-centric Unlearning Metric got accepted to ICLR 2026. |
| Jan 26, 2026 | The paper DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks, which I co-authored, has been accepted to ICLR 2026. |
| Dec 25, 2025 | My co-first authored paper README: Rapid Equation Discovery with Multimodal Encoders got accepted to the NeurIPS 2025 AI4Science workshop. |
| Sep 20, 2025 | The position paper Position Paper: Uncover Scaling Laws for Large Language Models via Inverse Problems, which I co-authored, was accepted to Findings of EMNLP 2025. |
| Sep 20, 2025 | My co-first authored paper, Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks, got accepted to EMNLP 2025. |
| Sep 6, 2025 | I am visiting the University of Washington from Sep-Dec 2025. |
| Jul 30, 2025 | I received the NUS School of Computing Research Achievement Award, which is awarded to PhD students who have achieved outstanding research performance over the past academic year. |
| Jul 9, 2025 | My co-first authored paper, README: Rapid Equation Discovery with Multimodal Encoders, got accepted to the ICML 2025 AI4Math Workshop. |
| Jul 1, 2025 | My co-first authored paper, Uncertainty Quantification for MLLM, got accepted to the ICML 2025 Workshop on Reliable and Responsible Foundation Models (R2-FM’25). |
| Jun 11, 2025 | My co-first authored paper, WaterDrum: Watermarking for Data-centric Unlearning Metric, got accepted to the ICML 2025 Workshop on Machine Unlearning for Generative AI (MUGen’25). |
| Jun 6, 2025 | I am visiting the University of Oxford Department of Statistics from Jun-Aug 2025. |
| Apr 9, 2025 | My co-first authored paper, PIED: Physics-Informed Experimental Design For Inverse Problems, got accepted to the AI4X 2025 conference for an oral presentation. |
| Mar 6, 2025 | The paper DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks, which I co-authored, has been accepted to the ICLR 2025 Workshop on Data Problems for Foundation Models (DATA-FM). |
| Mar 5, 2025 | My co-first authored paper, Uncertainty Quantification for MLLMs, got accepted to the ICLR 2025 Quantify Uncertainty and Hallucination in Foundation Models (QUESTION) workshop. |
| Jan 21, 2025 | My co-first authored paper PIED: Physics-Informed Experimental Design for Inverse Problems got accepted to ICLR 2025. |
| Oct 18, 2024 | I received the EMNLP 2024 D&I Award. |
| Oct 9, 2024 | My co-first authored paper, Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks, got accepted to the NeurIPS MINT 2024 workshop. |
| Sep 20, 2024 | My co-first authored paper, Waterfall: Framework for Robust and Scalable Text Watermarking, got accepted to EMNLP 2024. |
| Sep 20, 2024 | The position paper Data-centric AI in the Age of Large Language Models, which I co-authored, was accepted to Findings of EMNLP 2024. |
| Aug 5, 2024 | I received the NUS School of Computing Research Achievement Award, which is awarded to PhD students who have achieved outstanding research performance over the past academic year. |
| Jul 26, 2024 | PINNACLE won the Best Paper Award (out of 225 submissions) at the ICML 2024 AI4Science workshop. |
| Jul 3, 2024 | My co-first authored paper, Waterfall: Framework for Robust and Scalable Text Watermarking, got accepted to the ICML2024-FM-Wild workshop. |
| Jun 27, 2024 | I was one of the 3 CS PhD students selected for the NUS School of Computing Teaching Fellowship Scheme award, which is given to those with excellent performance as a tutor. |
| Jun 19, 2024 | My co-first authored paper, Protecting Text IP in the Era of LLMs with Robust and Scalable Watermarking, got accepted to the ICML2024-GenLaw workshop. |
| Jun 17, 2024 | Two of my co-first authored papers got accepted to the ICML2024-AI4Science workshop: PINNACLE: PINN Adaptive ColLocation and Experimental points selection (oral) and PIED: Physics-Informed Experimental Design For Inverse Problems. |
| Jan 15, 2024 | My co-first authored paper PINNACLE: PINN Adaptive ColLocation and Experimental points selection got accepted to ICLR 2024 as a spotlight presentation. |
| Dec 22, 2023 | I passed my PhD Qualifying Examinations. |
selected works
- [ICML Workshop] Protecting Text IP in the Era of LLMs with Robust and Scalable Watermarking. In ICML 2024 Workshop on Generative AI and Law, 2024.
- [ICML Workshop] TIGER: Bridging the Multimodal Reasoning-Access Gap via Modality Counterfactuals. In ICML 2026 Foundations of Deep Generative Models Workshop (FoGen), 2026.