I study the search, extraction, collection and cleaning, integration and fusion, analysis, anonymization and dissemination of data. This is a non-exhaustive list of projects that I have led.
Privacy Risk Assessment and Prevention
Data is a commodity. The availability of data from social and mobility networks is exploding due to the prevalence of mobile devices such as cell phones and tablets, connected things in the Internet of Things, and the corresponding services. Users and objects are constantly connected and interacting. Social and mobility traces are routinely collected at a large scale, for example, by cellular network operators, location-based services and location-enabled social network platforms. While these data have great potential value, their careless trading threatens the privacy and security of individuals and organizations. Are there ways to guarantee that data published or shared do not inadvertently breach privacy? The statistics and the computer science research and engineering communities have long been aware of the issue and have been working on it. Yet the problem has been exacerbated by the emergence of a data-driven economy in the late 1990s and early 2000s. The seminal work of Latanya Sweeney in 2002 on k-anonymity triggered a series of research efforts to propose various notions of anonymity and to devise anonymization algorithms. These efforts culminated in the work of Cynthia Dwork and her co-authors in 2006 on differential privacy and continue today, fuelled by the increasing need to publish and share data.
We study privacy risk and we devise prevention, protection and access control mechanisms.
Environment Sustainability Solutions for Megacities
The "Environment Sustainability Solutions for Megacities programme" (E2S2) is a research programme supported under the Campus for Research Excellence and Technological Enterprise (CREATE) framework of the Singapore National Research Foundation. E2S2 is a collaboration between Shanghai Jiao Tong University and the National University of Singapore to study sustainable solutions for coupled problems targeted at already-stressed megacities and to serve as inputs for strategic policy making and near real-time environmental monitoring and response.
Liánchéng is a Community Cloud. It has been designed, implemented and deployed to serve the community of scientists collaborating in the E2S2 programme. Liánchéng combines two main functionalities: a file sharing service and a Workflow-as-a-Service model.
Each registered Liánchéng user has a private account to which she can upload, and in which she can download, organize and manage files, either through a Web interface or seamlessly via a synchronization agent on her personal computer. The main sharing mechanism is a simple directory Access Control List. When a user decides to share a directory, she decides which users and groups have which access to its contents. There are three mechanisms for accessing files from outside the system: basic access, public randomized access and published access. Under basic access, only the file owner can access her files, using her credentials. Under public randomized access and published access, the file is accessible without access control through a randomized URL or a human-readable URL, respectively. In both cases, the URL can be revoked by its creator at any time.
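The sharing mechanisms described above can be illustrated with a minimal sketch. It is not the Liánchéng implementation: the class names `Directory` and `PublicLinks`, the right names `"read"` and `"write"`, and the token format are all assumptions made for illustration.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Directory:
    """A shared directory with a simple Access Control List (hypothetical model)."""
    owner: str
    # Maps a user or group name to the set of rights granted on this directory.
    acl: Dict[str, Set[str]] = field(default_factory=dict)

    def share(self, principal: str, rights: Set[str]) -> None:
        """Grant the given rights to a user or a group."""
        self.acl[principal] = set(rights)

    def can(self, user: str, groups: Set[str], right: str) -> bool:
        """The owner has every right; others need a grant, directly or via a group."""
        if user == self.owner:
            return True
        return any(right in self.acl.get(p, set()) for p in {user} | groups)


class PublicLinks:
    """Revocable URLs that give access without access control (hypothetical model)."""

    def __init__(self) -> None:
        self._links: Dict[str, Tuple[str, str]] = {}  # token -> (creator, path)

    def create_randomized(self, creator: str, path: str) -> str:
        """Public randomized access: an unguessable token stands in for the URL."""
        token = secrets.token_urlsafe(16)
        self._links[token] = (creator, path)
        return token

    def create_published(self, creator: str, path: str, name: str) -> str:
        """Published access: a human-readable name chosen by the creator."""
        if name in self._links:
            raise ValueError("name already in use: " + name)
        self._links[name] = (creator, path)
        return name

    def resolve(self, token: str) -> Optional[str]:
        """Return the file path behind a link, or None if unknown or revoked."""
        entry = self._links.get(token)
        return entry[1] if entry else None

    def revoke(self, user: str, token: str) -> bool:
        """Only the creator of a link may revoke it."""
        entry = self._links.get(token)
        if entry is None or entry[0] != user:
            return False
        del self._links[token]
        return True
```

In this sketch a revoked link simply stops resolving, which matches the statement that the creator can withdraw public access at any time.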
The Workflow-as-a-Service (WaaS) service model, as we define it here, is intended to fill the gap between programmable IaaS and turnkey SaaS. While IaaS and PaaS are fully customizable, they require computing skills and efforts that constitute unnecessary obstacles. SaaS, while easy and immediate to use, may not provide a sufficient level of customization. WaaS strikes a compromise by offering a workflow language to compose domain-specific operators. A user interactively designs a workflow in a Web graphical editor. The interaction and visualization are reminiscent of those of Yahoo Pipes and other graphical workflow design software. Workflows are first-class objects and are stored, managed and shared in the same way as files. A Liánchéng workflow is a directed acyclic graph whose vertices represent operators and whose edges represent data flow. An operator can have an arbitrary number of parameters and has at least one input or output interface, which we call a hook. By convention, the input hooks are drawn above and the output hooks are drawn below. The user drags connectors from one hook to the next. A workflow can be abstracted into a single operator for convenience and reuse. For each application domain that Liánchéng supports, a toolbox of operators is available. For instance, Liánchéng supports operators from the public Mothur Toolbox for metagenomics applications, from the proprietary Optiliner and Assist toolboxes for maritime applications, etc.
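The workflow model described above, a directed acyclic graph of parameterized operators connected through hooks, can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the Liánchéng workflow language: the `Workflow` class and its methods are hypothetical, each operator is a plain Python function with a single output hook, and its input hooks are filled in the order in which connectors were added.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Workflow:
    """A directed acyclic graph of operators (hypothetical minimal model)."""

    def __init__(self):
        self.operators = {}  # operator name -> (function, parameters)
        self.inputs = {}     # operator name -> upstream operator names, in hook order

    def add_operator(self, name, func, **params):
        """Register an operator; keyword arguments are its parameters."""
        self.operators[name] = (func, params)
        self.inputs.setdefault(name, [])

    def connect(self, source, target):
        """Drag a connector from the output hook of source to the next input hook of target."""
        if source not in self.operators or target not in self.operators:
            raise KeyError("both operators must be added before connecting them")
        self.inputs[target].append(source)

    def run(self):
        """Execute operators in dependency order and return every operator's output."""
        results = {}
        # static_order() yields upstream operators first and raises
        # graphlib.CycleError if the graph is not acyclic.
        for name in TopologicalSorter(self.inputs).static_order():
            func, params = self.operators[name]
            args = [results[src] for src in self.inputs[name]]
            results[name] = func(*args, **params)
        return results

    def as_operator(self, output):
        """Abstract the whole workflow into one operator (here without input hooks)."""
        return lambda: self.run()[output]


# Usage: a three-operator pipeline, then reused as a single operator elsewhere.
wf = Workflow()
wf.add_operator("load", lambda: [3, 1, 2])
wf.add_operator("sort", lambda xs: sorted(xs))
wf.add_operator("top", lambda xs, n: xs[:n], n=2)
wf.connect("load", "sort")
wf.connect("sort", "top")

outer = Workflow()
outer.add_operator("inner", wf.as_operator("top"))
outer.add_operator("total", lambda xs: sum(xs))
outer.connect("inner", "total")
```

Relying on a topological order is what makes the acyclicity requirement matter: a cycle would leave some operator without a well-defined set of already-computed inputs.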
Maritime Energy Efficiency
International shipping is a modest contributor to global Greenhouse Gas (GHG) emissions, responsible for approximately 2.7% (i.e. 870 million tons) of global CO2 emissions in 2007 (UNEP, 2009). Yet, as the world economy's reliance on the global trade of goods, materials, and petroleum continues to rise, the shipping sector's CO2 emissions are expected to climb to between 2500 and 3650 million tons by 2050 (UNEP, 2009). Here, the environmental and sociological objective of reducing emissions meets the economic and sociological objective of optimizing energy efficiency.
The Automatic Identification System (AIS) is a system of transceivers and receivers together with a data exchange protocol used on ships and by base stations for identification, location and information exchange. Vessels broadcast their identity together with information about their status, speed, heading, draught, etc. over the airwaves. A simple VHF receiver is able to listen to AIS data streams from ships within around 40 nautical miles. The IMO SOLAS Agreement and related agreements currently require more than 250,000 vessels, large and small, to use AIS transceivers. Further regulations may generalize the practice. AIS was initially designed for communication with nearby ships and base stations, but the development of the Internet and the emergence of Satellite AIS make it a candidate global ship monitoring and tracking system as well as a potentially general medium for e-navigation information exchange.
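To give a feel for the reception range mentioned above, the following sketch checks whether a vessel's broadcast position lies within a given distance of a receiver, using the haversine great-circle formula. It is only an illustration: the function names are hypothetical, the 40 nautical mile default is the approximate figure quoted above, and a fixed radius ignores antenna height and propagation conditions, which determine the actual range.

```python
import math

# Mean Earth radius, converted from kilometres to nautical miles (1 nm = 1.852 km).
EARTH_RADIUS_NM = 6371.0 / 1.852


def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def in_range(receiver, vessel, max_range_nm=40.0):
    """True if the vessel position (lat, lon) is within max_range_nm of the receiver."""
    return distance_nm(*receiver, *vessel) <= max_range_nm
```

With a receiver at latitude 1.0 and longitude 104.0, a vessel half a degree of latitude further north is about 30 nautical miles away and counts as in range, while one a full degree north, about 60 nautical miles away, does not.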
We design and implement an infrastructure leveraging AIS and we devise modelling, simulation, analysis and optimization techniques and tools for emission monitoring, control and optimization of energy consumption.