Our code repository currently contains the implementation of E3, a programming framework that simplifies scalable processing of heterogeneous data on large clusters. The key feature of E3 is that, instead of forcing the analyst to write the whole analytical program against a single interface (as in MapReduce), it lets the analyst reuse existing data-analysis programs to process each type of data set and coordinate those programs to produce the final results.
E3 introduces an actor-like concurrent programming model to achieve this goal. An actor is an independent execution unit that processes a fixed number of input messages and produces output messages for other actors to process further. With E3, analysts simply run their data-analysis programs inside a set of actors and coordinate the actors' parallel execution through message passing. Currently, we have a working E3 core runtime system and MapReduce extensions for running MapReduce programs inside actors.
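To make the model concrete, here is a minimal Python sketch of actor-style coordination, assuming an actor fires exactly once after it has received its fixed number of input messages. The `Actor` class and `run` function are hypothetical illustrations, not E3's actual API; in each round, all ready actors are run concurrently on a thread pool to mimic parallel execution.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Actor:
    """Hypothetical actor: waits for `num_inputs` messages, then runs `fn` once."""
    name: str
    num_inputs: int
    fn: Callable[[List[Any]], Any]
    targets: List[str] = field(default_factory=list)  # downstream actors
    inbox: List[Any] = field(default_factory=list)

    def ready(self) -> bool:
        return len(self.inbox) >= self.num_inputs


def run(actors: Dict[str, Actor], initial: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Deliver the initial messages, then fire ready actors round by round."""
    for name, msgs in initial.items():
        actors[name].inbox.extend(msgs)
    results: Dict[str, Any] = {}
    pending = dict(actors)
    with ThreadPoolExecutor() as pool:
        while True:
            ready = [a for a in pending.values() if a.ready()]
            if not ready:
                break
            # Run every ready actor in parallel on exactly num_inputs messages.
            futures = {a.name: pool.submit(a.fn, a.inbox[:a.num_inputs]) for a in ready}
            for name, fut in futures.items():
                out = fut.result()
                results[name] = out
                del pending[name]
                # The output becomes a message for each downstream actor.
                for target in actors[name].targets:
                    actors[target].inbox.append(out)
    return results


def count_words(msgs: List[Any]) -> Counter:
    # Stand-in for an existing per-dataset program: the message is a list of lines.
    lines = msgs[0]
    return Counter(w for line in lines for w in line.split())


def merge(msgs: List[Any]) -> Counter:
    # Combine the partial counts produced by the upstream actors.
    return sum(msgs, Counter())


actors = {
    "count_a": Actor("count_a", 1, count_words, ["merge"]),
    "count_b": Actor("count_b", 1, count_words, ["merge"]),
    "merge": Actor("merge", 2, merge),
}
results = run(actors, {"count_a": [["a b", "b c"]], "count_b": [["c c a"]]})
print(results["merge"])
```

Here two counting actors stand in for existing programs that each handle one data set, and the merge actor fires only once both of their output messages have arrived.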
The video shows a tool we developed to visualize the distribution of Starhub mobile users in Singapore over one day. Darker colors indicate areas where more users are gathered. The tool helps us observe spatial and temporal features of user movement, which can assist further research such as travel-mode detection and congestion prediction.