GEMINI is a GEneralizable Medical Information aNalysis and Integration Platform. Its objective is to provide an integrative healthcare analytics system that can address a wide range of healthcare problems. The systems integrated in GEMINI cover the data acquisition, data cleaning, data integration, data processing, data analytics, and data visualization steps of the big data analytics pipeline.
Figure 1: Overview of GEMINI
Our GEMINI platform is scalable, flexible, and easy to use. When raw data is first fed into GEMINI, DICE, our data cleaning and integration system, cleans it. DICE cooperates with CDAS, our crowdsourcing platform, which selects the best questions for doctors to answer in order to improve data cleaning quality. Next, the cleaned data is passed to epiC, a big data processing system that extracts the relevant patients and features for subsequent processing. The EMR-T system then transforms the EMR data into a form that machine learning and deep learning models can process. The output of EMR-T is fed into SINGA, our deep learning platform, for analytics. In parallel, the data can be fed into CohAna for cohort analysis. In the last step, we use iDat to visualize the analytic results. In terms of infrastructure, we designed ForkBase to manage the underlying storage, while a CPU-GPU cluster supports and accelerates the training of deep learning models.
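The data flow described above can be pictured as a chain of stages. The following Python sketch is only an illustration of that ordering: the component names come from the text, but the Stage class, the tag helper, and run_pipeline are hypothetical placeholders and do not correspond to the actual interfaces of DICE, epiC, EMR-T, SINGA, or iDat. CohAna is left out of the chain because the text describes it as a parallel branch without stating where it attaches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    system: str                      # GEMINI component name, taken from the text
    role: str                        # what the text says the component does
    run: Callable[[Dict], Dict]      # hypothetical placeholder, not a real API


def tag(system: str) -> Callable[[Dict], Dict]:
    """Return a placeholder step that only records that `system` was visited."""
    def _run(data: Dict) -> Dict:
        return {**data, "trace": data.get("trace", []) + [system]}
    return _run


# Main path as described in the text. CohAna (cohort analysis) runs in
# parallel; its branch point is not specified, so it is omitted here.
MAIN_PATH: List[Stage] = [
    Stage("DICE", "data cleaning and integration, assisted by CDAS", tag("DICE")),
    Stage("epiC", "extract relevant patients and features", tag("epiC")),
    Stage("EMR-T", "transform EMR data into model-ready form", tag("EMR-T")),
    Stage("SINGA", "deep learning analytics", tag("SINGA")),
    Stage("iDat", "visualize analytic results", tag("iDat")),
]


def run_pipeline(raw: Dict, stages: List[Stage]) -> Dict:
    """Pass the data through each stage in order and return the final result."""
    data = raw
    for stage in stages:
        data = stage.run(data)
    return data


result = run_pipeline({"records": []}, MAIN_PATH)
print(" -> ".join(result["trace"]))
```

Each placeholder step simply appends its component name to a trace, so the printed line shows the order in which the stages are visited. ForkBase and the CPU-GPU cluster are infrastructure beneath these stages and are therefore not modeled as steps.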