A Git-like end-to-end ML life-cycle management system
View our publications in ICDE 2021 and VLDB 2023:
MLCask: Efficient Management of Component Evolution in Collaborative Data Analytics Pipelines
Enabling Secure and Efficient Data Analytics Pipeline Evolution with Trusted Execution Environment
MLCask is a Git-like end-to-end ML life-cycle management system. In real-world machine learning (ML) applications, maintaining an ML pipeline in a collaborative environment is both important and challenging: retraining is frequent and costly, and different users update components asynchronously. MLCask supports both linear and non-linear version control semantics for efficient management of ML pipelines.
MLCask abstracts the ML life-cycle with two key concepts: component and pipeline. A component is any computational unit in the ML pipeline, while a pipeline is the minimal unit that represents an ML task.
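To make the component and pipeline concepts concrete, here is a minimal Python sketch. It is not MLCask's actual API: the class names `Component` and `Pipeline`, the `with_component` helper, and the `parents` field are all hypothetical, chosen only to illustrate how versioned components could be composed into a pipeline whose history is tracked Git-style.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Component:
    """Hypothetical: one versioned computational unit (e.g. cleaning, training)."""
    name: str
    version: str
    run: Callable[[Any], Any]


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical: an ordered chain of components representing one ML task."""
    components: Tuple[Component, ...]
    # Parent pipeline versions: none for a root, one for a linear update.
    # A merge of two branches would record two parents (not shown here).
    parents: Tuple["Pipeline", ...] = ()

    def execute(self, data: Any) -> Any:
        # Feed the output of each component into the next one.
        for component in self.components:
            data = component.run(data)
        return data

    def with_component(self, new: Component) -> "Pipeline":
        # Swap in a new version of the component with the same name and
        # record the current pipeline as the parent of the result.
        replaced = tuple(new if c.name == new.name else c for c in self.components)
        return Pipeline(replaced, parents=(self,))


clean_v1 = Component("clean", "0.1", lambda xs: [x for x in xs if x is not None])
scale_v1 = Component("scale", "0.1", lambda xs: [x / 10 for x in xs])
base = Pipeline((clean_v1, scale_v1))

# A user updates only the scaling component; the cleaning component is reused.
scale_v2 = Component("scale", "0.2", lambda xs: [x / 100 for x in xs])
updated = base.with_component(scale_v2)
```

In this sketch, two users who each call `with_component` on `base` would produce two pipelines sharing one parent, which is the kind of branching history that non-linear version control has to handle.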