NUS Computing Associate Professor Ilya Sergey and students won an ACM SIGPLAN Distinguished Paper Award at the 44th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).
PLDI is the leading forum in the field of programming languages and programming systems research, covering areas of design, implementation, theory, applications and performance.
Prof Sergey, along with PhD student Kiran Gopinathan and year 4 Computer Science undergraduate Mayank Keoliya, presented their paper titled “Mostly Automated Proof Repair for Verified Libraries,” at the conference on 21 June 2023. The paper received praise from PLDI reviewers, in particular, because “it targets the Achilles’ heel of Interactive Theorem Proving.”
“Mostly Automated Proof Repair for Verified Libraries” delves deep into one of the most pressing yet unsolved challenges in formal software verification: how to make sure that the formal proofs for once verified code remain valid as the code evolves—a problem known as proof repair.
Prof Sergey illustrates a scenario: “Imagine you have implemented a function (e.g., sorting an array) and verified it against some specification (e.g., that it returns a permutation of the original array that is sorted). Sometime later, you changed the program (e.g., to make it consume less memory). Its specification remained the same, but you want the proof to be automatically updated. However, keeping machine-checked proofs valid requires a lot of care.” While this is a familiar scenario to many programmers, it is a persistent problem that needs to be resolved before formal verification becomes truly mainstream.
To address this problem, the team looked into an interactive proof assistant, Coq. Coq can verify libraries against rigorous logical specifications by allowing developers to write so-called proof scripts. Although Coq can ensure that human-written proof scripts represent mathematically valid proofs of program correctness, it offers no automated support to keep them valid in case if one changes a program.
Through their study, the team discovered they can apply program synthesis and program testing techniques to facilitate the maintenance of Coq proofs for evolving code. “One can efficiently synthesise parts of the new proof and then test them for validity by converting lemmas (i.e., logical facts that the new proof most likely would rely upon) into testing procedures—the novel idea, which we call proof-driven testing” explained Prof Sergey.
The team then developed Sisyphus: a mostly automated proof repair tool for code written higher-order programming language OCaml and verified in Coq. Sisyphus was evaluated on 10 widely used OCaml programs from popular open-source repositories (which provided the history of those programs’ evolution) and returned with positive results: it was able to reconstruct the proofs of the programs’ updated versions with very small amount of human-provided “hints”. The implementation of Sisyphus has also been evaluated by PLDI Artefact Evaluation Committee and awarded the “reusable artefact” badge, which is the highest degree of artefact quality.
“The idea for this work came from our past effort on certified program synthesis, in which we have shown how to synthesise correct-by-construction programs along with their correctness proofs from formal specifications. Yet, those proofs would not be robust if someone decided to modify the program,” shared Prof Sergey.
“Implementing Sisyphus has been a truly Herculean effort, spearheaded by Kiran Gopinathan for more than a year, with the help of Mayank Keoliya. In particular, Kiran takes credit for discovering the idea of proof-driven testing and for implementing most of the Sisyphus’ core functionality, while Mayank was responsible for developing novel proof synthesis algorithms and evaluating the tool on the industrial case studies,” he added on the scale of the project.
Prof Sergey also leads the VERSE research lab at the School of Computing. The lab is looking for interns and collaborators who are interested in applying programming language techniques for building trustworthy software systems.