Can LLMs Enable Verification in Mainstream Programming?


Abstract

Although formal methods can produce reliable software, they have seen minimal adoption in everyday programming. Automatic code generation using large language models (LLMs) is becoming increasingly widespread, but it rarely aims to provide strong correctness guarantees. In this study, we explore the ability of LLMs to produce verified code in three verification languages (Dafny, Nagini, and Verus). To do so, we use manually curated datasets derived from HumanEval, the state-of-the-art Python benchmark. We also assess what types of information are sufficient to achieve good-quality results.


Author and article information


Publication date: Created 18 March 2025

Article

arXiv ID: 2503.14183

SO-VID: d4792f34-adad-4dfb-b228-ef02ad56b8bc

License: http://creativecommons.org/licenses/by/4.0/


Custom metadata

Categories: cs.SE, cs.AI, cs.PL

ScienceOpen disciplines: Software engineering, Programming languages, Artificial intelligence


Related collections

Smart Contracts Programming Languages
