Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Karl Tamberg, Hayretdin Bahsi

Research output: Contribution to journal › Article › peer-review

Abstract

Despite the various approaches employed to detect software vulnerabilities, the number of reported vulnerabilities has trended upward over the years. This suggests that problems are not being caught before code is released, which could stem from many factors, such as a lack of awareness, the limited efficacy of existing vulnerability detection tools, or tools that are not user-friendly. To help address some of the shortcomings of traditional vulnerability detection tools, we propose using large language models (LLMs) to assist in finding vulnerabilities in source code. LLMs have shown a remarkable ability to understand and generate code, underlining their potential in code-related tasks. Our aim is to test multiple state-of-the-art LLMs and identify the best prompting strategies, so as to extract the most value from each model. We leverage findings from prompting-focused research, benchmarking approaches such as chain of thought, tree of thought, and self-consistency for vulnerability detection use cases. We provide an overview of the strengths and weaknesses of the LLM-based approach and compare the results to those of traditional static analysis tools. We find that LLMs can pinpoint more issues than traditional static analysis tools, outperforming them in terms of recall and F1 scores. However, LLMs are more prone to generating false positive classifications than traditional tools. The experiments are conducted on the Java programming language, and the results should benefit software developers and security analysts responsible for ensuring that code is free of vulnerabilities.

Original language: English (US)
Pages (from-to): 29698-29717
Number of pages: 20
Journal: IEEE Access
Volume: 13
DOIs
State: Published - 2025

Keywords

  • Benchmarking
  • LLM
  • large language models
  • prompting
  • software vulnerabilities
  • static code analyser

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
