Introduction
Remote attackers can crash or take control of production AI inference servers with a single HTTP request. CVE-2025-23311 exposes a critical stack-based buffer overflow in NVIDIA Triton Inference Server, a platform widely used to deploy machine learning models at scale. This vulnerability allows unauthenticated remote exploitation, threatening the integrity and availability of AI workloads in enterprise and cloud environments.
About NVIDIA Triton Inference Server
NVIDIA Triton Inference Server is an open-source software platform that streamlines the deployment of AI models on GPUs, CPUs, and other hardware. It is used by major enterprises and cloud providers to serve machine learning models for applications in healthcare, finance, autonomous vehicles, and more. Triton is a cornerstone of NVIDIA's AI software ecosystem, powering production inference for thousands of organizations globally.
Technical Information
CVE-2025-23311 is a stack-based buffer overflow vulnerability (CWE-121) affecting all versions of NVIDIA Triton Inference Server up to and including 25.06. The flaw resides in the HTTP request handling logic, specifically in the use of the alloca() function for stack allocation.
The vulnerable code path is triggered when the server processes HTTP requests using libevent's evbuffer_peek() function. This function returns the number of buffer segments (iovec structures) required to represent the HTTP request. The segment count is then used as the size argument to alloca(), allocating an array of evbuffer_iovec structures on the stack. If an attacker can influence the number of segments, they can control the size of the stack allocation.
HTTP chunked transfer encoding allows clients to send data in many small chunks. By crafting a request with thousands of small chunks, an attacker can cause libevent to fragment the request into a large number of segments. Trail of Bits researchers demonstrated that a 3MB HTTP request with many small chunks can exhaust stack space and trigger a segmentation fault, crashing the server. Each 6-byte chunk results in 16 bytes of stack allocation, so the amplification effect is significant.
Endpoints affected include:
- Inference requests
- Repository index requests
- Model load and unload operations
- Trace and logging configuration
- Shared memory registration
Authentication is disabled by default for most of these endpoints, so unauthenticated attackers can exploit this issue remotely.
This vulnerability was present in the Triton codebase for over five years and was identified using both static analysis tools (such as Semgrep) and manual review. The root cause is the lack of bounds checking when allocating stack memory based on untrusted input.
Patch Information
NVIDIA has addressed multiple critical vulnerabilities in the Triton Inference Server by releasing version 25.07. This update mitigates issues that could lead to remote code execution, denial of service, information disclosure, and data tampering. Users are strongly advised to upgrade to version 25.07 to secure their systems. The updated version is available on the Triton Inference Server Releases page on GitHub. (nvidia.custhelp.com)
Affected Systems and Versions
- NVIDIA Triton Inference Server versions up to and including 25.06 are affected.
- All deployments exposing HTTP endpoints are vulnerable, especially those with authentication disabled (the default configuration).
- The vulnerability impacts multiple endpoints, including inference, repository index, and administrative operations.
Vendor Security History
NVIDIA has a history of addressing security issues in both hardware and software products. Previous vulnerabilities have affected GPU drivers, container runtimes, and AI software platforms. The company responded rapidly to coordinated disclosures for Triton Inference Server, releasing a patch within the coordinated timeline. However, the discovery of multiple related vulnerabilities in the same release cycle highlights the need for improved secure coding practices and more rigorous security testing in Triton's development lifecycle.