Introduction
AI infrastructure in production environments faces a critical security challenge with the recent discovery of a stack buffer overflow in NVIDIA Triton Inference Server. This vulnerability enables unauthenticated remote attackers to potentially execute arbitrary code, disrupt inference workloads, and access sensitive AI models and data.
About NVIDIA and Triton Inference Server: NVIDIA is a global leader in GPU computing and artificial intelligence, with a market capitalization exceeding $2 trillion. Its Triton Inference Server is a widely used open-source platform for deploying AI models at scale, supporting major frameworks like TensorFlow, PyTorch, and ONNX. The platform is integral to enterprise AI pipelines, powering inference workloads across cloud, edge, and on-premises environments.
Technical Information
CVE-2025-23310 is a stack buffer overflow vulnerability in NVIDIA Triton Inference Server for both Windows and Linux. The root cause is unsafe use of the alloca function in the server's HTTP request handling code. When processing HTTP requests with chunked transfer encoding, the server allocates stack memory based on untrusted input. By sending a specially crafted HTTP request with a very large number of chunks (for example, a request body of roughly 3 MB), an attacker can force the server to allocate excessive stack memory, resulting in a buffer overflow.
This vulnerability affects several critical API endpoints, including repository index requests, inference requests, model management, and shared memory registration. Authentication is optional and disabled by default for most endpoints, increasing the attack surface. The vulnerability allows attackers to trigger segmentation faults (crashing the server) and, in some cases, achieve remote code execution, information disclosure, or data tampering. The issue was discovered by Will Vandevanter of Trail of Bits and is classified as CWE-121 (Stack-based Buffer Overflow).
No vulnerable code snippets have been published in public sources as of this writing.
Patch Information
NVIDIA has addressed CVE-2025-23310, along with several related vulnerabilities in Triton Inference Server, by releasing version 25.07. This update includes patches for issues that could allow remote code execution, denial of service, information disclosure, and data tampering.
One of the key vulnerabilities, CVE-2025-23319, involved an out-of-bounds write in the Python backend. To mitigate this, the patch introduces stricter input validation and bounds checking to prevent unauthorized memory access. Additionally, the shared memory management has been enhanced to enforce limits and prevent excessive resource consumption, addressing CVE-2025-23320. Furthermore, error handling mechanisms have been improved to avoid unintended information disclosure, as seen in CVE-2025-23334.
Users are strongly advised to upgrade to Triton Inference Server version 25.07 to benefit from these security enhancements. The updated version is available on NVIDIA's GitHub repository. For detailed guidance on secure deployment practices, refer to NVIDIA's Secure Deployment Considerations Guide.
Patch and advisory references:
- https://nvidia.custhelp.com/app/answers/detail/a_id/5687
- https://thehackernews.com/2025/08/nvidia-triton-bugs-let-unauthenticated.html
Affected Systems and Versions
- NVIDIA Triton Inference Server for Windows and Linux
- All versions prior to 25.07 are affected
- Vulnerable configurations include default installations where authentication is not enabled for API endpoints
Vendor Security History
NVIDIA has a dedicated Product Security Incident Response Team (PSIRT) and a history of rapid response to critical vulnerabilities. Previous issues in Triton Inference Server include memory corruption vulnerabilities and unauthenticated attack vectors. The company’s open-source approach increases transparency but also exposes the codebase to greater scrutiny, as demonstrated by the independent discovery of multiple critical vulnerabilities by Trail of Bits and Wiz Research.