Introduction
A single malformed API request can crash a production LLM inference server and, in the worst case, allow remote code execution on the host. CVE-2025-62164 is a critical memory corruption vulnerability in vLLM, a widely used open-source engine for serving large language models. It affects organizations running vLLM versions 0.10.2 through 0.11.0 with the Completions API exposed.
About vLLM: vLLM is a high-performance inference and serving engine for large language models, adopted by research labs and industry for its throughput and efficiency. The project is maintained by a large open source community and became a PyTorch Foundation project in 2025, reflecting its central role in the LLM ecosystem.
Technical Information
The vulnerability is triggered when the vLLM Completions API endpoint receives user-supplied prompt embeddings in the form of serialized PyTorch tensors. These tensors are deserialized using torch.load() without sufficient validation. Starting with PyTorch 2.8.0, sparse tensor integrity checks are disabled by default, so malformed tensors with corrupted index arrays can bypass validation.
When vLLM calls .to_dense() on such a tensor, PyTorch dereferences the attacker-controlled index arrays. If an index is out of bounds, the densification writes outside the allocated buffer, corrupting memory. At minimum this crashes the vLLM process (denial of service); if the attacker can steer the out-of-bounds write to sensitive memory, it may enable remote code execution.
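The unchecked densification step can be modelled in plain Python. This is a simplified sketch, not PyTorch internals: each sparse value is written at an offset computed from its attacker-controlled index, so an out-of-bounds index writes outside the buffer. Python raises IndexError at that point; in native code the same write silently corrupts adjacent memory.

```python
def to_dense_unchecked(indices, values, shape):
    """Simplified 2-D sparse-to-dense conversion WITHOUT bounds checks.

    Models the vulnerable path: each (row, col) pair comes straight from
    the deserialized tensor, so the write offset is attacker-controlled.
    """
    rows, cols = shape
    buf = [0.0] * (rows * cols)          # the dense output buffer
    for (r, c), v in zip(indices, values):
        buf[r * cols + c] = v            # no check that r < rows and c < cols
    return buf

# Well-formed input densifies normally.
dense = to_dense_unchecked([(0, 0), (1, 2)], [1.0, 2.0], (2, 3))

# A corrupted index array writes past the end of the buffer. Python
# raises IndexError here; native code would corrupt memory instead.
try:
    to_dense_unchecked([(5, 0)], [9.0], (2, 3))
except IndexError:
    pass
```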
The attack requires only network access to the Completions API endpoint. By default, this endpoint does not require authentication. An attacker crafts a malicious sparse tensor, serializes it with torch.save(), base64-encodes the result, and submits it as the prompt_embeds parameter in an API request. The vulnerability is classified under:
- CWE-20: Improper Input Validation
- CWE-123: Write-what-where Condition
- CWE-502: Deserialization of Untrusted Data
- CWE-787: Out-of-bounds Write
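The delivery mechanism is simple to sketch. Field names below are illustrative except for prompt_embeds, which the advisory names, and the placeholder bytes stand in for real torch.save() output:

```python
import base64
import json

# Hypothetical placeholder for torch.save() output on a crafted sparse
# tensor; a real payload would be a serialized PyTorch tensor stream.
serialized_tensor = b"PLACEHOLDER-SERIALIZED-TENSOR"

# The attacker base64-encodes the bytes and submits them in the
# prompt_embeds field of an ordinary Completions API request.
request_body = json.dumps({
    "model": "target-model",  # illustrative value
    "prompt_embeds": base64.b64encode(serialized_tensor).decode("ascii"),
})

# The server base64-decodes and torch.load()s this blob before calling
# .to_dense(), which is where the corrupted indices take effect.
```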
Root Cause:
- vLLM accepts serialized tensor input from untrusted sources via the API
- torch.load() is used without validating tensor structure or indices
- PyTorch 2.8.0 disables sparse tensor integrity checks by default
- Malicious tensors can trigger out-of-bounds writes during to_dense()
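The missing check is a straightforward bounds validation of the index array against the tensor's shape before densification. A minimal stdlib-only sketch follows; this is not vLLM's actual patch, just an illustration of the invariant that goes unenforced:

```python
def sparse_indices_in_bounds(indices, shape):
    """Return True iff every COO coordinate is valid for `shape`.

    Each coordinate must have one entry per dimension, and each entry
    must lie in [0, bound) -- the kind of integrity check that PyTorch
    2.8.0 no longer applies by default when loading sparse tensors.
    """
    return all(
        len(coord) == len(shape)
        and all(0 <= i < bound for i, bound in zip(coord, shape))
        for coord in indices
    )

assert sparse_indices_in_bounds([(0, 0), (2, 1)], (3, 3))   # well-formed
assert not sparse_indices_in_bounds([(7, 0)], (3, 3))       # out of bounds
assert not sparse_indices_in_bounds([(0, -1)], (3, 3))      # negative index
```

In a real deployment, the equivalent protection is to re-enable PyTorch's own sparse invariant checking (torch.sparse.check_sparse_tensor_invariants) or to upgrade to a patched vLLM release.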
Affected Systems and Versions
- vLLM versions 0.10.2 through 0.11.0 are affected
- Only configurations exposing the Completions API endpoint with prompt embedding support are vulnerable
- The vulnerability is present when running with PyTorch 2.8.0 or later (due to disabled integrity checks)
Vendor Security History
- vLLM has previously addressed security issues such as HTTP header injection and unsafe eval() usage through timely point releases (see v0.10.1.1)
- Security advisories are published promptly and patches are made available quickly
- The project demonstrates a mature approach to vulnerability disclosure and remediation