NVIDIAScape: Breaking Container Isolation with CVE-2025-23266 in NVIDIA Container Toolkit

CVE-2025-23266 (NVIDIAScape) exposes a critical container escape flaw in NVIDIA Container Toolkit, allowing attackers to gain root on the host via OCI hook misconfiguration. We detail the technical root cause, PoC, detection, and patching strategies for this high-impact vulnerability affecting AI/ML and cloud GPU environments.

ZeroPath Security Research

2025-07-17

Introduction

A single malicious container can compromise an entire GPU node. CVE-2025-23266, dubbed "NVIDIAScape," is a flaw in the NVIDIA Container Toolkit that lets an attacker escape container isolation and gain root on the host. In practical terms, a malicious container running on a vulnerable GPU node can compromise every workload, model, and dataset on that node, which puts shared AI and cloud infrastructure at risk.

About NVIDIA and Its Container Toolkit: NVIDIA is the dominant supplier of GPU hardware and software for AI, machine learning, and high-performance computing workloads. Its Container Toolkit is the de facto standard for enabling GPU acceleration in containerized environments, from Docker to Kubernetes, across public clouds and enterprise data centers. Because the toolkit is deployed widely in research, finance, healthcare, and autonomous systems, vulnerabilities in it have far-reaching consequences.

Technical Information

CVE-2025-23266 (NVIDIAScape) is rooted in the NVIDIA Container Toolkit's handling of Open Container Initiative (OCI) hooks—specifically, the createContainer hook. This hook is responsible for preparing the container environment and is executed as a privileged process on the host. The vulnerability arises because this hook inherits environment variables from the container image, a design flaw that allows attackers to manipulate the hook's behavior from within an untrusted container.

The attack centers on the LD_PRELOAD environment variable, which instructs the dynamic linker to load a specified shared library before any others. If an attacker can set LD_PRELOAD to point to a malicious .so file included in their container image, and then trigger the vulnerable hook, the hook will load and execute the attacker's code with root privileges on the host. This breaks the fundamental isolation boundary containers are supposed to enforce.

Root Cause:

  • The nvidia-ctk binary, invoked by the createContainer hook with the enable-cuda-compat argument, runs as root on the host.
  • It inherits environment variables (like LD_PRELOAD) from the container image.
  • An attacker can place a malicious .so file in the container and set LD_PRELOAD to reference it.
  • When the container starts, the hook loads the attacker's shared library, executing arbitrary code as root on the host.
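The chain above can be modeled in a few lines of Python. This is a simplified sketch, not the toolkit's code: `build_hook_env`, its `inherit_image_env` parameter, and `linker_preloads` are hypothetical names introduced for illustration, and the sketch does not claim to describe how NVIDIA's fix is implemented.

```python
def build_hook_env(host_env, image_env, inherit_image_env):
    """Return the environment a host-side hook process would start with.

    Hypothetical model: when inherit_image_env is True, variables from the
    untrusted image are layered over the host's, which mirrors the flaw.
    """
    env = dict(host_env)
    if inherit_image_env:
        env.update(image_env)  # attacker-controlled values win
    return env


def linker_preloads(env):
    """List the libraries the dynamic linker would preload for this env."""
    value = env.get("LD_PRELOAD", "")
    # LD_PRELOAD holds a colon- or space-separated list of libraries.
    return [path for path in value.replace(":", " ").split() if path]


host_env = {"PATH": "/usr/sbin:/usr/bin"}
image_env = {"LD_PRELOAD": "/proc/self/cwd/poc.so"}

inherited = build_hook_env(host_env, image_env, inherit_image_env=True)
sanitized = build_hook_env(host_env, image_env, inherit_image_env=False)

print(linker_preloads(inherited))  # the attacker's library is preloaded
print(linker_preloads(sanitized))  # nothing from the image reaches the hook
```

The point of the model is that the privileged process never has to be tricked at runtime: the attacker only needs the image's environment to reach it.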

This flaw is especially dangerous because it requires no special privileges beyond the ability to run a container on a vulnerable system—a common scenario in shared GPU clusters and cloud environments.

Proof of Concept

The following Dockerfile demonstrates how an attacker could exploit CVE-2025-23266 (NVIDIAScape):

FROM busybox
ENV LD_PRELOAD=/proc/self/cwd/poc.so
ADD poc.so /

  • FROM busybox initializes a minimal container image.
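  • ENV LD_PRELOAD=/proc/self/cwd/poc.so sets the environment variable to load the attacker's shared library. /proc/self/cwd resolves to the working directory of whichever process reads it, so when the hook runs with the container's root filesystem as its working directory, the path points at the poc.so shipped in the image.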
  • ADD poc.so / copies the malicious .so file into the container.

When this container is launched on a system with a vulnerable NVIDIA Container Toolkit, the nvidia-ctk hook will load and execute poc.so as root on the host, granting the attacker full control.
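To illustrate why the LD_PRELOAD path in the Dockerfile lands on the attacker's file, the sketch below models how /proc/self/cwd is interpreted relative to the reading process. It assumes, following the Wiz research cited in this section, that the hook's working directory is the container's root filesystem. The function name and the rootfs path are hypothetical, and only the /proc/self/cwd prefix is modeled (symlinks are not followed).

```python
import posixpath

PROC_CWD = "/proc/self/cwd"


def resolve_for_process(path, process_cwd):
    """Approximate where `path` points for a process whose cwd is process_cwd."""
    if path == PROC_CWD or path.startswith(PROC_CWD + "/"):
        # Swap the /proc/self/cwd prefix for the process's working directory.
        path = process_cwd + path[len(PROC_CWD):]
    return posixpath.normpath(path)


# Hypothetical host-side location of the container's root filesystem.
rootfs = "/var/lib/docker/overlay2/abc123/merged"

print(resolve_for_process("/proc/self/cwd/poc.so", rootfs))
```

Seen from the host, the same string names a different file depending on which process resolves it, which is what lets a path chosen inside the image reach a file the host-side hook can open.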

Reference: Wiz Research - NVIDIAScape

Patch Information

NVIDIA has released critical updates to address CVE-2025-23266:

  • Container Toolkit: Upgrade to version 1.17.8 or later.
  • GPU Operator (Linux): Upgrade to version 25.3.1 or later.

Mitigation for Legacy Container Runtime: Edit /etc/nvidia-container-toolkit/config.toml and add:

[features]
disable-cuda-compat-lib-hook = true

Mitigation for GPU Operator with Helm: When deploying or upgrading, include:

--set "toolkit.env[0].name=NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES" \
--set "toolkit.env[0].value=disable-cuda-compat-lib-hook"

To manually specify the patched Container Toolkit version during GPU Operator upgrades:

--set "toolkit.version=v1.17.8-ubuntu20.04"

Reference: SecurityOnline.info - NVIDIA Plugs Critical Flaws

Detection Methods

Detecting exploitation attempts of NVIDIAScape requires close monitoring of container and host interactions:

  • Monitor Environment Variables: Inspect image configurations and running containers for LD_PRELOAD or other dynamic-linker variables that point to unexpected shared libraries. This is a key indicator of exploitation attempts.
  • Track Shared Library Loads: Use system call monitoring on the host to detect privileged hook processes loading .so files from container filesystems or other locations outside the standard host library paths.
  • File Integrity Monitoring: Check for creation or modification of shared libraries in directories accessible to containers.
  • Log Analysis: Ensure detailed logging of process executions and environment variable usage within containers. Analyze for patterns matching exploitation techniques.
  • Anomaly Detection: Deploy behavioral analytics to flag deviations from normal container operations, such as unexpected library loads or environment variable changes.
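
The environment-variable check from the list above can be sketched as a small audit function. It is a minimal example, assuming the image configuration is available as parsed JSON with an `Env` list of `KEY=VALUE` strings (under `Config` in one element of `docker image inspect` output, or `config` in an OCI image config). The function name and the watch list are illustrative, not exhaustive.

```python
import json

# Dynamic-linker variables worth flagging; an illustrative list, not a complete one.
SUSPICIOUS_ENV = ("LD_PRELOAD", "LD_AUDIT")


def suspicious_env_entries(image_config):
    """Return KEY=VALUE entries from an image config that set linker variables."""
    section = image_config.get("Config") or image_config.get("config") or {}
    flagged = []
    for entry in section.get("Env") or []:
        name, _, _value = entry.partition("=")
        if name in SUSPICIOUS_ENV:
            flagged.append(entry)
    return flagged


inspect_output = json.loads("""
{"Config": {"Env": ["PATH=/bin", "LD_PRELOAD=/proc/self/cwd/poc.so"]}}
""")

print(suspicious_env_entries(inspect_output))
```

A hit is a signal for review rather than proof of exploitation, since some legitimate images set linker variables on purpose.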

Reference: Wiz Research - NVIDIAScape

Affected Systems and Versions

  • NVIDIA Container Toolkit: All versions up to and including 1.17.7 are vulnerable.
  • NVIDIA GPU Operator (Linux): All versions up to and including 25.3.0 are vulnerable.
  • Vulnerable Configurations: Any deployment where the enable-cuda-compat hook is active and the above versions are in use. This includes Docker, containerd, and Kubernetes environments leveraging NVIDIA GPU acceleration.
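
The version boundaries above translate into a simple comparison. The sketch below is an illustration only: the helper names are hypothetical, and it compares just the leading MAJOR.MINOR.PATCH triple, so pre-release or distribution suffixes are ignored.

```python
import re

# First fixed releases, as listed in the Patch Information section.
FIXED_TOOLKIT = (1, 17, 8)
FIXED_GPU_OPERATOR = (25, 3, 1)


def parse_version(text):
    """Extract the leading MAJOR.MINOR.PATCH triple, ignoring a 'v' prefix and suffixes."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_vulnerable(version, fixed):
    """Return True if `version` is older than the first fixed release."""
    return parse_version(version) < fixed


print(is_vulnerable("1.17.7", FIXED_TOOLKIT))
print(is_vulnerable("v1.17.8-ubuntu20.04", FIXED_TOOLKIT))
```

Comparing integer tuples avoids the string-comparison trap where "1.17.10" would sort before "1.17.8".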

Vendor Security History

NVIDIA has faced multiple critical container escape vulnerabilities in recent years:

  • CVE-2024-0132: A TOCTOU race condition in the Container Toolkit.
  • CVE-2025-23359: A bypass of the fix for CVE-2024-0132, again a TOCTOU issue, that could give a crafted container image access to the host file system.

The company has responded promptly to disclosures, releasing patches within weeks. However, the recurrence of similar flaws suggests persistent architectural challenges in securing container privilege boundaries and hook execution contexts.

References

Source: This report was created using AI
