When Containers Break the Rules: CVE-2025-23267 in NVIDIA Container Toolkit and the Perils of Link Following

A high-severity flaw in NVIDIA's Container Toolkit (CVE-2025-23267) allows attackers to escape container boundaries and tamper with host files via a link following bug in the update-ldcache hook. This post dissects the technical root cause, affected versions, and how to patch before attackers strike.
CVE Analysis


ZeroPath Security Research


2025-07-17


Introduction

Imagine a world-class AI training job, running on a multi-million-dollar GPU cluster, suddenly sabotaged because a single container image let an attacker overwrite critical files on the host. This scenario is not merely hypothetical: CVE-2025-23267, a high-severity vulnerability in NVIDIA's Container Toolkit, exposes precisely this risk. The flaw carries a CVSS score of 8.5 and affects all Toolkit versions up to and including 1.17.7. It enables attackers to break container boundaries and tamper with host files, potentially resulting in data loss, denial of service, or even container escape.

About NVIDIA and the Container Toolkit: NVIDIA is the undisputed leader in GPU computing, powering everything from AI research and autonomous vehicles to cloud gaming and scientific simulations. The NVIDIA Container Toolkit is the backbone for running GPU-accelerated workloads in containers, used by enterprises, cloud providers, and research institutions worldwide. Its security is foundational to the modern AI and HPC ecosystem.


Technical Information

Vulnerability Mechanism

CVE-2025-23267 is a classic example of a CWE-59: Improper Link Resolution Before File Access ("link following") bug. The vulnerability resides in the update-ldcache hook of the NVIDIA Container Toolkit. This hook is responsible for updating the dynamic linker cache (ld.so.cache) inside containers to ensure GPU libraries are discoverable at runtime.

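The problem arises because the hook invokes the host's ldconfig binary with the -r (chroot) option, pointing it at the container's root filesystem. However, the hook does not sandbox this operation or validate symbolic links within the container image. A chroot only constrains path resolution once it is in effect. Any path that a privileged host-side process resolves before or outside that step is interpreted against the host's root, so an absolute link such as /etc/ld.so.cache names the host's file, not the container's. An attacker can therefore craft a container image containing symbolic links that point outside the container's intended root (e.g., to /etc/ld.so.cache or other sensitive host files). The ldconfig process may then follow these links and overwrite or tamper with files on the host.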
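To make the link-following mechanism concrete, the Go sketch below reproduces the general CWE-59 pattern. It is not the toolkit's code. The helper `naiveWrite` and the file names are hypothetical, and the "host" file lives in a scratch directory so nothing real is touched. A privileged process joins a path under rootfs lexically and opens it without checking each component, so it follows the planted absolute symlink straight out of rootfs.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// naiveWrite is a hypothetical helper (not the toolkit's code) that mimics a
// host-side process writing into a container rootfs. It joins paths lexically,
// and os.WriteFile follows symlinks in every component, including the last.
func naiveWrite(rootfs, rel string, data []byte) error {
	return os.WriteFile(filepath.Join(rootfs, rel), data, 0o644)
}

// demo plants an absolute symlink in a fake rootfs that points at a file in a
// separate scratch "host" directory, performs the naive write, and returns the
// host file's contents afterwards.
func demo() (string, error) {
	rootfs, err := os.MkdirTemp("", "rootfs")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(rootfs)
	host, err := os.MkdirTemp("", "host")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(host)

	// Stand-in for a sensitive host file such as /etc/ld.so.cache.
	hostFile := filepath.Join(host, "ld.so.cache")
	if err := os.WriteFile(hostFile, []byte("original"), 0o644); err != nil {
		return "", err
	}
	if err := os.MkdirAll(filepath.Join(rootfs, "etc"), 0o755); err != nil {
		return "", err
	}
	// The "malicious image" ships etc/ld.so.cache as a symlink to the host file.
	if err := os.Symlink(hostFile, filepath.Join(rootfs, "etc", "ld.so.cache")); err != nil {
		return "", err
	}
	if err := naiveWrite(rootfs, "etc/ld.so.cache", []byte("tampered")); err != nil {
		return "", err
	}
	b, err := os.ReadFile(hostFile)
	return string(b), err
}

func main() {
	got, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("host file now contains:", got) // host file now contains: tampered
}
```

The write lands in the "host" file even though the caller only ever named a path inside rootfs. That is the essence of a link-following bug.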

Exploitation Flow

  1. Attacker crafts a malicious container image containing symbolic links targeting sensitive host files (e.g., /etc/ld.so.cache).
  2. Image is deployed and run in an environment using a vulnerable version of the NVIDIA Container Toolkit.
  3. update-ldcache hook triggers during container initialization, invoking ldconfig with the container's root as the chroot target.
  4. ldconfig follows the malicious symbolic links, writing to the host's files instead of the container's.
  5. Result: Host file tampering, potential denial of service, and possible container escape if the attacker can subvert the host's dynamic linker configuration.

Technical Root Cause

The root cause is the lack of symbolic link validation and insufficient sandboxing around the ldconfig invocation. The toolkit assumes the container's root filesystem is isolated, but malicious links can break this isolation, especially when hooks run with elevated privileges or in environments with shared mounts.

Attack Vectors

  • Multi-tenant Kubernetes clusters: Malicious users can submit images to shared GPU clusters, targeting host files.
  • CI/CD pipelines: Automated build systems running untrusted images risk host compromise.
  • Cloud environments: Public cloud GPU offerings using vulnerable Toolkit versions are at risk of cross-tenant attacks.

Affected Code Pattern

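NVIDIA's advisory does not point to specific source lines, so the snippet below illustrates the vulnerable pattern rather than reproducing the toolkit's actual code: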

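// Illustrative sketch of the vulnerable pattern (not NVIDIA's actual source)
package ldcache

import "os/exec"

func updateLdCache(rootfs string) error {
	// No validation of symlinks in rootfs before ldconfig runs on it
	cmd := exec.Command("ldconfig", "-r", rootfs)
	return cmd.Run()
}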

Patch Information

NVIDIA has released updates to address critical vulnerabilities in its Container Toolkit and GPU Operator software. Users are strongly advised to upgrade to the latest versions to mitigate these security risks.

Updated Versions:

  • NVIDIA Container Toolkit: Upgrade to version 1.17.8.
  • NVIDIA GPU Operator: Upgrade to version 25.3.1.

Mitigation Steps:

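For users unable to upgrade immediately, NVIDIA provides a temporary mitigation: disabling the enable-cuda-compat hook. Note that this hook is the component implicated in the companion vulnerability CVE-2025-23266. The flaw described in this post lives in the update-ldcache hook, so upgrading remains the primary remedy for CVE-2025-23267.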

For NVIDIA Container Runtime in Legacy Mode:

Edit the /etc/nvidia-container-toolkit/config.toml file to include:

[features]
disable-cuda-compat-lib-hook = true

For NVIDIA GPU Operator:

When deploying with Helm, include the following arguments:

--set "toolkit.env[0].name=NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES" \
--set "toolkit.env[0].value=disable-cuda-compat-lib-hook"

For GPU Operator versions prior to 25.3.1, deploy NVIDIA Container Toolkit 1.17.8 by adding:

--set "toolkit.version=v1.17.8-ubuntu20.04"

Note: For Red Hat Enterprise Linux or Red Hat OpenShift, use the v1.17.8-ubi8 tag.

Implementing these updates and mitigations is crucial to protect systems from potential exploits associated with these vulnerabilities.



Affected Systems and Versions

  • NVIDIA Container Toolkit: All versions up to and including 1.17.7 are vulnerable.
    • For CDI mode, only versions prior to 1.17.5 are affected.
  • NVIDIA GPU Operator: All versions prior to 25.3.1 are affected if using a vulnerable Container Toolkit version.
  • Vulnerable configurations:
    • Default installations (legacy and GPU Operator modes)
    • Systems where the update-ldcache hook is enabled (default in most deployments)
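The version ranges above reduce to two comparisons. The Go sketch below encodes them so an inventory script could flag hosts. `Mode`, `parseVersion`, and `affected` are hypothetical names. The ranges are exactly those listed in this section, so confirm them against NVIDIA's bulletin before relying on the result. The parser drops image-tag suffixes such as "-ubuntu20.04" and ignores pre-release ordering.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Mode is a hypothetical label for how the toolkit is deployed.
type Mode int

const (
	Legacy Mode = iota // non-CDI paths, including default GPU Operator installs
	CDI                // Container Device Interface mode
)

// parseVersion turns "1.17.7", "v1.17.8" or "v1.17.8-ubi8" into [1 17 x].
// Anything after the first "-" is discarded (pre-release ordering ignored).
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	core, _, _ := strings.Cut(strings.TrimPrefix(s, "v"), "-")
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("unexpected version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("unexpected version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// less reports whether version a sorts before version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// affected applies the ranges stated in this post: up to and including
// 1.17.7 (i.e. below 1.17.8) in general, and only below 1.17.5 in CDI mode.
func affected(version string, mode Mode) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	if mode == CDI {
		return less(v, [3]int{1, 17, 5}), nil
	}
	return less(v, [3]int{1, 17, 8}), nil
}

func main() {
	checks := []struct {
		version string
		mode    Mode
	}{
		{"1.17.7", Legacy},
		{"v1.17.8-ubi8", Legacy},
		{"1.17.4", CDI},
		{"1.17.6", CDI},
	}
	for _, c := range checks {
		hit, err := affected(c.version, c.mode)
		if err != nil {
			fmt.Println(c.version, "error:", err)
			continue
		}
		fmt.Printf("%-14s CDI=%-5v affected=%v\n", c.version, c.mode == CDI, hit)
	}
}
```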

Vendor Security History

NVIDIA has a generally strong track record for security response, with rapid patch releases and detailed advisories. However, a single July 2025 bulletin disclosed two container-related vulnerabilities (CVE-2025-23266 and CVE-2025-23267), highlighting the complexity and risk in GPU virtualization stacks. In this case, NVIDIA released a patch (v1.17.8) and advisory within a week of disclosure, reflecting a mature and responsive security posture. The company credits external researchers (Lei Wang and Min Yao, Nebula Security Lab, Huawei Cloud) for responsible disclosure.



Source: This report was created using AI

If you have suggestions for improvement or feedback, please reach out to us at [email protected]
