ByteDance verl CVE-2026-6878: Unsafe eval() in ML Training Pipeline Enables Remote Code Execution via Indirect Prompt Injection — Quick Look with Public PoC

A brief summary of CVE-2026-6878, an unsafe eval() vulnerability in ByteDance's verl reinforcement learning framework that allows remote code execution through indirect prompt injection of training data. A public PoC exploit is available.

CVE Analysis

7 min read

ZeroPath CVE Analysis
ZeroPath CVE Analysis

2026-04-23

ByteDance verl CVE-2026-6878: Unsafe eval() in ML Training Pipeline Enables Remote Code Execution via Indirect Prompt Injection — Quick Look with Public PoC
Experimental AI-Generated Content

This CVE analysis is an experimental publication that is completely AI-generated. The content may contain errors or inaccuracies and is subject to change as more information becomes available. We are continuously refining our process.

If you have feedback, questions, or notice any errors, please reach out to us.

[email protected]

Introduction

An unsafe eval() call buried in a math grading utility within ByteDance's verl framework means that anyone who can influence a training dataset can achieve arbitrary code execution on the machine running model evaluation. For organizations training large language models with reinforcement learning, this vulnerability turns a routine reward scoring step into a remote code execution vector.

ByteDance's verl is an open source reinforcement learning training framework for LLMs, supporting algorithms like PPO and GRPO. With over 19,000 GitHub stars, it has become a widely adopted tool in the ML research community. Its role in the training pipeline makes it a high value target: compromise here can affect model integrity, training infrastructure, and downstream deployments.

Technical Information

Root Cause: Unsanitized eval() on Model Output

The vulnerability resides in the math_equal() function within verl/utils/reward_score/prime_math/grader.py, specifically at lines 298 through 301. This function is responsible for comparing a model's generated answer against a ground truth reference during reward scoring. When the ground truth answer is a matrix type problem (indicated by the presence of \begin{pmatrix} in the reference string) and the model's extracted prediction starts with [ and ends with ], the code passes the prediction string directly into Python's built in eval() function.

Here is the vulnerable code:

elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"): if isinstance(eval(prediction), list): # ← SINK: directly eval untrusted input pred_matrix = eval(prediction) # ← second eval

There is no input sanitization, no allowlisting of permitted characters, and no sandbox isolation. The eval() call will execute any valid Python expression contained in the prediction string.

Attack Flow: Indirect Prompt Injection to RCE

The exploitation path is indirect prompt injection, a technique where an attacker does not interact with the vulnerable system directly but instead poisons the data that flows into it. The attack proceeds as follows:

  1. Dataset Poisoning: The attacker embeds malicious instructions into a public math training dataset. These instructions are designed to coerce the language model into outputting a specific payload string when it encounters a matrix type math problem.

  2. Model Output Manipulation: During training or evaluation, the model processes the poisoned data and produces an output containing the attacker's payload, formatted to look like a list literal (starting with [ and ending with ]).

  3. Trigger Conditions Met: The reward scoring pipeline calls match_answer(), which extracts the prediction. For the vulnerable branch to execute, three conditions must be satisfied simultaneously:

    • The ground truth reference must contain \begin{pmatrix} (a matrix type problem).
    • The extracted prediction must start with [ and end with ].
    • The prediction must not contain an underscore _, because handle_base() would truncate the string and raise a ValueError.
  4. Code Execution: When math_equal() reaches the matrix comparison branch, it calls eval(prediction). The attacker's payload, embedded within the list brackets, executes as arbitrary Python code on the training server.

Bypassing Restrictions

The underscore restriction is notable because it prevents the use of Python dunder methods like __import__(). Attackers bypass this by using exec() instead, which does not require underscores. The match_answer() function also requires at least one digit character in the response, which is a trivial constraint to satisfy within a math problem context.

Additional eval() Instances

The vulnerability report also identifies another instance of unsafe dynamic code evaluation in the handle_pi() function at line 82 of the same grader.py file, suggesting a broader pattern of unsafe coding practices in this module.

Proof of Concept

A fully verified, end to end public PoC exploit exists for CVE-2026-6878. It was published by ZAST.AI in their vulnerability reports repository on GitHub and is referenced by both NVD and VulDB. A companion video demonstration is also available. The vendor issue (verl-project/verl#5331) remains open.

The PoC script (verl_rce.py) performs the following steps:

  1. Launches Ollama with the qwen2.5:14b-instruct model.
  2. Sends a sequence of prompt injection payloads (system override, fake math problem, few shot guidance) to induce the model to output a crafted answer string.
  3. The crafted RCE payload is:
RCE_PAYLOAD = 'exec("import os; os.system(\'echo PWNED1 by ZAST.AI > /tmp/poc/verl-rce-proof.txt\')")'
  1. The model's output (e.g., The answer is [exec("import os; os.system('echo PWNED1 by ZAST.AI > /tmp/poc/verl-rce-proof.txt')")]) is fed into verl's reward scoring pipeline via match_answer() then math_equal().
  2. Inside math_equal(), the matrix comparison branch calls eval(prediction), executing the embedded exec() call, which runs os.system() and writes a proof file to /tmp/poc/verl-rce-proof.txt.
  3. The script checks for the existence and content of the proof file to confirm successful RCE.

Reproduction steps:

# 1. Install Ollama and pull the model curl -fsSL https://ollama.com/install.sh | sh ollama pull qwen2.5:14b-instruct # 2. Clone verl and install git clone https://github.com/verl-project/verl.git cd verl pip install -e . # 3. Run the PoC python3 verl_rce.py

If the prompt injection strategies against the local LLM all fail, the script includes a deterministic fallback that manually constructs the malicious model output string (The answer is [<payload>]) and feeds it into math_equal() directly, confirming the eval() sink is exploitable regardless of the injection step. The PoC was verified on macOS with Ollama and verl 0.7.0.

Affected Systems and Versions

The vulnerability affects the following versions of ByteDance's verl framework:

VendorProductAffected Versions
ByteDanceverl0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7.0

The main branch was also confirmed to be vulnerable at the time of the initial analysis. Any deployment using verl up to and including version 0.7.0 that processes matrix type math problems through the reward scoring pipeline is affected.

Vendor Security History

ByteDance was added as a CVE Numbering Authority on January 13, 2026, for vulnerabilities in their own products. Despite this, the vendor was contacted early about CVE-2026-6878 but did not respond in any way. VulDB acted as the CNA for this vulnerability in the absence of vendor engagement. The open GitHub issue (verl-project/verl#5331) remains unresolved.

References

Detect & fix
what others miss

Security magnifying glass visualization