Brief Summary of CVE-2026-41066: lxml XXE Vulnerability Enables Local File Disclosure via Default Parser Configuration

Introduction

A default configuration oversight in the lxml Python library left two commonly used XML parser entry points vulnerable to XML External Entity (XXE) injection, allowing any attacker who can supply XML input to silently read local files from the server. With over 316 million downloads in the past month and more than 13.4 million daily downloads on PyPI, lxml is one of the most widely depended upon libraries in the Python ecosystem, making the blast radius of this issue substantial for any organization running Python services that parse XML.

Technical Information

Root Cause: Inconsistent Default Hardening Across Parser Subclasses

The vulnerability traces back to an incomplete security hardening effort. When lxml 5.0 was released, the main XMLParser and HTMLParser classes had their resolve_entities default changed from True to 'internal', which tells the underlying libxml2 engine to expand only entities defined inline within the document's own DTD. However, two other parser entry points were not updated at the same time:

etree.iterparse()
etree.ETCompatXMLParser()

Both of these continued to ship with resolve_entities=True as their default parameter value in all lxml versions prior to 6.1.0. This inconsistency meant that any application using these specific entry points in their default configuration was silently exposed to XXE.

The following table summarizes the default behavior before and after the fix:

Parser Entry Point	Default Before 6.1.0	Default in 6.1.0 and Later	Risk if resolve_entities=True
`etree.iterparse()`	`True`	`'internal'`	Local file disclosure
`etree.ETCompatXMLParser()`	`True`	`'internal'`	Local file disclosure
`XMLParser()`	`'internal'` (since 5.0)	`'internal'`	Local file disclosure, but only if the caller opts in explicitly

How the Attack Works

When resolve_entities=True, libxml2 resolves all entity declarations during parsing, including external entities that use a SYSTEM or PUBLIC identifier pointing to a URI. An attacker who can supply XML input to an application using one of the vulnerable parsers can craft a payload that defines an external entity referencing a local file via a file:// URI. The parser expands the entity, embedding the contents of the targeted file into the parsed XML output.

The attack flow is straightforward:

1. The attacker identifies an application endpoint that accepts XML input and parses it using etree.iterparse() or etree.ETCompatXMLParser() with default settings.
2. The attacker submits XML containing a DOCTYPE declaration with an external entity definition, such as <!ENTITY e SYSTEM "file:///etc/hostname">, and references that entity in the document body.
3. The parser resolves the external entity, reads the contents of the specified local file, and substitutes it into the parsed element text.
4. The application returns or processes the parsed data, which now contains the file contents, completing the exfiltration.
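
The attack flow can be reproduced in a test environment with a small harness. The following is a minimal sketch, not code from the advisory: build_xxe_payload and probe_iterparse are hypothetical helper names, and the result of the probe depends on the installed lxml and libxml2 versions, so no expected output is shown. lxml is imported lazily so the payload builder can be used on its own.

```python
from io import BytesIO


def build_xxe_payload(target_path: str) -> bytes:
    """Return an XML document whose body references an external file entity.

    target_path is assumed to be an absolute path such as "/etc/hostname".
    """
    uri = "file://" + target_path
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE root [<!ENTITY e SYSTEM "{uri}">]>\n'
        "<root>&e;</root>\n"
    ).encode("utf-8")


def probe_iterparse(payload: bytes, resolve_entities):
    """Parse the payload with etree.iterparse() and report what happened.

    Returns ("text", value) with the text of <root>, or ("error", message)
    if the parser rejected the document.
    """
    from lxml import etree  # third-party dependency, imported lazily

    try:
        text = None
        source = BytesIO(payload)
        for _event, elem in etree.iterparse(source, resolve_entities=resolve_entities):
            if elem.tag == "root":
                text = elem.text
        return ("text", text)
    except etree.XMLSyntaxError as exc:
        return ("error", str(exc))
```

Pointing the payload at a throwaway canary file rather than a real system file, and comparing probe_iterparse(payload, True) with probe_iterparse(payload, "internal"), shows whether a given environment expands the external entity.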

The CVSS 3.1 vector string AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N reflects the characteristics of this attack: it is network exploitable, requires low complexity, needs no privileges or user interaction, and results in high confidentiality impact with no integrity or availability impact.
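
To connect the vector string to the 7.5 score listed for this CVE, the base score arithmetic can be worked through in a few lines. This is a sketch under stated assumptions: the metric weights and the rounding rule are taken from the CVSS v3.1 specification, only Scope: Unchanged vectors are handled, and base_score is a hypothetical helper name.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a Scope: Unchanged CVSS v3.1 vector."""
    metrics = dict(part.split(":", 1) for part in vector.split("/"))
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))  # 7.5
```

For this vector the impact term is 6.42 × 0.56 ≈ 3.60 and the exploitability term is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89, which sum to about 7.48 and round up to 7.5.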

The no_network Trap

A critical nuance that defenders must understand: setting no_network=True on the parser does not mitigate this vulnerability. While no_network=True prevents the parser from making network requests (blocking http:// or https:// entity URIs), it does not prevent resolution of local file:// URIs or equivalent local path references. Relying on no_network=True alone is a common misconfiguration that gives a false sense of security.

Impact Considerations

The actual files an attacker can read depend on the runtime environment. The parsing process's effective user privileges determine filesystem access. Container boundaries, chroot jails, and mandatory access controls such as SELinux or AppArmor profiles can limit the scope of readable files. However, in many deployments, the parsing process runs with sufficient privileges to read sensitive configuration files, credentials, or application source code.

Patch Information

The vulnerability was patched in lxml 6.1.0 via commit ab431ea0b9a7357d968f1d1c5c614649e9aaf358, authored by Stefan Behnel (scoder) on April 10, 2026, and committed on April 12, 2026. The commit message captures the intent directly: "Set resolve_entities='internal' as default for all parser subclasses."

The patch is minimal by design, touching just two Cython source files with 9 additions and 7 deletions. It aligns the previously overlooked iterparse() and ETCompatXMLParser() code paths with the security posture that the main XMLParser already adopted in lxml 5.0.

In src/lxml/iterparse.pxi, the iterparse class had resolve_entities=True in both its docstring signature and its __init__() parameter list. Both were changed:

- compact=True, resolve_entities=True, remove_comments=False,
+ compact=True, resolve_entities='internal', remove_comments=False,

The documentation comment was also updated:

-     - resolve_entities: replace entities by their text value (default: True)
+     - resolve_entities: replace entities by their text value
+       (default: 'internal' only)

In src/lxml/parser.pxi, the same default swap was applied in three places: the XMLParser docstring signature, the ETCompatXMLParser docstring signature, and the ETCompatXMLParser.__init__() parameter list:

- remove_blank_text=False, resolve_entities=True,
+ remove_blank_text=False, resolve_entities='internal',

The behavioral difference is straightforward but critical. When resolve_entities='internal', lxml instructs libxml2 to only expand entities that are defined inline within the document's own DTD (internal subset entities). External entity declarations, those with a SYSTEM or PUBLIC identifier pointing to a URI like file:///etc/hostname, are no longer resolved. This eliminates the local file read XXE vector while preserving the ability to use ordinary internal entity shortcuts like &copyright; that are defined within the XML document itself.

As the maintainer noted on the Launchpad bug tracker, this fix is a backwards incompatible change to the default argument, which is why it could not be shipped as a point release (e.g., 6.0.4) and instead required a new minor version.

Mitigation Options

For organizations that cannot immediately upgrade, the following table summarizes available controls:

Mitigation Option	Stops Local File Reads	Deployment Effort	Notes
Upgrade to lxml 6.1.0	Yes	Medium	Fixes defaults and standardizes behavior across all parsers
Set `resolve_entities='internal'`	Yes	Low	Vendor recommended workaround; restricts resolution to internal definitions only
Set `resolve_entities=False`	Yes	Low	Completely disables entity resolution
Set `no_network=True`	No	Low	Ineffective against local file disclosure via `file://` URIs
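
One way to apply the workaround consistently is to route both entry points through thin wrappers that pin the safer setting. The sketch below is illustrative only: hardened_kwargs, safe_iterparse, and safe_compat_parser are hypothetical names, and it assumes an lxml release that understands the 'internal' value (which this article dates to lxml 5.0).

```python
# Settings applied unless the caller overrides them with something stricter.
SAFE_DEFAULTS = {"resolve_entities": "internal", "no_network": True}


def hardened_kwargs(**overrides):
    """Merge caller options over the safe defaults, refusing resolve_entities=True."""
    if overrides.get("resolve_entities") is True:
        raise ValueError("resolve_entities=True is not allowed for untrusted XML")
    return {**SAFE_DEFAULTS, **overrides}


def safe_iterparse(source, **kwargs):
    """Call etree.iterparse() with external entity resolution disabled."""
    from lxml import etree  # third-party dependency, imported lazily

    return etree.iterparse(source, **hardened_kwargs(**kwargs))


def safe_compat_parser(**kwargs):
    """Build an etree.ETCompatXMLParser with external entity resolution disabled."""
    from lxml import etree  # third-party dependency, imported lazily

    return etree.ETCompatXMLParser(**hardened_kwargs(**kwargs))
```

Callers can still pass resolve_entities=False for the strictest behavior. Keeping no_network=True in the defaults blocks network fetches, but it is the resolve_entities setting that stops local file reads.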

Development teams should audit their codebases specifically for instances of iterparse and ETCompatXMLParser. CI/CD pipelines should be configured to flag pull requests that explicitly set resolve_entities=True when handling untrusted XML input.
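
An audit of this kind can be approximated with Python's standard ast module. The following is a rough sketch rather than a complete linter: find_risky_calls is a hypothetical helper, it matches on the call name only, and it will therefore also flag unrelated functions named iterparse (for example xml.etree.ElementTree.iterparse), so its findings need manual review.

```python
import ast

# Entry points named in the advisory.
TARGETS = {"iterparse", "ETCompatXMLParser"}


def find_risky_calls(source: str, filename: str = "<string>"):
    """Return sorted (line, name, reason) tuples for calls that need review."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both etree.iterparse(...) and a bare iterparse(...).
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name not in TARGETS:
            continue
        setting = next(
            (kw.value for kw in node.keywords if kw.arg == "resolve_entities"), None
        )
        if setting is None:
            findings.append((node.lineno, name, "resolve_entities not set"))
        elif isinstance(setting, ast.Constant) and setting.value is True:
            findings.append((node.lineno, name, "resolve_entities=True"))
    return sorted(findings)
```

A CI job could run this over each changed .py file and fail the build when the returned list is non-empty. Calls that set only no_network=True are still reported, since that option does not address local file reads.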

Affected Systems and Versions

All versions of lxml prior to 6.1.0 are affected when using etree.iterparse() or etree.ETCompatXMLParser() with the default resolve_entities=True configuration. Specifically:

Affected versions: lxml < 6.1.0
Patched version: lxml 6.1.0 (released April 18, 2026)
Vulnerable configurations: Any application that parses untrusted XML using etree.iterparse() or etree.ETCompatXMLParser() without explicitly setting resolve_entities='internal' or resolve_entities=False
Not directly affected: Applications using only XMLParser() or HTML parsers, which were hardened in lxml 5.0

Given lxml's role as a transitive dependency in many Python packages, organizations should check not only for direct usage but also for indirect inclusion through dependency chains.
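
A quick inventory of an environment can be taken with the standard library's importlib.metadata. This is a minimal sketch under stated assumptions: the helper names are hypothetical, only the leading numeric part of a version string is compared (so a pre-release such as 6.1.0b1 would be treated like 6.1.0), and the dependents check only sees distributions installed in the current environment.

```python
import re
from importlib import metadata

FIXED_VERSION = (6, 1, 0)  # first patched release according to the advisory


def parse_version(text: str):
    """Return the leading numeric components, padded to three parts."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if not match:
        raise ValueError(f"unrecognized version string: {text!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    return parts + (0,) * (3 - len(parts))


def is_affected(version_text: str) -> bool:
    """True if the given lxml version predates the fix."""
    return parse_version(version_text) < FIXED_VERSION


def installed_lxml_version():
    """Return the installed lxml version string, or None if lxml is absent."""
    try:
        return metadata.version("lxml")
    except metadata.PackageNotFoundError:
        return None


def dependents_of_lxml():
    """List installed distributions that declare a requirement on lxml."""
    names = set()
    for dist in metadata.distributions():
        for requirement in dist.requires or []:
            # Match "lxml", "lxml>=4", "lxml[extra]" but not "lxml-stubs".
            if re.match(r"lxml(?![\w.-])", requirement, re.IGNORECASE):
                name = dist.metadata["Name"]
                if name:
                    names.add(name)
                break
    return sorted(names)
```

Running is_affected(installed_lxml_version()) when a version is present, together with dependents_of_lxml(), gives a first view of direct and indirect exposure. A full answer still requires checking whether those packages call the vulnerable entry points on untrusted input.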

Vendor Security History

The lxml project has a track record of addressing complex parsing and sanitization edge cases. Reviewing recent advisories provides useful context on adjacent attack surfaces:

Component	Issue Summary	Severity	Date Published
lxml core	Default configuration allows XXE to local files (CVE-2026-41066)	High (7.5)	April 18, 2026
lxml_html_clean	`<base>` tag injection hijacks relative URLs (CVE-2026-28350)	Moderate (6.1)	March 2, 2026
lxml HTML Cleaner	Crafted SVG embedded scripts pass through sanitizer	Moderate	December 12, 2021

The project maintains a formal security policy requesting private vulnerability disclosure with a 90-day remediation window. The rapid turnaround on CVE-2026-41066 demonstrates responsive maintenance. That said, the recurring pattern of sanitization bypasses suggests that organizations using lxml for security-critical parsing or cleaning should maintain defense-in-depth strategies rather than relying solely on the library's defaults.
