Brief Summary: CVE-2026-41066 — lxml XXE Vulnerability Enables Local File Disclosure via Default Parser Configuration

A short review of CVE-2026-41066, a high severity XXE vulnerability in the lxml Python library where default parser configurations allow local file reads. Includes patch analysis and mitigation guidance.


ZeroPath CVE Analysis

2026-04-24

Experimental AI-Generated Content

This CVE analysis is an experimental publication that is completely AI-generated. The content may contain errors or inaccuracies and is subject to change as more information becomes available. We are continuously refining our process.

If you have feedback, questions, or notice any errors, please reach out to us.


Introduction

A default configuration oversight in the lxml Python library left two commonly used XML parser entry points vulnerable to XML External Entity (XXE) injection. An attacker who can supply XML input to an affected application can make the parser read local files from the server, and can retrieve their contents wherever the application returns or otherwise exposes the parsed data. With over 316 million downloads in the past month and more than 13.4 million daily downloads on PyPI, lxml is one of the most widely depended upon libraries in the Python ecosystem, making the blast radius of this issue substantial for any organization running Python services that parse XML.

Technical Information

Root Cause: Inconsistent Default Hardening Across Parser Subclasses

The vulnerability traces back to an incomplete security hardening effort. In lxml 5.0, the resolve_entities default of the main XMLParser and HTMLParser classes was changed from True to 'internal', which tells the underlying libxml2 engine to expand only entities defined inline within the document's own DTD. However, two other parser entry points were not updated at the same time:

  • etree.iterparse()
  • etree.ETCompatXMLParser()

Both of these continued to ship with resolve_entities=True as their default parameter value in all lxml versions prior to 6.1.0. This inconsistency meant that any application using these specific entry points in their default configuration was silently exposed to XXE.

The following table summarizes the default behavior before and after the fix:

| Parser Entry Point | Default Before 6.1.0 | Default After 6.1.0 | Risk if resolve_entities=True |
| --- | --- | --- | --- |
| etree.iterparse() | True | internal | Local file disclosure |
| etree.ETCompatXMLParser() | True | internal | Local file disclosure |
| XMLParser() | internal (since 5.0) | internal | Safer default by design |

How the Attack Works

When resolve_entities=True, libxml2 resolves all entity declarations during parsing, including external entities that use a SYSTEM or PUBLIC identifier pointing to a URI. An attacker who can supply XML input to an application using one of the vulnerable parsers can craft a payload that defines an external entity referencing a local file via a file:// URI. The parser expands the entity, embedding the contents of the targeted file into the parsed XML output.

The attack flow is straightforward:

  1. The attacker identifies an application endpoint that accepts XML input and parses it using etree.iterparse() or etree.ETCompatXMLParser() with default settings.
  2. The attacker submits XML containing a DOCTYPE declaration with an external entity definition, such as <!ENTITY e SYSTEM "file:///etc/hostname">, and references that entity in the document body.
  3. The parser resolves the external entity, reads the contents of the specified local file, and substitutes it into the parsed element text.
  4. The application returns or processes the parsed data, which now contains the file contents, completing the exfiltration.
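The flow above can be reproduced harmlessly against a throwaway file. The following is a minimal sketch rather than anything taken from the advisory: build_probe and parse_with are hypothetical helper names, and the target is a temporary canary file that the script creates itself. It passes resolve_entities explicitly, so the result does not depend on which default the installed lxml version ships.

```python
import io
import tempfile
from pathlib import Path

from lxml import etree


def build_probe(target: Path) -> bytes:
    """Build a document shaped like step 2: an external entity plus a reference to it."""
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE doc [<!ENTITY e SYSTEM "{target.as_uri()}">]>\n'
        "<doc>&e;</doc>"
    ).encode("utf-8")


def parse_with(xml: bytes, resolve_entities) -> str:
    """Run iterparse to completion and return the serialized root element.

    A parse failure is reported as text so that different settings can be compared.
    """
    root = None
    try:
        for _event, elem in etree.iterparse(
            io.BytesIO(xml), resolve_entities=resolve_entities
        ):
            root = elem  # the last "end" event belongs to the root element
    except etree.XMLSyntaxError as exc:
        return f"parse error: {exc}"
    return etree.tostring(root, encoding="unicode")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "canary.txt"
        target.write_text("CANARY-VALUE", encoding="utf-8")
        probe = build_probe(target)
        for setting in (True, "internal", False):
            leaked = "CANARY-VALUE" in parse_with(probe, setting)
            print(f"resolve_entities={setting!r}: canary leaked = {leaked}")
```

Per the description above, the True setting is the one expected to pull the canary into the parsed output. How the other settings surface the unresolved reference (a parse error or an entity left in place) may vary with the lxml and libxml2 versions in use, so the sketch only reports whether the canary value appeared.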

The CVSS 3.1 vector string AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N (base score 7.5, High) reflects the characteristics of this attack: it is exploitable over the network, has low attack complexity, needs no privileges or user interaction, leaves the scope unchanged, and results in high confidentiality impact with no integrity or availability impact.

The no_network Trap

A critical nuance for defenders: setting no_network=True on the parser does not mitigate this vulnerability. While no_network=True prevents the parser from making network requests (blocking http:// or https:// entity URIs), it does not prevent resolution of local file:// URIs or equivalent local path references. Relying on no_network as an XXE control is a common mistake that gives a false sense of security.

Impact Considerations

The actual files an attacker can read depend on the runtime environment. The parsing process's effective user privileges determine filesystem access. Container boundaries, chroot jails, and mandatory access controls such as SELinux or AppArmor profiles can limit the scope of readable files. However, in many deployments, the parsing process runs with sufficient privileges to read sensitive configuration files, credentials, or application source code.

Patch Information

The vulnerability was patched in lxml 6.1.0 via commit ab431ea0b9a7357d968f1d1c5c614649e9aaf358, authored by Stefan Behnel (scoder) on April 10, 2026, and committed on April 12, 2026. The commit message captures the intent directly: "Set resolve_entities='internal' as default for all parser subclasses."

The patch is minimal by design, touching just two Cython source files with 9 additions and 7 deletions. It aligns the previously overlooked iterparse() and ETCompatXMLParser() code paths with the security posture that the main XMLParser already adopted in lxml 5.0.

In src/lxml/iterparse.pxi, the iterparse class had resolve_entities=True in both its docstring signature and its __init__() parameter list. Both were changed:

- compact=True, resolve_entities=True, remove_comments=False,
+ compact=True, resolve_entities='internal', remove_comments=False,

The documentation comment was also updated:

- - resolve_entities: replace entities by their text value (default: True)
+ - resolve_entities: replace entities by their text value
+   (default: 'internal' only)

In src/lxml/parser.pxi, the same default swap was applied in three places: the XMLParser docstring signature, the ETCompatXMLParser docstring signature, and the ETCompatXMLParser.__init__() parameter list:

- remove_blank_text=False, resolve_entities=True,
+ remove_blank_text=False, resolve_entities='internal',

The behavioral difference is straightforward but critical. When resolve_entities='internal', lxml instructs libxml2 to only expand entities that are defined inline within the document's own DTD (internal subset entities). External entity declarations, those with a SYSTEM or PUBLIC identifier pointing to a URI like file:///etc/hostname, are no longer resolved. This eliminates the local file read XXE vector while preserving the ability to use ordinary internal entity shortcuts like &copyright; that are defined within the XML document itself.
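To illustrate the last point, here is a small sketch showing that an entity declared in the document's own internal subset still expands when resolve_entities='internal' is passed explicitly. The entity name and its text are made up for the example, and expand_internal is a hypothetical helper.

```python
from lxml import etree

# A document that declares and uses an ordinary internal entity.
XML = (
    b"<!DOCTYPE doc [<!ENTITY copyright 'Copyright Example Corp'>]>"
    b"<doc>&copyright;</doc>"
)


def expand_internal(xml: bytes) -> str:
    """Parse with the hardened setting and return the root element's text."""
    parser = etree.ETCompatXMLParser(resolve_entities="internal")
    root = etree.fromstring(xml, parser)
    return root.text


if __name__ == "__main__":
    print(expand_internal(XML))
```

Only the declaration kind differs from the XXE case: the replacement text here is a literal inside the DTD, not a SYSTEM or PUBLIC identifier, so nothing is read from outside the document.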

As the maintainer noted on the Launchpad bug tracker, this fix is a backwards incompatible change to the default argument, which is why it could not be shipped as a point release (e.g., 6.0.4) and instead required a new minor version.

Mitigation Options

For organizations that cannot immediately upgrade, the following table summarizes available controls:

| Mitigation Option | Stops Local File Reads | Deployment Effort | Notes |
| --- | --- | --- | --- |
| Upgrade to lxml 6.1.0 | Yes | Medium | Fixes defaults and standardizes behavior across all parsers |
| Set resolve_entities='internal' | Yes | Low | Vendor recommended workaround; restricts resolution to internal definitions only |
| Set resolve_entities=False | Yes | Low | Completely disables entity resolution |
| Set no_network=True | No | Low | Ineffective against local file disclosure via file:// URIs |
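One way to apply the resolve_entities workaround consistently is to route all parsing through project-level helpers that pin the option. This is a sketch under assumptions: safe_iterparse and safe_compat_parser are hypothetical names, not part of lxml, and the choice of 'internal' simply follows the workaround row in the table above.

```python
import io
from functools import partial

from lxml import etree

# Hypothetical alias: iterparse with the hardened setting pre-filled.
# A caller can still override it by passing resolve_entities explicitly.
safe_iterparse = partial(etree.iterparse, resolve_entities="internal")


def safe_compat_parser(**options):
    """Return an ETCompatXMLParser that never uses resolve_entities=True."""
    options.setdefault("resolve_entities", "internal")
    if options["resolve_entities"] is True:
        raise ValueError("resolve_entities=True is not allowed for untrusted XML")
    return etree.ETCompatXMLParser(**options)


if __name__ == "__main__":
    tags = [elem.tag for _event, elem in safe_iterparse(io.BytesIO(b"<a><b/><c/></a>"))]
    print(tags)
    print(etree.fromstring(b"<a>hi</a>", safe_compat_parser()).text)
```

Centralizing the option this way also gives code review a single place to check, instead of every call site.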

Development teams should audit their codebases specifically for instances of iterparse and ETCompatXMLParser. CI/CD pipelines should be configured to flag pull requests that explicitly set resolve_entities=True when handling untrusted XML input.
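As a starting point for such an audit, the sketch below uses Python's standard ast module to flag calls to iterparse or ETCompatXMLParser that either omit resolve_entities or set it to True. It is an illustrative helper, not an existing tool: find_risky_calls is a hypothetical name, and matching is done on the call name only, so it will also flag unrelated functions with the same name (for example xml.etree.ElementTree.iterparse) and will miss options passed through **kwargs.

```python
import ast

RISKY_CALLS = {"iterparse", "ETCompatXMLParser"}


def _call_name(node: ast.Call):
    """Return the trailing name of the called object, e.g. 'iterparse' for etree.iterparse."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def find_risky_calls(source: str, filename: str = "<string>"):
    """Return sorted (line, call name, reason) tuples for calls worth reviewing."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        name = _call_name(node)
        if name not in RISKY_CALLS:
            continue
        keyword = next(
            (k for k in node.keywords if k.arg == "resolve_entities"), None
        )
        if keyword is None:
            # Relies on the library default, which differs before and after 6.1.0.
            findings.append((node.lineno, name, "resolve_entities not set"))
        elif isinstance(keyword.value, ast.Constant) and keyword.value.value is True:
            findings.append((node.lineno, name, "resolve_entities=True"))
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "from lxml import etree\n"
        "etree.iterparse(f)\n"
        "etree.iterparse(f, resolve_entities='internal')\n"
        "etree.ETCompatXMLParser(resolve_entities=True)\n"
    )
    for line, name, reason in find_risky_calls(sample):
        print(f"line {line}: {name}: {reason}")
```

A check like this could run over changed files in a CI job; findings are review prompts, not proof of exploitability, since the input may not be attacker controlled.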

Affected Systems and Versions

All versions of lxml prior to 6.1.0 are affected when using etree.iterparse() or etree.ETCompatXMLParser() with the default resolve_entities=True configuration. Specifically:

  • Affected versions: lxml < 6.1.0
  • Patched version: lxml 6.1.0 (released April 18, 2026)
  • Vulnerable configurations: Any application that parses untrusted XML using etree.iterparse() or etree.ETCompatXMLParser() without explicitly setting resolve_entities='internal' or resolve_entities=False
  • Not directly affected: Applications using only XMLParser() or HTML parsers, which were hardened in lxml 5.0

Given lxml's role as a transitive dependency in many Python packages, organizations should check not only for direct usage but also for indirect inclusion through dependency chains.
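Because lxml often arrives transitively, it can help to check what is actually installed in a given environment. The sketch below uses only the standard library; release_tuple, is_affected, and installed_lxml_status are hypothetical helper names, and the version parsing is deliberately simple (it compares the first three numeric components and ignores pre-release suffixes).

```python
import re
from importlib import metadata

PATCHED = (6, 1, 0)  # first fixed release according to this analysis


def release_tuple(version: str) -> tuple:
    """Extract up to three leading numeric components, padding with zeros."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if the version falls in the affected range (lxml < 6.1.0)."""
    return release_tuple(version) < PATCHED


def installed_lxml_status() -> str:
    """Describe the lxml distribution visible to the current interpreter."""
    try:
        version = metadata.version("lxml")
    except metadata.PackageNotFoundError:
        return "lxml is not installed in this environment"
    state = "in the affected range" if is_affected(version) else "at or above 6.1.0"
    return f"lxml {version} is {state}"


if __name__ == "__main__":
    print(installed_lxml_status())
```

An installed version in the affected range only indicates exposure to the old defaults; whether an application is exploitable still depends on it parsing untrusted XML through the two entry points listed above.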

Vendor Security History

The lxml project has a track record of addressing complex parsing and sanitization edge cases. Reviewing recent advisories provides useful context on adjacent attack surfaces:

| Component | Issue Summary | Severity | Date Published |
| --- | --- | --- | --- |
| lxml core | Default configuration allows XXE to local files (CVE-2026-41066) | High (7.5) | April 18, 2026 |
| lxml_html_clean | <base> tag injection hijacks relative URLs (CVE-2026-28350) | Moderate (6.1) | March 2, 2026 |
| lxml HTML Cleaner | Crafted SVG embedded scripts pass through sanitizer | Moderate | December 12, 2021 |

The project maintains a formal security policy requesting private vulnerability disclosure with a 90 day remediation window. The rapid turnaround on CVE-2026-41066 demonstrates responsive maintenance. That said, the recurring pattern of sanitization bypasses suggests that organizations using lxml for security critical parsing or cleaning should maintain defense in depth strategies rather than relying solely on the library's defaults.
