Introduction
A default configuration oversight in the lxml Python library left two commonly used XML parser entry points vulnerable to XML External Entity (XXE) injection, allowing any attacker who can supply XML input to silently read local files from the server. With over 316 million downloads in the past month and more than 13.4 million daily downloads on PyPI, lxml is one of the most widely depended upon libraries in the Python ecosystem, making the blast radius of this issue substantial for any organization running Python services that parse XML.
Technical Information
Root Cause: Inconsistent Default Hardening Across Parser Subclasses
The vulnerability traces back to an incomplete security hardening effort. When lxml 5.0 was released, the main XMLParser() and HTMLParser classes had their resolve_entities default changed from True to 'internal', which tells the underlying libxml2 engine to only expand entities defined inline within the document's own DTD. However, two other parser entry points were not updated at the same time:
- etree.iterparse()
- etree.ETCompatXMLParser()
Both of these continued to ship with resolve_entities=True as their default parameter value in all lxml versions prior to 6.1.0. This inconsistency meant that any application using these specific entry points in their default configuration was silently exposed to XXE.
The following table summarizes the default behavior before and after the fix:
| Parser Entry Point | Default Before 6.1.0 | Default From 6.1.0 | Risk if resolve_entities=True |
|---|---|---|---|
| etree.iterparse() | True | internal | Local file disclosure |
| etree.ETCompatXMLParser() | True | internal | Local file disclosure |
| XMLParser() | internal (since 5.0) | internal | Safer default by design |
How the Attack Works
When resolve_entities=True, libxml2 resolves all entity declarations during parsing, including external entities that use a SYSTEM or PUBLIC identifier pointing to a URI. An attacker who can supply XML input to an application using one of the vulnerable parsers can craft a payload that defines an external entity referencing a local file via a file:// URI. The parser expands the entity, embedding the contents of the targeted file into the parsed XML output.
The attack flow is straightforward:
- The attacker identifies an application endpoint that accepts XML input and parses it using etree.iterparse() or etree.ETCompatXMLParser() with default settings.
- The attacker submits XML containing a DOCTYPE declaration with an external entity definition, such as <!ENTITY e SYSTEM "file:///etc/hostname">, and references that entity in the document body.
- The parser resolves the external entity, reads the contents of the specified local file, and substitutes it into the parsed element text.
- The application returns or processes the parsed data, which now contains the file contents, completing the exfiltration.
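The flow above can be exercised safely against a throwaway file instead of a real system path. The following is a minimal sketch, not the advisory's proof of concept: the helper names `build_payload`, `render_with_iterparse`, and `hardened_parse_leaks_marker`, as well as the marker string, are hypothetical. It only runs the hardened setting `resolve_entities=False`, so the result should not depend on the installed lxml version.

```python
import tempfile
from io import BytesIO
from pathlib import Path

from lxml import etree

MARKER = "xxe-demo-marker"  # hypothetical stand-in for sensitive file contents


def build_payload(target: Path) -> bytes:
    """Build XML whose external entity points at `target` via a file:// URI."""
    return (
        f'<!DOCTYPE data [<!ENTITY e SYSTEM "{target.as_uri()}">]>'
        "<data>&e;</data>"
    ).encode("utf-8")


def render_with_iterparse(xml: bytes, resolve_entities) -> str:
    """Parse with etree.iterparse() and return the serialized root element.

    Returns an empty string if the parser rejects the document.
    """
    root = None
    try:
        for _event, elem in etree.iterparse(
            BytesIO(xml), resolve_entities=resolve_entities
        ):
            root = elem  # the last "end" event belongs to the root element
    except etree.XMLSyntaxError:
        return ""
    return "" if root is None else etree.tostring(root, encoding="unicode")


def hardened_parse_leaks_marker() -> bool:
    """Return True if the marker file's contents reach the parsed output."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp).resolve() / "secret.txt"
        secret.write_text(MARKER, encoding="utf-8")
        output = render_with_iterparse(
            build_payload(secret), resolve_entities=False
        )
    return MARKER in output


if __name__ == "__main__":
    print("marker leaked with resolve_entities=False:",
          hardened_parse_leaks_marker())
```

Passing `resolve_entities=True` to `render_with_iterparse` in an isolated test environment is one way to check whether a given lxml and libxml2 combination expands the entity; that result is deliberately not asserted here.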
The CVSS 3.1 vector string AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N reflects the characteristics of this attack: it is network exploitable, requires low complexity, needs no privileges or user interaction, and results in high confidentiality impact with no integrity or availability impact.
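To make the vector concrete, the base score can be recomputed from it. The sketch below applies the published CVSS v3.1 base score equations; it is a simplified calculator that assumes an unchanged scope (S:U), and the function names `roundup` and `base_score` are illustrative only.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope-unchanged values only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a scope-unchanged vector."""
    metrics = dict(part.split(":", 1) for part in vector.split("/"))
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles S:U vectors")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    # Impact sub-score: combined loss of confidentiality, integrity, availability.
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))
```

For this vector the impact term is 6.42 × 0.56 ≈ 3.60 and the exploitability term is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89, which rounds up to 7.5, consistent with the High (7.5) rating listed under Vendor Security History.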
The no_network Trap
A critical nuance that defenders must understand: setting no_network=True on the parser does not mitigate this vulnerability. While no_network=True prevents the parser from making network requests (blocking http:// or https:// entity URIs), it does not prevent resolution of local file:// URIs or equivalent local path references. This is a common misconfiguration that can give a false sense of security.
Impact Considerations
The actual files an attacker can read depend on the runtime environment. The parsing process's effective user privileges determine filesystem access. Container boundaries, chroot jails, and mandatory access controls such as SELinux or AppArmor profiles can limit the scope of readable files. However, in many deployments, the parsing process runs with sufficient privileges to read sensitive configuration files, credentials, or application source code.
Patch Information
The vulnerability was patched in lxml 6.1.0 via commit ab431ea0b9a7357d968f1d1c5c614649e9aaf358, authored by Stefan Behnel (scoder) on April 10, 2026, and committed on April 12, 2026. The commit message captures the intent directly: "Set resolve_entities='internal' as default for all parser subclasses."
The patch is minimal by design, touching just two Cython source files with 9 additions and 7 deletions. It aligns the previously overlooked iterparse() and ETCompatXMLParser() code paths with the security posture that the main XMLParser already adopted in lxml 5.0.
In src/lxml/iterparse.pxi, the iterparse class had resolve_entities=True in both its docstring signature and its __init__() parameter list. Both were changed:
    - compact=True, resolve_entities=True, remove_comments=False,
    + compact=True, resolve_entities='internal', remove_comments=False,
The documentation comment was also updated:
    -  - resolve_entities: replace entities by their text value (default: True)
    +  - resolve_entities: replace entities by their text value
    +    (default: 'internal' only)
In src/lxml/parser.pxi, the same default swap was applied in three places: the XMLParser docstring signature, the ETCompatXMLParser docstring signature, and the ETCompatXMLParser.__init__() parameter list:
    - remove_blank_text=False, resolve_entities=True,
    + remove_blank_text=False, resolve_entities='internal',
The behavioral difference is straightforward but critical. When resolve_entities='internal', lxml instructs libxml2 to only expand entities that are defined inline within the document's own DTD (internal subset entities). External entity declarations, those with a SYSTEM or PUBLIC identifier pointing to a URI like file:///etc/hostname, are no longer resolved. This eliminates the local file read XXE vector while preserving the ability to use ordinary internal entity shortcuts like &copyright; that are defined within the XML document itself.
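As a small illustration of the behavior that is preserved, the sketch below parses a document that declares an internal entity and passes `resolve_entities='internal'` explicitly. It assumes lxml 5.0 or later, where that value was introduced for XMLParser; the entity name and replacement text are made up for the example.

```python
from lxml import etree

# The entity is declared in the document's own internal DTD subset.
DOC = (
    b'<!DOCTYPE note [<!ENTITY copyright "(c) Example Corp">]>'
    b"<note>&copyright;</note>"
)

# Explicitly request the hardened mode instead of relying on the default.
parser = etree.XMLParser(resolve_entities="internal")
root = etree.fromstring(DOC, parser)

# The internal entity is expanded into the element text.
print(root.text)
```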
As the maintainer noted on the Launchpad bug tracker, this fix is a backwards incompatible change to the default argument, which is why it could not be shipped as a point release (e.g., 6.0.4) and instead required a new minor version.
Mitigation Options
For organizations that cannot immediately upgrade, the following table summarizes available controls:
| Mitigation Option | Stops Local File Reads | Deployment Effort | Notes |
|---|---|---|---|
| Upgrade to lxml 6.1.0 | Yes | Medium | Fixes defaults and standardizes behavior across all parsers |
| Set resolve_entities='internal' | Yes | Low | Vendor recommended workaround; restricts resolution to internal definitions only |
| Set resolve_entities=False | Yes | Low | Completely disables entity resolution |
| Set no_network=True | No | Low | Ineffective against local file disclosure via file:// URIs |
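One way to apply the parameter-based workarounds consistently is to route all parsing through thin wrappers that pin the setting. The sketch below is one possible shape for such wrappers; the names `hardened_iterparse` and `hardened_compat_parser` are hypothetical. It defaults to `resolve_entities=False`, which also disables internal entity expansion; `'internal'` is the other option from the table for applications that need internal entities.

```python
from io import BytesIO

from lxml import etree


def hardened_iterparse(source, **kwargs):
    """Call etree.iterparse() with entity resolution disabled by default."""
    kwargs.setdefault("resolve_entities", False)
    return etree.iterparse(source, **kwargs)


def hardened_compat_parser(**kwargs):
    """Create an etree.ETCompatXMLParser with entity resolution disabled by default."""
    kwargs.setdefault("resolve_entities", False)
    return etree.ETCompatXMLParser(**kwargs)


if __name__ == "__main__":
    xml = b"<root><item>1</item></root>"

    # Streaming parse through the wrapper.
    for _event, elem in hardened_iterparse(BytesIO(xml)):
        print("end:", elem.tag)

    # Whole-document parse with the ElementTree-compatible parser.
    root = etree.fromstring(xml, hardened_compat_parser())
    print(root.tag)
```

Callers can still override the setting by passing `resolve_entities` themselves, so these wrappers complement a code audit rather than replace it.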
Development teams should audit their codebases specifically for instances of iterparse and ETCompatXMLParser. CI/CD pipelines should be configured to flag pull requests that explicitly set resolve_entities=True when handling untrusted XML input.
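Such an audit can be partly automated with Python's standard `ast` module. The sketch below is a heuristic, not a complete scanner: the function name `find_risky_calls` is hypothetical, it matches calls by name only (so it will also flag `xml.etree.ElementTree.iterparse`), and it cannot follow parsers that are configured indirectly through variables or `**kwargs`.

```python
import ast

RISKY_NAMES = {"iterparse", "ETCompatXMLParser"}


def find_risky_calls(source: str, filename: str = "<string>"):
    """Return (line, name, reason) for calls that may resolve external entities."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both `etree.iterparse(...)` and a bare `iterparse(...)`.
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        if name not in RISKY_NAMES:
            continue
        keyword = next(
            (kw for kw in node.keywords if kw.arg == "resolve_entities"), None
        )
        if keyword is None:
            findings.append((node.lineno, name, "relies on default"))
        elif isinstance(keyword.value, ast.Constant) and keyword.value.value is True:
            findings.append((node.lineno, name, "resolve_entities=True"))
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "from lxml import etree\n"
        "etree.iterparse(f)\n"
        "etree.iterparse(f, resolve_entities=False)\n"
        "etree.ETCompatXMLParser(resolve_entities=True)\n"
    )
    for finding in find_risky_calls(sample):
        print(finding)
```

A CI job could run this over changed files and fail the build when the list of findings is non-empty.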
Affected Systems and Versions
All versions of lxml prior to 6.1.0 are affected when using etree.iterparse() or etree.ETCompatXMLParser() with the default resolve_entities=True configuration. Specifically:
- Affected versions: lxml < 6.1.0
- Patched version: lxml 6.1.0 (released April 18, 2026)
- Vulnerable configurations: Any application that parses untrusted XML using etree.iterparse() or etree.ETCompatXMLParser() without explicitly setting resolve_entities='internal' or resolve_entities=False
- Not directly affected: Applications using only XMLParser() or HTML parsers, which were hardened in lxml 5.0
Given lxml's role as a transitive dependency in many Python packages, organizations should check not only for direct usage but also for indirect inclusion through dependency chains.
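A quick way to check a given environment, including one where lxml arrives only as a transitive dependency, is to read the installed version from package metadata. The sketch below uses the standard library only; the helper names are hypothetical, and the comparison is deliberately simple: it looks at the numeric release components and ignores pre-release suffixes.

```python
import re
from importlib import metadata

FIXED = (6, 1, 0)  # first patched release according to this advisory


def parse_release(version: str) -> tuple:
    """Extract up to three leading numeric components, padding with zeros."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version number is older than the patched release."""
    return parse_release(version) < FIXED


def installed_lxml_status() -> str:
    """Describe the lxml version installed in the current environment."""
    try:
        version = metadata.version("lxml")
    except metadata.PackageNotFoundError:
        return "lxml is not installed in this environment"
    state = "affected" if is_affected(version) else "not affected"
    return f"lxml {version}: {state} (by version number only)"


if __name__ == "__main__":
    print(installed_lxml_status())
```

A version older than 6.1.0 only indicates exposure if the application also uses one of the two entry points with default settings, so this check is a first filter rather than a verdict.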
Vendor Security History
The lxml project has a track record of addressing complex parsing and sanitization edge cases. Reviewing recent advisories provides useful context on adjacent attack surfaces:
| Component | Issue Summary | Severity | Date Published |
|---|---|---|---|
| lxml core | Default configuration allows XXE to local files (CVE-2026-41066) | High (7.5) | April 18, 2026 |
| lxml_html_clean | <base> tag injection hijacks relative URLs (CVE-2026-28350) | Moderate (6.1) | March 2, 2026 |
| lxml HTML Cleaner | Crafted SVG embedded scripts pass through sanitizer | Moderate | December 12, 2021 |
The project maintains a formal security policy requesting private vulnerability disclosure with a 90 day remediation window. The rapid turnaround on CVE-2026-41066 demonstrates responsive maintenance. That said, the recurring pattern of sanitization bypasses suggests that organizations using lxml for security critical parsing or cleaning should maintain defense in depth strategies rather than relying solely on the library's defaults.
References
- NVD: CVE-2026-41066
- CVE Record: CVE-2026-41066
- GitHub Security Advisory: GHSA-vfmq-68hx-4jfw (lxml/lxml)
- GitHub Advisory Database: GHSA-vfmq-68hx-4jfw
- Launchpad Bug Report #2146291
- Patch Commit: ab431ea0b9a7357d968f1d1c5c614649e9aaf358
- Patch Commit API
- lxml on PyPI
- lxml Download Statistics
- Tenable: CVE-2026-41066
- lxml API Documentation: iterparse
- lxml Security Overview
- Related Advisory: lxml_html_clean base tag injection (GHSA-xvp8-3mhv-424c)



