ZeroPath Outperforms Mythos In Real World Test

When Anthropic's Mythos-powered Glasswing scanner re-analyzed curl, it surfaced one low-severity bug — months after ZeroPath helped Joshua Rogers ship fixes for nearly 170. The harness around the model matters more than the model itself.

Product

3 min read

John Walker

2026-05-11

At the end of 2025, security researcher Joshua Rogers used ZeroPath and other AI-powered SAST scanners to analyze curl. The project fixed nearly 170 unique issues as a result of his work, and its maintainer, Daniel Stenberg, published a blog post about how the experience changed his mind about AI-powered vulnerability reports.

Recently, Anthropic used Glasswing, its Mythos-powered vulnerability scanner, to take another look at curl. According to Daniel Stenberg, the net result was just one new low-severity vulnerability.

This is hopeful news for those of us who have been worried about what Mythos' release will mean for appsec and vulnerability management… while the model is undoubtedly impressive, existing products are already delivering comparable results. This is not to minimize the challenges maintainers face keeping up with the torrent of vulnerability reports they've been dealing with – they are real and serious – but the world has not ended so far, and datapoints like curl suggest that Mythos alone is unlikely to make the problem orders of magnitude worse, except perhaps by encouraging more people to scan their code with modern SAST tools.

This is not to say that Mythos won't be impressive… from what we've seen, it likely does deliver a substantial incremental bump in raw vuln-finding capability… but when it comes to discovering flaws reliably and exhaustively at scale, the harness around the model is a bigger part of the story than the model itself.

We touched on this in an earlier post where we put Opus 4.6 through its paces detecting real CVEs in single C functions, using a fairly naive single-shot strategy mirroring what you might do in a coding agent or chatbot: showing it the sample and asking whether it contained any vulns. The model found around 28.5% of the vulns in the dataset – impressive, since every one of them made it past human review and into production – but it achieved this with a massive false-positive rate and extremely unstable, inconsistent results run over run.
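The single-shot strategy above can be sketched in a few lines. This is an illustrative mock, not our benchmark code: `query_model` is a placeholder stub standing in for whatever LLM API you'd actually call, and the prompt wording is an assumption.

```python
# Naive single-shot vulnerability check: show the model one function,
# ask whether it contains a vuln, and parse a YES/NO answer.

PROMPT_TEMPLATE = """You are a security auditor. Here is a C function:

{code}

Does this function contain a security vulnerability?
Answer YES or NO on the first line, then explain."""


def query_model(prompt: str) -> str:
    # Placeholder: a real harness would call an LLM API here.
    # A canned answer keeps the sketch self-contained and runnable.
    return "NO\nNo obvious issues in this snippet."


def single_shot_scan(code: str) -> bool:
    """Return True if the model flags the function as vulnerable."""
    answer = query_model(PROMPT_TEMPLATE.format(code=code))
    return answer.strip().upper().startswith("YES")


sample = "int add(int a, int b) { return a + b; }"
print(single_shot_scan(sample))  # False with the canned answer above
```

With one prompt and one answer per function, every quirk of the model (phrasing sensitivity, sampling variance) flows straight into the result, which is exactly why the run-over-run instability shows up.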

At ZeroPath, we're intimately familiar with these sorts of quirks because of the work we've done building a complex harness around commodity LLMs to mitigate them, in order to produce results that are:

  • Stable run over run
  • Low in false positives
  • Exhaustive and auditable (it's the vulns from your entire codebase, not a random selection)
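To make the "stable run over run" point concrete, here is one simple technique a harness might use: run the scan several times and keep only findings that appear in a minimum number of independent runs. This is a hypothetical illustration of the general idea, not a description of ZeroPath's actual pipeline.

```python
from collections import Counter


def majority_vote_findings(scan_runs: list[list[str]], min_votes: int) -> list[str]:
    """Keep only findings reported in at least `min_votes` independent runs.

    Illustrative only: repeating scans and applying a vote threshold is one
    basic way to trade a little recall for run-over-run stability and a
    lower false-positive rate.
    """
    # set(run) so a finding counts at most once per run
    counts = Counter(f for run in scan_runs for f in set(run))
    return sorted(f for f, n in counts.items() if n >= min_votes)


# Three noisy runs over the same (hypothetical) codebase:
runs = [
    ["CWE-787 in parse_url", "CWE-476 in free_ctx"],
    ["CWE-787 in parse_url"],
    ["CWE-787 in parse_url", "CWE-416 in close_conn"],
]
print(majority_vote_findings(runs, min_votes=2))  # ['CWE-787 in parse_url']
```

The one-off findings that appear in a single run (often hallucinations or sampling noise) are filtered out, while the finding reported consistently survives.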

When Mythos becomes public, we intend to try integrating it into our stack. When we do, it won't be surprising if it improves performance… but the curl comparison highlights that the model alone is the wrong thing to focus on. ZeroPath used six-month-old models in a harness we've spent years perfecting to achieve the same results Mythos delivered in a more naive configuration.

The "vulnpocalypse" is already here. Current frontier models with strong harnesses are already generating tens of thousands of vulnerability reports. Keeping up with this torrent has been stretching open-source maintainers, but the world hasn't ended. Mythos' release doesn't change that; it just calls attention to something that's already happening.
