Predict, Don’t Enumerate

admin

3 hours ago

A third of the way into a security-operations guide that Anthropic published in April 2026, wedged between a recommendation to patch CISA's Known Exploited Vulnerabilities list and a suggestion to automate your deployment pipeline is a small recommendation: “Use EPSS to prioritize the rest.” For anyone who has worked on a vulnerability backlog in the last decade, the sentence is an acknowledgment of a widely felt but often unspoken fact about security programs: They have become machine-scale problems of signal to noise.

EPSS (Exploit Prediction Scoring System) is a statistical model that takes a known software flaw, runs it through a set of signals about what attackers are actually doing across the internet, and returns a probability that the flaw will be exploited in the next 30 days. It isn’t an LLM, and it does no reasoning or prompt engineering. It predicts. The company endorsing it is the same company whose newest model can surface thousands of novel, exploitable vulnerabilities in production software, many of them two or three decades old, most of them still unpatched.

As far as we can tell, this is the first time a frontier AI lab has publicly endorsed a purpose-built predictive model as the right tool for a defensive problem. LLM labs usually recommend LLMs. That Anthropic did not is worth noting, but the recommendation itself isn’t news to the practitioners it’s aimed at. It’s a description of what they’ve been doing.

The quiet consensus

The volume problem isn’t new. Anyone running a scanner against a large enterprise estate in 2015 was already generating hundreds of thousands of findings per month. Anyone running one against a cloud environment in 2020 was generating millions. Enterprises have spent the better part of a decade staring at dashboards where the number of open critical findings was larger than the capacity of the team supposed to fix them. In other words, cybersecurity has become machine scale.

Risk-based vulnerability management, as a product category, has existed since around 2018. EPSS, as a public resource, has been usable since 2021. More than 120 vendors embed it today into their products. The field has had access to a predictive baseline for years.

What has been missing is an external justification to change the status quo recommendations from auditors, model risk management teams, and even boards. Auditors want a clear set of expectations, making grading more objective and therefore easier to evaluate. Compliance frameworks like CVSS (Common Vulnerability Scoring System) because CVSS is easy, but implementing something more efficient has historically required that aforementioned external push. A working CISO could tell you she had stopped treating every vulnerability scored a severity 9.8/10 by CVSS as an emergency in 2019, but she would also tell you she still kept CVSS in the report.

Anthropic's guidance is useful because it makes the private consensus public. Patch what you know to be exploited, then use EPSS above a threshold based on the team’s capacity or risk tolerance. DHS CISA’s practice of publishing known exploited vulnerabilities since November of 2021 is just additional proof that the existing methodologies were being overwhelmed by scale and lack of signal.

Why prediction, stated plainly

In 2014, at Black Hat, Dan Geer, then the chief information security officer of In-Q-Tel, asked the first principles question: Are vulnerabilities in software sparse or dense? Sparse meant finite, meaning every fix measurably shrank the attack surface. Dense meant weeds in a field. Geer could not answer the question because the data were not in.

Eight years later, Jonathan Spring at Carnegie Mellon's Software Engineering Institute tied vulnerability enumeration to the halting problem and showed, in theory, that for any sufficiently complex piece of deployed software, there are always more undiscovered flaws.

The AI-driven discovery results of the last 18 months have made the density argument impossible to wave off even in a compliance review. A 27-year-old bug in OpenBSD. A 16-year-old bug in FFmpeg that five million fuzzing runs never caught. Disclosed findings, by the developers' own accounting, are less than 1% of what has been found. But again, the volume was already a problem. With the coming release of its newest model, Mythos, Anthropic is telling teams to plan for an order of magnitude more findings over the next 24 months.

Static severity scoring can’t survive the volume problem, because it’s a human-scale solution for a machine scale problem. Neither can any process that treats every critical finding as an emergency. The threshold for action has to be probabilistic, measurable, and defensible. That’s what a predictive model is for, and that’s what working teams have been using in noisy large enterprise environments.

Pointing machines and knowing machines

Geer returned to his 2014 question in the summer of 2025, writing with Dave Aitel in Lawfare. The piece gives the industry a vocabulary for a distinction it has been fudging:

A vulnerability in the code isn’t automatically a threat. A buffer overflow is a hazard. It becomes a risk only if an attacker can exploit it reliably, in this environment, against these controls, through this traffic. Bugs are abundant but the ability to weaponize a particular bug against a particular target is much rarer.

The industry, they wrote, has built a pointing machine. It enumerates.

Even children learn early to point and name—but knowing the word “dog” doesn’t reveal whether the animal might bite. In cybersecurity, we’ve built systems that similarly point and name vulnerabilities without understanding whether they’re truly dangerous. By embracing AI solely for pattern recognition, we’ve created a powerful “pointing machine” that identifies possible threats but does not comprehend their actual impact. What we need instead is a “knowing machine,” capable of understanding how code functions within complex, real-world environments, recognizing not just hazards but the full context of how and whether those hazards might become genuine risks.

A knowing machine is a system that understands how code behaves in a particular environment and recognizes the context that turns a hazard into a risk. A predictive model is how you build a knowing machine. EPSS is the clearest public example: It covers every published CVE and is updated daily.

Global isn’t local

EPSS is a global model. It sees what attackers are doing across the whole of the internet. It picks up patterns in exploitation activity that severity scores never could. What it can’t see is any particular organization's environment. It doesn’t know which assets carry the data the business actually cares about. It doesn’t know what compensating controls are in place, where remediation is risky, or how your telemetry and history change the odds.

A 9.8 with a 97% global probability of exploitation and a 9.8 with a 0.1% probability are not the same animal. Neither are two organizations applying the same EPSS threshold to the same CVE on different assets. One has the vulnerable code path exposed to the internet, behind a web application firewall that doesn’t inspect the relevant protocol. The other has the same CVE on an internal system that accepts authenticated input from a single service account. A scanner can’t tell them apart. A global model can’t tell them apart. Their actual risk profiles are orders of magnitude apart.

Local context is where most security teams have been stuck the entire time, and where the next decade of the field is going to be fought.

What a local knowing machine actually requires

Pair a better pointing machine with a faster remediation engine and all you’ve done is increase the speed at which you produce churn, breakage and wasted effort. You’ll also spend a king's ransom in agent tokens fixing vulnerabilities that were never dangerous in your environment.

In contrast to an omniscient scanner, a local model trains on the specific environment being defended: asset inventory, application topology, reachability, deployed controls, attack telemetry observed on-site, and the history of the organization's own remediations and their outcomes. The model produces probabilities specific to the enterprise. Most organizations already have the inputs, scattered across CMDBs, endpoint agents, firewall logs, ticketing systems and scanner output. This context is precisely what attackers (whether they’re using good old fashioned metasploit or Mythos with an infinite budget) are lacking in their models. The context becomes an asymmetrical advantage for defenders, perhaps the only one that exists.

The policy shifts that actually matter

The interventions that will decide whether a security program survives the next 24 months aren’t purely technical. A CISO can put most of them in motion without buying anything.

Rewrite the SLA. Most vulnerability-management SLAs are organized by severity. Criticals in 15 days, highs in 30, mediums in 90. That structure was built for a world where the count of open criticals was small enough to matter. It’s now actively harmful, because it forces teams to spend the same effort on a 9.8 nobody is exploiting and a 7.5 that’s under active attack. SLAs should be rewritten in terms of probability of exploitation and asset exposure, not severity. A CISO who can’t get that past her GRC team can at least add a second tier that makes the probability-based cut enforceable alongside the severity-based one.

Change what the board sees. If the monthly security report counts the numbers of vulnerabilities, exposures or findings in different buckets (“critical,” “open past 30 days,” etc.), the organization is being managed to the wrong metric. The metric should be exploitability-weighted exposure over time, with a second line for predicted versus observed exploitation. Boards will accept this once somebody explains it. This beats showing them a number that has no relationship to risk and is growing exponentially as new LLM models are released. More to the point: A great team can do amazing volumes of remediation work, and risk can still rise because they’re measuring and remediating the wrong thing. An efficient, context-rich team can do far less work and meaningfully move the probability of an event down.

Invest in telemetry. The single most valuable instrument a security program can build is a feedback loop between what was prioritized and what was exploited. If the loop shows you were wrong, the model improves. If the loop does not exist, you will keep being wrong indefinitely (or just not being aware of misses).

Fix the compliance conversation. The reason CVSS survives is regulatory inertia. PCI, HIPAA, and most state breach-notification frameworks still reference severity. The CISOs who will come out of the next two years in the best shape are the ones who engage their auditors now, in writing, about what a probabilistic prioritization framework looks like under the existing rules.

Staff for the bottleneck, which isn’t scanning. The industry has spent a decade hiring people to find bugs. The bottleneck now is deciding which bugs matter, getting the fixes deployed, and measuring whether the prioritization was correct. The job descriptions should reflect this. A security-data engineer may be able to increase efficiency to meet SLAs more than increasing capacity would.

None of this requires a new product. All of it requires a CISO willing to say, out loud, that the old dogma is broken and that the new one will be managed by data and probabilities. That is the shift Anthropic's five-word sentence was really announcing. The technology is available and the models are here—both the LLM-based ones to find the vulnerabilities and the predictive knowing machines to prioritize efficiently.

Source link