Mike Martinez Oroz

Posted on May 28

187 Undocumented Security Findings in TerraGoat: What Standard IaC Scanners Miss (and Why Post-Quantum Matters)

#security #devops #terraform #pentest

⚠️ Correction (May 29, 2026): An earlier version of this article stated 173 undocumented findings. The verified count from the raw evidence files is 187 undocumented Trivy findings (243 total − 56 Checkov-documented = 187) plus 2 additional pq-audit findings (separate cryptographic layer). All numbers in this article have been updated. Reference: commit c1405cd.

TerraGoat is the canonical vulnerable Terraform repository maintained by Bridgecrew (now Prisma Cloud). It has over 5,000 GitHub stars and is used by security teams worldwide as the benchmark for validating IaC scanners. The premise is straightforward: run your tool against TerraGoat, check how many of the known vulnerabilities it catches.

The problem is that the "known vulnerabilities" reference list is incomplete by design — or by oversight. This research quantifies that gap for the first time.

Methodology

Three tools were run against TerraGoat in isolation, with no tuning or custom rules:

Checkov — the official Bridgecrew scanner, the tool TerraGoat was originally built to test
Trivy (Aqua Security) — the industry-standard open source vulnerability scanner with IaC support
pq-audit — an open source post-quantum cryptography audit framework built to detect cryptographic exposure that standard scanners do not model

Each tool produced its raw JSON output. Results were deduplicated per finding identifier and cross-referenced against Bridgecrew's official TerraGoat documentation to determine which findings had been acknowledged by the maintainers and which had not.
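The deduplication and cross-referencing step can be sketched roughly as follows. This is a minimal illustration, not the research repository's actual pipeline: the `Finding` fields, the choice of `(rule_id, resource)` as the dedup key, and the `documented_ids` set are hypothetical stand-ins for the parsed JSON outputs and the checks listed in Bridgecrew's documentation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    tool: str      # e.g. "checkov" or "trivy"
    rule_id: str   # scanner rule identifier, e.g. "AVD-AWS-0028"
    resource: str  # Terraform address of the flagged resource (hypothetical field)

def deduplicate(findings):
    """Keep the first finding seen for each (rule_id, resource) pair."""
    seen = set()
    unique = []
    for f in findings:
        key = (f.rule_id, f.resource)
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique

def split_by_documentation(findings, documented_ids):
    """Partition findings by whether their rule ID appears in the documentation."""
    documented = [f for f in findings if f.rule_id in documented_ids]
    undocumented = [f for f in findings if f.rule_id not in documented_ids]
    return documented, undocumented

# Toy input for illustration only, not TerraGoat results.
raw = [
    Finding("checkov", "CKV_AWS_20", "aws_s3_bucket.data"),
    Finding("checkov", "CKV_AWS_20", "aws_s3_bucket.data"),  # exact duplicate
    Finding("trivy", "AVD-AWS-0028", "aws_instance.web"),
]
documented, undocumented = split_by_documentation(deduplicate(raw), {"CKV_AWS_20"})
print(len(documented), len(undocumented))  # 1 1
```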

Raw data, gap matrix, and per-tool JSON outputs are available in the research repository.

Findings: The Numbers

Checkov produced 56 findings. Every single one maps to documented behavior in Bridgecrew's official documentation. Checkov does exactly what it says.

Trivy produced 187 findings against the same codebase, with AVD-AWS-* and aws-* identifiers covering real misconfigurations across S3, IAM, EC2, RDS, and networking resources, including critical- and high-severity issues. None of these 187 findings appear in Bridgecrew's TerraGoat documentation.

Total undocumented findings: 187 out of 243 (243 total − 56 Checkov-documented). That is 77% of the actual security surface.

The implication is direct: if your team selected Checkov as its primary IaC scanner because it is the "official" tool for TerraGoat and Terraform, you are currently seeing 23% of your exposure (56 of 243 findings). That is not because Checkov is broken, but because the documentation does not tell you what it does not cover.
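The coverage percentages follow directly from the corrected counts in the correction note (56 Checkov-documented findings out of 243 in total). A quick check of the arithmetic:

```python
total = 243        # all deduplicated findings across Checkov and Trivy
documented = 56    # Checkov findings that map to Bridgecrew's documentation
undocumented = total - documented

print(undocumented)                        # 187
print(round(100 * documented / total))     # 23 (% of the surface visible via Checkov alone)
print(round(100 * undocumented / total))   # 77 (% left undocumented)
```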

The PQC Layer: What No Standard Scanner Checks

After the Checkov/Trivy comparison, a second analysis was run using pq-audit, focusing exclusively on cryptographic posture.

pq-audit found 2 findings that neither Trivy nor Checkov detected at all. It classifies cryptographic exposure into three categories:

BROKEN_NOW: cryptographic algorithms in active use that are already considered broken under current NIST guidance (not future-state — present-state broken)
SNDL_VULNERABLE: configurations that make data susceptible to "store now, decrypt later" attacks (also called "harvest now, decrypt later"), a documented nation-state tactic where encrypted data is archived today for decryption once quantum computing reaches sufficient scale
PQC readiness gaps: absence of migration paths to NIST FIPS 203 (ML-KEM), FIPS 204 (ML-DSA), or FIPS 205 (SLH-DSA) in encryption configuration defined in IaC
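To make the three categories concrete, here is a hedged sketch of how a rule table might map cryptographic settings found in IaC onto them. The algorithm lists are illustrative assumptions, not pq-audit's shipped rules: legacy protocols and primitives such as TLS 1.0/1.1, SHA-1, and 3DES count as broken today, RSA and elliptic-curve schemes count as classically sound but quantum-exposed, and the NIST PQC families count as ready. A readiness gap is then modeled as quantum-vulnerable crypto with no PQC alternative configured.

```python
from enum import Enum

class Category(Enum):
    BROKEN_NOW = "BROKEN_NOW"            # weak today; no quantum computer needed
    SNDL_VULNERABLE = "SNDL_VULNERABLE"  # classically sound, exposed to a future quantum attacker
    PQC_READY = "PQC_READY"              # NIST PQC family (FIPS 203/204/205)
    UNKNOWN = "UNKNOWN"                  # not covered by this table

# Hypothetical rule table, checked in order; illustrative, not pq-audit's rules.
RULES = [
    (Category.BROKEN_NOW, ("TLS1.0", "TLS1.1", "MD5", "SHA1", "SHA-1", "3DES", "DES", "RC4")),
    (Category.PQC_READY, ("ML-KEM", "ML-DSA", "SLH-DSA")),
    (Category.SNDL_VULNERABLE, ("RSA", "EC", "ECDH", "ECDSA", "DH", "DHE")),
]

def _matches(name: str, family: str) -> bool:
    # Exact family name, or family plus a parameter suffix such as "RSA-HSM" or "ML-KEM-768".
    return name == family or name.startswith(family + "-")

def classify(setting: str) -> Category:
    """Map an algorithm or protocol name found in IaC to a category (first match wins)."""
    name = setting.upper().replace(" ", "")
    for category, families in RULES:
        if any(_matches(name, f) for f in families):
            return category
    return Category.UNKNOWN

def has_readiness_gap(settings) -> bool:
    """True when quantum-vulnerable crypto is configured but no PQC alternative is present."""
    found = {classify(s) for s in settings}
    return Category.SNDL_VULNERABLE in found and Category.PQC_READY not in found

print(classify("TLS 1.0"), classify("RSA-HSM"), classify("ml-kem-768"))
print(has_readiness_gap(["RSA", "AES-256"]))  # True
```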

Standard IaC scanners model misconfigurations against known CVEs and policy rules. They do not model cryptographic lifetime or quantum-era threat exposure. For most teams in 2026, that gap is invisible.
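One common way to reason about cryptographic lifetime is Mosca's inequality: data is exposed to SNDL if the number of years it must stay confidential (x) plus the years needed to migrate to post-quantum cryptography (y) exceeds the years until a cryptographically relevant quantum computer exists (z). The sketch below simply encodes that check; the year values in the example are placeholders, not estimates from this research.

```python
def sndl_exposed(shelf_life_years: float, migration_years: float, years_to_crqc: float) -> bool:
    """Mosca's inequality: data captured today is at risk if x + y > z."""
    return shelf_life_years + migration_years > years_to_crqc

# Placeholder values for illustration only.
print(sndl_exposed(10, 5, 12))  # True: 15 years of required secrecy and migration exceed 12
print(sndl_exposed(1, 2, 12))   # False: short-lived data, fast migration
```

The point for IaC is that a key type or TLS setting that looks fine today can still fail this check when it protects long-lived data, which is exactly the dimension a misconfiguration rule does not model.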

A note on methodology: the initial pq-audit run against TerraGoat returned 1,122 findings, nearly all of them false positives triggered by package-lock.json entries (GAP-001, now fixed in v2). After filtering, 2 real findings remained. This is documented intentionally: a tool that surfaces 1,122 noisy results on a lab target is not useful in CI. The fix, scoping the scan to exclude dependency lock files, improved the signal-to-noise ratio from unusable to precise. The 2 findings that survived are real.
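The GAP-001 fix amounts to a path filter applied before any pattern matching. A minimal sketch, assuming the scanner receives a list of repository paths: only package-lock.json comes from the write-up, and the other lock-file names are illustrative additions.

```python
from pathlib import PurePosixPath

# Dependency lock files to skip. package-lock.json is the file named in GAP-001;
# yarn.lock and pnpm-lock.yaml are illustrative assumptions.
EXCLUDED_FILENAMES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml"}

def files_to_scan(paths):
    """Yield only the paths whose file name is not a dependency lock file."""
    for p in paths:
        if PurePosixPath(p).name not in EXCLUDED_FILENAMES:
            yield p

# Toy file list, not the real TerraGoat tree.
repo = [
    "aws/ec2.tf",
    "azure/app_service.tf",
    "node_modules/pkg/package-lock.json",
    "package-lock.json",
]
print(list(files_to_scan(repo)))  # ['aws/ec2.tf', 'azure/app_service.tf']
```

Matching on the file name rather than the full path keeps the rule working for lock files at any depth.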

pq-audit is open source: https://github.com/mk-scorpiosec/pq-audit

Why This Research Exists

IaC security tooling is fragmented and documentation is inconsistent. Teams make scanner selection decisions based on vendor marketing, integration convenience, or name recognition — without a clear picture of coverage.

This research is not an argument that Checkov is bad or that Trivy is better. Both tools serve their stated purpose. The argument is that comparing tools requires complete data, and that data has not existed publicly until now.

The gap matrix published here can be used to:

Benchmark scanner coverage before adoption
Justify multi-tool strategies to security leadership
Identify categories of exposure that require manual review regardless of tooling
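A gap matrix of this kind can be built from per-tool sets of finding keys. The sketch below is a hypothetical minimal version: each row is a finding key, each column records whether a given tool reported it, and the keys are toy values rather than the published matrix.

```python
def gap_matrix(results):
    """results: {tool_name: set of finding keys}. Returns {key: {tool: bool}}."""
    all_keys = sorted(set().union(*results.values()))
    return {key: {tool: key in found for tool, found in results.items()} for key in all_keys}

def only_found_by(results, tool):
    """Finding keys reported by `tool` and by no other tool."""
    others = set().union(*(found for name, found in results.items() if name != tool))
    return results[tool] - others

# Toy data for illustration only.
results = {
    "checkov": {"s3-public-acl"},
    "trivy": {"s3-public-acl", "ec2-imdsv1"},
    "pq-audit": {"tls-1.0"},
}
matrix = gap_matrix(results)
print(matrix["ec2-imdsv1"])                # {'checkov': False, 'trivy': True, 'pq-audit': False}
print(only_found_by(results, "pq-audit"))  # {'tls-1.0'}
```

Rows with a single `True` are the findings a one-tool strategy would miss, which is the evidence a multi-tool argument to leadership needs.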

Conclusions

No single IaC scanner covers the full finding surface of even a well-known, intentionally vulnerable repository.
Documentation gaps are not the same as tool gaps — Trivy finds real issues that simply never got documented upstream.
Post-quantum cryptography exposure in IaC is invisible to current-generation scanners. This is not a theoretical future problem: the collection phase of SNDL attacks against long-lived data is happening today.
Multi-tool strategies are not optional for teams with serious security requirements.

Full research, raw data, and methodology: https://github.com/mk-scorpiosec/research/tree/main/terragoat-2026-04

Found these issues in your own infrastructure?

MK ScorpioSec offers post-analysis services based on real findings:

Remediation playbooks tailored to your specific misconfigurations
YARA rules for detection of active exploitation patterns
Identity hardening (Okta, AWS IAM, GCP IAM, Azure AD)
Implementation engagement + retest validation to confirm fixes hold

→ mkscorpiosec.com · mike@mkscorpiosec.com

Built by MK ScorpioSec — AI-native security operations.

Top comments (3)

Mike Martinez Oroz • Jun 5

[UPDATE 2026-06-04] pq-audit v3 re-scan:

Updated patterns now surface 4 findings on TerraGoat (up from 2 in v2). All 4 were confirmed as TRUE_POSITIVE by the new RAG-powered triage pipeline, a 0% false-positive rate.

New findings from v3 cloud patterns:

Azure Key Vault: key_type = "RSA" -> SNDL_VULNERABLE
AliCloud bucket: acl = "public-read" -> BROKEN_NOW
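The two new v3 patterns can be approximated with line-level regular expressions over .tf source. This is a hypothetical sketch, not pq-audit's implementation; the categories follow the v3 update, and real HCL can spread attributes across lines or compute them from variables, which a regex like this will not catch.

```python
import re

# Hypothetical rules mirroring the two v3 findings; not pq-audit's actual patterns.
RULES = [
    (re.compile(r'^\s*key_type\s*=\s*"RSA(-HSM)?"'), "SNDL_VULNERABLE"),
    (re.compile(r'^\s*acl\s*=\s*"public-read(-write)?"'), "BROKEN_NOW"),
]

def scan_tf(source: str):
    """Return (line_number, category, stripped_line) for each matching line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, category in RULES:
            if pattern.search(line):
                hits.append((lineno, category, line.strip()))
    return hits

sample = '''resource "azurerm_key_vault_key" "generated" {
  key_type = "RSA"
  key_size = 2048
}
resource "alicloud_oss_bucket" "bad_bucket" {
  acl = "public-read"
}'''

for hit in scan_tf(sample):
    print(hit)
# (2, 'SNDL_VULNERABLE', 'key_type = "RSA"')
# (6, 'BROKEN_NOW', 'acl = "public-read"')
```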

triage.py is now part of pq-audit. It performs RAG-powered false-positive validation and classifies each finding as TRUE_POSITIVE, NEEDS_REVIEW, or LIKELY_FP.
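The three-way outcome can be pictured as a thresholding step over a confidence score. The sketch below assumes, hypothetically, that the retrieval-augmented step (pulling reference material about the matched pattern and asking a model whether the finding is real) has already produced a score between 0 and 1; the thresholds are arbitrary placeholders, and this is not necessarily how triage.py works internally.

```python
from enum import Enum

class Verdict(Enum):
    TRUE_POSITIVE = "TRUE_POSITIVE"
    NEEDS_REVIEW = "NEEDS_REVIEW"
    LIKELY_FP = "LIKELY_FP"

def triage(score: float, accept: float = 0.8, reject: float = 0.3) -> Verdict:
    """Map a hypothetical true-positive confidence score to a verdict."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= accept:
        return Verdict.TRUE_POSITIVE
    if score <= reject:
        return Verdict.LIKELY_FP
    return Verdict.NEEDS_REVIEW  # ambiguous band goes to a human

print(triage(0.95), triage(0.5), triage(0.1))
```

Keeping a middle NEEDS_REVIEW band avoids forcing a binary call on ambiguous findings, which matters more in CI than squeezing out the last few false positives.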

Also scanning another significant production target. Results soon.

Harjot Singh • May 31

The what-standard-scanners-miss framing is the important part, because the real risk of IaC scanning isn't the findings it reports; it's the false confidence of a clean report. Pattern-based scanners catch the known-shape misconfigs (public bucket, open security group) because those match a rule, but they're blind to compositional problems: this role is fine and this resource is fine, but together they form a privilege-escalation path no single-resource rule can see. That gap is structural. A linter checks resources in isolation; the dangerous stuff lives in the relationships between them.

That's why a green scan reads as secure and isn't, and defense in depth is the honest response: layer the checks and assume each layer misses something rather than trusting any one tool's all-clear. The mental shift you're pointing at is to treat a passing scan as the absence of known patterns, not evidence of safety, the way a passing test suite proves the cases you wrote, not the absence of bugs. No findings is not the same as no problems.

That don't-mistake-a-clean-scan-for-a-safe-system instinct is exactly how I think about verification in Moonshift. Of the 173, how many were genuinely cross-resource issues a single-rule scanner structurally couldn't catch, versus rules the tools just didn't ship?

Mike Martinez Oroz • Jun 1 • Edited

Small correction on the number: it was 187 undocumented, not 173 — the latter appears in the updated README after fixing a metadata note that was incorrectly counted as a finding. Worth being precise since the distinction matters here.

Your framing of the two gap categories is exactly right, and it's the more important half of what the study surfaces.

The 187 are mostly coverage gaps: rules Trivy ships that Checkov's documented scope doesn't include. They are pattern-based and detectable, just not covered by the tool you chose or the documentation you trusted. That's addressable by running more scanners, which is what v1 of one of "my tools" automates.

But TerraGoat does contain the structural problem you're describing. The IMDSv1-enabled EC2 (AVD-AWS-0028) + the IAM role with unrestricted S3 access (AVD-AWS-0345) + the security group with open egress is a concrete example: each scans as low or medium in isolation, but together they form an SSRF → credential theft → full S3 exfil path that no resource-isolated linter can surface. My scan reported all three individually. It couldn't tell you they form a chain.

What you're pointing at, the graph analysis gap, is exactly what needs to come next: resource relationship traversal that traces IAM principals across instance profiles, roles, and policies to find the paths no single-rule check can see. That's "my tool" v2 territory, not v1.
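Here is a minimal sketch of resource-relationship traversal, using the IMDSv1 EC2 → IAM role → S3 chain described above as a toy graph. The node names and edges are hypothetical and hand-written; a real tool would derive them from Terraform references such as instance profiles and policy attachments. The search itself is a plain breadth-first search for a path from an entry point to a sensitive target.

```python
from collections import deque

# Hypothetical resource graph: each edge points from a resource to what it grants access to.
GRAPH = {
    "internet": ["aws_instance.web"],                               # SSRF entry; IMDSv1 enabled
    "aws_instance.web": ["aws_iam_instance_profile.web"],           # credentials via metadata
    "aws_iam_instance_profile.web": ["aws_iam_role.web"],
    "aws_iam_role.web": ["aws_iam_policy.s3_full"],                 # policy grants s3:*
    "aws_iam_policy.s3_full": ["aws_s3_bucket.data"],
}

def find_path(graph, start, target):
    """Breadth-first search; returns the shortest path as a list of nodes, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

chain = find_path(GRAPH, "internet", "aws_s3_bucket.data")
print(" -> ".join(chain))
# internet -> aws_instance.web -> aws_iam_instance_profile.web -> aws_iam_role.web -> aws_iam_policy.s3_full -> aws_s3_bucket.data
```

Each node on that path would scan as an individual finding; only the traversal reveals that together they form one exploitable chain.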

On the categories nobody shipped yet: the two pq-audit findings (TLS 1.0 in Azure app_service.tf as BROKEN_NOW, not just theoretically post-quantum risky) aren't a coverage gap or a compositional gap — they're a conceptual gap. The tool category didn't exist yet. That's the part of the work I find most interesting to build on, and there's more coming in that direction.

Your Moonshift verification framing resonates — the mental model of "a passing scan proves the cases your rules cover, not the absence of bugs" is exactly why layered analysis with different lenses matters. The pq-audit lens and the cross-resource graph lens are two lenses standard IaC scanners structurally don't have.

Thanks for your question and interest in my research.