What is MethodAtlas
The problem
Projects in Java, C#, or TypeScript routinely contain hundreds or thousands of test methods. A fraction of those tests explicitly verify security properties — correct authentication behaviour, cryptographic correctness, input validation, access control boundaries — but they live side-by-side with purely functional tests and are indistinguishable to anyone reading the test directory listing.
Without tooling, answering the question "which of our tests cover security requirements, and do they cover them completely?" requires a manual audit of every test file. That audit is time-consuming, error-prone, and does not stay current as the codebase evolves.
MethodAtlas automates the discovery and classification step. It reads source files lexically, without compiling them, and identifies every test method, detecting the framework automatically (JUnit 5, JUnit 4, or TestNG for Java; xUnit, NUnit, or MSTest for C#; Jest, Vitest, or Mocha for TypeScript/JavaScript). It then asks an AI provider to decide whether each method is security-relevant, assign taxonomy tags, and provide a human-readable rationale.
Where it fits in the SSDLC
MethodAtlas is a testing-phase instrument in the Secure Software Development Life Cycle. It is not a replacement for static analysis, penetration testing, or threat modelling — it complements those activities by maintaining a continuously updated, machine-readable inventory of the security-test layer of a project.
flowchart LR
A([Plan]) --> B([Design]) --> C([Implement]) --> D([Test]) --> E([Deploy]) --> F([Operate])
MA[MethodAtlas]
MA -. classify tests\nemit SARIF · apply tags .-> D
style MA fill:#c5cae9,stroke:#283593
style D fill:#e8eaf6,stroke:#3f51b5
Typical integration points:
| Activity | MethodAtlas role |
|---|---|
| Nightly CI scan | Emit SARIF to GitHub Code Scanning; flag new unclassified tests |
| Sprint close (automated) | Run -apply-tags to write AI-generated @Tag and @DisplayName annotations directly to source |
| Sprint close (reviewed) | Export CSV, review and adjust tags/display_name columns, replay decisions with -apply-tags-from-csv |
| Security review | Export CSV as evidence of security-test coverage for auditors |
| Air-gapped audit | Manual AI workflow produces the same CSV without network access |
| Regression gating | Content hashes detect classes that changed since the last approved scan |
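The regression-gating row can be illustrated with a minimal sketch. It assumes a SHA-256 digest over the class source with line endings normalised; the actual hash algorithm and normalisation used by MethodAtlas are not stated here, and the function names `content_hash` and `changed_classes` are hypothetical.

```python
import hashlib


def content_hash(source: str) -> str:
    """Hash a class's source text (assumed scheme: SHA-256, LF line endings)."""
    normalised = source.replace("\r\n", "\n").replace("\r", "\n")
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_classes(approved: dict[str, str], current_sources: dict[str, str]) -> list[str]:
    """Return classes whose hash differs from, or is missing in, the approved scan."""
    return sorted(
        name
        for name, source in current_sources.items()
        if approved.get(name) != content_hash(source)
    )


# Hashes recorded at the last approved scan (illustrative data).
approved = {"LoginTest": content_hash("class LoginTest {}\n")}

# Sources as they exist now: one edited class, one new class.
current = {
    "LoginTest": "class LoginTest { }\n",
    "TokenTest": "class TokenTest {}\n",
}

print(changed_classes(approved, current))  # ['LoginTest', 'TokenTest']
```

Under this scheme, only classes that appear in the result would need to be re-classified or re-reviewed; everything else keeps its previously approved classification.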
Why AI-assisted classification?
Manual classification of hundreds of tests is feasible once; keeping it current across active development is not. AI-assisted classification offers:
- Speed: an entire test class is classified in seconds.
- Consistency: the same taxonomy is applied uniformly regardless of who wrote the test or how it is named.
- Rationale: the ai_reason field documents why a method was classified as security-relevant, making the classification defensible during review.
- Automation: the tool runs in CI with no human intervention when an API provider is available, or via the manual workflow in restricted environments.
The taxonomy applied by MethodAtlas covers categories that align with the OWASP Testing Guide and common CWE groupings: authentication, authorisation, cryptography, input validation, session management, and others.
The two-phase design
MethodAtlas does not simply forward source files to an AI and ask "which tests are security-relevant?". Instead it separates the work into two distinct phases: a deterministic parsing step that establishes the structural ground truth, followed by an AI classification step that adds semantic meaning.
flowchart LR
subgraph p1["Phase 1 — Deterministic parser"]
direction TB
SRC[/"Source files\n(Java · C# · TypeScript)"/] --> AST["Language-specific AST\n/ parse tree"]
AST --> ML(["Method inventory\n(complete · stable · repeatable)"])
end
subgraph p2["Phase 2 — AI classification"]
direction TB
PRM["Prompt\ntaxonomy + method list + source"] --> AI[("AI provider")]
AI --> CL(["Classifications\nper method name"])
end
ML -->|"fixed method list"| PRM
CL --> OUT[/"CSV · SARIF · plain text"/]
style p1 fill:#f5f5f5,stroke:#9e9e9e
style p2 fill:#e8eaf6,stroke:#3f51b5
style ML fill:#e8eaf6,stroke:#3f51b5
style CL fill:#e8eaf6,stroke:#3f51b5
style OUT fill:#c5cae9,stroke:#283593
Phase 1: deterministic method discovery
The parser reads each source file lexically, without compiling it, and extracts a precise list of test methods. The test framework is detected automatically from each file — JUnit 5 Jupiter, JUnit 4 (including @Theory), and TestNG for Java; xUnit, NUnit, and MSTest for C#; Jest, Vitest, and Mocha for TypeScript/JavaScript. This step is entirely rule-based: it finds every method carrying a recognised test annotation (or attribute, or function call), or any custom marker configured via -test-marker. The result is a canonical, repeatable inventory that does not depend on which AI model is used, which version is current, or whether the AI service is available at all.
This matters because AI models are not reliable at structural enumeration. Given a raw source file, a model may:
- silently skip a method that is hard to classify
- merge two methods into a single response entry
- hallucinate a method name that does not exist
- produce a different count on repeated invocations of the same prompt
When the parser establishes the method list first, none of these failures can alter the inventory: the model's response can always be checked against a list it did not produce.
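To make the idea of rule-based discovery concrete, here is a deliberately crude sketch for Java sources. It is a line-oriented regular-expression scan, not the language-specific parser MethodAtlas uses; the marker set is a small illustrative subset, and the `extra_markers` parameter is a hypothetical stand-in for what -test-marker configures.

```python
import re

# Illustrative subset of test annotations; the real tool recognises more.
DEFAULT_MARKERS = frozenset({"Test", "ParameterizedTest", "RepeatedTest", "Theory"})

# An annotation at the start of a line, with optional simple arguments.
LEADING_ANNOTATION = re.compile(r"^@(\w+)(?:\([^)]*\))?\s*")
# The identifier directly before the first "(" is taken as the method name.
METHOD_NAME = re.compile(r"(\w+)\s*\(")


def discover_test_methods(source: str, extra_markers=()) -> list[str]:
    """Return names of methods that carry a recognised test marker, in file order."""
    markers = DEFAULT_MARKERS | set(extra_markers)
    found = []
    pending = False  # True after a test marker, until its method is seen
    for raw in source.splitlines():
        line = raw.strip()
        # Peel annotations off the front of the line, noting any test marker.
        while (match := LEADING_ANNOTATION.match(line)):
            if match.group(1) in markers:
                pending = True
            line = line[match.end():]
        if not pending:
            continue
        declaration = METHOD_NAME.search(line)
        if declaration:
            found.append(declaration.group(1))
            pending = False
    return found


SAMPLE_JAVA = """
class LoginTest {
    @Test
    @Tag("security")
    void rejectsExpiredToken() {
    }

    void helper() {
    }

    @ParameterizedTest
    void hashesPassword(String input) {
    }

    @SecurityCheck void blocksCsrf() {}
}
"""

print(discover_test_methods(SAMPLE_JAVA))
print(discover_test_methods(SAMPLE_JAVA, extra_markers=("SecurityCheck",)))
```

Because the scan involves no model, running it twice on the same file yields the same list in the same order, which is the property the rest of the pipeline relies on. A real parser would also handle comments, string literals, and multi-line annotations, which this sketch ignores.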
Phase 2: AI classification against a fixed list
The prompt sent to the AI provider contains the taxonomy, the source text, and — critically — the exact list of method names the parser found. The model is instructed to classify only those methods and to return one entry per name. It cannot add entries or omit them without the mismatch being detectable.
This constraint produces several practical benefits:
| Property | Effect |
|---|---|
| Structural determinism | The same set of methods is always discovered regardless of model choice or prompt variation |
| Cost efficiency | The model does not spend tokens searching for test methods — that work is already done |
| Graceful degradation | If AI classification fails for a class, the structural data (method names, line counts) is still emitted with blank AI columns; the scan is never aborted |
| Auditability | The method inventory can be verified independently of the AI output, which is important when the CSV is used as audit evidence |
| Taxonomy control | The permitted tag set is injected explicitly into every prompt, so the model cannot invent categories outside the defined taxonomy |
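The properties in the table can be sketched as a reconciliation step between the parser's method list and the model's response. This is an assumption-laden illustration, not MethodAtlas's implementation: the entry fields (`method`, `security_relevant`, `tags`, `reason`) and the `security_relevant` output column are hypothetical, and only ai_reason is a column name taken from this page.

```python
from collections import Counter

BLANK_AI_COLUMNS = {"security_relevant": "", "tags": "", "ai_reason": ""}


def reconcile(expected_methods, ai_entries, permitted_tags):
    """Align AI entries with the parser's fixed method list.

    Returns (rows, problems). Every expected method yields exactly one row;
    its AI columns stay blank unless exactly one entry names that method.
    """
    counts = Counter(entry["method"] for entry in ai_entries)
    by_name = {entry["method"]: entry for entry in ai_entries}
    expected = set(expected_methods)

    problems = {
        "missing": [m for m in expected_methods if m not in counts],
        "unexpected": sorted(name for name in counts if name not in expected),
        "duplicated": sorted(name for name, n in counts.items() if n > 1),
    }

    rows = []
    for method in expected_methods:
        if counts[method] != 1:
            # Graceful degradation: keep the structural row, leave AI data blank.
            rows.append({"method": method, **BLANK_AI_COLUMNS})
            continue
        entry = by_name[method]
        # Taxonomy control: drop any tag outside the permitted set.
        tags = [t for t in entry.get("tags", []) if t in permitted_tags]
        rows.append({
            "method": method,
            "security_relevant": entry.get("security_relevant", ""),
            "tags": ";".join(tags),
            "ai_reason": entry.get("reason", ""),
        })
    return rows, problems


expected = ["rejectsExpiredToken", "hashesPassword", "rendersFooter"]
response = [
    {"method": "rejectsExpiredToken", "security_relevant": True,
     "tags": ["authentication", "made-up-tag"], "reason": "Checks token expiry."},
    {"method": "rendersFooter", "security_relevant": False, "tags": [], "reason": ""},
    {"method": "ghostMethod", "security_relevant": True, "tags": [], "reason": ""},
]

rows, problems = reconcile(expected, response, {"authentication", "cryptography"})
print(problems)
```

In this example the response omits one method and invents another; both show up in `problems`, while `rows` still contains one entry per parsed method. The design choice illustrated is that the parser's list, not the model's response, decides which rows exist.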
Why not just use AI for everything?
Sending raw source trees to an AI and asking for a security-test inventory is superficially simpler but produces output that is difficult to trust in regulated contexts. There is no structural guarantee that every method was considered, no way to verify completeness without re-running the scan, and no stable output format if the model changes. MethodAtlas treats AI as a semantic enrichment layer on top of a foundation that is already correct by construction.
Regulatory context
Multiple standards and frameworks require evidence of security testing as part of the software development and assurance process. See the Compliance & Standards page for a framework-by-framework overview of how MethodAtlas supports those requirements.