Guide for Security Teams¶
MethodAtlas scans a project's test source code (Java, C#, or TypeScript/JavaScript) and produces a structured inventory of test methods that are relevant to security. This guide is written for security managers, compliance officers, and CISOs who receive MethodAtlas output and need to interpret it, act on findings, and incorporate results into audit evidence packages. It does not assume familiarity with any specific development technology or CI/CD tooling.
When you receive a CSV¶
When a development team or CI pipeline delivers a MethodAtlas output file, it will typically be a comma-separated values (CSV) file — a table where each row describes one test method and each column contains a specific piece of information about that method.
Open the file in a spreadsheet application (Microsoft Excel, Google Sheets, LibreOffice Calc) or a CSV viewer. The first row contains column headings. Each subsequent row corresponds to one test method found in the project's test code.
The column set varies depending on which options the engineering team used when running the scan. The tables below describe every column you may encounter.
For a complete column reference, see Output Formats.
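No programming is needed to read the file, but teams that prefer to script their triage can load it with any CSV library. A minimal sketch using Python's standard `csv` module, assuming the column names described in this guide (the inline sample rows are hypothetical):

```python
import csv
import io

# Hypothetical sample data; a real report would use
# open("methodatlas.csv", newline="") instead of io.StringIO(sample).
sample = """fqcn,method,loc,tags,ai_security_relevant,ai_interaction_score
com.acme.auth.LoginTest,rejectsExpiredToken,14,security,true,0.0
com.acme.cart.CartTest,addsItem,9,,false,0.0
com.acme.auth.LegacyAuthTest,shouldCallEncoder,7,security,true,1.0
"""

# Keep only the rows the AI classified as security-relevant.
security_rows = [
    row for row in csv.DictReader(io.StringIO(sample))
    if row["ai_security_relevant"] == "true"
]
for row in security_rows:
    print(row["fqcn"], row["method"], row["ai_interaction_score"])
```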
What MethodAtlas produces¶
MethodAtlas reads a project's test source code (Java, C#, or TypeScript/JavaScript) and produces a structured inventory of test methods that are security-relevant — methods written to verify that the application correctly implements authentication, cryptography, input validation, access control, and similar security properties.
The output is a table. Each row describes one test method. The columns fall into two groups:
Structural data (always present)¶
These columns are derived directly from the source code, without any AI involvement. They are deterministic and do not change between runs unless the source changes.
| Column | Present when | Meaning |
|---|---|---|
| `fqcn` | Always | Fully qualified class name — the package/namespace and class that contains this test |
| `method` | Always | The name of the test method |
| `loc` | Always | Inclusive line count of the method declaration |
| `tags` | Always | Source-level tag values declared on the test (e.g. `security`, `auth`) — `@Tag` in Java, `[Category]`/`[Trait]` in C# |
| `display_name` | Always | Display name declared on the method (e.g. `@DisplayName` in Java, `[Fact(DisplayName=…)]` in C#); empty when absent |
| `content_hash` | `-content-hash` flag | SHA-256 fingerprint of the enclosing class source — enables revision traceability |
AI enrichment (present when AI classification is enabled)¶
These columns are produced by an AI model that reads the test method body and classifies it according to a security taxonomy.
| Column | Present when | Meaning |
|---|---|---|
| `ai_security_relevant` | AI enabled | `true` if the AI determined this method tests a security property; `false` otherwise |
| `ai_display_name` | AI enabled | A human-readable description of what the test is verifying (e.g. `SECURITY: auth — login rejects expired tokens`) |
| `ai_tags` | AI enabled | Security taxonomy tags assigned by the AI (e.g. `auth`, `crypto`, `injection`) |
| `ai_reason` | AI enabled | The AI's rationale for its classification — one or two sentences explaining why the method is or is not security-relevant |
| `ai_interaction_score` | AI enabled | A measure of test quality; see below |
| `ai_confidence` | AI enabled + `-ai-confidence` | The AI model's certainty in its classification, from 0.0 (uncertain) to 1.0 (certain) |
| `tag_ai_drift` | `-drift-detect` flag | Compares the source `@Tag("security")` annotation against the AI classification; see below |
Understanding the interaction score¶
The ai_interaction_score column measures a specific weakness in test design
that is particularly dangerous in the security domain. It answers the question:
Does this test verify the outcome, or does it only verify that certain methods were called?
| Score | Meaning in plain English |
|---|---|
| 0.0 | The test checks the actual result — a return value, a thrown exception, a database state. This is the strongest form of security test. |
| 1.0 | The test only verifies that certain methods were called, without checking what they returned or what state they produced. |
| Values in between | Mixed: some outcome assertions alongside interaction-only checks. |
Why this matters¶
Consider a test named shouldStoreEncodedPassword. If that test only
verifies that the password encoder was called — but does not check that
the encoded value was actually stored, that the plaintext was discarded, and
that the stored form is used for authentication — then it provides no real
security evidence. The test will still pass even if the encoding logic was
removed from production code, as long as the encoder method was still invoked.
This pattern is sometimes called a "placebo test" or "Potemkin village test": it looks like a test, CI reports it as passing, code coverage tools count it as covered — but it does not actually verify the security property.
Standard code coverage tools cannot detect this. MethodAtlas can, because the AI reads the test body and understands what each assertion is actually checking.
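The difference is easiest to see in miniature. The following sketch uses Python's `unittest.mock` purely to illustrate the pattern (the names `register_user`, `encoder`, and `store` are hypothetical); the same shape occurs with Mockito's `verify` in Java or Moq's `Verify` in C#:

```python
from unittest.mock import Mock

def register_user(password, encoder, store):
    # Hypothetical production code: encode the password, then persist it.
    store.save(encoder.encode(password))

encoder, store = Mock(), Mock()

# Interaction-only ("placebo") check: this passes even if register_user
# stored the plaintext, as long as encode() was invoked somewhere.
register_user("hunter2", encoder, store)
encoder.encode.assert_called_once_with("hunter2")  # pushes the score toward 1.0

# Outcome assertion: verifies what was actually stored.
encoder.encode.return_value = "$2b$12$abc"
register_user("hunter2", encoder, store)
store.save.assert_called_with("$2b$12$abc")  # pushes the score toward 0.0
```

Only the second assertion would fail if the encoding step were removed from `register_user` while the encoder was still called elsewhere.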
Action threshold
A score of 1.0 warrants immediate developer review. A score above 0.8
warrants review before the next release. Scores below 0.5 on security-
relevant tests are generally acceptable.
Understanding the confidence score¶
The ai_confidence score reflects how certain the AI model is about its
security-relevance classification. This is separate from the test quality
measured by the interaction score.
| Range | Interpretation |
|---|---|
| 0.8 – 1.0 | High confidence — the model is certain. Treat the classification as reliable. |
| 0.5 – 0.8 | Moderate confidence — the model is fairly certain but human review is advisable, particularly if the method is a high-stakes security control. |
| 0.0 – 0.5 | Low confidence — the AI is uncertain. Human review is required before including or excluding this method from an audit evidence package. |
Source-level tags versus AI classifications¶
MethodAtlas produces two independent kinds of security labels for each test method:
Source-level tags (tags column) are labels that a developer typed directly into the source file — @Tag("security") in Java, [Category("security")]/[Trait("Tag", "security")] in C#. They are factual — they represent what the developer intended when writing the test. They do not change unless a developer edits the source.
AI classifications (ai_security_relevant, ai_tags, ai_display_name, ai_reason columns) are produced by an AI model that reads the test method body and reasons about what the test actually does. The AI does not rely on the developer's intent; it reads the code.
These two sources will sometimes disagree, and both disagreement directions are significant:
| Scenario | Likely explanation | Action |
|---|---|---|
| Developer labelled the test `security`; AI agrees | Test is correctly labelled and correctly implemented | No action |
| Developer labelled the test `security`; AI does not consider it security-relevant | The annotation may be historical or aspirational; the test body does not actually exercise a security property | Developer review: tighten the test or remove the label |
| Developer did not label the test; AI considers it security-relevant | A security property is being tested but is not labelled — tag-based reports will miss it | Add `@Tag("security")` or document why the AI is wrong |
| Neither developer nor AI considers the test security-relevant | Ordinary functional or performance test | No action regarding security |
When drift detection is enabled (see below), the tag_ai_drift column makes these disagreements visible in a single field.
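Even without the dedicated column, the same comparison can be reproduced from the two label sources directly. A rough sketch of the logic, assuming the source tags have already been parsed into a list (MethodAtlas's own rules may differ in detail):

```python
def drift(source_tags, ai_relevant):
    """Classify tag/AI disagreement for one test method.

    source_tags: list of source-level tag strings for the method.
    ai_relevant: bool, the ai_security_relevant classification.
    """
    tagged = "security" in source_tags
    if tagged and not ai_relevant:
        return "tag-only"   # annotation present, AI disagrees
    if ai_relevant and not tagged:
        return "ai-only"    # AI flags it, annotation missing
    return "none"           # both agree (either way)

print(drift(["security"], False))  # tag-only
```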
Understanding drift detection¶
When drift detection is enabled (-drift-detect flag), the output includes
a tag_ai_drift column for each method. It compares the @Tag("security")
annotation in the source code against the AI classification:
| Value | Meaning | Action |
|---|---|---|
| `none` | Source tag and AI agree | No action needed |
| `tag-only` | Source has `@Tag("security")` but the AI does not consider it security-relevant. The annotation may be stale, or the AI may be wrong. | Review the method; update the tag or add an override. |
| `ai-only` | AI considers the method security-relevant but the source has no `@Tag("security")` annotation. The test covers a security property but is not labelled as such. | Consider adding `@Tag("security")` to the source, or document why the AI is incorrect. |
Drift is significant for audit purposes: dashboards and CI gates that rely on
source-level @Tag("security") annotations will silently miscount coverage if
drift exists and is not corrected.
Prioritising findings¶
The following framework helps security reviewers decide which findings to act on first:
| Priority | Condition | Recommended action |
|---|---|---|
| Critical | `ai_security_relevant=true` AND `ai_interaction_score >= 0.8` | Escalate to the development team: the test is a placebo. Require an outcome assertion before the next release. |
| High | `ai_security_relevant=true` AND `ai_confidence < 0.5` | Manual review: the AI is uncertain. A qualified reviewer should determine whether the classification is correct. |
| Medium | `tag_ai_drift = tag-only` | Review: the `@Tag("security")` annotation may be stale, or the AI taxonomy may not cover this security domain. |
| Medium | `tag_ai_drift = ai-only` | Review: a security test may be unlabelled, which would cause it to be missed by tag-based reporting. |
| Low | `display_name = ""` (explicitly empty) | Developer action: `@DisplayName("")` produces an unnamed test in all reports. Replace it with a meaningful name to preserve the audit trail. |
| Low | `ai_security_relevant=true`, high confidence, low interaction score | Verify that the taxonomy tags are correct. No immediate action required. |
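The framework above is mechanical enough to script. One illustrative reading of the table (not part of MethodAtlas; `row` is a hypothetical dict of already-parsed CSV fields, and distinguishing an explicitly empty `@DisplayName("")` from an absent one is assumed to have happened upstream):

```python
def priority(row):
    """Assign a review priority per the framework above.

    Returns "critical", "high", "medium", "low", or None
    (no security action required). Conditions are checked in
    descending order of severity, so each row gets its highest
    applicable priority.
    """
    relevant = row.get("ai_security_relevant") is True
    score = row.get("ai_interaction_score")
    confidence = row.get("ai_confidence")

    if relevant and score is not None and score >= 0.8:
        return "critical"   # placebo test
    if relevant and confidence is not None and confidence < 0.5:
        return "high"       # AI uncertain, needs human review
    if row.get("tag_ai_drift") in ("tag-only", "ai-only"):
        return "medium"     # tag/AI disagreement
    if row.get("display_name") == "":
        return "low"        # explicitly empty display name
    if relevant:
        return "low"        # relevant, high confidence, low score
    return None

print(priority({"ai_security_relevant": True, "ai_interaction_score": 1.0}))
```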
Documenting accepted risks: the override file¶
When the security team reviews findings and reaches a conclusion about a specific test method — whether the AI is correct, incorrect, or the risk is accepted — those decisions should be recorded in the override file.
The override file is a YAML document stored in version control alongside the test source. It records:
- Methods where the AI classification is incorrect (false positives and false negatives).
- Methods where the security team has accepted the risk of a weak test (with documented rationale).
- Methods that the security team has confirmed as correctly classified.
Each entry supports a free-text note field that is never emitted in any
output — it is an internal annotation for the security team:
```yaml
overrides:
  # AI missed this security-critical test — correct the false negative
  - fqcn: com.acme.crypto.AesGcmTest
    method: roundTrip_encryptDecrypt
    securityRelevant: true
    tags: [security, crypto]
    reason: "Verifies ciphertext integrity under AES-GCM — critical cryptographic test"
    note: "Confirmed by security team 2026-04-24 — alice@example.com"

  # Accepted risk: interaction-only test, but replacement is planned for next sprint
  - fqcn: com.acme.auth.LegacyAuthTest
    method: shouldCallEncoder
    securityRelevant: true
    tags: [security, auth]
    note: "Accepted risk 2026-04-24 — outcome assertion to be added in sprint 42 — bob@example.com"
```
Every change to the override file is visible in the version control diff, creating a tamper-evident audit trail of all human classification decisions.
See Classification Overrides for the complete file format reference.
Viewing results in GitHub Code Scanning¶
When SARIF output is uploaded to GitHub Code Scanning, findings appear under Security → Code scanning in the GitHub repository. Each finding includes:
- The rule ID (taxonomy tag assigned by MethodAtlas).
- The affected file and line number.
- The AI rationale as the finding description.
- The interaction score and, when `-ai-confidence` is active, the confidence percentage — both embedded in the finding message text so they are visible without leaving the GitHub Security tab.
Findings can be filtered by rule, by file path, and by state (open, closed, dismissed). Dismissed findings are retained as an audit trail.
For organisations without GitHub Advanced Security, findings are delivered as inline annotations on pull request diffs — no separate dashboard is required.
Executive summary template¶
The following structure can be used as a basis for a security test coverage section in an audit evidence package or security review document:
Security test coverage summary¶
Prepared by: [role] Date: [date] Source revision: [git commit SHA]
Tool: MethodAtlas [version], scan run as part of [CI pipeline / release process]
| Metric | Value |
|---|---|
| Total test methods scanned | [n] |
| Security-relevant test methods (AI-classified) | [n] |
| High-confidence classifications (≥ 0.8) | [n] |
| Placebo tests requiring review (interaction score ≥ 0.8) | [n] |
| Human-reviewed overrides in force | [n] |
| Drift findings (tag vs AI disagreement) | [n] |
Security taxonomy coverage: [list of taxonomy tags present, e.g. auth, crypto, injection, session]
Open findings: [brief description of any critical or high priority items above, or "None"]
Artefacts retained: [file names of SARIF and CSV outputs, with content hashes]
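Every metric in the table above can be computed mechanically from the CSV. A sketch using Python's standard `csv` module (the inline sample data is hypothetical; a real report would open the delivered file instead):

```python
import csv
import io

# Hypothetical sample in the column shape described in this guide;
# replace io.StringIO(sample) with open("methodatlas.csv", newline="").
sample = """fqcn,method,ai_security_relevant,ai_confidence,ai_interaction_score,tag_ai_drift
com.acme.auth.LoginTest,rejectsExpiredToken,true,0.95,0.0,none
com.acme.auth.LegacyAuthTest,shouldCallEncoder,true,0.88,1.0,none
com.acme.user.UserTest,hashRoundTrip,true,0.91,0.2,ai-only
com.acme.cart.CartTest,addsItem,false,0.90,0.0,none
"""

rows = list(csv.DictReader(io.StringIO(sample)))
summary = {
    "total_scanned": len(rows),
    "security_relevant": sum(r["ai_security_relevant"] == "true" for r in rows),
    "high_confidence": sum(float(r["ai_confidence"]) >= 0.8 for r in rows),
    "placebo_review": sum(
        r["ai_security_relevant"] == "true"
        and float(r["ai_interaction_score"]) >= 0.8
        for r in rows
    ),
    "drift_findings": sum(r["tag_ai_drift"] != "none" for r in rows),
}
print(summary)
```

The override count is read from the override file rather than the CSV, so it is omitted here.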
Further reading¶
- AI Interaction Score — detailed explanation of the score and remediation guidance
- Classification Overrides — override file format and workflow
- Tag vs AI Drift — drift detection configuration and interpretation
- Output Formats — CSV, SARIF, and plain-text column reference
- Compliance & Standards — framework-specific mapping (OWASP SAMM, ISO 27001, NIST SSDF, DORA)