
Guide for Security Teams

MethodAtlas scans a project's test source code (Java, C#, or TypeScript/JavaScript) and produces a structured inventory of test methods that are relevant to security. This guide is written for security managers, compliance officers, and CISOs who receive MethodAtlas output and need to interpret it, act on findings, and incorporate results into audit evidence packages. It does not assume familiarity with any specific development technology or CI/CD tooling.

When you receive a CSV

When a development team or CI pipeline delivers a MethodAtlas output file, it will typically be a comma-separated values (CSV) file — a table where each row describes one test method and each column contains a specific piece of information about that method.

Open the file in a spreadsheet application (Microsoft Excel, Google Sheets, LibreOffice Calc) or a CSV viewer. The first row contains column headings. Each subsequent row corresponds to one test method found in the project's test code.

The column set varies depending on which options the engineering team used when running the scan. The tables below describe every column you may encounter.

For a complete column reference, see Output Formats.
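
A first-pass triage does not require a spreadsheet; the CSV can be loaded with any scripting language. The sketch below uses Python's standard csv module against a hypothetical three-row sample — the column names follow the tables in this guide, but a real file may contain more columns:

```python
import csv
import io

# Hypothetical three-row sample of MethodAtlas CSV output.
SAMPLE = """\
fqcn,method,loc,tags,ai_security_relevant,ai_interaction_score
com.acme.auth.LoginTest,rejectsExpiredToken,12,security;auth,true,0.0
com.acme.cart.CartTest,addsItem,8,,false,0.0
com.acme.auth.LegacyAuthTest,shouldCallEncoder,5,security,true,1.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Keep only the rows the AI flagged as security-relevant.
security_rows = [r for r in rows if r["ai_security_relevant"] == "true"]

print(f"{len(security_rows)} of {len(rows)} test methods are security-relevant")
# → 2 of 3 test methods are security-relevant
```

For a real file, replace the io.StringIO sample with `open("methodatlas.csv")` (the file name here is illustrative; use whatever name your pipeline produces).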

What MethodAtlas produces

MethodAtlas reads a project's test source code (Java, C#, or TypeScript/JavaScript) and produces a structured inventory of test methods that are security-relevant — methods written to verify that the application correctly implements authentication, cryptography, input validation, access control, and similar security properties.

The output is a table. Each row describes one test method. The columns fall into two groups:

Structural data (always present)

These columns are derived directly from the source code, without any AI involvement. They are deterministic and do not change between runs unless the source changes.

| Column | Present when | Meaning |
| --- | --- | --- |
| fqcn | Always | Fully qualified class name — the package/namespace and class that contains this test |
| method | Always | The name of the test method |
| loc | Always | Inclusive line count of the method declaration |
| tags | Always | Source-level tag values declared on the test (e.g. security, auth) — @Tag in Java, [Category]/[Trait] in C# |
| display_name | Always | Display name declared on the method (e.g. @DisplayName in Java, [Fact(DisplayName=…)] in C#); empty when absent |
| content_hash | -content-hash flag | SHA-256 fingerprint of the enclosing class source — enables revision traceability |
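
A content_hash value can be reproduced independently to confirm that an output row corresponds to a given revision of a test class. Below is a minimal sketch, assuming the fingerprint is a plain SHA-256 of the class source text encoded as UTF-8; MethodAtlas may normalise line endings or whitespace first, so check the Output Formats reference for the exact scheme:

```python
import hashlib

def content_hash(class_source: str) -> str:
    """SHA-256 hex digest of a class's source text (assumed scheme)."""
    return hashlib.sha256(class_source.encode("utf-8")).hexdigest()

# Example: hash a trivial source string.
print(content_hash("abc"))
# → ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Recomputing the hash over the class source at a known git revision and comparing it with the CSV value ties each row of the inventory to a specific state of the code.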

AI enrichment (present when AI classification is enabled)

These columns are produced by an AI model that reads the test method body and classifies it according to a security taxonomy.

| Column | Present when | Meaning |
| --- | --- | --- |
| ai_security_relevant | AI enabled | true if the AI determined this method tests a security property; false otherwise |
| ai_display_name | AI enabled | A human-readable description of what the test is verifying (e.g. SECURITY: auth — login rejects expired tokens) |
| ai_tags | AI enabled | Security taxonomy tags assigned by the AI (e.g. auth, crypto, injection) |
| ai_reason | AI enabled | The AI's rationale for its classification — one or two sentences explaining why the method is or is not security-relevant |
| ai_interaction_score | AI enabled | A measure of test quality; see below |
| ai_confidence | AI enabled + -ai-confidence | The AI model's certainty in its classification, from 0.0 (uncertain) to 1.0 (certain) |
| tag_ai_drift | -drift-detect flag | Compares source @Tag("security") annotation against AI classification; see below |

Understanding the interaction score

The ai_interaction_score column measures a specific weakness in test design that is particularly dangerous in the security domain. It answers the question:

Does this test verify the outcome, or does it only verify that certain methods were called?

| Score | Meaning in plain English |
| --- | --- |
| 0.0 | The test checks the actual result — a return value, a thrown exception, a database state. This is the strongest form of security test. |
| 1.0 | The test only verifies that certain methods were called, without checking what they returned or what state they produced. |
| Values in between | Mixed: some outcome assertions alongside interaction-only checks. |

Why this matters

Consider a test named shouldStoreEncodedPassword. If that test only verifies that the password encoder was called — but does not check that the encoded value was actually stored, that the plaintext was discarded, and that the stored form is used for authentication — then it provides no real security evidence. The test will still pass even if the encoding logic was removed from production code, as long as the encoder method was still invoked.

This pattern is sometimes called a "placebo test" or "Potemkin village test": it looks like a test, CI reports it as passing, code coverage tools count it as covered — but it does not actually verify the security property.

Standard code coverage tools cannot detect this. MethodAtlas can, because the AI reads the test body and understands what each assertion is actually checking.
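
The distinction can be made concrete with a small sketch. The PasswordService class and both assertions below are hypothetical, written in Python for brevity (MethodAtlas itself scans Java, C#, and TypeScript/JavaScript test code); the same pattern appears in any mocking framework:

```python
from unittest.mock import Mock

class PasswordService:
    """Hypothetical service: stores the encoded form of a password."""
    def __init__(self, encoder, store):
        self.encoder = encoder
        self.store = store

    def register(self, user, plaintext):
        self.store[user] = self.encoder(plaintext)

encoder = Mock(side_effect=lambda p: "enc:" + p)
store = {}
PasswordService(encoder, store).register("alice", "s3cret")

# Interaction-only ("placebo") check: this still passes if register()
# discarded the encoded value, as long as the encoder was invoked.
encoder.assert_called_once_with("s3cret")

# Outcome assertion: verifies the encoded value was actually stored.
# Tests built from assertions like this score towards 0.0.
assert store["alice"] == "enc:s3cret"
```

A test containing only the first assertion would score 1.0; adding the second drives the score down because it checks the state the security property actually depends on.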

Action threshold

A score of 1.0 warrants immediate developer review. A score above 0.8 warrants review before the next release. Scores below 0.5 on security-relevant tests are generally acceptable.

Understanding the confidence score

The ai_confidence score reflects how certain the AI model is about its security-relevance classification. This is separate from the test quality measured by the interaction score.

| Range | Interpretation |
| --- | --- |
| 0.8–1.0 | High confidence — the model is certain. Treat the classification as reliable. |
| 0.5–0.8 | Moderate confidence — the model is fairly certain but human review is advisable, particularly if the method is a high-stakes security control. |
| 0.0–0.5 | Low confidence — the AI is uncertain. Human review is required before including or excluding this method from an audit evidence package. |
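
When triaging many rows, the bands above can be applied mechanically. A minimal sketch follows; the band boundaries mirror the table, and assigning a boundary value such as exactly 0.8 to the higher band is an assumption made here, not a documented rule:

```python
def confidence_band(confidence: float) -> str:
    """Map an ai_confidence value to the review band used in this guide."""
    if confidence >= 0.8:
        return "high"      # treat classification as reliable
    if confidence >= 0.5:
        return "moderate"  # human review advisable
    return "low"           # human review required before audit use

print(confidence_band(0.92), confidence_band(0.6), confidence_band(0.3))
# → high moderate low
```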

Source-level tags versus AI classifications

MethodAtlas produces two independent kinds of security labels for each test method:

Source-level tags (tags column) are labels that a developer typed directly into the source file — @Tag("security") in Java, [Category("security")]/[Trait("Tag", "security")] in C#. They are factual — they represent what the developer intended when writing the test. They do not change unless a developer edits the source.

AI classifications (ai_security_relevant, ai_tags, ai_display_name, ai_reason columns) are produced by an AI model that reads the test method body and reasons about what the test actually does. The AI does not rely on the developer's intent; it reads the code.

These two sources will sometimes disagree, and both disagreement directions are significant:

| Scenario | Likely explanation | Action |
| --- | --- | --- |
| Developer labelled the test security; AI agrees | Test is correctly labelled and correctly implemented | No action |
| Developer labelled the test security; AI does not consider it security-relevant | The annotation may be historical or aspirational; the test body does not actually exercise a security property | Developer review: tighten the test or remove the label |
| Developer did not label the test; AI considers it security-relevant | A security property is being tested but is not labelled — tag-based reports will miss it | Add @Tag("security") or document why the AI is wrong |
| Neither developer nor AI considers the test security-relevant | Ordinary functional or performance test | No action regarding security |

When drift detection is enabled (see below), the tag_ai_drift column makes these disagreements visible in a single field.

Understanding drift detection

When drift detection is enabled (-drift-detect flag), the output includes a tag_ai_drift column for each method. It compares the @Tag("security") annotation in the source code against the AI classification:

| Value | Meaning | Action |
| --- | --- | --- |
| none | Source tag and AI agree. | No action needed. |
| tag-only | Source has @Tag("security") but the AI does not consider it security-relevant. The annotation may be stale, or the AI may be wrong. | Review the method; update the tag or add an override. |
| ai-only | AI considers the method security-relevant but the source has no @Tag("security") annotation. The test covers a security property but is not labelled as such. | Consider adding @Tag("security") to the source, or document why the AI is incorrect. |

Drift is significant for audit purposes: dashboards and CI gates that rely on source-level @Tag("security") annotations will silently miscount coverage if drift exists and is not corrected.

Prioritising findings

The following framework helps security reviewers decide which findings to act on first:

| Priority | Condition | Recommended action |
| --- | --- | --- |
| Critical | ai_security_relevant=true AND ai_interaction_score >= 0.8 | Escalate to development team: the test is a placebo. Require outcome assertion before next release. |
| High | ai_security_relevant=true AND ai_confidence < 0.5 | Manual review: the AI is uncertain. A qualified reviewer should determine whether the classification is correct. |
| Medium | tag_ai_drift = tag-only | Review: the @Tag("security") annotation may be stale, or the AI taxonomy may not cover this security domain. |
| Medium | tag_ai_drift = ai-only | Review: a security test may be unlabelled, which would cause it to be missed by tag-based reporting. |
| Low | display_name = "" (explicitly empty) | Developer action: @DisplayName("") produces an unnamed test in all reports. Replace with a meaningful name to preserve the audit trail. |
| Low | ai_security_relevant=true, high confidence, low interaction score | Verify taxonomy tags are correct. No immediate action required. |
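
The framework above translates directly into a triage script. The sketch below applies the conditions in table order to one CSV row, represented as a dict of strings as Python's csv.DictReader would produce; the explicitly-empty display_name case is omitted because a CSV cell alone cannot distinguish an absent annotation from @DisplayName(""):

```python
def priority(row: dict) -> str:
    """Apply this guide's prioritisation framework to one CSV row (sketch)."""
    security = row.get("ai_security_relevant") == "true"
    score = float(row.get("ai_interaction_score") or 0.0)
    confidence = float(row.get("ai_confidence") or 1.0)
    drift = row.get("tag_ai_drift", "none")

    if security and score >= 0.8:
        return "critical"  # placebo test on a security property
    if security and confidence < 0.5:
        return "high"      # AI uncertain; qualified reviewer needed
    if drift in ("tag-only", "ai-only"):
        return "medium"    # source tag and AI classification disagree
    return "low"

print(priority({"ai_security_relevant": "true", "ai_interaction_score": "1.0"}))
# → critical
```

Sorting the full CSV by this function surfaces the placebo tests and uncertain classifications first, which is where reviewer time is best spent.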

Documenting accepted risks: the override file

When the security team reviews findings and reaches a conclusion about a specific test method — whether the AI is correct, incorrect, or the risk is accepted — those decisions should be recorded in the override file.

The override file is a YAML document stored in version control alongside the test source. It records:

  • Methods where the AI classification is incorrect (false positives and false negatives).
  • Methods where the security team has accepted the risk of a weak test (with documented rationale).
  • Methods that the security team has confirmed as correctly classified.

Each entry supports a free-text note field that is never emitted in any output — it is an internal annotation for the security team:

overrides:

  # AI missed this security-critical test — correct the false negative
  - fqcn: com.acme.crypto.AesGcmTest
    method: roundTrip_encryptDecrypt
    securityRelevant: true
    tags: [security, crypto]
    reason: "Verifies ciphertext integrity under AES-GCM – critical cryptographic test"
    note: "Confirmed by security team 2026-04-24 – alice@example.com"

  # Accepted risk: interaction-only test, but replacement is planned for next sprint
  - fqcn: com.acme.auth.LegacyAuthTest
    method: shouldCallEncoder
    securityRelevant: true
    tags: [security, auth]
    note: "Accepted risk 2026-04-24 – outcome assertion to be added in sprint 42 – bob@example.com"

Every change to the override file is visible in the version control diff, creating a tamper-evident audit trail of all human classification decisions.

See Classification Overrides for the complete file format reference.

Viewing results in GitHub Code Scanning

When SARIF output is uploaded to GitHub Code Scanning, findings appear under Security → Code scanning in the GitHub repository. Each finding includes:

  • The rule ID (taxonomy tag assigned by MethodAtlas).
  • The affected file and line number.
  • The AI rationale as the finding description.
  • The interaction score and, when -ai-confidence is active, the confidence percentage — both embedded in the finding message text so they are visible without leaving the GitHub Security tab.

Findings can be filtered by rule, by file path, and by state (open, closed, dismissed). Dismissed findings are retained as an audit trail.

For organisations without GitHub Advanced Security, findings are delivered as inline annotations on pull request diffs — no separate dashboard is required.

Executive summary template

The following structure can be used as a basis for a security test coverage section in an audit evidence package or security review document:

Security test coverage summary

Prepared by: [role]
Date: [date]
Source revision: [git commit SHA]

Tool: MethodAtlas [version], scan run as part of [CI pipeline / release process]

| Metric | Value |
| --- | --- |
| Total test methods scanned | [n] |
| Security-relevant test methods (AI-classified) | [n] |
| High-confidence classifications (≥ 0.8) | [n] |
| Placebo tests requiring review (interaction score ≥ 0.8) | [n] |
| Human-reviewed overrides in force | [n] |
| Drift findings (tag vs AI disagreement) | [n] |

Security taxonomy coverage: [list of taxonomy tags present, e.g. auth, crypto, injection, session]

Open findings: [brief description of any critical or high priority items above, or "None"]

Artefacts retained: [file names of SARIF and CSV outputs, with content hashes]
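
The metric values can be computed directly from the parsed CSV rows rather than counted by hand. A minimal sketch, assuming the column names described earlier in this guide:

```python
def summarise(rows):
    """Compute the executive-summary metrics from parsed CSV rows (sketch)."""
    sec = [r for r in rows if r.get("ai_security_relevant") == "true"]
    return {
        "total_scanned": len(rows),
        "security_relevant": len(sec),
        "high_confidence": sum(
            1 for r in sec if float(r.get("ai_confidence") or 0.0) >= 0.8
        ),
        "placebo_review": sum(
            1 for r in sec if float(r.get("ai_interaction_score") or 0.0) >= 0.8
        ),
        "drift_findings": sum(
            1 for r in rows if r.get("tag_ai_drift", "none") != "none"
        ),
    }

# Hypothetical two-row sample for illustration.
sample = [
    {"ai_security_relevant": "true", "ai_confidence": "0.9",
     "ai_interaction_score": "1.0", "tag_ai_drift": "none"},
    {"ai_security_relevant": "false", "tag_ai_drift": "tag-only"},
]
print(summarise(sample))
```

The override count is not derivable from the CSV; take it from the override file in version control at the same revision.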

Further reading