
Guide for Security Teams

MethodAtlas scans a project's test source code (Java, C#, or TypeScript/JavaScript) and produces a structured inventory of test methods that are relevant to security. This guide is written for security managers, compliance officers, and CISOs who receive MethodAtlas output and need to interpret it, act on findings, and incorporate results into audit evidence packages. It does not assume familiarity with any specific development technology or CI/CD tooling.

When you receive a CSV

When a development team or CI pipeline delivers a MethodAtlas output file, it will typically be a comma-separated values (CSV) file — a table where each row describes one test method and each column contains a specific piece of information about that method.

Open the file in a spreadsheet application (Microsoft Excel, Google Sheets, LibreOffice Calc) or a CSV viewer. The first row contains column headings. Each subsequent row corresponds to one test method found in the project's test code.

The column set varies depending on which options the engineering team used when running the scan. The tables below describe every column you may encounter.

For a complete column reference, see Output Formats.
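
A first-pass triage does not require a spreadsheet; the CSV can be loaded with any scripting language. The sketch below uses Python's standard csv module against a hypothetical three-row sample — the column names follow the tables in this guide, but a real file may contain more columns:

```python
import csv
import io

# Hypothetical three-row sample of MethodAtlas CSV output.
SAMPLE = """\
fqcn,method,loc,tags,ai_security_relevant,ai_interaction_score
com.acme.auth.LoginTest,rejectsExpiredToken,12,security;auth,true,0.0
com.acme.cart.CartTest,addsItem,8,,false,0.0
com.acme.auth.LegacyAuthTest,shouldCallEncoder,5,security,true,1.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Keep only the rows the AI flagged as security-relevant.
security_rows = [r for r in rows if r["ai_security_relevant"] == "true"]

print(f"{len(security_rows)} of {len(rows)} test methods are security-relevant")
# → 2 of 3 test methods are security-relevant
```

For a real file, replace the io.StringIO sample with `open("methodatlas.csv")` (the file name here is illustrative; use whatever name your pipeline produces).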

What MethodAtlas produces

MethodAtlas reads a project's test source code (Java, C#, or TypeScript/JavaScript) and produces a structured inventory of test methods that are security-relevant — methods written to verify that the application correctly implements authentication, cryptography, input validation, access control, and similar security properties.

The output is a table. Each row describes one test method. The columns fall into two groups:

Structural data (always present)

These columns are derived directly from the source code, without any AI involvement. They are deterministic and do not change between runs unless the source changes.

| Column | Present when | Meaning |
| --- | --- | --- |
| fqcn | Always | Fully qualified class name — the package/namespace and class that contains this test |
| method | Always | The name of the test method |
| loc | Always | Inclusive line count of the method declaration |
| tags | Always | Source-level tag values declared on the test (e.g. security, auth) — @Tag in Java, [Category]/[Trait] in C# |
| display_name | Always | Display name declared on the method (e.g. @DisplayName in Java, [Fact(DisplayName=…)] in C#); empty when absent |
| content_hash | -content-hash flag | SHA-256 fingerprint of the enclosing class source — enables revision traceability |
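
A content_hash value can be reproduced independently to confirm that an output row corresponds to a given revision of a test class. Below is a minimal sketch, assuming the fingerprint is a plain SHA-256 of the class source text encoded as UTF-8; MethodAtlas may normalise line endings or whitespace first, so check the Output Formats reference for the exact scheme:

```python
import hashlib

def content_hash(class_source: str) -> str:
    """SHA-256 hex digest of a class's source text (assumed scheme)."""
    return hashlib.sha256(class_source.encode("utf-8")).hexdigest()

# Example: hash a trivial source string.
print(content_hash("abc"))
# → ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Recomputing the hash over the class source at a known git revision and comparing it with the CSV value ties each row of the inventory to a specific state of the code.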

AI enrichment (present when AI classification is enabled)

These columns are produced by an AI model that reads the test method body and classifies it according to a security taxonomy.

| Column | Present when | Meaning |
| --- | --- | --- |
| ai_security_relevant | AI enabled | true if the AI determined this method tests a security property; false otherwise |
| ai_display_name | AI enabled | A human-readable description of what the test is verifying (e.g. SECURITY: auth — login rejects expired tokens) |
| ai_tags | AI enabled | Security taxonomy tags assigned by the AI (e.g. auth, crypto, injection) |
| ai_reason | AI enabled | The AI's rationale for its classification — one or two sentences explaining why the method is or is not security-relevant |
| ai_interaction_score | AI enabled | A measure of test quality; see below |
| ai_confidence | AI enabled + -ai-confidence | The AI model's certainty in its classification, from 0.0 (uncertain) to 1.0 (certain) |
| tag_ai_drift | -drift-detect flag | Compares source @Tag("security") annotation against AI classification; see below |

Understanding the interaction score

The ai_interaction_score column measures a specific weakness in test design that is particularly dangerous in the security domain. It answers the question:

Does this test verify the outcome, or does it only verify that certain methods were called?

| Score | Meaning in plain English |
| --- | --- |
| 0.0 | The test checks the actual result — a return value, a thrown exception, a database state. This is the strongest form of security test. |
| 1.0 | The test only verifies that certain methods were called, without checking what they returned or what state they produced. |
| Values in between | Mixed: some outcome assertions alongside interaction-only checks. |

Why this matters

Consider a test named shouldStoreEncodedPassword. If that test only verifies that the password encoder was called — but does not check that the encoded value was actually stored, that the plaintext was discarded, and that the stored form is used for authentication — then it provides no real security evidence. The test will still pass even if the encoding logic was removed from production code, as long as the encoder method was still invoked.

This pattern is sometimes called a "placebo test" or "Potemkin village test": it looks like a test, CI reports it as passing, code coverage tools count it as covered — but it does not actually verify the security property.

Standard code coverage tools cannot detect this. MethodAtlas can, because the AI reads the test body and understands what each assertion is actually checking.
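
The distinction can be made concrete with a small sketch. The PasswordService class and both assertions below are hypothetical, written in Python for brevity (MethodAtlas itself scans Java, C#, and TypeScript/JavaScript test code); the same pattern appears in any mocking framework:

```python
from unittest.mock import Mock

class PasswordService:
    """Hypothetical service: stores the encoded form of a password."""
    def __init__(self, encoder, store):
        self.encoder = encoder
        self.store = store

    def register(self, user, plaintext):
        self.store[user] = self.encoder(plaintext)

encoder = Mock(side_effect=lambda p: "enc:" + p)
store = {}
PasswordService(encoder, store).register("alice", "s3cret")

# Interaction-only ("placebo") check: this still passes if register()
# discarded the encoded value, as long as the encoder was invoked.
encoder.assert_called_once_with("s3cret")

# Outcome assertion: verifies the encoded value was actually stored.
# Tests built from assertions like this score towards 0.0.
assert store["alice"] == "enc:s3cret"
```

A test containing only the first assertion would score 1.0; adding the second drives the score down because it checks the state the security property actually depends on.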

Action threshold

A score of 1.0 warrants immediate developer review. A score above 0.8 warrants review before the next release. Scores below 0.5 on security-relevant tests are generally acceptable.

Understanding the confidence score

The ai_confidence score reflects how certain the AI model is about its security-relevance classification. This is separate from the test quality measured by the interaction score.

| Range | Interpretation |
| --- | --- |
| 0.8–1.0 | High confidence — the model is certain. Treat the classification as reliable. |
| 0.5–0.8 | Moderate confidence — the model is fairly certain but human review is advisable, particularly if the method is a high-stakes security control. |
| 0.0–0.5 | Low confidence — the AI is uncertain. Human review is required before including or excluding this method from an audit evidence package. |
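
When triaging many rows, the bands above can be applied mechanically. A minimal sketch follows; the band boundaries mirror the table, and assigning a boundary value such as exactly 0.8 to the higher band is an assumption made here, not a documented rule:

```python
def confidence_band(confidence: float) -> str:
    """Map an ai_confidence value to the review band used in this guide."""
    if confidence >= 0.8:
        return "high"      # treat classification as reliable
    if confidence >= 0.5:
        return "moderate"  # human review advisable
    return "low"           # human review required before audit use

print(confidence_band(0.92), confidence_band(0.6), confidence_band(0.3))
# → high moderate low
```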

Source-level tags versus AI classifications

MethodAtlas produces two independent kinds of security labels for each test method:

Source-level tags (tags column) are labels that a developer typed directly into the source file — @Tag("security") in Java, [Category("security")]/[Trait("Tag", "security")] in C#. They are factual — they represent what the developer intended when writing the test. They do not change unless a developer edits the source.

AI classifications (ai_security_relevant, ai_tags, ai_display_name, ai_reason columns) are produced by an AI model that reads the test method body and reasons about what the test actually does. The AI does not rely on the developer's intent; it reads the code.

These two sources will sometimes disagree, and both disagreement directions are significant:

| Scenario | Likely explanation | Action |
| --- | --- | --- |
| Developer labelled the test security; AI agrees | Test is correctly labelled and correctly implemented | No action |
| Developer labelled the test security; AI does not consider it security-relevant | The annotation may be historical or aspirational; the test body does not actually exercise a security property | Developer review: tighten the test or remove the label |
| Developer did not label the test; AI considers it security-relevant | A security property is being tested but is not labelled — tag-based reports will miss it | Add @Tag("security") or document why the AI is wrong |
| Neither developer nor AI considers the test security-relevant | Ordinary functional or performance test | No action regarding security |

When drift detection is enabled (see below), the tag_ai_drift column makes these disagreements visible in a single field.

Understanding drift detection

When drift detection is enabled (-drift-detect flag), the output includes a tag_ai_drift column for each method. It compares the @Tag("security") annotation in the source code against the AI classification:

| Value | Meaning | Action |
| --- | --- | --- |
| none | Source tag and AI agree. | No action needed. |
| tag-only | Source has @Tag("security") but the AI does not consider it security-relevant. The annotation may be stale, or the AI may be wrong. | Review the method; update the tag or add an override. |
| ai-only | AI considers the method security-relevant but the source has no @Tag("security") annotation. The test covers a security property but is not labelled as such. | Consider adding @Tag("security") to the source, or document why the AI is incorrect. |

Drift is significant for audit purposes: dashboards and CI gates that rely on source-level @Tag("security") annotations will silently miscount coverage if drift exists and is not corrected.

Prioritising findings

The following framework helps security reviewers decide which findings to act on first:

| Priority | Condition | Recommended action |
| --- | --- | --- |
| Critical | ai_security_relevant=true AND ai_interaction_score >= 0.8 | Escalate to development team: the test is a placebo. Require outcome assertion before next release. |
| High | ai_security_relevant=true AND ai_confidence < 0.5 | Manual review: the AI is uncertain. A qualified reviewer should determine whether the classification is correct. |
| Medium | tag_ai_drift = tag-only | Review: the @Tag("security") annotation may be stale, or the AI taxonomy may not cover this security domain. |
| Medium | tag_ai_drift = ai-only | Review: a security test may be unlabelled, which would cause it to be missed by tag-based reporting. |
| Low | display_name = "" (explicitly empty) | Developer action: @DisplayName("") produces an unnamed test in all reports. Replace with a meaningful name to preserve the audit trail. |
| Low | ai_security_relevant=true, high confidence, low interaction score | Verify taxonomy tags are correct. No immediate action required. |
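
The framework above translates directly into a triage script. The sketch below applies the conditions in table order to one CSV row, represented as a dict of strings as Python's csv.DictReader would produce; the explicitly-empty display_name case is omitted because a CSV cell alone cannot distinguish an absent annotation from @DisplayName(""):

```python
def priority(row: dict) -> str:
    """Apply this guide's prioritisation framework to one CSV row (sketch)."""
    security = row.get("ai_security_relevant") == "true"
    score = float(row.get("ai_interaction_score") or 0.0)
    confidence = float(row.get("ai_confidence") or 1.0)
    drift = row.get("tag_ai_drift", "none")

    if security and score >= 0.8:
        return "critical"  # placebo test on a security property
    if security and confidence < 0.5:
        return "high"      # AI uncertain; qualified reviewer needed
    if drift in ("tag-only", "ai-only"):
        return "medium"    # source tag and AI classification disagree
    return "low"

print(priority({"ai_security_relevant": "true", "ai_interaction_score": "1.0"}))
# → critical
```

Sorting the full CSV by this function surfaces the placebo tests and uncertain classifications first, which is where reviewer time is best spent.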

Documenting accepted risks: the override file

When the security team reviews findings and reaches a conclusion about a specific test method — whether the AI is correct, incorrect, or the risk is accepted — those decisions should be recorded in the override file.

The override file is a YAML document stored in version control alongside the test source. It records:

  • Methods where the AI classification is incorrect (false positives and false negatives).
  • Methods where the security team has accepted the risk of a weak test (with documented rationale).
  • Methods that the security team has confirmed as correctly classified.

Each entry supports a free-text note field that is never emitted in any output — it is an internal annotation for the security team:

overrides:

  # AI missed this security-critical test — correct the false negative
  - fqcn: com.acme.crypto.AesGcmTest
    method: roundTrip_encryptDecrypt
    securityRelevant: true
    tags: [security, crypto]
    reason: "Verifies ciphertext integrity under AES-GCM – critical cryptographic test"
    note: "Confirmed by security team 2026-04-24 – alice@example.com"

  # Accepted risk: interaction-only test, but replacement is planned for next sprint
  - fqcn: com.acme.auth.LegacyAuthTest
    method: shouldCallEncoder
    securityRelevant: true
    tags: [security, auth]
    note: "Accepted risk 2026-04-24 – outcome assertion to be added in sprint 42 – bob@example.com"

Every change to the override file is visible in the version control diff, creating a tamper-evident audit trail of all human classification decisions.

See Classification Overrides for the complete file format reference.

Viewing results in GitHub Code Scanning

When SARIF output is uploaded to GitHub Code Scanning, findings appear under Security → Code scanning in the GitHub repository. Each finding includes:

  • The rule ID (taxonomy tag assigned by MethodAtlas).
  • The affected file and line number.
  • The AI rationale as the finding description.
  • The interaction score and, when -ai-confidence is active, the confidence percentage — both embedded in the finding message text so they are visible without leaving the GitHub Security tab.

Findings can be filtered by rule, by file path, and by state (open, closed, dismissed). Dismissed findings are retained as an audit trail.

For organisations without GitHub Advanced Security, findings are delivered as inline annotations on pull request diffs — no separate dashboard is required.

Executive summary template

The following structure can be used as a basis for a security test coverage section in an audit evidence package or security review document:

Security test coverage summary

Prepared by: [role]
Date: [date]
Source revision: [git commit SHA]

Tool: MethodAtlas [version], scan run as part of [CI pipeline / release process]

| Metric | Value |
| --- | --- |
| Total test methods scanned | [n] |
| Security-relevant test methods (AI-classified) | [n] |
| High-confidence classifications (≥ 0.8) | [n] |
| Placebo tests requiring review (interaction score ≥ 0.8) | [n] |
| Human-reviewed overrides in force | [n] |
| Drift findings (tag vs AI disagreement) | [n] |

Security taxonomy coverage: [list of taxonomy tags present, e.g. auth, crypto, injection, session]

Open findings: [brief description of any critical or high priority items above, or "None"]

Artefacts retained: [file names of SARIF and CSV outputs, with content hashes]
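
The metric values can be computed directly from the parsed CSV rows rather than counted by hand. A minimal sketch, assuming the column names described earlier in this guide:

```python
def summarise(rows):
    """Compute the executive-summary metrics from parsed CSV rows (sketch)."""
    sec = [r for r in rows if r.get("ai_security_relevant") == "true"]
    return {
        "total_scanned": len(rows),
        "security_relevant": len(sec),
        "high_confidence": sum(
            1 for r in sec if float(r.get("ai_confidence") or 0.0) >= 0.8
        ),
        "placebo_review": sum(
            1 for r in sec if float(r.get("ai_interaction_score") or 0.0) >= 0.8
        ),
        "drift_findings": sum(
            1 for r in rows if r.get("tag_ai_drift", "none") != "none"
        ),
    }

# Hypothetical two-row sample for illustration.
sample = [
    {"ai_security_relevant": "true", "ai_confidence": "0.9",
     "ai_interaction_score": "1.0", "tag_ai_drift": "none"},
    {"ai_security_relevant": "false", "tag_ai_drift": "tag-only"},
]
print(summarise(sample))
```

The override count is not derivable from the CSV; take it from the override file in version control at the same revision.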

Further reading