Confidence scoring¶
When -ai-confidence is passed alongside -ai, MethodAtlas instructs the model to attach a numeric confidence score to each method classification, enabling downstream filtering and audit-evidence quality control.
When to use¶
Enable confidence scoring when you need to distinguish high-certainty security findings from borderline classifications — for example, when assembling an evidence package for a compliance audit or when gating a CI pipeline on only well-supported security test coverage.
See AI Providers for provider options and the -ai-confidence flag description in CLI reference.
What the score means¶
The score is a decimal in the range 0.0–1.0. It reflects how certain the model is that a method genuinely tests a security property — not simply how confident it is that its answer is "yes".
A score of 0.0 does not indicate uncertainty. It means the model is confident the method is not security-relevant. The absence of a high score is itself information: the model read the test body and concluded it exercises no security property.
The model is instructed to prefer securityRelevant=false over returning a score below 0.5. Low-confidence hits are suppressed rather than surfaced with a low score. In practice you will see scores clustered at 1.0, ~0.7, and 0.0.
Score reference¶
| Score range | Meaning | Example method |
|---|---|---|
1.0 |
The method name and body explicitly and unambiguously test a named security property. | testSQLInjectionBlocked — asserts parameterised query rejects ' OR 1=1 -- with an exception |
~0.7 |
The method clearly tests a security-adjacent concern, but mapping to a specific tag requires inference from class name or surrounding tests. | testUserInputHandling — sanitises input but the class context is needed to confirm the injection-prevention intent |
~0.5 |
The classification is plausible but ambiguous — the method name or body is equally consistent with a non-security interpretation. | testFormSubmit — exercises a form handler that happens to include CSRF token validation alongside unrelated UI logic |
0.0 |
The method is not security-relevant — the model is confident it tests no security property. | testNullInput — verifies that format(null) returns an empty string with no security implication |
Why confidence matters¶
In regulated environments, test coverage evidence submitted to auditors needs to be defensible. Confidence scores let you:
- Filter high-confidence findings — export only rows where
ai_confidence >= 0.7for your audit evidence package. - Flag borderline classifications for human review — rows around
0.5are candidates for manual inspection with override files. - Track classification quality over time — a sudden drop in average confidence across a module may indicate that test naming conventions have become unclear.
Example¶
Output (CSV excerpt):
fqcn,method,loc,tags,display_name,ai_security_relevant,ai_display_name,ai_tags,ai_reason,ai_interaction_score,ai_confidence
com.acme.crypto.AesGcmTest,roundTrip_encryptDecrypt,18,,,true,SECURITY: crypto - AES-GCM round-trip,security;crypto,Verifies ciphertext and plaintext integrity under AES-GCM.,0.0,1.0
com.acme.auth.SessionTest,sessionToken_isRotatedAfterLogin,12,,,true,SECURITY: auth - session token rotation after login,security;auth,Session token is replaced on successful login to prevent fixation.,0.0,0.7
com.acme.util.DateFormatterTest,format_returnsIso8601,5,,,false,,,Test verifies date formatting output only.,0.0,0.0
The third row (format_returnsIso8601) has ai_confidence=0.0 because the model is confident this method tests no security property — not because the model is uncertain.
Filtering high-confidence findings¶
Because the output is plain CSV, standard shell tools work:
# Keep only rows where ai_confidence >= 0.7
./methodatlas -ai -ai-confidence /src | \
awk -F',' 'NR==1 || ($11+0) >= 0.7'
Or in a YAML configuration file: