Eval Harness — Rule 6
Fixture-based regression tests for generated artifacts.
Why this exists
"Evals are the test suite for your prompts. You would never ship code without tests; don't ship prompts without evals." — Anthropic Engineering
The validation gate (tools/validate-generation.mjs) checks existence and structural compliance.
The eval harness checks semantic correctness: are the right patterns present in the generated code?
Do the generated files actually follow the rules in prompts/?
Together they enforce:
- Gate: "file exists, field names present, auth seams wired"
- Evals: "DTO has class-validator decorators, FK uses ReferenceInput, date uses DateInput, guard is present"
Usage
# Run all evals
npm run eval:generation
# Run evals for one entity
node tools/eval/run-evals.mjs --entity equipment
# Verbose output (show each file being checked)
node tools/eval/run-evals.mjs --verbose
Fixture format
Each fixture lives in tools/eval/fixtures/<entity>/:
fixtures/
  equipment/
    meta.json ← what this fixture tests
    backend.assertions.json ← patterns the NestJS files must satisfy
    frontend.assertions.json ← patterns the React Admin files must satisfy
  repair-order/
    meta.json
    backend.assertions.json
    frontend.assertions.json
meta.json
{
"entity": "Equipment",
"kebab": "equipment",
"resource": "equipment",
"description": "...",
"tests": ["dto-decorator-coverage", "auth-guards", ...]
}
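As a rough illustration of the shape above, the sketch below checks that a parsed meta.json carries the five fields shown. `validateMeta` is a hypothetical helper, not part of the harness, and treating every field as required is an assumption made for the example.

```javascript
// Hypothetical helper: checks the meta.json fields shown above.
// Assumption: all five fields are required; the real harness may be more lenient.
function validateMeta(meta) {
  const problems = [];
  for (const key of ["entity", "kebab", "resource", "description"]) {
    if (typeof meta[key] !== "string" || meta[key].length === 0) {
      problems.push(`${key} must be a non-empty string`);
    }
  }
  const testsOk =
    Array.isArray(meta.tests) && meta.tests.every((t) => typeof t === "string");
  if (!testsOk) {
    problems.push("tests must be an array of strings");
  }
  return problems; // an empty array means the shape looks right
}

const exampleMeta = {
  entity: "Equipment",
  kebab: "equipment",
  resource: "equipment",
  description: "example description",
  tests: ["dto-decorator-coverage", "auth-guards"],
};
console.log(validateMeta(exampleMeta));
```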
*.assertions.json
Each file entry supports:
| Key | Type | Meaning |
|---|---|---|
| path | string | Relative path from repo root |
| must_contain | string[] | Each string must appear as a literal substring |
| must_not_contain | string[] | Each string must NOT appear |
| must_match_regex | string[] | Each pattern must match (multiline dot-all) |
| must_not_match_regex | string[] | Each pattern must NOT match |
| comment | string | Human-readable explanation of what is being tested |
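The keys above could be evaluated roughly as follows. This is a minimal sketch, not the actual code in tools/eval/run-evals.mjs: `checkEntry` is a hypothetical name, the sample DTO text is invented for illustration, and applying the multiline and dot-all flags to the negative regexes as well is an assumption.

```javascript
// Hypothetical evaluator for one file entry; returns a list of failure messages.
// Assumption: "multiline dot-all" means the regex flags "m" and "s".
function checkEntry(entry, content) {
  const failures = [];
  for (const s of entry.must_contain ?? []) {
    if (!content.includes(s)) failures.push(`missing substring: ${s}`);
  }
  for (const s of entry.must_not_contain ?? []) {
    if (content.includes(s)) failures.push(`forbidden substring: ${s}`);
  }
  for (const p of entry.must_match_regex ?? []) {
    if (!new RegExp(p, "ms").test(content)) failures.push(`no match for: ${p}`);
  }
  for (const p of entry.must_not_match_regex ?? []) {
    if (new RegExp(p, "ms").test(content)) failures.push(`forbidden match for: ${p}`);
  }
  return failures;
}

// Invented sample of a generated DTO, standing in for the file at entry.path.
const sampleDto = [
  "import { IsString } from 'class-validator';",
  "export class CreateEquipmentDto {",
  "  @IsString()",
  "  name!: string;",
  "}",
].join("\n");

const sampleEntry = {
  must_contain: ["from 'class-validator'", "@IsString("],
  must_not_contain: [": any"],
  // With the "s" flag, "." also matches the newline between decorator and field.
  must_match_regex: ["@IsString\\(\\).*name!:"],
  must_not_match_regex: ["^\\s*console\\.log"],
  comment: "create DTO validates its required string field",
};

console.log(checkEntry(sampleEntry, sampleDto));
```

Literal substrings are the cheaper check to write and read; the regex keys are useful when an assertion has to span lines, such as a decorator followed by the field it annotates.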
Eval-driven development workflow
Writing the eval first is the core principle recommended by both Anthropic and Google:
- Write the failing eval first. When you change a prompt or add a rule, add an assertion that captures the new expectation before re-generating.
- Run evals: npm run eval:generation → see failures.
- Re-generate the affected entity (following the generation workflow in AGENTS.md).
- Run evals again: all pass → the change is verified.
- Commit both the updated fixture and the regenerated artifacts together.
A passing eval after a prompt change confirms the LLM followed the new rule. A failing eval before a prompt change tells you exactly which prior contract was broken.
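The fail-then-pass cycle can be pictured with a toy example. Everything here is illustrative: the rule, the two JSX snippets, and the `missing` helper are invented, and only the `must_contain` key is exercised.

```javascript
// Illustrative new rule: date fields must be rendered with DateInput.
const newAssertion = {
  must_contain: ["<DateInput"],
  comment: "date fields use DateInput",
};

// Hypothetical helper: lists the must_contain strings absent from the content.
const missing = (assertion, content) =>
  assertion.must_contain.filter((s) => !content.includes(s));

// Invented artifact text before and after re-generation.
const beforeRegen = '<TextInput source="purchasedAt" />';
const afterRegen = '<DateInput source="purchasedAt" />';

console.log(missing(newAssertion, beforeRegen)); // non-empty: the eval fails first
console.log(missing(newAssertion, afterRegen)); // empty: the change is verified
```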
Adding a new entity fixture
When adding a new entity to domain/toir.api.dsl and generating its backend + frontend:
- Create tools/eval/fixtures/<kebab>/meta.json
- Create tools/eval/fixtures/<kebab>/backend.assertions.json with at minimum:
  - controller: @Controller(...), @UseGuards(, JwtAuthGuard, HTTP methods
  - create_dto: from 'class-validator', required fields with !:, @IsString(, @IsOptional(
  - update_dto: from 'class-validator', fields with ?:, @IsOptional(
- Create tools/eval/fixtures/<kebab>/frontend.assertions.json with at minimum:
  - create: ReferenceInput for FK fields, NumberInput for numeric, DateInput for date, SelectInput for enum
  - show: ReferenceField for FK fields, DateField for date
- Run npm run eval:generation to verify the fixture catches real issues.
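One way to bootstrap the backend minimums is a small generator like the one below. It is a sketch under several assumptions: the top-level shape (an object keyed by controller, create_dto, update_dto) is inferred from the list above rather than confirmed, `minimalBackendAssertions` and its parameters are hypothetical, the file paths are supplied by the caller because the repo layout is not documented here, and `@Get(` and `@Post(` merely stand in for whichever HTTP method decorators the controller really uses.

```javascript
// Hypothetical generator for a starting backend.assertions.json.
// Assumption: the file is an object keyed by controller / create_dto / update_dto.
function minimalBackendAssertions({
  controllerPath,
  createDtoPath,
  updateDtoPath,
  requiredFields = [],
  optionalFields = [],
}) {
  return {
    controller: {
      path: controllerPath,
      // "@Get(" and "@Post(" are placeholders for the real HTTP method decorators.
      must_contain: ["@Controller(", "@UseGuards(", "JwtAuthGuard", "@Get(", "@Post("],
      comment: "controller is routed, guarded, and exposes HTTP handlers",
    },
    create_dto: {
      path: createDtoPath,
      must_contain: [
        "from 'class-validator'",
        "@IsString(",
        "@IsOptional(",
        ...requiredFields.map((f) => `${f}!:`),
      ],
      comment: "create DTO validates its fields; required ones use !:",
    },
    update_dto: {
      path: updateDtoPath,
      must_contain: [
        "from 'class-validator'",
        "@IsOptional(",
        ...optionalFields.map((f) => `${f}?:`),
      ],
      comment: "update DTO marks its fields optional with ?:",
    },
  };
}

// The paths below are placeholders, not the project's real layout.
const skeleton = minimalBackendAssertions({
  controllerPath: "path/to/equipment.controller.ts",
  createDtoPath: "path/to/create-equipment.dto.ts",
  updateDtoPath: "path/to/update-equipment.dto.ts",
  requiredFields: ["name"],
  optionalFields: ["name"],
});
console.log(JSON.stringify(skeleton, null, 2));
```

A generated skeleton is only a starting point: entity-specific assertions still need to be added by hand.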
Integration with git hooks
The pre-commit hook (installed by npm run install-hooks) runs both:
- node tools/validate-generation.mjs --artifacts-only (existence gate)
- npm run eval:generation (semantic eval gate)
Both must pass before a commit is accepted.
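The "both must pass" rule amounts to running the two commands in order and rejecting the commit on the first non-zero exit code. The sketch below shows that logic only; it is not the installed hook. `runGates` is a hypothetical name, the command runner is injected so the logic can be exercised without spawning processes, and stopping at the first failure is an assumption about the hook's behaviour.

```javascript
// The two gate commands named above, in the order the hook runs them.
const GATES = [
  "node tools/validate-generation.mjs --artifacts-only",
  "npm run eval:generation",
];

// Hypothetical runner: `run` takes a command string and returns its exit code.
// Assumption: the hook stops at the first failing gate.
function runGates(run, gates = GATES) {
  for (const command of gates) {
    const code = run(command);
    if (code !== 0) {
      return { ok: false, failed: command, code };
    }
  }
  return { ok: true };
}

// Dry run with a fake runner that reports success for every command.
console.log(runGates(() => 0));
```

In a real hook script, `run` could wrap Node's `child_process.spawnSync(command, { shell: true, stdio: "inherit" }).status`, and the process would exit non-zero whenever `ok` is false.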