Eval Harness — Rule 6

Fixture-based regression tests for generated artifacts.

Why this exists

"Evals are the test suite for your prompts. You would never ship code without tests; don't ship prompts without evals." — Anthropic Engineering

The validation gate (tools/validate-generation.mjs) checks existence and structural compliance. The eval harness checks semantic correctness: are the right patterns present in the generated code? Do the generated files actually follow the rules in prompts/?

Together they enforce:

Gate: "file exists, field names present, auth seams wired"
Evals: "DTO has class-validator decorators, FK uses ReferenceInput, date uses DateInput, guard is present"

Usage

# Run all evals
npm run eval:generation

# Run evals for one entity
node tools/eval/run-evals.mjs --entity equipment

# Verbose output (show each file being checked)
node tools/eval/run-evals.mjs --verbose

Fixture format

Each fixture lives in tools/eval/fixtures/<entity>/:

fixtures/
  equipment/
    meta.json                  ← what this fixture tests
    backend.assertions.json    ← patterns the NestJS files must satisfy
    frontend.assertions.json   ← patterns the React Admin files must satisfy
  repair-order/
    meta.json
    backend.assertions.json
    frontend.assertions.json

`meta.json`

{
  "entity": "Equipment",
  "kebab": "equipment",
  "resource": "equipment",
  "description": "...",
  "tests": ["dto-decorator-coverage", "auth-guards", ...]
}

`*.assertions.json`

Each file entry supports:

| Key | Type | Meaning |
| --- | --- | --- |
| `path` | string | Relative path from repo root |
| `must_contain` | string[] | Each string must appear as a literal substring |
| `must_not_contain` | string[] | Each string must NOT appear |
| `must_match_regex` | string[] | Each pattern must match (multiline dot-all) |
| `must_not_match_regex` | string[] | Each pattern must NOT match |
| `comment` | string | Human-readable explanation of what is being tested |
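
The assertion keys above can be illustrated with a minimal sketch of how a single file entry might be checked against a file's content. This is not the code in tools/eval/run-evals.mjs: the function name `checkEntry`, its return shape, the sample DTO source, and the sample path are all hypothetical, and the sketch assumes "multiline dot-all" corresponds to the JavaScript `m` and `s` regex flags.

```javascript
// Sketch only: checkEntry and its return value are hypothetical,
// not the actual API of tools/eval/run-evals.mjs.
function checkEntry(content, entry) {
  const failures = [];

  // Literal substring checks.
  for (const s of entry.must_contain ?? []) {
    if (!content.includes(s)) failures.push(`missing literal: ${s}`);
  }
  for (const s of entry.must_not_contain ?? []) {
    if (content.includes(s)) failures.push(`forbidden literal: ${s}`);
  }

  // Assumption: "multiline dot-all" maps to the "m" and "s" flags,
  // so "." also matches newlines and "^"/"$" match at line boundaries.
  for (const p of entry.must_match_regex ?? []) {
    if (!new RegExp(p, "ms").test(content)) failures.push(`no match: ${p}`);
  }
  for (const p of entry.must_not_match_regex ?? []) {
    if (new RegExp(p, "ms").test(content)) failures.push(`forbidden match: ${p}`);
  }

  return failures; // an empty array means the entry passed
}

// Made-up DTO source, standing in for a generated file read from disk.
const dtoSource = [
  "import { IsString, IsOptional } from 'class-validator';",
  "export class CreateEquipmentDto {",
  "  @IsString()",
  "  name!: string;",
  "}",
].join("\n");

// Made-up entry; the path is a placeholder, not a real repo path.
const entry = {
  path: "hypothetical/create-equipment.dto.ts",
  must_contain: ["from 'class-validator'", "@IsString("],
  must_not_contain: ["any;"],
  must_match_regex: ["@IsString\\(\\).*name!:"],
  must_not_match_regex: ["name\\?:"],
  comment: "name is required and validated as a string",
};

console.log(checkEntry(dtoSource, entry));
```

Returning a list of failure messages rather than a boolean lets a runner report every broken expectation for a file in one pass.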

Eval-driven development workflow

This is the critical principle from Anthropic and Google:

1. Write the failing eval first. When you change a prompt or add a rule, add an assertion that captures the new expectation before re-generating.
2. Run evals: npm run eval:generation → see failures.
3. Re-generate the affected entity (following the generation workflow in AGENTS.md).
4. Run evals again: all pass → the change is verified.
5. Commit both the updated fixture and the regenerated artifacts together.

A passing eval after a prompt change confirms the LLM followed the new rule. An eval that already fails before you change any prompt tells you exactly which existing contract is broken.

Adding a new entity fixture

When adding a new entity to domain/toir.api.dsl and generating its backend + frontend:

1. Create tools/eval/fixtures/<kebab>/meta.json
2. Create tools/eval/fixtures/<kebab>/backend.assertions.json with at minimum:
   - controller: @Controller(...), @UseGuards(, JwtAuthGuard, HTTP methods
   - create_dto: from 'class-validator', required fields with !:, @IsString(, @IsOptional(
   - update_dto: from 'class-validator', fields with ?:, @IsOptional(
3. Create tools/eval/fixtures/<kebab>/frontend.assertions.json with at minimum:
   - create: ReferenceInput for FK fields, NumberInput for numeric, DateInput for date, SelectInput for enum
   - show: ReferenceField for FK fields, DateField for date
4. Run npm run eval:generation to verify the fixture catches real issues.
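
As a starting point for the backend checklist above, the following sketch builds a skeleton object that could be serialized into backend.assertions.json. Several things here are assumptions: the helper name `backendFixtureSkeleton` is hypothetical, the top-level layout (an object keyed by `controller`, `create_dto`, `update_dto`) is inferred from the checklist rather than confirmed, the `@Get(` and `@Post(` strings are only examples of "HTTP methods", and the paths are placeholders you would replace with the real generated file locations.

```javascript
// Hypothetical helper: returns a minimal backend fixture skeleton.
// The top-level shape is an assumption inferred from the checklist;
// compare it with an existing fixture before relying on it.
function backendFixtureSkeleton(kebab, paths) {
  return {
    controller: {
      path: paths.controller,
      // "@Get(" and "@Post(" are assumed examples of HTTP method decorators.
      must_contain: ["@Controller(", "@UseGuards(", "JwtAuthGuard", "@Get(", "@Post("],
      comment: `${kebab}: controller is guarded and exposes HTTP handlers`,
    },
    create_dto: {
      path: paths.createDto,
      must_contain: ["from 'class-validator'", "!:", "@IsString(", "@IsOptional("],
      comment: `${kebab}: create DTO validates required and optional fields`,
    },
    update_dto: {
      path: paths.updateDto,
      must_contain: ["from 'class-validator'", "?:", "@IsOptional("],
      comment: `${kebab}: update DTO marks every field optional`,
    },
  };
}

// Placeholder paths: replace with the real locations of the generated files.
const skeleton = backendFixtureSkeleton("repair-order", {
  controller: "<path to generated controller>",
  createDto: "<path to generated create DTO>",
  updateDto: "<path to generated update DTO>",
});

// Pretty-print the JSON that would go into backend.assertions.json.
console.log(JSON.stringify(skeleton, null, 2));
```

The generic literals such as `!:` and `?:` only prove that some required or optional field exists; tighten them to specific field names once the entity's fields are known.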

Integration with git hooks

The pre-commit hook (installed by npm run install-hooks) runs both:

node tools/validate-generation.mjs --artifacts-only — existence gate
npm run eval:generation — semantic eval gate

Both must pass before a commit is accepted.
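
The "both must pass" rule can be sketched as a small gate runner that stops at the first non-zero exit status. This is not the hook installed by npm run install-hooks: `runGates`, the gate objects, and the injected `run` callback are hypothetical names, and the stub runner below only simulates exit codes instead of executing the commands.

```javascript
// Hypothetical sketch of sequential gating; not the installed pre-commit hook.
// `run` is an injected function that executes a command and returns its exit status.
function runGates(gates, run) {
  for (const gate of gates) {
    const status = run(gate.command, gate.args);
    if (status !== 0) {
      // Stop at the first failing gate so the commit is rejected early.
      return { ok: false, failed: gate.name, status };
    }
  }
  return { ok: true, failed: null, status: 0 };
}

// The two commands listed above, in the order the hook runs them.
const gates = [
  {
    name: "existence gate",
    command: "node",
    args: ["tools/validate-generation.mjs", "--artifacts-only"],
  },
  {
    name: "semantic eval gate",
    command: "npm",
    args: ["run", "eval:generation"],
  },
];

// Stub runner for illustration: pretends the npm command fails with status 1.
const fakeRun = (command) => (command === "npm" ? 1 : 0);

const result = runGates(gates, fakeRun);
console.log(result.ok, result.failed);
```

In a real hook, `run` could wrap Node's `child_process.spawnSync` and return its `status` field, and the script would exit with a non-zero code whenever `ok` is false.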
