# Eval Harness: Rule 6

Fixture-based regression tests for generated artifacts.

## Why this exists

> "Evals are the test suite for your prompts. You would never ship code without tests;
> don't ship prompts without evals." (Anthropic Engineering)

The validation gate (`tools/validate-generation.mjs`) checks **existence** and **structural compliance**. The eval harness checks **semantic correctness**: are the right patterns present in the generated code, and do the generated files actually follow the rules in `prompts/`?

Together they enforce:

- Gate: "file exists, field names present, auth seams wired"
- Evals: "DTO has class-validator decorators, FK uses ReferenceInput, date uses DateInput, guard is present"

The committed fixture corpus is a reviewed semantic contract. It may be scaffolded from source-of-truth as a helper, but it should not be regenerated wholesale during every full regeneration run. If it is, it stops acting as an independent regression signal.

## Usage

```bash
# Run all evals
npm run eval:generation

# Run evals for one entity
node tools/eval/run-evals.mjs --entity equipment

# Verbose output (show each file being checked)
node tools/eval/run-evals.mjs --verbose
```

## Fixture format

Each fixture lives in `tools/eval/fixtures//`:

```
fixtures/
  equipment/
    meta.json                 ← what this fixture tests
    backend.assertions.json   ← patterns the NestJS files must satisfy
    frontend.assertions.json  ← patterns the React Admin files must satisfy
  repair-order/
    meta.json
    backend.assertions.json
    frontend.assertions.json
```

### `meta.json`

```json
{
  "entity": "Equipment",
  "kebab": "equipment",
  "resource": "equipment",
  "description": "...",
  "tests": ["dto-decorator-coverage", "auth-guards", ...]
}
```

### `*.assertions.json`

Each file entry supports the following keys:

| Key | Type | Meaning |
|-----|------|---------|
| `path` | string | Path relative to the repo root |
| `must_contain` | string[] | Each string must appear as a literal substring |
| `must_not_contain` | string[] | Each string must NOT appear |
| `must_match_regex` | string[] | Each pattern must match (multiline, dot-all) |
| `must_not_match_regex` | string[] | Each pattern must NOT match |
| `comment` | string | Human-readable explanation of what is being tested |

## Eval-driven development workflow

This is the core principle recommended by both Anthropic and Google:

1. **Write the failing eval first.** When you change a prompt or add a rule, add an assertion that captures the new expectation *before* regenerating.
2. **Run evals**: `npm run eval:generation` → see failures.
3. **Regenerate** the affected entity (following the generation workflow in `AGENTS.md`).
4. **Run evals again**: all pass → the change is verified.
5. **Commit both** the updated fixture and the regenerated artifacts together.

A passing eval after a prompt change confirms that the regenerated output follows the new rule. A failing eval on an existing assertion tells you exactly which prior contract the change broke.

Automation note:

- It is reasonable to generate starter fixtures or coverage manifests from source-of-truth.
- It is not reasonable to let the same regeneration step auto-refresh the authoritative committed eval corpus, because that couples the semantic gate too tightly to the generator and can hide regressions.

## Adding a new entity fixture

When adding a new entity to `domain/toir.api.dsl` and generating its backend and frontend:

1. Create `tools/eval/fixtures//meta.json`.
2. Create `tools/eval/fixtures//backend.assertions.json` with at minimum:
   - controller: `@Controller(...)`, `@UseGuards(`, `JwtAuthGuard`, HTTP methods
   - create_dto: `from 'class-validator'`, required fields with `!:`, `@IsString(`, `@IsOptional(`
   - update_dto: `from 'class-validator'`, fields with `?:`, `@IsOptional(`
3. Create `tools/eval/fixtures//frontend.assertions.json` with at minimum:
   - create: `ReferenceInput` for FK fields, `NumberInput` for numeric fields, `DateInput` for date fields, `SelectInput` for enum fields
   - show: `ReferenceField` for FK fields, `DateField` for date fields
4. Run `npm run eval:generation` to verify that the fixture catches real issues.

## Integration with git hooks

The pre-commit hook (installed by `npm run install-hooks`) runs both checks:

1. `node tools/validate-generation.mjs --artifacts-only`: existence gate
2. `npm run eval:generation`: semantic eval gate

Both must pass before a commit is accepted.
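To make the `*.assertions.json` semantics from the fixture format section concrete, here is a minimal checker for a single file entry. It is a hypothetical sketch, not the code in `tools/eval/run-evals.mjs`. The function name `checkEntry`, the failure-message format, and the sample DTO path and contents are all assumptions. It reads "multiline dot-all" as compiling the regex keys with JavaScript's `m` and `s` flags, and it takes the file text as an argument instead of reading `entry.path` from disk.

```javascript
// Hypothetical sketch: evaluate one *.assertions.json file entry against a
// file's contents. Not the actual run-evals.mjs implementation.

function checkEntry(entry, source) {
  const failures = [];

  // must_contain: every literal substring must appear somewhere in the file.
  for (const needle of entry.must_contain ?? []) {
    if (!source.includes(needle)) failures.push(`missing literal: ${needle}`);
  }

  // must_not_contain: every literal substring must be absent.
  for (const needle of entry.must_not_contain ?? []) {
    if (source.includes(needle)) failures.push(`forbidden literal present: ${needle}`);
  }

  // Regex keys: "m" makes ^/$ match at line boundaries, "s" lets "." match newlines.
  for (const pattern of entry.must_match_regex ?? []) {
    if (!new RegExp(pattern, "ms").test(source)) failures.push(`pattern did not match: ${pattern}`);
  }
  for (const pattern of entry.must_not_match_regex ?? []) {
    if (new RegExp(pattern, "ms").test(source)) failures.push(`forbidden pattern matched: ${pattern}`);
  }

  // An empty array means the entry passed.
  return failures;
}

// Illustrative entry and file contents (the path and DTO are assumptions).
const entry = {
  path: "backend/src/equipment/dto/create-equipment.dto.ts",
  must_contain: ["from 'class-validator'", "@IsString("],
  must_not_contain: ["any;"], // e.g. forbid untyped fields
  must_match_regex: ["@IsString\\(\\).*name!:"], // dot-all lets ".*" span the line break
  comment: "Required string field is decorated and non-optional",
};

const source = [
  "import { IsString } from 'class-validator';",
  "",
  "export class CreateEquipmentDto {",
  "  @IsString()",
  "  name!: string;",
  "}",
].join("\n");

console.log(checkEntry(entry, source)); // → []
```

Collecting every failure instead of stopping at the first one lets a single run report all broken contracts for a file, which fits the goal of telling you exactly which contract a change broke.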
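The automation note allows scaffolding starter fixtures from source-of-truth. The sketch below shows one way to turn the frontend minimums listed under "Adding a new entity fixture" into a draft `frontend.assertions.json` object. Everything in it is hypothetical: the helper name `starterFrontendAssertions`, the `{ name, kind }` field shape, the `create`/`show` entry keys, and the frontend file paths are illustrative assumptions, not the project's actual DSL model or directory layout. The output is a draft for human review and should not overwrite the committed corpus.

```javascript
// Hypothetical helper: build a *starter* frontend assertions object for one
// entity from a simplified field list. Field shape, entry keys and paths are
// assumptions for illustration; review the result before committing it.

// Required React Admin component per field kind on the create view.
const CREATE_INPUT_BY_KIND = {
  fk: "ReferenceInput",
  number: "NumberInput",
  date: "DateInput",
  enum: "SelectInput",
};

// Required React Admin component per field kind on the show view.
const SHOW_FIELD_BY_KIND = {
  fk: "ReferenceField",
  date: "DateField",
};

function starterFrontendAssertions(kebab, fields) {
  // Sets keep each component once, in first-seen order.
  const createNeeds = new Set();
  const showNeeds = new Set();
  for (const field of fields) {
    const input = CREATE_INPUT_BY_KIND[field.kind];
    const display = SHOW_FIELD_BY_KIND[field.kind];
    if (input) createNeeds.add(input);
    if (display) showNeeds.add(display);
  }

  return {
    create: {
      path: `frontend/src/resources/${kebab}/${kebab}-create.tsx`, // hypothetical layout
      must_contain: [...createNeeds],
      comment: "Starter: input component per field kind (review before committing)",
    },
    show: {
      path: `frontend/src/resources/${kebab}/${kebab}-show.tsx`, // hypothetical layout
      must_contain: [...showNeeds],
      comment: "Starter: FK and date display components (review before committing)",
    },
  };
}

// Example: a repair order with an FK, a date, an enum and a plain string field.
const fixture = starterFrontendAssertions("repair-order", [
  { name: "equipmentId", kind: "fk" },
  { name: "scheduledAt", kind: "date" },
  { name: "priority", kind: "enum" },
  { name: "title", kind: "string" },
]);

console.log(JSON.stringify(fixture, null, 2));
```

Because `must_contain` checks literal substrings, one `ReferenceInput` satisfies the draft even when an entity has several FK fields. A reviewer tightening the fixture could replace these with per-field `must_match_regex` patterns.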