git init

2026-04-06 12:50:46 +03:00
commit 73ddb1a948
155 changed files with 26688 additions and 0 deletions
--- a/tools/eval/README.md
+++ b/tools/eval/README.md
@@ -0,0 +1,113 @@
+# Eval Harness — Rule 6
+
+Fixture-based regression tests for generated artifacts.
+
+## Why this exists
+
+> "Evals are the test suite for your prompts. You would never ship code without tests;
+> don't ship prompts without evals." — Anthropic Engineering
+
+The validation gate (`tools/validate-generation.mjs`) checks **existence** and **structural compliance**.
+The eval harness checks **semantic correctness**: are the right patterns present in the generated code?
+Do the generated files actually follow the rules in `prompts/`?
+
+Together they enforce:
+- Gate: "file exists, field names present, auth seams wired"
+- Evals: "DTO has class-validator decorators, FK uses ReferenceInput, date uses DateInput, guard is present"
+
+The committed fixture corpus is a reviewed semantic contract. It may be scaffolded from the source of truth as a starting point, but it should not be regenerated wholesale on every full regeneration run; otherwise it stops acting as an independent regression signal.
+
+## Usage
+
+```bash
+# Run all evals
+npm run eval:generation
+
+# Run evals for one entity
+node tools/eval/run-evals.mjs --entity equipment
+
+# Verbose output (show each file being checked)
+node tools/eval/run-evals.mjs --verbose
+```
+
+## Fixture format
+
+Each fixture lives in `tools/eval/fixtures/<entity>/`:
+
+```
+fixtures/
+  equipment/
+    meta.json                  ← what this fixture tests
+    backend.assertions.json    ← patterns the NestJS files must satisfy
+    frontend.assertions.json   ← patterns the React Admin files must satisfy
+  repair-order/
+    meta.json
+    backend.assertions.json
+    frontend.assertions.json
+```
+
+### `meta.json`
+
+```json
+{
+  "entity": "Equipment",
+  "kebab": "equipment",
+  "resource": "equipment",
+  "description": "...",
+  "tests": ["dto-decorator-coverage", "auth-guards", ...]
+}
+```
+
+### `*.assertions.json`
+
+Each file entry supports:
+
+| Key | Type | Meaning |
+|-----|------|---------|
+| `path` | string | Relative path from repo root |
+| `must_contain` | string[] | Each string must appear as a literal substring |
+| `must_not_contain` | string[] | Each string must NOT appear |
+| `must_match_regex` | string[] | Each pattern must match (evaluated in multiline, dot-all mode) |
+| `must_not_match_regex` | string[] | Each pattern must NOT match |
+| `comment` | string | Human-readable explanation of what is being tested |
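+
+As a rough illustration of these semantics, the sketch below evaluates one entry against a file's text. It is not the harness's actual code: `checkEntry`, the placeholder path, and the sample DTO are hypothetical, and mapping "multiline dot-all" to the JavaScript `m` and `s` flags is an assumption.
+
+```javascript
+// Hypothetical sketch: evaluate one assertion entry against file contents.
+// `checkEntry` is an illustrative name, not the harness's real API.
+function checkEntry(entry, text) {
+  const failures = [];
+  for (const s of entry.must_contain ?? []) {
+    if (!text.includes(s)) failures.push(`missing substring: ${s}`);
+  }
+  for (const s of entry.must_not_contain ?? []) {
+    if (text.includes(s)) failures.push(`forbidden substring: ${s}`);
+  }
+  // Assumption: "multiline dot-all" corresponds to the `m` and `s` flags.
+  for (const p of entry.must_match_regex ?? []) {
+    if (!new RegExp(p, 'ms').test(text)) failures.push(`no match for: ${p}`);
+  }
+  for (const p of entry.must_not_match_regex ?? []) {
+    if (new RegExp(p, 'ms').test(text)) failures.push(`forbidden match for: ${p}`);
+  }
+  return failures;
+}
+
+// Example entry; the path is a placeholder, not a real repo path.
+const entry = {
+  path: '<path to create DTO>',
+  must_contain: ["from 'class-validator'", '@IsOptional('],
+  must_match_regex: ['@IsString\\(\\)\\s+name!:'],
+  comment: 'create DTO validates its fields',
+};
+
+// Hypothetical generated file contents.
+const sample = [
+  "import { IsString } from 'class-validator';",
+  'export class CreateEquipmentDto {',
+  '  @IsString()',
+  '  name!: string;',
+  '}',
+].join('\n');
+
+console.log(checkEntry(entry, sample));
+```
+
+Here the sample satisfies the import and regex checks but lacks `@IsOptional(`, so exactly one failure is reported.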
+
+## Eval-driven development workflow
+
+The critical principle, from Anthropic and Google, is to let the eval drive every change:
+
+1. **Write the failing eval first.** When you change a prompt or add a rule, add an
+   assertion that captures the new expectation *before* re-generating.
+2. **Run evals**: `npm run eval:generation` → see failures.
+3. **Re-generate** the affected entity (following the generation workflow in `AGENTS.md`).
+4. **Run evals again**: all pass → the change is verified.
+5. **Commit both** the updated fixture and the regenerated artifacts together.
+
+A passing eval after a prompt change confirms the LLM followed the new rule.
+An eval that passed before a prompt change and fails after it tells you exactly which prior contract was broken.
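+
+The fail-first loop can be sketched in a few lines. This is illustrative only: `passes` stands in for the harness's substring check, and the `commissionedAt` field and both artifact snippets are made up for the example.
+
+```javascript
+// Illustrative stand-in for the harness's `must_contain` check.
+function passes(assertion, text) {
+  return assertion.must_contain.every((s) => text.includes(s));
+}
+
+// Step 1: write the new expectation before regenerating.
+const assertion = {
+  must_contain: ['<DateInput'],
+  comment: 'date fields use DateInput',
+};
+
+// Hypothetical artifact contents before and after regeneration.
+const before = '<TextInput source="commissionedAt" />';
+const after = '<DateInput source="commissionedAt" />';
+
+console.log(passes(assertion, before)); // step 2: false, the eval fails
+console.log(passes(assertion, after)); // step 4: true, the change is verified
+```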
+
+Automation note:
+
+- It is reasonable to generate starter fixtures or coverage manifests from source-of-truth.
+- It is not reasonable to let the same regeneration step auto-refresh the authoritative committed eval corpus, because that couples the semantic gate too tightly to the generator and can hide regressions.
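+
+One way to picture this policy, as a sketch rather than anything the tooling is known to implement: a scaffold step may propose a fixture, but a committed fixture always wins. `chooseFixture` and both sample objects are hypothetical.
+
+```javascript
+// Hypothetical policy helper, not part of the harness.
+// `committed` is the reviewed fixture, or undefined when none exists yet.
+// `scaffolded` is a fresh proposal derived from the source of truth.
+function chooseFixture(committed, scaffolded) {
+  if (committed !== undefined) {
+    // The reviewed contract stays authoritative; the proposal is kept only for review.
+    return { fixture: committed, proposal: scaffolded, source: 'committed' };
+  }
+  // No contract yet: the scaffold becomes a starter that still needs review.
+  return { fixture: scaffolded, proposal: null, source: 'scaffolded' };
+}
+
+const reviewed = { must_contain: ['@UseGuards('] };
+// A regressed generator could scaffold weaker checks than the reviewed ones.
+const generated = { must_contain: [] };
+
+console.log(chooseFixture(reviewed, generated).source); // "committed"
+console.log(chooseFixture(undefined, generated).source); // "scaffolded"
+```
+
+Keeping the committed fixture authoritative is what lets a weaker scaffold show up as a reviewable diff instead of silently relaxing the gate.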
+
+## Adding a new entity fixture
+
+When adding a new entity to `domain/toir.api.dsl` and generating its backend and frontend:
+
+1. Create `tools/eval/fixtures/<kebab>/meta.json`
+2. Create `tools/eval/fixtures/<kebab>/backend.assertions.json` with at minimum:
+   - controller: `@Controller(...)`, `@UseGuards(`, `JwtAuthGuard`, HTTP methods
+   - create_dto: `from 'class-validator'`, required fields with `!:`, `@IsString(`, `@IsOptional(`
+   - update_dto: `from 'class-validator'`, fields with `?:`, `@IsOptional(`
+3. Create `tools/eval/fixtures/<kebab>/frontend.assertions.json` with at minimum:
+   - create: `ReferenceInput` for FK fields, `NumberInput` for numeric, `DateInput` for date, `SelectInput` for enum
+   - show: `ReferenceField` for FK fields, `DateField` for date
+4. Run `npm run eval:generation` to verify the fixture catches real issues.
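+
+The backend minimum from step 2 could be scaffolded along these lines. This is a sketch under assumptions: `starterBackendAssertions` is a hypothetical helper, the top-level keys simply mirror the list above (the real file layout may differ), the paths are placeholders, and `@Get(` / `@Post(` are just examples of "HTTP methods".
+
+```javascript
+// Hypothetical scaffold for the minimum backend assertions listed above.
+// Assumption: the file is an object keyed by controller/create_dto/update_dto.
+function starterBackendAssertions(paths) {
+  const validatorImport = "from 'class-validator'";
+  return {
+    controller: {
+      path: paths.controller,
+      // '@Get(' and '@Post(' stand in for "HTTP methods"; extend per entity.
+      must_contain: ['@Controller(', '@UseGuards(', 'JwtAuthGuard', '@Get(', '@Post('],
+      comment: 'controller is routed and guarded',
+    },
+    create_dto: {
+      path: paths.createDto,
+      must_contain: [validatorImport, '@IsString(', '@IsOptional('],
+      must_match_regex: ['\\w+!:'], // at least one required field
+      comment: 'create DTO validates required and optional fields',
+    },
+    update_dto: {
+      path: paths.updateDto,
+      must_contain: [validatorImport, '@IsOptional('],
+      must_match_regex: ['\\w+\\?:'], // at least one optional field
+      comment: 'update DTO fields are optional',
+    },
+  };
+}
+
+// Placeholder paths: substitute the real generated file locations.
+const fixture = starterBackendAssertions({
+  controller: '<controller path>',
+  createDto: '<create DTO path>',
+  updateDto: '<update DTO path>',
+});
+console.log(JSON.stringify(fixture, null, 2));
+```
+
+The output is only a starting point; review and tighten it per entity before committing it as the fixture.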
+
+## Integration with git hooks
+
+The pre-commit hook (installed by `npm run install-hooks`) runs both:
+1. `node tools/validate-generation.mjs --artifacts-only` — existence and structure gate
+2. `npm run eval:generation` — semantic eval gate
+
+Both must pass before a commit is accepted.
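+
+The hook's control flow can be sketched as follows. This is not the installed hook: `runGates` is a hypothetical name, stopping at the first failing gate is an assumption, and the command runner is injected so the example stays self-contained.
+
+```javascript
+// The two gates, in the order listed above.
+const gates = [
+  'node tools/validate-generation.mjs --artifacts-only',
+  'npm run eval:generation',
+];
+
+// Hypothetical sketch. `run` takes a command string and returns its exit code.
+// Assumption: the hook stops at the first failing gate.
+function runGates(commands, run) {
+  for (const command of commands) {
+    if (run(command) !== 0) return { ok: false, failedAt: command };
+  }
+  return { ok: true, failedAt: null };
+}
+
+// Simulated runner: pretend the existence gate passes and the eval gate fails.
+const result = runGates(gates, (command) => (command.startsWith('npm') ? 1 : 0));
+console.log(result);
+```
+
+In a real hook, `run` would execute the command (for example through Node's `child_process.spawnSync`, reading the `status` field of its result) and the script would exit non-zero when `ok` is false.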