# Eval Harness — Rule 6

Fixture-based regression tests for generated artifacts.

## Why this exists

> "Evals are the test suite for your prompts. You would never ship code without tests;
> don't ship prompts without evals." (Anthropic Engineering)

The validation gate (`tools/validate-generation.mjs`) checks **existence** and **structural compliance**.
The eval harness checks **semantic correctness**: are the right patterns present in the generated code?
Do the generated files actually follow the rules in `prompts/`?

Together they enforce:
- Gate: "file exists, field names present, auth seams wired"
- Evals: "DTO has class-validator decorators, FK uses ReferenceInput, date uses DateInput, guard is present"

## Usage

```bash
# Run all evals
npm run eval:generation

# Run evals for one entity
node tools/eval/run-evals.mjs --entity equipment

# Verbose output (show each file being checked)
node tools/eval/run-evals.mjs --verbose
```

## Fixture format

Each fixture lives in `tools/eval/fixtures/<entity>/`:

```
fixtures/
  equipment/
    meta.json                  ← what this fixture tests
    backend.assertions.json    ← patterns the NestJS files must satisfy
    frontend.assertions.json   ← patterns the React Admin files must satisfy
  repair-order/
    meta.json
    backend.assertions.json
    frontend.assertions.json
```

### `meta.json`

```json
{
  "entity": "Equipment",
  "kebab": "equipment",
  "resource": "equipment",
  "description": "...",
  "tests": ["dto-decorator-coverage", "auth-guards", ...]
}
```

### `*.assertions.json`

Each file entry supports:

| Key | Type | Meaning |
|-----|------|---------|
| `path` | string | Relative path from repo root |
| `must_contain` | string[] | Each string must appear as a literal substring |
| `must_not_contain` | string[] | Each string must NOT appear |
| `must_match_regex` | string[] | Each pattern must match (multiline dot-all) |
| `must_not_match_regex` | string[] | Each pattern must NOT match |
| `comment` | string | Human-readable explanation of what is being tested |

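
The keys in the table above can be read as four independent checks against a file's text. The following is a minimal sketch of that reading, not the code in `tools/eval/run-evals.mjs`, which is not shown here. The function name `checkEntry`, the sample entry, its placeholder path, and the mapping of "multiline dot-all" to the JavaScript `m` and `s` flags are all assumptions made for illustration.

```javascript
// Sketch only: the real logic lives in tools/eval/run-evals.mjs and may differ.
// Evaluates one file entry against that file's content and returns failure messages.
function checkEntry(entry, content) {
  const failures = [];
  // Assumption: "multiline dot-all" maps to the JavaScript `m` and `s` flags.
  const toRegex = (pattern) => new RegExp(pattern, "ms");

  for (const needle of entry.must_contain ?? []) {
    if (!content.includes(needle)) failures.push(`missing literal: ${needle}`);
  }
  for (const needle of entry.must_not_contain ?? []) {
    if (content.includes(needle)) failures.push(`forbidden literal: ${needle}`);
  }
  for (const pattern of entry.must_match_regex ?? []) {
    if (!toRegex(pattern).test(content)) failures.push(`no match for: ${pattern}`);
  }
  for (const pattern of entry.must_not_match_regex ?? []) {
    if (toRegex(pattern).test(content)) failures.push(`forbidden match for: ${pattern}`);
  }
  return failures;
}

// Hypothetical entry and file content, for illustration only.
const entry = {
  path: "path/to/create-equipment.dto.ts", // placeholder, not a real repo path
  must_contain: ["from 'class-validator'", "@IsString("],
  must_not_contain: [": any"],
  must_match_regex: ["@IsOptional\\(\\).*description\\?:"],
  comment: "DTO fields carry class-validator decorators",
};
const dtoSource = [
  "import { IsOptional, IsString } from 'class-validator';",
  "export class CreateEquipmentDto {",
  "  @IsString()",
  "  name!: string;",
  "  @IsOptional()",
  "  @IsString()",
  "  description?: string;",
  "}",
].join("\n");

// An empty array means every assertion in the entry passed.
console.log(checkEntry(entry, dtoSource));
```

The regex example relies on the dot-all flag: `.*` has to cross the line breaks between `@IsOptional()` and the `description?:` field, which a default JavaScript regex would not do.
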
## Eval-driven development workflow

This is the critical principle from Anthropic and Google:

1. **Write the failing eval first.** When you change a prompt or add a rule, add an
   assertion that captures the new expectation *before* re-generating.
2. **Run evals**: `npm run eval:generation` → see failures.
3. **Re-generate** the affected entity (following the generation workflow in `AGENTS.md`).
4. **Run evals again**: all pass → the change is verified.
5. **Commit both** the updated fixture and the regenerated artifacts together.

A passing eval after a prompt change confirms the LLM followed the new rule.
A failing eval before a prompt change tells you exactly which prior contract was broken.

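
The loop above can be illustrated with an in-memory example: a new assertion fails against the artifact generated before the prompt change and passes against the regenerated one. This is a sketch under stated assumptions. The rule, the `missing` helper, and both controller snippets are hypothetical, and the real harness reads generated files from disk instead of strings.

```javascript
// Sketch of steps 1, 2 and 4 with in-memory content; not the real eval runner.
// Step 1: a new assertion for a hypothetical rule "controllers must use JwtAuthGuard".
const newAssertion = { must_contain: ["@UseGuards(", "JwtAuthGuard"] };

// Minimal literal-substring check, standing in for the harness.
// Returns the required strings that are absent from the content.
const missing = (assertion, content) =>
  assertion.must_contain.filter((needle) => !content.includes(needle));

// Artifact as generated before the prompt change: no guard yet.
const before = "@Controller('equipment')\nexport class EquipmentController {}";

// Artifact after re-generating with the updated prompt (step 3).
const after =
  "@Controller('equipment')\n@UseGuards(JwtAuthGuard)\nexport class EquipmentController {}";

// Step 2: the eval fails first, listing both required strings as missing.
console.log("before:", missing(newAssertion, before));

// Step 4: after re-generation nothing is missing, so the change is verified.
console.log("after:", missing(newAssertion, after));
```

Writing the assertion first matters because a check that has never been seen to fail gives no evidence that it can detect the problem it was written for.
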
## Adding a new entity fixture

When adding a new entity to `domain/toir.api.dsl` and generating its backend + frontend:

1. Create `tools/eval/fixtures/<kebab>/meta.json`
2. Create `tools/eval/fixtures/<kebab>/backend.assertions.json` with at minimum:
   - controller: `@Controller(...)`, `@UseGuards(`, `JwtAuthGuard`, HTTP methods
   - create_dto: `from 'class-validator'`, required fields with `!:`, `@IsString(`, `@IsOptional(`
   - update_dto: `from 'class-validator'`, fields with `?:`, `@IsOptional(`
3. Create `tools/eval/fixtures/<kebab>/frontend.assertions.json` with at minimum:
   - create: `ReferenceInput` for FK fields, `NumberInput` for numeric, `DateInput` for date, `SelectInput` for enum
   - show: `ReferenceField` for FK fields, `DateField` for date
4. Run `npm run eval:generation` to verify the fixture catches real issues.

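
As a starting point for step 2, the minimum backend assertions could be scaffolded from the kebab name. This is a sketch, not part of the harness. The helper `scaffoldBackendAssertions`, the top-level keys (`controller`, `create_dto`, `update_dto`), the `path/to/...` layout, and the choice of `@Get(` and `@Post(` as the HTTP methods to require are all assumptions; check an existing fixture such as `fixtures/equipment/` for the actual shape before relying on it.

```javascript
// Sketch only: keys and paths below are assumptions, not the harness's confirmed schema.
function scaffoldBackendAssertions(kebab) {
  // Hypothetical source layout; adjust to wherever your generator writes files.
  const dir = `path/to/${kebab}`;
  return {
    controller: {
      path: `${dir}/${kebab}.controller.ts`,
      // @Get( and @Post( are illustrative choices for "HTTP methods".
      must_contain: ["@Controller(", "@UseGuards(", "JwtAuthGuard", "@Get(", "@Post("],
      comment: "Controller is routed, guarded, and exposes HTTP methods",
    },
    create_dto: {
      path: `${dir}/dto/create-${kebab}.dto.ts`,
      must_contain: ["from 'class-validator'", "@IsString(", "@IsOptional("],
      // At least one required field declared with `!:`.
      must_match_regex: ["\\w+!:"],
      comment: "Create DTO validates required and optional fields",
    },
    update_dto: {
      path: `${dir}/dto/update-${kebab}.dto.ts`,
      must_contain: ["from 'class-validator'", "@IsOptional("],
      // At least one optional field declared with `?:`.
      must_match_regex: ["\\w+\\?:"],
      comment: "Update DTO fields are optional",
    },
  };
}

// Print a draft that can be pasted into backend.assertions.json and then edited.
console.log(JSON.stringify(scaffoldBackendAssertions("repair-order"), null, 2));
```

A generated draft only covers the generic minimum; entity-specific expectations (particular field names, enum values, FK references) still need to be added by hand.
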
## Integration with git hooks

The pre-commit hook (installed by `npm run install-hooks`) runs both:
1. `node tools/validate-generation.mjs --artifacts-only` — existence gate
2. `npm run eval:generation` — semantic eval gate

Both must pass before a commit is accepted.
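
The control flow of such a hook can be sketched as a loop over the two gates. This is not the hook that `npm run install-hooks` installs, which is not shown here. The `runGates` function, the injected `run` callback, and the stop-at-first-failure behaviour are assumptions for illustration; a real hook could pass a wrapper around Node's `child_process.spawnSync` as `run`.

```javascript
// Sketch of the hook's control flow. `run` is injected so the logic can be
// exercised without spawning processes; it takes a command string and returns
// that command's exit code.
const GATES = [
  { name: "existence gate", command: "node tools/validate-generation.mjs --artifacts-only" },
  { name: "semantic eval gate", command: "npm run eval:generation" },
];

// Returns 0 when every gate passes, otherwise the exit code of the first failing gate.
// Assumption: the hook stops at the first failure instead of running both gates.
function runGates(run) {
  for (const gate of GATES) {
    const code = run(gate.command);
    if (code !== 0) {
      console.error(`${gate.name} failed (exit ${code}); commit rejected`);
      return code;
    }
  }
  return 0;
}

// Stand-in runner that pretends the eval gate fails and the existence gate passes.
const fakeRun = (command) => (command.includes("eval:generation") ? 1 : 0);
console.log(runGates(fakeRun));
```

Git rejects the commit whenever the pre-commit hook exits with a non-zero status, so returning the failing gate's exit code is enough to block it.
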