# Eval Harness — Rule 6

Fixture-based regression tests for generated artifacts.

## Why this exists

> "Evals are the test suite for your prompts. You would never ship code without tests;
> don't ship prompts without evals." — Anthropic Engineering

The validation gate (`tools/validate-generation.mjs`) checks **existence** and **structural compliance**.
The eval harness checks **semantic correctness**: are the right patterns present in the generated code?
Do the generated files actually follow the rules in `prompts/`?

Together they enforce:

- Gate: "file exists, field names present, auth seams wired"
- Evals: "DTO has class-validator decorators, FK uses ReferenceInput, date uses DateInput, guard is present"

The committed fixture corpus is a reviewed semantic contract. It may be scaffolded from source-of-truth as a helper, but it should not be auto-regenerated wholesale during every full regeneration run, or it stops acting as an independent regression signal.

## Usage

```bash
# Run all evals
npm run eval:generation

# Run evals for one entity
node tools/eval/run-evals.mjs --entity equipment

# Verbose output (show each file being checked)
node tools/eval/run-evals.mjs --verbose
```

## Fixture format

Each fixture lives in `tools/eval/fixtures/<entity>/`:

```
fixtures/
  equipment/
    meta.json                    ← what this fixture tests
    backend.assertions.json      ← patterns the NestJS files must satisfy
    frontend.assertions.json     ← patterns the React Admin files must satisfy
  repair-order/
    meta.json
    backend.assertions.json
    frontend.assertions.json
```

### `meta.json`

```json
{
  "entity": "Equipment",
  "kebab": "equipment",
  "resource": "equipment",
  "description": "...",
  "tests": ["dto-decorator-coverage", "auth-guards", ...]
}
```

### `*.assertions.json`

Each file entry supports:

| Key | Type | Meaning |
|-----|------|---------|
| `path` | string | Relative path from repo root |
| `must_contain` | string[] | Each string must appear as a literal substring |
| `must_not_contain` | string[] | Each string must NOT appear |
| `must_match_regex` | string[] | Each pattern must match (multiline dot-all) |
| `must_not_match_regex` | string[] | Each pattern must NOT match |
| `comment` | string | Human-readable explanation of what is being tested |
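
To make the table concrete, here is a minimal sketch of how one file entry could be evaluated against a file's text. It is not the harness's actual code: the function name `checkEntry`, the returned list of failure messages, the placeholder path, and the choice of the `m` and `s` regex flags (one reading of "multiline dot-all") are all assumptions.

```javascript
// Hypothetical helper: evaluate one assertion entry against a file's text.
// Key names mirror the table above; everything else is an assumption.
function checkEntry(entry, content) {
  const failures = [];

  for (const needle of entry.must_contain ?? []) {
    if (!content.includes(needle)) failures.push(`missing literal: ${needle}`);
  }
  for (const needle of entry.must_not_contain ?? []) {
    if (content.includes(needle)) failures.push(`forbidden literal: ${needle}`);
  }
  for (const pattern of entry.must_match_regex ?? []) {
    // "m" makes ^ and $ match at line breaks, "s" lets . match newlines
    if (!new RegExp(pattern, "ms").test(content)) failures.push(`no match: /${pattern}/`);
  }
  for (const pattern of entry.must_not_match_regex ?? []) {
    if (new RegExp(pattern, "ms").test(content)) failures.push(`forbidden match: /${pattern}/`);
  }
  return failures;
}

// Example entry; the path and the DTO text below are illustrative only.
const entry = {
  path: "path/to/create-equipment.dto.ts",
  must_contain: ["from 'class-validator'", "@IsString("],
  must_not_contain: ["any;"],
  must_match_regex: ["@IsOptional\\(\\).*description\\?:"],
  comment: "Create DTO uses class-validator decorators",
};

const sampleDto = [
  "import { IsOptional, IsString } from 'class-validator';",
  "export class CreateEquipmentDto {",
  "  @IsString()",
  "  name!: string;",
  "  @IsOptional()",
  "  @IsString()",
  "  description?: string;",
  "}",
].join("\n");

const failures = checkEntry(entry, sampleDto);
console.log(failures.length === 0 ? "PASS" : failures.join("\n"));
```

Returning a list of messages rather than a boolean lets a runner report every broken expectation for a file in one pass, which is what `--verbose` style output would need.
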

## Eval-driven development workflow

This is the critical principle from Anthropic and Google:

1. **Write the failing eval first.** When you change a prompt or add a rule, add an
   assertion that captures the new expectation *before* re-generating.
2. **Run evals**: `npm run eval:generation` → see failures.
3. **Re-generate** the affected entity (following the generation workflow in `AGENTS.md`).
4. **Run evals again**: all pass → the change is verified.
5. **Commit both** the updated fixture and the regenerated artifacts together.

A passing eval after a prompt change confirms the LLM followed the new rule.
A failing eval before a prompt change tells you exactly which prior contract was broken.

Automation note:

- It is reasonable to generate starter fixtures or coverage manifests from source-of-truth.
- It is not reasonable to let the same regeneration step auto-refresh the authoritative committed eval corpus, because that couples the semantic gate too tightly to the generator and can hide regressions.
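
As an illustration of the first point, a coverage manifest can be derived without touching any committed assertion. The sketch below is hypothetical: `toKebab`, `buildCoverageManifest`, and the `WorkShift` entity are invented for the example, and the inputs are plain arrays so the sketch does not assume how the DSL is parsed or how the fixtures directory is listed.

```javascript
// Hypothetical helper: convert an entity name such as "RepairOrder"
// into the kebab-case directory name used under fixtures/.
function toKebab(name) {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Report which entities have a fixture directory and which do not.
// Nothing here reads or rewrites assertion content.
function buildCoverageManifest(entityNames, fixtureDirs) {
  const present = new Set(fixtureDirs);
  const covered = [];
  const missing = [];
  for (const name of entityNames) {
    const kebab = toKebab(name);
    (present.has(kebab) ? covered : missing).push(kebab);
  }
  return { covered, missing };
}

const manifest = buildCoverageManifest(
  ["Equipment", "RepairOrder", "WorkShift"], // "WorkShift" is a made-up entity
  ["equipment", "repair-order"],
);
console.log(manifest);
```

With these inputs the manifest lists `work-shift` as missing. Because the helper only reports gaps, the committed corpus stays hand-reviewed and keeps its value as an independent signal.
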

## Adding a new entity fixture

When adding a new entity to `domain/toir.api.dsl` and generating its backend + frontend:

1. Create `tools/eval/fixtures/<kebab>/meta.json`
2. Create `tools/eval/fixtures/<kebab>/backend.assertions.json` with at minimum:
   - controller: `@Controller(...)`, `@UseGuards(`, `JwtAuthGuard`, HTTP methods
   - create_dto: `from 'class-validator'`, required fields with `!:`, `@IsString(`, `@IsOptional(`
   - update_dto: `from 'class-validator'`, fields with `?:`, `@IsOptional(`
3. Create `tools/eval/fixtures/<kebab>/frontend.assertions.json` with at minimum:
   - create: `ReferenceInput` for FK fields, `NumberInput` for numeric, `DateInput` for date, `SelectInput` for enum
   - show: `ReferenceField` for FK fields, `DateField` for date
4. Run `npm run eval:generation` to verify the fixture catches real issues.
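
The minimum checklist above could be turned into a one-off scaffold along these lines. Treat it as a sketch under stated assumptions: the top-level `files` map keyed by role, the `<backend-root>` and `<frontend-root>` path placeholders, the `fieldKinds` input, and the function names are all hypothetical, so copy the real shape from an existing fixture such as `equipment` before relying on it. HTTP method assertions are left out because they depend on the entity's endpoints.

```javascript
// Hypothetical mapping from field kind to the React Admin component
// the checklist expects in create and show views.
const CREATE_INPUTS = new Map([
  ["fk", "ReferenceInput"],
  ["number", "NumberInput"],
  ["date", "DateInput"],
  ["enum", "SelectInput"],
]);
const SHOW_FIELDS = new Map([
  ["fk", "ReferenceField"],
  ["date", "DateField"],
]);

// Keep only the components that apply to the field kinds this entity has.
function componentsFor(mapping, fieldKinds) {
  return fieldKinds.filter((kind) => mapping.has(kind)).map((kind) => mapping.get(kind));
}

// Build starter objects for meta.json and both assertion files.
// All paths are placeholders, not the project's real layout.
function starterFixture(entity, kebab, fieldKinds) {
  const meta = {
    entity,
    kebab,
    resource: kebab, // assumption: resource name equals the kebab name
    description: `Starter fixture for ${entity}; review before committing.`,
    tests: ["dto-decorator-coverage", "auth-guards"],
  };
  const backend = {
    files: {
      controller: {
        path: `<backend-root>/${kebab}/${kebab}.controller.ts`,
        must_contain: ["@Controller(", "@UseGuards(", "JwtAuthGuard"],
        comment: "Controller is declared and guarded",
      },
      create_dto: {
        path: `<backend-root>/${kebab}/dto/create-${kebab}.dto.ts`,
        must_contain: ["from 'class-validator'", "!:", "@IsString(", "@IsOptional("],
        comment: "Create DTO validates required and optional fields",
      },
      update_dto: {
        path: `<backend-root>/${kebab}/dto/update-${kebab}.dto.ts`,
        must_contain: ["from 'class-validator'", "?:", "@IsOptional("],
        comment: "Update DTO marks fields optional",
      },
    },
  };
  const frontend = {
    files: {
      create: {
        path: `<frontend-root>/${kebab}/create.tsx`,
        must_contain: componentsFor(CREATE_INPUTS, fieldKinds),
        comment: "Create view uses typed inputs",
      },
      show: {
        path: `<frontend-root>/${kebab}/show.tsx`,
        must_contain: componentsFor(SHOW_FIELDS, fieldKinds),
        comment: "Show view uses typed fields",
      },
    },
  };
  return { meta, backend, frontend };
}

const fixture = starterFixture("RepairOrder", "repair-order", ["fk", "date"]);
console.log(JSON.stringify(fixture.frontend, null, 2));
```

A scaffold like this is meant to be run once per new entity and then edited by hand; the reviewed result is what gets committed.
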

## Integration with git hooks

The pre-commit hook (installed by `npm run install-hooks`) runs both:
1. `node tools/validate-generation.mjs --artifacts-only` — existence gate
2. `npm run eval:generation` — semantic eval gate

Both must pass before a commit is accepted.