# Eval Harness — Rule 6

Fixture-based regression tests for generated artifacts.

## Why this exists

> "Evals are the test suite for your prompts. You would never ship code without tests;
> don't ship prompts without evals." — Anthropic Engineering

The validation gate (`tools/validate-generation.mjs`) checks **existence** and **structural compliance**.
The eval harness checks **semantic correctness**: are the right patterns present in the generated code?
Do the generated files actually follow the rules in `prompts/`?

Together they enforce:
- Gate: "file exists, field names present, auth seams wired"
- Evals: "DTO has class-validator decorators, FK uses ReferenceInput, date uses DateInput, guard is present"

The committed fixture corpus is a reviewed semantic contract. It may be scaffolded from the source of truth as a starting point, but it should not be regenerated wholesale on every full regeneration run; otherwise it stops acting as an independent regression signal.

## Usage

```bash
# Run all evals
npm run eval:generation

# Run evals for one entity
node tools/eval/run-evals.mjs --entity equipment

# Verbose output (show each file being checked)
node tools/eval/run-evals.mjs --verbose
```

## Fixture format

Each fixture lives in `tools/eval/fixtures/<entity>/`:

```
fixtures/
  equipment/
    meta.json                  ← what this fixture tests
    backend.assertions.json    ← patterns the NestJS files must satisfy
    frontend.assertions.json   ← patterns the React Admin files must satisfy
  repair-order/
    meta.json
    backend.assertions.json
    frontend.assertions.json
```

### `meta.json`

```json
{
  "entity": "Equipment",
  "kebab": "equipment",
  "resource": "equipment",
  "description": "...",
  "tests": ["dto-decorator-coverage", "auth-guards", ...]
}
```

### `*.assertions.json`

Each file entry supports:

| Key | Type | Meaning |
|-----|------|---------|
| `path` | string | Relative path from repo root |
| `must_contain` | string[] | Each string must appear as a literal substring |
| `must_not_contain` | string[] | Each string must NOT appear |
| `must_match_regex` | string[] | Each pattern must match (multiline and dot-all modes) |
| `must_not_match_regex` | string[] | Each pattern must NOT match |
| `comment` | string | Human-readable explanation of what is being tested |
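
As a rough illustration of how these keys could be applied, the sketch below evaluates one entry against a file's text. It is not the code in `tools/eval/run-evals.mjs`: the function name `checkEntry`, the failure messages, the sample path, and the mapping of "multiline dot-all" to the JavaScript `m` and `s` regex flags are all assumptions.

```javascript
// Hypothetical sketch: evaluate one assertion entry against a file's text.
// `checkEntry` and its return shape are illustrative, not the run-evals.mjs API.
function checkEntry(entry, content) {
  const failures = [];

  for (const s of entry.must_contain ?? []) {
    if (!content.includes(s)) failures.push(`missing substring: ${s}`);
  }
  for (const s of entry.must_not_contain ?? []) {
    if (content.includes(s)) failures.push(`forbidden substring: ${s}`);
  }
  // Assumption: "multiline dot-all" corresponds to the `m` and `s` flags.
  for (const p of entry.must_match_regex ?? []) {
    if (!new RegExp(p, 'ms').test(content)) failures.push(`no match for: ${p}`);
  }
  for (const p of entry.must_not_match_regex ?? []) {
    if (new RegExp(p, 'ms').test(content)) failures.push(`forbidden match for: ${p}`);
  }
  return failures;
}

// Example entry; the path and patterns are made up for illustration.
const entry = {
  path: 'path/to/create-equipment.dto.ts',
  comment: 'Create DTO validates its fields',
  must_contain: ["from 'class-validator'", '@IsString('],
  must_not_contain: [': any'],
  // With the `s` flag, `.` also matches the newlines between decorators.
  must_match_regex: ['@IsOptional\\(\\).*?note\\?:'],
};

const sample = `import { IsOptional, IsString } from 'class-validator';
export class CreateEquipmentDto {
  @IsString()
  name!: string;

  @IsOptional()
  @IsString()
  note?: string;
}`;

console.log(checkEntry(entry, sample)); // prints [] because every assertion holds
```

Returning a list of failures rather than a boolean lets a runner report every broken assertion for a file in one pass.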

## Eval-driven development workflow

The critical principle, from Anthropic and Google, is to let evals drive every prompt change:

1. **Write the failing eval first.** When you change a prompt or add a rule, add an
   assertion that captures the new expectation *before* re-generating.
2. **Run evals**: `npm run eval:generation` → see failures.
3. **Re-generate** the affected entity (following the generation workflow in `AGENTS.md`).
4. **Run evals again**: all pass → the change is verified.
5. **Commit both** the updated fixture and the regenerated artifacts together.

A passing eval after a prompt change confirms the LLM followed the new rule.
A failing eval before a prompt change tells you exactly which prior contract was broken.
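
Steps 1 to 4 can be sketched for a single new `must_contain` assertion. The `passes` helper and the `commissionedAt` field are hypothetical stand-ins; in the repo the check is performed by `npm run eval:generation`.

```javascript
// Step 1: the new expectation, written before re-generating.
const newAssertion = { must_contain: ['DateInput'] };

// Hypothetical artifact text before and after re-generation.
const before = '<TextInput source="commissionedAt" />';
const after = '<DateInput source="commissionedAt" />';

// Simplified stand-in for the harness: every literal substring must be present.
const passes = (assertion, text) =>
  assertion.must_contain.every((s) => text.includes(s));

console.log(passes(newAssertion, before)); // false (step 2: the eval fails)
console.log(passes(newAssertion, after)); // true (step 4: the eval passes)
```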

Automation note:

- It is reasonable to generate starter fixtures or coverage manifests from source-of-truth.
- It is not reasonable to let the same regeneration step auto-refresh the authoritative committed eval corpus, because that couples the semantic gate too tightly to the generator and can hide regressions.

## Adding a new entity fixture

When adding a new entity to `domain/toir.api.dsl` and generating its backend and frontend:

1. Create `tools/eval/fixtures/<kebab>/meta.json`
2. Create `tools/eval/fixtures/<kebab>/backend.assertions.json` with at minimum:
   - controller: `@Controller(...)`, `@UseGuards(`, `JwtAuthGuard`, HTTP methods
   - create_dto: `from 'class-validator'`, required fields with `!:`, `@IsString(`, `@IsOptional(`
   - update_dto: `from 'class-validator'`, fields with `?:`, `@IsOptional(`
3. Create `tools/eval/fixtures/<kebab>/frontend.assertions.json` with at minimum:
   - create: `ReferenceInput` for FK fields, `NumberInput` for numeric, `DateInput` for date, `SelectInput` for enum
   - show: `ReferenceField` for FK fields, `DateField` for date
4. Run `npm run eval:generation` to verify the fixture catches real issues.
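
The backend minimums in step 2 could be scaffolded with a small helper like the one below. This is a sketch under assumptions: `starterBackendAssertions` is a hypothetical name, the top-level keys (`controller`, `create_dto`, `update_dto`) are inferred from the list above and may not match the real schema, the `@Get(` and `@Post(` strings are guesses at which HTTP method decorators the generated controller uses, and the paths are placeholders to be filled in by hand.

```javascript
// Hypothetical helper: build a starter backend.assertions.json object.
// The caller supplies the artifact paths; nothing here reads the repo.
function starterBackendAssertions(paths) {
  return {
    controller: {
      path: paths.controller,
      comment: 'Controller is declared, guarded, and exposes HTTP methods',
      // '@Get(' and '@Post(' are assumed; adjust to the generated controller.
      must_contain: ['@Controller(', '@UseGuards(', 'JwtAuthGuard', '@Get(', '@Post('],
    },
    create_dto: {
      path: paths.createDto,
      comment: 'Create DTO uses class-validator and has required fields',
      must_contain: ["from 'class-validator'", '@IsString(', '@IsOptional('],
      must_match_regex: ['\\w+!:'], // at least one required field
    },
    update_dto: {
      path: paths.updateDto,
      comment: 'Update DTO fields are optional',
      must_contain: ["from 'class-validator'", '@IsOptional('],
      must_match_regex: ['\\w+\\?:'], // at least one optional field
    },
  };
}

// Placeholder paths: replace with the real generated file locations.
const starter = starterBackendAssertions({
  controller: 'path/to/<kebab>.controller.ts',
  createDto: 'path/to/create-<kebab>.dto.ts',
  updateDto: 'path/to/update-<kebab>.dto.ts',
});

console.log(JSON.stringify(starter, null, 2));
```

The output is only a starting point. Review it, add entity-specific assertions, and commit it by hand so the fixture stays independent of the generator.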

## Integration with git hooks

The pre-commit hook (installed by `npm run install-hooks`) runs both gates in order:
1. `node tools/validate-generation.mjs --artifacts-only` (existence and structural gate)
2. `npm run eval:generation` (semantic eval gate)

Both must pass before a commit is accepted.
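
The two-gate sequence can be sketched as follows. This is not the installed hook: `GATES`, `runGates`, and the stop-at-first-failure behaviour are assumptions, and the runner is injected so the sequencing can be tried without spawning any process.

```javascript
// Hypothetical description of the two gates the hook runs.
const GATES = [
  { name: 'validation gate', cmd: 'node', args: ['tools/validate-generation.mjs', '--artifacts-only'] },
  { name: 'semantic eval gate', cmd: 'npm', args: ['run', 'eval:generation'] },
];

// Runs each gate in order; `run(gate)` must return the command's exit status.
// Assumption: the sequence stops at the first non-zero status.
function runGates(gates, run) {
  for (const gate of gates) {
    if (run(gate) !== 0) {
      return { ok: false, failed: gate.name };
    }
  }
  return { ok: true, failed: null };
}

// Fake runner for demonstration: pretend the eval gate fails.
const fakeRun = (gate) => (gate.cmd === 'npm' ? 1 : 0);
console.log(runGates(GATES, fakeRun)); // { ok: false, failed: 'semantic eval gate' }

// A real runner could wrap spawnSync (ESM):
//   import { spawnSync } from 'node:child_process';
//   const run = ({ cmd, args }) => spawnSync(cmd, args, { stdio: 'inherit' }).status ?? 1;
//   process.exit(runGates(GATES, run).ok ? 0 : 1);
```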