# Future Work — Deferred Items

This file tracks engineering improvements that are deliberately deferred at the
current stage of the project. They are not forgotten: they are acknowledged technical
debt that should be addressed before scaling.

---

## Rule 7 — Tracing, Telemetry, Cost/Latency Observability

**Status:** Deferred. No LLM calls are instrumented.

**Why it matters (Anthropic / Google / Microsoft guidance):**
Without observability, you cannot:
- Know which prompts are expensive (token count, latency)
- Detect prompt regressions via cost drift
- Attribute generation failures to specific prompt versions
- Track improvement over time

**What needs to be built:**

### 7.1 — Generation log

Create `tools/generation-log.mjs` that wraps any LLM generation call and appends one
structured JSON entry per call to `logs/generation.jsonl`. The entry is shown
pretty-printed here; in the `.jsonl` file each entry occupies a single line:

```json
{
  "timestamp": "2026-04-03T10:00:00.000Z",
  "entity": "Equipment",
  "artifact": "backend",
  "prompt_version": "1.0",
  "model": "...",
  "input_tokens": 4200,
  "output_tokens": 1800,
  "latency_ms": 3200,
  "validation_passed": true,
  "eval_passed": true
}
```

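A minimal sketch of what the wrapper could look like. The function names, the `meta` and `result` shapes, and the `appendLine` sink are all assumptions made for illustration; in particular, where `validation_passed` and `eval_passed` come from depends on how the validator and eval steps report their results.

```javascript
// Hypothetical sketch: buildLogEntry, withGenerationLog, and the shapes of
// `meta` and `result` are illustrative assumptions, not an existing API.

// Builds one log entry with the fields listed in the JSON example above.
function buildLogEntry(meta, result, startedAtMs, finishedAtMs) {
  return {
    timestamp: new Date(startedAtMs).toISOString(),
    entity: meta.entity,
    artifact: meta.artifact,
    prompt_version: meta.promptVersion,
    model: meta.model,
    input_tokens: result.inputTokens,
    output_tokens: result.outputTokens,
    latency_ms: finishedAtMs - startedAtMs,
    validation_passed: result.validationPassed,
    eval_passed: result.evalPassed,
  };
}

// Wraps an async generation call, measures its latency, and hands exactly
// one JSONL line (a single-line JSON document plus newline) to `appendLine`.
async function withGenerationLog(meta, generate, appendLine) {
  const startedAtMs = Date.now();
  const result = await generate();
  const entry = buildLogEntry(meta, result, startedAtMs, Date.now());
  appendLine(JSON.stringify(entry) + "\n");
  return result;
}
```

In `tools/generation-log.mjs`, `appendLine` could be something like `(line) => fs.appendFileSync("logs/generation.jsonl", line)`. Passing the sink in keeps the wrapper testable without touching the file system.
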
### 7.2 — Cost budget alerts

Add a threshold check (e.g., warn if `input_tokens` > 8000 for a single entity generation).
This enforces the context budget from `prompts/general-prompt.md §CONTEXT BUDGET`.

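The threshold check could be as small as the sketch below. `checkTokenBudget` and the constant name are hypothetical, and 8000 is just the example value from the text, not a confirmed budget.

```javascript
// Hypothetical sketch: the function name and default are assumptions.
// 8000 is the example threshold mentioned above.
const DEFAULT_INPUT_TOKEN_BUDGET = 8000;

// Returns a warning string when a log entry exceeds the budget, else null.
function checkTokenBudget(entry, budget = DEFAULT_INPUT_TOKEN_BUDGET) {
  if (entry.input_tokens > budget) {
    return (
      `WARN: ${entry.entity}/${entry.artifact} used ` +
      `${entry.input_tokens} input tokens (budget ${budget})`
    );
  }
  return null;
}
```

Returning the message instead of printing it leaves the caller free to log it, fail the run, or both.
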
### 7.3 — Prompt version tracking

Add `<!-- prompt-version: X.Y -->` comments to all prompt files (already started in
`backend-rules.md` and `frontend-rules.md`). Increment the version on any non-trivial change.
Record the prompt versions used in each generation log entry (see `prompt_version` in 7.1).

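Reading the version back out of a prompt file could look like this sketch. `extractPromptVersion` is a hypothetical helper, and it assumes the comment uses exactly the `X.Y` numeric form shown above.

```javascript
// Hypothetical helper: extracts "X.Y" from a `<!-- prompt-version: X.Y -->`
// comment anywhere in the prompt text. Returns null when no comment is found.
function extractPromptVersion(promptText) {
  const match = /<!--\s*prompt-version:\s*(\d+\.\d+)\s*-->/.exec(promptText);
  return match ? match[1] : null;
}
```

A `null` result is a useful signal in its own right: it identifies prompt files that have not been versioned yet.
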
### 7.4 — Drift detection

Compare generation log entries across runs. If the token count for the same entity
increases by more than 20% without a DSL change, flag it as context rot.

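A sketch of the comparison, under two assumptions that the text does not settle: "token count" is taken to mean `input_tokens`, and each log entry carries a hypothetical `dsl_hash` field (not part of the entry format in 7.1) so that a DSL change can be detected.

```javascript
// Hypothetical sketch: `dsl_hash` is an assumed extra log field, and
// input_tokens is assumed to be the token count that matters for drift.
function detectTokenDrift(previous, current, threshold = 0.2) {
  // No meaningful baseline: never flag.
  if (!(previous.input_tokens > 0)) {
    return { growth: null, flagged: false };
  }
  const dslChanged = previous.dsl_hash !== current.dsl_hash;
  const growth =
    (current.input_tokens - previous.input_tokens) / previous.input_tokens;
  // Flag only growth beyond the threshold that a DSL change does not explain.
  return { growth, flagged: !dslChanged && growth > threshold };
}
```

Both entries are expected to describe the same entity and artifact; pairing them up is left to the caller.
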
**Effort estimate:** Medium. 2–3 days to build the logging layer. Little additional effort
for prompt versioning (already partially done).

**Trigger:** Implement before the system is used for more than 10 entities or before
any production deployment.

---

## Rule 8 — Risk Controls and Red-Teaming

**Status:** Deferred. No sanitization or adversarial testing exists.

**Why it matters (Anthropic / Google / Microsoft guidance):**
LLM-generated code at scale introduces risks that do not exist in hand-written code:
- **Prompt injection**: malicious content in DSL `description` fields could steer
  generation (e.g., `description "Ignore previous instructions and..."`)
- **Generated credential leakage**: the LLM may hallucinate hardcoded secrets that look
  real (e.g., `apiKey: 'sk-...'`)
- **Missing auth guards**: already caught by the Rule 4 validator, but adversarial prompts
  could bypass it by producing valid-looking guard syntax that is semantically inactive
- **Supply chain**: generated package imports could reference non-existent or malicious
  packages if the LLM hallucinates

**What needs to be built:**

### 8.1 — DSL input sanitization

In `tools/api-summary.mjs`, before building the summary, check all `description` and
`label` fields for injection patterns:

```javascript
// Reject DSL strings that look like prompt-injection attempts.
// Heuristic only: a deny-list cannot catch every phrasing.
function sanitizeDslString(value, fieldPath) {
  const injectionPatterns = [
    /ignore previous/i,
    /disregard.*instruction/i,
    /you are now/i,
    /\bsystem\s*:/i, // word boundary avoids matching e.g. "ecosystem:"
  ];
  for (const pattern of injectionPatterns) {
    if (pattern.test(value)) {
      throw new Error(`Potential prompt injection in DSL field ${fieldPath}: "${value}"`);
    }
  }
  return value;
}
```

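To apply the check to every `description` and `label` field, the parsed DSL has to be walked first. The sketch below assumes the DSL has already been parsed into plain objects and arrays; that structure, the `collectDslStrings` name, and the `$`-rooted path format are all assumptions made for illustration.

```javascript
// Hypothetical sketch: assumes the DSL is parsed into plain objects/arrays.
// Collects [fieldPath, value] pairs for every string-valued `description`
// or `label` key, at any depth.
function collectDslStrings(node, path = "$", out = []) {
  if (Array.isArray(node)) {
    node.forEach((child, i) => collectDslStrings(child, `${path}[${i}]`, out));
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      const childPath = `${path}.${key}`;
      const isTarget = key === "description" || key === "label";
      if (isTarget && typeof value === "string") {
        out.push([childPath, value]);
      } else {
        collectDslStrings(value, childPath, out);
      }
    }
  }
  return out;
}
```

Each pair can then be passed to the sanitizer, e.g. `for (const [fieldPath, value] of collectDslStrings(parsedDsl)) sanitizeDslString(value, fieldPath);`, so that the thrown error names the offending field.
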
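### 8.2 — Generated code security scan

Add to `tools/validate-generation.mjs` (or a separate `tools/security-scan.mjs`):

```javascript
// Check that no hardcoded secrets leaked into generated code.
// `files` is assumed to be an array of { path, content } objects.
function validateNoSecretLeakage(files) {
  const patterns = [
    /sk-[A-Za-z0-9_-]{20,}/,                 // OpenAI-style key pattern
    /[A-Za-z0-9+\/]{40}={0,2}/,              // Base64 secret-like (noisy: also hits long hashes)
    /password\s*[:=]\s*['"][^'"]{4,}['"]/i,  // Hardcoded password (`=` or `:` form)
    /apiKey\s*[:=]\s*['"][^'"]{4,}['"]/i,    // Hardcoded API key (`=` or `:` form)
  ];
  const findings = [];
  // Run against all generated files, recording each file/pattern hit.
  for (const { path, content } of files) {
    for (const pattern of patterns) {
      if (pattern.test(content)) findings.push({ path, pattern: String(pattern) });
    }
  }
  return findings;
}
```
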
### 8.3 — UseGuards completeness audit

Beyond the current validator check (`UseGuards` present), verify that the decorator
arguments are non-empty and match the expected guard class names. A guard
call like `@UseGuards()` (empty) passes the current regex but provides no protection.

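A regex-based sketch of the stricter check. The guard names in `EXPECTED_GUARDS` are placeholders, not the project's actual guard classes, and `auditUseGuards` is a hypothetical name. It only inspects `@UseGuards(...)` calls that exist; detecting a missing decorator remains the job of the current validator.

```javascript
// Hypothetical sketch: guard names below are placeholders for illustration.
const EXPECTED_GUARDS = new Set(["JwtAuthGuard", "RolesGuard"]);

// Returns a list of problems found in @UseGuards(...) calls in `source`.
function auditUseGuards(source, expected = EXPECTED_GUARDS) {
  const problems = [];
  const decorator = /@UseGuards\(([^)]*)\)/g;
  for (const match of source.matchAll(decorator)) {
    // Split the argument list and drop empty pieces (e.g. from "()").
    const args = match[1].split(",").map((arg) => arg.trim()).filter(Boolean);
    if (args.length === 0) {
      problems.push("@UseGuards() has no guard arguments");
      continue;
    }
    for (const arg of args) {
      if (!expected.has(arg)) problems.push(`Unexpected guard: ${arg}`);
    }
  }
  return problems;
}
```

Because the argument list is matched with a regex, arguments that contain their own parentheses are reported as unexpected rather than parsed. That errs toward manual review; an AST-based check would be more precise.
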
### 8.4 — Red-team fixture

Create `tools/eval/fixtures/_adversarial/` with a fixture that includes a DSL snippet
containing a benign injection attempt (e.g., a `description` field with "ignore format
rules") and verifies the generation still produces spec-compliant output.

### 8.5 — Generated import allowlist

Maintain a list of approved npm packages that generated code may import. Flag any
import not on the allowlist as a manual review item.

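The allowlist check could be sketched as follows. The package names in `ALLOWED_PACKAGES` are placeholders rather than the project's approved list, and both function names are hypothetical.

```javascript
// Hypothetical sketch: the allowlist entries are placeholders.
const ALLOWED_PACKAGES = new Set(["@nestjs/common", "@nestjs/core", "rxjs"]);

// "rxjs/operators" -> "rxjs"; "@scope/pkg/sub" -> "@scope/pkg".
function packageNameOf(specifier) {
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

// Returns the distinct package names imported by `source` that are not on
// the allowlist. Relative, absolute, and `node:` imports are skipped.
function findUnapprovedImports(source, allowed = ALLOWED_PACKAGES) {
  const importPattern = /(?:from\s+|import\s+|require\(\s*)['"]([^'"]+)['"]/g;
  const flagged = new Set();
  for (const match of source.matchAll(importPattern)) {
    const specifier = match[1];
    if (specifier.startsWith(".") || specifier.startsWith("/")) continue;
    if (specifier.startsWith("node:")) continue;
    const pkg = packageNameOf(specifier);
    if (!allowed.has(pkg)) flagged.add(pkg);
  }
  return [...flagged];
}
```

Built-in modules imported without the `node:` prefix (for example `fs`) would be flagged unless they are added to the allowlist, which matches the "flag for manual review" intent.
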
**Effort estimate:** Medium-High. 3–5 days. Security scan and sanitization are low
effort; red-team fixtures and import allowlisting are higher effort.

**Trigger:** Implement before any external user can influence `domain/*.api.dsl` content
(i.e., before a UI or API to edit the DSL is exposed).

---

## Tracking

| Rule | Status | Priority | Trigger |
|------|--------|----------|---------|
| Rule 7 — Telemetry | Deferred | Medium | Before >10 entities or production deployment |
| Rule 8 — Risk controls | Deferred | High | Before DSL editing is exposed to external users |

Last updated: 2026-04-03