git init

2026-04-03 20:54:37 +03:00
commit c89c23fd1d
50 changed files with 6716 additions and 0 deletions
--- a/docs/future-work.md
+++ b/docs/future-work.md
@@ -0,0 +1,154 @@
+# Future Work — Deferred Items
+
+This file tracks engineering improvements that are deliberately deferred due to the
+current stage of the project. They are not forgotten — they are acknowledged technical
+debt that should be addressed before scaling.
+
+---
+
+## Rule 7 — Tracing, Telemetry, Cost/Latency Observability
+
+**Status:** Deferred. No LLM calls are instrumented.
+
+**Why it matters (Anthropic / Google / Microsoft guidance):**
+Without observability, you cannot:
+- Know which prompts are expensive (token count, latency)
+- Detect prompt regressions via cost drift
+- Attribute generation failures to specific prompt versions
+- Track improvement over time
+
+**What needs to be built:**
+
+### 7.1 — Generation log
+
+Create `tools/generation-log.mjs` that wraps any LLM generation call and writes a
+structured JSON entry to `logs/generation.jsonl`:
+
+```json
+{
+  "timestamp": "2026-04-03T10:00:00.000Z",
+  "entity": "Equipment",
+  "artifact": "backend",
+  "prompt_version": "1.0",
+  "model": "...",
+  "input_tokens": 4200,
+  "output_tokens": 1800,
+  "latency_ms": 3200,
+  "validation_passed": true,
+  "eval_passed": true
+}
+```
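+
+A minimal sketch of how such an entry could be assembled, assuming the LLM client reports token usage as `{ input_tokens, output_tokens }`. The helper name `buildLogLine` and the `usage` and `checks` shapes are hypothetical and not part of the existing tools:
+
+```javascript
+// Hypothetical helper: builds one JSONL line for logs/generation.jsonl.
+// `meta` carries entity, artifact, prompt_version, and model; `usage` carries
+// the token counts reported by the LLM client (this shape is an assumption).
+function buildLogLine(meta, usage, startedMs, finishedMs, checks = {}) {
+  const entry = {
+    timestamp: new Date(startedMs).toISOString(),
+    entity: meta.entity,
+    artifact: meta.artifact,
+    prompt_version: meta.prompt_version,
+    model: meta.model,
+    input_tokens: usage.input_tokens,
+    output_tokens: usage.output_tokens,
+    latency_ms: finishedMs - startedMs,
+    // null means the check has not run yet
+    validation_passed: checks.validation_passed ?? null,
+    eval_passed: checks.eval_passed ?? null,
+  };
+  return JSON.stringify(entry) + '\n'; // one JSON object per line
+}
+```
+
+The wrapper would append the returned line to the log file after each generation call, so a failed run never rewrites earlier entries.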
+
+### 7.2 — Cost budget alerts
+
+Add a threshold check (e.g., warn if `input_tokens` > 8000 for a single entity generation).
+This enforces the context budget from `prompts/general-prompt.md §CONTEXT BUDGET`.
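+
+As a rough illustration, the check could run against each generation log entry. The constant and function names here are hypothetical, and 8000 only mirrors the example value above:
+
+```javascript
+// Hypothetical threshold; 8000 mirrors the example above, not a confirmed limit.
+const INPUT_TOKEN_BUDGET = 8000;
+
+// Returns a warning string when an entry exceeds the budget, otherwise null.
+function checkTokenBudget(entry, budget = INPUT_TOKEN_BUDGET) {
+  if (entry.input_tokens > budget) {
+    return `${entry.entity}/${entry.artifact}: input_tokens ${entry.input_tokens} exceeds budget ${budget}`;
+  }
+  return null;
+}
+```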
+
+### 7.3 — Prompt version tracking
+
+Add `<!-- prompt-version: X.Y -->` comments to all prompt files (already started in
+`backend-rules.md` and `frontend-rules.md`). Increment the version on any non-trivial change.
+Record the prompt version in each generation log entry (the `prompt_version` field).
+
+### 7.4 — Drift detection
+
+Compare generation log entries across runs. If token count for the same entity increases
+by >20% without a DSL change, flag it as context rot.
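+
+One possible shape for this comparison is sketched below. It assumes each log entry also carries a `dsl_hash` field, which is hypothetical and not part of the log schema in 7.1, so that a DSL change can be told apart from prompt growth:
+
+```javascript
+// Flags entries whose input_tokens grew by more than `threshold` relative to
+// the previous run of the same entity/artifact. `dsl_hash` is a hypothetical
+// field (not in the log schema above) used to tell whether the DSL changed.
+function detectDrift(previous, current, threshold = 0.2) {
+  const key = (e) => `${e.entity}:${e.artifact}`;
+  const baseline = new Map(previous.map((e) => [key(e), e]));
+  return current.filter((e) => {
+    const before = baseline.get(key(e));
+    if (!before || before.dsl_hash !== e.dsl_hash) return false; // new entity or DSL changed
+    return e.input_tokens > before.input_tokens * (1 + threshold);
+  });
+}
+```
+
+Entries returned by `detectDrift` are candidates for review, not proof of context rot; a deliberate prompt change would show up the same way.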
+
+**Effort estimate:** Medium. 2–3 days to build the logging layer. Minimal additional
+effort for prompt versioning (already partially done).
+
+**Trigger:** Implement before the system is used for more than 10 entities or before
+any production deployment.
+
+---
+
+## Rule 8 — Risk Controls and Red-Teaming
+
+**Status:** Deferred. No sanitization or adversarial testing exists.
+
+**Why it matters (Anthropic / Google / Microsoft guidance):**
+LLM-generated code at scale introduces risks that do not exist in hand-written code:
+- **Prompt injection**: malicious content in DSL `description` fields could steer
+  generation (e.g., `description "Ignore previous instructions and..."`)
+- **Generated credential leakage**: LLM may hallucinate hardcoded secrets that look
+  real (e.g., `apiKey: 'sk-...'`)
+- **Missing auth guards**: already caught by Rule 4 validator, but adversarial prompts
+  could bypass it by generating valid-looking guard syntax that is semantically inactive
+- **Supply chain**: generated package imports could reference non-existent or malicious
+  packages if the LLM hallucinates
+
+**What needs to be built:**
+
+### 8.1 — DSL input sanitization
+
+In `tools/api-summary.mjs`, before building the summary, check all `description` and
+`label` fields for injection patterns:
+
+```javascript
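+// Reject DSL strings that contain common prompt-injection phrases.
+function sanitizeDslString(value, fieldPath) {
+  if (typeof value !== 'string') return value; // only strings can carry injected text
+  const injectionPatterns = [
+    /ignore previous/i,
+    /disregard.*instruction/i,
+    /you are now/i,
+    /\bsystem:/i, // word boundary avoids matching e.g. "subsystem:"
+  ];
+  for (const pattern of injectionPatterns) {
+    if (pattern.test(value)) {
+      throw new Error(`Potential prompt injection in DSL field ${fieldPath}: ${JSON.stringify(value)}`);
+    }
+  }
+  return value;
+}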
+```
+
+### 8.2 — Generated code security scan
+
+Add to `tools/validate-generation.mjs` (or a separate `tools/security-scan.mjs`):
+
+```javascript
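+// Check that no hardcoded secrets leaked into generated code.
+// `files` is assumed to be an array of { path, content } objects for the generated files.
+function validateNoSecretLeakage(files) {
+  const patterns = [
+    /sk-[a-zA-Z0-9]{20,}/,                   // OpenAI key pattern
+    /[a-zA-Z0-9+\/]{40}={0,2}/,              // Base64 secret-like (expect false positives)
+    /password\s*[:=]\s*['"][^'"]{4,}['"]/i,  // Hardcoded password (assignment or object property)
+    /apiKey\s*[:=]\s*['"][^'"]{4,}['"]/i,    // Hardcoded API key (assignment or object property)
+  ];
+  const findings = [];
+  for (const { path, content } of files) {
+    for (const pattern of patterns) {
+      if (pattern.test(content)) findings.push({ path, pattern: String(pattern) });
+    }
+  }
+  return findings; // empty array means no suspected secrets
+}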
+```
+
+### 8.3 — UseGuards completeness audit
+
+Beyond the current validator check (UseGuards present), add: verify that the
+`@UseGuards(...)` decorator arguments are non-empty and match the expected guard class
+names. A call like `@UseGuards()` (empty) passes the current regex but provides no protection.
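+
+A rough sketch of such an audit follows. The guard names in `EXPECTED_GUARDS` are placeholders, since the real class names are not listed here. The regex does not handle nested calls such as `AuthGuard('jwt')`; an AST-based check would be needed for those.
+
+```javascript
+// Hypothetical allowlist; the real guard class names are not given here.
+const EXPECTED_GUARDS = new Set(['JwtAuthGuard', 'RolesGuard']);
+
+// Returns a list of problems found in @UseGuards(...) decorators in `source`.
+function auditUseGuards(source, expected = EXPECTED_GUARDS) {
+  const problems = [];
+  for (const match of source.matchAll(/@UseGuards\(([^)]*)\)/g)) {
+    const args = match[1].split(',').map((a) => a.trim()).filter(Boolean);
+    if (args.length === 0) problems.push('empty @UseGuards()');
+    for (const arg of args) {
+      if (!expected.has(arg)) problems.push(`unexpected guard: ${arg}`);
+    }
+  }
+  return problems;
+}
+```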
+
+### 8.4 — Red-team fixture
+
+Create `tools/eval/fixtures/_adversarial/` with a fixture that includes a DSL snippet
+containing a benign injection attempt (e.g., a `description` field with "ignore format
+rules") and verifies the generation still produces spec-compliant output.
+
+### 8.5 — Generated import allowlist
+
+Maintain a list of approved npm packages that generated code may import. Flag any
+import not on the allowlist as a manual review item.
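+
+The check could look roughly like the sketch below. The package names in `ALLOWED_PACKAGES` are placeholders, since the approved set is not specified here. Only static `import ... from '...'` statements are matched, so side-effect imports and `require` calls would need separate handling.
+
+```javascript
+// Hypothetical allowlist; the approved package set is not specified here.
+const ALLOWED_PACKAGES = new Set(['@nestjs/common', '@nestjs/core', 'rxjs']);
+
+// Maps an import specifier to its package name ('@scope/pkg/sub' -> '@scope/pkg').
+function packageName(specifier) {
+  const parts = specifier.split('/');
+  return specifier.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0];
+}
+
+// Returns package names imported by `source` that are not on the allowlist.
+// Relative imports are skipped.
+function findUnapprovedImports(source, allowed = ALLOWED_PACKAGES) {
+  const flagged = new Set();
+  for (const match of source.matchAll(/from\s+['"]([^'"]+)['"]/g)) {
+    const specifier = match[1];
+    if (specifier.startsWith('.')) continue;
+    const name = packageName(specifier);
+    if (!allowed.has(name)) flagged.add(name);
+  }
+  return [...flagged];
+}
+```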
+
+**Effort estimate:** Medium-High. 3–5 days. Security scan and sanitization are low
+effort; red-team fixtures and import allowlisting are higher effort.
+
+**Trigger:** Implement before any external user can influence `domain/*.api.dsl` content
+(i.e., before a UI or API to edit the DSL is exposed).
+
+---
+
+## Tracking
+
+| Rule | Status | Priority | Trigger |
+|------|--------|----------|---------|
+| Rule 7 — Telemetry | Deferred | Medium | Before >10 entities or production deployment |
+| Rule 8 — Risk controls | Deferred | High | Before DSL editing is exposed to external users |
+
+Last updated: 2026-04-03