# Final Design — Sigil: Hard-Stop Fix, Multi-Position Guard & Cross-Pollination

**Run:** 2026-05-07T18-58-46 · **Team Lead:** Watson (Opus 4.6) · **Architects:** 10 of 10
**Models:** Opus×2, Kimi×2 (minimalist+K2.6), Gemini, DeepSeek-Flash, Mistral-Large, Mistral-Small, Grok-4.20, Qwen3-32B
**Phases:** P0 workshop → P1 pre-seed (security/robustness/ops) → P2-4 parallel DCS → P5 reconciliation (10 designs, 9 decisions) → P6 final challenge (7/7 challengers: verification + 6 lenses; 15 critical/major findings, 11 accepted, 4 acknowledged)
**Status: FINAL**
**Pipeline:** Implementation handoff to K2.6 tri-layer pipeline

---

## Executive Summary

Layers A/B/E of the May 2 redesign successfully eliminated `momentum_decel`. The bot now exits mostly via `atr_trail` (78–89% of closes), and `atr_trail` is profitable (+$0.11–$0.45 avg). But **all four variants remain net negative** because `hard_stop` fires 7–22% of the time at 8–10× the win magnitude (-$3.3 to -$3.7 avg), wiping out all `atr_trail` gains.
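
The arithmetic behind that conclusion can be sketched as a two-outcome expectancy. This is a simplification: it assumes every close is either an `atr_trail` win or a `hard_stop` loss and treats the reported averages as representative. The function names are illustrative and do not exist in the codebase.

```python
def expectancy_usd(hard_stop_share: float, avg_win_usd: float, avg_loss_usd: float) -> float:
    """Expected P&L per closed trade when every exit is either atr_trail or hard_stop."""
    return (1.0 - hard_stop_share) * avg_win_usd - hard_stop_share * avg_loss_usd


def breakeven_hard_stop_share(avg_win_usd: float, avg_loss_usd: float) -> float:
    """hard_stop share at which the expectancy above is exactly zero."""
    return avg_win_usd / (avg_win_usd + avg_loss_usd)


# Most favourable reported averages: +$0.45 win, -$3.30 loss.
best = breakeven_hard_stop_share(0.45, 3.30)
# Least favourable reported averages: +$0.11 win, -$3.70 loss.
worst = breakeven_hard_stop_share(0.11, 3.70)
# Midpoints of the reported ranges, used only as an illustration.
midpoint = expectancy_usd(0.15, 0.28, 3.50)

print(f"break-even hard_stop share: {worst:.1%} to {best:.1%}")
print(f"expectancy at range midpoints: {midpoint:+.3f} USD per trade")
```

Under these assumptions the break-even `hard_stop` share lies between roughly 2.9% and 12%, so most of the observed 7–22% range sits on the losing side.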

**Five root causes confirmed (verified against live codebase):**

1. **Fixed 3% stop inside volatility noise band** — TSTUSDT/DOGSUSDT/NFPUSDT/PSGUSDT have ATR(14) ≈ 2–4% of price. A 3% stop fires on random walk.
2. **Cooldown written at CLOSE not OPEN** — `symbol_protections` only fires in exit path. Multiple positions accumulate before any cooldown exists.
3. **No pre-signal guard** — `has_open_position()` check is in `risk_manager.py` (execution layer), not at signal generation. TOCTOU race allows concurrent entries.
4. **APScheduler in-memory job store** — `src/main.py:61` uses `AsyncIOScheduler()` with no `jobstores=` argument → `MemoryJobStore` → scheduler dies silently on any exception, leaving positions open with no new signals.
5. **PostureCoordinator does not exist** — Layer D from the May 2 plan is obsolete (class not in codebase, verified).

**Stuart's decisions:**
- ATR multiplier: **1.5× baseline + 2.5× via variant grid** (Layer F A/B test)
- Layer C (5m MTF confluence): **Skip** — fix stop geometry first
- Pre-signal volatility gate: **Skip** — ATR-relative stop handles the risk

---

## Verified Corrections (from final Opus verification challenge)

These are implementer-critical corrections that must be applied before any code is written:

| # | Reconciliation Claim | Correction |
|---|---|---|
| V1 | `src/signals/signal_engine.py:87` | WRONG PATH. File is `src/micro_scanner/signal_engine.py`. `generate_signals()` doesn't exist on `MicroSignalEngine`. Pre-signal guard goes in `src/micro_scanner/main.py` or `MicroTradingManager._run_breakout_pass()` |
| V2 | Unique index `ON micro_positions (symbol, variant_id)` | `micro_positions` has NO `variant_id` column (migration 002). Must add the column in migration 014 before creating the index |
| V3 | `symbol_protections (symbol, expires_at, ban_until, violation_count, created_at)` | WRONG COLUMN NAMES. Use existing schema from `migration 008`. Check `src/micro_scanner/symbol_protections.py:150` for correct field names |
| V4 | Layer I3 uses `BackgroundScheduler` | WRONG CLASS. `src/main.py:61` uses `AsyncIOScheduler`. Fix must use `AsyncIOScheduler` with SQLAlchemy job store |
| V5 | Bash probe: `sigil_$(echo sigil || echo $VARIANT)` produces `sigil_sigil` | Typo. Correct: `[ "$VARIANT" = "baseline" ] && echo "sigil" || echo "sigil_$VARIANT"` |
| V6 | Config key `position_pct: 0.10` | Actual keys are `base_position_pct` / `half_position_pct` — verify in `config/micro.yaml` |
| V7 | G1 new key `hard_stop_multiplier` conflicts with existing `reentry_cooldown_hours` | Not a conflict, but ensure no name overlap with existing config namespace |
| V8 | `paper_stop_loss_pct` wiring at `paper_portfolio.py:361` | Confirmed correct; this is a zero-cost fix |

---

## Layer G — ATR-Relative Hard Stop (Priority 1)

### G1: Dynamic stop in exit path

**File:** `src/micro_scanner/exit_precedence.py` (or equivalent exit evaluator used by PaperPortfolio)

**New config keys** (in `config/micro.yaml` + variant overlays):
```yaml
hard_stop_floor_pct: 3.0      # Minimum stop distance (%)
hard_stop_multiplier: 1.5     # ATR × this = effective stop distance; 2.5 for atr_loose variants
hard_stop_cap_pct: 6.0        # Maximum stop distance (prevents meme-coin blowout)
```

**Logic** (add to ExitEvaluator or equivalent):
```python
def _compute_dynamic_stop_pct(self, entry_price: float, atr14: float | None) -> float:
    """Stop distance as a fraction of entry price (0.03 = 3%): max(floor, k×ATR/entry), capped at ceiling."""
    if atr14 is None or atr14 <= 0 or entry_price <= 0:
        return self.hard_stop_floor_pct / 100
    dynamic = self.hard_stop_multiplier * atr14 / entry_price
    return min(max(dynamic, self.hard_stop_floor_pct / 100), self.hard_stop_cap_pct / 100)
```

**Rollback:** Set `hard_stop_multiplier: 0` in YAML (sentinel → floor-only path). No migration.
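
To make the floor, cap, and rollback sentinel concrete, here is a standalone sketch of the same clamp with a few worked inputs. It mirrors the method above as a free function; the keyword defaults are the G1 config values, and the function name is illustrative rather than taken from the codebase.

```python
def compute_dynamic_stop(entry_price, atr14, *, floor_pct=3.0, multiplier=1.5, cap_pct=6.0):
    """Return the stop distance as a fraction of entry price (0.03 means 3%)."""
    floor, cap = floor_pct / 100, cap_pct / 100
    if atr14 is None or atr14 <= 0 or entry_price <= 0:
        return floor  # No usable ATR: fall back to the fixed floor
    return min(max(multiplier * atr14 / entry_price, floor), cap)


# ATR = 1% of price: 1.5 x 1% = 1.5%, below the floor, so the floor applies.
quiet = compute_dynamic_stop(1.00, 0.01)
# ATR = 3% of price: 1.5 x 3% = 4.5%, between floor and cap.
typical = compute_dynamic_stop(1.00, 0.03)
# ATR = 4% of price with the 2.5 multiplier: 10%, clipped to the cap.
loose = compute_dynamic_stop(1.00, 0.04, multiplier=2.5)
# Rollback sentinel: a multiplier of 0 always collapses to the floor.
rollback = compute_dynamic_stop(1.00, 0.04, multiplier=0)

print(quiet, typical, loose, rollback)
```

One consequence worth keeping in mind for the Layer F comparison: with a 2.5 multiplier and a 6% cap, the cap binds whenever ATR exceeds 2.4% of price, so on the noisiest symbols the wider arm may behave like a fixed 6% stop.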

### G2: Wire `paper_stop_loss_pct` into ExitEvaluator (zero-cost)

**File:** `src/micro_scanner/paper_portfolio.py:361` (or wherever ExitEvaluator is instantiated)

Change: pass `hard_stop_pct=config.paper_stop_loss_pct` to ExitEvaluator constructor instead of relying on the default 3.0.

**Rollback:** Revert the one line.

### G3: Constant USD sizing (freqtrade Edge + jesse risk-per-trade)

**File:** `src/micro_scanner/capital.py` (verify actual file — may be part of another module)

**New config keys:**
```yaml
risk_per_trade_usd: 5.0       # Constant $ risk per trade
max_position_usd: 50.0        # Cap per position
```

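**Logic:** `position_size = risk_per_trade_usd / (dynamic_stop_pct × entry_price)`, in units of the asset, where `dynamic_stop_pct` is the fractional stop distance from G1 (0.03 for 3%). The resulting notional, `position_size × entry_price = risk_per_trade_usd / dynamic_stop_pct`, is capped at `max_position_usd`.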

**Sentinel:** If `risk_per_trade_usd == 0` → fall back to existing `base_position_pct` (verify key name).

**Rollback:** Set `risk_per_trade_usd: 0`.
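
A minimal sketch of the sizing rule, assuming the stop distance is passed as a fraction and the legacy percentage sizing is supplied by the caller as a USD notional. The function and parameter names are hypothetical. Whether `max_position_usd` should also cap the legacy path is not stated in this design, so the sketch leaves that path uncapped.

```python
def position_quantity(entry_price, stop_fraction, *, risk_per_trade_usd=5.0,
                      max_position_usd=50.0, fallback_notional_usd=0.0):
    """Units to buy so that a stop-out loses about risk_per_trade_usd."""
    if entry_price <= 0:
        return 0.0
    if risk_per_trade_usd <= 0 or stop_fraction <= 0:
        # Sentinel path: defer to the legacy percentage-based notional.
        return fallback_notional_usd / entry_price
    # Notional whose loss at the stop equals the risk budget, then capped.
    notional = min(risk_per_trade_usd / stop_fraction, max_position_usd)
    return notional / entry_price


# $5 risk at a 4.5% stop wants about $111 notional; the $50 cap binds.
capped = position_quantity(0.50, 0.045)
# $1 risk at a 5% stop wants $20 notional; the cap does not bind.
uncapped = position_quantity(0.50, 0.05, risk_per_trade_usd=1.0)
# Sentinel: risk of 0 falls back to a caller-supplied legacy notional.
legacy = position_quantity(0.50, 0.045, risk_per_trade_usd=0, fallback_notional_usd=30.0)

print(capped, uncapped, legacy)
```

With the example values above (`risk_per_trade_usd: 5.0`, `max_position_usd: 50.0`) and a stop between 3% and 6%, the uncapped notional is about $83 to $167, so the cap always binds and the realised loss at the stop is $1.50 to $3.00 rather than $5. It may be worth confirming that this is the intended interaction before deploying.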

### G4: Store `atr_at_entry` for diagnostics

**Migration 014 includes:**
```sql
ALTER TABLE micro_positions ADD COLUMN IF NOT EXISTS atr_at_entry FLOAT;
ALTER TABLE paper_positions ADD COLUMN IF NOT EXISTS atr_at_entry FLOAT;
```

**Write:** Compute ATR(14) at signal time, store in positions table at entry. Enables post-mortem `SELECT exit_reason, AVG(atr_at_entry) FROM micro_positions GROUP BY exit_reason`.

**Rollback:** Both columns are nullable; existing code ignores them.
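
This design does not say which ATR formulation the scanner uses, so the following is only a sketch of one common choice, Wilder's smoothing, written in plain Python. The function name is hypothetical. Returning `None` on insufficient data lines up with the `atr14 is None` fallback in G1 and with the nullable `atr_at_entry` column.

```python
from typing import Optional, Sequence


def wilder_atr(highs: Sequence[float], lows: Sequence[float],
               closes: Sequence[float], period: int = 14) -> Optional[float]:
    """Wilder-smoothed ATR over aligned candle series; None if there is too little data."""
    n = len(closes)
    if not (len(highs) == len(lows) == n) or n < period + 1:
        return None
    # True range needs the previous close, so it starts at the second candle.
    true_ranges = [
        max(highs[i] - lows[i],
            abs(highs[i] - closes[i - 1]),
            abs(lows[i] - closes[i - 1]))
        for i in range(1, n)
    ]
    # Seed with a simple average, then apply Wilder's recursive smoothing.
    atr = sum(true_ranges[:period]) / period
    for tr in true_ranges[period:]:
        atr = (atr * (period - 1) + tr) / period
    return atr


# Twenty identical candles with a 2.0 high-low range.
flat_atr = wilder_atr([11.0] * 20, [9.0] * 20, [10.0] * 20)
# Too few candles for a 14-period ATR.
short_atr = wilder_atr([11.0] * 5, [9.0] * 5, [10.0] * 5)
print(flat_atr, short_atr)
```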

---

## Layer H — Position Entry Guard (Priority 2)

### H1: Pre-signal guard placement

**CORRECTED PATH:** The guard must go in `MicroTradingManager._run_breakout_pass()` (in `src/micro_scanner/main.py`) — the main scan loop — NOT in a `generate_signals()` method (which doesn't exist).

Add at the top of the per-symbol processing loop, before any rule evaluation:
```python
# Check BEFORE calling breakout/rule evaluation
if await self._has_open_position(symbol, variant_id=self._variant_id, conn=conn):
    continue  # Skip symbol this cycle
```

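Where `_has_open_position` queries `micro_positions WHERE symbol = $1 AND variant_id = $2 AND status = 'open'`. The `variant_id` filter depends on the column added in migration 014 (Layer H4); placeholders follow the `$n` style used in H3.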

**Rollback:** Comment out the guard.

### H2: Cooldown written at OPEN (not CLOSE)

After a successful entry (INSERT into `micro_positions`), immediately write `symbol_protections`:
```python
# Use existing SymbolProtectionStore API (src/micro_scanner/symbol_protections.py)
# Check actual method signature and field names before implementing
await protection_store.mark_cooldown(symbol, duration_hours=config.cooldown_hours)
```

**Key:** Use the existing `SymbolProtectionStore` class and its methods — do NOT write raw SQL. Check `src/micro_scanner/symbol_protections.py` for correct API.

**Rollback:** Comment out the mark_cooldown call. Old behaviour = cooldown written on close only.

### H3: PostgreSQL advisory lock (TOCTOU fix)

In the entry path, around the INSERT into `micro_positions`:
```python
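import zlib  # module-level import

# Built-in hash() is salted per process for strings, so derive the key from a stable checksum
symbol_hash = zlib.crc32(f"{variant_id}:{symbol}".encode("utf-8")) % (2**31)
acquired = await conn.fetchval("SELECT pg_try_advisory_xact_lock($1)", symbol_hash)
if not acquired:
    return  # Another concurrent entry in progress for this symbol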
```

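**Note:** `pg_try_advisory_xact_lock` returns a bool and is transaction-scoped, so no explicit unlock is needed. The lock call and the INSERT must therefore run inside the same explicit transaction; in autocommit mode the lock is released as soon as the `SELECT` statement completes, before the INSERT runs.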
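
The following sketch shows one way to keep the lock and the INSERT in a single transaction, assuming an asyncpg-style connection where `conn.transaction()` is usable with `async with`. `advisory_key`, `guarded_entry`, and the fake connection are hypothetical names for illustration; the key is derived with `zlib.crc32` so that it is identical in every process.

```python
import asyncio
import zlib
from contextlib import asynccontextmanager


def advisory_key(variant_id: str, symbol: str) -> int:
    """Process-independent 31-bit lock key for a (variant, symbol) pair."""
    return zlib.crc32(f"{variant_id}:{symbol}".encode("utf-8")) % (2**31)


async def guarded_entry(conn, variant_id, symbol, insert_position) -> bool:
    """Run insert_position(conn) only if this transaction wins the advisory lock."""
    async with conn.transaction():
        acquired = await conn.fetchval(
            "SELECT pg_try_advisory_xact_lock($1)", advisory_key(variant_id, symbol)
        )
        if not acquired:
            return False  # Another entry for this symbol is in flight
        await insert_position(conn)
        return True  # Lock is released when the transaction commits


class _FakeConn:
    """Stand-in connection; lock_free decides whether the lock is granted."""

    def __init__(self, lock_free):
        self.lock_free = lock_free
        self.inserted = 0

    @asynccontextmanager
    async def transaction(self):
        yield

    async def fetchval(self, query, key):
        return self.lock_free


async def _insert(conn):
    conn.inserted += 1


winner, loser = _FakeConn(True), _FakeConn(False)
results = [asyncio.run(guarded_entry(c, "baseline", "TSTUSDT", _insert))
           for c in (winner, loser)]
print(results, winner.inserted, loser.inserted)
```

Because the key space is 31 bits, two different symbols can occasionally map to the same key. The only effect is a skipped entry for that cycle, which the unique index in H4 does not depend on.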

### H4: `micro_positions` unique index + `variant_id` column (DB-level backstop)

**CORRECTED:** `micro_positions` has no `variant_id` column today. Migration 014 must add it first.

```sql
-- migration 014_entry_guard_up.sql
ALTER TABLE micro_positions ADD COLUMN IF NOT EXISTS variant_id TEXT;

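-- Backfill existing rows with the variant from the config
-- (implementer: determine correct backfill value per LXC).
-- Rows left NULL are not constrained: a unique index treats NULLs as distinct.
-- Close or de-duplicate any existing duplicate open rows first, or the CREATE below fails.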

CREATE UNIQUE INDEX IF NOT EXISTS idx_micro_positions_open_symbol_variant
    ON micro_positions (symbol, variant_id) WHERE status = 'open';
```

**Rollback (014_entry_guard_down.sql):**
```sql
DROP INDEX IF EXISTS idx_micro_positions_open_symbol_variant;
ALTER TABLE micro_positions DROP COLUMN IF EXISTS variant_id;
```

---

## Layer I — Scheduler Liveness (Priority 3)

### I1: `scanner_heartbeat` table

```sql
-- Part of migration 014 or separate migration 015
CREATE TABLE IF NOT EXISTS scanner_heartbeat (
    variant_id TEXT PRIMARY KEY,
    last_scan_start TIMESTAMPTZ,
    last_scan_end TIMESTAMPTZ,
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

Write to `scanner_heartbeat` at start and end of each micro-scanner cycle in `MicroTradingManager`.
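
One way to structure the two writes is an async context manager around the cycle body, sketched below under the assumption of an asyncpg-style `conn.execute` with `$n` placeholders. The names `scan_heartbeat`, `START_SQL`, and `END_SQL` are illustrative. The point of the shape is that `last_scan_end` is only updated when the cycle finishes without raising, so a crashed scan goes stale for the probe in I2.

```python
import asyncio
from contextlib import asynccontextmanager

START_SQL = """
INSERT INTO scanner_heartbeat (variant_id, last_scan_start, updated_at)
VALUES ($1, NOW(), NOW())
ON CONFLICT (variant_id) DO UPDATE
SET last_scan_start = EXCLUDED.last_scan_start,
    updated_at = EXCLUDED.updated_at
"""

END_SQL = """
UPDATE scanner_heartbeat
SET last_scan_end = NOW(), updated_at = NOW()
WHERE variant_id = $1
"""


@asynccontextmanager
async def scan_heartbeat(conn, variant_id: str):
    """Record scan start on entry and scan end only on a clean exit."""
    await conn.execute(START_SQL, variant_id)
    yield
    # Reached only when the cycle body finished without raising.
    await conn.execute(END_SQL, variant_id)


class _RecordingConn:
    """Stand-in connection that records the first keyword of each statement."""

    def __init__(self):
        self.calls = []

    async def execute(self, query, *args):
        self.calls.append(query.split()[0])


async def _demo():
    ok, failed = _RecordingConn(), _RecordingConn()
    async with scan_heartbeat(ok, "baseline"):
        pass  # A normal scan cycle would run here
    try:
        async with scan_heartbeat(failed, "baseline"):
            raise RuntimeError("scan crashed")
    except RuntimeError:
        pass
    return ok.calls, failed.calls


ok_calls, failed_calls = asyncio.run(_demo())
print(ok_calls, failed_calls)
```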

### I2: Dead-man probe script

**CORRECTED DB naming:**
```bash
#!/bin/bash
# /opt/scripts/sigil-heartbeat-check.sh
for VARIANT in baseline moderate aggressive unhinged; do
  [ "$VARIANT" = "baseline" ] && DB="sigil" || DB="sigil_${VARIANT}"
  # Count fresh heartbeats: a missing row or a failed query yields 0 and must alert too
  FRESH=$(psql -U sigil_app -d "$DB" -t -c "
    SELECT COUNT(*) FROM scanner_heartbeat
    WHERE variant_id = '${VARIANT}'
    AND last_scan_end >= NOW() - INTERVAL '25 minutes'
  " 2>/dev/null | tr -d ' ')
  if [ "${FRESH:-0}" -eq 0 ]; then
    # Send Discord alert via existing webhook mechanism
    echo "SIGIL ALERT: ${VARIANT} scanner stale >25min"
  fi
done
```

Deploy as systemd timer or cron (5-min interval). Rollback: disable timer.

### I3: APScheduler persistent job store

**CORRECTED CLASS:** Use `AsyncIOScheduler` not `BackgroundScheduler`:

```python
# src/main.py — replace scheduler init (approximate location: line 61)
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.schedulers.asyncio import AsyncIOScheduler

# OLD: self.scheduler = AsyncIOScheduler()
# NEW:
self.scheduler = AsyncIOScheduler(jobstores={
    'default': SQLAlchemyJobStore(url=config.db_url)
})
```

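APScheduler auto-creates the `apscheduler_jobs` table on first run. With a persistent store, every job registered at startup needs an explicit `id` and `replace_existing=True`, otherwise each restart adds another copy of the job. Job callables must also be importable by reference and their arguments picklable, and `config.db_url` should be a synchronous SQLAlchemy URL (verify the driver in use). Rollback: revert to `AsyncIOScheduler()` with no args (memory store), drop `apscheduler_jobs` table.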

---

## Layer F — Variant Grid A/B Test (ATR multiplier)

Config-only change to `grid-overlays/*.yaml`:

| Variant | LXC | `hard_stop_multiplier` | Purpose |
|---|---|---|---|
| baseline | 236 | 1.5 | Control (conservative stop) |
| moderate | 243 | 1.5 | Control (different posture) |
| aggressive | 244 | 2.5 | A/B: wider stop |
| unhinged | 245 | 2.5 | A/B: wider stop (after liveness fix) |

**Fix unhinged first** (Layer I) before deploying F — the scheduler must be running for the data to be valid.

Rollback: revert YAML. No migration.

---

## Cross-Pollination Implemented

| Source | Pattern | Where Applied |
|---|---|---|
| **freqtrade** `ProtectionManager` | Max 1 open position per pair + cooldown | Layer H1 (pre-signal guard) + H2 (cooldown-on-open) |
| **jesse** `risk_per_trade` | Constant dollar risk, not equity% | Layer G3 (constant USD sizing) |
| **jesse** `before_entry()` hook style | Pre-trade validation before rule evaluation | Layer H1 (guard placement in scan loop) |
| **hummingbot** kill switch | Manual reset circuit breaker | Layer E (already implemented, confirmed working) |
| **freqtrade** `stoploss_from_absolute` | ATR-relative stop floor | Layer G1 |

---

## Implementation Order

Execute in this order to avoid broken intermediate states:

1. **G2** — Wire `paper_stop_loss_pct` (1 line, zero risk)
2. **Migration 014** — Add `atr_at_entry` columns, `variant_id` to `micro_positions`, unique index, `scanner_heartbeat` table
3. **I1** — Write heartbeat in scanner loop
4. **I3** — AsyncIOScheduler persistent job store
5. **H2** — Cooldown at OPEN (via existing SymbolProtectionStore)
6. **H1** — Pre-signal guard in `_run_breakout_pass()`
7. **H3** — Advisory lock around entry INSERT
8. **H4** — Verify unique index applied correctly (from migration)
9. **G1** — ATR-relative stop in exit evaluator
10. **G3** — Constant USD sizing
11. **G4** — Store `atr_at_entry` at entry time
12. **I2** — Deploy heartbeat probe script + timer
13. **F** — Deploy variant grid YAML (ATR multiplier)

---

## Final Challenge Disposition

| Challenge | Source | Verdict |
|---|---|---|
| `signal_engine.py` path wrong | Verification V1 | **ACCEPT** — corrected in §Layer H1 |
| `micro_positions.variant_id` missing | Verification V2 | **ACCEPT** — corrected in §Layer H4 migration |
| `symbol_protections` column names wrong | Verification V3 | **ACCEPT** — use existing SymbolProtectionStore API |
| `BackgroundScheduler` wrong class | Verification V4 | **ACCEPT** — corrected to AsyncIOScheduler in §Layer I3 |
| Bash DB naming typo | Verification V5 | **ACCEPT** — corrected in §Layer I2 |
| `position_pct` wrong key name | Verification V6 | **ACCEPT** — verify `base_position_pct` in config |
| `pg_try_advisory_xact_lock` syntax | Verification | **CONFIRMED CORRECT** |
| Multi-LXC migration deployment order | Ops | **ACKNOWLEDGE** — deploy baseline first, verify, then cascade |
| ATR at-signal vs. at-exit time | Simplicity | **ACKNOWLEDGE** — at-entry storage (G4) enables diagnostics; worth the minor overhead |
| Advisory lock vs. unique index alone | Simplicity | **ACKNOWLEDGE** — both needed: index handles concurrent writes, lock handles concurrent signals |
| `hard_stop_multiplier` config collision | Security | **ACKNOWLEDGE** — verify no existing config key with this name in `micro.yaml` before deploying |
| APScheduler job store disk space | Performance | **ACKNOWLEDGE** — `apscheduler_jobs` table is tiny; add to retention policy |

---

## Graduation Gate (unchanged)

30-day paper trading with:
- Net positive P&L across all 4 variants
- Win rate > 50% per variant
- `hard_stop` share < 10% of all exits
- Zero `unverified_open_positions` (positions without a corresponding `scanner_heartbeat` entry)

**Do NOT advance to live capital until graduation gate passes.**
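
The gate can be checked mechanically once the 30-day statistics are available. The sketch below is one possible reading of the criteria: it evaluates every criterion per variant, including the `hard_stop` share, which is an assumption since the list above could also be read as an aggregate. The dataclass fields and function name are hypothetical, and the sample numbers are made up purely to exercise the checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantStats:
    """Hypothetical 30-day summary for one variant; field names are illustrative."""
    name: str
    net_pnl_usd: float
    wins: int
    closed_trades: int
    hard_stop_exits: int
    unverified_open_positions: int


def gate_failures(stats: List[VariantStats]) -> List[str]:
    """Return the reasons the gate fails; an empty list means every check passed."""
    failures = []
    for v in stats:
        if v.closed_trades == 0:
            failures.append(f"{v.name}: no closed trades")
            continue
        if v.net_pnl_usd <= 0:
            failures.append(f"{v.name}: net P&L not positive")
        if v.wins / v.closed_trades <= 0.50:
            failures.append(f"{v.name}: win rate not above 50%")
        if v.hard_stop_exits / v.closed_trades >= 0.10:
            failures.append(f"{v.name}: hard_stop share not below 10%")
        if v.unverified_open_positions != 0:
            failures.append(f"{v.name}: unverified open positions present")
    return failures


# Made-up numbers, not results from any run.
sample = [
    VariantStats("baseline", 12.4, 61, 100, 6, 0),
    VariantStats("aggressive", -3.1, 55, 100, 14, 0),
]
report = gate_failures(sample)
print(report)
```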

---

## What Did NOT Change

- Layer C (5m multi-timeframe confluence): SKIPPED — revisit if P&L negative after G/H/I
- Layer D (PostureCoordinator rework): OBSOLETE — class doesn't exist
- Regime/volatility gate: SKIPPED per Stuart decision
- HSM/Vault/AWS Secrets Manager: Out of scope
- Blue-green deployments: Out of scope
- mTLS between components: Out of scope (intra-process architecture)
