# Reconciliation — Sigil Post-Redesign: Hard-Stop Fix & Cross-Pollination

**Run:** 2026-05-07T18-58-46 · **Architects:** 10 of 10 completed
**Models:** Opus×2, Kimi×2 (one/minimalist, one/K2.6), Gemini, DeepSeek-Flash, Mistral-Large, Mistral-Small, Grok-4.20, Qwen3-32B
**Decisions resolved:** 3 (Stuart input) + 6 (consensus) = 9 total
**Status: COMPLETE — ready for final challenge**

---

## Executive Summary

The May 2 six-layer redesign successfully eliminated `momentum_decel` (was 50% of exits, 100% of losses). Layers A/B/E are working: `atr_trail` is now dominant (78–89%) and profitable (+$0.11–$0.45 avg). But a new structural failure has emerged: **hard_stop fires at 8–10× the win magnitude of atr_trail**, wiping all gains.

The ensemble unanimously diagnosed the root cause and converged on a single unified fix path. Verification uncovered three latent bugs that explain the observed failures. Two cross-pollination patterns (freqtrade ProtectionManager, jesse constant-dollar-risk) map directly onto the failure modes.

**Net new layers from this run: G, H, I (replacing Layers C/D which are either invalid or skipped per Stuart decision)**

---

## Root Cause Analysis (Cross-Model Consensus)

### RCA-1: Fixed Stop Inside Volatility Noise Band
The 3% hard stop sits **inside** the normal hourly price range of the problem coins:
- TSTUSDT ATR(14) ≈ 2.5–4% of price
- DOGSUSDT, NFPUSDT, PSGUSDT: similar
- A fixed -3% stop on a coin with 3% hourly ATR fires on random walk alone

**Evidence path:** `src/micro_scanner/exit_precedence.py:69` evaluates `hard_stop_pct = 3.0` (config default). For a coin with ATR(14) = 3%, the stop sits only one ATR from entry, so ordinary hourly movement is enough to trigger it without any adverse trend.

**Win:Loss magnitude ratio** (from live data): 8–30× in favour of losses. This is not tuning — it is stop geometry.
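
To make the geometry concrete, the stop distance can be expressed in ATR units. The sketch below is illustrative only: `stop_in_atr_units` is a hypothetical helper, not a function from the codebase, and the ATR values are the approximate range quoted above for TSTUSDT.

```python
def stop_in_atr_units(stop_pct: float, atr_pct: float) -> float:
    """Return how many ATRs of room a fixed percentage stop leaves."""
    if atr_pct <= 0:
        raise ValueError("atr_pct must be positive")
    return stop_pct / atr_pct


# Fixed 3% stop against the ATR range quoted for TSTUSDT (2.5% to 4% of price).
for atr_pct in (2.5, 3.0, 4.0):
    room = stop_in_atr_units(3.0, atr_pct)
    print(f"ATR {atr_pct:.1f}% -> stop sits {room:.2f} ATR from entry")
```

Across that range the stop sits between 0.75 and 1.2 ATR from entry, which is the situation RCA-1 describes: the stop is no wider than the typical hourly range.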

### RCA-2: Cooldown Written at Close, Not Open
`symbol_protections` is written in the exit path (after close), not the entry path. This means:
- Symbol A enters position
- Symbol A gets entered again before the first closes
- Position 1 closes → cooldown written (too late)
- Multiple concurrent positions accumulate

**Verified:** Slot 3 (Kimi K2.6) verification challenge V7: `paper_positions` has a DB-level unique partial index that blocks re-entry, but `micro_positions` (the micro-scanner path) has NO such constraint. The accumulation is happening on the micro scanner path.
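
The accumulation can be reproduced with a toy model. The sketch below is not the scanner's code: `CooldownSim` and `simulate` are hypothetical names, and the model assumes entries are gated only by the cooldown table, with no position check or unique index.

```python
from dataclasses import dataclass, field


@dataclass
class CooldownSim:
    """Toy model of entries gated only by a cooldown set (no position check)."""
    write_cooldown_on_open: bool
    cooldown: set[str] = field(default_factory=set)
    open_positions: list[str] = field(default_factory=list)

    def try_enter(self, symbol: str) -> bool:
        if symbol in self.cooldown:
            return False
        self.open_positions.append(symbol)
        if self.write_cooldown_on_open:
            self.cooldown.add(symbol)  # proposed behaviour: protect at OPEN
        return True

    def close(self, symbol: str) -> None:
        self.open_positions.remove(symbol)
        self.cooldown.add(symbol)  # current behaviour: protect only at CLOSE


def simulate(write_cooldown_on_open: bool, signals: int = 3) -> int:
    """Fire consecutive entry signals on one symbol; return the open count."""
    sim = CooldownSim(write_cooldown_on_open)
    for _ in range(signals):
        sim.try_enter("TSTUSDT")
    return len(sim.open_positions)


print(simulate(write_cooldown_on_open=False))  # cooldown written at close only
print(simulate(write_cooldown_on_open=True))   # cooldown written at open
```

With the cooldown written only at close, every signal that arrives before the first exit opens another position; writing it at open stops the second entry.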

### RCA-3: Pre-Signal Guard Missing (TOCTOU Race)
`has_open_position()` is checked at execution (risk_manager.py:312), not at signal generation (signal_engine.py:87). In the gap between signal generation and execution, a concurrent signal on the same symbol can pass the guard. This explains why the DB-level constraint alone is insufficient.

**Fix:** Move the position check to signal generation (`_generate_signal()` entry point) and add a PostgreSQL advisory lock around the entry path.
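
The underlying pattern is check-then-act: the check and the insert must happen under one lock, or a second caller can pass the check before the first insert is visible. The sketch below uses `threading.Lock` as an in-process stand-in for the PostgreSQL advisory lock; `EntryBook` and `run_concurrent_entries` are hypothetical names used only for illustration.

```python
import threading


class EntryBook:
    """In-process stand-in for the entry path: check, then insert."""

    def __init__(self) -> None:
        self.open_symbols: list[str] = []
        self._lock = threading.Lock()

    def enter_guarded(self, symbol: str) -> bool:
        # The check and the insert share one critical section, so no second
        # caller can pass the check before the first caller's insert lands.
        with self._lock:
            if symbol in self.open_symbols:
                return False
            self.open_symbols.append(symbol)
            return True


def run_concurrent_entries(workers: int = 8) -> int:
    """Fire `workers` simultaneous entries on one symbol; return the open count."""
    book = EntryBook()
    threads = [
        threading.Thread(target=book.enter_guarded, args=("TSTUSDT",))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(book.open_symbols)


print(run_concurrent_entries())
```

Across separate processes the same guarantee has to come from the database, which is why the fix pairs the earlier check with an advisory lock.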

### RCA-4: Scheduler In-Memory (Unhinged Death)
DeepSeek-Flash (Slot 6) verification confirmed: there is NO `apscheduler_jobs` table. APScheduler uses an in-memory job store. Any Python exception, OOM, or signal kills the scheduler with no recovery path. The bot process continues (position manager runs) but signal generation stops. 144 open positions in unhinged = scheduler died silently.

### RCA-5: PostureCoordinator Does Not Exist
Opus/Resilient (Slot 4) verification V5 confirmed: the class `PostureCoordinator` is NOT present in the codebase. Layer D from the May 2 design ("rework PostureCoordinator") is moot — skip entirely.

---

## Decisions Resolved

| # | Decision | Resolution | Rationale |
|---|----------|-----------|-----------|
| D1 | ATR multiplier | **1.5× baseline, 2.5× variant grid (Layer F)** | Stuart: A/B test via grid. Let paper data decide. |
| D2 | Layer C (5m MTF confluence) | **Skip for now** | Stuart + Models 1/2: fix stop geometry first; add C only if P&L stays negative. |
| D3 | Pre-signal vol gate | **No gate** | Stuart: ATR-relative stop handles the risk; gating reduces trade volume unnecessarily. |
| D4 | PostureCoordinator | **Skip Layer D** | Opus/Resilient verified: class doesn't exist; Layer D from May 2 is obsolete. |
| D5 | Cooldown-on-open | **Accepted** | All models: write `symbol_protections` at OPEN not CLOSE. |
| D6 | Pre-signal guard | **Accepted** | All models: check at `signal_engine.py` not just `risk_manager.py`. |
| D7 | Constant USD sizing | **Accepted (freqtrade/jesse pattern)** | Replace flat 10% equity with constant dollar risk per ATR band. |
| D8 | Scheduler heartbeat | **Accepted** | `scanner_heartbeat` table + 5-min cron probe + Discord page. |
| D9 | HSM/AES encryption (Slot 8) | **Rejected** | Out of scope; existing secrets.env + chmod 600 is the correct pattern for a homelab. |

---

## Unified Design — New Layers G, H, I, F

### Layer G — ATR-Relative Stop (the load-bearing fix)

**G1: ATR-anchored hard stop in `exit_precedence.py`**

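Replace `hard_stop_pct: 3.0` with `min(hard_stop_cap_pct, max(hard_stop_floor_pct, hard_stop_multiplier × ATR(14) as % of entry price))`, i.e. an ATR-scaled stop clamped between a floor and a cap:

```python
# src/micro_scanner/exit_precedence.py, ExitEvaluator: new helper (config keys read in __init__)
# New config keys: hard_stop_floor_pct (default 3.0), hard_stop_multiplier (default 1.5),
#                  hard_stop_cap_pct (default 6.0, prevents meme-coin blowout)
from typing import Optional

def _compute_dynamic_stop(self, entry_price: float, atr14: Optional[float]) -> float:
    """Return the hard-stop distance as a fraction of entry price (0.03 == 3%)."""
    floor = self.hard_stop_floor_pct / 100
    # Missing or non-positive inputs fall back to the fixed floor.
    if atr14 is None or atr14 <= 0 or entry_price <= 0:
        return floor
    dynamic = self.hard_stop_multiplier * atr14 / entry_price  # atr14 is in price units
    return min(max(dynamic, floor), self.hard_stop_cap_pct / 100)
```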

Rollback: revert YAML `hard_stop_multiplier` to 0 (falls back to `hard_stop_floor_pct`). No migration needed.
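
As a rough illustration of how the clamp behaves, the standalone sketch below re-implements the same floor/cap logic as a free function. `dynamic_stop_fraction` and the sample ATR values are hypothetical; ATR is assumed to be in price units, as the division by `entry_price` above implies.

```python
from typing import Optional


def dynamic_stop_fraction(
    entry_price: float,
    atr14: Optional[float],
    multiplier: float = 1.5,
    floor_pct: float = 3.0,
    cap_pct: float = 6.0,
) -> float:
    """ATR-scaled stop distance as a fraction of entry price, clamped to [floor, cap]."""
    floor, cap = floor_pct / 100, cap_pct / 100
    if atr14 is None or atr14 <= 0 or entry_price <= 0:
        return floor  # no usable ATR: fall back to the fixed floor
    return min(max(multiplier * atr14 / entry_price, floor), cap)


# Entry at 1.00 with three hypothetical ATR readings (price units).
for atr in (0.01, 0.03, 0.10):
    print(f"ATR {atr:.2f} -> stop {dynamic_stop_fraction(1.00, atr) * 100:.1f}%")
```

A quiet coin (ATR 1% of price) stays on the 3% floor, a 3% ATR widens the stop to 4.5%, and a 10% ATR is held at the 6% cap. Setting `multiplier=0` always returns the floor, matching the rollback path.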

**G2: Wire `paper_stop_loss_pct` from config into `ExitEvaluator` instantiation**

Slot 3 verification V5: `MicroConfig.paper_stop_loss_pct = 5.0` exists but is never passed to `ExitEvaluator` — it defaults to `3.0` in the constructor. Fix: in `paper_portfolio.py`, pass `hard_stop_pct=config.paper_stop_loss_pct` to `ExitEvaluator()`.

**G3: Constant USD sizing (freqtrade Edge + jesse risk-per-trade)**

Replace flat `position_pct: 0.10` with constant-dollar-risk sizing:
```python
# src/micro_scanner/capital.py: modify calculate_position_size()
# New config key: risk_per_trade_usd (default: 5.0)
# dynamic_stop_pct is a fraction of entry price (0.03 == 3%), as returned by G1.
# position_usd = min(risk_per_trade_usd / dynamic_stop_pct, max_position_usd)  # max_position_usd = 50.0
# quantity     = position_usd / entry_price
```

Config keys: `risk_per_trade_usd: 5.0`, `max_position_usd: 50.0`
Rollback: revert to `position_pct` by setting `risk_per_trade_usd: 0` (sentinel → old path).
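
A minimal runnable sketch of that sizing rule follows. The function name `position_size` and its return shape are assumptions for illustration, the stop is taken as a fraction of entry price, and the `risk_per_trade_usd: 0` rollback sentinel is not modelled.

```python
def position_size(
    entry_price: float,
    stop_fraction: float,
    risk_per_trade_usd: float = 5.0,
    max_position_usd: float = 50.0,
) -> tuple[float, float]:
    """Return (notional_usd, quantity) for a constant-dollar-risk position."""
    if entry_price <= 0 or stop_fraction <= 0:
        raise ValueError("entry_price and stop_fraction must be positive")
    # Notional sized so that a move of stop_fraction loses risk_per_trade_usd,
    # then capped in USD before converting to coin quantity.
    notional = min(risk_per_trade_usd / stop_fraction, max_position_usd)
    return notional, notional / entry_price


notional, quantity = position_size(entry_price=0.5, stop_fraction=0.045)
print(f"notional ${notional:.2f}, quantity {quantity:.1f}")
print(f"loss at stop ${notional * 0.045:.2f}")
```

One thing worth checking before rollout: if the defaults are used as written, `5.0 / stop_fraction` exceeds the $50 cap for every stop below 10%, which covers the whole 3% to 6% range allowed by G1. The cap would then always bind, and the realized loss at the stop would be $1.50 to $3.00 rather than $5.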

**G4: ATR at signal time (not exit time)**
Compute and cache ATR(14) per symbol at signal generation. Pass to ExitEvaluator at position open. Store as `atr_at_entry` in `micro_positions` (migration 014 column).

Migration 014:
```sql
ALTER TABLE micro_positions ADD COLUMN atr_at_entry FLOAT;
ALTER TABLE paper_positions ADD COLUMN atr_at_entry FLOAT;
```
Rollback: `ALTER TABLE ... DROP COLUMN atr_at_entry` (nullable, backward compat).
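
For reference, one common way to compute the value stored in `atr_at_entry` is sketched below. The source does not say which ATR smoothing Sigil uses, so Wilder smoothing is an assumption here, and `wilder_atr` and the `(high, low, close)` bar layout are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Bar = Tuple[float, float, float]  # (high, low, close), oldest first


def wilder_atr(bars: Sequence[Bar], period: int = 14) -> Optional[float]:
    """Average True Range with Wilder smoothing; None when there are too few bars."""
    if len(bars) < period + 1:
        return None
    true_ranges = []
    for (_, _, prev_close), (high, low, _) in zip(bars, bars[1:]):
        # True range covers the bar's own range plus any gap from the prior close.
        true_ranges.append(
            max(high - low, abs(high - prev_close), abs(low - prev_close))
        )
    atr = sum(true_ranges[:period]) / period  # seed with a simple average
    for tr in true_ranges[period:]:
        atr = (atr * (period - 1) + tr) / period  # Wilder's recursive smoothing
    return atr


flat_bars = [(11.0, 9.0, 10.0)] * 20
print(wilder_atr(flat_bars))
```

The result is in price units, so it can be passed straight to the G1 stop calculation, which divides by entry price. Returning `None` for short histories lines up with the G1 fallback to the fixed floor.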

---

### Layer H — Position Entry Guard (multi-position fix)

**H1: Pre-signal guard in `signal_engine.py`**

Add `_enforce_entry_guard(symbol=..., variant_id=..., conn=...)` as the FIRST gate in `generate_signals()` (before any rule evaluation). It returns False if any open position exists on that symbol in that variant.

```python
# src/signals/signal_engine.py:87, called first in generate_signals()
def _enforce_entry_guard(self, *, symbol: str, variant_id: int, conn) -> bool:
    """Return True only when no open position exists for (symbol, variant_id)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM micro_positions WHERE symbol = %s AND variant_id = %s AND status = 'open'",
        (symbol, variant_id)
    ).fetchone()
    return row[0] == 0
```

The keyword-only arguments (the bare `*` in the signature) prevent the swap bug found by Slot 4 verification V3.

**H2: Cooldown written at OPEN not CLOSE**

In `executor.py` (entry path), after INSERT into `micro_positions`, immediately write to `symbol_protections`:
```python
# After successful micro_positions INSERT
conn.execute(
    """INSERT INTO symbol_protections (symbol, expires_at, ban_until, violation_count, created_at)
       VALUES (%s, NOW() + make_interval(hours => %s), NULL, 0, NOW())
       ON CONFLICT (symbol) DO UPDATE SET expires_at = EXCLUDED.expires_at""",
    (symbol, config.cooldown_hours)  # cooldown_hours default: 6
)
```

Rollback: Comment out the INSERT; symbol_protections remains purely exit-driven (old behaviour).

**H3: PostgreSQL advisory lock around entry path**

Prevent TOCTOU race (signal→execution gap):
```python
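# In executor.py entry path, before micro_positions INSERT
import zlib  # str hash() is salted per process, so use a stable CRC32 key instead
lock_key = zlib.crc32(f"{variant_id}:{symbol}".encode("utf-8")) % (2**31)
acquired = conn.execute(
    "SELECT pg_try_advisory_xact_lock(%s)", (lock_key,)
).fetchone()[0]
if not acquired:
    return  # lock held elsewhere: skip this symbol this cycle (`continue` inside a loop)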
```

No migration. Locks are transaction-scoped; automatically released on commit/rollback.
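
An advisory lock only works if every process computes the same key for the same symbol. Python's built-in `hash()` is salted per process for strings unless `PYTHONHASHSEED` is fixed, so a key derived from it would differ between workers. The sketch below shows one process-independent alternative that also uses the full signed 64-bit key space, which makes accidental collisions between different symbols (and therefore spurious skips) very unlikely. `advisory_lock_key` is a hypothetical helper name.

```python
import hashlib


def advisory_lock_key(variant_id: object, symbol: str) -> int:
    """Map (variant_id, symbol) to a signed 64-bit key that is the same in every process."""
    payload = f"{variant_id}:{symbol}".encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    # PostgreSQL advisory locks take a bigint, i.e. a signed 64-bit integer.
    return int.from_bytes(digest, "big", signed=True)


key = advisory_lock_key(1, "TSTUSDT")
print(key == advisory_lock_key(1, "TSTUSDT"))  # stable on repeated calls
```

The resulting integer can be passed as the single argument to `pg_try_advisory_xact_lock`.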

**H4: `micro_positions` partial unique index (DB-level backstop)**

```sql
-- migration 014 (add to existing):
CREATE UNIQUE INDEX IF NOT EXISTS idx_micro_positions_open_symbol_variant
  ON micro_positions (symbol, variant_id) WHERE status = 'open';
```

Rollback: `DROP INDEX IF EXISTS idx_micro_positions_open_symbol_variant`.
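
The behaviour of the partial unique index can be tried locally without a PostgreSQL instance. The sketch below uses SQLite, which also supports partial unique indexes, purely as a stand-in; the table has a hypothetical minimal schema rather than the real `micro_positions` columns.

```python
import sqlite3


def demo_partial_unique_index() -> list[str]:
    """Show that only one 'open' row per (symbol, variant_id) is accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE micro_positions ("
        "id INTEGER PRIMARY KEY, symbol TEXT, variant_id INTEGER, status TEXT)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX idx_micro_positions_open_symbol_variant "
        "ON micro_positions (symbol, variant_id) WHERE status = 'open'"
    )
    outcomes: list[str] = []

    def open_position() -> None:
        try:
            conn.execute(
                "INSERT INTO micro_positions (symbol, variant_id, status) "
                "VALUES ('TSTUSDT', 1, 'open')"
            )
            outcomes.append("opened")
        except sqlite3.IntegrityError:
            outcomes.append("rejected")

    open_position()  # first open position is accepted
    open_position()  # duplicate open position violates the index
    conn.execute("UPDATE micro_positions SET status = 'closed' WHERE status = 'open'")
    open_position()  # closed rows are outside the index, so re-entry is allowed
    conn.close()
    return outcomes


print(demo_partial_unique_index())
```

Closed positions never conflict, so the index constrains only concurrent open entries and leaves trade history untouched.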

---

### Layer I — Scheduler Liveness (unhinged fix + all variants)

**I1: `scanner_heartbeat` table**

```sql
-- migration 014 (add):
CREATE TABLE IF NOT EXISTS scanner_heartbeat (
    variant_id TEXT PRIMARY KEY,
    last_scan_start TIMESTAMPTZ,
    last_scan_end TIMESTAMPTZ,
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
```

Write to `scanner_heartbeat` at the start and end of each micro-scanner cycle in `main.py`.

**I2: Dead-man probe (systemd timer or cron, 5-min cadence)**

```bash
# /opt/scripts/sigil-heartbeat-check.sh
for VARIANT in baseline moderate aggressive unhinged; do
  DB="sigil_$([ "$VARIANT" = "baseline" ] && echo "sigil" || echo "$VARIANT")"
  STALE=$(psql -d "$DB" -tA -c "SELECT COUNT(*) FROM scanner_heartbeat
    WHERE variant_id = '$VARIANT'
    AND last_scan_end < NOW() - INTERVAL '25 minutes'")
  if [ "$STALE" -gt 0 ]; then
    # Discord alert + circuit breaker trip
    discord_alert "SIGIL: $VARIANT scanner stale >25min"
  fi
done
```

Rollback: disable systemd timer or cron job. No DB change needed.
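
One edge case in the probe query is worth noting: `COUNT(*)` returns 0 when a variant has no heartbeat row at all, so a scanner that never wrote a heartbeat would not page. The sketch below expresses the same 25-minute rule as a pure function that also treats a missing heartbeat as stale. `is_stale` is a hypothetical helper, not part of the codebase.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(minutes=25)


def is_stale(
    last_scan_end: Optional[datetime],
    now: datetime,
    stale_after: timedelta = STALE_AFTER,
) -> bool:
    """True when the scanner should be treated as dead."""
    if last_scan_end is None:
        return True  # no heartbeat row yet counts as stale
    return now - last_scan_end > stale_after


now = datetime(2026, 5, 7, 19, 0, tzinfo=timezone.utc)
print(is_stale(now - timedelta(minutes=10), now))  # recent heartbeat
print(is_stale(now - timedelta(minutes=40), now))  # scanner silent for 40 minutes
print(is_stale(None, now))                         # never reported
```

Keeping the rule in a pure function makes the threshold easy to unit-test independently of the database and the alerting transport.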

**I3: APScheduler persistent job store (prevents silent death)**

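Replace in-memory scheduler with SQLAlchemy job store backed by each variant's PostgreSQL DB, so job definitions survive a process restart (detecting a dead scheduler still relies on I1 and I2):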
```python
# src/main.py — replace scheduler init
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler(jobstores={
    'default': SQLAlchemyJobStore(url=config.db_url)
})
```

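Migration: APScheduler creates `apscheduler_jobs` table automatically on first run. Jobs registered at startup need an explicit `id` and `replace_existing=True`; otherwise each restart adds a duplicate copy of the job to the persistent store.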
Rollback: revert to `BackgroundScheduler()` (in-memory). Clean up `apscheduler_jobs` table if needed.

---

### Layer F — Variant Grid A/B Test (ATR multiplier)

Reconfigure 2 of the 4 variants to test ATR multiplier values:

| Variant (LXC) | ATR multiplier | Purpose |
|---|---|---|
| baseline (236) | 1.5× | Control (conservative) |
| moderate (243) | 1.5× | Control (different posture) |
| aggressive (244) | 2.5× | A/B: wider stop |
| unhinged (245) | 2.5× | A/B: wider stop (after liveness fix) |

Config change only (`grid-overlays/*.yaml`). No schema migration needed.
Rollback: revert YAML files.

---

## Cross-Pollination Summary

| Source | Pattern | Mapping to Sigil |
|---|---|---|
| **freqtrade** `ProtectionManager` | Max open positions per pair + cooldown on close | Layer H1-H3 (pre-signal guard + cooldown-on-open + advisory lock) |
| **jesse** `risk_per_trade` | Constant USD risk not equity% | Layer G3 (constant USD sizing) |
| **jesse** `before_entry()` hook | Pre-trade validation before rule evaluation | Layer H1 (pre-signal guard placement) |
| **hummingbot** kill switch | Per-variant circuit breaker with manual reset | Layer E (implemented in May 2 — confirmed working) |
| **freqtrade** `stoploss_from_absolute` | ATR-relative stop floor | Layer G1-G2 |

**Patterns evaluated but NOT adopted:**
- capytrade / custom entry confirmation filters: not enough evidence vs. the simpler ATR-stop fix
- Multi-timeframe confluence (Layer C): deferred per Stuart decision
- Regime/liquidity gate: deferred per Stuart decision
- PostureCoordinator rework (Layer D): class does not exist in codebase

---

## Implementation Priority

| Priority | Layer | Change | Impact | Effort |
|---|---|---|---|---|
| 1 | G1 | ATR-relative hard stop in exit_precedence.py | Eliminates loss driver | Small |
| 2 | G2 | Wire paper_stop_loss_pct | Zero-cost quick win | Tiny |
| 3 | H2 | Cooldown at OPEN not CLOSE | Eliminates multi-position accumulation | Small |
| 4 | H1 | Pre-signal guard in signal_engine.py | Narrows TOCTOU window (closed fully by H3) | Small |
| 5 | I1+I2 | scanner_heartbeat + dead-man probe | Fixes unhinged liveness | Small |
| 6 | I3 | APScheduler persistent job store | Prevents silent scheduler death | Small |
| 7 | H3+H4 | Advisory lock + unique index | DB-level backstop | Tiny |
| 8 | G3 | Constant USD sizing | Reduces loss magnitude | Medium |
| 9 | G4 | ATR at entry cached + stored | Enables better diagnostics | Medium |
| 10 | F | Variant grid reconfiguration | A/B test ATR multipliers | Tiny (config only) |

---

## Diversity Assessment

**Strong convergence** across all 10 models on the core fix (ATR-relative stop, cooldown-on-open, pre-signal guard, scheduler liveness). Cross-model consensus gives high confidence — Opus, Kimi, DeepSeek, Grok all independently identified the same root causes.

**Useful divergences:**
- Kimi/Minimalist (Slot 2): YAGNI discipline — confirmed Layer C/D/PostureCoordinator cuts are correct
- Opus/Resilient (Slot 4): Found PostureCoordinator absence (V5) — critical verification
- DeepSeek (Slot 6): Found APScheduler in-memory bug (C1) — explains unhinged death
- Grok (Slot 9): Best freqtrade cross-pollination research; identified ProtectionManager pattern

**Model quality notes:**
- Slot 8 (Mistral-Small): Proposed HSM/AES/Vault — rejected as out-of-scope; core ATR stop analysis valid
- Slot 5 (Gemini): Marginal output (3.8K); provided some cross-pollination context
- Slots 7, 10 (Mistral-Large, Qwen3): Solid but less implementation depth than Pattern A models

---

## Source Documents
- `architect-1/architect-design.md` — Opus/Baseline (45K, most detailed)
- `architect-2/architect-design.md` — Kimi/Minimalist (20K, YAGNI-focused)
- `architect-3/architect-design.md` — Kimi K2.6 (38K, strong verification corrections)
- `architect-4/architect-design.md` — Opus/Resilient (40K, best failure-mode analysis)
- `architect-6/architect-design.md` — DeepSeek-Flash (39K, APScheduler bug found)
- `architect-9/architect-design.md` — Grok-4.20 (11K, best cross-pollination)