Build log

Shipping in the open

Generated from the engineering changelogs in the Skybolt repo — unedited, newest first.

Cloud Deployment

  • Added the public site (@skybolt/site) as a fourth container in the production stack (ADR-0049 follow-up): the edge Caddy now terminates TLS for skybolt.ai + www.skybolt.ai (www → apex) and reverse-proxies to a site container serving the static Astro build. The site ships as its own GHCR image (ghcr.io/txmerc/skybolt/site), built/pushed by deploy.yml and tag-pinned / rolled-back beside the api (SKYBOLT_SITE_TAG).
  • Validation: bash -n on deploy/rollback, YAML parse of compose + workflow, and a local pnpm --filter @skybolt/site build (4 pages incl. the build-time changelog). The image build runs in CI; docker compose config runs implicitly on the droplet at deploy.
  • Operator action for go-live: apex + www A records (Namecheap) → droplet, alongside api.

Desktop App

  • Marketing-site consolidation: deleted the React apps/web/src/marketing/MarketingSite.tsx (the public marketing site now lives solely in the Astro apps/site). apps/web/src/App.tsx no longer branches on path/runtime to serve a marketing page — it always renders SkyboltApp. Updated AppRouting.test.tsx accordingly and dropped the now-unused three, @types/three, and gsap dependencies from apps/web (smaller renderer bundle).
  • scripts/dev.ps1 / scripts/dev.sh default runs now also start the public Astro site (apps/site) on http://localhost:50030 alongside the desktop app, and stop it when the desktop process exits (incl. Ctrl-C). New -Site (--site) switch runs only the site.

Encrypted Project Sync

  • Locked/ciphertext project.db → clean 423 on every per-project route. The A4 convert route added an auth-first pre-flight that sniffs the header and returns 422 before opening, but that guard protected only sync/convert; every other per-project route (cards, terminals, agents, workspace, project detail, …) still opened the project via _open_project, whose per-project ATTACH raised a raw sqlite3.DatabaseError → opaque HTTP 500 when the on-disk project.db was git-crypt ciphertext (\x00GITCRYPT), a sealed artifact (SBSEALv1), or otherwise not SQLite.
    • Engine: store/project_store.py adds ProjectDatabaseLockedError + a header classifier _project_db_locked_detail (mirrors services/sync_ops.inspect_project_db). _ensure_project_db sniffs the header on EVERY open — not gated by the _UPGRADED_PROJECT_DBS upgrade cache, so a re-locked checkout (a pull/checkout brings ciphertext after a prior successful open) is still caught — and raises the typed error; _open_project keeps a tightly-scoped ATTACH safety net for a residual non-SQLite body and always closes the global connection on failure. app.py registers an exception handler mapping it to 423 Locked with an actionable detail (git-crypt → "Unlock Encrypted Data"; sealed → "Unseal (Sync & Resume)"; generic → "Unlock or unseal"), mirroring the engine-lock require_unlocked middleware. Additive only: the convert pre-flight (still 422) and the graceful /projects listing fallback (_read_project_settings_row) are unchanged; background callers (scheduler) already wrap per-project opens in except Exception.
    • Tests (tests/test_project_db_lock.py, 5): detail + listing routes return 423 (never 500) for all three ciphertext kinds with actionable details; the lock still fires after a prior successful open (cache bypass); the global /projects index still lists a locked project.
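
The header sniff described above can be sketched roughly as follows. This is a minimal illustration, not the engine's code: the function names `classify_project_db_header` and `locked_detail` are hypothetical, and treating an empty file as openable is an assumption. The two ciphertext prefixes and the three detail strings are the ones quoted in the entry; the SQLite magic is the standard 16-byte file header.

```python
from typing import Optional

# Prefixes named in the changelog entry above.
GIT_CRYPT_MAGIC = b"\x00GITCRYPT"
SEALED_MAGIC = b"SBSEALv1"
# Standard 16-byte header of every SQLite 3 database file.
SQLITE_MAGIC = b"SQLite format 3\x00"

# Actionable details quoted from the entry above.
LOCK_DETAILS = {
    "git-crypt": "Unlock Encrypted Data",
    "sealed": "Unseal (Sync & Resume)",
    "generic": "Unlock or unseal",
}


def classify_project_db_header(header: bytes) -> Optional[str]:
    """Return None when the header looks like SQLite, else a lock kind.

    Hypothetical helper; the real _project_db_locked_detail may differ.
    """
    # Assumption: an empty file is left for SQLite to handle (it is a
    # valid, empty database), so it is not reported as locked here.
    if not header or header.startswith(SQLITE_MAGIC):
        return None
    if header.startswith(GIT_CRYPT_MAGIC):
        return "git-crypt"
    if header.startswith(SEALED_MAGIC):
        return "sealed"
    return "generic"


def locked_detail(header: bytes) -> Optional[str]:
    """Map the first bytes of project.db to a 423 detail, or None if openable."""
    kind = classify_project_db_header(header)
    return None if kind is None else LOCK_DETAILS[kind]
```

A caller would read the first 16 bytes of project.db on every open and raise the typed error whenever `locked_detail` returns a string, so the check cannot be skipped by an upgrade cache.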

Public Site

  • Marketing consolidation: migrated the richer single-page design/content of the former React marketing site (apps/web/src/marketing/MarketingSite.tsx) into this Astro landing, keeping the HeroSwarm agent animation as the hero. New / sections: a command-center hero with a three-step strip, Platform (#platform, six product cards), Machines (#machines, capability chips + a command-window mock), and Privacy (#privacy, three trust groups) — replacing the previous "Control the chaos" grid, principles terminal, and pricing teaser. Nav now targets the section anchors plus Pricing. The React site's Tailwind was translated to the Astro site's CSS-variable system; the mobile nav was resized so all four links fit a phone width. The old React MarketingSite.tsx was deleted, apps/web/src/App.tsx now always renders the Skybolt app (its AppRouting test updated), and the unused three/gsap deps were removed from apps/web. scripts/dev.ps1 / dev.sh default runs now start the desktop app + this site on :50030 together (new -Site / --site flag runs the site alone).
  • Animated hero swarm (src/components/HeroSwarm.astro): a live "agents working a board" layer — an ORCH node routes five named agents (claude-code, codex-01, vllm-local, expert:docs, expert:security, reference colors) to task cards with progress bars; agents fly over, dock, pulse and fill the bar, pair up over a green collaboration link, and completed cards flash green with a ✓ before respawning with the next task title; packets travel the work links. Dependency-free (no three.js/gsap): runtime DOM + one SVG line pool driven by a single rAF loop, paused offscreen via IntersectionObserver. The build-time constellation remains as the no-JS fallback; prefers-reduced-motion gets a static composed scene. Mobile runs a smaller 3-agent/3-card config. The technical-design hero section was rewritten accordingly (the three.js island is no longer the planned upgrade path).
  • Hero visibility fix: the upper hero rendered as a near-black void — the readability scrim (0.35–0.55 black) sat over the whole starfield and flattened the Atmosphere gradient and constellation below perception. The scrim is now light at the top (0.05 → 0.22 @ 42%) and dark only in the lower text zone; the hero background gained a faint #0077e4 radial halo; stars are brighter/larger (opacity 0.35–0.9, r 0.6–1.9); the constellation stroke went 0.25 → 0.4 and agent nodes gained translucent halo rings. The starfield is also 28px taller than the hero so the drift animation never exposes an unpainted strip.
  • Content refresh to match the shipped product (Waves A+B): landing panels now feature the gated full Git surface (M4) and repo import/readiness audits (P-M2) — "Kanban cards" merged into the orchestration panel and "Local by default" ceded to the principles terminal directly below. /features grew matching "Git operations, gated" and "Project import & readiness" cards (eight total), the orchestration card swapped the unshipped overlap-radar/artifact-registry claim for lease-backed scheduling + file-scope conflict prevention and the live agent board, and the cloud identity card now mentions device-session list/revoke. Pricing Free tier lists repo import with readiness audits.
  • Mobile nav: page links (Features/Changelog/Pricing) stay visible under 720px (smaller type); the waitlist CTA pill hides instead — previously the links vanished entirely, leaving subpages unreachable except via the footer.
  • Font loading: preload <link>s for the five above-the-fold latin woff2 faces (Space Grotesk 500/600, Inter 400/500/600) to suppress the fallback-font flash on first paint.
  • Containerized production deployment (ADR-0049 follow-up, rides ADR-0043): apps/site/Dockerfile (multi-stage — pnpm workspace build mirroring the site CI job → caddy:2-alpine static file-server on :8080), apps/site/Dockerfile.dockerignore (keeps documents/ in the build context so the build-time changelog resolves — the root .dockerignore drops it), and apps/site/site.Caddyfile (internal static server).
  • Added the site service to deploy/production/docker-compose.prod.yml (own GHCR image ghcr.io/txmerc/skybolt/site, edge network, healthcheck) and skybolt.ai + www.skybolt.ai blocks to Caddyfile.prod (edge Caddy terminates TLS; www → apex redirect).
  • Extended the deploy pipeline to two images: deploy.yml builds/pushes the site image beside the api; deploy.sh/rollback.sh pin and roll back both tags (SKYBOLT_SITE_TAG).

Agents Execution

  • Per-project schema v3 → v4 → v5 (storage/project_db.py, PROJECT_SCHEMA_VERSION 3 → 5, same strictly-additive ladder):
    • v4 (M2): card_dependencies (UNIQUE(card_id, depends_on_card_id), kind ∈ {ordering, interface}), additive cards.file_scope_json (JSON glob list; ** is the exclusive solo scope) + cards.priority (ascending, default 100), and real terminal_sessions lease columns (control_mode, agent_lease_session_id, agent_lease_state, agent_lease_claimed_at, agent_lease_released_at, orchestrator_session_id) — the columns _terminal_out used to hardcode.
    • v5 (M3): command_profiles execution columns (execution_target_id, working_path, allowed_actors_json, agent_callable, audit_policy ∈ {none, metadata, full}) and command_runs result columns (profile_id, terminal_session_id, output_ref, exit_code).
  • Scheduler multi-claim (M2) (scheduler.py): each tick now runs per-project maintenance (heartbeats, fallback completion, the TTL reaper) first, then claims up to admission headroom sessions. Eligibility = dependencies all completed (SQL NOT EXISTS over card_dependencies) ∧ declared file-scope disjoint from every in-flight card's scope (glob/prefix comparison; ** waits for an empty board and runs solo) ∧ admission: global cap (SKYBOLT_AGENT_MAX_CONCURRENT, default 3), live machine PTY budget (SKYBOLT_AGENT_MAX_MACHINE_PTYS vs LocalTerminalBroker.live_count()), and catalog_models.max_concurrency when the session's declared model resolves. Over-cap sessions stay queued — backpressure, not failure. Claims also take file_scope leases (radar visibility + exact-pattern race belt-and-suspenders).
  • TTL reaper (M2): each tick, held leases whose heartbeat_at (fallback acquired_at/created_at) is older than ttl_seconds (SKYBOLT_AGENT_LEASE_TTL_SECONDS, default 600) are released; holder sessions are requeued through the existing release path (or failed past the attempt limit with the card escalated to review); orphan leases release directly.
  • Dependents unblock through the events bus (M2): completion writes a durable dependents_unblocked coordination event (payload card_ids) and publishes it on scheduler_bus, so the next tick fires immediately instead of waiting out the poll interval.
  • Terminal lease endpoints (M2) (routes/terminals.py): POST /projects/{id}/terminals/{tid}/lease (one active holder; resource_leases under BEGIN IMMEDIATE — a concurrent second claim 409s) and POST .../release (returns the terminal to human control; 409 when not held). _terminal_out reads the real v4 columns, falling back to M1 metadata_json fields for not-yet-upgraded rows.
  • Command execution on targets (M3) (new services/command_exec.py): a saved command profile runs on its execution target in three shapes — local captured (shell subprocess; stdout/stderr written to a machine-local log under <engine db dir>/projects/<pid>/command-output/<run_id>.log, recorded as output_ref; raw output never lands in the syncable project.db), local visible (in_terminal=true opens a broker terminal linked via terminal_session_id), and SSH (_run_ssh_shell, cd -- <cwd> && <command>). execute_command_run accepts an explicit cwd override — the M8 quality gates will point it at a card's worktree. audit_policy controls retention: full keeps the log, metadata keeps byte counts only, none keeps neither.
  • Run endpoint + approval gating (M3) (routes/command_profiles.py, routes/approvals.py): the 409 stub is replaced by POST /projects/{id}/command-profiles/{pid}/runs (writes command_runs rows; body: actor, agent_session_id, cwd, in_terminal, timeout_seconds). approval_policy != 'none' parks the run awaiting_approval behind an approval_requests row (subject_type='command_run'); the approvals resolve flow executes it on approve (new branch, mirroring the git-operation branch) or marks it rejected/blocked. Actor gates: agent_callable=false profiles 403 agent actors; a non-empty allowed_actors list restricts all actors. PATCH /projects/{id}/command-profiles/{pid} updates the M3 metadata; GET /projects/{id}/command-runs returns real history (was an empty stub).
  • M1 handoffs closed: LocalTerminalBroker.open gained an env kwarg (per-spawn overrides merged over the engine environment; None = historical inherit) so the scheduler injects SKYBOLT_COMPLETE_TOKEN/SKYBOLT_AGENT_SESSION_ID/SKYBOLT_PROJECT_ID/SKYBOLT_COMPLETE_URL per seat without touching os.environ; _terminal_out no longer hardcodes lease fields.
  • Serializers: _card_out exposes file_scope/priority; _command_profile_out the M3 execution metadata; new _command_run_out (profile/terminal links, output_ref, exit_code); _approval_request_out exposes command_run_id. All tolerant of not-yet-upgraded rows.
  • Tests: tests/test_scheduler.py grew to 20 (M2 adds: two concurrent thread claimers → exactly one wins; two schedulers → one claim; dependency order + bus unblock; overlapping scopes never co-run; ** runs solo; global cap under a 4-session burst; PTY cap; provider max_concurrency; reaper requeue + cap-to-review; terminal lease/release endpoint contract; the ladder test now asserts v5). New tests/test_command_exec.py (13: cwd precedence, actor gates, captured run + output log, failure exit codes, cwd override, missing cwd, audit none, unsupported action blocked + disabled 409, agent-callable PATCH flow, approve-executes, reject-without-execute, in-terminal link, SSH routing). tests/test_terminal.py grew to 5 (env merge for the child only; broker env kwarg reaches the spawned PTY).
  • Per-project schema v2 → v3 (storage/project_db.py): new _upgrade_project_schema version ladder (strictly additive; older synced project.db files upgrade in place), resource_leases with the ux_active_lease partial unique index (WHERE state='held' — the IntegrityError IS the mutex), coordination_events, and additive agent_sessions runtime columns (card_id, lease_state, attempt_count, worktree_path, branch_name, engine_terminal_id, heartbeat_at, claimed_at, outcome_json).
  • New skybolt_engine/scheduler.py: AgentScheduler self-registers via services/runtime_hooks (start/stop in the app lifespan, quiesce hook for sync push/pull, defers while the engine is locked). Claim path opens the project DB writer with BEGIN IMMEDIATE and writes only leases + status; after commit it git worktree adds <repo>/.skybolt/worktrees/<session_id>, launches the CLI seat on the in-process LocalTerminalBroker (cwd=worktree), injects the brain prompt package into the PTY, and persists a terminal_sessions row (control_mode='agent' in metadata). recover() reconciles this machine's rows on start: dead terminals requeue (cards back to queue) or fail past the attempt limit (cards escalated to review).
  • New services/worktree_ops.py (worktree add/remove/prune with the strict-argv _run_git discipline; re-applies the .skybolt/.gitignore that ignores worktrees/) and services/events.py (durable coordination-event insert + minimal in-process async pub/sub).
  • Completion contract: POST /projects/{id}/agent-sessions/{sid}/complete accepts a per-session scoped token (injected into the seat PTY as SKYBOLT_COMPLETE_TOKEN, stored only as a SHA-256 hash, never the engine session token) via the X-Skybolt-Complete-Token header, or a normal renderer session. On completion: read-only git status/HEAD summary, journal + coordination event, leases released, card working → completed, worktree removed + pruned. Fallback signals wired minimally: terminal liveness false, then output-idle timeout; the fired signal is journaled. Per ADR-0036 every completion is recorded as claimed-complete, not quality-verified.
  • _agent_session_out now serializes the scheduler state (lease_state, attempt_count, branch_name, engine_terminal_id, heartbeat_at, claimed_at, outcome) tolerantly for not-yet-upgraded rows; agent_sessions.card_id is populated at create.
  • Tests: apps/engine/tests/test_scheduler.py (9 tests) drive tick_once()/recover() deterministically with a real python -c seat through the real broker: end-to-end claim → worktree → seat → scoped-token completion; wrong/missing token 403/401; terminal-exit fallback; recovery requeue + attempt-limit failure; quiesce blocks claims; v2 → v3 ladder upgrade; the partial-index lease mutex; event-bus pub/sub.
  • Known follow-ups: tests/test_storage_split.py:84 still asserts project schema version 2 (one line, owned by the storage wave); app.py's engine-token middleware must exempt the completion path before headless seats can signal completion on token-configured engines; LocalTerminalBroker.open needs an env parameter so the scoped token no longer rides a guarded os.environ override around the spawn.
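
The lease mutex from the v3 schema entry above (resource_leases with the ux_active_lease partial unique index, claimed under BEGIN IMMEDIATE) can be illustrated with a small sqlite3 sketch. The table and index names and the `WHERE state='held'` predicate come from the entry; the column set (`resource`, `holder`) and the helper names `open_db`, `try_claim`, and `release` are assumptions made for the example, not the engine's schema or API.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode so the
    # sketch can issue BEGIN IMMEDIATE / COMMIT / ROLLBACK explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE resource_leases ("
        " id INTEGER PRIMARY KEY,"
        " resource TEXT NOT NULL,"   # hypothetical column
        " holder TEXT NOT NULL,"     # hypothetical column
        " state TEXT NOT NULL)"
    )
    # Partial unique index: at most one row per resource may be 'held'.
    conn.execute(
        "CREATE UNIQUE INDEX ux_active_lease"
        " ON resource_leases(resource) WHERE state = 'held'"
    )
    return conn


def try_claim(conn: sqlite3.Connection, resource: str, holder: str) -> bool:
    """Try to take the lease; the IntegrityError is the mutex."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT INTO resource_leases (resource, holder, state)"
            " VALUES (?, ?, 'held')",
            (resource, holder),
        )
    except sqlite3.IntegrityError:
        # Someone else already holds this resource.
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


def release(conn: sqlite3.Connection, resource: str, holder: str) -> bool:
    """Release a held lease; released rows no longer occupy the index."""
    cur = conn.execute(
        "UPDATE resource_leases SET state = 'released'"
        " WHERE resource = ? AND holder = ? AND state = 'held'",
        (resource, holder),
    )
    return cur.rowcount == 1
```

Because released rows fall outside the index predicate, lease history can stay in the table while the index still guarantees a single active holder per resource.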

Cloud Deployment

  • Authored the production deploy stack (ADR-0043 milestone C1):
    • deploy/production/docker-compose.prod.yml — api (GHCR tag-pinned, no host port, healthcheck, log caps) + Caddy on a pinned 172.28.0.0/24 network so TRUSTED_PROXY_CIDRS can name the proxy hop; Caddy state in named volumes.
    • deploy/production/Caddyfile.prod: api.skybolt.ai with automatic Let's Encrypt, the shared security-headers snippet, /api/* proxy, and static landing pages.
    • deploy/production/site/verify-email.html (completes verification in-browser via POST /api/v1/auth/verify-email) and reset-password.html (copyable token + desktop-app instructions; split derivation forbids web-side password handling).
    • deploy/production/{provision,deploy,rollback,backup}.sh — idempotent droplet bootstrap (Docker, UFW 22/80/443, fail2ban, unattended-upgrades, skybolt deploy user), tag-pinned deploy with previous-tag rollback state and /api/health polling, one-command rollback (no downgrades — migrations stay backward-compatible one release), weekly pg_dump with keep-4 local pruning and optional DO Spaces upload.
    • .github/workflows/deploy.yml: api-v* tags or confirmed manual dispatch → build apps/api/Dockerfile → push ghcr.io/txmerc/skybolt/api:<tag> → SSH deploy with pinned host key; production environment gate; GHA build cache.
    • Validation run: bash -n on all four scripts, YAML parse of compose + workflow. docker compose config is not runnable on this dev machine (no Docker) — it runs implicitly on the droplet at first deploy.sh.
  • Go-live (C2) is operator-executed; the runbook is this feature's test-plan.md.

Cloud Identity (skybolt.ai)

  • Sign-out, background refresh, and targeted session revocation (Pillar 2 milestones B2/B3/B4, ADR-0042).
    • API (B4): auth_sessions.refresh_family_id (model + migration 0018_identity_sessions — the documented follow-up column; numbered 0018 to leave 0017 free for a parallel workstream) links each bearer access session to the refresh-token family it was minted with. New GET /api/v1/auth/sessions (one entry per signed-in device: family id, created/last-used, last ip/user-agent, current flag; never token material) and POST /api/v1/auth/sessions/revoke {family_id} (revokes ONE family's refresh tokens + its access sessions only; 404 for unknown/foreign families; returns current so the client knows it revoked itself). Refresh-reuse revocation now also targets only the compromised family's sessions instead of all of a user's sessions. Rate limits reuse the existing refresh (list) and token-confirm (revoke) ceilings.
    • Engine (B2): POST /auth/cloud/logout — requires the local cookie session, best-effort cloud /logout (mints an access token from the keychain refresh token after an engine restart; offline or upstream errors never block), then clears the keychain refresh token, the in-memory access token, and deletes the cached cloud_identity row (state reads unlinked; the email is not retained). Local users/sessions and at-rest custody (account key + argon2id wrap) are untouched — offline login/unlock keep working.
    • Engine (B4): GET /auth/cloud/sessions + POST /auth/cloud/sessions/revoke passthroughs (the renderer never talks to the cloud — binding rule), with a shared _authed_cloud_call helper that retries once through a refresh-token rotation when the in-memory access token is missing/stale. Revoking this machine's own family performs the same local cleanup as logout and returns the unlinked state.
    • Engine (B3): new services/cloud_refresh.py registers cloud-refresh with the runtime_hooks seam at import time (app.py untouched). First pass ~60 s after startup, then every ~12 h: rotate the keychain refresh token, re-fetch /me, update the cached row. Every failure is swallowed to debug logs; a locked engine (at_rest.is_enabled() false with encryption on) skips the pass; a 401 clears the dead token so the renderer can prompt. refresh_once() is the synchronous, directly-testable core.
    • last_synced_at is now written as ISO-8601 UTC with timezone offset everywhere the engine writes it (the documented datetime('now') follow-up); the renderer formatter still compensates for pre-existing suffix-less rows.
    • Renderer: CloudAccountSection gained Sign out (themed confirm modal — never a native dialog) and an on-demand "Devices" list (render stays offline-safe) with per-device "Sign out device" confirm modals; the current family is labelled "This device", and revoking it is treated as a local cloud sign-out. api.ts: cloudLogout, cloudSessions, cloudRevokeSession (+ CloudSession type).
    • Tests: API +5 (test_identity_v1.py, 17 total), engine +18 (test_cloud_auth.py 35 + new test_cloud_refresh.py 7), web +6 (CloudAccountSection.test.tsx, 11 total).
  • Production mailer: SMTP2GO backend (Pillar 2 milestone B1, ADR-0043; provider chosen by the owner — the team already uses SMTP2GO).
    • The Mailer protocol is now async; routes await mailer.send(...) so a slow SMTP/API call can never block the single-worker event loop.
    • Smtp2goMailer registered in _BACKENDS: POSTs to the SMTP2GO HTTP API (/v3/email/send, X-Smtp2go-Api-Key header) via httpx.AsyncClient (10 s timeout, injectable transport for tests). Failures — including SMTP2GO's failures-inside-a-200 (data.failed > 0) — are logged with recipient + subject only, never the token-bearing body, and swallowed (a failed send must not fail signup; the rate-limited resend/reset endpoints are the retry path).
    • Config: SMTP2GO_API_KEY + MAIL_FROM settings; validation rejects console outside development/test (the documented launch blocker) and requires key+from for smtp2go. httpx promoted from dev-only to a runtime dependency; apps/api/.env.example added.
    • Tests: tests/control_plane/test_mailer.py (9 — MockTransport payload/auth assertions, API-error, failures-inside-200, and network-error swallowing with no-body-logged checks, prod config guards).
  • Proxy IP trust hardening (Pillar 2 milestone B5, ADR-0043).
    • client_ip honors X-Forwarded-For only when the socket peer is inside TRUSTED_PROXY_CIDRS (new setting, validated CIDR list, empty default = ignore XFF), and then only the rightmost entry — the hop our proxy appended. Previously any client could spoof per-IP rate-limit buckets and login audit IPs with a forged header.
    • Tests: tests/control_plane/test_client_ip.py (7 — trust matrix incl. spoof-ignored, rightmost-entry, non-IP peers, invalid-CIDR settings rejection).
  • Wired the cloud API into CI and setup (commit 9af8b4c).
    • .github/workflows/ci.yml gained the api job: ruff, pyright, and the full apps/api pytest suite against SQLite (TEST_DATABASE_URL=sqlite+aiosqlite:…) on every push/PR, on ubuntu-latest to match the deploy target.
    • Both setup scripts (scripts/setup.sh, scripts/setup.ps1) now provision apps/api/.venv by default (SKYBOLT_SETUP_OPTIONAL_API=0 opts out) — apps/api is the active identity service, no longer optional-only infrastructure.
    • Fixed the stale joke_teller roster test (the persona was deliberately removed from BUILTIN_PERSONAS; the test had drifted).
  • Phase B renderer: cloud sign-in flows and the skybolt.ai account card (commit b334097).
    • AuthScreen.tsx setup/login/restore modes now drive the cloud flows: setup → cloudSignup + one-time recovery-code screen; login → cloudLogin with silent fallback to local sign-in on 503 and a needs_recovery hand-off into restore mode (email/password kept, amber guidance shown); restore → cloudRestore. Non-blocking warnings (keychain misses) render once on a "Signed in with a warning" screen. Local unlock unchanged.
    • Account settings gained the "skybolt.ai account" card (src/app/auth/CloudAccountSection.tsx): linked state from the engine's cached getCloudState (offline-safe render), "Sync now" via cloudRefresh (503 → offline notice; 401/409 → re-sign-in notice), and the local-profile link flow via cloudMigrate.
    • API client functions in api.ts: getCloudState, cloudSignup, cloudLogin, cloudRestore, cloudMigrate, cloudRefresh.
    • Tests: AuthScreen.test.tsx + CloudAccountSection.test.tsx (11 new).
  • Phase B engine: cloud auth client + split derivation (commit 594f37f).
    • security/cloud_kdf.py — Argon2id + HKDF-SHA256 split derivation (auth_verifier sent; wrap_key local-only, repr=False); server-published params bounds-checked client-side; 64-char hex salts indistinguishable from the API's fakes.
    • security/at_rest.py — the custody password record now dispatches between legacy PBKDF2 and the argon2id cloud wrap (set_cloud_password_wrap); same account key, nothing re-encrypted, legacy wraps still unlock.
    • services/cloud_identity.py — httpx client for /api/v1/auth (CloudUnavailableError/CloudApiError); refresh token in the OS keychain (skybolt-cloud/refresh-token), access token in process memory; keychain failures degrade to a response warning, never a crash.
    • routes/cloud_auth.py + the cloud_identity single-row cache table (global schema v13) — /auth/cloud/{state,signup,login,restore,migrate,refresh}. Every cloud call degrades to 503 offline; state never touches the network; local /auth/login + /auth/unlock stay fully offline. Login with no local profile returns needs_recovery (no key escrow — the recovery code is the only cross-machine path).
    • Tests: tests/test_cloud_auth.py (24) + tests/test_at_rest.py cloud-wrap additions (3).
  • Phase A identity service v1 in apps/api (commit ad70ff4).
    • app/api/routes/identity_v1.py (mounted /api/v1/auth): signup/login/kdf/refresh/logout/verify-email/resend-verification/password-reset/me. Split-derivation verifiers — the server stores only argon2id(auth_verifier); bearer access tokens (opaque, SHA-256-hashed, 1 h) + rotating refresh-token families (60 days) with atomic claim and reuse-revocation (family + all access sessions); timing-equalized unknown-email login and deterministic fake KDF salts; password reset revokes all credentials and does not decrypt data.
    • app/schemas/identity_v1.py (contract of record for the engine client), app/core/mailer.py (console backend behind a Mailer protocol; Resend/Postmark slot in via _BACKENDS — real backend is a launch blocker), app/core/ratelimit.py (in-process sliding window, per-IP/per-email on every endpoint).
    • Migration alembic/versions/0016_identity_v1.py: users.email_verified/kdf_salt/kdf_params, refresh_tokens, email_tokens, entitlements; merges the two 0015 heads.
    • Tests: tests/control_plane/test_identity_v1.py (12).
  • ADR-0042 (skybolt.ai cloud identity with zero-knowledge encryption) accepted, alongside ADR-0041 (sealed project artifact) (commit cd32231). Decided the split-derivation password model, engine-as-only-cloud-client, offline-tolerance guarantees, the accepted enumeration oracles, and that the recovery code remains the only cross-machine decryption path.
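
The proxy IP trust rule from the B5 entry above (honor X-Forwarded-For only when the socket peer is inside TRUSTED_PROXY_CIDRS, and then only the rightmost entry) can be sketched as below. This is an illustration under assumptions: the function signature is hypothetical, and falling back to the peer address when the rightmost entry is not a valid IP is a guess at reasonable behavior rather than the documented one.

```python
import ipaddress
from typing import Optional, Sequence


def client_ip(peer: str, xff: Optional[str], trusted_cidrs: Sequence[str]) -> str:
    """Resolve the client IP for rate limiting and audit logs.

    peer          -- address of the socket peer (may be a non-IP string)
    xff           -- raw X-Forwarded-For header value, if any
    trusted_cidrs -- proxy networks; an empty list means "ignore XFF"
    """
    try:
        peer_addr = ipaddress.ip_address(peer)
    except ValueError:
        # Non-IP peers (for example a unix socket) can never be trusted.
        return peer

    networks = [ipaddress.ip_network(cidr) for cidr in trusted_cidrs]
    if not xff or not any(peer_addr in net for net in networks):
        # Header absent, or the request did not arrive via our proxy:
        # a forged X-Forwarded-For is ignored.
        return peer

    # Only the rightmost entry was appended by the trusted hop.
    rightmost = xff.split(",")[-1].strip()
    try:
        return str(ipaddress.ip_address(rightmost))
    except ValueError:
        # Assumption: a malformed entry falls back to the peer address.
        return peer
```

With the pinned 172.28.0.0/24 compose network as the trusted range, `client_ip("172.28.0.2", "1.2.3.4, 203.0.113.9", ["172.28.0.0/24"])` resolves to the proxy-appended `203.0.113.9`, while the same header from any other peer is ignored.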

Cloud Metadata Sync

  • New /api/v1/sync router (apps/api/app/api/routes/sync_v1.py): POST /push (batched allowlisted changes, server-assigned monotonic cursor, rollup upserts) and GET /pull (cursor/limit paging with has_more), Bearer-authenticated via the identity v1 session helper.
  • Strict allowlist schemas (apps/api/app/schemas/sync_v1.py, extra="forbid", discriminated union on entity_kind): project_meta, card, board_column, agent_session_meta. Unknown fields/kinds and forbidden classes => 422.
  • New tables synced_projects + sync_changes (apps/api/app/db/models/sync.py, migration 0017_metadata_sync — chained after the concurrently created 0018_identity_sessions; single head verified).
  • Per-plan quotas from Entitlement (free: 3 projects, 10k changes/24h; premium: 100 / 200k) and per-user rate limits, all configurable in apps/api/app/config.py.
  • Tests: apps/api/tests/control_plane/test_sync_v1.py (11 tests: round-trip, cursor monotonicity, allowlist negatives, isolation, quotas, idempotency, auth).
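
The push/pull contract above (a server-assigned monotonic cursor on push, cursor/limit paging with has_more on pull) can be sketched with an in-memory log. Everything here is illustrative: the `ChangeLog` class, its method signatures, and the response field names other than `has_more` are assumptions, and the allowlist validation, quotas, and Bearer authentication of the real router are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ChangeLog:
    """In-memory stand-in for the sync_changes table (hypothetical)."""

    _changes: List[Dict[str, Any]] = field(default_factory=list)
    _next_cursor: int = 1

    def push(self, batch: List[Dict[str, Any]]) -> int:
        """Append a batch; the server, not the client, assigns cursors.

        Returns the highest cursor assigned so far.
        """
        for change in batch:
            self._changes.append({"cursor": self._next_cursor, "change": change})
            self._next_cursor += 1
        return self._next_cursor - 1

    def pull(self, after: int = 0, limit: int = 100) -> Dict[str, Any]:
        """Return up to `limit` changes with cursor > after."""
        rows = [row for row in self._changes if row["cursor"] > after]
        page = rows[:limit]
        return {
            "changes": page,
            # The client passes this value as `after` on the next pull.
            "cursor": page[-1]["cursor"] if page else after,
            "has_more": len(rows) > limit,
        }
```

A client would keep calling `pull` with the returned cursor until `has_more` is false; because cursors only increase, a repeated pull with the same cursor returns the same page.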

Desktop App

  • Added engine-crash recovery: the shell watches the sidecar process and emits skybolt://engine-exited on unexpected exits; the renderer's new EngineRecovery surface (event listener plus /health ping backstop) offers Restart Engine, which respawns the sidecar with a fresh port/token and reconnects the renderer cleanly.
  • Mirrored engine sidecar stdout/stderr to <data-dir>/logs/skybolt-engine.log (rotated at 5 MB) and surfaced the path in the recovery UI; the logs/ folder moves with the storage location.
  • Wired tauri-plugin-updater with manual-only checks (check_for_update/install_update commands and a Help -> Check for Updates... menu) and the documented pre-exit contract: the engine is stopped before any binary swap, and respawned if the install fails. Signing keys are not generated yet, so checks report "unconfigured" until an operator runs tauri signer generate and fills plugins.updater.pubkey (see technical-design.md).
  • Packaging pass: bundle now uses the standard src-tauri/icons set (including icon.ico for Windows), version 0.1.0, publisher/category/description metadata, and explicit NSIS currentUser install mode.

Encrypted Project Sync

  • A3 — portable-secret vault + per-secret opt-in. Secrets now travel with sealed projects, per-secret and opt-in (ADR-0041 tier 2). Vault JSON shape + opt-in contract documented in technical-design.md.
    • Engine: global portable_secrets(ref TEXT PRIMARY KEY, opted_in_at TEXT) table (schema v14 → v15, additive; helpers _portable_secret_refs/_set_secret_portable in store/project_store.py). _build_portable_vault (services/sync_ops.py) seals only refs that are project-referenced AND opted in; machine-scoped refs (machines.credential_ref) are excluded UNCONDITIONALLY; unresolvable opted-in refs are reported skipped. include_secrets on a sealed push now means "seal WITH the vault" (the A2-era 422s in _push_sealed_local/_sync_push_ssh_sealed are gone); the push response's secrets carries {included, skipped}. _install_vault_secrets restores the vault on every unseal flow — sealed sync/unlock, resolve take_remote, sealed SSH import — installing values into the keychain (env refs → manual) and re-marking each ref portable locally so it keeps travelling. New project-admin routes: GET /projects/{id}/secrets (per-ref portability listing; values never returned) and POST /projects/{id}/secrets/portable (toggle; 422 for machine-scoped/unparseable refs). The legacy sealed_artifact._unwrap_data_key reference in tests moved to the public unwrap_data_key (landed with the wave-B checkpoint).
    • Renderer: SyncResumePanel.tsx gains a sealed-only "Portable Secrets" section — on-demand listing (loading/error/empty states), per-ref Make portable / Stop syncing, machine-scoped rows explained and untoggleable, opt-in behind a themed confirm stating the value "will be stored encrypted in the repo". The hidden include-secrets checkbox is replaced by "Include portable secrets (sealed vault)" with inline consequence copy; push results render the vault summary. api.ts: ProjectSecretEntry, listProjectSecrets, setProjectSecretPortable, PortableSecretsInstallSummary, widened push/unlock/resolve result types.
  • A4 — one-click git-crypt → sealed conversion. POST /projects/{id}/sync/convert (project-admin; local + SSH; idempotent — already-sealed → 200 no-op; 423 when the account key is locked). Verifies the working tree is unlocked before the per-project DB is opened (auth-first on the global DB; a \x00GITCRYPT project.db → clean 422 instead of a raw database error). Steps: import secrets.enc into the keychain + mark refs portable (_install_project_secrets now reports refs); strip Skybolt's git-crypt rules from .gitattributes (git_sync.strip_git_crypt_attribute_lines/strip_git_crypt_attributes — user rules preserved, file removed when empty); write the sealed gitignore; delete + untrack .skybolt/keys/account.enc, .skybolt/secrets/, and the plaintext project.db; seal at max(existing, 0) + 1; single commit. SSH variant _sync_convert_ssh_sealed (services/sync_ops.py) fetches secrets/.gitattributes over SSH, retires remote artifacts with --ignore-unmatch, uploads only ciphertext. Renderer: "Convert To Sealed Sync" banner on every git-crypt project + themed confirm modal; success shows generation/secrets and flips the panel to the sealed surface.
  • A5: git-crypt retired from setup. scripts/setup.ps1 Ensure-GitCrypt and scripts/setup.sh ensure_git_crypt are detection-only (no Scoop bootstrap, no package installs); an installed git-crypt is reported as legacy-only. sync/remote-git-crypt-install and the panel install guidance are demoted to legacy paths that recommend conversion. documents/setup.md rewritten: sealed is the sync format, git-crypt is legacy-read-only, conversion is the migration.
  • Also: /projects/import/ssh now authenticates + checks account access before any SSH fetch or local write (was after the fetch); test_ssh_import_rejects_option_injection_host updated to authenticate first.
  • Tests (tests/test_sealed_sync.py, 40): include-secrets vault push (committed blob's vault = opted-in refs only, response summary, no plaintext anywhere) + opt-in-without-include shipping no vault; secrets listing/toggle route coverage (machine-scoped 422, values never in responses); vault build rules unit (forced machine-scoped row still excluded; skipped refs); cross-machine vault round trip (unlock installs + re-marks portable); local conversion end-to-end (repo retirement, untracking, sealed status, idempotent re-run, user .gitattributes rules preserved) + locked-tree 422 / missing-key 423 / nothing-to-convert 422 guards; SSH conversion with recorded primitives (ciphertext-only upload, remote retirement commands, secrets installed + marked portable).
  • Sealed SSH import (closes the milestone-A2 deferral): POST /projects/import/ssh now resumes a sealed-sync remote whose fresh checkout holds no plaintext project.db. When the project.db fetch fails — or the fetched bytes are the SBSEALv1 container itself — the route fetches <remote>/.skybolt/project.sealed, unseals it locally with the account key into the project's engine-local .skybolt dir, then proceeds with the unchanged registration/rehome flow and mirrors the artifact header's generation into project_registry.sync_generation.
    • Engine, routes/projects.py: new _import_ssh_sealed_fallback (fetch blob → keyless read_header for the project id → _unseal_project → identity/header consistency check). 423 when the account key is locked (shared _ACCOUNT_KEY_LOCKED_DETAIL from routes/sync.py); 422 for a missing/unopenable artifact or a foreign account key; temp fetch files always cleaned up. git-crypt remotes (readable project.db) take the existing path unchanged.
    • Tests, tests/test_sealed_sync.py (+5, _ssh_fetch faked from a local "remote" dir per the existing patterns): blob-only remote imports end-to-end (cards readable, travelled sync meta intact, registry mirror = header generation, installed DB is real SQLite); container at the project.db path falls back via the magic sniff; locked key → 423 with nothing registered; foreign key → 422; neither file readable → one clean 422. The account_key fixture now also patches routes/projects.
    • Known gap (also noted in overview follow-ups): re-importing a stale sealed remote over newer already-registered local state is not generation-guarded the way sync/unlock is — matches the plaintext import paths; candidate follow-up.
  • Sealed-artifact sync integration (ADR-0041 milestone A2): .skybolt/project.sealed is now the sync format for new projects; git-crypt stays as the legacy path (explicit format:"git-crypt" on init, or auto-detected on an already-initialized git-crypt repo).
    • Engine, services/sync_ops.py: async _seal_project (snapshot refresh → generation bump → meta persistence → registry mirror → quiesce → WAL checkpoint → seal → write blob) and _unseal_project (verify/decrypt → quiesced DB replace). Generation + the account-key-wrapped data key persist in per-project __project_meta (sync_generation / sync_format / sealed_data_key_wrap — raw key never persisted) and travel inside the sealed payload; the generation is mirrored to project_registry.sync_generation (global schema v13 → v14, additive; helpers in store/project_store.py). _checkpoint_project_db now opens via connect_project (plaintext driver) so it works with at-rest encryption active. Both seal and unseal wrap the file-touching section in runtime_hooks.quiesced_project (ADR-0038 H1/H2 seam; no-op until the P1 scheduler registers). SSH: _sync_init_ssh_sealed / _sync_push_ssh_sealed seal locally and upload only ciphertext (_ssh_upload_file) — no git-crypt on the remote; _ssh_read_remote_sealed_header reads the remote blob header keylessly (first 64 KiB, base64). inspect_project_db reports a blob-only checkout as locked with the keyless header project id.
    • Engine, routes/sync.py: sync/status adds format (sealed|git-crypt|none), local_generation, remote_generation, conflict (remote < local, per ADR-0041 anti-rollback). sync/init defaults new projects to sealed (423 when the account key is locked; response carries format/generation, key_path:null). sync/push for sealed projects seals → stages only .skybolt/.gitignore + project.sealed → commits/pushes (private-repo gate and transient ssh_passphrase as before); 409 + conflict:true when the working-tree blob's generation diverges; include_secrets → 422 (vault is A3). sync/unlock sniffs the sealed magic before any git-crypt requirement and unseals in place with the account key (no key file; 409 on rollback attempts; 423 locked). New POST /projects/{id}/sync/resolve {resolution: keep_local|take_remote, message?, push?, confirm_private_repo?, ssh_passphrase?} — keep_local reseals at max(local, remote)+1 and commits/pushes; take_remote unseals the repo blob over the local DB (route connection closed first). Request models SyncInitFormatRequest/SyncResolveRequest live in routes/sync.py for now (schemas.py owned by a parallel workstream — fold in later).
    • Engine, git_sync.py: sealed .skybolt/.gitignore writer (project.db + WAL/SHM ignored, !project.sealed re-included, NO filter lines) + sealed_artifact_path helper.
    • Engine, services/git_ops.py: new stage guard _guard_staged_paths wired into _git_stage_or_unstage and both sync push paths — refuses the plaintext .skybolt/project.db (unless an active git-crypt clean filter covers it), always refuses WAL/SHM, and refuses staged text content matching security/redaction.SECRET_VALUE_PATTERNS (bounded directory expansion; .skybolt/secrets/** and project.sealed exempt; refusals name the path, never the content).
    • Renderer: SyncResumePanel.tsx is format-aware — Sealed/git-crypt badge, generation counters, sealed init copy (recovery-code confirm; no git-crypt availability gate), conflict resolution UI (keep-local / take-remote behind a themed Modal confirmation), an "apply newer remote state" offer when the repo is ahead, include-secrets hidden for sealed; all legacy git-crypt UI (install guidance, key-file unlock, secrets export) preserved for legacy projects. api.ts: ProjectSyncFormat, generation/conflict fields on ProjectSyncStatus, key_path: string | null + generation on ProjectSyncInitResult, resolveProjectSync + ProjectSyncResolveResult, optional format on initProjectSync.
    • Tests: new apps/engine/tests/test_sealed_sync.py (20 — init/push/round-trip on real tmp repos with real crypto, conflict + both resolutions, detect, stage guard, SSH sealed with mocked primitives, persistence); test_git_sync.py sealed gitignore tests; legacy test_sync.py init calls now pass format:"git-crypt" (paths otherwise unchanged, still green including the real-git-crypt E2E); renderer SyncResumePanel.test.tsx rewritten format-aware (22 tests).
    • Deferred to follow-ups: sealed SSH import (/projects/import/ssh fetching the blob), A3 vault + per-secret opt-in, A4 conversion flow, A5 setup-script cleanup.
  • Sealed artifact primitives (ADR-0041 milestone A1): new skybolt_engine/security/sealed_artifact.py implements the SBSEALv1 container — AES-256-GCM payload keyed by a persistent per-project data key, the data key wrapped by the account key in the header (key_id-tagged for rotation/teammate wraps), AAD = the exact header bytes (binds project id + generation + version; the anti-rollback/anti-swap guarantee), keyless read_header() for locked-machine status/conflict checks, strict unknown-version rejection, tolerated unknown wrap types, and length-prefixed project.db + vault payload with a manifest hash double-check. Byte spec in technical-design.md (new). Tests: tests/test_sealed_artifact.py (13 — round-trips, header/ciphertext tamper, wrong-key, version-reject, truncation, non-deterministic ciphertext). Sync integration (seal/unseal in sync_ops, generation persistence, conflict UI, stage guard) is milestone A2 — git-crypt remains the active sync path until then.
  • Extended in-app git-crypt unlock to SSH-machine projects (was local-only). POST /projects/{id}/sync/unlock now branches on the project's execution-target type — same endpoint and same unlockProjectSync client call.
    • Engine: SSH branch in project_sync_unlock (routes/sync.py) delegates to _ssh_remote_git_crypt_unlock (services/ssh_ops.py), which streams the user's local git-crypt key over the SSH-encrypted channel into a unique remote mktemp file (mode 0600), runs git-crypt unlock in the remote repo, and always deletes the temp key afterward (in a finally, even on failure). git-crypt is auto-detected on the remote with a best-effort install and manual guidance fallback; an SSH target with no resolvable remote repo_path is rejected. Key path/contents are never logged or returned. Supersedes the prior "SSH targets rejected — unlock is local-only for now" behavior.
    • Renderer: added a native Browse button to both unlock UIs — the "Unlock Encrypted Data" section in the Sync & Resume panel (SyncResumePanel.tsx) and the "Unlock Encrypted Project" connect-existing modal (SkyboltApp.tsx) — to pick the key file from the local machine via a new desktop select_file Tauri command + pickLocalFile helper. The key path stays transient; Browse shows only in the desktop runtime.
    • Tests: apps/engine/tests/test_sync.py test_sync_unlock_ssh_* (remote success + cleanup, wrong-key failure + cleanup, git-crypt missing → manual guidance, missing local key, no remote path configured).
  • Made the local git-crypt install guidance package-manager-aware and copy-pasteable. _git_crypt_install_command (services/git_ops.py) now detects the local package manager via shutil.which (Windows: Scoop → winget → Chocolatey; macOS: Homebrew; Linux: apt/dnf/yum/pacman/zypper) so the suggested command works on the user's actual setup. GET /projects/{id}/sync/status now returns git_crypt_install_command, and the Sync & Resume panel renders it as a copyable command with a Copy button when git-crypt is missing locally. For SSH projects the panel notes the engine auto-installs git-crypt on the remote during unlock/sync, so nothing needs installing locally. Note: git-crypt unlock runs once per checkout per machine; the key is stored in .git/git-crypt/ and the checkout stays unlocked (Skybolt never re-locks), so a re-unlock is needed only on a fresh clone or a new machine.
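The generation rules in the A2 entries above (a conflict when the remote generation is behind the local one, keep_local resealing at max(local, remote) + 1) can be sketched roughly as follows. This is an illustration under assumptions, not the engine's code: SyncState, classify and next_generation_keep_local are hypothetical names, and the "remote_ahead" label is an assumed stand-in for the "apply newer remote state" offer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncState:
    """Hypothetical holder for the two counters that sync/status reports."""
    local_generation: int
    remote_generation: int


def classify(state: SyncState) -> str:
    """Return 'conflict', 'remote_ahead', or 'in_sync' (labels are illustrative)."""
    if state.remote_generation < state.local_generation:
        # Anti-rollback: a remote blob older than local state is flagged, not applied.
        return "conflict"
    if state.remote_generation > state.local_generation:
        return "remote_ahead"
    return "in_sync"


def next_generation_keep_local(state: SyncState) -> int:
    """keep_local reseals above both sides so the new blob outranks either one."""
    return max(state.local_generation, state.remote_generation) + 1
```

Resealing above both counters means a stale blob cannot later win a comparison against the resolved state.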

Git Operation Surface

  • Added engine helpers for merge (--no-ff default) / merge abort, rebase / rebase abort, cherry-pick / cherry-pick abort, stash save/pop/list, tag create/delete/list, branch delete/rename, fetch, pull (--ff-only default), force push (--force-with-lease, its own action), bounded log, bounded blame attribution, commit amend, and conflict listing (services/git_ops.py), all in the (status, metadata) shape over strict argv runners.
  • Failed merge/rebase/cherry-pick/stash-pop/pull now return structured conflict metadata: state, bounded conflicts[], conflict_count, and a resolution hint naming the matching abort action.
  • Extended local dispatch with a shared per-action handler map and lifted the SSH mutating-Git block: the same handlers run over _run_ssh_git (services/git_exec.py). GitHub gh actions over SSH remain pending (M7).
  • New gates in constants.py: HIGH_AUTHORITY_GIT_ACTIONS now also covers merge, rebase, force_push, branch_delete, tag_delete, cherry_pick, and commit_amend (any history rewrite is high authority in v1); PROTECTED_BRANCH_WRITE_ACTIONS + DEFAULT_PROTECTED_BRANCH_PATTERNS (main, master) force approval for ANY write touching a protected branch, with per-project fnmatch patterns from git_settings_json.protected_branches.
  • routes/git.py resolves both gates before insert, stamps protected_branch into the approval metadata, and extends the approval titles/bodies per action; unknown actions stay 422.
  • New tests/test_git_surface.py (18 tests): every new action on tmp repos, conflict structures, approve-executes / reject-discards, protected-branch defaults + custom globs, and SSH routing asserted on exact argv.
  • Docs: this feature folder created; supersedes the execution-engine doc statement that mutating SSH Git actions stay blocked.
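A minimal sketch of how the two approval gates described above might combine. The constant names come from the entries above, but the contents of PROTECTED_BRANCH_WRITE_ACTIONS, the helper names, and the choice to extend (rather than replace) the default patterns with per-project ones are assumptions for illustration.

```python
from fnmatch import fnmatchcase

# Only the actions named in the changelog; the real set may contain more.
HIGH_AUTHORITY_GIT_ACTIONS = {
    "merge", "rebase", "force_push", "branch_delete",
    "tag_delete", "cherry_pick", "commit_amend",
}
# Assumed contents: the changelog does not list the write actions.
PROTECTED_BRANCH_WRITE_ACTIONS = {"commit", "push", "pull", "merge", "rebase"}
DEFAULT_PROTECTED_BRANCH_PATTERNS = ("main", "master")


def is_protected(branch: str, project_patterns=()) -> bool:
    """Match the branch against default plus per-project glob patterns."""
    patterns = (*DEFAULT_PROTECTED_BRANCH_PATTERNS, *project_patterns)
    return any(fnmatchcase(branch, pattern) for pattern in patterns)


def needs_approval(action: str, branch: str, project_patterns=()) -> bool:
    """High-authority actions always need approval; writes need it on protected branches."""
    if action in HIGH_AUTHORITY_GIT_ACTIONS:
        return True
    return action in PROTECTED_BRANCH_WRITE_ACTIONS and is_protected(branch, project_patterns)
```

For example, needs_approval("push", "release/1.0", ("release/*",)) is true under these assumptions, while the same push to an unprotected feature branch is not gated.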

In-App IDE

  • Fixed: the workspace API now works for SSH-machine projects, not only Local machines. Previously every /projects/{id}/workspace/* endpoint ran local filesystem operations against project_path, which for an SSH target is a remote path — the tree came back empty and reads/writes failed. Each endpoint now branches on target["kind"] ("local" vs "ssh"); the local code paths are unchanged.
  • For SSH targets the engine drives portable remote shell commands over the existing SSH helpers (_run_ssh_shell, _run_ssh_git, _ssh_fetch, _ssh_upload_file): ls -Ap for the tree, wc -c + cat (fetch to a tempfile) for reads/raw, upload-from-tempfile for writes/create, test/mkdir/mv/rm/rmdir for create/rename/delete, git status --porcelain for git-status, and remote rg → git grep → grep -rn for content search. Remote .gitignore is fetched, appended idempotently, and uploaded back.
  • Remote path safety: with no local FS to resolve against, SSH paths are sanitized with the existing _safe_relative_path + .git/.skybolt blocklist and every remote path is shlex.quoted before it reaches a shell. The remote search keeps the same ReDoS rule as local — a user regex only runs under a timeout-bounded engine (remote rg/git grep); otherwise it is refused with 422.
  • Added focused SSH tests in apps/engine/tests/test_app.py (fake-SSH against a temp "remote" dir, mirroring test_sync.py): tree listing + protected-dir exclusion, read content, raw bytes, save via upload, create/rename/delete round-trip, traversal/.git write rejection, idempotent gitignore, git-status (repo + non-repo), and git-grep search + regex refusal.
  • Fixed: .env and other dotfiles/extensionless files (.gitignore, Dockerfile, etc.) were misreported as binary and could not be opened. Path(".env").suffix is "", so the suffix-only text check never matched. _is_text_like now also matches known text filenames and .env/.env.*. More importantly, the read path no longer refuses on an unknown extension: it sniffs the actual bytes and opens anything without a NUL byte as text (_looks_binary), so unknown-but-textual files still open in the editor. Only genuinely binary content (or images) is withheld. Applies to both local and SSH reads; covered by test_workspace_reads_dotfiles_and_unknown_text_as_text.
  • Performance: added a cached bulk file index (GET /projects/{id}/workspace/index) so the IDE loads the whole tree in ONE round-trip instead of one tree call per folder — the big win over SSH, where every call is a fresh ssh connection. The engine builds the flat listing with os.walk (local) or a single portable find (SSH, both -type passes in one connection), excludes .git/.skybolt/ soft-excludes, caps at max_entries (truncated → client falls back to lazy per-folder loading), and caches it for 30s keyed by (project, target), invalidated on create/rename/delete. The frontend builds the tree client-side from the index (folders expand with no fetch) and serves filename search instantly from the in-memory index; content search stays a live scan. SSH connection reuse via ControlMaster was not used (unsupported by the Windows OpenSSH client) — the one-shot index sidesteps it by collapsing N folder round-trips into one.
  • UX: opening a file now shows immediate feedback. The tab appears instantly in a loading state with a spinner (and the editor area shows a spinner) while the content is fetched, instead of a silent lag with no tab; the placeholder fills in when content arrives, or is removed if the read fails. This is most noticeable over SSH where the read is a remote round-trip.
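The dotfile fix above hinges on Path(".env").suffix being empty, with a byte sniff as the fallback. A rough sketch of that two-step check is shown here; the suffix and filename lists are abbreviated assumptions, and the function names only loosely mirror _is_text_like and _looks_binary.

```python
from pathlib import Path

# Abbreviated, assumed lists; the engine's real sets are not shown in the changelog.
TEXT_SUFFIXES = {".py", ".ts", ".md", ".json", ".txt"}
TEXT_FILENAMES = {".gitignore", "Dockerfile"}


def is_text_like(name: str) -> bool:
    """Suffix match first, then known text filenames and .env / .env.* names."""
    path = Path(name)
    if path.suffix.lower() in TEXT_SUFFIXES:
        return True
    # Path(".env").suffix is "", so dotfiles need a filename-based match.
    return (
        path.name in TEXT_FILENAMES
        or path.name == ".env"
        or path.name.startswith(".env.")
    )


def looks_binary(sample: bytes) -> bool:
    """Treat content as binary only when the sampled bytes contain a NUL."""
    return b"\x00" in sample


def should_open_as_text(name: str, sample: bytes) -> bool:
    """Unknown extensions still open as text when the bytes look textual."""
    return is_text_like(name) or not looks_binary(sample)
```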

Machines

  • Retired the runner-era request/response contract in favor of execution_target_id. The engine no longer accepts runner_id in any request schema (terminals, chat threads/runs, browser sessions, source materials, agent sessions, command profiles, Git operations) and no longer writes or echoes runner_id in metadata or API responses; the renderer stopped sending and reading it. Existing DB columns are untouched (additive-only schema). Removed the unused runner enrollment/pairing endpoints. GET /accounts/{id}/runners and POST /accounts/{id}/runners/{id}/revoke remain as documented legacy for the renderer's machine status and delete-machine flows.

Project Import

  • Added services/repo_scan.py: bounded, read-only machine-side repo scan (languages, frameworks, package managers, inferred test/lint/typecheck/build commands, Docker/Compose files, docs presence, git metadata) for Local and SSH targets, reusing the workspace index and SSH helpers.
  • Added services/readiness.py: 22-item readiness audit (present/partial/missing, fixable flags) plus the static project-name-interpolated scaffold template registry (AGENTS.md, /documents skeleton, setup/dev scripts, .env.example).
  • Added routes/readiness.py: POST /projects/{id}/scan, GET /projects/{id}/readiness, POST /projects/{id}/readiness/fixes/preview (no writes, unified diffs), POST /projects/{id}/readiness/fixes/apply (explicit files only, per-file overwrite gate, run-capable role gate, workspace path-safety, approval_requests audit row per apply). Registered in routes/__init__.py.
  • Scan responses carry suggested_command_profiles shaped for the existing command-profiles create endpoint; seeding is the wizard's job.
  • Tests: apps/engine/tests/test_readiness.py (12 tests — detection fixtures covering all six manifest ecosystems, audit fixtures, preview/apply safety, role gating, SSH branch with a fake remote).
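The "no writes, unified diffs" preview step can be illustrated with the standard library's difflib. This is only a sketch: preview_fix is a hypothetical helper, and the a/ and b/ path prefixes and the /dev/null convention for new files are assumptions rather than the route's documented output.

```python
import difflib
from typing import Optional


def preview_fix(rel_path: str, existing: Optional[str], proposed: str) -> str:
    """Return a unified diff for one scaffold file without touching disk.

    existing is None when the file does not exist yet.
    """
    old_lines = [] if existing is None else existing.splitlines(keepends=True)
    new_lines = proposed.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="/dev/null" if existing is None else f"a/{rel_path}",
            tofile=f"b/{rel_path}",
        )
    )
```

An empty string means the proposed content matches what is already on disk, so there is nothing to apply for that file.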

Public Site

  • Created the public site (Pillar 3 milestone P-M9, ADR-0049): apps/site, a new pnpm workspace package @skybolt/site — Astro 5.x, TypeScript (astro check via @astrojs/check), static output only (output: 'static'), fonts self-hosted via @fontsource/space-grotesk + @fontsource/inter (no Google Fonts CDN).
  • Pages v1: landing (static SVG/CSS interpretation of the reference three.js Atmosphere → Void hero — build-time seeded starfield + accent agent constellation; island upgrade path documented), /features (six cards sourced from the feature overviews), /changelog (generated at build time from documents/features/*/changelog.md by src/lib/changelog.ts — drift-tolerant ## YYYY-MM-DD parser, grouped by date then feature, newest first), /pricing (Free vs Premium placeholder, "coming soon", no billing), and a waitlist section (stub submit handler; documented TODO pointing at the future apps/api waitlist endpoint — nothing is transmitted).
  • Visual identity per documents/design-system.md: void base, #0077e4 accent, 2px radii, glass panels, tracked uppercase micro-labels, dense technical layout; responsive and prefers-reduced-motion-gated animation.
  • Workspace/CI wiring: added apps/site to pnpm-workspace.yaml (and approved sharp builds for Astro), added root site:build script, appended the site CI job (astro check + pnpm --filter @skybolt/site build) to .github/workflows/ci.yml.
  • Follow-ups: Caddy serving block for skybolt.ai on the ADR-0043 droplet (documents/features/cloud-deployment/), real waitlist endpoint in apps/api + form wiring, optional three.js hero island, documents/features/index.md entry (deferred — concurrent-wave file ownership).
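The build-time changelog grouping (date, then feature, newest first) can be sketched as below. The real parser lives in src/lib/changelog.ts and is TypeScript; this Python version only illustrates the idea, and the regex, the "- " bullet convention, and the build_changelog name are assumptions about what "drift-tolerant" covers.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Tolerates missing or extra spaces after "##" and trailing text after the date.
DATE_HEADING = re.compile(r"^##\s*(\d{4}-\d{2}-\d{2})\b")


def build_changelog(sources: Dict[str, str]) -> Dict[str, Dict[str, List[str]]]:
    """Group bullet lines by date, then by feature name, newest date first."""
    grouped = defaultdict(lambda: defaultdict(list))
    for feature, text in sources.items():
        current_date = None
        for raw in text.splitlines():
            line = raw.strip()
            match = DATE_HEADING.match(line)
            if match:
                current_date = match.group(1)
            elif current_date and line.startswith("- "):
                grouped[current_date][feature].append(line[2:])
    # ISO dates sort lexicographically, so a reverse sort yields newest first.
    return {date: dict(grouped[date]) for date in sorted(grouped, reverse=True)}
```

Bullets that appear before any date heading are dropped in this sketch; a real parser might prefer to report them instead.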

SSH Machines

  • Hardened SSH endpoint handling: host/user values are now validated where the connection spec is built (SshMachineSpec.__post_init__ in apps/engine/skybolt_engine/adapters/ssh.py), so every ssh_ops path (probe, import, shell, git, file transfer, remote git-crypt unlock) automatically rejects a host/user that ssh could misread as an option flag (e.g. -oProxyCommand=...) instead of only the probe adapter. /projects/import/ssh now returns a clean 422 for such values rather than a 500. No shell-injection was possible (argv lists, no shell=True); this closes the lower-severity flag-injection gap flagged by the remote git-crypt unlock security review. Tests: apps/engine/tests/test_adapters.py and apps/engine/tests/test_sync.py.
  • Added a Test connection button to the saved SSH Machine card (ProjectEnvironmentPanel in apps/web/src/app/SkyboltApp.tsx) that re-runs the live GET /projects/{id}/machine/probe on demand, with visible probe status (Checking / Online / Offline) and error text. Previously the renderer only probed on entering the Environment tab or on a settings change, so a Machine that was offline at setup stayed visually offline after the user fixed SSH and had to be deleted and re-added. Frontend-only; the backend probe route is unchanged. Test: apps/web/src/app/SkyboltApp.test.tsx "re-probes the saved ssh machine when Test connection is clicked".
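Validating at construction time, as the hardening entry above describes, means every caller that builds a spec gets the check for free. A minimal sketch follows, with assumed fields (the real SshMachineSpec is not shown here) and an extra whitespace check that goes beyond what the changelog states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SshMachineSpec:
    """Illustrative stand-in; field names and defaults are assumptions."""
    host: str
    user: Optional[str] = None
    port: int = 22

    def __post_init__(self) -> None:
        for label, value in (("host", self.host), ("user", self.user)):
            if value is None:
                continue
            # A leading "-" could be parsed by ssh as an option flag.
            # The empty/whitespace checks are an added assumption.
            if not value or value.startswith("-") or any(c.isspace() for c in value):
                raise ValueError(f"invalid ssh {label}")

    def destination(self) -> str:
        """Return user@host, or just host when no user is set."""
        return f"{self.user}@{self.host}" if self.user else self.host
```

A route can then map the ValueError to a 422 instead of letting it surface as a 500; the error message deliberately omits the rejected value.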

Agents Execution

  • Designed the multi-agent orchestration and conflict-prevention architecture and recorded it as ADR-0036 and the agents-execution feature docs (overview.md, technical-design.md, storage-and-portability.md, test-plan.md). No code changed this round.
  • Chose SQLite (WAL + busy timeout) + an in-process asyncio event bus + the existing renderer WebSocket as the coordination substrate, and rejected the earlier Redis proposal as incompatible with local-first/offline-after-install and the "engine owns runtime / no external services" rules.
  • Defined the executor (an in-engine asyncio scheduler) as the missing piece that advances queued agent sessions: claim → prepare worktree → launch CLI seat with a prompt package → heartbeat → complete → advance card → unblock dependents, with TTL-based lease expiry and restart/cross-machine recovery.
  • Defined the four-tier conflict-prevention model: per-card worktree isolation, scope-disjoint scheduling over a dependency graph, an advisory overlap radar (escalate, do not deadlock), and an interface-first Shared Artifact Registry — the primary defense against duplicate helpers/types.
  • Set concurrency as N-agnostic with a default of 3-5 and admission control over global/machine/provider/disk capacity; documented 30 concurrent agents on one host as unrealistic.
  • Recorded the per-project storage and cross-machine portability design as ADR-0037 and storage-and-portability.md: a global engine database plus a per-project .skybolt/project.db, split into machine-agnostic (portable) and machine-keyed (reconciled-on-open) rows, designed to be encrypted before being committed.
  • Updated implementation-plan.md to replace the Redis-era assumptions with the SQLite direction, the per-project storage split, and the phased build (P0-P4).
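Since this round was design-only, the following is purely a sketch of how a claim step with TTL-based lease expiry could look on SQLite with WAL and a busy timeout. The agent_sessions table, its columns, and the function names are hypothetical and not taken from ADR-0036.

```python
import sqlite3
import time
from typing import Optional


def connect(path: str) -> sqlite3.Connection:
    """Open the DB in autocommit mode with WAL and a busy timeout (assumed values)."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agent_sessions ("
        " id TEXT PRIMARY KEY, status TEXT NOT NULL,"
        " lease_owner TEXT, lease_expires_at REAL)"
    )
    return conn


def claim_next(conn: sqlite3.Connection, owner: str,
               ttl_s: float = 60.0, now: Optional[float] = None) -> Optional[str]:
    """Claim one queued session, or one whose lease has expired; None if nothing is claimable."""
    now = time.time() if now is None else now
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id FROM agent_sessions"
            " WHERE status = 'queued'"
            "    OR (status = 'running' AND lease_expires_at < ?)"
            " ORDER BY id LIMIT 1",
            (now,),
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE agent_sessions SET status = 'running',"
                " lease_owner = ?, lease_expires_at = ? WHERE id = ?",
                (owner, now + ttl_s, row[0]),
            )
        conn.execute("COMMIT")
        return None if row is None else row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

A heartbeat would extend lease_expires_at for the owning executor; a crashed executor simply stops renewing, and its session becomes claimable again once the TTL passes.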

Desktop App

  • Made .\scripts\setup.ps1 install missing Windows prerequisites by default, with -NoInstallPrereqs available for check-only environments.
  • Made .\scripts\dev.ps1 the documented Windows desktop dev command, with -Desktop retained only as a compatibility alias.

Encrypted Project Sync

  • Added read-only project detection so connecting an existing folder auto-offers to import an existing Skybolt project instead of forcing a re-set-up on a second computer.
    • Engine: POST /projects/detect (routes/projects.py detect_project), request model ProjectDetectRequest (schemas.py), and filesystem helper inspect_project_db (services/sync_ops.py). Session-gated like POST /projects/import, no account_id; HTTP 200 for all detection outcomes; reports none/locked/readable/unreadable from the project.db header plus already_imported against this machine's project_registry. Never registers or mutates anything.
    • Renderer: completeProjectGitSetup (apps/web/src/app/SkyboltApp.tsx) calls detectProject before the bare existing-repo connect. readable + not imported → "Existing Skybolt Project Found" modal (Import vs Connect Without Importing); readable + already imported → switch to it; locked → show the git-crypt unlock hint and stop; none/unreadable (or a failed detect) → proceed with the normal connect. Client wrapper detectProject + ProjectDetectResult in apps/web/src/app/api.ts. Clone / new-repo providers unchanged.
    • Tests: apps/engine/tests/test_sync.py test_detect_* (4 tests).
  • Added in-app git-crypt unlock so an encrypted checkout can be decrypted without a terminal.
    • Engine: POST /projects/{id}/sync/unlock (routes/sync.py project_sync_unlock), request model SyncUnlockRequest (schemas.py). Session-gated, project_admin only. Request { key_path, repo_path? }; repo_path optional (resolved from the project's default execution target when omitted; SSH targets with no explicit repo_path were rejected at the time — unlock was local-only, since extended to remote SSH unlock on 2026-06-09 above). Runs git-crypt unlock <key_path> in the repo (argument list, no shell; key contents never logged). Response { unlocked, repo_path, detail }.
    • Renderer: unlockProjectSync (apps/web/src/app/api.ts) wired into (a) the connect-existing detect flow — a locked detection now opens an "Unlock Encrypted Project" modal (SkyboltApp.tsx): enter key file path → unlock → auto re-detect → import prompt; and (b) the "Sync & Resume" panel as an "Unlock Encrypted Data" section (apps/web/src/app/project/tabs/SyncResumePanel.tsx), shown when in_repo and git_crypt_initialized.
    • Tests: apps/engine/tests/test_sync.py test_sync_unlock_* (happy path, explicit repo_path, missing key file, requires git repository); renderer tests in SkyboltApp.test.tsx and SyncResumePanel.test.tsx.
  • Made project delete non-destructive for repo-hosted databases.
    • Engine: DELETE /projects/{id} (routes/projects.py delete_project) now removes the engine-local project.db only for non-repo-hosted projects. When .skybolt lives inside a git repo (detected via <.skybolt parent>/.git), it de-registers the project but leaves the git-tracked .skybolt/project.db on disk so the project can be re-imported via the connect-existing detect flow. Docstring note added to _remove_project_db (store/project_store.py).
    • Tests: apps/engine/tests/test_sync.py test_delete_preserves_repo_hosted_db, test_delete_removes_non_repo_hosted_db.
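The header-based detection outcomes (none / locked / readable / unreadable) might be derived from the first bytes of project.db roughly as follows. The SQLite magic string is the standard file header; mapping both the \x00GITCRYPT prefix and an SBSEALv1 container at offset 0 to "locked" is an assumption, and inspect_header is a hypothetical simplification of inspect_project_db.

```python
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"  # standard 16-byte SQLite header
GITCRYPT_MAGIC = b"\x00GITCRYPT"       # git-crypt ciphertext prefix
SEALED_MAGIC = b"SBSEALv1"             # sealed container magic (offset 0 assumed)


def inspect_header(db_path: Path) -> str:
    """Classify a project.db by its leading bytes, without opening it as SQLite."""
    if not db_path.is_file():
        return "none"
    with db_path.open("rb") as handle:
        head = handle.read(16)
    if head.startswith(SQLITE_MAGIC):
        return "readable"
    if head.startswith(GITCRYPT_MAGIC) or head.startswith(SEALED_MAGIC):
        return "locked"
    return "unreadable"
```

Reading only the header keeps detection read-only and avoids the raw database errors that come from attaching ciphertext as SQLite.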

Execution Engine

  • Fixed terminal deletion so subprocess-backed terminals are spawned in their own process group/session and Kill/Kill All terminates the group before deleting the terminal row. This prevents web servers launched from a terminal shell from surviving after Skybolt clears the panel.
  • Changed renderer terminal capacity handling to ignore legacy machine-local runner capacity and show/enforce only project terminal capacity.
  • Disabled config-defined SSH port forwarding on normal Execution Engine SSH terminal, probe, command, and Git calls so connecting to a saved SSH config entry does not automatically bind local preview ports.
  • Added a project-scoped Machine probe route so the renderer can display SSH Machine connectivity status without opening a terminal.
  • Fixed desktop-engine terminal creation for SSH Execution Targets. SSH terminals now spawn a structured OpenSSH argv behind the existing terminal WebSocket bridge and use the remote project path as the SSH startup directory command instead of passing it through local filesystem validation.
  • Fixed Dev Ops setup so the saved Project Execution Target appears as the selectable Machine, including SSH Machines that do not have a legacy runner row.
  • Added Connect Existing GitHub Repo and Connect Existing Azure DevOps Repo setup paths that save the selected Machine project path and provider without cloning, fetching, checking out, or changing files.
  • Made existing-checkout setup use a single repo status readiness probe and ignore stale clone failures so it does not briefly fail and then swap to ready after background metadata catches up.
  • Kept SSH existing-checkout readiness lightweight by skipping optional origin/HEAD and remote enumeration during the initial status probe, while preserving the selected Git provider from saved setup metadata.
  • Added read-only SSH Git metadata probes for Dev Ops status, branches, commit history, compare, and GitHub auth checks through quoted OpenSSH commands.
  • Raised SSH Git read command timeout to two minutes so slower SSH connections do not fail basic existing-repo metadata probes such as git remote -v.
  • Detached stdin for SSH Git subprocesses so OpenSSH cannot inherit terminal input while the engine is collecting read-only metadata.
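Several of the SSH entries above (no config-defined forwarding, quoted remote commands, a two-minute timeout, detached stdin) can be combined in one sketch. The helper names are hypothetical, and using the OpenSSH options ClearAllForwardings and BatchMode is an assumption about how the engine achieves this; the changelog does not name the flags.

```python
import shlex
import subprocess
from typing import List, Sequence

SSH_GIT_READ_TIMEOUT_S = 120  # "two minutes" per the entry above


def build_ssh_git_argv(destination: str, repo_path: str,
                       git_args: Sequence[str], port: int = 22) -> List[str]:
    """Build a structured ssh argv; only the remote command string is shell-quoted."""
    remote_cmd = f"cd {shlex.quote(repo_path)} && {shlex.join(['git', *git_args])}"
    return [
        "ssh",
        "-o", "ClearAllForwardings=yes",  # ignore forwardings from the ssh config
        "-o", "BatchMode=yes",            # never prompt while collecting metadata
        "-p", str(port),
        destination,
        remote_cmd,
    ]


def run_ssh_git(argv: List[str]) -> subprocess.CompletedProcess:
    """Run without a shell, with stdin detached so ssh cannot read terminal input."""
    return subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=SSH_GIT_READ_TIMEOUT_S,
        check=False,
    )
```

The destination is assumed to have been validated already, so it cannot be mistaken for an option flag.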

In-App IDE

  • Added an in-app IDE in the Files tab: browse, open, edit, create, rename, and delete files and folders across the project's git working tree (project_path), with an open-files tab bar, dirty tracking, and an unsaved-changes guard. Replaces the old assets/-only Files page scope.
  • Chose Monaco (the VS Code editor core) as the editor, with self-hosted offline workers: the editor and language workers are bundled by Vite and served from the app — nothing from a CDN — so the IDE works fully offline and behaves identically across Linux/macOS/Windows in the desktop webview. Monaco was chosen over a lighter editor for room to expand (multi-language intelligence, diffs) without re-platforming.
  • Scoped the IDE to the whole repo working tree, with .git/ and .skybolt/ hidden from the tree and write-protected on every read/write/rename/delete/gitignore path. Source stays local; no new sync or exfiltration (ADR-0034/ADR-0035 privacy posture preserved).
  • Added git-status badges (from git status --porcelain; non-repo trees report no statuses rather than erroring) and a tiered filename + content search (ripgrep → git grep → a bounded os.walk floor that always works, no external indexer).
  • Added an auth-aware image viewer (Blob object-URL with the bearer credential, revoked on unmount) and a right-click "Add to .gitignore" that appends idempotently as a plain file edit, not a high-authority Git action.
  • Added the backend workspace API (/projects/{id}/workspace/*: tree, git-status, file, raw, entries[create], rename, entries[delete], search, gitignore) in apps/engine/skybolt_engine/app.py, with reads gated to project membership and writes gated to run-capable roles (_project_can_run). Path safety reuses _safe_relative_path + _path_inside plus a .git/.skybolt blocklist.
  • Added the frontend apps/web/src/app/project/ide/ module (CodeWorkspace, useCodeWorkspace, file-tree, context menu, open-files bar, editor pane, lazy Monaco editor, image viewer, search panel, new-entry dialog, fileLanguage, monacoSetup), wired through FilesTab/SkyboltApp, with new workspace API client functions/types in api.ts and new icons. Raised the Vite workbox maximumFileSizeToCacheInBytes to 8 MB so service-worker precache generation tolerates Monaco's code-split local worker chunks.
  • Setup scripts (scripts/setup.ps1, scripts/setup.sh) now ensure git (required) and install ripgrep (optional, for fast/regex IDE search) so search uses the ripgrep engine on a freshly set-up machine; the engine resolves them off the OS PATH via cli_resolution.resolve_local_cli, falling back to git-grep/os.walk when rg is absent. documents/setup.md updated to match.
  • Recorded the decision as ADR-0039 (in-app IDE workspace file access).
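The workspace path rules above (relative paths only, no traversal, .git and .skybolt off limits) could be checked along these lines. This is a sketch, not the engine's _safe_relative_path: the function name, the rejection of backslashes, and blocking the protected names at any depth are assumptions.

```python
from pathlib import PurePosixPath

BLOCKED_NAMES = {".git", ".skybolt"}


def safe_relative_path(raw: str) -> PurePosixPath:
    """Return a normalized project-relative path or raise ValueError."""
    if not raw or "\x00" in raw or "\\" in raw:
        raise ValueError("invalid path")
    path = PurePosixPath(raw)
    if path.is_absolute() or not path.parts:
        raise ValueError("path must be relative and non-empty")
    if ".." in path.parts:
        raise ValueError("path escapes the project root")
    # Assumed: protected names are rejected at any depth, not only at the root.
    if any(part in BLOCKED_NAMES for part in path.parts):
        raise ValueError("protected directory")
    return path
```

Working on pure paths keeps the check usable for SSH targets, where there is no local filesystem to resolve against.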

Machines

  • Fixed Machine and Dev Ops settings appearing unconfigured after every app restart. The project listing (/auth/me, /projects) read display-only rows from the global project_registry, which no longer carries per-project settings after the ADR-0037 storage split, so isolation_policy, git_settings, and the project paths came back null on launch and the app prompted re-setup. The saved values were never lost — they live in the per-project project.db — so listings now overlay the authoritative settings row from each per-project database (_list_user_projects).
  • Fixed Dev Ops Machine labels so a saved SSH Project Execution Target without a legacy runner row appears as available instead of stale.
  • Removed the user-facing legacy machine-local terminal capacity from the renderer. Terminal badges, Machine rows, overview readiness, and new-terminal gating now use only project terminal capacity when the Execution Engine reports it.
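
The settings-overlay fix above can be sketched as a small merge step. The function name, the key list, and the load_settings callable are all hypothetical; the real _list_user_projects reads each per-project project.db, which is replaced here by a callable so the sketch stays self-contained.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

# Illustrative subset of the fields the entry says came back null.
SETTINGS_KEYS = ("isolation_policy", "git_settings")

Row = Dict[str, Any]


def overlay_project_settings(
    registry_rows: Iterable[Row],
    load_settings: Callable[[str], Optional[Row]],
) -> List[Row]:
    """Overlay authoritative per-project settings onto display-only registry rows.

    load_settings(project_id) is assumed to return the settings row from that
    project's own database, or None when the database cannot be read.
    """
    merged: List[Row] = []
    for row in registry_rows:
        project = dict(row)  # copy so the registry rows are not mutated
        settings = load_settings(project["id"])
        if settings is not None:
            for key in SETTINGS_KEYS:
                if key in settings:
                    project[key] = settings[key]
        merged.append(project)
    return merged
```

A project whose database cannot be read keeps its registry values, so the listing still renders for the remaining projects.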

SSH Machines

  • Disabled SSH config-defined port forwarding for normal SSH terminal, probe, command, and Git operations so saved config entries do not accidentally bind preview ports such as 127.0.0.1:8000.
  • Added a project Machine probe endpoint and wired the Environment tab SSH tag to show green when the saved SSH Machine can connect and red when the probe fails.
  • Fixed SSH Machine terminal creation so the Execution Engine launches an OpenSSH PTY with the saved SSH config entry or host/user/port and starts in the remote project path instead of validating the remote path as a local folder.
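
One way to get the two behaviours above (no config-defined forwards, and a terminal that starts in the remote project path) is to build the OpenSSH command line explicitly. The sketch below is an assumption about approach, not the engine's implementation: build_ssh_argv is a hypothetical helper, and it relies on the standard OpenSSH option ClearAllForwardings, which clears forwardings set in config files or on the command line.

```python
import shlex
from typing import List, Optional


def build_ssh_argv(
    host: str,
    user: Optional[str] = None,
    port: Optional[int] = None,
    remote_dir: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: argv for a probe or an interactive SSH terminal.

    host may be a saved SSH config entry name or a plain hostname.
    """
    if host.startswith("-"):
        raise ValueError("host must not look like a command-line option")
    # Drop LocalForward/RemoteForward/DynamicForward entries from ssh_config so
    # a saved entry cannot bind a local preview port as a side effect.
    argv = ["ssh", "-o", "ClearAllForwardings=yes"]
    if port is not None:
        argv += ["-p", str(port)]
    if user is not None:
        argv += ["-l", user]
    if remote_dir is None:
        return argv + [host]
    # -t requests a remote PTY; the directory is quoted for the remote shell
    # and is never checked against the local filesystem.
    remote_command = f'cd -- {shlex.quote(remote_dir)} && exec "$SHELL" -l'
    return argv + ["-t", host, remote_command]
```

For example, build_ssh_argv("devbox", remote_dir="/srv/my app") yields a command that changes into the quoted remote directory and then replaces itself with the user's login shell.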

Browser Redesign

  • Removed the sessions rail New button and made Kill All full width.
  • Moved New and Kill All into the sessions rail and removed the visible top Browser toolbar.
  • Removed the visible target/save-target controls from the desktop Browser UI, added inline session rename, changed session close to an X, and removed the success status-message copy.
  • Removed internal session status and browser-kind tags from the desktop Browser UI.
  • Replaced the desktop Browser console panel with a left-side vertical sessions rail and DevTools inspection.
  • Routed embedded Browser new-window link requests back into the Skybolt Browser pane instead of Tauri shell.open.
  • Added a DevTools action for the embedded Tauri Browser webview.
  • Switched the visible desktop Browser pane from renderer iframe embedding to a native Tauri child webview so external sites with frame restrictions can load inside Skybolt.
  • Added Execution Engine browser target/session endpoints for local Chromium launch.
  • Updated the Browser tab to use desktop Chromium sessions with a local console panel.
  • Created Browser Redesign feature docs.
  • Documented local browser automation and SSH-forwarded URLs as the active direction.
  • Demoted runner WebRTC streaming to fallback.

Desktop App

  • Fixed desktop sidecar preparation so it creates Tauri's expected externalBin before Tauri's Cargo build validates bundled resources.
  • Added the standard Tauri desktop icon set generated from the existing Skybolt web app icon so Windows resource generation can complete.
  • Aligned the Tauri desktop dev URL with Vite's localhost bind so Tauri stops polling 127.0.0.1:50000 while Vite is listening on localhost:50000.
  • Set the desktop Cargo default binary and removed the extra literal -- from the Vite dev command so Tauri launches skybolt-desktop and Vite honors --strictPort.
  • Moved Skybolt-owned local dev ports to 50000 and above so lower ports remain available for SSH development sites and forwarded project previews.
  • Fixed the desktop sidecar reload flag so the Rust sidecar accepts --reload and forwards it to the Python Execution Engine.
  • Routed the desktop runtime directly into the app shell so launch goes to login, first-run setup, or the dashboard based on the normal auth session check instead of showing the marketing site.
  • Wired desktop renderer API calls to the Tauri-managed engine session and added local engine auth/session endpoints so desktop development does not require the optional cloud API on localhost:50002.
  • Added local engine project CRUD, health, runner listing, and empty runtime collection endpoints so creating a project in the desktop app no longer returns {"detail":"Not Found"}.
  • Added Windows PowerShell setup/dev scripts for the desktop flow and documented .\scripts\setup.ps1 plus .\scripts\dev.ps1.
  • Wired desktop development to set the Python engine reload flag while keeping Tauri/Vite hot reload as the default renderer path.
  • Created the desktop-first feature doc set.
  • Documented Tauri as the active shell direction.
  • Documented the expected bash scripts/dev.sh, pnpm desktop:dev, and pnpm desktop:build startup paths.
  • Added the apps/desktop Tauri scaffold with engine sidecar launch, per-launch token creation, desktop app-data database wiring, and shutdown cleanup.
  • Documented the temporary sidecar health-stub fallback while full Python engine bundling is finished.

Execution Engine

  • Fixed Codex/Claude Code Project Board planning sessions failing to enter plan mode. Root cause: the programmatic terminal-input queue advanced inside an impure setState updater that mutated a backlog ref. Because the app runs under React.StrictMode, which double-invokes updaters, the second invocation saw the already-popped backlog and committed a state with the item dropped, so queued inputs (/plan, Enter, the planning prompt) were lost intermittently — /plan typed but never submitted, or not typed at all. The queue is now a single per-terminal FIFO array in state with pure updaters and a monotonic id counter (no Date.now() id collisions), so every queued input is delivered in order. Manual typing was unaffected because it bypasses the queue. This also fixes the same drop for any other programmatic terminal input (e.g. send-to-console). Regression guard: the planning CLI test now renders under StrictMode (it times out against the old queue).
  • Planning now also waits, best-effort, for the CLI to finish booting: it looks for a recognized banner or settled output and falls back to a fixed delay, so /plan is always typed even when the raw-ANSI banner isn't recognized. It then captures the /plan echo marker right before typing, lets the slash-command autocomplete menu render, and submits with Enter plus a second spaced Enter (a redundant empty Enter is ignored by the CLI). These steps harden the typed sequence on top of the queue fix.
  • Raised the Execution Engine to Python 3.14 and added CLI reload support for desktop development.
  • Created the Execution Engine feature docs.
  • Documented apps/runner as migration source for apps/engine.
  • Documented neutral engine naming and local-first privacy boundaries.
  • Added the apps/engine Python package scaffold with FastAPI loopback API, token-protected session/probe endpoints, fresh SQLite schema, metadata redaction helpers, and Local/SSH adapter probes.
  • Added local project board card persistence and CRUD endpoints to the Execution Engine.
  • Removed the renderer Project Board planning drafts strip; the board now loads direct card data without planning-draft reads.
  • Added local built-in agent persona persistence plus the seed/list endpoints used by the Agents tab.
  • Expanded the built-in persona library to 50 business and software delivery roles, with reseeding support that refreshes existing built-in rows.
  • Added desktop-engine chat thread persistence and terminal request persistence for the Chats page.
  • Added a compatibility Local Machine response for the current renderer while the UI still consumes legacy runner-shaped API fields.
  • Added local model provider, catalog model, and CLI seat persistence so Chats can offer only Settings-enabled chat targets.
  • Added an engine-local terminal broker, terminal attach endpoint, and WebSocket bridge so Codex/Claude Code chat terminals move from requested to live without the legacy runner adapter.
  • Added non-destructive local schema repair for older terminal_sessions tables so terminal creation does not fail on existing desktop databases.
  • Reconciled local terminal rows against the engine terminal broker so CLIs that exit before attach are marked exited with an actionable install/environment message instead of retrying stale WebSocket attachments.
  • Added local project-agent persistence so personas can be added to Active Agents in desktop mode.
  • Added an Agents tab persona detail view and first-name active-agent display names for built-ins.
  • Added desktop-engine Dev Ops Git operation and approval routes so the Dev Ops page can run local Git status/history/branch/stage operations and route commit/push/PR creation through approvals instead of hitting placeholder routes that returned 405.
  • Refined the Dev Ops Git workflow with branch checkout, setup-only GitHub repo clone controls, combined commit/push approval, and broader Windows GitHub CLI discovery for the desktop engine.
  • Added Dev Ops branch creation and adjusted the commit controls so stage, unstage, commit, push, and commit/push share one action row.
  • Moved Dev Ops branch checkout and branch creation controls into a dedicated top-right Branches panel.
  • Added select-all for changed files, made Push a primary action, and surfaced GitHub CLI install plus auth script blocks in the GitHub auth warning.
  • Changed GitHub CLI install links to explicit "GitHub CLI install instructions" buttons that open the install page directly.
  • Clarified Windows GitHub CLI auth scripts by labeling PowerShell and Git Bash separately and avoiding Bash $PATH syntax that PowerShell parses badly.
  • Removed the redundant GitHub Auth button, changed unready Dev Ops status to red "Not Ready", and added explicit GitHub vs Azure DevOps setup choices.
  • Changed new/unready projects to show only the Dev Ops setup flow until the repo verifies as ready.
  • Gated fresh Dev Ops setup behind an Environments Machine, made setup require Machine, provider, repo, default branch, and branch prefix, and added the Complete Setup action.
  • Changed fresh GitHub setup to use the repository default branch from GitHub CLI metadata when available, while keeping feature/ as the default branch prefix.
  • Changed fresh GitHub setup to load available branches for the selected repo and show Default branch as a dropdown that starts on GitHub's marked default branch.
  • Removed manual Load Repos and Load Branches buttons from Dev Ops setup; repositories and branches now refresh automatically as the setup choices change.
  • Added a modern native Windows/Linux folder browser for local Machine setup and kept Complete Setup blocked when the selected project folder already exists or may contain files.
  • Changed Dev Ops setup to use the selected Machine's saved project path as the clone target and the default sibling worktree root without showing project folder or worktree root inputs. The Git settings card also hides worktree root and derives the same default on save.
  • Changed ready Dev Ops Git Setup to hide repo path, show branch defaults as dropdown data, and refresh branch metadata after branch creation or checkout. The Git Setup card no longer repeats ready/provider tags.
  • Added explicit inline loaders for setup clone and branch checkout operations.
  • Removed Verify Setup, changed existing project folders to a Yes/No confirmation modal, and kept blocked or failed clone messages from duplicating as page-level errors.
  • Blocked all Dev Ops actions when a configured project no longer has a Machine, without forcing a full Dev Ops re-setup, and added automatic repo readiness checks when the Dev Ops tab opens.
  • Added the renderer AI Planning Session Mission Room for the Project Board, using durable planning chat threads and staged card creation without adding backend planning-draft schema.
  • Added Planning Session model selection across enabled provider models and Codex/Claude Code seats.
  • Changed Codex and Claude Code CLI seats from account/user scope to project/user scope so new projects start with independent local CLI settings.
  • Blocked Dev Ops Git status operations until the project has an explicit repo path so fresh projects cannot inspect the Skybolt checkout through execution-target or engine-cwd fallback.
  • Fixed local terminal output streaming so Windows shells and short-running commands render prompt bytes before the process exits instead of waiting for a full pipe buffer.
  • Improved planning terminal startup errors so failed or non-live terminal sessions surface the engine status message instead of continuing into a generic attach failure.
  • Added Windows PTY support through pywinpty so local interactive CLIs such as Codex run with a real terminal instead of exiting with "stdin is not a terminal".
  • Added planning-seat availability checks for Codex and Claude Code so missing local CLIs are blocked before Skybolt opens a terminal session.
  • Fixed Local Machine CLI capability probing so the Project Board sees Codex and Claude Code when they are installed in common Windows locations even if the desktop sidecar starts with a narrower PATH than the user's terminal.
  • Sanitized terminal-control output before showing Codex/Claude Code planning responses in the Mission Room transcript, and launched Codex with --no-alt-screen to reduce full-screen TUI redraws.
  • Changed Codex/Claude Code planning sessions to show an embedded live console in the Mission Room instead of relying only on sanitized terminal text in chat bubbles.
  • Removed the guided answer controls from CLI-backed planning sessions so the embedded terminal can fill the Mission Room console panel.
  • Restored automatic Mission Room prompt send for Codex/Claude Code planning sessions by queuing /plan, waiting until it is visible in the live console, sending Enter, then sending the generated planning prompt in order.
  • Added a Mission Room planning session manager with Start New, Resume, Start Over, and Delete actions. Planning sessions remain hidden from normal Chats and resume from local-only planning_state snapshots stored in existing chat thread metadata.
  • Changed Mission Room Start New so the model selected on the manager screen is used immediately, with no duplicate model dropdown or second start button in the session view, and added Back from the active session to the manager.
  • Added Planning Session cleanup for attached CLI terminals so deleting or starting over a Codex or Claude Code planning session stops stale planning terminals from piling up.
  • Wrapped multiline planning terminal prompts in bracketed paste markers so Codex and Claude Code TUIs receive one pasted prompt instead of scrambled raw keystrokes.
  • Fixed terminal Ctrl+V paste in the desktop renderer by reading clipboard text during the key gesture while keeping browser paste events as a fallback.
  • Changed blank Windows terminal launches to prefer detected PowerShell over %COMSPEC%, changed Linux/macOS blank terminal launches to default to Bash, and wired terminal creation to use the selected Machine's default shell override.
  • Applied terminal attach/resize dimensions to the Windows PTY and refreshed the xterm viewport after output writes so new terminal cursors render at the live prompt before first input.
  • Removed the synthetic metadata-only notice from the xterm buffer so Windows shells that use absolute cursor positioning, such as PowerShell with PSReadLine, do not render first input on the wrong row.
  • Filtered xterm-generated terminal response sequences, including device-attribute replies such as ESC[?1;2c, before forwarding renderer input to the PTY so PowerShell startup probes do not appear as typed continuation text.
  • Added project-local Brain context packages that include Brain-enabled source materials, general chats, and Files assets. The Files page manages project assets under the selected assets folder, defaults new uploads to Brain Off, lets users mark individual files in or out of Brain, and stays blocked until the project has a saved Machine project folder.
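
Two of the terminal-input entries above, bracketed paste for multiline prompts and filtering of terminal-generated replies, can be illustrated with a short sketch. The real logic most likely lives in the renderer; this Python version only demonstrates the escape sequences involved, and both function names are hypothetical. It handles device-attribute replies only, not every response a terminal emulator can generate.

```python
import re

# Standard xterm bracketed-paste delimiters.
PASTE_START = "\x1b[200~"
PASTE_END = "\x1b[201~"

# Primary (ESC [ ? ... c) and secondary (ESC [ > ... c) device-attribute
# replies, e.g. ESC[?1;2c from the changelog entry.
DEVICE_ATTRIBUTE_REPLY = re.compile(r"\x1b\[[?>][0-9;]*c")


def wrap_bracketed_paste(text: str) -> str:
    """Wrap multiline text so a TUI receives it as one paste, not as keystrokes."""
    if "\n" not in text and "\r" not in text:
        return text  # single-line input is sent unchanged
    # Remove embedded delimiters so the text cannot end the paste early.
    cleaned = text.replace(PASTE_START, "").replace(PASTE_END, "")
    return PASTE_START + cleaned + PASTE_END


def strip_terminal_replies(data: str) -> str:
    """Drop device-attribute replies before forwarding input to the PTY."""
    return DEVICE_ATTRIBUTE_REPLY.sub("", data)
```

Ordinary key sequences such as the arrow keys (ESC[A) do not match the pattern and pass through untouched.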

Legacy Control Plane

  • Marked the control-plane feature docs as superseded by ADR-0035.
  • Demoted runner-first setup, pairing, enrollment, and direct runner WSS instructions to historical context.
  • Pointed future work at Desktop App, Machines, Execution Engine, SSH Machines, and Browser Redesign docs.

Machines

  • Added a Windows-only Local Machine default-shell dropdown on the Environment tab using detected PowerShell and Command Prompt locations.
  • Changed Environment saved actions from Compose-specific presets to named command actions with a target Machine and an auto-run-on-new-terminal option.
  • Folded underlying runner status into saved Machine rows so refreshing with an online local runner does not show two Local Machine entries.
  • Changed Add Machine to open a setup modal and made saved Machine settings render as rows with Edit and Delete actions.
  • Moved Machine environment choice and source-material path into the Environment tab.
  • Moved Git repo/worktree settings to Dev Ops so provider-specific Git, GitHub, and Azure DevOps workflows share one surface.
  • Made desktop-engine terminal requests use the saved Local or SSH Machine project path as the startup directory.
  • Added user-facing Machine delete flow that removes the Machine from the active list while preserving runtime history.
  • Created the Machines feature docs.
  • Established Machine and Project Execution Target as the active product language.
  • Documented the replacement of user-facing Runner concepts.

SSH Machines

  • Created SSH Machines feature docs.
  • Documented Linux/macOS remotes as v1 scope.
  • Documented host-key and credential-storage requirements.