Order-A Rollout — External Data Ingestion
Execution log + forward plan for the multi-PR migration that moves Verifluence's external-data path (Kick pull cron + push webhooks) out of per-env stage workloads and into the shared data namespace.
The end state: stage and the future production env both consume Kick data as projector outputs over an HMAC-signed envelope bus; there is one Kick API caller in the cluster, one shared timeseries store, and each env still owns its own marketplace tables.
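To make the "HMAC-signed envelope bus" concrete, here is a minimal sketch of signing and verifying an envelope body with Node's built-in `crypto` module. The `Envelope` fields, the hex encoding, and the secret value are assumptions for illustration; the actual envelope shape lives in api/src/ingest.ts and the actual signing code in api/src/ingest-fanout.ts.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical envelope shape, for illustration only.
interface Envelope {
  id: string;
  type: string;
  occurredAtMs: number;
  payload: unknown;
}

// Sign the exact string that goes on the wire so the receiver can verify the raw body.
function signEnvelope(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Constant-time comparison. Lengths are checked first because
// timingSafeEqual throws when the two buffers differ in length.
function verifyEnvelope(body: string, signatureHex: string, secret: string): boolean {
  const expected = Buffer.from(signEnvelope(body, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const secret = "example-shared-secret"; // assumption: a secret shared with the target env
const envelope: Envelope = {
  id: "evt-1",
  type: "kick.chat.message.sent",
  occurredAtMs: 1747310400000,
  payload: { text: "hello" },
};
const body = JSON.stringify(envelope);
const signature = signEnvelope(body, secret);

console.log(verifyEnvelope(body, signature, secret)); // true
console.log(verifyEnvelope(body + " ", signature, secret)); // false
```

Verifying against the raw request body, rather than a re-serialised object, avoids false mismatches caused by key order or whitespace differences.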
Goal
                      ┌────────────────────────────────────────────┐
                      │ data namespace (shared infra)              │
                      │                                            │
    Kick public   ──▶ │ ingest pod                                 │
    API + webhooks    │ ├── viewer-poll cron (every 1 min)         │
                      │ ├── /webhooks/kick (push receiver)         │
                      │ ├── writeRawEvent → vf.raw_events (CH)     │
                      │ └── outbox + fan-out worker                │
                      └──────────────┬─────────────────────────────┘
                                     │ HMAC-signed envelopes
                 ┌───────────────────┴────────────────────┐
                 ▼                                        ▼
      ┌──────────────────────┐                 ┌──────────────────────┐
      │ stage api            │                 │ production api       │
      │ /internal/events     │                 │ /internal/events     │
      │ per-env projectors   │                 │ per-env projectors   │
      │ marketplace tables   │                 │ marketplace tables   │
      └──────────────────────┘                 └──────────────────────┘

Shipped
| PR | Subject | Commit / ship date |
|---|---|---|
| 1 | data namespace + vf-pg-dm CNPG cluster | shipped 2026-05-13 |
| 2 | ClickHouse chi-vf ExternalName alias | shipped 2026-05-13 |
| 3 | dm-migrator + dm.* / ingest.* schemas | shipped 2026-05-13 |
| 4 | Ingest workload (empty boot + park) | shipped 2026-05-14 |
| 5 | Fan-out worker + HMAC envelope delivery | shipped 2026-05-14 |
| 6a–d | Projector path for viewer-poll (cron move + per-probe writes) | shipped 2026-05-14 |
| 7a | Kick webhook receiver in ingest (DRY-RUN) | fb7c06c 2026-05-15 |
| 7b | HTTPRoute for wh.verifluence.io → ingest:80 (+ opened cross-ns scope on verifluence-https listener) | c87e051 + 4ce35ef 2026-05-15 |
| 7c | Per-env projectors for 10 Kick push events | 791ffcb 2026-05-15 |
| 7d | Activate fan-out for Kick webhook envelopes | 359ee59 2026-05-15 |
| 7e | Kick portal URL flipped to wh.verifluence.io (+ bulk re-subscribe of all 87 broadcasters: Kick stamps the webhook URL into each subscription at creation time, so a portal-URL change doesn't redirect existing subs) | manual 2026-05-15 |
| 7f-1 | Port 3 admin read paths off webhooks.incoming → vf.raw_events (CH) | ddb76c2 2026-05-16 |
| 7f-2 | DROP webhooks.incoming + retire /wh/kick (410 Gone) | a37e195 2026-05-16 |
| 7f-3 | Remove 340 LOC dead handleKickWebhook + helpers | 158bfb3 2026-05-16 |
| 7g | Sumsub webhook receiver in ingest + projector, dashboard URL flipped, /api/kyc/hook + /wh/sumsub retired (410), 200 LOC removed | 34253db (receiver) + f087338 (retirement) + f83ec90 (cleanup) 2026-05-15/16 |
| 8a-d | Production app stack (api, worker, migrator, HTTPRoute, fan-out target) + vf-pg recreate from clean baseline | 5c202ab (manifests) + manual cluster recreate 2026-05-16 |
| 8/0125 | Grants migration replaces manual step 3b race | 31b35e6 → 4bdb330 (CI fix) 2026-05-16 |
| Schema cleanup | Re-baselined 0001-0124 + stripped _migrations from dump + CI bumped to PG18 + canonicalised parts/00_baseline.sql | 416eda5 + dc46d83 + e6b50cc + b55a8e8 2026-05-16 |
| Bug fix | chat-histogram 30-day truncation → port to vf.raw_events | 0d5cce7 2026-05-16 |
| Maintenance | processed_events daily purge cron | 5af6188 2026-05-16 |
Current live path (today, 2026-05-15):
- Kick → wh.stage.verifluence.io → stage webhooks pod → webhooks.incoming table + inline projector logic (unchanged)
- Kick public API ← ingest viewer-poll cron → vf.raw_events → fan-out → stage /internal/events → per-env projector (PR 6)
The 7a receiver is shadow-only — Kick still POSTs to stage. Manual probe (POST http://ingest.data/webhooks/kick) returns 401 on bad sig, which confirms the signature path is live.
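The 401-on-bad-signature behaviour can be sketched as a small gate function. This is not the code in api/src/kick_signature.ts. It assumes the signed string is `<message id>.<timestamp>.<raw body>` and that the signature is a base64 RSA-SHA256 signature checked against Kick's public key; confirm both against Kick's webhook documentation. The `gate` and `signedString` names and the throwaway key pair are hypothetical.

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Assumption: Kick signs "<message id>.<timestamp>.<raw body>".
function signedString(messageId: string, timestamp: string, rawBody: string): string {
  return `${messageId}.${timestamp}.${rawBody}`;
}

// Returns the HTTP status the receiver would answer with.
function gate(
  publicKeyPem: string,
  messageId: string,
  timestamp: string,
  rawBody: string,
  signatureB64: string,
): 200 | 401 {
  const verifier = createVerify("RSA-SHA256");
  verifier.update(signedString(messageId, timestamp, rawBody));
  return verifier.verify(publicKeyPem, signatureB64, "base64") ? 200 : 401;
}

// Throwaway key pair standing in for Kick's real key, so the sketch runs offline.
const { publicKey, privateKey } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});

const ts = "2026-05-15T10:00:00Z";
const rawBody = JSON.stringify({ content: "hi" });
const signer = createSign("RSA-SHA256");
signer.update(signedString("msg-1", ts, rawBody));
const goodSig = signer.sign(privateKey, "base64");

console.log(gate(publicKey, "msg-1", ts, rawBody, goodSig)); // 200
console.log(gate(publicKey, "msg-1", ts, rawBody + "x", goodSig)); // 401
```

A manual probe with no valid signature exercises only the second branch, which is why a 401 is enough to show the verification path is wired up.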
Pending
PR 7d — Activate fan-out for Kick webhook envelopes
7a–7c are all dry-run. The receiver still passes targets: [] to writeRawEvent (ingest-kick-webhook.ts), so no rows go into ingest.ingest_outbox_delivery and the fan-out worker stays inert on Kick webhook envelopes. PR 7d removes that override, letting fan-out POST envelopes to stage's /internal/events where the new projectors process them. One-line code change — but the test gate is real: replay a small set of envelopes via direct curl first, verify projector parity against stage's existing webhooks.incoming inline path, then ship.
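The parity gate described above could be scripted roughly as follows. This is a sketch under assumptions: `ProjectedRow`, `ParityReport`, and `compareParity` are hypothetical names, and the rows are assumed to be comparable once keyed by the Kick message id.

```typescript
// Hypothetical row shape: whatever a projector writes, keyed by the Kick message id.
interface ProjectedRow {
  messageId: string;
  fields: Record<string, string | number | boolean | null>;
}

interface ParityReport {
  missingInNew: string[];
  missingInOld: string[];
  mismatched: string[];
}

// Serialise fields in sorted key order so key ordering never causes a false mismatch.
function canon(row: ProjectedRow): string {
  return JSON.stringify(Object.keys(row.fields).sort().map((k) => [k, row.fields[k]]));
}

// Compare rows from the inline path (old) with rows from the projector path (new).
function compareParity(oldRows: ProjectedRow[], newRows: ProjectedRow[]): ParityReport {
  const oldById = new Map<string, string>();
  const newById = new Map<string, string>();
  for (const r of oldRows) oldById.set(r.messageId, canon(r));
  for (const r of newRows) newById.set(r.messageId, canon(r));

  const result: ParityReport = { missingInNew: [], missingInOld: [], mismatched: [] };
  oldById.forEach((oldCanon, id) => {
    const newCanon = newById.get(id);
    if (newCanon === undefined) result.missingInNew.push(id);
    else if (newCanon !== oldCanon) result.mismatched.push(id);
  });
  newById.forEach((_value, id) => {
    if (!oldById.has(id)) result.missingInOld.push(id);
  });
  return result;
}

const parityReport = compareParity(
  [
    { messageId: "m1", fields: { viewers: 10, live: true } },
    { messageId: "m2", fields: { viewers: 5, live: false } },
  ],
  [
    { messageId: "m1", fields: { live: true, viewers: 10 } }, // same data, different key order
    { messageId: "m3", fields: { viewers: 7, live: true } },
  ],
);
console.log(JSON.stringify(parityReport));
// {"missingInNew":["m2"],"missingInOld":["m3"],"mismatched":[]}
```

An empty report on the replayed sample is the signal to ship; any entry in `mismatched` points at a projector that diverges from the inline path.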
PR 7d — (skipped, per atomic-flip decision)
Proxy-during-soak was discussed. User chose atomic flip (no parallel write soak). 7d is intentionally absent.
PR 7e — The flip
User chose shape A: single Kick app, single webhook URL, atomic DNS or portal change.
Recommended control point: DNS A-record for wh.stage.verifluence.io (or whatever hostname the Kick portal points at) flipped from the stage Envoy IP to the data Envoy IP. Rolling back is a one-line DNS change. The Kick developer portal URL itself never changes.
Kick retries non-2xx for ~minutes, so a brief drop window is tolerated.
PR 7f — Cleanup (deferred ~30d post-7e)
After the new path has carried prod traffic for the 30-day webhooks.incoming retention window:
- Remove the kick_webhook.ts handler registration from stage's webhooks-server.ts
- Drop the webhooks.incoming table (separate migration)
- Remove the KICK_PUBLIC_KEY env var from stage's webhooks release
- Strip the now-unused stats queries (kick_webhook.ts:546,664) and port them to query the shared ClickHouse store if still needed
PR 8–10 — Production app stack — DONE
Shipped 2026-05-16: production api + worker + migrator HelmReleases in clusters/prod/production/, HTTPRoute for api.verifluence.io, fan-out target extended to stage,production. vf-pg cluster recreated from a clean baseline so production starts with zero legacy schema drift. Real Kick + Sumsub traffic now lands in production via fan-out (verified: 895 chat rows + 1 applicantCreated within 10 minutes of cutover).
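As an illustration of how a target list such as stage,production could translate into per-target delivery rows, here is a minimal sketch. `parseTargets`, `planDeliveries`, and `DeliveryRow` are hypothetical names, not the code in api/src/ingest-writer.ts; the empty-override case mirrors the `targets: []` dry-run behaviour used in PR 7a.

```typescript
// Hypothetical types and helpers, for illustration only.
type TargetEnv = string;

interface DeliveryRow {
  envelopeId: string;
  target: TargetEnv;
  attempts: number;
}

// Parse a comma-separated target list, dropping blanks and duplicates.
function parseTargets(raw: string): TargetEnv[] {
  const seen = new Set<string>();
  const out: TargetEnv[] = [];
  for (const part of raw.split(",")) {
    const envName = part.trim();
    if (envName !== "" && !seen.has(envName)) {
      seen.add(envName);
      out.push(envName);
    }
  }
  return out;
}

// One delivery row per target. An explicit override wins over the configured list,
// so an empty array means "write the envelope but deliver nowhere".
function planDeliveries(
  envelopeId: string,
  configured: TargetEnv[],
  override?: TargetEnv[],
): DeliveryRow[] {
  const targets = override ?? configured;
  return targets.map((target) => ({ envelopeId, target, attempts: 0 }));
}

const configured = parseTargets("stage, production");
console.log(JSON.stringify(planDeliveries("evt-1", configured).map((d) => d.target)));
// ["stage","production"]
console.log(planDeliveries("evt-1", configured, []).length); // 0
```

Keeping one row per target lets each env retry and fail independently, which matters once stage and production diverge in availability.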
Still operationally open
- api.verifluence.io DNS CNAME → public.verifluence.io (DNS only / gray cloud). Same shape as wh.verifluence.io.
- kycdocs-prod S3 bucket at Hetzner Object Storage, hel1 region. Rotate the inlined S3_ACCESS_KEY / S3_SECRET_KEY in production/releases/{api,worker,migrator}.yaml to a prod-scoped pair.
- Frontend Cloudflare Pages production env: promote app.verifluence.io to point at api.verifluence.io. Separate repo + CF Pages dashboard.
PR 9 — kick_chat_messages to shared store
Per-env duplication identified during the Q&A pass. Slice 1 shipped (chat-histogram → vf.raw_events, fixed 30-day truncation bug). Remaining slices: port chat-count in the kick_viewer_count projector to CH, rewrite chat-export to read from CH instead of per-env PG, stop per-env writes, drop the per-env table. Multi-day rollout with soak windows.
PR 10 — Worker job consolidation to ingest
5 env-agnostic worker jobs (kick-token-refresh, x-follower-poll, kick-follower-backfill, chat-export, kick-subscriptions) can move to the shared ingest tier. After that the per-env worker shrinks to ~3 jobs (trust-score-refresh, kick-signup-resume-reminder, daily-maintenance + stale-session-cleanup). Multi-day rollout.
Production DB role hygiene (small follow-up)
Sync production/releases/vf-pg.yaml to stage's managed.roles + postInitApplicationSQL pattern so vfmigrator/vfapp passwords are CNPG-managed rather than inlined plaintext. Destructive (requires another cluster recreate), so defer until there's a reason to bounce the cluster anyway.
Key files (cheat-sheet for the next session)
| Concern | File |
|---|---|
| DomainEvent envelope shape | api/src/ingest.ts |
| Writer (CH + outbox) | api/src/ingest-writer.ts |
| Fan-out worker (HMAC, retries) | api/src/ingest-fanout.ts |
| Internal events receiver (stage side) | api/src/internal-events.ts |
| Kick sig verify (shared) | api/src/kick_signature.ts |
| Kick receiver — stage (live) | api/src/kick_webhook.ts |
| Kick receiver — ingest (shadow, PR 7a) | api/src/ingest-kick-webhook.ts |
| Viewer-poll cron entry | api/src/ingest-server.ts |
| Viewer-count projector | api/src/projectors/kick_viewer_count.ts |
| Ingest helm release | operations clusters/prod/data/releases/ingest.yaml |
| Stage webhooks release | operations clusters/prod/stage/releases/webhooks.yaml |
| Ingest alert rules | operations clusters/prod/data/monitoring/ingest-rules.yaml |
Things easy to forget
- ingest.processed_webhooks is the new cross-restart idempotency log for the ingest-side receiver, applied by migration migrations-dm/0004_processed_webhooks.sql. It is shared (one row per Kick message_id), not per-env. The stage-side webhooks.incoming remains per-env until 7f.
- writeRawEvent({ targets: [] }) is the dry-run knob in PR 7a. It removes only the per-env delivery rows; CH vf.raw_events and the outbox envelope are still written so replay tooling can re-project if 7c projectors need rebuilding.
- Image-policy annotation format: {"$imagepolicy": "flux-system:ingest"} on the image: line as a comment, NOT …:tag; that variant only works on a sub-field.
- Envoy Gateway, not k8s Ingress. The cluster uses Gateway API. Per-host routing is via HTTPRoute resources attached to the envoy-public Gateway in the ops ns.
- ClickHouse DateTime64(3) wants milliseconds, not seconds. Passing seconds parses as ms-since-epoch, and the TTL merge nukes the row inside a minute. writeRawEvent already handles this; copy the pattern when adding new CH writers.
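The seconds-versus-milliseconds pitfall can be guarded with a small normaliser. The sketch below is not the code inside writeRawEvent; `toEpochMillis` and the `SECONDS_CUTOFF` heuristic are assumptions chosen for illustration.

```typescript
// Heuristic cutoff: 1e11 ms is early March 1973, while 1e11 s is roughly the year 5138,
// so a present-day timestamp below this value is almost certainly in seconds.
const SECONDS_CUTOFF = 1e11;

// Normalise a Date, epoch seconds, or epoch milliseconds to epoch milliseconds.
function toEpochMillis(ts: number | Date): number {
  if (ts instanceof Date) return ts.getTime();
  if (!Number.isFinite(ts) || ts < 0) throw new RangeError(`bad timestamp: ${ts}`);
  return ts < SECONDS_CUTOFF ? Math.round(ts * 1000) : Math.round(ts);
}

console.log(toEpochMillis(1747310400)); // 1747310400000
console.log(toEpochMillis(1747310400000)); // 1747310400000

// What goes wrong without the guard: seconds read as milliseconds land in January 1970.
console.log(new Date(1747310400).toISOString()); // 1970-01-21T05:21:50.400Z
```

A row stamped in January 1970 is already older than any retention TTL, which is why it disappears on the next merge instead of failing loudly at insert time.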