Order-A Rollout — External Data Ingestion

Execution log + forward plan for the multi-PR migration that moves Verifluence's external-data path (Kick pull cron + push webhooks) out of per-env stage workloads and into the shared data namespace.

The end state: stage and the future production env both consume Kick data as projector outputs over an HMAC-signed envelope bus; there is one Kick API caller in the cluster, one shared timeseries store, and each env still owns its own marketplace tables.

Goal

                   ┌────────────────────────────────────────────┐
                   │   data namespace (shared infra)            │
                   │                                            │
   Kick public ──▶ │   ingest pod                               │
   API + webhooks  │   ├── viewer-poll cron (every 1 min)       │
                   │   ├── /webhooks/kick   (push receiver)     │
                   │   ├── writeRawEvent  → vf.raw_events (CH)  │
                   │   └── outbox + fan-out worker              │
                   └──────────────┬─────────────────────────────┘
                                  │  HMAC-signed envelopes
              ┌───────────────────┴────────────────────┐
              ▼                                        ▼
    ┌──────────────────────┐                ┌──────────────────────┐
    │  stage api           │                │  production api      │
    │  /internal/events    │                │  /internal/events    │
    │  per-env projectors  │                │  per-env projectors  │
    │  marketplace tables  │                │  marketplace tables  │
    └──────────────────────┘                └──────────────────────┘
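
The HMAC-signed hop in the diagram can be sketched roughly as below. This is a minimal illustration, not the code in api/src/ingest-fanout.ts: the `Envelope` fields, the function names, the SHA-256 / hex choice, and the secret value are all assumptions made for the example.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical envelope shape; the real DomainEvent type lives in api/src/ingest.ts.
interface Envelope {
  id: string;
  type: string;
  occurredAt: string; // ISO-8601 timestamp
  payload: unknown;
}

// Sign the exact string that goes on the wire, so sender and receiver
// hash identical bytes. (Assumed algorithm: HMAC-SHA256, hex-encoded.)
function signEnvelope(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Recompute the signature and compare in constant time.
// Any length mismatch (including malformed hex) is rejected up front,
// because timingSafeEqual throws on buffers of different length.
function verifyEnvelope(body: string, signatureHex: string, secret: string): boolean {
  const expected = Buffer.from(signEnvelope(body, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}

// Sender side (ingest): serialize once, sign the serialized body.
const exampleEnvelope: Envelope = {
  id: "evt-1",
  type: "kick.viewer_count",
  occurredAt: "2026-05-15T00:00:00.000Z",
  payload: { viewers: 42 },
};
const exampleSecret = "per-env-shared-secret"; // placeholder value
const exampleBody = JSON.stringify(exampleEnvelope);
const exampleSignature = signEnvelope(exampleBody, exampleSecret);
```

The receiver should verify against the raw request body rather than a re-serialized object, since re-serializing can reorder keys or change whitespace and so change the digest.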

Shipped

| PR | Subject | Commit / date |
|---|---|---|
| 1 | data namespace + vf-pg-dm CNPG cluster | shipped 2026-05-13 |
| 2 | ClickHouse chi-vf ExternalName alias | shipped 2026-05-13 |
| 3 | dm-migrator + dm.* / ingest.* schemas | shipped 2026-05-13 |
| 4 | Ingest workload (empty boot + park) | shipped 2026-05-14 |
| 5 | Fan-out worker + HMAC envelope delivery | shipped 2026-05-14 |
| 6a–d | Projector path for viewer-poll (cron move + per-probe writes) | shipped 2026-05-14 |
| 7a | Kick webhook receiver in ingest (DRY-RUN) | fb7c06c 2026-05-15 |
| 7b | HTTPRoute for wh.verifluence.io → ingest:80 (+ opened cross-ns scope on verifluence-https listener) | c87e051 + 4ce35ef 2026-05-15 |
| 7c | Per-env projectors for 10 Kick push events | 791ffcb 2026-05-15 |
| 7d | Activate fan-out for Kick webhook envelopes | 359ee59 2026-05-15 |
| 7e | Kick portal URL flipped to wh.verifluence.io (+ bulk re-subscribe of all 87 broadcasters: Kick stamps the webhook URL into each subscription at creation time, so a portal-URL change doesn't redirect existing subs) | manual 2026-05-15 |
| 7f-1 | Port 3 admin read paths off webhooks.incoming → vf.raw_events (CH) | ddb76c2 2026-05-16 |
| 7f-2 | DROP webhooks.incoming + retire /wh/kick (410 Gone) | a37e195 2026-05-16 |
| 7f-3 | Remove 340 LOC dead handleKickWebhook + helpers | 158bfb3 2026-05-16 |
| 7g | Sumsub webhook receiver in ingest + projector, dashboard URL flipped, /api/kyc/hook + /wh/sumsub retired (410), 200 LOC removed | 34253db (receiver) + f087338 (retirement) + f83ec90 (cleanup) 2026-05-15/16 |
| 8a-d | Production app stack (api, worker, migrator, HTTPRoute, fan-out target) + vf-pg recreate from clean baseline | 5c202ab (manifests) + manual cluster recreate 2026-05-16 |
| 8/0125 | Grants migration replaces manual step 3b race | 31b35e6 → 4bdb330 (CI fix) 2026-05-16 |
| Schema cleanup | Re-baselined 0001-0124 + stripped _migrations from dump + CI bumped to PG18 + canonicalised parts/00_baseline.sql | 416eda5 + dc46d83 + e6b50cc + b55a8e8 2026-05-16 |
| Bug fix | chat-histogram 30-day truncation → port to vf.raw_events | 0d5cce7 2026-05-16 |
| Maintenance | processed_events daily purge cron | 5af6188 2026-05-16 |

Current live path (today, 2026-05-15):

  • Kick → wh.stage.verifluence.io → stage webhooks pod → webhooks.incoming table + inline projector logic (unchanged)
  • Kick public API ← ingest viewer-poll cron → vf.raw_events → fan-out → stage /internal/events → per-env projector (PR 6)

The 7a receiver is shadow-only — Kick still POSTs to stage. Manual probe (POST http://ingest.data/webhooks/kick) returns 401 on bad sig, which confirms the signature path is live.
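
For orientation, the "401 on bad sig" behaviour can be sketched as below. This is an illustration only and not the contents of api/src/kick_signature.ts. It assumes Kick signs the string `messageId.timestamp.body` with RSA-SHA256 and sends the signature base64-encoded; the `SignedRequest` shape and the function names are hypothetical, and the key pair is a throwaway generated for the self-check.

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Hypothetical view of the pieces a receiver pulls off an incoming request.
interface SignedRequest {
  messageId: string;
  timestamp: string;
  body: string;         // raw request body, exactly as received
  signatureB64: string; // base64 signature supplied by the sender
}

// Assumed signing input: "<messageId>.<timestamp>.<body>".
function verifyKickSignature(req: SignedRequest, publicKeyPem: string): boolean {
  const signedInput = `${req.messageId}.${req.timestamp}.${req.body}`;
  return createVerify("RSA-SHA256")
    .update(signedInput)
    .verify(publicKeyPem, req.signatureB64, "base64");
}

// The receiver rejects with 401 before any write happens.
function statusFor(req: SignedRequest, publicKeyPem: string): 200 | 401 {
  return verifyKickSignature(req, publicKeyPem) ? 200 : 401;
}

// Self-check with a throwaway key pair standing in for the sender's key.
const probeKeys = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});
const probeBody = JSON.stringify({ hypothetical: true });
const probeRequest: SignedRequest = {
  messageId: "msg-1",
  timestamp: "2026-05-15T00:00:00Z",
  body: probeBody,
  signatureB64: createSign("RSA-SHA256")
    .update(`msg-1.2026-05-15T00:00:00Z.${probeBody}`)
    .sign(probeKeys.privateKey, "base64"),
};
```

A manual probe with a garbage signature exercises the 401 branch without needing a real Kick delivery, which is why that response is a useful liveness signal for the signature path.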

Pending

PR 7d — Activate fan-out for Kick webhook envelopes

7a–7c are all dry-run. The receiver still passes targets: [] to writeRawEvent (ingest-kick-webhook.ts), so no rows go into ingest.ingest_outbox_delivery and the fan-out worker stays inert on Kick webhook envelopes. PR 7d removes that override, letting fan-out POST envelopes to stage's /internal/events where the new projectors process them. One-line code change — but the test gate is real: replay a small set of envelopes via direct curl first, verify projector parity against stage's existing webhooks.incoming inline path, then ship.
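
The effect of the `targets: []` override can be sketched as follows. This is a simplified model and not the real writeRawEvent: `planWrite`, `WritePlan`, and `DEFAULT_TARGETS` are hypothetical names, and the default target list is an assumption.

```typescript
// Hypothetical minimal event shape for the illustration.
interface RawEvent {
  id: string;
  source: string;
  payload: unknown;
}

// What a single write would produce in this model.
interface WritePlan {
  rawEventId: string;        // row in vf.raw_events (always written)
  outboxEnvelopeId: string;  // outbox envelope (always written)
  deliveryTargets: string[]; // one ingest.ingest_outbox_delivery row per entry
}

// Assumed default: deliver to every configured env.
const DEFAULT_TARGETS: readonly string[] = ["stage"];

function planWrite(event: RawEvent, opts: { targets?: string[] } = {}): WritePlan {
  // `??` only falls back on undefined/null, so an explicit empty array
  // survives and suppresses every delivery row (the dry-run case).
  const targets = opts.targets ?? DEFAULT_TARGETS;
  return {
    rawEventId: event.id,
    outboxEnvelopeId: event.id,
    deliveryTargets: [...targets],
  };
}

const sampleEvent: RawEvent = { id: "evt-7", source: "kick.webhook", payload: {} };
const dryRunPlan = planWrite(sampleEvent, { targets: [] }); // PR 7a behaviour
const livePlan = planWrite(sampleEvent);                    // override removed
```

In this model, removing the override is the whole change: the raw event and the envelope are written either way, and only the delivery rows that wake the fan-out worker differ.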

PR 7d — (skipped, per atomic-flip decision)

Proxy-during-soak was discussed. User chose atomic flip (no parallel write soak). 7d is intentionally absent.

PR 7e — The flip

User chose shape A: single Kick app, single webhook URL, atomic DNS or portal change.

Recommended control point: DNS A-record for wh.stage.verifluence.io (or whatever hostname the Kick portal points at) flipped from the stage Envoy IP to the data Envoy IP. Rolling back is a one-line DNS change. The Kick developer portal URL itself never changes.

Kick retries non-2xx for ~minutes, so a brief drop window is tolerated.
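
Because Kick retries, the same message_id can arrive more than once around the flip, and the ingest-side idempotency log (ingest.processed_webhooks, one row per Kick message_id) is what absorbs that. A minimal in-memory sketch of the claim-then-process pattern follows; `ProcessedLog` and `handleDelivery` are hypothetical names, and the Set stands in for the database table.

```typescript
// In-memory stand-in for a table keyed by message id.
// A real implementation would rely on a unique constraint instead (assumption).
class ProcessedLog {
  private readonly seen = new Set<string>();

  // Returns true the first time an id is claimed, false on any redelivery.
  claim(messageId: string): boolean {
    if (this.seen.has(messageId)) return false;
    this.seen.add(messageId);
    return true;
  }
}

// Run the side effect only for the first delivery of a given message id.
function handleDelivery(
  log: ProcessedLog,
  messageId: string,
  project: () => void,
): "processed" | "duplicate" {
  if (!log.claim(messageId)) return "duplicate";
  project();
  return "processed";
}
```

A duplicate should still be answered with a 2xx so the sender stops retrying; only the side effect is skipped.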

PR 7f — Cleanup (deferred ~30d post-7e)

After the new path has carried prod traffic for the 30-day webhooks.incoming retention window:

  • Remove kick_webhook.ts handler registration from stage's webhooks-server.ts
  • Drop webhooks.incoming table (separate migration)
  • Remove KICK_PUBLIC_KEY env var from stage's webhooks release
  • Strip the now-unused stats queries (kick_webhook.ts:546,664) and port them to query the shared ClickHouse store if still needed

PR 8–10 — Production app stack — DONE

Shipped 2026-05-16: production api + worker + migrator HelmReleases in clusters/prod/production/, HTTPRoute for api.verifluence.io, fan-out target extended to stage,production. vf-pg cluster recreated from a clean baseline so production starts with zero legacy schema drift. Real Kick + Sumsub traffic now lands in production via fan-out (verified: 895 chat rows + 1 applicantCreated within 10 minutes of cutover).

Still operationally open

  • api.verifluence.io DNS CNAME → public.verifluence.io (DNS only / gray cloud). Same shape as wh.verifluence.io.
  • kycdocs-prod S3 bucket at Hetzner Object Storage, hel1 region. Rotate the inlined S3_ACCESS_KEY / S3_SECRET_KEY in production/releases/{api,worker,migrator}.yaml to a prod-scoped pair.
  • Frontend Cloudflare Pages production env: promote app.verifluence.io to point at api.verifluence.io. Separate repo + CF Pages dashboard.

PR 9 — kick_chat_messages to shared store

Per-env duplication identified during the Q&A pass. Slice 1 shipped (chat-histogram → vf.raw_events, fixed 30-day truncation bug). Remaining slices: port chat-count in the kick_viewer_count projector to CH, rewrite chat-export to read from CH instead of per-env PG, stop per-env writes, drop the per-env table. Multi-day rollout with soak windows.

PR 10 — Worker job consolidation to ingest

Five env-agnostic worker jobs (kick-token-refresh, x-follower-poll, kick-follower-backfill, chat-export, kick-subscriptions) can move to the shared ingest tier. After that the per-env worker shrinks to ~3 jobs (trust-score-refresh, kick-signup-resume-reminder, daily-maintenance + stale-session-cleanup). Multi-day rollout.

Production DB role hygiene (small follow-up)

Sync production/releases/vf-pg.yaml to stage's managed.roles + postInitApplicationSQL pattern so vfmigrator/vfapp passwords are CNPG-managed rather than inlined plaintext. Destructive (requires another cluster recreate), so defer until there's a reason to bounce the cluster anyway.

Key files (cheat-sheet for the next session)

| Concern | File |
|---|---|
| DomainEvent envelope shape | api/src/ingest.ts |
| Writer (CH + outbox) | api/src/ingest-writer.ts |
| Fan-out worker (HMAC, retries) | api/src/ingest-fanout.ts |
| Internal events receiver (stage side) | api/src/internal-events.ts |
| Kick sig verify (shared) | api/src/kick_signature.ts |
| Kick receiver: stage (live) | api/src/kick_webhook.ts |
| Kick receiver: ingest (shadow, PR 7a) | api/src/ingest-kick-webhook.ts |
| Viewer-poll cron entry | api/src/ingest-server.ts |
| Viewer-count projector | api/src/projectors/kick_viewer_count.ts |
| Ingest helm release | operations clusters/prod/data/releases/ingest.yaml |
| Stage webhooks release | operations clusters/prod/stage/releases/webhooks.yaml |
| Ingest alert rules | operations clusters/prod/data/monitoring/ingest-rules.yaml |

Things easy to forget

  • ingest.processed_webhooks is the new cross-restart idempotency log for the ingest-side receiver — applied by migration migrations-dm/0004_processed_webhooks.sql. It is shared (one row per Kick message_id), not per-env. The stage-side webhooks.incoming remains per-env until 7f.
  • writeRawEvent({ targets: [] }) is the dry-run knob in PR 7a. It suppresses only the per-env delivery rows; the CH vf.raw_events row and the outbox envelope are still written, so replay tooling can re-project if the 7c projectors need rebuilding.
  • Image-policy annotation format: {"$imagepolicy": "flux-system:ingest"} on the image: line as a comment — NOT …:tag, that variant only works on a sub-field.
  • Envoy Gateway, not k8s Ingress. The cluster uses Gateway API. Per-host routing is via HTTPRoute resources attached to the envoy-public Gateway in ops ns.
  • ClickHouse DateTime64(3) wants milliseconds, not seconds — passing seconds parses as ms-since-epoch and the TTL merge nukes the row inside a minute. writeRawEvent already handles this; copy the pattern when adding new CH writers.
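
The milliseconds pitfall in the last item can be guarded against with a small check before any new ClickHouse write. This is a sketch, not the pattern writeRawEvent actually uses: `secondsToMillis`, `assertEpochMillis`, and the year-2000 plausibility floor are assumptions chosen for the example.

```typescript
// Anything below this is treated as "probably seconds, not milliseconds".
// Assumption: no legitimate event in this pipeline predates the year 2000.
const MIN_PLAUSIBLE_MS = Date.UTC(2000, 0, 1);

// Explicit conversion for sources that report epoch seconds.
function secondsToMillis(epochSeconds: number): number {
  return Math.round(epochSeconds * 1000);
}

// Fail loudly instead of writing a row with a 1970-era timestamp.
function assertEpochMillis(value: number): number {
  if (!Number.isFinite(value) || value < MIN_PLAUSIBLE_MS) {
    throw new RangeError(`timestamp ${value} does not look like epoch milliseconds`);
  }
  return value;
}

// Usage: convert at the edge, assert right before the write.
const receivedAtSeconds = 1778803200; // an epoch-seconds value from an upstream API
const receivedAtMillis = assertEpochMillis(secondsToMillis(receivedAtSeconds));
```

A present-day epoch-seconds value interpreted as milliseconds lands in January 1970, which is why such a row falls straight into any age-based TTL.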
