
Shared data Namespace — Home for dm, ClickHouse, and the Ingest Tier

Status: Proposed · May 2026

Stand up a new k8s namespace, data, in the existing prod cluster as the physical home for the environment-agnostic data layer: a dedicated PostgreSQL cluster for the dm schema (plus the ingest outbox and pg-boss), a namespace-neutral alias for the existing ClickHouse instance (which itself stays in the production namespace), and the single scraper tier from the External Data Ingestion proposal.

This is an infrastructure addendum to two in-flight proposals — it doesn't change their app-layer design, only where the shared pieces physically live.


Problem

Two proposals in flight (External Data Ingestion, Datamining Schema (dm)) both want a home for shared, environment-agnostic data and workloads that today live inside production:

  • A single ingest tier (one scraper, one webhook endpoint) plus its outbox + pg-boss queue.
  • The dm.* schema (channel identity, slug history, probe schedule, stream sessions, event subscriptions, scrape log, webhook dedupe, chat-export bookkeeping).
  • ClickHouse: already shared by both envs today, but it lives in the production namespace, whose name implies otherwise.

Default placement in the two proposals (open-question #2 of External Data Ingestion: "reuse production's PG with an ingest. schema") works, but couples three independent failure domains:

  1. Ingest backlog can pile up against the same Postgres the production app commits user-visible transactions to.
  2. Backup, PITR, and resource policies are forced to a single common denominator.
  3. The kick-external-schema proposal explicitly contemplates a future physical extraction of dm — landing it in production's Postgres now means doing that extraction twice.

The pre-production state is the only cheap moment to lay this layout down. Once production is live, every move is a coordinated cutover.


What's on the cluster today

| Namespace | Contains | Role |
|---|---|---|
| production | vf-pg, chi-vf-prod (ClickHouse), cloudflared | Will hold production app stack + currently hosts ClickHouse |
| stage | api, worker, webhooks, migrator, vf-pg, cloudflared | Stage app stack |
| operators | cnpg, cho (Altinity ClickHouse Operator) | DB operators |
| monitoring / ops / system | Grafana / Prom / TLS / Flux | Cluster plumbing |

ClickHouse already sits in production and is consumed by both envs — the naming lies. Stage's vf-pg is per-env user data only; production's vf-pg is being provisioned for the same role.


Proposal

Add one namespace, data, that owns the shared infrastructure layer:

clusters/prod/data/
  releases/
    vf-pg-dm.yaml       # CNPG cluster — dm schema + ingest outbox + pg-boss
    chi-vf-alias.yaml   # ExternalName Service: alias for ClickHouse, which stays in production/
    ingest.yaml         # the single scraper tier from external-data-ingestion
  routes/               # wh.verifluence.io ingress
  monitoring/           # data-namespace alert routes

1. New PostgreSQL cluster vf-pg-dm in data

Owns three schema families, none of which belong in a per-env app DB:

| Schema | Source | Why here |
|---|---|---|
| dm.* | Datamining Schema | Environment-agnostic; both envs soft-FK into it via public.streamer_channel_link.channel_id |
| ingest.* (ingest_outbox, ingest_outbox_delivery) | External Data Ingestion §INGEST-7 | Transactional outbox for the fan-out; colocated with its producer (ingest tier) |
| pgboss | Today in stage's vf-pg | Ingest fan-out job queue; belongs with the outbox, not with stage app data |

Stage and production app DBs (stage/vf-pg, production/vf-pg) hold only public.* plus auth.* after this lands. They soft-reference dm.streamer_channel.id per the kick-external-schema design — same code, different physical DB.
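Because the reference crosses physical databases, nothing at the database level stops a link row from pointing at a channel that does not exist. One way to catch that is a periodic audit that compares link rows against known channel ids. The sketch below is a minimal illustration under assumptions: the `ChannelLinkRow` shape, the `streamer_id` column, string ids, and the `findDanglingLinks` name are hypothetical; only `channel_id` and `dm.streamer_channel.id` come from the proposal.

```typescript
// Hypothetical row shape for public.streamer_channel_link.
// Only channel_id is named in the proposal; streamer_id is an assumption.
interface ChannelLinkRow {
  streamer_id: string;
  channel_id: string; // soft reference to dm.streamer_channel.id (no FK)
}

// Returns the links whose channel_id has no matching dm.streamer_channel row.
// Both inputs are assumed to have been fetched already, one from each database.
function findDanglingLinks(
  links: readonly ChannelLinkRow[],
  dmChannelIds: Iterable<string>,
): ChannelLinkRow[] {
  const known = new Set(dmChannelIds);
  return links.filter((link) => !known.has(link.channel_id));
}
```

An empty result means every soft reference resolves; a non-empty one lists the rows to repair or investigate.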

2. Cross-namespace alias for ClickHouse (CHI stays in production)

Updated after evaluation: the original draft proposed moving the ClickHouseInstallation CRD into data. We reconsidered: PVCs are namespaced, so a clean move requires a PV re-bind dance with ClickHouse downtime (or a backup/restore). The only functional gain was naming honesty, which an ExternalName Service delivers at zero cost. The CHI itself stays in production.

Add a permanent ExternalName Service chi-vf in data that CNAMEs to clickhouse-vf.production.svc.cluster.local. Every consumer points at the namespace-neutral name chi-vf.data.svc.cluster.local; kube-dns resolves the CNAME directly with no proxy hop.
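On the consumer side, a small startup guard could confirm that the configured ClickHouse URL uses the namespace-neutral name rather than a production-namespaced one. This is only a sketch: the `usesNeutralClickHouseHost` helper is hypothetical, and the hostname constant is the alias named above.

```typescript
// Namespace-neutral alias from the proposal.
const NEUTRAL_CLICKHOUSE_HOST = "chi-vf.data.svc.cluster.local";

// True when the URL's host is the alias; false for any other host
// or for a string that does not parse as a URL.
function usesNeutralClickHouseHost(clickhouseUrl: string): boolean {
  let host: string;
  try {
    host = new URL(clickhouseUrl).hostname;
  } catch {
    return false;
  }
  return host.toLowerCase() === NEUTRAL_CLICKHOUSE_HOST;
}
```

A service could log a warning, or refuse to start, when this returns false for its configured URL.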

The Prometheus per-pod scrape continues to target the real headless name chi-vf-prod-0-0.production.svc.cluster.local. Per-pod scraping needs a real Service that selects pods, not an ExternalName alias, and this keeps the scrape target's namespace consistent with the CHI's actual home.

3. Move the ingest workload to data

The External Data Ingestion proposal places ingest in the production namespace because it's "shared infra living somewhere." data is a better home — colocated with its outbox DB (vf-pg-dm) and its raw-event sink (chi-vf). wh.verifluence.io becomes a data-namespace ingress.

This resolves open questions §1 and §2 of External Data Ingestion in one move.


Why this is better than the proposals' default placement

| Concern | Reuse production PG (default) | New data namespace |
|---|---|---|
| Failure domain | Ingest pg-boss outage piles up on prod app's WAL/locks | Isolated DB; production app unaffected |
| Backup / PITR | Same policy for both; ingest's churny outbox forces prod's policy | Separate policy; dm can have tighter PITR for analytics, prod-app keeps its own |
| Resource contention | ClickHouse + scrapers compete with prod app for CPU/IO | Schedule data workloads to a separate node-pool or taint |
| Schema ownership clarity | dm.*, ingest.*, pgboss.* co-mingled with public.* in prod | One DB per concern; one Flux folder per namespace; one secret per role |
| Future DB split (kick-external-schema Phase B) | Need to extract dm out of prod: pg_dump --schema=dm + cutover | Done by construction; no second migration |
| Compliance posture | Sumsub PII payloads live next to operator/streamer accounts | Raw-payload store in its own namespace with its own RBAC + NetworkPolicy |
| Naming honesty | production namespace ≠ "shared infra" | data = shared infra; production = production app stack |

Touchpoints

Flux

  • New clusters/prod/system/namespaces/data.yaml
  • New clusters/prod/data/ kustomization (mirrors the stage/ and production/ shape)
  • Add clusters/prod/data/releases/chi-vf-alias.yaml: ExternalName Service pointing at clickhouse-vf.production; chi-vf-prod.yaml stays in production/releases/
  • New clusters/prod/data/releases/vf-pg-dm.yaml (CNPG Cluster)
  • New clusters/prod/data/releases/ingest.yaml (the scraper tier from External Data Ingestion §INGEST-1)
  • Ingress for wh.verifluence.io moves from stage/production to data

App secrets

  • Stage + prod app pods gain DM_DATABASE_URL (read/write into vf-pg-dm)
  • Ingest tier gains INGEST_DATABASE_URL (same DB; writes outbox)
  • Stage + prod fan-out clients gain PGBOSS_DATABASE_URL if they need direct queue access (most don't — fan-out is owned by ingest)
  • Existing DATABASE_URL keeps pointing at the env-local Postgres for public.* / auth.* only
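The variables above could be read and validated once at startup. The following is a minimal sketch, not the project's actual config code: `DbUrls` and `loadDbUrls` are hypothetical names, and the check that the two URLs differ is an added assumption meant to catch a copy-pasted secret.

```typescript
interface DbUrls {
  app: string;     // DATABASE_URL: env-local Postgres (public.*, auth.*)
  dm: string;      // DM_DATABASE_URL: vf-pg-dm
  pgboss?: string; // PGBOSS_DATABASE_URL: only set for direct queue access
}

// Reads the connection strings from an env-like map and fails fast
// when a required one is missing.
function loadDbUrls(env: Record<string, string | undefined>): DbUrls {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`${name} is required`);
    return value;
  };
  const app = required("DATABASE_URL");
  const dm = required("DM_DATABASE_URL");
  // Assumption: the two URLs must differ, since they target different clusters.
  if (app === dm) throw new Error("DM_DATABASE_URL must not equal DATABASE_URL");
  const urls: DbUrls = { app, dm };
  const pgboss = env["PGBOSS_DATABASE_URL"];
  if (pgboss) urls.pgboss = pgboss;
  return urls;
}
```

In a Node process this would typically be called as `loadDbUrls(process.env)`.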

NetworkPolicies

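  • stage and production namespaces: egress allowed to data/vf-pg-dm:5432 and to ClickHouse on 8123 and 9000. NetworkPolicies select pods, not Services, so the ClickHouse rule has to match the CHI pods in production that the data/chi-vf alias resolves to.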
  • data namespace: egress to the public internet (Kick, Sumsub, X, Scrape.do, ScraperAPI); ingress only on wh.verifluence.io
  • Default deny everywhere else

Migrations

  • New migrator release dm-migrator owns dm.* and ingest.* migrations against vf-pg-dm. Distinct from the per-env app migrator that runs public.*.
  • Phase A of kick-external-schema (migration 0123) splits into two scripts:
    • In env DBs: create public.streamer_channel_link (no FK — soft reference across DBs)
    • In vf-pg-dm: create dm.streamer_channel, dm.streamer_channel_slug_history, move/rebuild dm.* descendants
  • The Phase A backfill becomes a one-off Job with two DSNs — reads from production/vf-pg, writes to data/vf-pg-dm. Run once during cutover.
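The planning half of such a backfill Job can be kept free of I/O, which makes it easy to test before the cutover. The sketch below assumes a hypothetical `ChannelRow` shape (the real columns are defined in the kick-external-schema proposal) and a hypothetical `planBackfill` helper. Skipping ids already present in the target is an added assumption, so that a retried Job does not rewrite rows.

```typescript
// Hypothetical row shape; only the id is needed for planning.
interface ChannelRow {
  id: string;
  slug: string;
}

// Splits the source rows into write batches, leaving out rows whose id
// already exists in the target so a rerun plans no duplicate writes.
function planBackfill(
  source: readonly ChannelRow[],
  alreadyInTarget: ReadonlySet<string>,
  batchSize: number,
): ChannelRow[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const pending = source.filter((row) => !alreadyInTarget.has(row.id));
  const batches: ChannelRow[][] = [];
  for (let i = 0; i < pending.length; i += batchSize) {
    batches.push(pending.slice(i, i + batchSize));
  }
  return batches;
}
```

The Job itself would read `source` through one DSN, fetch the existing ids through the other, and write each batch in its own transaction.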

Code

  • Add DM_DB to the Env interface alongside DB; postgres() driver instances for each.
  • api/src/streamer_channel.ts helper queries DM_DB; everything in public.* keeps querying DB.
  • One-line touch on every existing call site that crosses the boundary — typed at the env level, so the compiler enforces it.
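One way to get the compiler to enforce the boundary is to give the two handles distinct branded types, so that passing DB where DM_DB is expected fails to type-check. This is a sketch under assumptions: `Sql` is a minimal stand-in for the real driver instance, and `Branded`, `brand`, and `getStreamerChannel` are hypothetical names; only `Env`, `DB`, and `DM_DB` come from the proposal.

```typescript
// Minimal stand-in for a SQL client; the real code would wrap a driver instance.
interface Sql {
  query(text: string, params: readonly unknown[]): Promise<unknown[]>;
}

// The brand makes otherwise identical client types incompatible with each other.
type Branded<Name extends string> = Sql & { readonly __db: Name };
type AppDb = Branded<"app">; // env-local Postgres: public.*, auth.*
type DmDb = Branded<"dm">;   // vf-pg-dm: dm.*

interface Env {
  DB: AppDb;
  DM_DB: DmDb;
}

// Tags a client with the database it talks to.
function brand<Name extends string>(client: Sql, name: Name): Branded<Name> {
  return Object.assign(client, { __db: name });
}

// Helpers that touch dm.* accept only the dm handle.
function getStreamerChannel(db: DmDb, channelId: string): Promise<unknown[]> {
  return db.query("select * from dm.streamer_channel where id = $1", [channelId]);
}

// getStreamerChannel(env.DB, "c1") would be rejected at compile time,
// because the "app" brand is not assignable to the "dm" brand.
```

With this shape, each call site that crosses the boundary changes only the handle it passes, and a missed call site shows up as a type error rather than a runtime failure.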

Migration sequence

  1. Land the namespace skeleton. data.yaml namespace + empty Flux kustomization. No workloads yet. Verify CHO, CNPG operators see it.
  2. Add ClickHouse alias. Create the ExternalName Service chi-vf in data pointing at clickhouse-vf.production. Rolling-deploy stage apps with CLICKHOUSE_URL repointed to chi-vf.data.svc.cluster.local. The CHI itself does not move.
  3. Provision vf-pg-dm. CNPG Cluster + initial backup target. Empty schemas. No app traffic yet.
  4. Stand up ingest tier in data. Per External Data Ingestion §INGEST-1, write-only to vf-pg-dm and chi-vf. Pollers in dry-run mode (outbox writes, fan-out paused).
  5. Run kick-external-schema Phase A (migration 0123) against vf-pg-dm + env DBs in parallel. Backfill dm.streamer_channel from production/vf-pg.public.streamer_channels.
  6. Switch app code to use DM_DB. Repoint reads/writes per the kick-external-schema code-change-scope table.
  7. Enable fan-out (External Data Ingestion §INGEST-8). Stage starts consuming. Once green, strip stage's direct pollers (§INGEST-2).
  8. Production launches against the final layout — no second cutover.

Steps 1–3 are idempotent and reversible. Step 4 onward inherits the rollout plan in the External Data Ingestion proposal.


Open questions

  1. CNPG topology for vf-pg-dm. Single primary + WAL archive enough for dm (mostly append/upsert, ~10–20k writes/h)? Or 1 primary + 1 replica from day one?
  2. auth.* placement. Stays per-env (user-account data, not env-agnostic). Confirm — nothing in the proposals suggests otherwise.
  3. ClickHouse cutover safety — resolved: CHI stays in production; the data/chi-vf ExternalName provides cross-namespace DNS without moving the workload.
  4. Naming. data vs dm vs ingest vs shared. Recommendation: data — covers PG + CH + ingest workload + future Kafka/JetStream without renaming.
  5. Backups across the PG clusters. Standardise on the same CNPG backup secret + S3 bucket prefix per cluster (vf-pg-prod/, vf-pg-stage/, vf-pg-dm/). Different retention per cluster is fine.
  6. Production launch coupling. This proposal is on the critical path only if we want production to launch against the final layout. If launch slips, this can ship after — at the cost of one extra dm-extraction migration later.
