Fusion StackLab
AI Automation · 8 min read

AI marketing systems that actually ship

Maya Okonkwo
Head of AI Engineering
April 22, 2026

Pilots are easy. Production is the discipline. In the last two years we've shipped AI marketing automations across HubSpot, Klaviyo, Customer.io, and a half-dozen custom stacks — and we've watched many more die before they ever touched a real customer. The pattern that separates the two isn't model choice. It's architecture.

The teams that succeed treat AI features like any other production system: with explicit data contracts, observability, evaluation, and a deliberate failure mode. The rest ship a prompt wrapped in a Zap and pray.

Stop building toys.

An AI feature that depends on a single prompt and a single model is a demo. A system depends on data, evaluation, escalation, and observability. Treat them as first-class concerns from day one or accept that you're shipping a toy.

The clearest tell that you're building a toy: there's no plan for what happens when the model is wrong. Every shipped AI marketing system needs a graceful fallback — a deterministic path that runs when confidence is low, when the API is down, when a regex catches a forbidden phrase. If your spec doesn't have that section, you're not done designing.
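A minimal sketch of such a fallback path, under stated assumptions: the forbidden-phrase list, the 0.7 confidence threshold, the fallback copy, and the `Draft` and `Result` types are all hypothetical, and the confidence score is assumed to come from whatever scorer your system already uses.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical values for illustration; tune them for your own system.
FORBIDDEN = re.compile(r"\b(guaranteed|risk-free|cure)\b", re.IGNORECASE)
MIN_CONFIDENCE = 0.7
FALLBACK_COPY = "Thanks for being a customer. Here is what's new this month."


@dataclass
class Draft:
    text: str
    confidence: float  # assumed to come from a scorer or evaluator step


@dataclass
class Result:
    text: str
    source: str  # "model" or "fallback"
    reason: Optional[str] = None


def generate_with_fallback(generate: Callable[[], Draft]) -> Result:
    """Return model copy only when every check passes; otherwise the fallback."""
    try:
        draft = generate()
    except Exception as exc:  # API down, timeout, malformed response
        return Result(FALLBACK_COPY, "fallback", f"error: {type(exc).__name__}")
    if draft.confidence < MIN_CONFIDENCE:
        return Result(FALLBACK_COPY, "fallback", "low_confidence")
    if FORBIDDEN.search(draft.text):
        return Result(FALLBACK_COPY, "fallback", "forbidden_phrase")
    return Result(draft.text, "model")
```

Recording the `reason` alongside each send makes it possible to see how often the deterministic path actually runs, which is a useful observability signal in its own right.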

Three layers, every time.

Foundation: clean identity, event, and consent data. Logic: deterministic rules that sit above LLM calls, never below. Surface: the channel-specific renderer that handles delivery, personalization, and fallback. Skip any layer and the system is fragile within a quarter.

We've seen teams put the LLM at the top of the stack and try to bolt rules on after the fact. It never works. Rules can wrap models cleanly; the inverse always leaks. Put the deterministic guardrails first, then let the model fill in the creative gaps.
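One way to picture rules wrapping the model, as a rough sketch rather than a reference design: the `Contact` fields, the `unsubscribe_click` event name, and the `write_copy` callable standing in for the LLM call are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Contact:
    # Foundation layer: identity, events, and consent (hypothetical fields)
    contact_id: str
    email_consent: bool
    recent_events: list[str]


def decide(contact: Contact, write_copy: Callable[[Contact], str]) -> Optional[str]:
    """Logic layer: deterministic rules run before the model is ever called."""
    if not contact.email_consent:
        return None  # no consent means no send and no model call
    if "unsubscribe_click" in contact.recent_events:
        return None
    return write_copy(contact)  # the model only fills in the creative gap


def render_email(contact: Contact, body: Optional[str]) -> Optional[dict]:
    """Surface layer: channel-specific rendering of whatever the logic layer allowed."""
    if body is None:
        return None
    return {"to": contact.contact_id, "channel": "email", "body": body.strip()}
```

Because `decide` returns before `write_copy` is invoked, a suppressed contact never costs a model call and never depends on the model behaving well.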

Measure incrementality, not output.

If you can't run a holdout, you can't claim the system worked. We instrument incrementality from the first week. It changes the conversation with leadership and it keeps the team honest.

The cheapest holdout is the simplest: randomly assign 90% of qualifying contacts to the new AI flow and 10% to the control, then compare conversion at the same horizon. If the lift isn't real, you find out in weeks instead of quarters, and you avoid the soul-crushing experience of defending a system that doesn't actually work.
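A sketch of the 90/10 split and the lift comparison, assuming contacts have a stable string ID. Hash-based bucketing is one common way to get a repeatable assignment without storing state; the function names and the experiment label are hypothetical.

```python
import hashlib


def assign_arm(contact_id: str, experiment: str, holdout_pct: int = 10) -> str:
    """Deterministically bucket a contact; the same ID always lands in the same arm."""
    digest = hashlib.sha256(f"{experiment}:{contact_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # roughly uniform bucket in 0..99
    return "control" if bucket < holdout_pct else "treatment"


def relative_lift(treated_conv: int, treated_n: int,
                  control_conv: int, control_n: int) -> float:
    """Relative lift of the treatment conversion rate over the control rate."""
    treated_rate = treated_conv / treated_n
    control_rate = control_conv / control_n
    return (treated_rate - control_rate) / control_rate
```

For example, 540 conversions from 9,000 treated contacts (6%) against 50 from 1,000 controls (5%) is a relative lift of about 20%. That figure is a point estimate only; with a 10% holdout the control group is small, so it is worth checking the result against a significance test before reporting it.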

Where this falls down.

Two things kill these systems in month four: cost creep and quality drift. Cost creep happens when nobody prices every call and watches a per-customer margin number weekly. Quality drift happens when the eval suite isn't actually being run or, worse, it's being run but nobody acts on the failures.
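Pricing every call can be as small as the sketch below. The per-token prices are placeholders, not any provider's real rates, and `MarginLedger` is a hypothetical in-memory stand-in for whatever store you already report from.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's real rates.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single model call."""
    return (input_tokens * PRICE_PER_1K["input"]
            + output_tokens * PRICE_PER_1K["output"]) / 1000


class MarginLedger:
    """Accumulates model spend per customer so margin can be reviewed weekly."""

    def __init__(self) -> None:
        self.cost = defaultdict(float)

    def record(self, customer_id: str, input_tokens: int, output_tokens: int) -> None:
        self.cost[customer_id] += call_cost(input_tokens, output_tokens)

    def margin(self, customer_id: str, revenue: float) -> float:
        """Revenue attributed to the customer minus model spend on that customer."""
        return revenue - self.cost[customer_id]
```

The point of the design is that every call passes through `record`, so the weekly number is a sum over real usage rather than an estimate.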

The fix for both is unglamorous: weekly cost reviews, an eval suite that runs on every prompt change, and a clear owner who's allowed to revert. Treat the AI like a production service, not a research project, and it will stay in production.
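An eval suite that gates prompt changes does not need to be elaborate to be useful. The sketch below assumes a `generate` callable that wraps the prompt under test; the three cases and the pass-rate threshold are hypothetical examples, not a recommended set.

```python
from typing import Callable

# Hypothetical eval cases: each pairs an input with a check on the output.
CASES = [
    ({"first_name": "Ada", "product": "trail shoes"}, lambda out: "Ada" in out),
    ({"first_name": "Ada", "product": "trail shoes"}, lambda out: len(out) <= 200),
    ({"first_name": "", "product": "trail shoes"}, lambda out: "{first_name}" not in out),
]


def run_evals(generate: Callable[[dict], str], min_pass_rate: float = 1.0) -> bool:
    """Return True only if enough cases pass; a False result should block the change."""
    passed = sum(1 for inputs, check in CASES if check(generate(inputs)))
    return passed / len(CASES) >= min_pass_rate
```

Wired into CI, a `False` result fails the build on any prompt change, which gives the owner a concrete signal to revert on rather than a dashboard nobody reads.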
