Layer 03 · Orchestration

One brain isn't enough. Teams of brains are.

Synapse routes each task to the right model — Claude, GPT, Gemini, Llama or a proprietary Bluey model. Simple questions go to cheap models. Heavy analysis goes to top models. You only pay for what makes sense.

See full architecture
6+ model families supported
40-60% lower token cost vs. naive usage
99.9% uptime with automatic failover
JM: "Compare Q3 consolidated results to Q2 and surface the 3 largest variances by product line."
router · task: complex analysis · pick: Claude Opus
Claude Haiku · Claude Opus · GPT-5 · Gemini Flash
B: Top 3 Q3 vs Q2 variances: Premium line +18.4%, Generics −7.2%, Distribution +12.1%. Consolidated margin up 1.3pp. Want me to drill into each?

The "single LLM" problem

Betting on a single model is fragile — and expensive.

Mid-market companies adopting AI through a single provider's API discover three pains within months: the bill explodes, operations get locked in, and when the model goes down, the company stops with it. A multi-LLM router neutralizes all three risks at once.

Vendor lock-in

Prices go up without notice, models get deprecated, usage policies change. Code tied to a single provider gets rewritten — or you pay whatever they ask.

Paying top-tier for trivial work

Premium models cost 10-30× more than efficient ones. When every prompt hits the top model, the bill grows with no quality gain. Trivial tasks don't need expensive brains.
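To make the arithmetic concrete, here is a rough sketch. The 20× price ratio sits inside the 10-30× range above; the 50% share of trivial prompts and the equal token volume per prompt are assumptions for illustration only, not measured figures.

```python
def blended_cost(trivial_share: float, price_ratio: float) -> float:
    """Cost of routed traffic relative to sending everything to the premium model.

    trivial_share: fraction of prompts an efficient model can handle (assumed).
    price_ratio:   how many times more the premium model costs (10-30 per the text).
    Assumes every prompt uses the same number of tokens.
    """
    premium_part = 1.0 - trivial_share            # still billed at the premium rate
    efficient_part = trivial_share / price_ratio  # billed at the cheaper rate
    return premium_part + efficient_part


relative = blended_cost(trivial_share=0.5, price_ratio=20)
savings = 1.0 - relative
print(f"relative cost: {relative:.3f}, savings: {savings:.1%}")
```

Under these assumptions the routed bill is about 52.5% of the all-premium bill. Your own mix of trivial and heavy prompts will move that number in either direction.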

Model down — ops down with it

Provider downtime means squad down, support down, pipeline down. Without automatic failover to another model, your continuity depends on an SLA you don't control.

The solution: smart router

Four steps, one router. Decision in milliseconds.

The Synapse router isn't a manual switch — it's a decision layer that classifies the task, compares it to agent policy and picks the right model. All before the prompt leaves your infrastructure.

Step 01

Task classification

The router reads the prompt and asks three questions. Is it simple, medium or complex? Is it text, numbers, code or multimodal? Is it critical or routine? The answers set the eligible model tier.

Intent detection · Modality · Criticality
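For technical readers, a minimal Python sketch of this step could look like the following. The keyword lists, the 30-word threshold and the tier names are hypothetical stand-ins; a production classifier would more likely be a trained model than a set of keyword rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    complexity: str  # "simple" | "medium" | "complex"
    modality: str    # "text" | "code" (numbers and multimodal omitted for brevity)
    critical: bool


# Hypothetical keyword hints, for illustration only.
COMPLEX_HINTS = ("compare", "variance", "analyze", "forecast")
CODE_HINTS = ("def ", "class ", "select ", "traceback")
CRITICAL_HINTS = ("contract", "consolidated", "audit", "compliance")


def classify(prompt: str) -> TaskProfile:
    """Derive complexity, modality and criticality from simple text heuristics."""
    text = prompt.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        complexity = "complex"
    elif len(text.split()) > 30:
        complexity = "medium"
    else:
        complexity = "simple"
    modality = "code" if any(hint in text for hint in CODE_HINTS) else "text"
    critical = any(hint in text for hint in CRITICAL_HINTS)
    return TaskProfile(complexity, modality, critical)


def eligible_tier(profile: TaskProfile) -> str:
    """Map a task profile to the cheapest tier allowed to handle it."""
    if profile.critical or profile.complexity == "complex":
        return "premium"
    return "balanced" if profile.complexity == "medium" else "efficient"
```

With these rules, a request to compare consolidated quarterly results lands in the premium tier, while a short routine question stays in the efficient tier.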
Step 02

Agent policy

Each squad has its own rules. The Finance squad never uses open-source models. The Marketing squad can use Gemini for images. You define the policy; the router obeys.

Per agent · Per data tag · Auditable
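One way to picture a per-squad policy is as a small declarative object the router consults before choosing a model. The sketch below is an assumption about shape, not the platform's actual policy format; the class, field and family names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Only Llama is labeled open-source in the catalog; extend this set as needed.
OPEN_SOURCE_FAMILIES = frozenset({"llama"})


@dataclass(frozen=True)
class AgentPolicy:
    squad: str
    allow_open_source: bool = True
    image_family: Optional[str] = None  # family preferred for image tasks

    def allows(self, family: str) -> bool:
        return self.allow_open_source or family not in OPEN_SOURCE_FAMILIES


POLICIES = {
    "finance": AgentPolicy("finance", allow_open_source=False),
    "marketing": AgentPolicy("marketing", image_family="gemini"),
}


def eligible_families(squad: str, candidates: List[str],
                      task_modality: str = "text") -> List[str]:
    """Filter candidate model families through the squad's policy."""
    policy = POLICIES[squad]
    allowed = [f for f in candidates if policy.allows(f)]
    # Assumption: "can use Gemini for images" is treated as a preference here.
    if task_modality == "image" and policy.image_family in allowed:
        return [policy.image_family]
    return allowed
```

Because the policy is plain data, every routing decision can be logged next to the rule that produced it, which is what makes the choice auditable.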
Step 03

The right model wins

Within the eligible tier, the router picks for best cost-quality ratio in real time: current latency, provider queue, price per token, historical accuracy for the task type.

Cost-quality · Live latency · Bluey decision
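As a rough illustration of a cost-quality decision, the sketch below scores each candidate by historical accuracy divided by a weighted penalty for price and latency. The formula, the weights and the sample numbers are all assumptions, and provider queue depth is left out for brevity.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    name: str
    price_per_1k_tokens: float  # USD, hypothetical figure
    latency_ms: float           # current observed latency
    accuracy: float             # historical accuracy for this task type, 0..1


def score(c: Candidate, w_cost: float = 1.0, w_latency: float = 0.001) -> float:
    """Higher is better: accuracy per unit of weighted cost and latency."""
    penalty = w_cost * c.price_per_1k_tokens + w_latency * c.latency_ms
    return c.accuracy / penalty


def pick(candidates: Iterable[Candidate]) -> Candidate:
    """Return the candidate with the best cost-quality score."""
    return max(candidates, key=score)


tier = [
    Candidate("model-a", price_per_1k_tokens=0.015, latency_ms=900, accuracy=0.95),
    Candidate("model-b", price_per_1k_tokens=0.010, latency_ms=700, accuracy=0.93),
]
print(pick(tier).name)
```

Here the slightly less accurate model wins because it is both cheaper and faster at the moment of the call. Changing the weights shifts that trade-off.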
Step 04

Automatic failover

If the chosen model fails or stalls, the router redirects to the next in the queue without the user noticing. Operational continuity without depending on a single SLA.

Retry · Failover · 99.9% uptime
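The failover idea can be sketched in a few lines of Python: try each provider in queue order and return the first successful answer. The function and exception names are hypothetical, and the providers are plain callables standing in for real API clients.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (name, function that answers a prompt)


class AllProvidersFailed(Exception):
    """Raised when every provider in the queue has failed."""


def call_with_failover(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Return (provider_name, answer) from the first provider that succeeds."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

The caller only sees the answer and the name of the model that produced it, so a failed primary never surfaces to the user as long as one provider in the queue responds.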
Model catalog

Each brain is good at one thing. The router knows them all.

You don't pick the model — you pick the outcome. But if you want to know who's behind the scenes, here is the platform's current catalog. New models ship continuously, without changing your squads' code.

Premium

Claude (Anthropic)

Opus, Sonnet and Haiku family

Deep reasoning, long-context analysis and reliable structured generation. Opus for critical analysis, Haiku for high-volume, low-cost workflows.

Best for
Financial analysis · Contracts · Critical decisions
Balanced

GPT (OpenAI)

GPT-5 family and mini variants

Strong at natural language generation, code interpretation and multimodal tasks. Excellent general baseline for communication and productivity squads.

Best for
Customer support · Content · Multimodal
Efficient

Gemini (Google)

Pro and Flash family

Generous context window, great cost at scale and native visual processing. Ideal for ingesting large documents and visual tasks at volume.

Best for
Long documents · Images · Volume
Open-source

Llama (Meta)

Open-source for on-prem scenarios

Open model running on the customer's own infra — for cases where sensitive data cannot leave the network. The router treats Llama like any other provider.

Best for
On-prem · Data sovereignty · Regulated industries
Efficient

Mistral / DeepSeek / Qwen

Alternative efficient families

Aggressive pricing tier for classification, extraction and high-volume workflows. The router shifts work to this tier automatically when cost drives the decision.

Best for
Classification · Extraction · Scale pipelines
Proprietary

Bluey models

In-house reranker and classifiers

Proprietary models trained by Bluey for the internal pipeline: RAG reranker, task classifier, PII detector. Components where Bluey controls accuracy end-to-end.

Best for
RAG rerank · Routing · LGPD compliance
Technical specification

The details your tech team will want to see.

Serious multi-LLM isn't just "having APIs for multiple models" — it's architecture that decides, measures, fails over, audits and isolates per tenant. Here is how Synapse implements each piece.

Router per task, not per key

Not a manual selector: each agent declares a policy and the router decides per call based on task type, criticality and each provider's current SLA.

Dynamic decision · Per agent · Per call

Transparent failover and retry

Primary provider down? Router redirects to the secondary at the same quality tier — without the user noticing. Retry with backoff and controlled degradation when all providers degrade.

Failover <1s · 99.9% uptime · Zero client code
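Retry with backoff is a standard pattern, and a minimal Python sketch of it is shown here. The attempt count, delays and jitter range are illustrative defaults, not the platform's actual settings; the `sleep` parameter exists only so the delay can be swapped out in tests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T], attempts: int = 3,
                       base_delay: float = 0.1, max_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call` until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so clients do not all retry at once.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

A router would typically wrap each provider call in a helper like this and move to the secondary provider only once the retries are exhausted.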

Real-time cost telemetry

Every call logs model, tokens, latency and cost. Dashboard shows how much each squad consumes per day, week, month — and where you can optimize by switching tiers.

Cost per call · Per squad · Spike alerts
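A per-call record like the one described can be sketched as a small data structure plus an aggregation. The field names and the in-memory list are assumptions for illustration; a real deployment would write to a metrics store.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CallRecord:
    squad: str
    model: str
    tokens: int
    latency_ms: float
    cost_usd: float


class Telemetry:
    """In-memory log of calls with a per-squad cost rollup."""

    def __init__(self) -> None:
        self.records: List[CallRecord] = []

    def log(self, record: CallRecord) -> None:
        self.records.append(record)

    def cost_by_squad(self) -> Dict[str, float]:
        totals: Dict[str, float] = defaultdict(float)
        for record in self.records:
            totals[record.squad] += record.cost_usd
        return dict(totals)
```

Grouping the same records by day or by model instead of by squad gives the other dashboard views mentioned above.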

Semantic cache

Equivalent questions don't need to hit the LLM twice. The cache identifies semantic similarity, returns the validated answer and saves tokens on repeating workflows.

Cache hit ratio · Configurable TTL · Source-based invalidation
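The core of a semantic cache fits in a short Python sketch: embed the prompt, compare it to stored prompts and return a stored answer when similarity clears a threshold. The bag-of-words `embed` function is a deliberately crude stand-in for a real sentence-embedding model, and the 0.8 threshold and one-hour TTL are arbitrary assumptions. Source-based invalidation is not shown.

```python
import math
import time
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real cache would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: List[Tuple[Counter, str, float]] = []  # (vector, answer, expiry)

    def put(self, prompt: str, answer: str) -> None:
        self._entries.append((embed(prompt), answer, self.clock() + self.ttl_seconds))

    def get(self, prompt: str) -> Optional[str]:
        now = self.clock()
        self._entries = [e for e in self._entries if e[2] > now]  # drop expired entries
        vector = embed(prompt)
        best = max(self._entries, key=lambda e: cosine(vector, e[0]), default=None)
        if best is not None and cosine(vector, best[0]) >= self.threshold:
            return best[1]
        return None
```

A hit skips the model call entirely; a miss falls through to the router, and the fresh answer can be stored with `put` for the next equivalent question.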

Policy per data, not just per agent

Data tagged as sensitive never goes to an external provider — only on-prem Llama or Bluey models. Native PII detection blocks leaks before the call leaves the network.

PII redaction · Tag-based routing · LGPD by design
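Tag-based routing and redaction can be illustrated with a short Python sketch. The email regex is a simplistic stand-in for a real PII detector, and the target identifiers and the "sensitive" tag name are hypothetical.

```python
import re
from typing import Iterable, List

# Simplistic email pattern; real PII detection covers far more than emails.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Hypothetical identifiers for targets that run inside the customer network.
ON_PREM_TARGETS = frozenset({"llama-on-prem", "bluey"})


def redact_pii(text: str) -> str:
    """Mask email addresses before the text is sent anywhere."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def allowed_targets(data_tags: Iterable[str], candidates: Iterable[str]) -> List[str]:
    """Sensitive data may only go to on-prem targets; other data may go anywhere."""
    if "sensitive" in set(data_tags):
        return [c for c in candidates if c in ON_PREM_TARGETS]
    return list(candidates)
```

The key design point is ordering: the tag check and the redaction both run before any network call, so a misconfigured agent cannot send tagged data to an external provider.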

Multi-tenant isolation

Each customer has separate quota, keys and telemetry. No data enters model fine-tuning — Bluey or third-party. The Branches system applies isolation between units of the same group.

No cross-training · Per-tenant quota · Isolated branches
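Per-tenant quota is the easiest part of isolation to show in code. The sketch below keeps a separate token budget per tenant; the class name, the token-based unit and the in-memory counters are assumptions for illustration.

```python
from typing import Dict


class QuotaExceeded(Exception):
    """Raised when a tenant would exceed its token quota."""


class TenantQuotas:
    def __init__(self, limits: Dict[str, int]) -> None:
        self._limits = dict(limits)
        self._used = {tenant: 0 for tenant in limits}

    def consume(self, tenant: str, tokens: int) -> int:
        """Charge tokens to one tenant and return how many it has left."""
        if tenant not in self._limits:
            raise KeyError(f"unknown tenant: {tenant}")
        if self._used[tenant] + tokens > self._limits[tenant]:
            raise QuotaExceeded(tenant)
        self._used[tenant] += tokens
        return self._limits[tenant] - self._used[tenant]
```

Because each tenant has its own counter, one customer exhausting its budget has no effect on what another customer can spend.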
Platform integration

The router is the central brain — every other layer goes through it.

No Synapse AI decision goes directly to an external model. Squads, Your Base and Governance all flow through the router to guarantee cost, quality and compliance end-to-end.

Squads call the router

Each squad declares its policy — the router resolves which model runs each workflow step. The squad doesn't know providers, it just knows the expected outcome.

Your Base provides context

RAG retrieves the right documents; the router picks the brain that interprets them. A heavy question goes to Opus, a simple one goes to Haiku, and the context stays the same.

Governance audits the choices

Every call logs the chosen model, reason, cost and outcome. The audit trail shows compliance teams which brain processed what, so every decision stays traceable.

Want to see the router running on your real prompts?

In a 45-minute technical demo, we run one of your team's workflows in parallel — single model vs. multi-LLM router. You see cost, latency and quality side by side, with real data.
