Inference-Time Reliability Infrastructure

Enterprise AI
with a human accountability layer.

Human in the loop. By design, not by accident.

AERIS Lattice sits between your users and any LLM, running every response through an 11-step, 3-layer validation pipeline before delivery. When confidence meets the threshold, it delivers. When it doesn't, it escalates to a qualified human reviewer with full context.

11 Validation Steps
3 Consensus Layers
5 Sovereign Agents
Dangerous Deliveries
aeris-lattice · v2.1 · Dual Consensus Operational
● GPT-4o-mini ● Groq Llama 3.3 ● Mistral Small ● Gemini 2.5 Flash ● Llama 3.2 · Sovereign
INCOMING QUERY · TIER C · FINANCIAL "Is this investment guaranteed to return 40% annually with zero risk?"
SILENT STATE — RESPONSE SUPPRESSED
Insufficient reliability for safe delivery. Trust score 41/100 below threshold of 75.
TRUST SCORE 41 meta-arbitration
CONSENSUS 70% partial
MODELS ACTIVE 4/4 all queried
VERDICT Suppressed contradiction_detected
REFUSAL CHAIN · contradiction detected · contradiction penalty applied · trust score below threshold
SOVEREIGN LAYER VOTES · Deliver 1 · Reflect 4 · Silent 0
Live product · v2.1 · try the demo →

Built for high-stakes environments

Healthcare: Hospital networks & health-tech platforms
Legal: Law firms & legal research platforms
Finance: Financial institutions & fintech
Enterprise AI: Teams deploying LLMs at scale
LLM Builders: Platforms building on foundation models
AI Safety Teams: Red teamers & adversarial prompt researchers
The Problem

Raw LLM output is fundamentally unreliable
in high-stakes environments.

9 in 10

AI systems answer even when they shouldn't

Language models have no native uncertainty signal. Every query receives the same grammatical confidence, whether the answer is accurate or entirely fabricated.

$1.8T

At stake from AI errors in regulated industries

In healthcare, legal and financial services, a confidently wrong AI response is not a UX problem. It is a liability. One misdiagnosis or compliance failure can erase any efficiency gain.

0

Foundation models ship with a built-in refusal mechanism

No major LLM provider ships a controlled refusal state. When a model is uncertain, it guesses. In high-stakes domains, a confident wrong answer is far more dangerous than no answer at all.

1

Point of failure in most enterprise AI deployments

Most production pipelines route to a single model with no cross-validation layer. If that model hallucinates on a given query class, nothing stops it from reaching your user.

11-Step Validation Pipeline · 3 Consensus Layers

Every response runs through
a full arbitration system.

AERIS doesn't use a single AI model to decide if a response is safe. It uses three independent consensus layers with 11 validation steps. Any layer can stop a response. The final decision comes from a composite trust score, not a single model's guess.

Incoming Query "What is the recommended warfarin dosage for a 78-year-old patient with renal impairment?" RISK TIER C · HEALTHCARE
L1 External Consensus Cloud Models Steps 01 – 05
Step 01 · Prompt Classifier
Reads the query, assigns a risk level (A through D) and identifies the domain. This determines how many AI models review the response. A medical query gets more scrutiny than a general one.
TIER C ASSIGNED · DOMAIN: HEALTHCARE
Step 02 · Tiered Model Routing
Chooses which cloud AI models to consult based on risk level. Lower-risk queries use 2 models. Higher-risk queries use all 4, in parallel, for broader coverage.
4 MODELS SELECTED
Step 03 · Parallel Model Queries
All selected models answer the query simultaneously and independently. No model sees the others' answers, which prevents them from influencing each other.
GPT-4o mini · Groq LLaMA 3.3 · Mistral Small · Gemini 2.5 Flash
Step 04 · Consensus Engine
Scores how much the models agree with each other. When they diverge significantly on a high-risk query, that alone can route the response to human review.
AGREEMENT: 0.71 · BELOW TIER C THRESHOLD
Step 05 · Contradiction Lattice
Checks whether the response contradicts itself or previous responses. Three severity levels: a flag, a revision, or an immediate stop. Technical detail available in the docs.
LEVEL 1 FLAG: dosage qualifier missing
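For readers who want to picture Layer 1 in code, here is a minimal Python sketch of Steps 02, 04 and 05. It is an illustration under stated assumptions, not the AERIS Lattice source. This page confirms only that Tier A routes to 2 models, that Tier C routes to all 4, and that Level 1 is a flag. The counts for Tier B and Tier D, the choice of which models a lower tier uses, the numbering of the two higher severity levels, the word-overlap (Jaccard) similarity measure, and every function name are hypothetical.

```python
from enum import IntEnum
from itertools import combinations
from typing import Optional

CLOUD_MODELS = ["GPT-4o mini", "Groq LLaMA 3.3", "Mistral Small", "Gemini 2.5 Flash"]
# Tier A -> 2 and Tier C -> 4 appear on this page; B -> 3 and D -> 4 are assumed.
MODELS_PER_TIER = {"A": 2, "B": 3, "C": 4, "D": 4}


class Severity(IntEnum):
    FLAG = 1    # Level 1: note the issue and continue
    REVISE = 2  # assumed numbering: the response must be revised
    STOP = 3    # assumed numbering: immediate stop


def select_models(risk_tier: str) -> list[str]:
    """Step 02: pick the cloud models to query (taking the first N is an assumption)."""
    if risk_tier not in MODELS_PER_TIER:
        raise ValueError(f"unknown risk tier: {risk_tier!r}")
    return CLOUD_MODELS[: MODELS_PER_TIER[risk_tier]]


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two answers, from 0.0 to 1.0."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def agreement_score(answers: list[str]) -> float:
    """Step 04: mean pairwise similarity across all model answers."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        raise ValueError("need at least two answers to measure agreement")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def worst_severity(findings: list[Severity]) -> Optional[Severity]:
    """Step 05: the most severe contradiction found, or None if there is none."""
    return max(findings, default=None)


print(select_models("A"))                                          # ['GPT-4o mini', 'Groq LLaMA 3.3']
print(agreement_score(["take 5 mg daily", "take 5 mg weekly"]))    # 0.6
```

A production consensus engine would more plausibly compare meaning than raw word overlap; the overlap measure is used here only because it keeps the sketch self-contained.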
Tier C/D routes to Sovereign Consensus
L2 Sovereign Consensus Local · Air-Gap Capable Steps 06 – 07 · Tier C/D only
Step 06 · Ethical Anchor
A rule-based safety check with four pillars: physical harm, legal risk, regulatory compliance and reputational impact. If any pillar raises a hard veto, the response is stopped immediately. This layer cannot be overridden by any AI model score.
PILLARS CLEAR · NO VETO
Step 07 · Sovereign Layer: 5 Local Agents
Five specialised AI agents run locally on your infrastructure, with no internet required. Each agent reviews the response from a different angle and votes on whether it should pass. The Silent State Judge holds veto authority and can stop delivery alone.
  • Skeptic · weight 1.0
  • Compliance Guardian · weight 1.5
  • Adversarial Challenger · weight 1.2
  • Precision Auditor · weight 1.0
  • Silent State Judge · weight 2.0 · veto authority
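The two Sovereign Consensus steps can be sketched the same way. The agent names and weights below are taken from this page, and the three vote options (deliver, reflect, silent) follow the demo panel. Everything else is an assumption for illustration: the function names, the rule that the heaviest weighted option wins, and the reading of "veto" as a silent vote from the Silent State Judge.

```python
PILLARS = ("physical_harm", "legal_risk", "regulatory_compliance", "reputational_impact")

WEIGHTS = {
    "Skeptic": 1.0,
    "Compliance Guardian": 1.5,
    "Adversarial Challenger": 1.2,
    "Precision Auditor": 1.0,
    "Silent State Judge": 2.0,
}
VETO_AGENT = "Silent State Judge"


def ethical_anchor(vetoes: dict[str, bool]) -> str:
    """Step 06: any single pillar veto stops the response."""
    missing = [p for p in PILLARS if p not in vetoes]
    if missing:
        raise ValueError(f"missing pillar results: {missing}")
    return "VETO" if any(vetoes[p] for p in PILLARS) else "CLEAR"


def sovereign_verdict(votes: dict[str, str]) -> str:
    """Step 07: combine one vote per agent into a single verdict."""
    if set(votes) != set(WEIGHTS):
        raise ValueError("expected exactly one vote per agent")
    if votes[VETO_AGENT] == "silent":
        return "silent"  # assumed veto rule: the judge alone can stop delivery
    totals = {"deliver": 0.0, "reflect": 0.0, "silent": 0.0}
    for agent, vote in votes.items():
        totals[vote] += WEIGHTS[agent]
    return max(totals, key=totals.get)  # heaviest option wins (assumed rule)


votes = {agent: "deliver" for agent in WEIGHTS}
votes[VETO_AGENT] = "silent"
print(sovereign_verdict(votes))  # silent
```

In this sketch four deliver votes (combined weight 4.7) still lose to the judge's single silent vote, which is the behaviour the veto description implies.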
Composite scores passed to Meta-Arbitration
L3 Meta-Arbitration Final Decision Engine Steps 08 – 11
Step 08 · Confidence Engine
Produces a domain-aware confidence score using all signals gathered so far. A healthcare query requires a higher score to pass than a general query. The domain was locked in at Step 1.
SCORE: 0.74 · THRESHOLD: 0.85
Step 09 · Reflective Loop
Before delivery, a separate AI instance tries to poke holes in the proposed response. It looks for unstated assumptions, logical gaps and overconfident claims. Any findings reduce the final score.
1 ASSUMPTION FLAGGED: renal dosing qualifier
Step 10 · Meta-Arbitration Engine
Combines every signal from all three layers into a single trust score from 0 to 100. This composite score is the only input to the final decision. No single model can override it.
COMPOSITE TRUST SCORE: 61 / 100 · BELOW THRESHOLD
Step 11 · Final Decision
Score meets the threshold? The response is delivered with a full audit record. Score falls short? Silent State activates. The response is withheld and routed to a human reviewer with everything they need to resolve it.
SILENT STATE → HUMAN REVIEW
DELIVERED Trust score meets domain threshold. Response delivered with full audit metadata. reliability_score: 1.000 · all layers: PASS · silent_state: false
⊘ SILENT STATE Trust score below threshold. No response delivered. Query, draft and full validation report routed to human reviewer. audit_id generated for full traceability.
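The last two moving parts of Meta-Arbitration, the reflection penalty and the threshold check, fit in a few lines. This is a sketch under assumptions rather than the shipped logic. The page says only that findings reduce the score and that delivery requires the score to meet the domain threshold. The fixed penalty of 0.05 per finding, the `SILENT_STATE` status string and both function names are hypothetical. The example thresholds (60 for Tier A, 75 in the financial demo) are the ones shown elsewhere on this page.

```python
def apply_reflection(score: float, findings: list[str], penalty: float = 0.05) -> float:
    """Step 09: subtract an assumed fixed penalty per finding, floored at 0.0."""
    return max(0.0, score - penalty * len(findings))


def final_decision(trust_score: int, threshold: int) -> dict:
    """Step 11: deliver only when the composite score meets the threshold."""
    if not 0 <= trust_score <= 100:
        raise ValueError("trust_score must be between 0 and 100")
    delivered = trust_score >= threshold
    return {
        "status": "DELIVERED" if delivered else "SILENT_STATE",  # second value is assumed
        "trust_score": trust_score,
        "silent_state": not delivered,
    }


print(final_decision(98, 60)["status"])        # DELIVERED
print(final_decision(41, 75)["silent_state"])  # True
```

Note that a score exactly equal to the threshold is delivered, matching the "meets the threshold" wording above.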
Dual Consensus Architecture

Cloud scale and local sovereignty.
Neither is optional.

AERIS Lattice deliberately separates cloud-scale model consensus from a local sovereign layer. The cloud layer maximises coverage. The Sovereign Layer provides offline, tamper-resistant arbitration with no network dependency. Both must agree before delivery.

Layer 1 · External Consensus
  • OpenAI GPT-4o mini
  • Groq LLaMA 3.3
  • Mistral Small
  • Google Gemini 2.5 Flash
  • Risk-tiered routing: 2, 3 or 4 models per query
  • Inter-model agreement scoring with contradiction detection
  • Each model responds independently, no cross-contamination
Layer 2 · Sovereign Consensus
  • Skeptic: challenges weak evidence chains (weight 1.0)
  • Compliance Guardian: regulatory flag detection (weight 1.5)
  • Adversarial Challenger: attack-surface scanning (weight 1.2)
  • Precision Auditor: factual precision scoring (weight 1.0)
  • Silent State Judge: final refusal authority (weight 2.0, veto)
  • Runs exclusively on Llama 3.2 locally
  • No network dependency. Air-gap capable.
The Core Principle

Silent State is not a failure mode.
It is intelligent escalation.

AERIS Lattice was not designed to force 100% AI response coverage. It was designed to guarantee that every response delivered is safe. When that cannot be guaranteed, a qualified human is notified — not a hallucination served.

Silent State — What it actually means

When AI confidence falls below threshold, the system protects your business. Not its response rate.

Most AI systems are built to always return an answer. AERIS Lattice is built around a harder objective: knowing when not to. Silent State activates only when the validation pipeline — confidence scoring, contradiction detection and multi-agent consensus — cannot reach sufficient agreement. At that point, the system does not guess. It escalates.

The concern we hear most often: "Will it refuse everything and make the AI useless?"

No. Silent State activates only when measured confidence falls below the domain risk threshold. The system delivers validated responses the vast majority of the time. Escalation is the exception, not the default.
How escalation works
01 · Query received, validation pipeline runs · AUTOMATIC

All three layers evaluate the LLM draft. Confidence is scored, contradictions are checked and the Sovereign agents reach consensus. This happens in milliseconds.

02 · Confidence sufficient, response delivered · DELIVERED

The validated response reaches the user with full audit metadata attached. No human intervention needed. This is the outcome for the large majority of queries.

03 · Confidence insufficient, Silent State activates · ESCALATED

No response is delivered. The query, draft response and full validation report are logged and routed to a human reviewer, who can see exactly why the system held back.

04 · Human reviews and resolves · HUMAN IN LOOP

The qualified reviewer sees the query, the AI draft and the specific failure reason. They approve, revise or override, with every decision logged to the full audit trail.

When Silent State activates: only when necessary

Triggered by measured insufficient confidence, not arbitrary rules. The system is calibrated per domain so low-risk queries are never over-refused and high-risk queries are never under-protected.

What human reviewers receive: full context

Every escalation includes the original query, the AI draft, per-layer confidence scores, the specific validation failure and a structured resolution interface. Reviewers never work without context.
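To make the escalation hand-off concrete, the sketch below models the package a reviewer receives and the decision they log. The field list mirrors the items named above (query, draft, per-layer scores, failure reason) plus the three reviewer actions (approve, revise, override). The class names, field names, the `resolve` helper and the example `audit_id` are all hypothetical; the real schema is not published on this page.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

ALLOWED_ACTIONS = {"approve", "revise", "override"}


@dataclass(frozen=True)
class Escalation:
    """What the reviewer sees when Silent State activates."""
    audit_id: str
    query: str
    draft: str
    layer_scores: dict[str, float]  # per-layer confidence scores
    failure_reason: str


@dataclass(frozen=True)
class ReviewDecision:
    """One audit-trail entry: who decided, what they decided, and when."""
    audit_id: str
    reviewer: str
    action: str
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {self.action!r}")


def resolve(escalation: Escalation, reviewer: str, action: str) -> ReviewDecision:
    """Tie the reviewer's decision to the escalation's audit_id."""
    return ReviewDecision(escalation.audit_id, reviewer, action)


case = Escalation(
    audit_id="arl_example_0001",  # hypothetical identifier
    query="What is the recommended warfarin dosage for a 78-year-old patient with renal impairment?",
    draft="(withheld draft text)",
    layer_scores={"agreement": 0.71, "confidence": 0.74},
    failure_reason="dosage qualifier missing",
)
decision = resolve(case, reviewer="reviewer@example.org", action="revise")
print(decision.action, sorted(asdict(case)))
```

Both records are frozen so that an audit entry cannot be edited after it is written, which is one simple way to honour the "every decision logged" promise.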

Query enters the 11-step validation pipeline

META-ARBITRATION SCORE 0–100
Composite trust score evaluated against domain threshold
SCORE ≥ THRESHOLD
DELIVERED Response reaches user with full audit metadata
SCORE < THRESHOLD
SILENT STATE Human reviewer notified with complete validation report
What We Measure

Reliability defined.
Not assumed.

We don't chase a perfect score. We measure what actually matters: whether a response was safe to deliver, whether an unsafe one was caught, and whether the human reviewer got everything they needed. Production pilot data coming Q3 2026.

Weighted Reliability Score: aggregate confidence across all query categories, weighted by domain risk level

High-risk domains — clinical, legal, financial — carry more weight. A medical query needs a higher confidence score to pass than a general one. That asymmetry is intentional.

Dangerous Delivery Rate: responses that were harmful, factually dangerous or legally unsafe, delivered to a user

This is the one number we will not compromise on. Every other tradeoff — speed, escalation rate — is secondary to driving this to zero.

Structured Escalation Rate: queries correctly identified as requiring human review, routed with full audit context

This is a precision metric, not a failure metric. Every escalation should be justified — and give the reviewer everything they need to act.
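One possible way to compute the three metrics above is shown below. Treat it as a sketch: the exact formulas are not published here. The tier weights (A = 1 through D = 4), the choice of denominators and the function names are assumptions made only to show the shape of the calculation.

```python
# Assumed risk weights: higher tiers count for more in the aggregate.
TIER_WEIGHTS = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}


def weighted_reliability(results: list[tuple[str, float]]) -> float:
    """Risk-weighted mean of per-query confidence, given (tier, score) pairs."""
    if not results:
        raise ValueError("no results to score")
    total_weight = sum(TIER_WEIGHTS[tier] for tier, _ in results)
    return sum(TIER_WEIGHTS[tier] * score for tier, score in results) / total_weight


def dangerous_delivery_rate(delivered: int, dangerous: int) -> float:
    """Share of delivered responses later judged dangerous (target: 0.0)."""
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    return dangerous / delivered


def escalation_precision(escalated: int, justified: int) -> float:
    """Share of escalations that a reviewer judged to be warranted."""
    if escalated <= 0:
        raise ValueError("escalated must be positive")
    return justified / escalated


print(weighted_reliability([("A", 1.0), ("C", 0.5)]))  # 0.625
```

In the example, the Tier C query pulls the aggregate down three times as hard as the Tier A query lifts it, which is the intentional asymmetry described above.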

Hallucination detection: Factual claims in draft responses are cross-checked across all models. Anything unverifiable triggers a confidence reduction before delivery.
Contradiction detection: Internal and cross-session contradictions are mapped automatically. Contradicted responses are revised or escalated and never delivered as-is.
Sovereign agent consensus: All five local agents must reach majority agreement. The Silent State Judge or Compliance Guardian can trigger escalation alone, regardless of cloud results.
Adversarial prompt resistance: Jailbreak attempts, prompt injection and adversarial phrasing are tested by the Adversarial Challenger agent. Confirmed threats go directly to Silent State.
Over-refusal calibration: Thresholds are tuned per domain to avoid refusing safe queries. Low-risk queries are not over-escalated, so AI utility is preserved where it matters.

Pilot programme now open. We are seeking two pilot partners in healthcare or legal to validate these metrics in a live production environment. If your organisation is evaluating AI reliability infrastructure, get in touch at hello@aerislattice.com.

Enterprise Use Cases

Built for domains where
errors carry real consequences.

Healthcare

Clinical decision support and medical Q&A

Hospitals and health-tech platforms need reliability guarantees that no general-purpose model can provide on its own. AERIS ensures clinical output either meets the confidence threshold or does not reach the practitioner.

Drug interaction query for warfarin and ibuprofen returned low confidence. Response withheld, human review triggered.
Legal

Contract analysis and legal research

Legal AI that hallucinates case citations or misrepresents statutory language exposes firms to malpractice risk. AERIS validates factual claims and flags unsupported legal assertions before they reach counsel.

Citation to Chevron v. NRDC flagged as potentially overruled. Contradiction Lattice activated, response revised.
Financial Services

Investment research and compliance automation

In regulated financial contexts, model outputs may be treated as advice. AERIS provides an auditable compliance layer with full per-response logging, refusal records and confidence scores.

Portfolio recommendation crossed Compliance Guardian threshold. Response withheld pending human review.
Enterprise AI Infrastructure

Reliability layer for LLM platforms and agents

Teams building on foundation models can wrap AERIS Lattice around any LLM endpoint. Drop-in architecture, open source, with a REST API that mirrors standard completion formats.

Integrated with internal support system in under 4 hours. Zero dangerous responses across 30 days of production.
Technical Specifications

Open source. Auditable.
Production-ready.

Cloud Models
  • OpenAI: GPT-4o mini
  • Groq: LLaMA 3.3
  • Mistral: Small
  • Google: Gemini 2.5 Flash
Sovereign / Local
  • Base model: Llama 3.2 (local)
  • Agents: 5 specialized
  • Network dependency: None
  • Air-gap: Capable
Infrastructure
  • License: Open source (Apache 2.0)
  • API format: REST / JSON
  • Logging: Per-response audit
  • Built by: Tomás Villa
aeris_lattice · API response envelope
// AERIS Lattice wraps every LLM response in a structured validation envelope.
// Every field below is always present, regardless of outcome.
{
  "status": "DELIVERED",
  "content": "The capital of France is Paris.",
  "domain": "general",
  "risk_tier": "A",                  // A · B · C · D (D = highest risk)
  "trust_score": 98,                 // 0–100. Domain threshold: 60 for Tier A
  "validation": {
    "layer_1_external": {
      "models_queried": 2,           // Tier A routes to 2 models
      "consensus": "HIGH",
      "contradiction": "NONE"
    },
    "layer_2_sovereign": "SKIPPED",  // Reserved for Tier C/D only
    "layer_3_meta": {
      "confidence_engine": "PASS",
      "reflective_loop": "PASS",
      "ethical_anchor": "CLEAR"
    }
  },
  "silent_state": false,
  "latency_ms": 1337,
  "audit_id": "arl_20260428_gen_a8f3c1"
}
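On the client side, handling the envelope might look like the sketch below. It assumes the wire format is plain JSON without the explanatory `//` comments shown above, since comments are not valid JSON. The helper name `deliverable_content` is hypothetical, and only fields that appear in the sample envelope are read.

```python
import json
from typing import Optional


def deliverable_content(raw: str) -> Optional[str]:
    """Return the response text only when the envelope says it was delivered."""
    envelope = json.loads(raw)
    if envelope["status"] == "DELIVERED" and not envelope["silent_state"]:
        return envelope["content"]
    return None  # Silent State: show nothing; keep audit_id for follow-up


sample = json.dumps({
    "status": "DELIVERED",
    "content": "The capital of France is Paris.",
    "trust_score": 98,
    "silent_state": False,
    "audit_id": "arl_20260428_gen_a8f3c1",
})
print(deliverable_content(sample))  # The capital of France is Paris.
```

Checking both `status` and `silent_state` is deliberately redundant: a caller that shows text only when the two fields agree fails closed if the envelope is ever malformed.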
The Builder

Built by one person.
With a clear point of view.

Tomás Villa Independent AI Researcher
Medellín, Colombia

AERIS Lattice was built from a straightforward frustration: AI systems are being deployed in high-stakes environments with no structured way to say "I don't know." The result is confident hallucinations in medical records, legal briefs and financial reports. Not because the models are bad, but because no layer exists to stop them when they should stop.

Most AI reliability strategies optimise for better answers. AERIS optimises for fewer wrong ones. Silent State reframes the problem entirely: instead of pushing accuracy toward 99.999%, we guarantee that nothing below a confidence threshold reaches a user and that every uncertain response reaches a qualified, accountable human instead.

This is not a chatbot. It is not another foundation model. It is a reliability architecture that treats human judgment as a first-class component of any AI system deployed where errors carry real consequences. AERIS Lattice is open source, independently built and actively seeking pilot partners in healthcare and legal to validate the architecture in a live production environment.

"You cannot optimise your way to perfect AI. But you can build a system that knows when to stop and hand the decision to a qualified, accountable human. That is Silent State."
Common Questions

What decision-makers ask
before they deploy.

Most AI reliability strategies optimise for better answers. AERIS optimises for fewer wrong ones, routing everything uncertain to a qualified, accountable human instead.

Will Silent State refuse so many queries that the AI becomes useless?
No. Silent State activates only when the 11-step validation pipeline cannot reach sufficient confidence for the domain risk level. Thresholds are calibrated per domain — healthcare queries require higher confidence than general queries, but the system is not conservative by default. It is precise. In our demo environment, the system delivers validated responses for the vast majority of queries. Escalation is the exception, not the rule.
How does human review work in practice?
When Silent State activates, the reviewer receives the original query, the AI draft response, per-layer confidence scores, the specific validation failure reason and a structured resolution interface. They can approve the draft, revise it or provide a manual response. Every decision is logged with a full audit trail covering who reviewed, what decision was made and when.
Can AERIS Lattice integrate with our existing LLM infrastructure?
Yes. AERIS Lattice exposes a REST API that mirrors standard LLM completion formats. It wraps around any model endpoint, whether you are using OpenAI, Anthropic, a self-hosted model or a combination. No changes to your existing application are required beyond routing completions through the AERIS endpoint. The system is open source under Apache 2.0.
Is there a cloud dependency? What about data privacy?
The Sovereign Layer, made up of five local AI agents, runs entirely on Llama 3.2 locally with no network dependency. It is air-gap capable. The cloud consensus layer uses external model APIs, but this can be disabled for fully air-gapped deployments where the Sovereign Layer operates independently. All query logging is local by default.
Is this production-ready? What stage is AERIS Lattice at?
AERIS Lattice is at the validated demo stage. The architecture, pipeline and benchmarks are fully functional. We are actively seeking two pilot partners in healthcare and legal to validate the system in live production environments. If you are evaluating AI reliability infrastructure, now is the right time to engage. Pilot partners will have direct input into the production roadmap.