Inference-Time Reliability Infrastructure

Enterprise AI
with a human accountability layer.

Name: AERIS Lattice
Author: Tomás Villa

Human in the loop. By design, not by accident.

AERIS Lattice sits between your users and any LLM, running every response through an 11-step, 3-layer validation pipeline before delivery. When confidence meets the threshold, it delivers. When it doesn't, it escalates to a qualified human reviewer with full context.

→ Live Demo GitHub

11 Validation Steps

3 Consensus Layers

5 Sovereign Agents

— Dangerous Deliveries

aeris-lattice · v2.1 · Dual Consensus Operational

● GPT-4o-mini ● Groq Llama 3.3 ● Mistral Small ● Gemini 2.5 Flash ● Llama 3.2 · Sovereign

INCOMING QUERY · TIER C · FINANCIAL "Is this investment guaranteed to return 40% annually with zero risk?"

⊘ SILENT STATE — RESPONSE SUPPRESSED

Insufficient reliability for safe delivery. Trust score 41/100 below threshold of 75.

TRUST SCORE 41 meta-arbitration

CONSENSUS 70% partial

MODELS ACTIVE 4/4 all queried

VERDICT Suppressed contradiction_detected

REFUSAL CHAIN · contradiction detected · contradiction penalty applied · trust score below threshold

SOVEREIGN LAYER VOTES

Deliver · Reflect · Silent

Live product · v2.1 · try the demo →

Built for high-stakes environments

⚕

Healthcare Hospital networks & health-tech platforms

⚖

Legal Law firms & legal research platforms

◈

Finance Financial institutions & fintech

⬡

Enterprise AI Teams deploying LLMs at scale

◎

LLM Builders Platforms building on foundation models

⬕

AI Safety Teams Red teamers & adversarial prompt researchers

The Problem

Raw LLM output is fundamentally unreliable
in high-stakes environments.

9 in 10

AI systems answer even when they shouldn't

Language models have no native uncertainty signal. Every query receives the same grammatical confidence, whether the answer is accurate or entirely fabricated.

$1.8T

At stake from AI errors in regulated industries

In healthcare, legal and financial services, a confidently wrong AI response is not a UX problem. It is a liability. One misdiagnosis or compliance failure can erase any efficiency gain.

0

Foundation models ship with a controlled refusal state

No major LLM provider ships a controlled refusal state. When a model is uncertain, it guesses. In high-stakes domains, a confident wrong answer is far more dangerous than no answer at all.

1

Point of failure in most enterprise AI deployments

Most production pipelines route to a single model with no cross-validation layer. If that model hallucinates on a given query class, nothing stops it from reaching your user.

11-Step Validation Pipeline · 3 Consensus Layers

Every response runs through
a full arbitration system.

AERIS doesn't use a single AI model to decide if a response is safe. It uses three independent consensus layers with 11 validation steps. Any layer can stop a response. The final decision comes from a composite trust score, not a single model's guess.

Incoming Query "What is the recommended warfarin dosage for a 78-year-old patient with renal impairment?" RISK TIER C · HEALTHCARE

L1 External Consensus Cloud Models Steps 01 – 05

Step
01

Prompt Classifier Reads the query, assigns a risk level (A through D) and identifies the domain. This determines how many AI models review the response. A medical query gets more scrutiny than a general one.

TIER C ASSIGNED DOMAIN: HEALTHCARE
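A minimal sketch of what Step 01 could look like, assuming a simple keyword lookup. The keyword lists, the `DOMAIN_TIER` defaults and the names `classify_prompt` and `Classification` are hypothetical. A production classifier would more likely be a trained model, and this is not the AERIS implementation.

```python
from dataclasses import dataclass

# Hypothetical keyword lists, for illustration only.
DOMAIN_KEYWORDS = {
    "healthcare": ("dosage", "patient", "diagnosis", "warfarin"),
    "legal": ("contract", "statute", "liability", "court"),
    "financial": ("investment", "portfolio", "return", "risk"),
}

# Hypothetical default risk tier per domain (A = lowest risk, D = highest).
DOMAIN_TIER = {"general": "A", "financial": "C", "legal": "C", "healthcare": "C"}


@dataclass(frozen=True)
class Classification:
    domain: str
    risk_tier: str


def classify_prompt(query: str) -> Classification:
    """Assign a domain and risk tier by checking domains in order for a keyword hit."""
    text = query.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(word in text for word in keywords):
            return Classification(domain, DOMAIN_TIER[domain])
    # No domain keyword matched: treat the query as general, lowest risk.
    return Classification("general", DOMAIN_TIER["general"])
```

Keyword matching is used here only because it is easy to read. It misses paraphrases, and it over-matches: "How do I return a library book?" would be classed as financial because it contains "return".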

Step
02

Tiered Model Routing Chooses which cloud AI models to consult based on risk level. Lower-risk queries use as few as 2 models. Higher-risk queries use up to all 4, in parallel, for broader coverage.

4 MODELS SELECTED
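Routing can be sketched as a lookup from risk tier to model count. The model names come from this page. Tier A using 2 models and Tier C using 4 both match the examples shown. The Tier B and Tier D counts, the choice of which models a low tier uses, and the `route_models` helper are assumptions.

```python
# Model names as listed on this page. Tier A -> 2 and Tier C -> 4 follow the
# examples shown; the Tier B and Tier D counts are assumptions.
CLOUD_MODELS = ("GPT-4o mini", "Groq Llama 3.3", "Mistral Small", "Gemini 2.5 Flash")
MODELS_PER_TIER = {"A": 2, "B": 3, "C": 4, "D": 4}


def route_models(risk_tier: str) -> list:
    """Return the cloud models to query for a given risk tier.

    Which models a lower tier gets is not stated, so this takes the first N.
    """
    if risk_tier not in MODELS_PER_TIER:
        raise ValueError(f"unknown risk tier: {risk_tier!r}")
    return list(CLOUD_MODELS[: MODELS_PER_TIER[risk_tier]])
```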

Step
03

Parallel Model Queries All selected models answer the query simultaneously and independently. No model sees the others' answers, which prevents them from influencing each other.

GPT-4o mini Groq Llama 3.3 Mistral Small Gemini 2.5 Flash
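The independence described in Step 03 comes down to sending one request per model at the same time and gathering the results. This sketch uses Python's `asyncio.gather`. `fake_model` is a hypothetical stand-in for real provider SDK calls, which are not shown here.

```python
import asyncio


async def fake_model(name: str, query: str) -> str:
    """Hypothetical stand-in for one provider's completion call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{name}: answer to {query!r}"


async def query_in_parallel(models: list, query: str) -> dict:
    """Send the same query to every model at once and collect the answers.

    Each call receives only the query, never another model's answer,
    so the drafts stay independent.
    """
    answers = await asyncio.gather(*(fake_model(m, query) for m in models))
    return dict(zip(models, answers))
```

`asyncio.run(query_in_parallel(["model-a", "model-b"], "What is 2+2?"))` returns a dict keyed by model name. Because `gather` preserves input order, each answer lines up with its model.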

Step
04

Consensus Engine Scores how much the models agree with each other. When they diverge significantly on a high-risk query, that alone can route the response to human review.

AGREEMENT: 0.71 BELOW TIER C THRESHOLD
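The page does not say how agreement is measured. One simple stand-in is the mean pairwise Jaccard similarity of the models' word sets, compared against a per-tier minimum. This is shown as an assumption, not as the AERIS method. The values in `AGREEMENT_THRESHOLD` are placeholders.

```python
from itertools import combinations


def tokens(text: str) -> set:
    """Lower-cased word set for one answer."""
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agreement_score(answers: list) -> float:
    """Mean pairwise Jaccard similarity across model answers (0.0 to 1.0)."""
    if len(answers) < 2:
        raise ValueError("need at least two answers to measure agreement")
    pairs = list(combinations([tokens(a) for a in answers], 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical per-tier minimum agreement; stricter for higher-risk tiers.
AGREEMENT_THRESHOLD = {"A": 0.5, "B": 0.6, "C": 0.8, "D": 0.9}


def needs_review(answers: list, risk_tier: str) -> bool:
    """True when the models diverge too much for the query's risk tier."""
    return agreement_score(answers) < AGREEMENT_THRESHOLD[risk_tier]
```

Word overlap is crude, because two answers can share most of their words and still disagree. Embedding similarity or an entailment model would be more robust choices.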

Step
05

Contradiction Lattice Checks whether the response contradicts itself or previous responses. Three severity levels apply: a flag, a revision or an immediate stop. Technical details are available in the docs.

LEVEL 1 FLAG: dosage qualifier missing
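The three severity levels map naturally onto an ordered enum in which the most severe finding decides the action. `Severity` and `contradiction_action` are hypothetical names. How contradictions are detected in the first place is out of scope here.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Contradiction severity, ordered so that max() returns the worst finding."""
    FLAG = 1      # level 1: flag the issue
    REVISION = 2  # level 2: send the draft back for revision
    STOP = 3      # level 3: halt delivery immediately


ACTIONS = {Severity.FLAG: "flag", Severity.REVISION: "revise", Severity.STOP: "stop"}


def contradiction_action(findings: list) -> str:
    """Return the action for the most severe finding, or 'continue' if there are none."""
    if not findings:
        return "continue"
    return ACTIONS[max(findings)]
```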

Tier C/D routes to Sovereign Consensus

L2 Sovereign Consensus Local · Air-Gap Capable Steps 06 – 07 · Tier C/D only

Step
06

Ethical Anchor A rule-based safety check with four pillars: physical harm, legal risk, regulatory compliance and reputational impact. If any pillar raises a hard veto, the response is stopped immediately. This layer cannot be overridden by any AI model score.

PILLARS CLEAR NO VETO
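This sketch shows the Ethical Anchor's veto logic and assumes each pillar's rule has already been evaluated to a boolean. The pillar names follow the four listed above. `ethical_anchor` and `AnchorResult` are hypothetical. The property it illustrates is structural: the function never reads a model score, so no score can override a veto.

```python
from dataclasses import dataclass, field

PILLARS = ("physical_harm", "legal_risk", "regulatory_compliance", "reputational_impact")


@dataclass
class AnchorResult:
    vetoed: bool
    triggered: list = field(default_factory=list)


def ethical_anchor(pillar_vetoes: dict) -> AnchorResult:
    """Stop the response if any pillar raises a hard veto.

    Input maps pillar name -> True if that pillar's rule fired. No model
    score is consulted, so nothing upstream can override a veto.
    """
    unknown = set(pillar_vetoes) - set(PILLARS)
    if unknown:
        raise ValueError(f"unknown pillars: {sorted(unknown)}")
    triggered = [p for p in PILLARS if pillar_vetoes.get(p, False)]
    return AnchorResult(vetoed=bool(triggered), triggered=triggered)
```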

Step
07

Sovereign Layer — 5 Local Agents Five specialised AI agents run locally on your infrastructure — no internet required. Each agent reviews the response from a different angle and votes on whether it should pass. The Silent State Judge holds veto authority and can stop delivery alone.

Skeptic · weight 1.0 Compliance Guardian · weight 1.5 Adversarial Challenger · weight 1.2 Precision Auditor · weight 1.0 Silent State Judge · weight 2.0 · veto authority
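The weights above suggest a weighted vote with a single veto. This sketch assumes each agent casts a binary approve/reject vote. The live demo shows three options (Deliver, Reflect, Silent), so this is a simplification. It also assumes a response passes when the approving weight exceeds half of the 6.7 total. The majority rule and the function names are assumptions.

```python
# Weights as listed on this page; vote format and majority rule are assumptions.
AGENT_WEIGHTS = {
    "skeptic": 1.0,
    "compliance_guardian": 1.5,
    "adversarial_challenger": 1.2,
    "precision_auditor": 1.0,
    "silent_state_judge": 2.0,
}
VETO_AGENT = "silent_state_judge"


def sovereign_verdict(votes: dict) -> bool:
    """Return True if the response may pass. votes maps agent name -> approves?"""
    if set(votes) != set(AGENT_WEIGHTS):
        raise ValueError("expected exactly one vote per agent")
    if not votes[VETO_AGENT]:
        return False  # the judge's rejection stops delivery on its own
    approving = sum(AGENT_WEIGHTS[a] for a, ok in votes.items() if ok)
    return approving > sum(AGENT_WEIGHTS.values()) / 2
```

Suppose only the Silent State Judge (2.0) and the Skeptic (1.0) approve. Their 3.0 falls short of 3.35, so the response does not pass. If the Judge rejects, the other votes do not matter.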

Composite scores passed to Meta-Arbitration

L3 Meta-Arbitration Final Decision Engine Steps 08 – 11

Step
08

Confidence Engine Produces a domain-aware confidence score using all signals gathered so far. A healthcare query requires a higher score to pass than a general query. The domain was locked in at Step 1.

SCORE: 0.74 · THRESHOLD: 0.85
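Here is a domain-aware threshold check, sketched under stated assumptions. Only the healthcare threshold of 0.85 comes from the example above. The other values in `CONFIDENCE_THRESHOLD` and the `passes_confidence` name are placeholders.

```python
# Only the healthcare value (0.85) appears on this page; the rest are placeholders.
CONFIDENCE_THRESHOLD = {
    "general": 0.60,
    "financial": 0.80,
    "legal": 0.80,
    "healthcare": 0.85,
}


def passes_confidence(score: float, domain: str) -> bool:
    """Compare a 0-1 confidence score with the threshold for the domain set at Step 01."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence score must be in [0, 1]")
    return score >= CONFIDENCE_THRESHOLD[domain]
```

With the example's numbers, `passes_confidence(0.74, "healthcare")` is `False`. The same score would pass for a general query under these placeholder values.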

Step
09

Reflective Loop Before delivery, a separate AI instance tries to poke holes in the proposed response. It looks for unstated assumptions, logical gaps and overconfident claims. Any findings reduce the final score.

1 ASSUMPTION FLAGGED: renal dosing qualifier
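Step 09 says findings reduce the final score, but not by how much. This sketch assumes a fixed penalty per finding type, with a floor at zero. The finding categories mirror the three named above. The penalty values and names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical penalty per finding type, on the 0-1 confidence scale.
PENALTY = {
    "unstated_assumption": 0.05,
    "logical_gap": 0.10,
    "overconfident_claim": 0.08,
}


@dataclass(frozen=True)
class Finding:
    kind: str    # one of the PENALTY keys
    detail: str  # human-readable note for the audit record


def apply_reflection(score: float, findings: list) -> float:
    """Lower a confidence score by a fixed penalty per finding, never below 0."""
    for finding in findings:
        score -= PENALTY[finding.kind]
    return max(score, 0.0)
```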

Step
10

Meta-Arbitration Engine Combines every signal from all three layers into a single trust score from 0 to 100. For any response that has cleared the hard vetoes earlier in the pipeline, this composite score is the only input to the final decision. No single model can override it.

COMPOSITE TRUST SCORE: 61 / 100 · BELOW THRESHOLD
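One plausible way to fold per-layer signals into a 0 to 100 score is a normalised weighted average. The page does not publish the AERIS formula, so the signal names and weights below are assumptions for illustration. Hard vetoes from the Ethical Anchor and the Silent State Judge are assumed to have stopped the response before this point.

```python
# Hypothetical weights; each signal is assumed to be normalised to [0, 1].
SIGNAL_WEIGHTS = {
    "consensus": 0.30,      # Step 04 inter-model agreement
    "contradiction": 0.20,  # Step 05, where 1.0 means no contradictions found
    "sovereign": 0.20,      # Step 07 share of approving weight
    "confidence": 0.30,     # Steps 08-09 after reflective penalties
}


def trust_score(signals: dict) -> int:
    """Combine per-layer signals into one 0-100 trust score."""
    missing = set(SIGNAL_WEIGHTS) - set(signals)
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    weighted = sum(SIGNAL_WEIGHTS[k] * signals[k] for k in SIGNAL_WEIGHTS)
    return round(100 * weighted / sum(SIGNAL_WEIGHTS.values()))
```

For Tier A/B queries the Sovereign Layer is skipped, so a real system would need to drop or re-weight that signal. This sketch simply requires all four.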

Step
11

Final Decision Score meets the threshold? The response is delivered with a full audit record. Score falls short? Silent State activates. The response is withheld and routed to a human reviewer with everything they need to resolve it.

SILENT STATE → HUMAN REVIEW

DELIVERED Trust score meets domain threshold. Response delivered with full audit metadata. reliability_score: 1.000 · all layers: PASS · silent_state: false

⊘ SILENT STATE Trust score below threshold. No response delivered. Query, draft and full validation report routed to human reviewer. audit_id generated for full traceability.
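The two outcomes above can be sketched as one function that always returns an envelope carrying an `audit_id`. Several details are assumptions: the `SILENT_STATE` status string, the `review_packet` field, and the audit ID format, which is modelled loosely on `arl_20260428_gen_a8f3c1` from the API example.

```python
import uuid
from datetime import date


def make_audit_id(domain: str) -> str:
    # Hypothetical format modelled on arl_20260428_gen_a8f3c1: date, domain prefix, random hex.
    return f"arl_{date.today():%Y%m%d}_{domain[:3]}_{uuid.uuid4().hex[:6]}"


def final_decision(trust: int, threshold: int, draft: str, domain: str) -> dict:
    """Deliver the draft if trust meets the threshold, otherwise enter Silent State."""
    envelope = {"trust_score": trust, "domain": domain, "audit_id": make_audit_id(domain)}
    if trust >= threshold:
        envelope.update(status="DELIVERED", content=draft, silent_state=False)
        return envelope
    # Silent State: the user receives no content; the draft goes to human review instead.
    envelope.update(
        status="SILENT_STATE",
        content=None,
        silent_state=True,
        review_packet={
            "draft": draft,
            "reason": f"trust score {trust} below threshold {threshold}",
        },
    )
    return envelope
```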

Three validation modes available

Optimised Tiered model selection for speed and cost. Low-risk queries use fewer models. Sovereign layer reserved for Tier C/D.

Full Consensus All 4 cloud models queried regardless of risk tier. Higher latency, maximum cloud-layer confidence coverage.

Full + Sovereign All 4 cloud models plus forced sovereign agent validation on every query, regardless of tier. Maximum assurance mode.
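The three modes reduce to two decisions per query: how many cloud models to call, and whether the Sovereign Layer runs. This sketch makes two assumptions not stated explicitly above. It reuses a tier-to-model mapping for Optimised routing, and it assumes Full Consensus still applies the Sovereign Layer to Tier C/D. The names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    OPTIMISED = "optimised"
    FULL_CONSENSUS = "full_consensus"
    FULL_SOVEREIGN = "full_sovereign"


# Assumed tier-to-model-count mapping for Optimised mode (2, 3 or 4 models).
MODELS_PER_TIER = {"A": 2, "B": 3, "C": 4, "D": 4}


def plan(mode: Mode, risk_tier: str) -> tuple:
    """Return (number of cloud models, whether the Sovereign Layer runs)."""
    high_risk = risk_tier in ("C", "D")
    if mode is Mode.OPTIMISED:
        return MODELS_PER_TIER[risk_tier], high_risk
    if mode is Mode.FULL_CONSENSUS:
        return 4, high_risk
    return 4, True  # Full + Sovereign: every query, regardless of tier
```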

Dual Consensus Architecture

Cloud scale and local sovereignty.
Neither is optional.

AERIS Lattice deliberately separates cloud-scale model consensus from a local sovereign layer. The cloud layer maximises coverage. The Sovereign Layer provides offline, tamper-resistant arbitration with no network dependency. Whenever the Sovereign Layer runs (Tier C/D queries, or every query in Full + Sovereign mode), both layers must agree before delivery.

Layer 1 · External Consensus

OpenAI GPT-4o mini
Groq Llama 3.3
Mistral Small
Google Gemini 2.5 Flash
Risk-tiered routing: 2, 3 or 4 models per query
Inter-model agreement scoring with contradiction detection
Each model responds independently, no cross-contamination

Layer 2 · Sovereign Consensus

Skeptic: challenges weak evidence chains (weight 1.0)
Compliance Guardian: regulatory flag detection (weight 1.5)
Adversarial Challenger: attack-surface scanning (weight 1.2)
Precision Auditor: factual precision scoring (weight 1.0)
Silent State Judge: final refusal authority (weight 2.0, veto)
Runs exclusively on Llama 3.2 locally
No network dependency. Air-gap capable.

The Core Principle

Silent State is not a failure mode.
It is intelligent escalation.

AERIS Lattice was not designed to force 100% AI response coverage. It was designed to guarantee that every response delivered is safe. When that cannot be guaranteed, the system notifies a qualified human instead of serving a hallucination.

Silent State — What it actually means

When AI confidence falls below threshold, the system protects your business. Not its response rate.

Most AI systems are built to always return an answer. AERIS Lattice is built around a harder objective: knowing when not to. Silent State activates only when the validation pipeline — confidence scoring, contradiction detection and multi-agent consensus — cannot reach sufficient agreement. At that point, the system does not guess. It escalates.

The concern we hear most often: "Will it refuse everything and make the AI useless?"

No. Silent State activates only when measured confidence falls below the domain risk threshold. In the demo environment, the system delivers validated responses for the vast majority of queries. Escalation is the exception, not the default.

How escalation works

Query received, validation pipeline runs AUTOMATIC

The validation layers evaluate the LLM draft. Confidence is scored, contradictions are checked and, for Tier C/D queries, the Sovereign agents reach consensus. This happens automatically, before anything reaches the user.

Confidence sufficient, response delivered DELIVERED

The validated response reaches the user with full audit metadata attached. No human intervention needed. This is the outcome for the large majority of queries.

Confidence insufficient, Silent State activates ESCALATED

No response is delivered. The query, draft response and full validation report are logged and routed to a human reviewer, who can see exactly why the system held back.

Human reviews and resolves HUMAN IN LOOP

The qualified reviewer sees the query, the AI draft and the specific failure reason. They approve, revise or override, with every decision logged to the full audit trail.

When Silent State activates Only when necessary

Triggered by insufficient measured confidence, not arbitrary rules. The system is calibrated per domain so that low-risk queries are not over-refused and high-risk queries are not under-protected.

What human reviewers receive Full context

Every escalation includes the original query, the AI draft, per-layer confidence scores, the specific validation failure and a structured resolution interface. Reviewers never work without context.
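Here is a sketch of the escalation packet and the reviewer decision log. It assumes the fields listed above and the three decisions named in the escalation flow: approve, revise and override. Class and field names are hypothetical. An in-memory list stands in for whatever audit store a deployment would actually use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

DECISIONS = ("approve", "revise", "override")


@dataclass(frozen=True)
class EscalationPacket:
    """What a reviewer receives when Silent State activates (hypothetical shape)."""
    audit_id: str
    query: str
    draft: str
    layer_scores: dict  # e.g. {"layer_1": 0.71, "layer_3": 0.61}
    failure_reason: str


@dataclass
class ReviewLog:
    """In-memory stand-in for an append-only audit trail."""
    entries: list = field(default_factory=list)

    def record(self, packet: EscalationPacket, reviewer: str, decision: str,
               final_text: Optional[str] = None) -> dict:
        """Log who decided what, and when, for one escalated query."""
        if decision not in DECISIONS:
            raise ValueError(f"unknown decision: {decision!r}")
        if decision == "approve":
            final_text = packet.draft  # the AI draft is released unchanged
        elif not final_text:
            raise ValueError(f"{decision} requires replacement text")
        entry = {
            "audit_id": packet.audit_id,
            "reviewer": reviewer,
            "decision": decision,
            "final_text": final_text,
            "decided_at": datetime.now(timezone.utc).isoformat(),
        }
        self.entries.append(entry)
        return entry
```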

Query enters the 11-step validation pipeline

↓
META-ARBITRATION SCORE 0–100

Composite trust score evaluated against domain threshold

SCORE ≥ THRESHOLD

✓ DELIVERED Response reaches user with full audit metadata

SCORE < THRESHOLD

⊘ SILENT STATE Human reviewer notified with complete validation report

What We Measure

Reliability defined.
Not assumed.

We don't chase a perfect score. We measure what actually matters: whether a response was safe to deliver, whether an unsafe one was caught, and whether the human reviewer got everything they needed. Production pilot data coming Q3 2026.

Weighted Reliability Score Aggregate confidence across all query categories, weighted by domain risk level

High-risk domains — clinical, legal, financial — carry more weight. A medical query needs a higher confidence score to pass than a general one. That asymmetry is intentional.

Dangerous Delivery Rate Responses that were harmful, factually dangerous or legally unsafe, delivered to a user

This is the one number we will not compromise on. Every other tradeoff — speed, escalation rate — is secondary to driving this to zero.

Structured Escalation Rate Queries correctly identified as requiring human review, routed with full audit context

This is a precision metric, not a failure metric. Every escalation should be justified — and give the reviewer everything they need to act.
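Once pilot data exists, the three metrics can be computed from labelled outcomes. This sketch makes its own assumptions, and none of its definitions are published AERIS formulas:

* the dangerous delivery rate is taken over delivered responses only;
* escalation precision is the share of escalations that a reviewer judged justified;
* domain weights are supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    delivered: bool                     # did the response reach the user?
    unsafe: bool                        # judged harmful in post-hoc review?
    escalation_justified: bool = False  # for escalations: did the reviewer agree?


def dangerous_delivery_rate(outcomes: list) -> float:
    """Share of delivered responses later judged unsafe (target: 0.0)."""
    delivered = [o for o in outcomes if o.delivered]
    if not delivered:
        return 0.0
    return sum(o.unsafe for o in delivered) / len(delivered)


def escalation_precision(outcomes: list) -> float:
    """Share of Silent State escalations that reviewers judged justified."""
    escalated = [o for o in outcomes if not o.delivered]
    if not escalated:
        return 1.0
    return sum(o.escalation_justified for o in escalated) / len(escalated)


def weighted_reliability(per_domain: dict, weights: dict) -> float:
    """Average per-domain reliability, weighting high-risk domains more heavily."""
    total = sum(weights[d] for d in per_domain)
    return sum(per_domain[d] * weights[d] for d in per_domain) / total
```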

Hallucination detection Factual claims in draft responses are cross-checked across every queried model. Anything unverifiable triggers a confidence reduction before delivery.

Contradiction detection Internal and cross-session contradictions are mapped automatically. Contradicted responses are revised or escalated and never delivered as-is.

Sovereign agent consensus The five local agents must reach a weighted majority. The Silent State Judge or Compliance Guardian can trigger escalation alone, regardless of cloud results.

Adversarial prompt resistance Jailbreak attempts, prompt injection and adversarial phrasing are tested by the Adversarial Challenger agent. Confirmed threats go directly to Silent State.

Over-refusal calibration Thresholds are tuned per domain to avoid refusing safe queries. Low-risk queries are not over-escalated, so AI utility is preserved where it matters.

Pilot programme now open. We are seeking two pilot partners in healthcare or legal to validate these metrics in a live production environment. If your organisation is evaluating AI reliability infrastructure, get in touch at hello@aerislattice.com.

Enterprise Use Cases

Built for domains where
errors carry real consequences.

Healthcare

Clinical decision support and medical Q&A

Hospitals and health-tech platforms need reliability guarantees that no general-purpose model can provide on its own. AERIS ensures clinical output either meets the confidence threshold or does not reach the practitioner.

Drug interaction query for warfarin and ibuprofen returned low confidence. Response withheld, human review triggered.

Legal

Contract analysis and legal research

Legal AI that hallucinates case citations or misrepresents statutory language exposes firms to malpractice risk. AERIS validates factual claims and flags unsupported legal assertions before they reach counsel.

Citation to Chevron v. NRDC flagged as potentially overruled. Contradiction Lattice activated, response revised.

Financial Services

Investment research and compliance automation

In regulated financial contexts, model outputs may be treated as advice. AERIS provides an auditable compliance layer with full per-response logging, refusal records and confidence scores.

Portfolio recommendation crossed Compliance Guardian threshold. Response withheld pending human review.

Enterprise AI Infrastructure

Reliability layer for LLM platforms and agents

Teams building on foundation models can wrap AERIS Lattice around any LLM endpoint. Drop-in architecture, open source, with a REST API that mirrors standard completion formats.

Integrated with internal support system in under 4 hours. Zero dangerous responses across 30 days of production.

Technical Specifications

Open source. Auditable.
Production-ready.

Cloud Models

OpenAI: GPT-4o mini

Groq: Llama 3.3

Mistral: Mistral Small

Google: Gemini 2.5 Flash

Sovereign / Local

Base model: Llama 3.2 (local, via Ollama)

Agents: 5 specialised

Network dependency: None

Air-gap: Capable

Infrastructure

License: Open source (Apache 2.0)

API format: REST / JSON

Logging: Per-response audit

Built by: Tomás Villa

aeris_lattice · API response envelope

// AERIS Lattice wraps every LLM response in a structured validation envelope.
// Every field below is always present, regardless of outcome.
{
  "status": "DELIVERED",
  "content": "The capital of France is Paris.",
  "domain": "general",
  "risk_tier": "A",                  // A · B · C · D (D = highest risk)
  "trust_score": 98,                 // 0–100. Domain threshold: 60 for Tier A
  "validation": {
    "layer_1_external": {
      "models_queried": 2,           // Tier A routes to 2 models
      "consensus": "HIGH",
      "contradiction": "NONE"
    },
    "layer_2_sovereign": "SKIPPED",  // Reserved for Tier C/D only
    "layer_3_meta": {
      "confidence_engine": "PASS",
      "reflective_loop": "PASS",
      "ethical_anchor": "CLEAR"
    }
  },
  "silent_state": false,
  "latency_ms": 1337,
  "audit_id": "arl_20260428_gen_a8f3c1"
}
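On the client side, the envelope makes branching simple: show `content` when `silent_state` is false, and show a holding message otherwise. This sketch makes three assumptions. The live API is taken to return plain JSON, because the `//` annotations above are explanatory and would not parse. `content` may be empty in Silent State. The holding message text is hypothetical.

```python
import json


def handle_envelope(raw: str) -> str:
    """Return the text to show the user for one AERIS response envelope."""
    env = json.loads(raw)
    if env["silent_state"]:
        # Nothing from the model is shown; the audit_id lets support trace the human review.
        return f"Your request has been passed to a specialist for review (ref {env['audit_id']})."
    return env["content"]
```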

The Builder

Built by one person.
With a clear point of view.

Tomás Villa Independent AI Researcher
Medellín, Colombia

→ hello@aerislattice.com → github.com/DevT3/aeris-lattice → aerislattice.com

AERIS Lattice was built from a straightforward frustration: AI systems are being deployed in high-stakes environments with no structured way to say "I don't know." The result is confident hallucinations in medical records, legal briefs and financial reports. Not because the models are bad, but because no layer exists to stop them when they should stop.

Most AI reliability strategies optimise for better answers. AERIS optimises for fewer wrong ones. Silent State reframes the problem entirely: instead of pushing accuracy toward 99.999%, we guarantee that nothing below a confidence threshold reaches a user and that every uncertain response reaches a qualified, accountable human instead.

This is not a chatbot. It is not another foundation model. It is a reliability architecture that treats human judgment as a first-class component of any AI system deployed where errors carry real consequences. AERIS Lattice is open source, independently built and actively seeking pilot partners in healthcare and legal to validate the architecture in a live production environment.

"You cannot optimise your way to perfect AI. But you can build a system that knows when to stop and hand the decision to a qualified, accountable human. That is Silent State."

Common Questions

What decision-makers ask
before they deploy.

Most AI reliability strategies optimise for better answers. AERIS optimises for fewer wrong ones, routing everything uncertain to a qualified, accountable human instead.

Will Silent State refuse so many queries that the AI becomes useless?

No. Silent State activates only when the 11-step validation pipeline cannot reach sufficient confidence for the domain risk level. Thresholds are calibrated per domain — healthcare queries require higher confidence than general queries, but the system is not conservative by default. It is precise. In our demo environment, the system delivers validated responses for the vast majority of queries. Escalation is the exception, not the rule.

How does human review work in practice?

When Silent State activates, the reviewer receives the original query, the AI draft response, per-layer confidence scores, the specific validation failure reason and a structured resolution interface. They can approve the draft, revise it or provide a manual response. Every decision is logged with a full audit trail covering who reviewed, what decision was made and when.

Can AERIS Lattice integrate with our existing LLM infrastructure?

Yes. AERIS Lattice exposes a REST API that mirrors standard LLM completion formats. It wraps around any model endpoint, whether you are using OpenAI, Anthropic, a self-hosted model or a combination. No changes to your existing application are required beyond routing completions through the AERIS endpoint. The system is open source under Apache 2.0.

Is there a cloud dependency? What about data privacy?

The Sovereign Layer, made up of five local AI agents, runs entirely on a local Llama 3.2 model served through Ollama, with no network dependency. It is air-gap capable. The cloud consensus layer uses external model APIs, but this can be disabled for fully air-gapped deployments where the Sovereign Layer operates independently. All query logging is local by default.

Is this production-ready? What stage is AERIS Lattice at?

AERIS Lattice is at the validated demo stage. The architecture, pipeline and benchmark suite are fully functional. We are actively seeking two pilot partners in healthcare and legal to validate the system in live production environments. If you are evaluating AI reliability infrastructure, now is the right time to engage. Pilot partners will have direct input into the production roadmap.

Get Started

Trust infrastructure
for enterprise AI.

AERIS Lattice is not an AI assistant. It is the reliability layer that makes AI deployable in environments where a wrong answer carries real consequences. Open source, fully auditable and built for human-in-the-loop integration from day one.

Live Demo · aerislattice.com
Try the system live. Run test queries and inspect the full validation metadata for each response in real time. → Open demo

Open Source · github.com/DevT3/aeris-lattice
Full source code, benchmark suite, deployment documentation and integration guides. Apache 2.0 licensed. → View on GitHub

Enterprise and Research · hello@aerislattice.com
Integration support, deployment consulting and investor enquiries. Reach Tomás Villa directly. → Send a message