AI Security Engine — Architecture

Our AI stack spans knowledge-connected reasoning and production-grade edge inference — each engineered for its operational environment. This page documents the technical architecture for security architects, CISOs, and engineering leaders.

Layer 1 — Knowledge-Connected Reasoning (RAG)

Retrieval-augmented generation grounded in allowlisted, versioned sources

Knowledge Sources

  • Public documentation (versioned, re-indexed on change)
  • Solutions content and release notes
  • Live Intelligence Feed (/api/ai/intel/feed)
  • Deterministic DB data (products, plans, pricing)

Security Controls

  • Allowlisted retrieval only — no open web access
  • Context sanitization before LLM injection
  • Prompt injection defense (strict refusal policy)
  • Self-hosted Mistral-class model via Ollama
  • pgvector semantic search (no external APIs)
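The retrieval controls above can be sketched in a few lines. This is an illustrative in-memory stand-in, not the production implementation: the real system ranks documents with pgvector's distance operators inside Postgres, but the allowlist-then-rank logic is the same. The `ALLOWLIST` values and document fields here are made up for the example.

```python
import math

# Hypothetical allowlist; the real set is the versioned sources listed above.
ALLOWLIST = {"docs", "release-notes", "intel-feed"}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, corpus, k=2):
    """Rank only allowlisted documents; open-web sources never enter the context."""
    candidates = [d for d in corpus if d["source"] in ALLOWLIST]
    candidates.sort(key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return candidates[:k]

corpus = [
    {"id": 1, "source": "docs", "embedding": [1.0, 0.0]},
    {"id": 2, "source": "open-web", "embedding": [1.0, 0.0]},  # excluded by allowlist
    {"id": 3, "source": "intel-feed", "embedding": [0.0, 1.0]},
]
print([d["id"] for d in retrieve([1.0, 0.1], corpus)])  # [1, 3]
```

Note that document 2 is an exact embedding match but is never considered: the allowlist filter runs before similarity ranking, so retrieval quality cannot override the source policy.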

Layer 2 — Production Edge Inference (QuickSecure)

ONNX-based inference running on endpoint agents with governed model lifecycle

Inference Pipeline

  • ONNX Runtime on endpoint agents (<15ms latency)
  • Fallback chain: ONNX → Random Forest → Rules
  • Autonomous decision projection (≥85% confidence)
  • Shadow evaluation before production promotion
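The fallback chain and confidence gate can be sketched as follows. This is a minimal illustration, assuming each model exposes a `predict(features) -> (label, confidence)` callable; the actual agent interfaces, feature names, and rule thresholds differ.

```python
def classify(features, onnx_model=None, forest=None):
    """Fallback chain sketch: ONNX -> Random Forest -> static rules.
    Model callables and the entropy rule below are illustrative."""
    for predict in (onnx_model, forest):
        if predict is None:
            continue
        try:
            label, confidence = predict(features)
        except RuntimeError:
            continue  # model unavailable or failed: fall through to next tier
        if confidence >= 0.85:  # autonomous-decision threshold from above
            return label, confidence, "model"
    # Last resort: deterministic rules always return a verdict.
    verdict = "block" if features.get("entropy", 0) > 7.5 else "allow"
    return verdict, 1.0, "rules"

print(classify({"entropy": 7.9}))  # ('block', 1.0, 'rules')
```

A model answer below the 85% confidence bar is treated the same as a model failure: the chain keeps descending until something deterministic answers, so the agent never stalls on an unavailable model.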

Telemetry Pipeline

  • Batch event ingestion (POST /api/ml/events)
  • Idempotency + feature-hash dedup (5-min buckets)
  • Ground-truth labeling with anti-poisoning (TP/FP/FN/TN)
  • Drift snapshot collection from endpoints
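The dedup step above can be sketched as a key computation: events whose feature vectors hash identically within the same 5-minute time bucket collapse to one key. The event field names here are illustrative, not the ingestion schema.

```python
import hashlib
import json

def dedup_key(event, bucket_seconds=300):
    """Feature-hash dedup sketch: identical features from the same endpoint
    inside one 5-minute bucket produce the same idempotency key."""
    bucket = int(event["ts"]) // bucket_seconds
    features = json.dumps(event["features"], sort_keys=True)  # canonical form
    digest = hashlib.sha256(features.encode()).hexdigest()[:16]
    return f"{event['endpoint_id']}:{bucket}:{digest}"

a = dedup_key({"ts": 1000, "endpoint_id": "ep1", "features": {"x": 1}})
b = dedup_key({"ts": 1100, "endpoint_id": "ep1", "features": {"x": 1}})
c = dedup_key({"ts": 1400, "endpoint_id": "ep1", "features": {"x": 1}})
print(a == b, a == c)  # True False
```

Serializing features with `sort_keys=True` before hashing matters: it makes the key stable under dictionary ordering, so the same feature vector always dedups regardless of how the agent serialized it.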

Layer 3 — ML Governance & Lifecycle

Versioned model registry, drift monitoring, and atomic promotion with rollback

Model Registry

Versioned, signed ONNX artifacts. Stable + Canary promotion tiers. Tenant-scoped or global. Every state transition is audit-logged.

Promotion & Rollback

Atomic promotion with post-save race detection. Rollback re-activates the previous stable version. Canary traffic percentage is configurable per tenant.
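One way to picture atomic promotion with race detection is optimistic concurrency: the caller reads the current registry revision, and a promotion only commits if no one else has bumped that revision in the meantime. This sketch keeps state in memory; the real registry would back this with a transactional database row, and the class and field names are assumptions.

```python
import threading

class Registry:
    """Optimistic-concurrency sketch of atomic promotion with race
    detection and single-step rollback. Illustrative, not the real API."""
    def __init__(self):
        self._lock = threading.Lock()
        self.stable = {"version": "1.2.0", "rev": 7}
        self.previous = None

    def promote(self, new_version, expected_rev):
        with self._lock:
            if self.stable["rev"] != expected_rev:
                # A concurrent promotion landed first: refuse, caller re-reads.
                raise RuntimeError("stale revision; promotion raced")
            self.previous = dict(self.stable)
            self.stable = {"version": new_version, "rev": expected_rev + 1}

    def rollback(self):
        with self._lock:
            if self.previous is None:
                raise RuntimeError("nothing to roll back to")
            self.stable, self.previous = self.previous, None

r = Registry()
r.promote("1.3.0", expected_rev=7)
print(r.stable["version"])  # 1.3.0
r.rollback()
print(r.stable["version"])  # 1.2.0
```

The revision check is what makes the "post-save race detection" safe: two admins promoting against the same read revision cannot both win, and the loser gets an explicit error instead of silently overwriting.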

Drift Monitoring

Population Stability Index (PSI) scoring. Feature-level drift detection. Automatic severity classification. Retraining triggers when thresholds are breached.
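The PSI score mentioned above compares a feature's baseline distribution against its current one, summed over histogram bins. This is the standard formula; the binning, thresholds, and wiring to retraining triggers in the real system are not shown.

```python
import math

def psi(expected, actual):
    """Population Stability Index over pre-binned proportions.
    Each list holds per-bin fractions summing to 1; a tiny floor avoids log(0)."""
    eps = 1e-6
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        score += (a - e) * math.log(a / e)
    return score

baseline = [0.25, 0.25, 0.25, 0.25]
drifted  = [0.10, 0.20, 0.30, 0.40]
print(round(psi(baseline, drifted), 3))  # 0.228
```

A commonly used severity convention (an assumption here, not necessarily this product's thresholds) reads PSI < 0.1 as stable, 0.1–0.25 as moderate drift, and > 0.25 as major drift warranting retraining.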

Label Validation

Anti-poisoning logic: FP labels require confirmation from ≥3 distinct endpoints. Rate limiting on FP submissions (50/hr per endpoint). Admin labels auto-approved.
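The quorum rule above can be sketched directly. This toy validator tracks distinct endpoints voting "false positive" per detection and only approves at three; admin labels bypass the quorum. The per-endpoint rate limit (50 FP submissions/hr) is omitted for brevity, and all names are illustrative.

```python
from collections import defaultdict

class LabelValidator:
    """Anti-poisoning sketch: FP labels need confirmation from >=3 distinct
    endpoints; admin labels are auto-approved. Rate limiting not shown."""
    QUORUM = 3

    def __init__(self):
        self.fp_votes = defaultdict(set)  # detection_id -> set of endpoint ids

    def submit(self, detection_id, label, endpoint_id, is_admin=False):
        if is_admin or label != "FP":
            return "approved"
        # A set makes repeat votes from one endpoint count only once.
        self.fp_votes[detection_id].add(endpoint_id)
        if len(self.fp_votes[detection_id]) >= self.QUORUM:
            return "approved"
        return "pending"

v = LabelValidator()
print(v.submit("d1", "FP", "ep1"))  # pending
print(v.submit("d1", "FP", "ep1"))  # pending (same endpoint, no extra weight)
print(v.submit("d1", "FP", "ep2"))  # pending
print(v.submit("d1", "FP", "ep3"))  # approved
```

Requiring distinct endpoints is the poisoning defense: a single compromised agent spamming FP labels can never flip a detection on its own.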

Performance Metrics

Daily confusion matrix aggregation. Precision, recall, FPR, and latency P95 tracked per model version. 90-day retention window.
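The tracked metrics derive mechanically from the daily confusion matrix. A minimal sketch (latency P95 comes from a separate timing stream and is not shown):

```python
def model_metrics(tp, fp, fn, tn):
    """Precision, recall, and false-positive rate from one day's
    confusion-matrix counts. Guards avoid division by zero on quiet days."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    fpr       = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": round(precision, 3),
            "recall": round(recall, 3),
            "fpr": round(fpr, 3)}

print(model_metrics(tp=90, fp=10, fn=5, tn=895))
# {'precision': 0.9, 'recall': 0.947, 'fpr': 0.011}
```

Tracking these per model version is what makes promotion decisions comparable: a canary's daily row can be read side by side against the stable version's over the 90-day window.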

Audit Trail

Every model promotion, rollback, label, and event ingestion is audit-logged with severity, performer, and entity context. Immutable log trail.

Layer 4 — Security Controls

Operational boundaries enforced at every layer of the AI stack

Stateless LLM Inference

User messages are processed and discarded. No conversation data enters training pipelines. No user content is indexed into knowledge sources.

Zero External Egress

The Ollama inference container has no external internet access. All model weights, embeddings, and retrieval happen on-premise. No data leaves our infrastructure.

Response Filtering (DLP)

Outbound responses are filtered for secrets, tokens, API keys, and credential patterns. System prompt content is never disclosed regardless of user request.
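Outbound filtering of this kind is typically pattern-based. The patterns below are a small illustrative sample (an AWS-style access key shape, bearer tokens, PEM private-key headers); a production DLP list is far broader and maintained separately.

```python
import re

# Illustrative credential patterns only; not the production DLP rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),               # AWS access-key-id shape
    re.compile(r"(?i)bearer\s+[a-z0-9._-]{20,}"),  # bearer tokens
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]

def dlp_filter(text, mask="[REDACTED]"):
    """Mask credential-shaped substrings before a response leaves the engine."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(mask, text)
    return text

print(dlp_filter("key AKIAABCDEFGHIJKLMNOP in config"))
# key [REDACTED] in config
```

Because the filter runs on the final rendered response, it catches secrets regardless of where they originated, including anything a prompt-injection attempt coaxed out of retrieved context.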

Aggregated Intelligence Only

The intelligence feed publishes only global-scope, anonymized summaries. No feature vectors, file hashes, command lines, endpoint IDs, or tenant data.

Prompt Injection Defense

Strict refusal policy for override attempts. Context sanitization before model injection. System prompt is isolated and non-extractable.

API Authentication

ML endpoints require X-Api-Key validation with hardware binding. Admin routes require JWT/Cookie + Admin role. Rate limiting on all surfaces.

Data Flow

How data moves through the AI Security Engine — simplified

Knowledge Path

Visitor → RAG Retrieval (allowlisted) → Context Sanitization → LLM → DLP Filter → Response

Security Path

Agent → ONNX Inference → Telemetry Ingestion → Aggregation → Intel Feed → RAG Context

Security Guarantees

Designed for operational environments. Engineered for trust.

  • Reads only from approved, allowlisted sources.
  • Never trains on user input. Stateless by design.
  • Never exposes tenant, customer, or endpoint-level data.
  • Intelligence feed is aggregated and anonymized only.

Questions About Our AI Architecture?

Our engineering team is available for detailed architecture reviews, security assessments, and integration discussions.