AI Security Engine — Architecture
Our AI stack spans knowledge-connected reasoning and production-grade edge inference — each engineered for its operational environment. This page documents the technical architecture for security architects, CISOs, and engineering leaders.
Layer 1 — Knowledge-Connected Reasoning (RAG)
Retrieval-augmented generation grounded in allowlisted, versioned sources
Knowledge Sources
- Public documentation (versioned, re-indexed on change)
- Solutions content and release notes
- Live Intelligence Feed (/api/ai/intel/feed)
- Deterministic DB data (products, plans, pricing)
Security Controls
- Allowlisted retrieval only — no open web access
- Context sanitization before LLM injection
- Prompt injection defense (strict refusal policy)
- Self-hosted Mistral-class model via Ollama
- pgvector semantic search (no external APIs)
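The sanitization step above can be sketched as follows. This is a minimal illustration, not our production filter: the source tags (docs, release_notes, intel_feed, db) and the specific injection patterns are assumptions chosen for the example.

```python
import re

# Hypothetical allowlist of retrieval source tags -- no open-web sources.
ALLOWED_SOURCES = {"docs", "release_notes", "intel_feed", "db"}

# Illustrative injection-shaped patterns; a real deployment maintains a larger set.
INJECTION_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"ignore (all )?previous instructions",
        r"reveal (the )?system prompt",
        r"you are now",
    )
]

def sanitize_context(chunks):
    """Keep only allowlisted chunks and drop injection-shaped lines
    before any retrieved text is injected into the LLM context."""
    clean = []
    for source, text in chunks:
        if source not in ALLOWED_SOURCES:
            continue  # never inject content from a non-allowlisted source
        kept = [
            line for line in text.splitlines()
            if not any(p.search(line) for p in INJECTION_PATTERNS)
        ]
        clean.append("\n".join(kept))
    return "\n\n".join(clean)
```

A chunk from an unknown source is dropped whole; an allowlisted chunk only loses the suspicious lines.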
Layer 2 — Production Edge Inference (QuickSecure)
ONNX-based inference running on endpoint agents with governed model lifecycle
Inference Pipeline
- ONNX Runtime on endpoint agents (<15ms latency)
- Fallback chain: ONNX → Random Forest → Rules
- Autonomous decision projection (≥85% confidence)
- Shadow evaluation before production promotion
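The fallback chain and the 85% autonomy threshold combine roughly as sketched below. The `predict` interface returning a `(label, confidence)` pair and the entropy-based rules tier are assumptions for illustration only.

```python
AUTONOMY_THRESHOLD = 0.85  # autonomous action requires >= 85% confidence

def classify(features, onnx_model=None, forest=None):
    """Fallback chain: ONNX -> Random Forest -> Rules.
    Returns (tier, label, confidence, autonomous)."""
    for tier, model in (("onnx", onnx_model), ("random_forest", forest)):
        if model is None:
            continue
        try:
            # Hypothetical interface: predict() -> (label, confidence)
            label, confidence = model.predict(features)
            return tier, label, confidence, confidence >= AUTONOMY_THRESHOLD
        except Exception:
            continue  # model unavailable or failed: fall through to next tier
    # Last-resort rules tier (illustrative heuristic); never acts autonomously.
    label = "suspicious" if features.get("entropy", 0) > 7.0 else "benign"
    return "rules", label, 0.5, False
```

Because the rules tier always reports below-threshold confidence, a fully degraded agent can still classify but never acts on its own.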
Telemetry Pipeline
- Batch event ingestion (POST /api/ml/events)
- Idempotency + feature-hash dedup (5-min buckets)
- Ground-truth labeling with anti-poisoning (TP/FP/FN/TN)
- Drift snapshot collection from endpoints
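Idempotency plus feature-hash dedup in 5-minute buckets can be sketched as below; the event shape (`ts`, `endpoint_id`, `features`) and the in-memory seen-set are assumptions standing in for the real store.

```python
import hashlib
import json

_seen = set()  # stand-in for a persistent idempotency store

def dedup_key(event, bucket_seconds=300):
    """Identical feature vectors from the same endpoint collapse
    to one key inside each 5-minute bucket."""
    bucket = int(event["ts"]) // bucket_seconds
    feature_hash = hashlib.sha256(
        json.dumps(event["features"], sort_keys=True).encode()
    ).hexdigest()
    return f"{event['endpoint_id']}:{bucket}:{feature_hash}"

def ingest(event):
    """Idempotent ingestion: returns False for a duplicate in the same bucket."""
    key = dedup_key(event)
    if key in _seen:
        return False
    _seen.add(key)
    return True
```

Hashing a canonical JSON encoding of the feature vector makes the key stable regardless of field ordering in the submitted batch.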
Layer 3 — ML Governance & Lifecycle
Versioned model registry, drift monitoring, and atomic promotion with rollback
Model Registry
Versioned, signed ONNX artifacts. Stable + Canary promotion tiers. Tenant-scoped or global. Every state transition is audit-logged.
Promotion & Rollback
Atomic promotion with post-save race detection. Rollback re-activates the previous stable version. Canary traffic percentage is configurable per tenant.
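One way to picture atomic promotion with race detection is a compare-and-swap on a registry revision counter, as in this sketch; the class shape and field names are illustrative, not our actual registry schema.

```python
import threading

class Registry:
    """Sketch: promotion aborts if the registry changed since the caller
    read it, and rollback restores the previous stable version."""

    def __init__(self):
        self._lock = threading.Lock()
        self.stable = None
        self.previous_stable = None
        self.revision = 0  # bumped on every state transition

    def promote(self, model_version, expected_revision):
        with self._lock:
            if expected_revision != self.revision:
                return False  # concurrent promotion detected: abort and re-read
            self.previous_stable, self.stable = self.stable, model_version
            self.revision += 1
            return True

    def rollback(self):
        with self._lock:
            if self.previous_stable is None:
                return False
            self.stable = self.previous_stable
            self.previous_stable = None
            self.revision += 1
            return True
```

A caller that reads revision N and tries to promote after someone else already has simply gets `False` back instead of silently clobbering the newer state.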
Drift Monitoring
Population Stability Index (PSI) scoring. Feature-level drift detection. Automatic severity classification. Retraining triggers when thresholds are breached.
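PSI scoring over binned feature distributions looks roughly like this; the severity cut-offs shown (0.1 / 0.25) are the commonly cited defaults, and the actual thresholds here are configurable.

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index over pre-binned proportions:
    PSI = sum((a - e) * ln(a / e)), with eps guarding empty bins."""
    return sum(
        ((a + eps) - (e + eps)) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )

def severity(score):
    """Illustrative mapping from PSI score to drift severity."""
    if score < 0.1:
        return "none"
    if score < 0.25:
        return "moderate"
    return "major"  # breaches the retraining threshold
```

Scoring each feature's reference vs. live bin proportions and taking the worst severity gives the feature-level drift picture described above.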
Label Validation
Anti-poisoning logic: FP labels require confirmation from ≥3 distinct endpoints. Rate limiting on FP submissions (50/hr per endpoint). Admin labels auto-approved.
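The anti-poisoning rules compose as in this sketch, with the ≥3-endpoint quorum and 50/hr rate limit taken from the description above; the method names and in-memory counters are assumptions.

```python
from collections import defaultdict

FP_CONFIRMATIONS_REQUIRED = 3
FP_RATE_LIMIT_PER_HOUR = 50

class LabelValidator:
    """Sketch: an FP label is only approved once >= 3 distinct endpoints
    confirm it; admin submissions bypass the quorum."""

    def __init__(self):
        self.fp_votes = defaultdict(set)    # detection_id -> confirming endpoints
        self.submissions = defaultdict(int) # (endpoint_id, hour) -> count

    def submit_fp(self, detection_id, endpoint_id, hour, is_admin=False):
        if is_admin:
            return "approved"  # admin labels auto-approved
        if self.submissions[(endpoint_id, hour)] >= FP_RATE_LIMIT_PER_HOUR:
            return "rate_limited"
        self.submissions[(endpoint_id, hour)] += 1
        self.fp_votes[detection_id].add(endpoint_id)
        if len(self.fp_votes[detection_id]) >= FP_CONFIRMATIONS_REQUIRED:
            return "approved"
        return "pending"
```

Counting distinct endpoints (a set, not a tally) is what stops a single compromised endpoint from reaching the quorum by resubmitting.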
Performance Metrics
Daily confusion matrix aggregation. Precision, recall, FPR, and latency P95 tracked per model version. 90-day retention window.
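Deriving the tracked metrics from a daily confusion matrix is standard arithmetic; a minimal version:

```python
def daily_metrics(tp, fp, fn, tn):
    """Precision, recall, and false-positive rate from one day's
    aggregated confusion matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "fpr": fpr}
```

Each metric is computed per model version per day, so a regression after a promotion shows up as a step change in the series.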
Audit Trail
Every model promotion, rollback, label, and event ingestion is audit-logged with severity, performer, and entity context. Immutable log trail.
Layer 4 — Security Controls
Operational boundaries enforced at every layer of the AI stack
Stateless LLM Inference
User messages are processed and discarded. No conversation data enters training pipelines. No user content is indexed into knowledge sources.
Zero External Egress
The Ollama inference container has no external internet access. All model weights, embeddings, and retrieval happen on-premise. No data leaves our infrastructure.
Response Filtering (DLP)
Outbound responses are filtered for secrets, tokens, API keys, and credential patterns. System prompt content is never disclosed regardless of user request.
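Credential-pattern filtering on outbound responses can be sketched as a regex scrub; the three patterns below are illustrative examples, not the full production rule set.

```python
import re

# Illustrative credential-shaped patterns; production maintains a larger catalog.
SECRET_PATTERNS = [
    re.compile(p) for p in (
        r"AKIA[0-9A-Z]{16}",                    # AWS access key id shape
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----",  # PEM private key header
        r"(?i)(api[_-]?key|token|secret)\s*[:=]\s*\S+",
    )
]

def redact(response: str) -> str:
    """Replace credential-shaped substrings before a response leaves the engine."""
    for pattern in SECRET_PATTERNS:
        response = pattern.sub("[REDACTED]", response)
    return response
```

Filtering runs on the model's output rather than its input, so even a successful extraction attempt yields only redacted text.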
Aggregated Intelligence Only
The intelligence feed publishes only global-scope, anonymized summaries. No feature vectors, file hashes, command lines, endpoint IDs, or tenant data.
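The publishing step can be pictured as a field-level scrub before anything reaches the feed; the field names below are hypothetical stand-ins for the real schema.

```python
# Hypothetical identifying fields that must never reach the public feed.
BLOCKED_FIELDS = {"feature_vector", "file_hash", "command_line",
                  "endpoint_id", "tenant_id"}

def to_feed_entry(summary: dict) -> dict:
    """Emit only global-scope, anonymized fields; drop anything
    endpoint- or tenant-identifying."""
    entry = {k: v for k, v in summary.items() if k not in BLOCKED_FIELDS}
    entry["scope"] = "global"  # the feed carries no tenant-scoped entries
    return entry
```

Deny-listing at publish time is a second line of defense; the aggregation pipeline upstream should never produce these fields in the first place.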
Prompt Injection Defense
Strict refusal policy for override attempts. Context sanitization before model injection. System prompt is isolated and non-extractable.
API Authentication
ML endpoints require X-Api-Key validation with hardware binding. Admin routes require JWT/Cookie + Admin role. Rate limiting on all surfaces.
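An X-Api-Key check with hardware binding could look like the sketch below; the hashing scheme and exact binding check are assumptions for illustration, not our credential format.

```python
import hashlib
import hmac

def verify_api_key(presented_key, stored_hash, hardware_id, bound_hardware_id):
    """Hypothetical scheme: constant-time comparison of the hashed
    X-Api-Key value, plus an exact hardware-binding match."""
    key_ok = hmac.compare_digest(
        hashlib.sha256(presented_key.encode()).hexdigest(),
        stored_hash,
    )
    return key_ok and hardware_id == bound_hardware_id
```

`hmac.compare_digest` avoids timing side channels on the key comparison, and the hardware binding means a stolen key alone is not sufficient.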
Data Flow
How data moves through the AI Security Engine — simplified
Knowledge Path
Security Path
Security Guarantees
Designed for operational environments. Engineered for trust.
Questions About Our AI Architecture?
For detailed architecture reviews, security assessments, or integration discussions — our engineering team is available.