Choose Your Plan

All plans include all 25 safety domains, real-time violation detection, and deterministic safety guarantees. Pay per token -- scale as you grow.

Shared Inference

$3.00 /1M tokens

No base fee

CPU-powered inference on shared infrastructure. Ideal for development, testing, and light production workloads.

  • All 25 safety domains
  • 1M tokens/mo included
  • Up to 5 seats
  • CPU inference (~30-50ms latency)
  • Dashboard & usage analytics
  • Standard rate limits (60 rpm)
  • Email support
  • Best-effort availability

Dedicated (Most Popular)

$2.00 /1M tokens

+ $2,500/mo base fee

Dedicated GPU (RTX 4000 Ada) with isolated database. Continual learning on your domain data. 5-10x faster inference than Shared.

  • All 25 safety domains
  • 10M tokens/mo included
  • Up to 50 seats
  • Dedicated GPU inference (~5-10ms latency)
  • Isolated database + node pool
  • Continual learning on your data
  • Priority rate limits (300 rpm)
  • Custom domain gates
  • Slack + email support
  • 99.5% uptime SLA

Enterprise Security

$1.50 /1M tokens

+ $5,500/mo base fee

H100 GPU with isolated VPC, HA database, and full audit trail. Built to HIPAA & SOC 2 standards. BAA available.

  • All 25 safety domains
  • 100M tokens/mo included
  • Unlimited seats
  • H100 GPU inference (~3-5ms latency)
  • Isolated HA infrastructure + VPC
  • Built to HIPAA & SOC 2 standards
  • BAA available
  • Full audit trail & explainability
  • Dedicated support engineer
  • 99.9% uptime SLA
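
To make the uptime figures concrete, the sketch below converts an SLA percentage into a monthly downtime budget. It assumes a 30-day measurement window, which this page does not specify; `allowed_downtime_minutes` is a hypothetical helper name, not part of any SolaceSentry API.

```python
def allowed_downtime_minutes(uptime_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime SLA over a window.

    Assumption: the SLA is measured over `days` calendar days (30 here);
    the actual measurement window is not stated on this page.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_pct) / 100.0


if __name__ == "__main__":
    # SLA percentages taken from the Dedicated and Enterprise plan cards.
    for tier, sla in (("Dedicated", 99.5), ("Enterprise Security", 99.9)):
        minutes = allowed_downtime_minutes(sla)
        print(f"{tier}: {sla}% uptime allows about {minutes:.0f} min/month of downtime")
```

Under that assumption, 99.5% works out to roughly 216 minutes (3.6 hours) per month and 99.9% to roughly 43 minutes. The Shared tier is best-effort and carries no such budget.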

Infrastructure at Every Tier

Every plan runs SolaceSentry's custom 350M-parameter transformer with 4 judge transformers and a dual-model courtroom. Higher tiers unlock dedicated GPU compute and isolated infrastructure.

Feature            | Shared               | Dedicated          | Enterprise
Compute            | CPU (INT8 quantized) | RTX 4000 Ada GPU   | H100 GPU
Inference Latency  | ~30-50ms             | ~5-10ms            | ~3-5ms
Included Tokens    | 1M/mo                | 10M/mo             | 100M/mo
Infrastructure     | Shared pool          | Isolated node + DB | HA cluster + VPC
Continual Learning | --                   | Yes                | Yes
Uptime SLA         | Best-effort          | 99.5%              | 99.9%

Add-On: Predictions

AI-powered cost forecasting with Bayesian analysis, regime detection, and Monte Carlo scenario planning across all 25 safety domains.

  • Token cost prediction with Hierarchical Bayes
  • CUSUM regime change-point detection
  • Monte Carlo scenario analysis
  • 25-domain partial pooling

$249 /mo

Available for Dedicated & Enterprise
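
For readers unfamiliar with CUSUM, here is a minimal sketch of the textbook one-sided variant applied to daily token usage. It is a generic illustration, not SolaceSentry's implementation; the function name, the parameters (`target`, `slack`, `threshold`), and the sample data are all assumptions made for the example.

```python
from typing import List, Optional


def cusum_upward_shift(
    values: List[float], target: float, slack: float, threshold: float
) -> Optional[int]:
    """Textbook one-sided CUSUM for detecting an upward shift in the mean.

    Accumulates deviations above `target + slack`, resetting to zero when
    the running sum would go negative. Returns the index at which the sum
    first exceeds `threshold`, or None if it never does.
    """
    running = 0.0
    for i, x in enumerate(values):
        running = max(0.0, running + (x - target - slack))
        if running > threshold:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical daily usage in millions of tokens; usage jumps on day 5.
    daily_tokens_m = [1.0, 1.1, 0.9, 1.0, 1.05, 1.6, 1.7, 1.65, 1.8]
    print(cusum_upward_shift(daily_tokens_m, target=1.0, slack=0.1, threshold=1.0))
```

The `slack` term keeps ordinary day-to-day noise from accumulating, while `threshold` trades detection delay against false alarms. Both would need tuning against real usage data.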

Frequently Asked Questions

What counts as a token?

Tokens are the units processed by our inference engine. Both input and output tokens are counted. Our custom BPE tokenizer is optimized for safety-domain vocabulary.

Can I switch plans later?

Yes. You can upgrade or downgrade at any time. Changes take effect at the start of your next billing cycle. No lock-in contracts.

What does "Built to HIPAA & SOC 2 standards" mean?

Our Enterprise Security tier is architected following the HIPAA and SOC 2 frameworks. It includes isolated infrastructure, full audit logging, and encryption at rest and in transit. A BAA is available upon request.

What's the difference between Shared and Dedicated inference?

Shared inference runs on CPU-optimized pooled infrastructure with INT8 quantization (~30-50ms latency). Dedicated and Enterprise tiers include their own GPU -- RTX 4000 Ada for Dedicated, H100 for Enterprise -- giving you 5-10x faster inference, isolated compute, and continual learning on your data.

What are included tokens?

Each plan includes a monthly token allowance (Shared: 1M, Dedicated: 10M, Enterprise: 100M). Usage beyond the included amount is billed at your plan's per-1M-token rate. Unused tokens do not roll over.
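
As a rough illustration of how the allowance and per-token rate combine, the sketch below estimates a monthly bill for each plan. It assumes overage is billed linearly at the plan's listed per-1M-token rate with no rounding to whole millions, and it ignores the Predictions add-on and any taxes. The `Plan` class and `monthly_cost` function are hypothetical names for this example, not part of any SolaceSentry API.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # rates are quoted per 1M tokens


@dataclass(frozen=True)
class Plan:
    name: str
    base_fee_usd: float      # monthly base fee
    rate_usd_per_1m: float   # per-1M-token rate
    included_tokens: int     # monthly token allowance


# Figures copied from the plan cards on this page.
PLANS = {
    "shared": Plan("Shared Inference", 0.0, 3.00, 1_000_000),
    "dedicated": Plan("Dedicated", 2_500.0, 2.00, 10_000_000),
    "enterprise": Plan("Enterprise Security", 5_500.0, 1.50, 100_000_000),
}


def monthly_cost(plan: Plan, tokens_used: int) -> float:
    """Base fee plus overage beyond the included allowance.

    Assumption: overage is billed linearly (not rounded up to whole
    millions) at the plan's listed per-1M-token rate.
    """
    overage = max(0, tokens_used - plan.included_tokens)
    return plan.base_fee_usd + overage * plan.rate_usd_per_1m / TOKENS_PER_UNIT


if __name__ == "__main__":
    for usage in (500_000, 50_000_000, 2_000_000_000):
        for plan in PLANS.values():
            cost = monthly_cost(plan, usage)
            print(f"{usage:>13,} tokens  {plan.name:<20} ${cost:>10,.2f}")
```

Under these assumptions, 50M tokens in a month would cost $147 on Shared, $2,580 on Dedicated, and $5,500 on Enterprise. On token cost alone, Shared and Dedicated only break even at about 2.48 billion tokens per month, so the choice between tiers turns mainly on latency, isolation, rate limits, and SLA.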