Choose Your Plan
All plans include all 25 safety domains, real-time violation detection, and deterministic safety guarantees. Pay per token -- scale as you grow.
Shared Inference
No base fee
CPU-powered inference on shared infrastructure. Ideal for development, testing, and light production workloads.
- All 25 safety domains
- 1M tokens/mo included
- Up to 5 seats
- CPU inference (~30-50ms latency)
- Dashboard & usage analytics
- Standard rate limits (60 rpm)
- Email support
- Best-effort availability
Dedicated
+ $2,500/mo base fee
Dedicated GPU (RTX 4000 Ada) with isolated database. Continual learning on your domain data. 5-10x faster inference than the Shared tier.
- All 25 safety domains
- 10M tokens/mo included
- Up to 50 seats
- Dedicated GPU inference (~5-10ms latency)
- Isolated database + node pool
- Continual learning on your data
- Priority rate limits (300 rpm)
- Custom domain gates
- Slack + email support
- 99.5% uptime SLA
Enterprise Security
+ $5,500/mo base fee
H100 GPU with isolated VPC, HA database, and full audit trail. Built to HIPAA & SOC 2 standards. BAA available.
- All 25 safety domains
- 100M tokens/mo included
- Unlimited seats
- H100 GPU inference (~3-5ms latency)
- Isolated HA infrastructure + VPC
- Built to HIPAA & SOC 2 standards
- BAA available
- Full audit trail & explainability
- Dedicated support engineer
- 99.9% uptime SLA
Infrastructure at Every Tier
Every plan runs SolaceSentry's custom 350M parameter transformer with 4 judge transformers and a dual-model courtroom. Higher tiers unlock dedicated GPU compute and isolated infrastructure.
| Feature | Shared | Dedicated | Enterprise |
|---|---|---|---|
| Compute | CPU (INT8 quantized) | RTX 4000 Ada GPU | H100 GPU |
| Inference Latency | ~30-50ms | ~5-10ms | ~3-5ms |
| Included Tokens | 1M/mo | 10M/mo | 100M/mo |
| Infrastructure | Shared pool | Isolated node + DB | HA cluster + VPC |
| Continual Learning | -- | Yes | Yes |
| Uptime SLA | Best-effort | 99.5% | 99.9% |
Predictions
AI-powered cost forecasting with Bayesian analysis, regime detection, and Monte Carlo scenario planning across all 25 safety domains.
- Token cost prediction with Hierarchical Bayes
- CUSUM regime change-point detection
- Monte Carlo scenario analysis
- 25-domain partial pooling
Available for Dedicated & Enterprise
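As a rough illustration of what CUSUM regime change-point detection does, here is a minimal textbook sketch of a two-sided tabular CUSUM applied to a series of daily token counts. It is not SolaceSentry's implementation: the function name and the `target`, `slack`, and `threshold` parameters are assumptions chosen for this example.

```python
from typing import List, Sequence


def cusum_change_points(
    values: Sequence[float],
    target: float,      # assumed baseline level, e.g. typical daily token usage
    slack: float,       # assumed allowance for normal drift around the baseline
    threshold: float,   # assumed alarm level for the cumulative sum
) -> List[int]:
    """Return the indices at which a two-sided tabular CUSUM raises an alarm."""
    upper = 0.0  # accumulates deviations above target + slack
    lower = 0.0  # accumulates deviations below target - slack
    alarms: List[int] = []
    for i, x in enumerate(values):
        upper = max(0.0, upper + (x - target - slack))
        lower = max(0.0, lower + (target - x - slack))
        if upper > threshold or lower > threshold:
            alarms.append(i)
            # Restart both sums so later shifts are detected independently.
            upper = 0.0
            lower = 0.0
    return alarms
```

An alarm index marks the point where accumulated deviation from the baseline exceeds the threshold, which a forecasting pipeline could treat as the start of a new usage regime.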
Frequently Asked Questions
What counts as a token?
Tokens are the units processed by our inference engine. Both input and output tokens are counted. Our custom BPE tokenizer is optimized for safety-domain vocabulary.
Can I switch plans later?
Yes. You can upgrade or downgrade at any time. Changes take effect at the start of your next billing cycle. No lock-in contracts.
What does "Built to HIPAA & SOC 2 standards" mean?
Our Enterprise Security tier is architected following the HIPAA and SOC 2 frameworks. It includes isolated infrastructure, full audit logging, and encryption at rest and in transit. A BAA is available upon request.
What's the difference between Shared and Dedicated inference?
Shared inference runs on CPU-optimized pooled infrastructure with INT8 quantization (~30-50ms latency). Dedicated and Enterprise tiers include their own GPU -- RTX 4000 Ada for Dedicated, H100 for Enterprise -- giving you 5-10x faster inference, isolated compute, and continual learning on your data.
What are included tokens?
Each plan includes a monthly token allowance (Shared: 1M, Dedicated: 10M, Enterprise: 100M). Usage beyond the included amount is billed at the per-1M-token overage rate. Unused tokens do not roll over.
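The arithmetic can be sketched as follows. The base fees and allowances come from the plan cards above. The overage rate is not published here, so it is passed in as a parameter. The sketch also assumes overage is prorated per token rather than rounded up to whole 1M blocks, and the `Plan` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    base_fee_usd: float   # monthly base fee from the plan cards
    included_tokens: int  # monthly token allowance


PLANS = {
    "shared": Plan("Shared Inference", 0.0, 1_000_000),
    "dedicated": Plan("Dedicated", 2_500.0, 10_000_000),
    "enterprise": Plan("Enterprise Security", 5_500.0, 100_000_000),
}


def overage_tokens(plan: Plan, tokens_used: int) -> int:
    """Tokens billed beyond the allowance; unused allowance does not roll over."""
    return max(0, tokens_used - plan.included_tokens)


def estimate_monthly_cost(
    plan: Plan, tokens_used: int, overage_rate_per_1m_usd: float
) -> float:
    """Base fee plus overage. The rate is an input, not a published price."""
    billable_millions = overage_tokens(plan, tokens_used) / 1_000_000
    return plan.base_fee_usd + billable_millions * overage_rate_per_1m_usd
```

Usage that stays within the allowance costs only the base fee. Running the same `tokens_used` figure through each plan with your quoted overage rate shows where a higher tier becomes cheaper.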
Select Safety Domains
Choose the domains you need. All plans include access to your selected domains.