AI Models

Frontier LLMs for On-Premises Deployment

A curated selection of the most capable open-source AI models available today, optimized for deployment on NVIDIA Blackwell-powered hardware. These production-ready models deliver state-of-the-art performance while running entirely on your premises — ensuring complete data sovereignty and eliminating cloud dependencies.


| Release Date | Model | Vendor | Status |
| --- | --- | --- | --- |
| 2025-04-05 | Llama 4 Maverick | Meta | Stable |
| 2024-07-23 | Llama 3.1 | Meta | Stable |
| 2026-03-10 | Nemotron 3 Super | NVIDIA | Stable |
| 2025-08-05 | gpt-oss-120b | OpenAI | Stable |
| 2024-07-24 | Mistral Large Instruct | Mistral | Stable, Experimental |
| 2024-04-10 | Mixtral 8x22B | Mistral | Stable |
| 2025-12-08 | Devstral 2 | Mistral | Experimental |
| 2025-12-01 | Mistral Large 3 | Mistral | Stable |
| 2025-09-01 | Apertus | Swiss AI | Stable |
| 2026-02-16 | Qwen 3.5 | Alibaba Cloud | Stable, Experimental |
| 2025-04-28 | Qwen 3 | Alibaba Cloud | Stable |
| 2025-09-23 | Qwen 3 VL | Alibaba Cloud | Stable |
| 2025-09-10 | Qwen 3 Next Thinking | Alibaba Cloud | Stable |
| 2025-09-10 | Qwen 3 Next Instruct | Alibaba Cloud | Stable |
| 2025-07-22 | Qwen 3 Coder | Alibaba Cloud | Stable |
| 2026-02-03 | Qwen 3 Coder Next | Alibaba Cloud | Stable |
| 2026-02-10 | GLM 5 | Z AI | Stable |
| 2025-12-22 | GLM 4.7 | Z AI | Stable |
| 2025-09-30 | GLM 4.6 | Z AI | Stable |
| 2025-11-06 | Kimi K2 Thinking | Moonshot AI | Stable |
| 2026-01-26 | Kimi K2.5 | Moonshot AI | Stable |
| 2026-02-12 | MiniMax M2.5 | MiniMax AI | Stable, Experimental |
| 2025-01-20 | DeepSeek R1 | DeepSeek AI | Stable |
| 2025-11-30 | DeepSeek V3.2 | DeepSeek AI | Stable, Experimental |
| 2026-01-27 | Trinity Large | Arcee AI | Experimental |
| 2026-02-11 | Step 3.5 Flash | Stepfun AI | Stable, Experimental |
| 2026-03-25 | Cohere Transcribe | Cohere Labs | Stable |
| 2026-03-03 | LTX-2.3 | Lightricks | Stable |
| 2025-11-25 | FLUX.2 Dev | Black Forest Labs | Stable |

Frequently Asked Questions

  • What do "parameters" and "size" mean? Parameters are the number of learnable weights in the model, measured in billions (e.g., 70B). Size is the storage space the model files require, measured in GB. Quantized models store fewer bits per parameter, which shrinks the files while preserving most of the model's capabilities (see the first sketch after this list).
  • How much VRAM do I need? More than the model size alone, because the KV cache used for context handling also lives in VRAM. With an FP8-quantized KV cache (standard in production), plan for roughly 1.4–1.5× the model size; for example, a 550 GB model runs comfortably on 768 GB of VRAM. This is virtually lossless: NVIDIA H100/H200 GPUs have native FP8 tensor-core support, so it is essentially free performance-wise. With the default BF16 KV cache, budget 1.7–2× the model size instead (see the second sketch after this list).
  • What does "Experimental" mean? Experimental models are newer quantizations or configurations that are still being validated. They may offer better performance or efficiency but have not been thoroughly tested in production environments. Stable models have been verified for reliable operation.
  • Which tier should I choose? Choose based on the models you need; larger models require more VRAM. The S tier (96 GB) handles most 70B models, the M tier (384 GB) supports multiple large models simultaneously, and the L (768 GB) and XL (1440 GB) tiers enable the largest frontier models such as Llama 4 Maverick and DeepSeek V3.2.
  • Can I run multiple models at once? Yes, if you have sufficient VRAM: the total size of the loaded models must fit within your server's available memory. Larger tiers allow several models to run concurrently for different use cases (see the tier-fit sketch after this list).
  • What are modalities? Modalities indicate what types of data a model can process (input) and generate (output). Text models handle written content, image models can analyze or generate visuals, code models are optimized for programming tasks, and multimodal models combine multiple capabilities.
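
The parameters-to-size arithmetic above reduces to one multiplication: file size in GB ≈ parameters in billions × bytes per parameter. A minimal sketch in Python (the function name and values are illustrative, not taken from any particular tool):

```python
def model_file_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate on-disk size of a model's weights.

    params_billions: parameter count in billions (e.g., 70 for a 70B model).
    bits_per_param: stored precision (16 for BF16/FP16, 8 for FP8/INT8,
                    4 for 4-bit quantizations).
    """
    return params_billions * (bits_per_param / 8)  # 1B params at 1 byte each ~= 1 GB

# A 70B model shrinks as the quantization gets more aggressive:
for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{model_file_size_gb(70, bits):.0f} GB")
# 70B @ 16-bit: ~140 GB, @ 8-bit: ~70 GB, @ 4-bit: ~35 GB
```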
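The KV-cache headroom rule translates the same way. A sketch assuming only the multipliers quoted in the FAQ (1.4–1.5× for an FP8 KV cache, 1.7–2× for BF16); the dictionary and function are illustrative:

```python
KV_CACHE_FACTORS = {
    "fp8":  (1.4, 1.5),  # FP8-quantized KV cache (production standard)
    "bf16": (1.7, 2.0),  # default BF16 KV cache
}

def vram_needed_gb(model_size_gb: float, kv_dtype: str = "fp8") -> tuple[float, float]:
    """Rough low/high VRAM estimate for serving a model with context headroom."""
    lo, hi = KV_CACHE_FACTORS[kv_dtype]
    return model_size_gb * lo, model_size_gb * hi

lo, hi = vram_needed_gb(120, "fp8")
print(f"120 GB model, FP8 KV cache: ~{lo:.0f}-{hi:.0f} GB VRAM")  # ~168-180 GB
```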
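For tier selection and concurrent loading, the same arithmetic applies per model: the headroom-adjusted sizes of everything loaded simply have to sum to less than the tier's VRAM. A sketch using the tier capacities listed above (the model sizes in the example are hypothetical placeholders):

```python
TIER_VRAM_GB = {"S": 96, "M": 384, "L": 768, "XL": 1440}

def fits(tier: str, model_sizes_gb: list[float], kv_factor: float = 1.5) -> bool:
    """True if all models, each with KV-cache headroom, fit the tier at once."""
    return sum(size * kv_factor for size in model_sizes_gb) <= TIER_VRAM_GB[tier]

# Two hypothetical model files, 40 GB and 120 GB, served concurrently:
print(fits("M", [40, 120]))  # (40 + 120) * 1.5 = 240 GB <= 384 GB -> True
print(fits("S", [40, 120]))  # 240 GB > 96 GB -> False
```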