
Available models for inference with Dedicated Deployments

The models listed below are available to deploy and use with Dedicated Deployments.

Text Models

| Name | GPU | GPU count | String in API | Available quantizations | Maximum context length (tokens) | License |
|---|---|---|---|---|---|---|
| Llama 3.3 70B Instruct | NVIDIA H100 | 2 | llama-3.3-70b-instruct | bf16, fp8 | 131k | Llama 3.3 Community License Agreement |
|  | NVIDIA H100 | 1 | llama-3.3-70b-instruct | fp8 | 131k | Llama 3.3 Community License Agreement |
| Llama 3.1 70B Instruct | NVIDIA H100 | 2 | llama-3.1-70b-instruct | bf16, fp8 | 131k | Llama 3.1 Community License Agreement |
|  | NVIDIA H100 | 1 | llama-3.1-70b-instruct | fp8 | 131k | Llama 3.1 Community License Agreement |
| Qwen 2.5 Coder 32B Instruct | NVIDIA H100 | 1 | qwen2.5-coder-32b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 2 | qwen2.5-coder-32b-instruct | bf16 | 32k | Apache License 2.0 |
| Qwen 2.5 32B Instruct | NVIDIA H100 | 1 | qwen2.5-32b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 2 | qwen2.5-32b-instruct | bf16 | 32k | Apache License 2.0 |
| Gemma 2 27B Instruct | NVIDIA H100 | 1 | gemma-2-27b-it | bf16 | 4k | Gemma Terms of Use |
|  | NVIDIA H100 | 2 | gemma-2-27b-it | bf16 | 4k | Gemma Terms of Use |
| Qwen 2.5 Coder 14B Instruct | NVIDIA H100 | 2 | qwen2.5-coder-14b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA L40S | 1 | qwen2.5-coder-14b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 1 | qwen2.5-coder-14b-instruct | bf16 | 32k | Apache License 2.0 |
| legml-v0.1 | NVIDIA H100 | 2 | legml-v0.1 | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA L40S | 1 | legml-v0.1 | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 1 | legml-v0.1 | bf16 | 32k | Apache License 2.0 |
| Qwen 2.5 14B Instruct | NVIDIA L40S | 1 | qwen2.5-14b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 1 | qwen2.5-14b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 2 | qwen2.5-14b-instruct | bf16 | 32k | Apache License 2.0 |
| Pixtral 12B 2409 | NVIDIA L40S | 1 | pixtral-12b-2409 | bf16 | 128k | Apache License 2.0 |
|  | NVIDIA H100 | 1 | pixtral-12b-2409 | bf16 | 128k | Apache License 2.0 |
|  | NVIDIA H100 | 2 | pixtral-12b-2409 | bf16 | 128k | Apache License 2.0 |
| Mistral Nemo Instruct 2407 | NVIDIA H100 | 2 | mistral-nemo-instruct-2407 | bf16 | 128k | Apache License 2.0 |
|  | NVIDIA H100 | 1 | mistral-nemo-instruct-2407 | bf16 | 128k | Apache License 2.0 |
|  | NVIDIA L40S | 1 | mistral-nemo-instruct-2407 | bf16 | 128k | Apache License 2.0 |
| Gemma 2 9B Instruct | NVIDIA L4 | 1 | gemma-2-9b-it | bf16 | 4k | Gemma Terms of Use |
|  | NVIDIA L40S | 1 | gemma-2-9b-it | bf16 | 4k | Gemma Terms of Use |
|  | NVIDIA H100 | 1 | gemma-2-9b-it | bf16 | 4k | Gemma Terms of Use |
|  | NVIDIA H100 | 2 | gemma-2-9b-it | bf16 | 4k | Gemma Terms of Use |
| Llama 3.1 8B Instruct | NVIDIA L40S | 1 | llama-3.1-8b-instruct | bf16, fp8 | 131k | Llama 3.1 Community License Agreement |
|  | NVIDIA H100 | 1 | llama-3.1-8b-instruct | bf16, fp8 | 131k | Llama 3.1 Community License Agreement |
|  | NVIDIA H100 | 2 | llama-3.1-8b-instruct | bf16, fp8 | 131k | Llama 3.1 Community License Agreement |
|  | NVIDIA L4 | 1 | llama-3.1-8b-instruct | bf16, fp8 | 131k | Llama 3.1 Community License Agreement |
| Qwen 2.5 Coder 7B Instruct | NVIDIA H100 | 1 | qwen2.5-coder-7b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 2 | qwen2.5-coder-7b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA L40S | 1 | qwen2.5-coder-7b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA L4 | 1 | qwen2.5-coder-7b-instruct | bf16 | 32k | Apache License 2.0 |
| Qwen 2.5 7B Instruct | NVIDIA L40S | 1 | qwen2.5-7b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA L4 | 1 | qwen2.5-7b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 1 | qwen2.5-7b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 2 | qwen2.5-7b-instruct | bf16 | 32k | Apache License 2.0 |
| Phi 3.5 Mini Instruct | NVIDIA L40S | 1 | phi-3.5-mini-instruct | bf16 | 131k | MIT License |
|  | NVIDIA H100 | 1 | phi-3.5-mini-instruct | bf16 | 131k | MIT License |
|  | NVIDIA H100 | 2 | phi-3.5-mini-instruct | bf16 | 131k | MIT License |
|  | NVIDIA L4 | 1 | phi-3.5-mini-instruct | bf16 | 131k | MIT License |
| Gemma 2 2B Instruct | NVIDIA L4 | 1 | gemma-2-2b-it | bf16 | 4k | Gemma Terms of Use |
|  | NVIDIA L40S | 1 | gemma-2-2b-it | bf16 | 4k | Gemma Terms of Use |
|  | NVIDIA H100 | 1 | gemma-2-2b-it | bf16 | 4k | Gemma Terms of Use |
|  | NVIDIA H100 | 2 | gemma-2-2b-it | bf16 | 4k | Gemma Terms of Use |
| Qwen 2.5 Coder 1.5B Instruct | NVIDIA L4 | 1 | qwen2.5-coder-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 1 | qwen2.5-coder-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 2 | qwen2.5-coder-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA L40S | 1 | qwen2.5-coder-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
| Qwen 2.5 1.5B Instruct | NVIDIA L40S | 1 | qwen2.5-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA L4 | 1 | qwen2.5-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 1 | qwen2.5-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
|  | NVIDIA H100 | 2 | qwen2.5-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
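
The GPU options in the text model table appear to track how much memory the model weights need. A rough rule of thumb (our own estimate, not an official Ektos sizing formula) is weight memory ≈ parameter count × bytes per parameter, with 2 bytes for bf16/fp16 and 1 byte for fp8. For a 70B model that is about 140 GB in bf16, more than a single 80 GB H100 holds, versus about 70 GB in fp8, which is consistent with the single-H100 Llama 70B rows listing fp8 only. The sketch below assumes 80 GB H100, 48 GB L40S and 24 GB L4 cards, assumes weights are split evenly across GPUs in multi-GPU deployments, and ignores the KV cache, activations and runtime overhead, so treat its result as a lower bound.

```python
# Rough, weights-only memory check: parameters x bytes per parameter.
# Assumptions (not taken from Ektos documentation):
#   - H100 = 80 GB, L40S = 48 GB, L4 = 24 GB of GPU memory
#   - multi-GPU deployments split the weights evenly across GPUs
#   - KV cache, activations and runtime overhead are ignored
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}
GPU_MEMORY_GB = {"NVIDIA H100": 80, "NVIDIA L40S": 48, "NVIDIA L4": 24}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def weights_fit(params_billion: float, precision: str, gpu: str, gpu_count: int) -> bool:
    """True if the weights alone fit, with some room left, in the GPUs' combined memory."""
    return weight_memory_gb(params_billion, precision) < GPU_MEMORY_GB[gpu] * gpu_count


if __name__ == "__main__":
    # A 70B model on H100s: bf16 needs ~140 GB, fp8 needs ~70 GB.
    for precision, count in [("bf16", 1), ("bf16", 2), ("fp8", 1)]:
        need = weight_memory_gb(70, precision)
        ok = weights_fit(70, precision, "NVIDIA H100", count)
        print(f"70B {precision} on {count}x H100: ~{need:.0f} GB of weights, fits={ok}")
```

The same check gives about 28 GB for a 14B model in bf16, which fits a 48 GB L40S but not a 24 GB L4; the 14B rows above likewise list L40S and H100 but not L4.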

Audio Models

| Name | GPU | GPU count | String in API | Available quantizations | Maximum context length (tokens) | License |
|---|---|---|---|---|---|---|
| Whisper Large v3 Turbo | NVIDIA L40S | 1 | whisper-large-v3-turbo | fp16 | N/A | MIT License |
|  | NVIDIA H100 | 1 | whisper-large-v3-turbo | fp16 | N/A | MIT License |
|  | NVIDIA H100 | 2 | whisper-large-v3-turbo | fp16 | N/A | MIT License |
|  | NVIDIA L4 | 1 | whisper-large-v3-turbo | fp16 | N/A | MIT License |

Text Embedding Models

| Name | GPU | GPU count | String in API | Available quantizations | Maximum context length (tokens) | License |
|---|---|---|---|---|---|---|
| GTE Large EN v1.5 | NVIDIA H100 | 2 | gte-large-en-v1.5 | fp32 | 8k | Apache License 2.0 |
|  | NVIDIA L4 | 1 | gte-large-en-v1.5 | fp32 | 8k | Apache License 2.0 |
|  | NVIDIA L40S | 1 | gte-large-en-v1.5 | fp32 | 8k | Apache License 2.0 |
|  | NVIDIA H100 | 1 | gte-large-en-v1.5 | fp32 | 8k | Apache License 2.0 |
| GTE Multilingual Base | NVIDIA L4 | 1 | gte-multilingual-base | fp16 | 8k | Apache License 2.0 |
|  | NVIDIA L40S | 1 | gte-multilingual-base | fp16 | 8k | Apache License 2.0 |
|  | NVIDIA H100 | 1 | gte-multilingual-base | fp16 | 8k | Apache License 2.0 |
|  | NVIDIA H100 | 2 | gte-multilingual-base | fp16 | 8k | Apache License 2.0 |
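
If you script your deployments, it can help to treat the tables as data: each row pairs one String in API value with a GPU type, a GPU count and a set of quantizations. The sketch below is a local helper only, not part of any Ektos SDK or API; the `Deployment` class, the `CATALOG` list and the `options` and `models_on` functions are hypothetical names, and only a handful of rows from the tables are copied in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """One table row, in a hypothetical local representation."""
    api_string: str
    gpu: str
    gpu_count: int
    quantizations: tuple[str, ...]


# A few rows copied from the tables above; add the rest as needed.
CATALOG = [
    Deployment("llama-3.3-70b-instruct", "NVIDIA H100", 2, ("bf16", "fp8")),
    Deployment("llama-3.3-70b-instruct", "NVIDIA H100", 1, ("fp8",)),
    Deployment("qwen2.5-14b-instruct", "NVIDIA L40S", 1, ("bf16",)),
    Deployment("llama-3.1-8b-instruct", "NVIDIA L4", 1, ("bf16", "fp8")),
    Deployment("whisper-large-v3-turbo", "NVIDIA L4", 1, ("fp16",)),
    Deployment("gte-multilingual-base", "NVIDIA L4", 1, ("fp16",)),
]


def options(api_string: str, quantization: str) -> list[tuple[str, int]]:
    """(GPU, GPU count) pairs listed for a model string and quantization."""
    return [
        (d.gpu, d.gpu_count)
        for d in CATALOG
        if d.api_string == api_string and quantization in d.quantizations
    ]


def models_on(gpu: str, gpu_count: int = 1) -> list[str]:
    """Sorted model strings listed for a given GPU type and count."""
    return sorted(
        {d.api_string for d in CATALOG if d.gpu == gpu and d.gpu_count == gpu_count}
    )


if __name__ == "__main__":
    print(options("llama-3.3-70b-instruct", "bf16"))
    print(models_on("NVIDIA L4"))
```

With these rows, asking for `llama-3.3-70b-instruct` in bf16 returns only the two-H100 configuration, while fp8 returns both the one- and two-H100 configurations.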

Ektos AI offers the most popular and trending open-source models, and we add new models to our platform as soon as they are released.

If you would like to use a model that is not currently supported, please let us know on Discord!

Next steps