Available models for Inference with Dedicated Deployments

The models listed below are available to deploy and use with Dedicated Deployments.

Text Models

Name	GPU	GPU count	String in API	Available Quantizations	Maximum Context Length	License
Llama 3.1 70B Instruct	NVIDIA H100	2	`llama-3.1-70b-instruct`	bf16, fp8	131072	Llama 3.1 Community License Agreement
-	NVIDIA H100	1	`llama-3.1-70b-instruct`	fp8	131072	Llama 3.1 Community License Agreement
Llama 3.3 70B Instruct	NVIDIA H100	2	`llama-3.3-70b-instruct`	bf16, fp8	131072	Llama 3.3 Community License Agreement
-	NVIDIA H100	1	`llama-3.3-70b-instruct`	fp8	131072	Llama 3.3 Community License Agreement
Qwen 2.5 Coder 32B Instruct	NVIDIA H100	1	`qwen2.5-coder-32b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	2	`qwen2.5-coder-32b-instruct`	bf16	32768	Apache License 2.0
Qwen 2.5 32B Instruct	NVIDIA H100	1	`qwen2.5-32b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	2	`qwen2.5-32b-instruct`	bf16	32768	Apache License 2.0
Gemma 2 27B Instruct	NVIDIA H100	1	`gemma-2-27b-it`	bf16	4096	Gemma
-	NVIDIA H100	2	`gemma-2-27b-it`	bf16	4096	Gemma
legml-v0.1	NVIDIA L40S	1	`legml-v0.1`	bf16	32768	Apache License 2.0
-	NVIDIA H100	1	`legml-v0.1`	bf16	32768	Apache License 2.0
-	NVIDIA H100	2	`legml-v0.1`	bf16	32768	Apache License 2.0
Qwen 2.5 14B Instruct	NVIDIA L40S	1	`qwen2.5-14b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	1	`qwen2.5-14b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	2	`qwen2.5-14b-instruct`	bf16	32768	Apache License 2.0
Qwen 2.5 Coder 14B Instruct	NVIDIA L40S	1	`qwen2.5-coder-14b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	1	`qwen2.5-coder-14b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	2	`qwen2.5-coder-14b-instruct`	bf16	32768	Apache License 2.0
Pixtral 12B 2409	NVIDIA H100	2	`pixtral-12b-2409`	bf16	128000	Apache License 2.0
-	NVIDIA L40S	1	`pixtral-12b-2409`	bf16	128000	Apache License 2.0
-	NVIDIA H100	1	`pixtral-12b-2409`	bf16	128000	Apache License 2.0
Mistral Nemo Instruct 2407	NVIDIA L40S	1	`mistral-nemo-instruct-2407`	bf16	128000	Apache License 2.0
-	NVIDIA H100	1	`mistral-nemo-instruct-2407`	bf16	128000	Apache License 2.0
-	NVIDIA H100	2	`mistral-nemo-instruct-2407`	bf16	128000	Apache License 2.0
Gemma 2 9B Instruct	NVIDIA L4	1	`gemma-2-9b-it`	bf16	4096	Gemma
-	NVIDIA L40S	1	`gemma-2-9b-it`	bf16	4096	Gemma
-	NVIDIA H100	1	`gemma-2-9b-it`	bf16	4096	Gemma
-	NVIDIA H100	2	`gemma-2-9b-it`	bf16	4096	Gemma
Llama 3.1 8B Instruct	NVIDIA L40S	1	`llama-3.1-8b-instruct`	bf16, fp8	131072	Llama 3.1 Community License Agreement
-	NVIDIA H100	1	`llama-3.1-8b-instruct`	bf16, fp8	131072	Llama 3.1 Community License Agreement
-	NVIDIA H100	2	`llama-3.1-8b-instruct`	bf16, fp8	131072	Llama 3.1 Community License Agreement
-	NVIDIA L4	1	`llama-3.1-8b-instruct`	bf16, fp8	131072	Llama 3.1 Community License Agreement
Qwen 2.5 Coder 7B Instruct	NVIDIA L4	1	`qwen2.5-coder-7b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA L40S	1	`qwen2.5-coder-7b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	1	`qwen2.5-coder-7b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	2	`qwen2.5-coder-7b-instruct`	bf16	32768	Apache License 2.0
Qwen 2.5 7B Instruct	NVIDIA L4	1	`qwen2.5-7b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA L40S	1	`qwen2.5-7b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	1	`qwen2.5-7b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	2	`qwen2.5-7b-instruct`	bf16	32768	Apache License 2.0
Phi 3.5 Mini Instruct	NVIDIA L4	1	`phi-3.5-mini-instruct`	bf16	131072	MIT License
-	NVIDIA L40S	1	`phi-3.5-mini-instruct`	bf16	131072	MIT License
-	NVIDIA H100	1	`phi-3.5-mini-instruct`	bf16	131072	MIT License
-	NVIDIA H100	2	`phi-3.5-mini-instruct`	bf16	131072	MIT License
Gemma 2 2B Instruct	NVIDIA L40S	1	`gemma-2-2b-it`	bf16	4096	Gemma
-	NVIDIA H100	1	`gemma-2-2b-it`	bf16	4096	Gemma
-	NVIDIA H100	2	`gemma-2-2b-it`	bf16	4096	Gemma
-	NVIDIA L4	1	`gemma-2-2b-it`	bf16	4096	Gemma
Qwen 2.5 Coder 1.5B Instruct	NVIDIA L40S	1	`qwen2.5-coder-1.5b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	1	`qwen2.5-coder-1.5b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	2	`qwen2.5-coder-1.5b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA L4	1	`qwen2.5-coder-1.5b-instruct`	bf16	32768	Apache License 2.0
Qwen 2.5 1.5B Instruct	NVIDIA L4	1	`qwen2.5-1.5b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA L40S	1	`qwen2.5-1.5b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	1	`qwen2.5-1.5b-instruct`	bf16	32768	Apache License 2.0
-	NVIDIA H100	2	`qwen2.5-1.5b-instruct`	bf16	32768	Apache License 2.0
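The table above can be read as a list of deployment options keyed by the "String in API" value. As a minimal sketch (not an official client; the `DeploymentOption` class and `find_options` helper are hypothetical names introduced here for illustration), the following Python snippet encodes a few rows copied from the table and filters them by quantization and required context length:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DeploymentOption:
    """One row of the table (hypothetical structure, for illustration only)."""
    model: str                      # value of the "String in API" column
    gpu: str                        # GPU type
    gpu_count: int                  # number of GPUs in the deployment
    quantizations: Tuple[str, ...]  # available quantizations
    max_context: int                # maximum context length in tokens


# A few rows copied from the table above; this is not the full catalogue.
OPTIONS = [
    DeploymentOption("llama-3.1-70b-instruct", "NVIDIA H100", 2, ("bf16", "fp8"), 131072),
    DeploymentOption("llama-3.1-70b-instruct", "NVIDIA H100", 1, ("fp8",), 131072),
    DeploymentOption("gemma-2-9b-it", "NVIDIA L4", 1, ("bf16",), 4096),
    DeploymentOption("gemma-2-9b-it", "NVIDIA L40S", 1, ("bf16",), 4096),
    DeploymentOption("gemma-2-9b-it", "NVIDIA H100", 1, ("bf16",), 4096),
    DeploymentOption("gemma-2-9b-it", "NVIDIA H100", 2, ("bf16",), 4096),
]


def find_options(
    model: str,
    quantization: Optional[str] = None,
    min_context: int = 0,
) -> List[DeploymentOption]:
    """Return the listed options for `model` that satisfy the given constraints."""
    return [
        option
        for option in OPTIONS
        if option.model == model
        and (quantization is None or quantization in option.quantizations)
        and option.max_context >= min_context
    ]


if __name__ == "__main__":
    # Llama 3.1 70B in bf16 is only listed on two H100 GPUs.
    for option in find_options("llama-3.1-70b-instruct", quantization="bf16"):
        print(option.gpu, option.gpu_count)
```

For example, `find_options("gemma-2-9b-it", min_context=8192)` returns an empty list, because every listed Gemma 2 9B option has a maximum context length of 4096.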

Audio Models

Name	GPU	GPU count	String in API	Available Quantizations	Maximum Context Length	License
Whisper Large v3 Turbo	NVIDIA L40S	1	`whisper-large-v3-turbo`	fp16	-	MIT License
-	NVIDIA H100	1	`whisper-large-v3-turbo`	fp16	-	MIT License
-	NVIDIA H100	2	`whisper-large-v3-turbo`	fp16	-	MIT License
-	NVIDIA L4	1	`whisper-large-v3-turbo`	fp16	-	MIT License

Text Embedding Models

Name	GPU	GPU count	String in API	Available Quantizations	Maximum Context Length	License
GTE Large EN v1.5	NVIDIA H100	2	`gte-large-en-v1.5`	f32	8192	Apache License 2.0
-	NVIDIA L4	1	`gte-large-en-v1.5`	f32	8192	Apache License 2.0
-	NVIDIA L40S	1	`gte-large-en-v1.5`	f32	8192	Apache License 2.0
-	NVIDIA H100	1	`gte-large-en-v1.5`	f32	8192	Apache License 2.0
GTE Multilingual base	NVIDIA L4	1	`gte-multilingual-base`	fp16	8192	Apache License 2.0
-	NVIDIA L40S	1	`gte-multilingual-base`	fp16	8192	Apache License 2.0
-	NVIDIA H100	1	`gte-multilingual-base`	fp16	8192	Apache License 2.0
-	NVIDIA H100	2	`gte-multilingual-base`	fp16	8192	Apache License 2.0
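Both embedding models above share the same maximum context length, so longer inputs have to be split before they are embedded. The following is a minimal sketch, assuming the limit is exactly 8192 tokens and that the text has already been converted to token IDs by a tokenizer that is not shown here; the helper name `split_into_windows` is hypothetical:

```python
from typing import List, Sequence

# Assumption: the maximum context length of the embedding models is 8192 tokens.
MAX_CONTEXT_TOKENS = 8192


def split_into_windows(
    token_ids: Sequence[int],
    max_tokens: int = MAX_CONTEXT_TOKENS,
) -> List[List[int]]:
    """Split a token sequence into consecutive windows of at most `max_tokens`."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [
        list(token_ids[start:start + max_tokens])
        for start in range(0, len(token_ids), max_tokens)
    ]


if __name__ == "__main__":
    # Stand-in for real token IDs produced by a tokenizer.
    fake_tokens = list(range(20000))
    windows = split_into_windows(fake_tokens)
    print([len(window) for window in windows])
```

This sketch uses non-overlapping windows for simplicity. Overlapping windows, or splitting on sentence boundaries, may preserve more context between chunks.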

Ektos AI offers the most popular and trending open-source models. We add new models to our platform as soon as they are released.

If you would like to use a model that is not currently supported, please let us know on Discord!

Next steps

Manage dedicated deployments.
Use text models.
Use audio models.
Use embedding models.
Get in touch and interact with our community on our Discord.