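Available models for Inference with Dedicated Deployments
The models listed below are available to deploy and use with Dedicated Deployments. In each table, a "-" in the Name column marks an additional GPU configuration for the model named in the row above it.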
Text Models
Name | GPU | GPU count | String in API | Available Quantizations | Maximum Context Length | License |
---|---|---|---|---|---|---|
Llama 3.3 70B Instruct | NVIDIA H100 | 2 | llama-3.3-70b-instruct | bf16, fp8 | 131k | Llama 3.3 Community License Agreement |
- | NVIDIA H100 | 1 | llama-3.3-70b-instruct | fp8 | 131k | Llama 3.3 Community License Agreement |
Llama 3.1 70B Instruct | NVIDIA H100 | 2 | llama-3.1-70b-instruct | bf16, fp8 | 131k | Llama 3.1 Community License Agreement |
- | NVIDIA H100 | 1 | llama-3.1-70b-instruct | fp8 | 131k | Llama 3.1 Community License Agreement |
Qwen 2.5 Coder 32B Instruct | NVIDIA H100 | 1 | qwen2.5-coder-32b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 2 | qwen2.5-coder-32b-instruct | bf16 | 32k | Apache License 2.0 |
Qwen 2.5 32B Instruct | NVIDIA H100 | 1 | qwen2.5-32b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 2 | qwen2.5-32b-instruct | bf16 | 32k | Apache License 2.0 |
Gemma 2 27B Instruct | NVIDIA H100 | 1 | gemma-2-27b-it | bf16 | 4k | Gemma |
- | NVIDIA H100 | 2 | gemma-2-27b-it | bf16 | 4k | Gemma |
Qwen 2.5 Coder 14B Instruct | NVIDIA H100 | 2 | qwen2.5-coder-14b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA L40S | 1 | qwen2.5-coder-14b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 1 | qwen2.5-coder-14b-instruct | bf16 | 32k | Apache License 2.0 |
legml-v0.1 | NVIDIA H100 | 2 | legml-v0.1 | bf16 | 32k | Apache License 2.0 |
- | NVIDIA L40S | 1 | legml-v0.1 | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 1 | legml-v0.1 | bf16 | 32k | Apache License 2.0 |
Qwen 2.5 14B Instruct | NVIDIA L40S | 1 | qwen2.5-14b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 1 | qwen2.5-14b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 2 | qwen2.5-14b-instruct | bf16 | 32k | Apache License 2.0 |
Pixtral 12B 2409 | NVIDIA L40S | 1 | pixtral-12b-2409 | bf16 | 128k | Apache License 2.0 |
- | NVIDIA H100 | 1 | pixtral-12b-2409 | bf16 | 128k | Apache License 2.0 |
- | NVIDIA H100 | 2 | pixtral-12b-2409 | bf16 | 128k | Apache License 2.0 |
Mistral Nemo Instruct 2407 | NVIDIA H100 | 2 | mistral-nemo-instruct-2407 | bf16 | 128k | Apache License 2.0 |
- | NVIDIA H100 | 1 | mistral-nemo-instruct-2407 | bf16 | 128k | Apache License 2.0 |
- | NVIDIA L40S | 1 | mistral-nemo-instruct-2407 | bf16 | 128k | Apache License 2.0 |
Gemma 2 9B Instruct | NVIDIA L4 | 1 | gemma-2-9b-it | bf16 | 4k | Gemma |
- | NVIDIA L40S | 1 | gemma-2-9b-it | bf16 | 4k | Gemma |
- | NVIDIA H100 | 1 | gemma-2-9b-it | bf16 | 4k | Gemma |
- | NVIDIA H100 | 2 | gemma-2-9b-it | bf16 | 4k | Gemma |
Llama 3.1 8B Instruct | NVIDIA L40S | 1 | llama-3.1-8b-instruct | bf16, fp8 | 131k | Llama 3.1 Community License Agreement |
- | NVIDIA H100 | 1 | llama-3.1-8b-instruct | bf16, fp8 | 131k | Llama 3.1 Community License Agreement |
- | NVIDIA H100 | 2 | llama-3.1-8b-instruct | bf16, fp8 | 131k | Llama 3.1 Community License Agreement |
- | NVIDIA L4 | 1 | llama-3.1-8b-instruct | bf16, fp8 | 131k | Llama 3.1 Community License Agreement |
Qwen 2.5 Coder 7B Instruct | NVIDIA H100 | 1 | qwen2.5-coder-7b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 2 | qwen2.5-coder-7b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA L40S | 1 | qwen2.5-coder-7b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA L4 | 1 | qwen2.5-coder-7b-instruct | bf16 | 32k | Apache License 2.0 |
Qwen 2.5 7B Instruct | NVIDIA L40S | 1 | qwen2.5-7b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA L4 | 1 | qwen2.5-7b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 1 | qwen2.5-7b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 2 | qwen2.5-7b-instruct | bf16 | 32k | Apache License 2.0 |
Phi 3.5 Mini Instruct | NVIDIA L40S | 1 | phi-3.5-mini-instruct | bf16 | 131k | MIT License |
- | NVIDIA H100 | 1 | phi-3.5-mini-instruct | bf16 | 131k | MIT License |
- | NVIDIA H100 | 2 | phi-3.5-mini-instruct | bf16 | 131k | MIT License |
- | NVIDIA L4 | 1 | phi-3.5-mini-instruct | bf16 | 131k | MIT License |
Gemma 2 2B Instruct | NVIDIA L4 | 1 | gemma-2-2b-it | bf16 | 4k | Gemma |
- | NVIDIA L40S | 1 | gemma-2-2b-it | bf16 | 4k | Gemma |
- | NVIDIA H100 | 1 | gemma-2-2b-it | bf16 | 4k | Gemma |
- | NVIDIA H100 | 2 | gemma-2-2b-it | bf16 | 4k | Gemma |
Qwen 2.5 Coder 1.5B Instruct | NVIDIA L4 | 1 | qwen2.5-coder-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 1 | qwen2.5-coder-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 2 | qwen2.5-coder-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA L40S | 1 | qwen2.5-coder-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
Qwen 2.5 1.5B Instruct | NVIDIA L40S | 1 | qwen2.5-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA L4 | 1 | qwen2.5-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 1 | qwen2.5-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
- | NVIDIA H100 | 2 | qwen2.5-1.5b-instruct | bf16 | 32k | Apache License 2.0 |
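The table does not say why some rows list fewer quantizations than others, but the pattern is consistent with simple weight-memory arithmetic. The sketch below is a rough estimate only: the per-GPU memory sizes are assumptions (common 80 GB, 48 GB, and 24 GB variants of each card, not stated on this page), and it counts model weights alone, ignoring the KV cache and activations that a real deployment also needs.

```python
# Bytes per parameter for each quantization label used in the tables.
BYTES_PER_PARAM = {"f32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

# ASSUMPTION: memory per GPU in GB. These sizes are not given on this page.
ASSUMED_GPU_MEMORY_GB = {"NVIDIA H100": 80, "NVIDIA L40S": 48, "NVIDIA L4": 24}


def weight_memory_gb(params_billion: float, quantization: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[quantization]


def weights_fit(params_billion: float, quantization: str,
                gpu: str, gpu_count: int) -> bool:
    """True if the weights alone fit in the combined memory of the GPUs.

    This is a necessary condition, not a sufficient one: KV cache and
    activations need extra headroom that this check does not model.
    """
    total_memory_gb = ASSUMED_GPU_MEMORY_GB[gpu] * gpu_count
    return weight_memory_gb(params_billion, quantization) <= total_memory_gb


if __name__ == "__main__":
    # A nominal 70B-parameter model, as in the Llama 70B rows.
    for quantization, gpu_count in [("bf16", 1), ("bf16", 2), ("fp8", 1)]:
        size = weight_memory_gb(70, quantization)
        fits = weights_fit(70, quantization, "NVIDIA H100", gpu_count)
        print(f"{quantization} on {gpu_count}x H100: {size:.0f} GB, fits={fits}")
```

Under these assumptions, bf16 weights for a 70B model need about 140 GB, which exceeds one 80 GB card but fits across two, while fp8 weights need about 70 GB. That matches the Llama 70B rows, where the single-GPU configuration lists fp8 only.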
Audio Models
Name | GPU | GPU count | String in API | Available Quantizations | Maximum Context Length | License |
---|---|---|---|---|---|---|
Whisper Large v3 Turbo | NVIDIA L40S | 1 | whisper-large-v3-turbo | fp16 | - | MIT License |
- | NVIDIA H100 | 1 | whisper-large-v3-turbo | fp16 | - | MIT License |
- | NVIDIA H100 | 2 | whisper-large-v3-turbo | fp16 | - | MIT License |
- | NVIDIA L4 | 1 | whisper-large-v3-turbo | fp16 | - | MIT License |
Text Embedding Models
Name | GPU | GPU count | String in API | Available Quantizations | Maximum Context Length | License |
---|---|---|---|---|---|---|
GTE Large EN v1.5 | NVIDIA H100 | 2 | gte-large-en-v1.5 | f32 | 8k | Apache License 2.0 |
- | NVIDIA L4 | 1 | gte-large-en-v1.5 | f32 | 8k | Apache License 2.0 |
- | NVIDIA L40S | 1 | gte-large-en-v1.5 | f32 | 8k | Apache License 2.0 |
- | NVIDIA H100 | 1 | gte-large-en-v1.5 | f32 | 8k | Apache License 2.0 |
GTE Multilingual base | NVIDIA L4 | 1 | gte-multilingual-base | fp16 | 8k | Apache License 2.0 |
- | NVIDIA L40S | 1 | gte-multilingual-base | fp16 | 8k | Apache License 2.0 |
- | NVIDIA H100 | 1 | gte-multilingual-base | fp16 | 8k | Apache License 2.0 |
- | NVIDIA H100 | 2 | gte-multilingual-base | fp16 | 8k | Apache License 2.0 |
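Because the tables repeat one row per GPU configuration, it can help to load them into records and filter by the "String in API" value. The following is a minimal sketch of that idea, not part of any Ektos AI SDK: `DeploymentOption`, `parse_rows`, and `options_for` are hypothetical names, and the three sample rows are copied from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentOption:
    """One table row: a model on a specific GPU configuration."""
    name: str
    gpu: str
    gpu_count: int
    api_string: str
    quantizations: tuple[str, ...]
    max_context: str  # "-" when the column does not apply (e.g. audio)
    license: str


def parse_rows(rows: list[str]) -> list[DeploymentOption]:
    """Parse pipe-separated rows; "-" in Name reuses the previous name."""
    options = []
    current_name = ""
    for row in rows:
        # Drop the trailing pipe, then split into the seven columns.
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        name, gpu, count, api_string, quants, context, license_ = cells
        if name != "-":
            current_name = name
        options.append(DeploymentOption(
            name=current_name,
            gpu=gpu,
            gpu_count=int(count),
            api_string=api_string,
            quantizations=tuple(q.strip() for q in quants.split(",")),
            max_context=context,
            license=license_,
        ))
    return options


def options_for(options: list[DeploymentOption], api_string: str,
                quantization: str) -> list[DeploymentOption]:
    """Return the configurations that serve a model at a quantization."""
    return [o for o in options
            if o.api_string == api_string and quantization in o.quantizations]


# Sample rows copied from the tables above.
ROWS = [
    "Llama 3.3 70B Instruct | NVIDIA H100 | 2 | llama-3.3-70b-instruct | bf16, fp8 | 131k | Llama 3.3 Community License Agreement |",
    "- | NVIDIA H100 | 1 | llama-3.3-70b-instruct | fp8 | 131k | Llama 3.3 Community License Agreement |",
    "Whisper Large v3 Turbo | NVIDIA L40S | 1 | whisper-large-v3-turbo | fp16 | - | MIT License |",
]
OPTIONS = parse_rows(ROWS)

if __name__ == "__main__":
    for option in options_for(OPTIONS, "llama-3.3-70b-instruct", "fp8"):
        print(f"{option.name}: {option.gpu_count}x {option.gpu}")
```

Filtering the sample rows for `llama-3.3-70b-instruct` at bf16 returns only the two-GPU configuration, while fp8 returns both configurations.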
Ektos AI offers the most popular and trending open-source models. We add new models to our platform immediately after they are released.
If you would like to use a model that is not currently supported, please let us know on Discord!
Next steps
- Manage dedicated deployments.
- Use text models.
- Use audio models.
- Use embedding models.
- Get in touch with our community on Discord.