Ektos AI is now in Early Access!

Inference -- Dedicated Deployments

Deploy any model on a dedicated GPU with its own API endpoint in seconds. Customize each deployment by selecting the quantization, context length, and GPU accelerator that fit your needs. Dedicated deployments deliver maximum performance with no rate limits.

GPU Type                 $/minute   $/hour
NVIDIA L4 24GB
NVIDIA L40S 48GB
NVIDIA H100 80GB
NVIDIA H100 80GB x 2
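
Because dedicated deployments are priced per minute and per hour, the cost of a run can be estimated from how long it stays up. The following is a minimal sketch, assuming billing is proportional to minutes used and that the hourly rate is 60 times the per-minute rate. The function names and the rate are hypothetical placeholders, not Ektos AI prices or APIs.

```python
from decimal import Decimal

MINUTES_PER_HOUR = 60


def hourly_rate(rate_per_minute: Decimal) -> Decimal:
    """Convert a per-minute rate to a per-hour rate (assumes no hourly discount)."""
    return rate_per_minute * MINUTES_PER_HOUR


def deployment_cost(rate_per_minute: Decimal, minutes: int) -> Decimal:
    """Estimate the cost of keeping one deployment up for `minutes` minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return rate_per_minute * minutes


# Placeholder rate for illustration only; look up the real rate for your GPU type.
HYPOTHETICAL_RATE = Decimal("0.02")

print("per hour:", hourly_rate(HYPOTHETICAL_RATE))
print("90-minute run:", deployment_cost(HYPOTHETICAL_RATE, 90))
```

Using `Decimal` instead of `float` avoids binary rounding drift when the amounts represent currency.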

Inference -- Serverless Models

Access the most popular models instantly, with no cold starts. Pay only for what you use (by tokens, minutes, or steps), ensuring cost efficiency and seamless performance.

Text and Embedding models              Type   $/1M tokens
Text models (0-4B params)              LLM
Text models (4-8B params)              LLM
Text models (8-21B params)             LLM
Text models (21-41B params)            LLM
Text models (41-80B params)            LLM
Embedding models (0-250M params)       TEBD
Embedding models (250-500M params)     TEBD
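
Per-token pricing is quoted per 1M tokens, so the cost of a request scales linearly with the number of tokens it consumes. The sketch below assumes a single price applies to all tokens in a request (whether input and output tokens are priced differently is not stated here). The function name and the price are hypothetical.

```python
from decimal import Decimal

TOKENS_PER_PRICING_UNIT = 1_000_000


def token_cost(price_per_million: Decimal, tokens: int) -> Decimal:
    """Estimate the cost of `tokens` tokens at a given $/1M-token price."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return price_per_million * tokens / TOKENS_PER_PRICING_UNIT


# Placeholder price for illustration only; use the listed price for your model size.
HYPOTHETICAL_PRICE = Decimal("0.10")

print("250k tokens:", token_cost(HYPOTHETICAL_PRICE, 250_000))
```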

Spending Limits

Spending limits restrict how much you can spend on the Ektos AI platform per calendar month.

  • The spending limit is determined by your total historical Ektos AI spend.
  • You can purchase prepaid credits to immediately increase your historical spend.

Note: Credits are counted against your spending limit, so it is possible to hit the spending limit before all of your current credits are depleted.
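
The interaction between credits and the spending limit can be illustrated with a little arithmetic. This is a minimal sketch, assuming the limit applies to total spend in the current calendar month regardless of how it is funded. The function names and all amounts are made up for illustration and do not reflect actual tier values.

```python
from decimal import Decimal

ZERO = Decimal("0")


def limit_headroom(monthly_limit: Decimal, spent_this_month: Decimal) -> Decimal:
    """Amount that can still be spent this month before hitting the limit."""
    return max(monthly_limit - spent_this_month, ZERO)


def unusable_credits(monthly_limit: Decimal, spent_this_month: Decimal,
                     credit_balance: Decimal) -> Decimal:
    """Credits that cannot be spent this month because the limit is reached first."""
    headroom = limit_headroom(monthly_limit, spent_this_month)
    return max(credit_balance - headroom, ZERO)


# Hypothetical account: 50 limit, 40 already spent, 25 in credits remaining.
limit, spent, credits = Decimal("50"), Decimal("40"), Decimal("25")

print("headroom:", limit_headroom(limit, spent))
print("credits blocked until next month:", unusable_credits(limit, spent, credits))
```

In this hypothetical account only 10 more can be spent this month, so 15 of the remaining credits stay unused until the limit resets or the tier changes.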

Tier       Criteria
Tier 1
Tier 2
Tier 3
Tier 4
Custom