Inference -- Dedicated Deployments

Deploy any model on a dedicated GPU with its own API endpoint in seconds. Customize each deployment by selecting the quantization, context length, and GPU accelerator that fit your needs. Get maximum performance with no rate limits.

GPU Type	$/minute	$/hour
NVIDIA L4 24GB	$0.0183	$1.10
NVIDIA L40S 48GB	$0.0325	$1.95
NVIDIA H100 80GB	$0.0658	$3.95
NVIDIA H100 80GB x 2	$0.1317	$7.90
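
As a rough illustration of how the rates above translate into a bill, the following Python sketch estimates the cost of running a dedicated deployment for a given number of minutes. The function name `estimate_dedicated_cost` is hypothetical, and the sketch assumes simple linear per-minute billing derived from the hourly rate; the actual billing granularity and rounding are not stated here.

```python
from decimal import Decimal

# Hourly rates in USD, copied from the table above.
HOURLY_RATE_USD = {
    "NVIDIA L4 24GB": Decimal("1.10"),
    "NVIDIA L40S 48GB": Decimal("1.95"),
    "NVIDIA H100 80GB": Decimal("3.95"),
    "NVIDIA H100 80GB x 2": Decimal("7.90"),
}


def estimate_dedicated_cost(gpu_type: str, minutes: int) -> Decimal:
    """Estimate the USD cost of running `gpu_type` for `minutes` minutes.

    Assumption: cost scales linearly with runtime and is rounded to cents.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    cost = HOURLY_RATE_USD[gpu_type] * minutes / Decimal(60)
    return cost.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 90 minutes on a single L4.
    print(estimate_dedicated_cost("NVIDIA L4 24GB", 90))
```

Decimal arithmetic is used instead of floats so that small per-minute amounts do not accumulate binary rounding error.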

Inference -- Serverless Models

Access the most popular models instantly, with no cold starts. Pay only for what you use (billed by tokens, minutes, or steps, depending on the model), which keeps costs proportional to usage.

Text and Embedding models	$/1M tokens
Text models (0-4B params) LLM	$0.08
Text models (4-8B params) LLM	$0.15
Text models (8-21B params) LLM	$0.25
Text models (21-41B params) LLM	$0.70
Text models (41-80B params) LLM	$0.90
Embedding models (0-250M params) TEBD	$0.008
Embedding models (250-500M params) TEBD	$0.016
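
To make the per-token pricing concrete, here is a minimal Python sketch that converts a token count into an estimated charge. The tier keys (such as `text-8-21B`) and the function name `estimate_token_cost` are hypothetical labels chosen for this example, not platform identifiers. The sketch also assumes that every token is billed at the same listed rate; whether input and output tokens are priced differently is not stated here.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the table above.
# The dictionary keys are illustrative labels, not official model names.
PRICE_PER_MILLION_TOKENS_USD = {
    "text-0-4B": Decimal("0.08"),
    "text-4-8B": Decimal("0.15"),
    "text-8-21B": Decimal("0.25"),
    "text-21-41B": Decimal("0.70"),
    "text-41-80B": Decimal("0.90"),
    "embeddings-0-250M": Decimal("0.008"),
    "embeddings-250-500M": Decimal("0.016"),
}


def estimate_token_cost(tier: str, tokens: int) -> Decimal:
    """Estimate the USD cost of processing `tokens` tokens in a pricing tier."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return PRICE_PER_MILLION_TOKENS_USD[tier] * tokens / Decimal(1_000_000)


if __name__ == "__main__":
    # 2.5M tokens on a text model in the 8-21B parameter range.
    print(estimate_token_cost("text-8-21B", 2_500_000))
```

The tier is passed in explicitly rather than derived from a parameter count, because the table does not say which tier applies to a model that sits exactly on a boundary such as 4B or 8B.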

Spending Limits

Spending limits restrict how much you can spend on the Ektos AI platform per calendar month.

The spending limit is determined by your total historical Ektos AI spend.
You can purchase prepaid credits to immediately increase your historical spend.

Note: Credits are counted against your spending limit, so it is possible to hit the spending limit before all of your current credits are depleted.

Tier	Spending Limit ($/month)	Criteria
Tier 1	$50	Default with valid payment method added
Tier 2	$500	Total historical spend of $100+
Tier 3	$5,000	Total historical spend of $1,000+
Tier 4	$50,000	Total historical spend of $10,000+
Custom	Custom	Contact sales@ektos.ai
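
The tier rules above can be sketched in Python as a simple threshold lookup. The function names `tier_for` and `remaining_this_month` are hypothetical. The sketch assumes a valid payment method is on file, that usage paid for with credits counts toward the month's spend (as the note above describes), and it ignores Custom limits; when the platform re-evaluates a tier is not stated here.

```python
# (minimum total historical spend in USD, tier name, monthly limit in USD),
# ordered from highest tier to lowest so the first match wins.
TIERS = [
    (10_000, "Tier 4", 50_000),
    (1_000, "Tier 3", 5_000),
    (100, "Tier 2", 500),
    (0, "Tier 1", 50),
]


def tier_for(historical_spend_usd: float) -> tuple[str, int]:
    """Return (tier name, monthly spending limit) for a total historical spend."""
    for minimum, name, limit in TIERS:
        if historical_spend_usd >= minimum:
            return name, limit
    raise ValueError("historical spend cannot be negative")


def remaining_this_month(historical_spend_usd: float, spent_this_month_usd: float) -> float:
    """Return how much more can be spent this calendar month, never below zero.

    `spent_this_month_usd` should include usage that was paid for with credits.
    """
    _, limit = tier_for(historical_spend_usd)
    return max(limit - spent_this_month_usd, 0)


if __name__ == "__main__":
    print(tier_for(700))                   # account with $700 total historical spend
    print(remaining_this_month(700, 450))  # same account after $450 of usage this month
```

Under these assumptions, an account in Tier 2 that holds $600 in credits would reach its $500 monthly limit with $100 of credits still unused, which is the situation the note above warns about.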