Introduction #

Welcome to the Ektos AI documentation!

Ektos AI is a cloud platform for generative AI where you can use the best open-source models without having to think about the underlying compute infrastructure.

We handle the heavy lifting: model deployment, GPU configuration, performance optimization, scaling, and continuous monitoring.

Services #

Inference -- Dedicated Deployments: Deploy any model with a dedicated GPU and API endpoint in seconds. Customize your deployment with options to select quantization, context length, and GPU accelerator tailored to your exact needs. Maximum performance with no rate limits.
Inference -- Serverless: Access the most popular models instantly, with no cold starts. Pay only for what you use (by tokens, minutes, steps) ensuring cost efficiency and seamless performance.

(Additional services will be announced after Early Access)

Real-time usage dashboards, statistics and logs are directly available from our web platform to get an accurate overview and manage costs effectively.

All our API inference endpoints are compatible with the OpenAI API. You can seamlessly use any OpenAI API client library for an immediate migration to our platform, leading to significant cost savings.

The endpoint for the Ektos AI API is: https://api.ektos.ai/v1/
The API Reference specification of the Ektos AI API can be found at: https://ektos.ai/docs/api

Next steps #

Discover the available models for Inference (Dedicated Deployments).
Discover the available models for Inference (Serverless).
Manage dedicated deployments.
Use text models.
Use audio models.
Use embedding models.
Get in touch and interact with our community on our Discord.