
Speech to Text: Audio Transcription and Translation #

The audio API endpoints provide two speech-to-text options: transcriptions and translations. They can be used to:

  • Transcribe audio in whatever language it was recorded in.
  • Translate the audio into English and transcribe it.

File uploads are currently limited to 25 MB, and the following input file types are supported: flac, mp3, mp4, mpeg, mpga, m4a, wav, and webm.
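As a quick client-side sanity check before uploading, you can validate a file against these limits. A minimal sketch (the 25 MB cap and the extension list are taken from the text above; the helper name is hypothetical):

```python
import os

# Limits from the docs above: 25 MB max upload, fixed set of input formats.
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB

def is_uploadable(path: str) -> bool:
    """Hypothetical helper: check extension and size before uploading."""
    ext = os.path.splitext(path)[1].lower()
    return ext in SUPPORTED_EXTENSIONS and os.path.getsize(path) <= MAX_UPLOAD_BYTES
```

Rejecting oversized or unsupported files locally avoids a round trip that the API would refuse anyway.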

To use the audio models, you can make requests directly to the HTTP API endpoints (transcriptions and translations) with your favorite tool or programming language, or use the client libraries for the OpenAI API.

API Endpoints #

Using the HTTP POST method, the endpoints for the audio models are:

  • Dedicated Deployments:
    • Transcriptions: https://{deployment-id}.deployments.api.ektos.ai/v1/audio/transcriptions
    • Translations: https://{deployment-id}.deployments.api.ektos.ai/v1/audio/translations

When making a request to the endpoints, select the model you want to use with the model body parameter.

Available models and corresponding API strings #

The diarise parameter (used to identify the individual speakers in an audio recording) is not directly supported by the official OpenAI client libraries. To supply it, use the extra_body argument:

transcription = client.audio.transcriptions.create(
    model="whisper-large-v3-turbo",
    language="fr",
    file=audio_file,
    extra_body={"diarise": True}
)
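When calling the HTTP endpoint directly rather than through a client library, no workaround is needed: diarise travels as an ordinary multipart form field alongside model and file. A standard-library sketch of building such a request body (the helper below is illustrative, not part of any SDK; that the server accepts diarise this way is assumed from the text above):

```python
import mimetypes
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
        )
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    )
    body = "".join(parts).encode() + file_bytes + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

# The diarise flag is just another text field in the form:
body, content_type = build_multipart(
    {"model": "whisper-large-v3-turbo", "diarise": "true"},
    "file", "audio.mp3", b"...audio bytes..."
)
```

The resulting body and content_type can then be sent to the transcriptions URL with any HTTP client (for example urllib.request), along with the usual Authorization header.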

A detailed description of the available parameters can be found in the API reference specification.

Below are basic examples using cURL and the official OpenAI client libraries for Python and Node.js.

[transcriptions] cURL #

curl --request POST \
  --url https://{deployment-id}.deployments.api.ektos.ai/v1/audio/transcriptions \
  --header "Authorization: Bearer $EKTOS_API_KEY" \
  --header "Content-Type: multipart/form-data" \
  --form file="@/path/to/file/audio.mp3" \
  --form model="whisper-large-v3-turbo"

[transcriptions] Python: OpenAI client library #

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://{deployment-id}.deployments.api.ektos.ai/v1/",
    api_key="YOUR_EKTOS_API_KEY_HERE"
    # api_key=os.environ.get("EKTOS_API_KEY")
)

with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=audio_file
    )
print(transcript.text)

[transcriptions] JavaScript: Node.js OpenAI client library #

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.EKTOS_API_KEY,
  baseURL: "https://{deployment-id}.deployments.api.ektos.ai/v1/",
});

async function main() {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-large-v3-turbo",
  });

  console.log(transcription.text);
}
main();

[translations] cURL #

curl --request POST \
  --url https://{deployment-id}.deployments.api.ektos.ai/v1/audio/translations \
  --header "Authorization: Bearer $EKTOS_API_KEY" \
  --header "Content-Type: multipart/form-data" \
  --form file="@/path/to/file/audio.mp3" \
  --form model="whisper-large-v3-turbo"

[translations] Python: OpenAI client library #

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://{deployment-id}.deployments.api.ektos.ai/v1/",
    api_key="YOUR_EKTOS_API_KEY_HERE"
    # api_key=os.environ.get("EKTOS_API_KEY")
)

with open("speech.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3-turbo",
        file=audio_file
    )
print(translation.text)

[translations] JavaScript: Node.js OpenAI client library #

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.EKTOS_API_KEY,
  baseURL: "https://{deployment-id}.deployments.api.ektos.ai/v1/",
});

async function main() {
  const translation = await openai.audio.translations.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-large-v3-turbo",
  });

  console.log(translation.text);
}
main();