
Speech To Text: Audio Transcription and Translation #

The audio API endpoints provide two speech-to-text options: transcriptions and translations. They can be used to:

  • Transcribe audio in whatever language it was recorded in.
  • Translate the audio and transcribe it into English.

File uploads are currently limited to 25 MB, and the following input file types are supported: flac, mp3, mp4, mpeg, mpga, m4a, wav, and webm.
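Because the API enforces these limits, it can save a failed round trip to validate files client-side before uploading. Below is a minimal sketch; the 25 MB cap and extension list come from above, while the helper name `check_audio_file` is ours:

```python
from pathlib import Path

# Limits taken from the docs above: 25 MB maximum, fixed set of input formats.
MAX_BYTES = 25 * 1024 * 1024
SUPPORTED = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}

def check_audio_file(path: str) -> None:
    """Raise ValueError if the file would be rejected by the audio endpoints."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED:
        raise ValueError(f"unsupported file type: {p.suffix!r}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError(f"file exceeds 25 MB: {p.stat().st_size} bytes")
```

Call `check_audio_file("speech.mp3")` before opening the file for upload; it returns silently when the file passes both checks.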

There are multiple ways to use the audio models.

Using the HTTP POST method, the endpoints for these models are:

  • Transcriptions: https://api.ektos.ai/v1/audio/transcriptions
  • Translations: https://api.ektos.ai/v1/audio/translations

The diarise parameter (used to identify each speaker in the audio recording) is not directly supported by the official OpenAI client libraries; to supply it, pass it through the extra_body argument:

transcription = client.audio.transcriptions.create(
    model="whisper-large-v3-turbo",
    language="fr",
    file=audio_file,
    extra_body={"diarise": True}
)
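If you are calling the endpoint directly rather than through a client library, the same parameter can presumably be sent as an ordinary multipart form field. The field name diarise mirrors the extra_body example above, but this direct form-field usage is our assumption, not confirmed elsewhere in the docs:

```shell
curl --request POST \
  --url https://api.ektos.ai/v1/audio/transcriptions \
  --header "Authorization: Bearer $EKTOS_API_KEY" \
  --header "Content-Type: multipart/form-data" \
  --form file="@/path/to/file/audio.mp3" \
  --form model="whisper-large-v3-turbo" \
  --form diarise=true
```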

Select the model you want to use with the model field.

Available models and their corresponding API strings are listed here (type: STT).

A detailed description of the available parameters can be found in the API reference specification.

Below are basic examples using cURL and the official OpenAI client libraries for Python and Node.js.

[transcriptions] cURL #

curl --request POST \
  --url https://api.ektos.ai/v1/audio/transcriptions \
  --header "Authorization: Bearer $EKTOS_API_KEY" \
  --header "Content-Type: multipart/form-data" \
  --form file="@/path/to/file/audio.mp3" \
  --form model="whisper-large-v3-turbo"

[transcriptions] Python: OpenAI client library #

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ektos.ai/v1/",
    api_key="YOUR_EKTOS_API_KEY_HERE"
    # api_key=os.environ.get("EKTOS_API_KEY")
)

audio_file = open("speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
    model="whisper-large-v3-turbo",
    file=audio_file
)
print(transcript.text)

[transcriptions] JavaScript: Node.js OpenAI client library #

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.EKTOS_API_KEY, // reads the key from the environment
  baseURL: "https://api.ektos.ai/v1/",
});

async function main() {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-large-v3-turbo",
  });

  console.log(transcription.text);
}
main();

[translations] cURL #

curl --request POST \
  --url https://api.ektos.ai/v1/audio/translations \
  --header "Authorization: Bearer $EKTOS_API_KEY" \
  --header "Content-Type: multipart/form-data" \
  --form file="@/path/to/file/audio.mp3" \
  --form model="whisper-large-v3-turbo"

[translations] Python: OpenAI client library #

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ektos.ai/v1/",
    api_key="YOUR_EKTOS_API_KEY_HERE"
    # api_key=os.environ.get("EKTOS_API_KEY")
)

audio_file = open("speech.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-large-v3-turbo",
    file=audio_file
)
print(translation.text)

[translations] JavaScript: Node.js OpenAI client library #

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.EKTOS_API_KEY, // reads the key from the environment
  baseURL: "https://api.ektos.ai/v1/",
});

async function main() {
  const translation = await openai.audio.translations.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-large-v3-turbo",
  });

  console.log(translation.text);
}
main();