
Speech to Text: Audio Transcription and Translation #

The audio API endpoints provide two speech-to-text options: transcriptions and translations. They can be used to:

  • Transcribe audio in whatever language it was recorded in.
  • Translate the audio into English and transcribe it.

File uploads are currently limited to 25 MB, and the following input file types are supported: flac, mp3, mp4, mpeg, mpga, m4a, wav, and webm.
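As a quick client-side sanity check before uploading, you can validate a file against these limits. A minimal sketch (the 25 MB cap and the extension list are taken from the text above; the helper name is hypothetical):

```python
import os

# Limits from the docs above: 25 MB max upload, fixed set of input formats.
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB

def is_uploadable(path: str) -> bool:
    """Hypothetical helper: check extension and size before uploading."""
    ext = os.path.splitext(path)[1].lower()
    return ext in SUPPORTED_EXTENSIONS and os.path.getsize(path) <= MAX_UPLOAD_BYTES
```

Rejecting oversized or unsupported files locally avoids a round trip that the API would refuse anyway.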

To use the audio models, you can make requests directly to the HTTP API endpoints (transcriptions and translations) with your favorite tool or programming language, or use the client libraries for the OpenAI API.

API Endpoints #

Using the HTTP POST method, the endpoints for the audio models are:

  • Dedicated Deployments:
    • Transcriptions: https://{deployment-id}.deployments.api.ektos.ai/v1/audio/transcriptions
    • Translations: https://{deployment-id}.deployments.api.ektos.ai/v1/audio/translations

When making a request to the endpoints, select the model you want to use with the model body parameter.

Available models and corresponding API strings #

The diarise parameter (used to identify the individual speakers in an audio recording) is not directly supported by the official OpenAI client libraries. To supply it, use the extra_body argument:

transcription = client.audio.transcriptions.create(
    model="whisper-large-v3-turbo",
    language="fr",
    file=audio_file,
    extra_body={"diarise": True}
)
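When calling the HTTP endpoint directly rather than through a client library, no workaround is needed: diarise travels as an ordinary multipart form field alongside model and file. A standard-library sketch of building such a request body (the helper below is illustrative, not part of any SDK; that the server accepts diarise this way is assumed from the text above):

```python
import mimetypes
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
        )
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    )
    body = "".join(parts).encode() + file_bytes + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

# The diarise flag is just another text field in the form:
body, content_type = build_multipart(
    {"model": "whisper-large-v3-turbo", "diarise": "true"},
    "file", "audio.mp3", b"...audio bytes..."
)
```

The resulting body and content_type can then be sent to the transcriptions URL with any HTTP client (for example urllib.request), along with the usual Authorization header.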

A detailed description of the available parameters can be found in the API reference specification.

Below are basic examples using cURL and the official OpenAI client libraries for Python and Node.js.

[transcriptions] cURL #

curl --request POST \
  --url https://{deployment-id}.deployments.api.ektos.ai/v1/audio/transcriptions \
  --header "Authorization: Bearer $EKTOS_API_KEY" \
  --header "Content-Type: multipart/form-data" \
  --form file="@/path/to/file/audio.mp3" \
  --form model="whisper-large-v3-turbo"

[transcriptions] Python: OpenAI client library #

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://{deployment-id}.deployments.api.ektos.ai/v1/",
    api_key="YOUR_EKTOS_API_KEY_HERE"
    # api_key=os.environ.get("EKTOS_API_KEY")
)

with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=audio_file
    )
print(transcript.text)

[transcriptions] JavaScript: Node.js OpenAI client library #

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.EKTOS_API_KEY,
  baseURL: "https://{deployment-id}.deployments.api.ektos.ai/v1/",
});

async function main() {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-large-v3-turbo",
  });

  console.log(transcription.text);
}
main();

[translations] cURL #

curl --request POST \
  --url https://{deployment-id}.deployments.api.ektos.ai/v1/audio/translations \
  --header "Authorization: Bearer $EKTOS_API_KEY" \
  --header "Content-Type: multipart/form-data" \
  --form file="@/path/to/file/audio.mp3" \
  --form model="whisper-large-v3-turbo"

[translations] Python: OpenAI client library #

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://{deployment-id}.deployments.api.ektos.ai/v1/",
    api_key="YOUR_EKTOS_API_KEY_HERE"
    # api_key=os.environ.get("EKTOS_API_KEY")
)

with open("speech.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3-turbo",
        file=audio_file
    )
print(translation.text)

[translations] JavaScript: Node.js OpenAI client library #

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.EKTOS_API_KEY,
  baseURL: "https://{deployment-id}.deployments.api.ektos.ai/v1/",
});

async function main() {
  const translation = await openai.audio.translations.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-large-v3-turbo",
  });

  console.log(translation.text);
}
main();