Speech to Text: Audio Transcription and Translation #
The audio API endpoints provide two speech-to-text options: transcriptions and translations. They can be used to:
- Transcribe audio in whatever language the audio is in.
- Translate the audio into English and transcribe it.
File uploads are currently limited to 25 MB, and the following input file types are supported: flac, mp3, mp4, mpeg, mpga, m4a, wav, and webm.
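If you want to fail fast before uploading, you can check these constraints locally. Below is a minimal sketch in Python; the 25 MB limit and the extension list come from the paragraph above, while the helper name check_audio_file is illustrative and the binary interpretation of "25 MB" is an assumption.

from pathlib import Path

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # documented 25 MB limit (binary MB assumed)
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}

def check_audio_file(path: str) -> None:
    """Raise ValueError if the file violates the documented upload constraints."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {p.suffix or '(none)'}")
    if p.stat().st_size > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {p.stat().st_size} bytes, above the 25 MB limit")

check_audio_file("speech.mp3")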
To use the audio models, you can make requests directly to the HTTP API endpoints (transcriptions and translations) with your favorite tool or programming language, or use the client libraries for the OpenAI API.
API Endpoints #
Requests use the HTTP POST method. The endpoints for the audio models are:
- Dedicated Deployments:
  - Transcriptions: https://{deployment-id}.deployments.api.ektos.ai/v1/audio/transcriptions
  - Translations: https://{deployment-id}.deployments.api.ektos.ai/v1/audio/translations
When making a request to the endpoints, select the model you want to use with the model body parameter.
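For illustration, here is what such a request looks like over raw HTTP using Python's requests library; it mirrors the cURL examples below. The deployment URL, file path, and model string are placeholders, and requests builds the multipart/form-data header (including the boundary) automatically.

import os
import requests

# Placeholder deployment URL; substitute your own {deployment-id}.
url = "https://{deployment-id}.deployments.api.ektos.ai/v1/audio/transcriptions"

with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        url,
        headers={"Authorization": f"Bearer {os.environ['EKTOS_API_KEY']}"},
        files={"file": audio_file},                 # multipart file part
        data={"model": "whisper-large-v3-turbo"},   # model body parameter
    )

response.raise_for_status()
print(response.json()["text"])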
Available models and corresponding API strings #
The diarise parameter (used to identify each speaker in the audio recording) is not directly supported by the official OpenAI client libraries; to supply it, use the extra_body argument, for example:
transcription = client.audio.transcriptions.create(
    model="whisper-large-v3-turbo",
    language="fr",
    file=audio_file,
    extra_body={"diarise": True}  # forwarded in the request body
)
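With the OpenAI Python client, anything passed in extra_body is merged into the outgoing request body alongside the typed parameters, which is how non-standard fields such as diarise reach the API.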
A detailed description of the available parameters can be found in the API reference specification.
Below are basic examples using cURL and the official OpenAI client libraries for Python and Node.js.
[transcriptions] cURL #
curl --request POST \
  --url https://{deployment-id}.deployments.api.ektos.ai/v1/audio/transcriptions \
  --header "Authorization: Bearer $EKTOS_API_KEY" \
  --header "Content-Type: multipart/form-data" \
  --form file="@/path/to/file/audio.mp3" \
  --form model="whisper-large-v3-turbo"
[transcriptions] Python: OpenAI client library #
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://{deployment-id}.deployments.api.ektos.ai/v1/",
    api_key="YOUR_EKTOS_API_KEY_HERE"
    # api_key=os.environ.get("EKTOS_API_KEY")
)

audio_file = open("speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
    model="whisper-large-v3-turbo",
    file=audio_file
)
print(transcript.text)
[transcriptions] JavaScript: Node.js OpenAI client library #
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.EKTOS_API_KEY,
  baseURL: "https://{deployment-id}.deployments.api.ektos.ai/v1/",
});

async function main() {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-large-v3-turbo",
  });

  console.log(transcription.text);
}
main();
[translations] cURL #
curl --request POST \
  --url https://{deployment-id}.deployments.api.ektos.ai/v1/audio/translations \
  --header "Authorization: Bearer $EKTOS_API_KEY" \
  --header "Content-Type: multipart/form-data" \
  --form file="@/path/to/file/audio.mp3" \
  --form model="whisper-large-v3-turbo"
[translations] Python: OpenAI client library #
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://{deployment-id}.deployments.api.ektos.ai/v1/",
    api_key="YOUR_EKTOS_API_KEY_HERE"
    # api_key=os.environ.get("EKTOS_API_KEY")
)

audio_file = open("speech.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-large-v3-turbo",
    file=audio_file
)
print(translation.text)
[translations] JavaScript: Node.js OpenAI client library #
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.EKTOS_API_KEY,
  baseURL: "https://{deployment-id}.deployments.api.ektos.ai/v1/",
});

async function main() {
  const translation = await openai.audio.translations.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-large-v3-turbo",
  });

  console.log(translation.text);
}
main();