Speech to Text: Audio Transcription and Translation #
The audio API endpoints provide two speech-to-text options: transcriptions and translations. They can be used to:
- Transcribe audio in whatever language the audio is in.
- Translate and transcribe the audio into English.
File uploads are currently limited to 25 MB and the following input file types are supported: flac, mp3, mp4, mpeg, mpga, m4a, wav, and webm.
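As a convenience, these limits can be checked client-side before uploading. Below is a minimal Python sketch (the 25 MB cap and the extension list come from the documentation above; the file path is illustrative):

import os

SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB upload limit

def check_audio_file(path):
    # Raise early instead of waiting for the API to reject the upload.
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported input file type: {ext}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError(f"{path} exceeds the 25 MB upload limit")

check_audio_file("audio.mp3")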
There are multiple ways you can use the audio models:
- The interactive web audio playground on our website.
- Directly making requests to the HTTP API endpoints (transcriptions and translations) with your favorite tool or programming language, or through the official OpenAI client libraries.
Using the HTTP POST method, the endpoints for these models are:
- Transcriptions: https://api.ektos.ai/v1/audio/transcriptions
- Translations: https://api.ektos.ai/v1/audio/translations
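If you prefer to call the endpoints directly rather than through a client library, a multipart/form-data POST is all that is needed. Here is a minimal sketch using Python's requests package (the EKTOS_API_KEY environment variable and the audio.mp3 path are assumptions):

import os
import requests

with open("audio.mp3", "rb") as audio_file:
    response = requests.post(
        "https://api.ektos.ai/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {os.environ['EKTOS_API_KEY']}"},
        files={"file": audio_file},  # sent as multipart/form-data
        data={"model": "whisper-large-v3-turbo"},
    )
response.raise_for_status()
print(response.json()["text"])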
The diarise parameter (used to identify each speaker in the audio recording) is not directly supported by the official OpenAI client libraries; to supply it, use the extra_body argument, as in this snippet (client and audio_file are assumed to be set up as in the full examples below):
transcription = client.audio.transcriptions.create(
    model="whisper-large-v3-turbo",
    language="fr",
    file=audio_file,
    extra_body={"diarise": True}
)
Select the model you want to use with the model field. Available models and their corresponding API strings are listed here (type: STT).
A detailed description of the available parameters can be found in the API reference specification.
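For instance, the optional parameters defined by the OpenAI transcription API (language, prompt, response_format, temperature) are passed as regular keyword arguments; whether every value is supported by these endpoints is an assumption to verify against the reference:

transcription = client.audio.transcriptions.create(
    model="whisper-large-v3-turbo",
    file=audio_file,
    language="en",                   # ISO-639-1 hint for the source language
    prompt="Ektos, Whisper",         # optional text to guide style and spelling
    response_format="verbose_json",  # json, text, srt, verbose_json, or vtt
    temperature=0                    # 0 keeps decoding deterministic
)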
Below are basic examples using cURL and the official OpenAI client libraries for Python and Node.js.
[transcriptions] cURL #
curl --request POST \
  --url https://api.ektos.ai/v1/audio/transcriptions \
  --header "Authorization: Bearer $EKTOS_API_KEY" \
  --header "Content-Type: multipart/form-data" \
  --form file="@/path/to/file/audio.mp3" \
  --form model="whisper-large-v3-turbo"
[transcriptions] Python: OpenAI client library #
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ektos.ai/v1/",
    api_key="YOUR_EKTOS_API_KEY_HERE"
    # api_key=os.environ.get("EKTOS_API_KEY")
)

audio_file = open("speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
    model="whisper-large-v3-turbo",
    file=audio_file
)
print(transcript.text)
[transcriptions] JavaScript: Node.js OpenAI client library #
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.EKTOS_API_KEY,
  baseURL: "https://api.ektos.ai/v1/",
});

async function main() {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-large-v3-turbo",
  });

  console.log(transcription.text);
}
main();
[translations] cURL #
curl --request POST \
  --url https://api.ektos.ai/v1/audio/translations \
  --header "Authorization: Bearer $EKTOS_API_KEY" \
  --header "Content-Type: multipart/form-data" \
  --form file="@/path/to/file/audio.mp3" \
  --form model="whisper-large-v3-turbo"
[translations] Python: OpenAI client library #
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ektos.ai/v1/",
    api_key="YOUR_EKTOS_API_KEY_HERE"
    # api_key=os.environ.get("EKTOS_API_KEY")
)

audio_file = open("speech.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-large-v3-turbo",
    file=audio_file
)
print(translation.text)
[translations] JavaScript: Node.js OpenAI client library #
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.EKTOS_API_KEY,
  baseURL: "https://api.ektos.ai/v1/",
});

async function main() {
  const translation = await openai.audio.translations.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-large-v3-turbo",
  });

  console.log(translation.text);
}
main();