Skip to content

Buddy Documentation

Audio Processing

is-leeroy-jenkins/Buddy

Audio Processing¶

Audio processing mode supports speech-related workflows such as text-to-speech, transcription, and translation where provider support is available.

🧭 Purpose¶

Audio processing lets the user convert text to speech, transcribe uploaded audio, or translate spoken content while retaining provider-specific configuration in the application interface. It supports review, accessibility, briefing preparation, and multimodal analysis workflows.

🧱 Workflow Position¶

Audio mode is a provider-backed multimodal workflow. It collects model, voice, language, file, timing, and playback settings, then routes the request through the selected provider capability.

Workflow	Description
Text-to-Speech	Converts text into spoken audio.
Transcription	Converts speech audio into text.
Translation	Converts speech from one language into translated text where supported.
Playback	Supports user review of generated or uploaded audio.
Export	Saves generated output when the provider and UI path support it.

🧪 Example¶

Task:
Convert a short budget briefing paragraph into spoken audio for review.

Recommended setup:
- Mode: Audio
- Task: Text-to-Speech
- Voice: Select a supported provider voice
- Input: Final briefing paragraph
- Review: Play the output before reuse

✅ Recommended Sequence¶

Select the provider and audio-capable model.
Select the audio task.
Provide text or upload an audio file.
Set voice, language, timing, or playback options.
Run the audio operation.
Review generated text or audio.
Export only after confirming the output is accurate.

⚠️ Review Checks¶

For transcription or translation:

Verify proper nouns, acronyms, and agency names.
Review numbers, fiscal years, and dollar amounts.
Confirm that punctuation and paragraph breaks are usable.
Avoid relying on raw transcripts for final policy language without review.

Use the app, gpt, gemini, and grok API documentation for provider-specific audio options.