Your users want voice. They want to listen while commuting, hear responses while cooking, and interact hands-free while multitasking. But adding text-to-speech to your AI application means wrestling with audio encoding, managing voice configurations, and handling streaming audio buffers.
Voice integration should take minutes, not weeks. NeuroLink makes it happen. Pass a single tts option to your existing generate() call and receive both text and audio in one response. No separate API calls. No audio processing libraries. No voice configuration headaches.
This guide walks you through complete TTS integration with NeuroLink. You will learn voice selection, streaming audio, multi-speaker podcasts, and voice assistant patterns.
TL;DR
- One API call produces text + audio output
- Google Cloud TTS with Studio, Neural2, WaveNet, and Standard voices
- Real-time streaming audio for immediate playback
- Multi-speaker podcast generation
- 40+ languages supported
Why Voice Matters for AI Apps
Voice transforms how users interact with AI. Reading text requires attention and focus. Listening frees users to do other things.
The Accessibility Advantage
Voice output makes your application accessible to users with visual impairments. Natural AI-generated speech provides better context and nuance than screen readers. Voice also helps users with reading difficulties or those who prefer audio content.
The Engagement Difference
Voice creates emotional connection. A well-chosen voice with appropriate pacing builds trust and personality. Users remember voice interactions more vividly than text exchanges.
What NeuroLink TTS Provides
- Unified API - Same generate() call produces text and audio
- Google Cloud Voices - Access to Studio, Neural2, WaveNet, and Standard voices
- Streaming Support - Real-time audio chunks for immediate playback
- Format Options - MP3, WAV (LINEAR16), and OGG Opus output
- Voice Control - Speaking rate, pitch, and volume adjustment
Quick Start: Your First TTS Request
Getting started takes five minutes. You need Google Cloud credentials and the NeuroLink package.
Step 1: Configure Google Cloud TTS
Enable the Cloud Text-to-Speech API in your Google Cloud Console. Create a service account and download the credentials JSON file:
# Required - Path to Google Cloud credentials
export GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json
# For LLM provider (any supported provider)
export OPENAI_API_KEY=sk-...
# or
export ANTHROPIC_API_KEY=sk-ant-...
Step 2: Generate Your First Audio Response
Install the NeuroLink package, then run the example below:
pnpm add @juspay/neurolink
# or
npm install @juspay/neurolink
import { NeuroLink } from "@juspay/neurolink";
import fs from "fs";
async function main() {
const ai = new NeuroLink();
// Generate AI response with TTS audio output
const result = await ai.generate({
input: {
text: "Write a friendly welcome message for new users",
systemPrompt: "You are a helpful assistant with a warm tone",
},
tts: {
enabled: true,
provider: "google-tts",
voice: "en-US-Studio-M",
outputFormat: "mp3",
},
});
// Save the audio file
if (result.audio?.buffer) {
fs.writeFileSync("welcome.mp3", result.audio.buffer);
console.log("Audio saved to welcome.mp3");
}
console.log("\nText Response:", result.content);
}
main().catch(console.error);
That's it. One generate() call produces both text and audio. The TTS option integrates seamlessly with any LLM provider.
CLI equivalent:
npx @juspay/neurolink generate "Write a welcome message" \
--tts \
--tts-voice "en-US-Studio-M" \
--output welcome.mp3
Voice Selection Guide
Google TTS offers four voice tiers with different quality levels and pricing.
Voice Quality Tiers
| Voice Type | Quality | Use Case | Cost per 1M chars |
|---|---|---|---|
| Studio | Premium | Production apps, customer-facing | ~$160 |
| Neural2 | High | Standard production apps | ~$16 |
| WaveNet | High | Natural-sounding speech | ~$16 |
| Standard | Good | Development, testing | ~$4 |
Voice Selection Recommendations
| Scenario | Recommended Voice | Rationale |
|---|---|---|
| Development/Testing | en-US-Standard-A | Low cost, fast iteration |
| Internal Tools | en-US-Neural2-A | Good quality, reasonable cost |
| Customer-Facing Apps | en-US-Studio-M | Premium quality, professional |
| Podcasts/Content | en-US-Studio-O | Broadcast quality |
| High-Volume Processing | en-US-Standard-* | Cost-effective at scale |
Discovering Available Voices
import { NeuroLink } from "@juspay/neurolink";
async function listVoices() {
const ai = new NeuroLink();
const voices = await ai.tts.getVoices();
console.log(`Total voices available: ${voices.length}`);
// Filter by language
const englishVoices = voices.filter((v) => v.language.startsWith("en"));
console.log(`English voices: ${englishVoices.length}`);
englishVoices.slice(0, 10).forEach((voice) => {
console.log(
` ${voice.name} - ${voice.gender} - ${voice.language} (${voice.type})`
);
});
}
listVoices().catch(console.error);
CLI equivalent:
npx @juspay/neurolink tts voices --provider google-tts --language en-US
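If you want to pick a voice programmatically, you can combine the two calls above. A minimal sketch, assuming the voice objects expose the name, language, and type fields shown earlier (the exact type labels, such as "Studio", are an assumption; check the values your output prints):
import { NeuroLink } from "@juspay/neurolink";
async function speakWithStudioVoice(text: string) {
const ai = new NeuroLink();
// List voices and pick an English Studio-tier voice if one exists
// ("Studio" as a type value is an assumption; verify against voice.type in your output)
const voices = await ai.tts.getVoices();
const studio = voices.find(
(v) => v.language.startsWith("en") && v.type === "Studio"
);
return ai.generate({
input: { text },
tts: {
enabled: true,
provider: "google-tts",
voice: studio ? studio.name : "en-US-Studio-M", // fall back to a known voice
outputFormat: "mp3",
},
});
}
speakWithStudioVoice("Thanks for joining us today!").catch(console.error);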
Streaming Audio
Real-time audio streaming enables immediate playback. Users hear the response as it generates instead of waiting for completion.
import { NeuroLink } from "@juspay/neurolink";
async function streamWithAudio() {
const ai = new NeuroLink();
const stream = await ai.stream({
input: { text: "Explain the history of artificial intelligence" },
tts: {
enabled: true,
streaming: true,
voice: "en-US-Neural2-A",
},
});
let textContent = "";
let audioChunks = 0;
for await (const chunk of stream) {
if (chunk.content) {
process.stdout.write(chunk.content);
textContent += chunk.content;
}
if (chunk.audio) {
audioChunks++;
}
}
console.log(`\nTotal characters: ${textContent.length}`);
console.log(`Audio chunks received: ${audioChunks}`);
}
streamWithAudio().catch(console.error);
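To keep the audio as it streams, write each chunk to a file stream instead of only counting them. A minimal sketch, assuming each chunk.audio carries the raw audio bytes as a Buffer (the exact chunk shape may vary):
import { NeuroLink } from "@juspay/neurolink";
import fs from "fs";
async function streamToFile() {
const ai = new NeuroLink();
const out = fs.createWriteStream("answer.mp3");
const stream = await ai.stream({
input: { text: "Explain the history of artificial intelligence" },
tts: { enabled: true, streaming: true, voice: "en-US-Neural2-A" },
});
for await (const chunk of stream) {
if (chunk.content) process.stdout.write(chunk.content);
// Append audio bytes as they arrive (assumes chunk.audio is a Buffer)
if (chunk.audio) out.write(chunk.audio);
}
out.end();
console.log("\nStreamed audio saved to answer.mp3");
}
streamToFile().catch(console.error);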
Streaming Benefits
- Reduced Latency - Users hear audio within seconds, not after full generation
- Memory Efficiency - Process chunks instead of buffering entire responses
- Progressive Enhancement - Degrade gracefully if audio playback fails
- Real-time Feedback - Users know the system is working
Podcast Generation Pipeline
Generate multi-speaker podcast episodes with different voices for each speaker:
import { NeuroLink } from "@juspay/neurolink";
import fs from "fs";
interface PodcastSection {
speaker: "host" | "guest";
text: string;
}
async function generatePodcastEpisode(script: PodcastSection[]) {
const ai = new NeuroLink();
const audioSegments: Buffer[] = [];
for (let i = 0; i < script.length; i++) {
const section = script[i];
console.log(`Processing section ${i + 1}/${script.length} (${section.speaker})...`);
const result = await ai.generate({
input: {
text: section.text,
systemPrompt: `Speak naturally as a ${section.speaker}`,
},
tts: {
enabled: true,
voice:
section.speaker === "host"
? "en-US-Studio-M" // Male host voice
: "en-US-Studio-O", // Female guest voice
speakingRate: 0.95,
},
});
if (result.audio?.buffer) {
audioSegments.push(result.audio.buffer);
}
}
return Buffer.concat(audioSegments);
}
This pattern works for any multi-speaker content: interviews, dialogues, audiobooks, or educational content.
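Here is one way to drive the pipeline with a short illustrative script and save the result. Note that Buffer.concat produces a simple back-to-back join of the MP3 segments; for polished episodes you may want an audio tool for crossfades and normalization:
const script: PodcastSection[] = [
{ speaker: "host", text: "Welcome back! Today we're talking about voice AI." },
{ speaker: "guest", text: "Thanks for having me. Voice changes how people use apps." },
{ speaker: "host", text: "Let's dig into why that is." },
];
generatePodcastEpisode(script)
.then((episode) => {
fs.writeFileSync("episode.mp3", episode); // fs is imported alongside NeuroLink above
console.log(`Episode saved (${episode.length} bytes)`);
})
.catch(console.error);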
Voice Assistant Integration
Build conversational voice assistants with memory and context:
import { NeuroLink } from "@juspay/neurolink";
async function voiceAssistant(userQuery: string) {
const ai = new NeuroLink({
conversationMemory: { enabled: true },
});
const result = await ai.generate({
input: { text: userQuery },
tts: {
enabled: true,
voice: "en-US-Neural2-A",
},
});
return {
text: result.content,
audio: result.audio?.buffer,
conversationId: result.conversationId,
};
}
The voice assistant maintains conversation context across turns. Each response includes both text and audio.
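A minimal usage sketch: ask a question, log the text, and save the spoken reply. The returned conversationId is what you would carry into follow-up turns (exact memory wiring may vary with your setup):
import fs from "fs";
async function demo() {
const reply = await voiceAssistant("What can you help me with?");
console.log("Assistant:", reply.text);
console.log("Conversation:", reply.conversationId);
// Save the spoken reply so a client can play it back
if (reply.audio) {
fs.writeFileSync("reply.mp3", reply.audio);
}
}
demo().catch(console.error);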
CLI Workflows
The NeuroLink CLI provides quick access to TTS features for testing and prototyping.
# Basic TTS generation
npx @juspay/neurolink generate "Welcome to our platform!" \
--tts --tts-voice "en-US-Studio-M" --output welcome.mp3
# Stream with voice
npx @juspay/neurolink stream "Tell me a bedtime story" \
--tts --tts-voice "en-US-Studio-O"
# List available voices
npx @juspay/neurolink tts voices
# Test a specific voice
npx @juspay/neurolink tts test "Hello, this is a voice test" \
--voice "en-US-Studio-M" --output test.mp3
Audio Quality Settings
Fine-tune audio output with configuration options:
const ttsConfig = {
tts: {
enabled: true,
provider: "google-tts",
voice: "en-US-Studio-M",
audioEncoding: "MP3", // Options: MP3, LINEAR16, OGG_OPUS
speakingRate: 1.0, // Range: 0.25 to 4.0
pitch: 0.0, // Range: -20.0 to 20.0
volumeGainDb: 0.0 // Range: -96.0 to 16.0
}
};
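Because ttsConfig is just an object with a tts key, you can spread it into any generate() call. A small sketch, assuming a NeuroLink instance as in the earlier examples:
import { NeuroLink } from "@juspay/neurolink";
const ai = new NeuroLink();
const result = await ai.generate({
input: { text: "Read this announcement in a calm, measured tone" },
...ttsConfig, // spreads the tts settings defined above into the request
});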
Audio Format Comparison
| Format | Use Case | File Size | Quality |
|---|---|---|---|
| MP3 | General use, web apps | Small | Good |
| LINEAR16 | Professional audio, editing | Large | Lossless |
| OGG_OPUS | Low-latency streaming | Small | Excellent |
Speaking Rate Guidelines
| Rate | Effect | Best For |
|---|---|---|
| 0.75 | Slow, deliberate | Accessibility, complex content |
| 1.0 | Normal speed | General use |
| 1.15 | Slightly faster | Notifications, quick updates |
| 1.5 | Fast | Speed listeners, time-sensitive |
Summary
Voice transforms AI applications from tools into companions. NeuroLink makes this transformation effortless.
You learned how to:
- Generate audio output with a single tts option in generate()
- Select the right voice tier for your use case and budget
- Stream audio chunks for real-time playback
- Build multi-speaker podcasts with distinct voices
- Create conversational voice assistants with memory
- Use CLI workflows for rapid TTS prototyping
- Fine-tune audio quality with encoding and modulation settings
Stop building separate audio pipelines. Start shipping voice features.
Found this helpful? Drop a comment below with your questions!
Want to try NeuroLink?
- GitHub: github.com/juspay/neurolink
- Star the repo if you find it useful!
Follow us for more AI development content:
- Dev.to: @neurolink
- Twitter: @Neurolink__