Text-to-Speech Providers¶
A TTS provider implements IJarvisTextToSpeechProvider to speak text aloud through the node's audio output. The node calls the provider after receiving a response from the Command Center.
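To make the calling convention concrete, here is a minimal sketch of how a node might dispatch a Command Center response to its provider. The `handle_command_center_response` function and the `is_reply` field are illustrative assumptions, not the actual node API:

```python
def handle_command_center_response(provider, response):
    """Hypothetical dispatch: hand the response text to the TTS provider.

    `response` is assumed to be a dict with a "text" field and an optional
    "is_reply" flag (replies to voice commands get a chime, proactive
    speech does not -- see "Chime Behavior" below).
    """
    provider.speak(
        include_chime=response.get("is_reply", False),
        text=response["text"],
    )
```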
Interface Reference¶
```python
from abc import ABC, abstractmethod


class IJarvisTextToSpeechProvider(ABC):
    @property
    @abstractmethod
    def provider_name(self) -> str:
        """Unique name for this provider. Example: 'jarvis_tts'."""
        ...

    @abstractmethod
    def speak(self, include_chime: bool, text: str) -> None:
        """Speak the given text aloud.

        Args:
            include_chime: If True, play a chime sound before speaking.
            text: The text to synthesize and play.
        """
        ...

    def play_chime(self) -> None:
        """Play the built-in chime sound (sounds/chime.wav).

        This is a helper method available to all providers. Call it from
        your speak() implementation when include_chime is True.
        """
        ...
```
The play_chime() method is a built-in helper that plays sounds/chime.wav through the system audio. You do not need to implement chime logic yourself --- just call self.play_chime() when include_chime is True.
Built-in Implementations¶
JarvisTTS¶
The primary TTS provider. Proxies text through the Command Center's media endpoint, which forwards it to the jarvis-tts service (Piper TTS). Supports streaming PCM audio for low latency.
```python
class JarvisTTS(IJarvisTextToSpeechProvider):
    provider_name = "jarvis_tts"

    def speak(self, include_chime: bool, text: str) -> None:
        if include_chime:
            self.play_chime()
        # Request streaming PCM audio from the TTS service
        response = self.jcc_client.post(
            "/api/v0/media/tts/synthesize",
            json={"text": text},
            stream=True,
        )
        # Play PCM audio chunks as they arrive (low latency)
        for chunk in response.iter_bytes():
            self.audio_player.play_pcm(chunk)
```
Streaming: JarvisTTS uses streaming PCM to minimize time-to-first-audio. The TTS service begins sending audio data before the entire text is synthesized, so the user hears the beginning of the response while the rest is still being generated.
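The latency benefit of chunked playback can be illustrated with a toy model. Everything here is a stand-in (a fake synthesizer that sleeps per chunk), not the real TTS service or audio player:

```python
import time


def fake_synthesize(text, chunk_size=4):
    """Stand-in for the TTS service: yields PCM 'chunks' as they are produced."""
    for i in range(0, len(text), chunk_size):
        time.sleep(0.01)  # pretend each chunk takes time to synthesize
        yield text[i:i + chunk_size].encode()


def time_to_first_audio_streaming(text):
    """Streaming playback: audio can start as soon as the first chunk arrives."""
    start = time.monotonic()
    for _chunk in fake_synthesize(text):
        return time.monotonic() - start  # first chunk received; playback begins


def time_to_first_audio_buffered(text):
    """Buffered playback: nothing plays until the whole synthesis finishes."""
    start = time.monotonic()
    b"".join(fake_synthesize(text))
    return time.monotonic() - start


text = "hello world, this is a fairly long response"
streaming = time_to_first_audio_streaming(text)
buffered = time_to_first_audio_buffered(text)
# streaming waits ~1 chunk of synthesis time; buffered waits for all chunks
```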
EspeakTTS¶
A local fallback provider that uses the espeak command-line tool. No network connection required --- useful for offline operation or as a lightweight alternative.
```python
import os
import subprocess
import tempfile


class EspeakTTS(IJarvisTextToSpeechProvider):
    provider_name = "espeak"

    def speak(self, include_chime: bool, text: str) -> None:
        if include_chime:
            self.play_chime()
        # Generate WAV to a temp file using espeak
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            tmp_path = tmp.name
        subprocess.run(
            ["espeak", "-w", tmp_path, text],
            check=True,
        )
        # Play the generated audio file, then clean up the temp file
        try:
            self.audio_player.play_file(tmp_path)
        finally:
            os.unlink(tmp_path)
```
Trade-offs: espeak produces robotic-sounding speech but works offline, requires no API keys, and has near-zero latency. It is a good fallback when the Jarvis TTS service is unavailable.
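The fallback pattern described above could be packaged as a small wrapper provider that tries the networked provider first and drops to the local one on failure. This is a sketch, not part of the codebase; the class name, constructor wiring, and blanket `except Exception` are all assumptions:

```python
class FallbackTTS:
    """Hypothetical wrapper: try the primary provider, fall back to a local one."""

    provider_name = "fallback"

    def __init__(self, primary, fallback):
        self._primary = primary    # e.g. JarvisTTS (networked)
        self._fallback = fallback  # e.g. EspeakTTS (offline)

    def speak(self, include_chime, text):
        try:
            self._primary.speak(include_chime, text)
        except Exception:
            # Primary unavailable (e.g. network down): use the offline provider
            self._fallback.speak(include_chime, text)
```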
Writing a Custom Provider¶
Here is an example that uses macOS system voices via the say command:
```python
import subprocess

from tts_providers.base import IJarvisTextToSpeechProvider


class MacOSSayProvider(IJarvisTextToSpeechProvider):
    @property
    def provider_name(self) -> str:
        return "macos_say"

    def speak(self, include_chime: bool, text: str) -> None:
        if include_chime:
            self.play_chime()
        # Use macOS built-in "say" command with the Samantha voice
        subprocess.run(
            ["say", "-v", "Samantha", text],
            check=True,
        )
```
Save as tts_providers/macos_say_provider.py, then set "tts_provider": "macos_say" in your node config.
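Selection presumably keys the config value off each provider's `provider_name`. A hedged sketch of how that lookup might work (the function name, config shape, and registry are assumptions about the node internals):

```python
def resolve_provider(config, available):
    """Pick the provider whose provider_name matches the configured value."""
    wanted = config["tts_provider"]
    by_name = {p.provider_name: p for p in available}
    if wanted not in by_name:
        raise KeyError(f"No TTS provider named {wanted!r}")
    return by_name[wanted]
```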
Here is a more advanced example using a cloud TTS API. Note that this version downloads the complete audio before playing it; a streaming variant would iterate over response chunks as JarvisTTS does.
```python
import logging
import os
import tempfile

import httpx

from tts_providers.base import IJarvisTextToSpeechProvider

logger = logging.getLogger(__name__)


class ElevenLabsProvider(IJarvisTextToSpeechProvider):
    def __init__(self):
        self._api_key = self.secret_service.get_secret("ELEVENLABS_API_KEY")
        self._voice_id = self.secret_service.get_secret("ELEVENLABS_VOICE_ID") or "default"

    @property
    def provider_name(self) -> str:
        return "elevenlabs"

    def speak(self, include_chime: bool, text: str) -> None:
        if not self._api_key:
            logger.error("ELEVENLABS_API_KEY not configured")
            return
        if include_chime:
            self.play_chime()
        response = httpx.post(
            f"https://api.elevenlabs.io/v1/text-to-speech/{self._voice_id}",
            headers={
                "xi-api-key": self._api_key,
                "Content-Type": "application/json",
            },
            json={
                "text": text,
                "model_id": "eleven_monolingual_v1",
            },
        )
        if response.status_code != 200:
            logger.error(f"ElevenLabs API error: {response.status_code}")
            return
        # Write audio to a temp file, play it, then clean up
        with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
            tmp.write(response.content)
            tmp_path = tmp.name
        try:
            self.audio_player.play_file(tmp_path)
        finally:
            os.unlink(tmp_path)
```
Chime Behavior¶
The include_chime parameter is True when Jarvis is responding to a voice command (indicating "I heard you, here is my response"). It is False for proactive speech (announcements, timers, etc.) where a chime would be unexpected.
The built-in chime file is located at sounds/chime.wav. To use a custom chime, replace this file or override play_chime() in your provider.
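For example, a provider could override play_chime() to use its own sound. This sketch avoids the real base class so it stands alone; the constructor-injected audio_player and the chime path are made up for illustration:

```python
class CustomChimeProvider:  # would subclass IJarvisTextToSpeechProvider in a real node
    provider_name = "custom_chime"

    def __init__(self, audio_player):
        self.audio_player = audio_player

    def play_chime(self) -> None:
        # Override: play a custom sound instead of the built-in sounds/chime.wav
        self.audio_player.play_file("sounds/my_chime.wav")

    def speak(self, include_chime: bool, text: str) -> None:
        if include_chime:
            self.play_chime()
        # ...synthesize and play `text` here...
```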