Whisper API¶
The Whisper API provides speech-to-text transcription via whisper.cpp. It supports speaker identification through voice profile enrollment, returning both the transcribed text and a speaker confidence score.
Quick Reference¶
| Port | 7706 |
| Health endpoint | GET /health |
| Source | jarvis-whisper-api/ |
| Framework | FastAPI + Uvicorn |
| Backend | whisper.cpp |
| Tier | 3 - Specialized |
API Endpoints¶
| Method | Path | Description |
|---|---|---|
GET |
/health |
Health check |
POST |
/v1/audio/transcriptions |
Transcribe audio to text |
POST |
/api/v0/voice-profiles/enroll |
Enroll a voice profile for speaker ID |
GET |
/api/v0/voice-profiles |
List enrolled voice profiles |
DELETE |
/api/v0/voice-profiles/{id} |
Delete a voice profile |
Speaker Identification¶
Whisper returns transcription results with speaker metadata:
The command center uses this to resolve display names and load user-specific memories.
Environment Variables¶
| Variable | Description |
|---|---|
WHISPER_MODEL_PATH |
Path to the whisper.cpp model file |
JARVIS_AUTH_BASE_URL |
Auth service URL for node validation |
Dependencies¶
- whisper.cpp -- speech-to-text engine
- jarvis-auth -- validates node credentials
- jarvis-logs -- structured logging
Dependents¶
- jarvis-command-center -- sends audio for transcription
Impact if Down¶
Speech-to-text transcription is unavailable. If the command center uses server-side transcription (rather than on-device), voice input cannot be processed. Text-based commands still work.