
Architecture Overview

Jarvis is a distributed voice assistant built from small, focused microservices. Pi Zero nodes capture voice input, a central command center orchestrates processing, and specialized services handle speech-to-text, LLM inference, text-to-speech, and more.

System Diagram

graph TD
    Node["Pi Zero Node"] --> CC["Command Center<br/>:7703"]
    CC --> Auth["Auth<br/>:7701"]
    CC --> Config["Config Service<br/>:7700"]
    CC --> Logs["Logs<br/>:7702"]
    CC --> LLM["LLM Proxy<br/>:7704"]
    CC --> Whisper["Whisper API<br/>:7706"]
    CC --> TTS["TTS<br/>:7707"]
    CC --> Notify["Notifications<br/>:7712"]
    CC --> Scraper["Web Scraper"]

    LLM -.->|"Metal/MLX (macOS)<br/>CUDA (Linux)"| GPU["GPU"]

    Auth --> PG["PostgreSQL"]
    CC --> PG
    Config --> PG
    Notify --> PG

    subgraph "Tier 0 — Foundation"
        Config
        PG
    end

    subgraph "Tier 1 — Infrastructure"
        Auth
        Logs
    end

    subgraph "Tier 2 — Core"
        CC
        LLM
    end

    subgraph "Tier 3 — Specialized"
        Whisper
        TTS
        Notify
        OCR["OCR Service<br/>:7031"]
        Recipes["Recipes<br/>:7030"]
    end

Dependency Tiers

Services are organized into tiers by how many other services depend on them: the lower the tier number, the more dependents a service has. Lower-numbered tiers must start first.

| Tier | Name | Services | Role |
| --- | --- | --- | --- |
| 0 | Foundation | Config Service (7700), PostgreSQL | Service discovery and persistent storage. Every other service depends on these. |
| 1 | Infrastructure | Auth (7701), Logs (7702) | Authentication and observability. Most services require auth; logs degrade gracefully if unavailable. |
| 2 | Core | Command Center (7703), LLM Proxy (7704/7705) | Voice command orchestration and LLM inference. The main processing pipeline. |
| 3 | Specialized | Whisper (7706), TTS (7707), OCR (7031), Recipes (7030), Notifications (7712) | Domain-specific services called by the command center as needed. |
| 4 | Management | Settings Server (7708), MCP (7709), Admin UI (7710) | Developer and admin tooling. No runtime services depend on these. |
| 5 | Clients | Pi Zero nodes, mobile app | End-user devices that connect to the command center. |
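
The "lower tiers start first" rule can be sketched as a simple flattening of the tier table. This is an illustrative sketch only: the `TIERS` mapping and the `startup_order` helper are hypothetical names, not part of the Jarvis codebase, and the order within a tier is assumed not to matter.

```python
# Tier assignments copied from the table above.
# The dict layout and the helper below are hypothetical, for illustration only.
TIERS = {
    0: ["Config Service", "PostgreSQL"],
    1: ["Auth", "Logs"],
    2: ["Command Center", "LLM Proxy"],
    3: ["Whisper", "TTS", "OCR", "Recipes", "Notifications"],
    4: ["Settings Server", "MCP", "Admin UI"],
    5: ["Pi Zero nodes", "mobile app"],
}


def startup_order(tiers: dict[int, list[str]]) -> list[str]:
    """Flatten the tiers into one start sequence, lowest tier number first."""
    order: list[str] = []
    for tier in sorted(tiers):
        # Services within the same tier keep their listed order (an assumption).
        order.extend(tiers[tier])
    return order


if __name__ == "__main__":
    for position, service in enumerate(startup_order(TIERS), start=1):
        print(position, service)
```

Whatever tool actually starts the services, the property to preserve is that every Tier 0 service is up before any Tier 1 service begins, and so on.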

Critical Path for Voice Commands

For a voice command to flow from microphone to speaker, the services on the solid path below must be running. Dashed edges mark services that are optional or only conditionally required:

graph LR
    CS["Config Service"] --> Auth["Auth"]
    Auth --> CC["Command Center"]
    CC --> LLM["LLM Proxy"]
    LLM --> Response["Response"]

    CC -.->|"optional"| W["Whisper API"]
    CC -.->|"optional"| TTS["TTS"]
    CC -.->|"optional"| Logs["Logs"]

| Service | Required? | Impact if Down |
| --- | --- | --- |
| Config Service | Yes | No service can discover other services |
| Auth | Yes | No authentication, all requests rejected |
| Command Center | Yes | No voice command processing |
| LLM Proxy | Yes | No intent classification or response generation |
| Whisper API | Conditional | No speech-to-text (needed if nodes send audio) |
| TTS | Conditional | No spoken responses (text responses still work) |
| Logs | No | Services continue; logs fall back to console |
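
The table above can be read as a small readiness check: the voice path works only if all four required services are up, and each missing optional service degrades one capability. The sketch below encodes that reading. The function name `voice_path_status` and the shape of its return value are hypothetical; only the service names and impact descriptions come from the table.

```python
# Services marked "Yes" in the table above.
REQUIRED = {"Config Service", "Auth", "Command Center", "LLM Proxy"}

# Services marked "Conditional" or "No", with their impact text from the table.
OPTIONAL_IMPACT = {
    "Whisper API": "No speech-to-text (needed if nodes send audio)",
    "TTS": "No spoken responses (text responses still work)",
    "Logs": "Services continue; logs fall back to console",
}


def voice_path_status(running: set[str]) -> dict:
    """Report whether the critical path is intact and what is degraded.

    Hypothetical helper: `running` is the set of service names currently up.
    """
    missing = sorted(REQUIRED - running)
    degraded = {
        name: impact
        for name, impact in OPTIONAL_IMPACT.items()
        if name not in running
    }
    return {"ok": not missing, "missing_required": missing, "degraded": degraded}


if __name__ == "__main__":
    status = voice_path_status(
        {"Config Service", "Auth", "Command Center", "LLM Proxy", "Logs"}
    )
    print(status["ok"])              # required services are all present
    print(sorted(status["degraded"]))  # optional services that are down
```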

Architectural Principles

FastAPI + Uvicorn Everywhere

Every Python service uses FastAPI with Uvicorn as the ASGI server. This provides:

  • Automatic OpenAPI documentation at /docs
  • Async request handling
  • Pydantic validation on all request/response models

PostgreSQL for Persistent Data

Auth, Command Center, Config Service, Recipes, and Notifications all use PostgreSQL. Schema migrations are managed with Alembic.

JWT Authentication

Three auth patterns cover all communication needs:

  • Node auth: API keys (X-API-Key header)
  • App-to-app auth: Service credentials (X-Jarvis-App-Id + X-Jarvis-App-Key headers)
  • User auth: JWT bearer tokens (Authorization: Bearer <token>)

See Authentication for details.
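
As a rough illustration of the three patterns, the helpers below build the request headers each caller type would send. The header names are the ones listed above; the function names and the placeholder credential values are hypothetical and not taken from the Jarvis codebase.

```python
def node_headers(api_key: str) -> dict[str, str]:
    """Node auth: a Pi Zero node identifies itself with an API key."""
    return {"X-API-Key": api_key}


def app_headers(app_id: str, app_key: str) -> dict[str, str]:
    """App-to-app auth: one service calling another with service credentials."""
    return {"X-Jarvis-App-Id": app_id, "X-Jarvis-App-Key": app_key}


def user_headers(token: str) -> dict[str, str]:
    """User auth: a JWT carried as a bearer token."""
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # Placeholder values only; real credentials would come from configuration.
    print(node_headers("example-node-key"))
    print(app_headers("example-app-id", "example-app-key"))
    print(user_headers("example.jwt.token"))
```

Keeping the three builders separate makes it harder to send the wrong credential type to an endpoint by accident.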

Docker Containers

All services run in Docker containers orchestrated by Docker Compose. GPU-dependent services may run locally on macOS to access Metal/MLX. See Deployment for platform-specific details.

Centralized Logging

All services use jarvis-log-client to send structured logs to jarvis-logs (backed by Loki + Grafana). No print() statements in production code.
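
The jarvis-log-client API is not shown on this page, so the following is only a sketch of the general pattern using Python's standard `logging` module: records are serialized as structured JSON and handed to a transport, with a console fallback if the transport fails. `StructuredHandler` and its `send` callable are hypothetical stand-ins, not the real client.

```python
import json
import logging
import sys
from typing import Callable


class StructuredHandler(logging.Handler):
    """Forward records as JSON to `send`; fall back to the console on failure.

    Hypothetical stand-in for a log client. `send` represents whatever
    transport delivers a payload to the log service.
    """

    def __init__(self, service: str, send: Callable[[str], None]) -> None:
        super().__init__()
        self.service = service
        self.send = send

    def emit(self, record: logging.LogRecord) -> None:
        payload = json.dumps(
            {
                "service": self.service,
                "level": record.levelname,
                "message": record.getMessage(),
            }
        )
        try:
            self.send(payload)
        except Exception:
            # Log service unreachable: keep the caller running, write locally.
            sys.stderr.write(payload + "\n")


if __name__ == "__main__":
    collected: list[str] = []
    logger = logging.getLogger("jarvis-example")
    logger.setLevel(logging.INFO)
    logger.addHandler(StructuredHandler("jarvis-example", collected.append))
    logger.info("service started")
    print(collected)
```

Routing everything through a handler, rather than calling print() directly, is what lets a service keep running with console output when the log service is down.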

Service Discovery

Services register with and discover each other through jarvis-config-service. No hardcoded URLs between services. See Service Discovery for details.
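
To make the register-then-resolve idea concrete, here is a minimal in-memory sketch. It does not reflect the actual jarvis-config-service API, which is not described on this page: the `ServiceRegistry` class, its method names, and the `localhost` URLs are all assumptions for illustration. Only the service ports come from the inventory on this page.

```python
class ServiceRegistry:
    """In-memory stand-in for a discovery service (hypothetical API)."""

    def __init__(self) -> None:
        self._urls: dict[str, str] = {}

    def register(self, name: str, url: str) -> None:
        """Record where a service can be reached; re-registering overwrites."""
        self._urls[name] = url

    def resolve(self, name: str) -> str:
        """Return the registered URL, or raise LookupError if unknown."""
        try:
            return self._urls[name]
        except KeyError:
            raise LookupError(f"service not registered: {name}") from None


if __name__ == "__main__":
    registry = ServiceRegistry()
    # Hostnames are assumed; ports follow the service inventory.
    registry.register("auth", "http://localhost:7701")
    registry.register("tts", "http://localhost:7707")
    print(registry.resolve("auth"))
```

A caller asks the registry for a URL at request time instead of embedding it, which is the practical meaning of "no hardcoded URLs between services".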

Service Inventory

| Service | Port | Description |
| --- | --- | --- |
| Config Service | 7700 | Service discovery and registration |
| Auth | 7701 | JWT authentication, node registration, app credentials |
| Logs | 7702 | Centralized structured logging (Loki + Grafana) |
| Command Center | 7703 | Voice command orchestration, tool routing, memory |
| LLM Proxy | 7704/7705 | LLM inference (MLX on macOS, llama.cpp/vLLM on Linux) |
| Whisper API | 7706 | Speech-to-text via whisper.cpp with speaker identification |
| TTS | 7707 | Text-to-speech via Piper TTS |
| Settings Server | 7708 | Runtime settings aggregation |
| MCP | 7709 | Claude Code tool integration |
| Admin UI | 7710 | Web administration interface |
| Notifications | 7712 | Push notifications, inbox, device tokens |
| Recipes | 7030 | Recipe CRUD, meal planning, OCR import |
| OCR Service | 7031 | Image-to-text (Apple Vision on macOS, Tesseract on Linux) |