Testing

TDD Workflow

Test-driven development is mandatory for all Jarvis code. Every feature starts with a failing test.

RED -- Write a test that defines the expected behavior. Run it and confirm it fails.
GREEN -- Write the minimum code to make the test pass.
REFACTOR -- Clean up the implementation while keeping all tests green.

# 1. Write the test in tests/test_new_feature.py
# 2. Run it (should fail)
pytest tests/test_new_feature.py -v

# 3. Implement the feature
# 4. Run again (should pass)
pytest tests/test_new_feature.py -v

# 5. Refactor, run again (should still pass)
pytest tests/test_new_feature.py -v
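
As a minimal sketch of the cycle above, this is what tests/test_new_feature.py might look like once the GREEN step is done. The function normalize_command and its behavior are hypothetical and chosen only for illustration. In a real service the function would live in the application package and be imported by the test module.

```python
# Hypothetical example: normalize_command is not part of Jarvis.
# It is defined inline only to keep the sketch self-contained.

def normalize_command(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return " ".join(text.lower().split())


def test_normalize_command_collapses_whitespace():
    # RED: this test is written first and fails until normalize_command exists
    # and behaves as described.
    assert normalize_command("  Turn ON   the Lights ") == "turn on the lights"


def test_normalize_command_handles_empty_input():
    # A second small test added during REFACTOR to pin down an edge case.
    assert normalize_command("") == ""
```

Running pytest on this file before normalize_command is written fails (RED); after the one-line implementation is added it passes (GREEN).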

Running Tests

Via the Jarvis CLI

The ./jarvis test command handles virtual environment activation and test configuration for each service:

# Run tests for a specific service
./jarvis test jarvis-auth
./jarvis test jarvis-command-center

# Run tests for all services
./jarvis test --all

Directly with pytest

Each Python service uses pytest. Activate the service's virtual environment first:

cd jarvis-auth
source .venv/bin/activate
pytest -v --tb=short

For the mobile app:

cd jarvis-node-mobile
npm test
npm run test:coverage

Coverage Targets

Target: 80%+ coverage for all services
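Current coverage by service (jarvis-notifications is below the target; only a qualitative rating is listed for jarvis-auth and jarvis-command-center):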

Service	Coverage
jarvis-config-service	93%
jarvis-tts	98%
jarvis-notifications	77%
jarvis-auth	Good
jarvis-command-center	Good

Use the --cov option (provided by the pytest-cov plugin) to measure coverage:

pytest -v --tb=short --cov=app --cov-report=term-missing
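
The comparison against the 80% target can be sketched in a few lines of Python. The figures are copied from the table above. The helper name below_target is an assumption made for illustration and is not part of the Jarvis tooling.

```python
TARGET = 80  # percent, from the target stated above

# Figures copied from the table above; services listed only as "Good" are omitted
# because no numeric value is given for them.
coverage = {
    "jarvis-config-service": 93,
    "jarvis-tts": 98,
    "jarvis-notifications": 77,
}


def below_target(coverage: dict[str, int], target: int = TARGET) -> list[str]:
    """Return the services whose coverage is under the target, sorted by name."""
    return sorted(name for name, pct in coverage.items() if pct < target)


print(below_target(coverage))  # ['jarvis-notifications']
```

pytest-cov also provides a --cov-fail-under option that makes the test run fail when total coverage drops below a given percentage, which may be a simpler way to enforce the target in CI.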

E2E Tests

End-to-end tests validate the full voice pipeline from text input through command execution.

Command Parsing Tests

These tests cover intent classification and parameter extraction (the "front half" of the pipeline):

cd jarvis-node-setup

# List all available tests
python test_command_parsing.py -l

# Run all tests
python test_command_parsing.py

# Run specific tests by index
python test_command_parsing.py -t 5 7 11

# Run tests for specific commands
python test_command_parsing.py -c calculate get_weather

# Custom output file
python test_command_parsing.py -o results.json

Required services: Command Center (7703), LLM Proxy (7704)

Multi-Turn Conversation Tests

These tests cover tool execution, validation flow, and context preservation (the "back half" of the pipeline):

cd jarvis-node-setup

# Fast mode (text only, no audio pipeline)
python test_multi_turn_conversation.py

# Full mode (TTS + Whisper audio pipeline)
python test_multi_turn_conversation.py --full

# Run a specific category
python test_multi_turn_conversation.py -c validation

# Save audio artifacts
python test_multi_turn_conversation.py --full -t 0 1 2 --save-audio ./audio/

Test categories: tool_execution, validation, result_incorporation, context, error_handling, complex

Required services: Command Center (7703), LLM Proxy (7704). For full mode: TTS (7707), Whisper API (7706).
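
Because the E2E scripts depend on these services, a quick pre-flight check can save a confusing failure. The sketch below assumes the services run on the local machine (the host 127.0.0.1 is an assumption; only the ports come from this page) and that an open TCP port is a good enough signal that a service is up. The helper names are hypothetical.

```python
import socket

# Ports taken from the lists above; the host is an assumption (services running locally).
REQUIRED = {"Command Center": 7703, "LLM Proxy": 7704}
FULL_MODE_EXTRA = {"Whisper API": 7706, "TTS": 7707}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_services(services: dict[str, int], host: str = "127.0.0.1") -> list[str]:
    """Return the names of services whose port does not accept connections."""
    return [name for name, port in services.items() if not is_listening(port, host)]


if __name__ == "__main__":
    down = missing_services({**REQUIRED, **FULL_MODE_EXTRA})
    print("All services reachable" if not down else f"Not reachable: {', '.join(down)}")
```

For fast mode, checking REQUIRED alone is enough; the two extra ports matter only when --full is used.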

Test Results

E2E test results are written to JSON files containing:

Summary with pass/fail counts, success rate, and response times
Per-test results with expected vs actual output
Analysis with command success rates and a confusion matrix
Recommendations for improving low-performing commands
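
To illustrate how per-command success rates and a confusion matrix can be derived from expected-versus-actual pairs, here is a minimal sketch. The record layout and the field names expected and actual are hypothetical; the real JSON schema may differ, so treat this as an illustration of the idea, not a reader for the actual result files.

```python
from collections import Counter

# Hypothetical per-test records; the real JSON field names may differ.
results = [
    {"expected": "calculate", "actual": "calculate"},
    {"expected": "get_weather", "actual": "get_weather"},
    {"expected": "get_weather", "actual": "calculate"},
    {"expected": "calculate", "actual": "calculate"},
]


def confusion_matrix(records):
    """Count (expected, actual) pairs; off-diagonal entries are misclassifications."""
    return Counter((r["expected"], r["actual"]) for r in records)


def success_rates(records):
    """Return the fraction of correct predictions for each expected command."""
    totals = Counter(r["expected"] for r in records)
    correct = Counter(r["expected"] for r in records if r["expected"] == r["actual"])
    return {cmd: correct[cmd] / totals[cmd] for cmd in totals}


matrix = confusion_matrix(results)
rates = success_rates(results)
print(matrix[("get_weather", "calculate")])  # 1
print(rates)  # {'calculate': 1.0, 'get_weather': 0.5}
```

In this made-up data, one get_weather request was routed to calculate, so get_weather has a 50% success rate and the matrix has a single off-diagonal entry.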

Performance Target

The target for total end-to-end voice latency is under 5 seconds, covering these stages:

Whisper transcription (speech-to-text)
Date context extraction
Command inference (tool routing)
Command execution and response generation
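
One way to check a run against this budget is to time each stage separately and sum the results. The sketch below uses placeholder stages that only sleep; the stage names mirror the list above, while timed_pipeline and the stage callables are hypothetical and stand in for the real pipeline steps.

```python
import time

BUDGET_SECONDS = 5.0  # target stated above


def timed_pipeline(stages):
    """Run (name, callable) stages in order and return per-stage durations in seconds."""
    timings = {}
    for name, stage in stages:
        start = time.perf_counter()
        stage()
        timings[name] = time.perf_counter() - start
    return timings


# Placeholder stages: each sleeps briefly instead of doing real work.
stages = [
    ("transcription", lambda: time.sleep(0.01)),
    ("date_context", lambda: time.sleep(0.01)),
    ("command_inference", lambda: time.sleep(0.01)),
    ("execution_and_response", lambda: time.sleep(0.01)),
]

timings = timed_pipeline(stages)
total = sum(timings.values())
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f}s")
print(f"total: {total:.3f}s (within budget: {total < BUDGET_SECONDS})")
```

Recording per-stage timings, not only the total, makes it easier to see which stage to optimize when a run exceeds the budget.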