
Metrics


Spacebot exposes Prometheus-compatible metrics for monitoring LLM costs, token usage, agent activity, and memory operations. All telemetry code is behind the metrics cargo feature flag — without it, every instrumentation block compiles out to nothing.
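To illustrate the feature gate, here is a minimal sketch of how `#[cfg(feature = "metrics")]` instrumentation typically compiles out — this is not Spacebot's actual code, and the `metrics::counter!` call stands in for whatever recorder the project uses:

```rust
// Illustrative sketch only: a cargo feature gate on instrumentation.
// The cfg'd block is removed before name resolution, so the `metrics`
// crate is only required when the feature is enabled.
pub fn record_llm_request(agent_id: &str, model: &str) {
    #[cfg(feature = "metrics")]
    {
        metrics::counter!(
            "spacebot_llm_requests_total",
            "agent_id" => agent_id.to_string(),
            "model" => model.to_string()
        )
        .increment(1);
    }
    // Without the feature, the parameters would otherwise be unused.
    let _ = (agent_id, model);
}

/// True only when built with `--features metrics`.
pub fn metrics_enabled() -> bool {
    cfg!(feature = "metrics")
}
```

Built without the feature, `record_llm_request` becomes an empty function the optimizer removes entirely.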

Building with Metrics

```bash
cargo build --release --features metrics
```

Configuration

Add a [metrics] block to your spacebot.toml:

```toml
[metrics]
enabled = true
port = 9090
bind = "0.0.0.0"
```

| Key | Default | Description |
| --- | --- | --- |
| `enabled` | `false` | Enable the `/metrics` HTTP endpoint |
| `port` | `9090` | Port for the metrics server |
| `bind` | `"0.0.0.0"` | Address to bind the metrics server |

The metrics server runs as a separate tokio task alongside the main API server and shuts down gracefully with the rest of the process.
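The shape of such a server can be sketched with nothing but the standard library — note this is an illustration of the idea only (Spacebot actually runs an async tokio task, not a blocking thread):

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener};
use std::thread;

// Sketch only: a minimal /metrics endpoint is just an HTTP responder that
// returns the Prometheus text exposition format. Binding port 0 picks a
// free port, which is handy for tests.
fn serve_metrics(bind: &str) -> std::io::Result<SocketAddr> {
    let listener = TcpListener::bind(bind)?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        for stream in listener.incoming() {
            let Ok(mut stream) = stream else { continue };
            let mut buf = [0u8; 512];
            let _ = stream.read(&mut buf); // ignore request details
            let body = "# TYPE spacebot_llm_requests_total counter\n\
                        spacebot_llm_requests_total{agent_id=\"main\"} 42\n";
            let resp = format!(
                "HTTP/1.1 200 OK\r\nContent-Type: text/plain; version=0.0.4\r\nContent-Length: {}\r\n\r\n{}",
                body.len(),
                body
            );
            let _ = stream.write_all(resp.as_bytes());
        }
    });
    Ok(addr)
}
```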

Endpoints

| Path | Description |
| --- | --- |
| `/metrics` | Prometheus text exposition format (0.0.4) |
| `/health` | Returns `200 OK` (for liveness probes) |

Exposed Metrics

All metrics are prefixed with spacebot_.
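For orientation, a scrape of `/metrics` returns the standard text exposition format. The sample below is illustrative output with made-up values, not a real scrape:

```text
# HELP spacebot_llm_requests_total Total LLM completion requests
# TYPE spacebot_llm_requests_total counter
spacebot_llm_requests_total{agent_id="main",model="claude-sonnet-4",tier="channel"} 128
# TYPE spacebot_active_workers gauge
spacebot_active_workers{agent_id="main"} 2
```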

LLM Metrics

These metrics track every LLM completion request, including token counts and estimated costs.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `spacebot_llm_requests_total` | Counter | `agent_id`, `model`, `tier` | Total LLM completion requests |
| `spacebot_llm_request_duration_seconds` | Histogram | `agent_id`, `model`, `tier` | End-to-end LLM request duration |
| `spacebot_llm_tokens_total` | Counter | `agent_id`, `model`, `tier`, `direction` | Token counts (`direction`: `input`, `output`, `cached_input`) |
| `spacebot_llm_estimated_cost_dollars` | Counter | `agent_id`, `model`, `tier` | Estimated cost in USD |

The tier label corresponds to the process type: channel, branch, worker, compactor, or cortex.

The direction label on token counts distinguishes input tokens, output (completion) tokens, and cached input tokens. Cached tokens are billed at a lower rate by most providers.

Cost estimation uses a built-in pricing table covering Claude, GPT-4o, o-series, Gemini, and DeepSeek models. Unknown models use a conservative fallback rate. Costs are best-effort estimates — exact billing depends on your provider agreement.
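The estimation logic can be sketched as a table lookup with a fallback. The rates below are invented placeholders, not Spacebot's real pricing table — real per-token prices change often and differ by provider:

```rust
use std::collections::HashMap;

// Hypothetical sketch of per-request cost estimation.
struct Rate {
    input_per_mtok: f64,        // USD per 1M input tokens
    cached_input_per_mtok: f64, // cached input is usually billed lower
    output_per_mtok: f64,
}

fn estimate_cost(
    table: &HashMap<&str, Rate>,
    model: &str,
    input: u64,
    cached_input: u64,
    output: u64,
) -> f64 {
    // Unknown models fall back to a conservative flat rate (placeholder values).
    let fallback = Rate {
        input_per_mtok: 10.0,
        cached_input_per_mtok: 5.0,
        output_per_mtok: 30.0,
    };
    let r = table.get(model).unwrap_or(&fallback);
    (input as f64 * r.input_per_mtok
        + cached_input as f64 * r.cached_input_per_mtok
        + output as f64 * r.output_per_mtok)
        / 1_000_000.0
}
```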

Tool Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `spacebot_tool_calls_total` | Counter | `agent_id`, `tool_name` | Total tool calls executed |
| `spacebot_tool_call_duration_seconds` | Histogram | — | Tool call execution duration |

Agent & Worker Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `spacebot_active_workers` | Gauge | `agent_id` | Currently active workers |
| `spacebot_active_branches` | Gauge | `agent_id` | Currently active branches |
| `spacebot_worker_duration_seconds` | Histogram | `agent_id`, `worker_type` | Worker lifetime duration |
| `spacebot_process_errors_total` | Counter | `agent_id`, `process_type`, `error_type` | Process errors by type |

Memory Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `spacebot_memory_reads_total` | Counter | — | Total memory recall operations |
| `spacebot_memory_writes_total` | Counter | — | Total memory save operations |
| `spacebot_memory_entry_count` | Gauge | `agent_id` | Total memory entries per agent |
| `spacebot_memory_updates_total` | Counter | `agent_id`, `operation` | Memory mutations (`operation`: `save`, `update`, `delete`, `forget`) |

Cost Tracking

Token usage and estimated costs are tracked per-request. To see total estimated spend:

```promql
sum(spacebot_llm_estimated_cost_dollars) by (agent_id)
```

To see spend rate over the last hour:

```promql
sum(rate(spacebot_llm_estimated_cost_dollars[1h])) by (agent_id, model) * 3600
```

To see token throughput:

```promql
sum(rate(spacebot_llm_tokens_total[5m])) by (direction)
```
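The spend-rate query above also works as the basis for an alerting rule. The example below is a sketch to adapt — the group name, alert name, and $5/hour threshold are placeholders, not anything shipped with Spacebot:

```yaml
groups:
  - name: spacebot-cost
    rules:
      - alert: SpacebotSpendHigh
        expr: sum(rate(spacebot_llm_estimated_cost_dollars[1h])) * 3600 > 5
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Spacebot estimated spend above $5/hour"
```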

Prometheus Scrape Config

```yaml
scrape_configs:
  - job_name: spacebot
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9090"]
```

Docker

Expose the metrics port alongside the API port:

```bash
docker run -d \
  --name spacebot \
  -e ANTHROPIC_API_KEY="sk-ant-..." \
  -v spacebot-data:/data \
  -p 19898:19898 \
  -p 9090:9090 \
  ghcr.io/spacedriveapp/spacebot:latest
```

The published Docker image already includes metrics support.

Histogram Buckets

| Metric | Buckets (seconds) |
| --- | --- |
| `llm_request_duration_seconds` | 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 15, 30, 60, 120 |
| `tool_call_duration_seconds` | 0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30 |
| `worker_duration_seconds` | 1, 5, 10, 30, 60, 120, 300, 600, 1800 |
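These buckets feed the standard `histogram_quantile` pattern for latency percentiles — for example, 95th-percentile LLM request latency per model (a generic Prometheus query, not Spacebot-specific):

```promql
histogram_quantile(
  0.95,
  sum(rate(spacebot_llm_request_duration_seconds_bucket[5m])) by (le, model)
)
```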

Cardinality

| Metric | Estimated series |
| --- | --- |
| `llm_requests_total` | agents × models × tiers (~25–375) |
| `llm_tokens_total` | agents × models × tiers × 3 directions (~75–1125) |
| `llm_estimated_cost_dollars` | agents × models × tiers (~25–375) |
| `tool_calls_total` | agents × tools (~20–100) |
| `active_workers` / `active_branches` | agents (~1–5 each) |
| `process_errors_total` | agents × process_types × error_types (~15–75) |
| `memory_*` | 1–10 per metric |
| **Total** | ~160–2000 |

This is well within the safe operating range for any Prometheus deployment.
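To check the estimates against reality, you can count the live series on the Prometheus side with a standard matcher query:

```promql
count({__name__=~"spacebot_.*"})
```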
