Problem
The desktop proactive assistants (Task, Insight, Memory, Focus) currently call Gemini directly through a thin proxy (`/v1/proxy/gemini/*`), resulting in disproportionately high API costs. Root causes:
- Token multiplication — each tool-call iteration resends the entire conversation history including base64 screenshots. A 5-iteration task extraction sends the same JPEG image 5×.
- No server-side cost control — the Gemini proxy has rate limiting but no cost logging, no model routing, no caching. We can't throttle or optimize without client updates.
- No unified observability — Gemini calls bypass the `/v2/chat/completions` path that has structured cost logging.
- Client-side loop overhead — Swift manages the tool loop directly, making it hard to change loop strategy, add caching, or swap models without app releases.
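The token-multiplication problem compounds linearly with iterations. A back-of-envelope sketch (the per-image token count and iteration count below are illustrative assumptions, not measured values):

```typescript
// Illustrative cost model: each tool-call iteration resends the full
// conversation history, so a fixed-size image is re-uploaded on every
// round trip. Numbers are assumptions for illustration only.
function totalImageTokensSent(imageTokens: number, iterations: number): number {
  let total = 0;
  for (let i = 1; i <= iterations; i++) {
    total += imageTokens; // same JPEG, resent on every iteration
  }
  return total;
}

// An image costing ~258 tokens over 5 iterations is billed 5x.
console.log(totalImageTokensSent(258, 5)); // 1290
```

Server-side caching (Phase 2 below) would collapse this back toward a single send per unchanged screenshot.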
Current architecture
```
Swift Assistant → GeminiClient.sendImageToolLoop()
  → POST /v1/proxy/gemini/models/{model}:generateContent (thin proxy, no cost log)
    → generativelanguage.googleapis.com
  ← tool_calls
Swift executes tool locally (search_similar, execute_sql, etc.)
Swift appends result to contents[] and loops (re-sending everything)
```
Proposed architecture
```
Swift Assistant → AgentBridge (existing pi-mono bridge)
  → pi-mono adapter (local Node.js, already running on Mac)
    → POST /v2/chat/completions (unified proxy with cost logging + model routing)
      → Gemini / Claude / future models (server decides)
    ← tool_use events
  ← tool_use relayed to Swift
Swift executes tool locally (same tools, same local data access)
Swift sends tool_result back through bridge
Pi-mono manages the loop (iteration limits, caching, prompt optimization)
```
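The round trip above can be sketched as the message shapes that cross the bridge. The type names (`AssistantQuery`, `ToolUseEvent`, `ToolResultMessage`) and field names are hypothetical, not the existing AgentBridge protocol:

```typescript
// Hypothetical message shapes for one assistant round trip.
// Field names are illustrative; the real bridge protocol may differ.
interface AssistantQuery {
  kind: "assistant"; // distinct from chat queries
  assistant: "task" | "insight" | "memory" | "focus";
  systemPrompt: string;
  tools: { name: string; description: string; inputSchema: object }[];
  imageBase64?: string; // screenshot, ideally sent once
}

interface ToolUseEvent {
  kind: "tool_use";
  id: string; // correlates the eventual tool_result
  name: string; // e.g. "search_similar", "execute_sql"
  input: Record<string, unknown>;
}

interface ToolResultMessage {
  kind: "tool_result";
  toolUseId: string; // echoes ToolUseEvent.id
  content: string; // Swift-executed result, serialized
}

// Swift executes the tool locally and answers with a correlated result:
const use: ToolUseEvent = { kind: "tool_use", id: "t1", name: "execute_sql", input: { query: "SELECT 1" } };
const result: ToolResultMessage = { kind: "tool_result", toolUseId: use.id, content: "[{\"1\":1}]" };
console.log(result.toolUseId === use.id); // true
```

The correlation id is what lets pi-mono own the loop while tool execution stays in Swift.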
Why pi-mono instead of keeping the Swift loop
Pi-mono already runs locally on the user's Mac as a Node.js child process. It has the same local data access as Swift (same machine, same filesystem). The key advantages:
- Server-side model routing — `/v2/chat/completions` can route to Gemini, Claude, or any model based on cost/quality tier without app updates
- Unified cost logging — all LLM calls flow through one proxy with structured logging
- Prompt caching — pi-mono can implement conversation-level caching (don't resend unchanged images)
- Loop optimization — change iteration limits, add early termination heuristics, batch tool calls — all without app releases
- Tool bridge already exists — `AgentBridge.swift` already handles `tool_use`/`tool_result` relay for chat. The same protocol works for assistants.
Required changes
1. Rust backend: Add Gemini provider to /v2/chat/completions
File: desktop/Backend-Rust/src/models/chat_completions.rs
- Add `Gemini` variant to `Provider` enum
- Add Gemini model routes to `MODEL_ROUTES` (e.g., `omi-gemini-flash` → `gemini-2.5-flash-preview-05-20`)
- Add vision support for Gemini content format
File: desktop/Backend-Rust/src/routes/chat_completions.rs
- Add `translate_to_gemini()` — convert OpenAI format → Gemini generateContent format
- Add Gemini API key resolution in `chat_completions()` handler
- Handle Gemini response format → OpenAI response format (including tool_calls)
- Add Gemini streaming support (SSE → OpenAI SSE)
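Although the real translation layer lives in the Rust backend, the shape of the request conversion can be sketched in TypeScript. This covers only roles, text parts, and inline images; tool declarations, tool_calls mapping, and SSE streaming are omitted. Field names follow the public Gemini REST API (`contents`, `parts`, `inlineData`, `systemInstruction`):

```typescript
// Sketch of OpenAI-chat → Gemini generateContent translation.
// Basics only: role mapping, text parts, base64 images.
type OpenAIMessage =
  | { role: "system" | "user" | "assistant"; content: string }
  | { role: "user"; content: { type: "image_url"; image_url: { url: string } }[] };

function translateToGemini(messages: OpenAIMessage[]) {
  const systemParts: { text: string }[] = [];
  const contents: { role: "user" | "model"; parts: object[] }[] = [];
  for (const m of messages) {
    if (m.role === "system") {
      // Gemini takes the system prompt out-of-band, not as a content turn.
      systemParts.push({ text: m.content });
      continue;
    }
    const role = m.role === "assistant" ? "model" : "user"; // Gemini uses "model"
    const parts =
      typeof m.content === "string"
        ? [{ text: m.content }]
        : m.content.map((p) => ({
            // data-URL payloads arrive as "data:image/jpeg;base64,<data>"
            inlineData: { mimeType: "image/jpeg", data: p.image_url.url.split(",")[1] ?? "" },
          }));
    contents.push({ role, parts });
  }
  return { systemInstruction: { parts: systemParts }, contents };
}

const req = translateToGemini([
  { role: "system", content: "You extract tasks." },
  { role: "user", content: "Find tasks on screen." },
]);
console.log(req.contents[0].role); // "user"
```

The Rust implementation would mirror this mapping, plus the reverse direction for responses and tool_calls.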
2. Pi-mono: Add assistant mode
File: desktop/agent/src/adapters/pi-mono.ts
- Add assistant-mode message type that accepts: system prompt, tools definition, image (base64/URL), context data
- Implement tool-calling loop management (currently Swift loops; pi-mono would own the loop)
- Add iteration limits, early termination on `no_task_found`/`no_advice`
- Add prompt caching: detect unchanged images, skip re-encoding
- Route assistant requests through `/v2/chat/completions` with a Gemini model
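The loop pi-mono would own can be sketched as follows. The real loop is asynchronous over bridge IPC; this version is synchronous for clarity, and `callModel`/`relayToolUse` are hypothetical stand-ins for the `/v2/chat/completions` call and the Swift bridge relay:

```typescript
// Sketch of pi-mono owning the tool loop: iteration cap plus early
// termination on the "nothing found" terminal tools.
type ModelTurn =
  | { type: "tool_use"; id: string; name: string; input: object }
  | { type: "final"; text: string };

const TERMINAL_TOOLS = new Set(["no_task_found", "no_advice"]);

function runAssistantLoop(
  callModel: (history: object[]) => ModelTurn,
  relayToolUse: (turn: { id: string; name: string; input: object }) => string,
  maxIterations = 5,
): string {
  const history: object[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const turn = callModel(history);
    if (turn.type === "final") return turn.text;
    if (TERMINAL_TOOLS.has(turn.name)) return turn.name; // early termination
    const result = relayToolUse(turn); // Swift executes the tool locally
    history.push({ type: "tool_result", toolUseId: turn.id, content: result });
  }
  return "max_iterations_reached";
}

// Simulated run: one tool call, then a final answer.
let calls = 0;
const outcome = runAssistantLoop(
  () =>
    ++calls === 1
      ? { type: "tool_use", id: "t1", name: "execute_sql", input: {} }
      : { type: "final", text: "done" },
  () => "rows",
);
console.log(outcome); // "done"
```

Because the limit and terminal-tool set live in pi-mono, both can change without an app release.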
3. Swift: Refactor assistants to use AgentBridge
File: desktop/Desktop/Sources/ProactiveAssistants/Core/GeminiClient.swift
- Keep as fallback but deprecate for assistant use
- New: `AssistantBridge` protocol that wraps AgentBridge for assistant-specific calls
Files: TaskAssistant.swift, InsightAssistant.swift, MemoryAssistant.swift, FocusAssistant.swift
- Replace direct `geminiClient.sendImageToolLoop()` calls with `assistantBridge.analyze()`
- Move tool definitions to a shared format (pi-mono needs them too)
- Keep tool execution in Swift (local data access unchanged)
- InsightAssistant: preserve two-phase pattern (text-only Phase 1 → vision Phase 2) but let pi-mono manage both phases
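Moving tool definitions to a shared format means one serializable source of truth that both the Swift executor and the pi-mono loop load. A hypothetical JSON-Schema-style shape (the tool name comes from the current assistants; the schema fields are assumptions):

```typescript
// Hypothetical shared tool definition: plain serializable JSON so Swift
// and pi-mono can consume the same file without code generation.
interface SharedToolDef {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

const searchSimilar: SharedToolDef = {
  name: "search_similar",
  description: "Search locally indexed content similar to a query.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string", description: "Search text" } },
    required: ["query"],
  },
};

// pi-mono forwards the definition to /v2/chat/completions unchanged,
// while Swift keys its local executor off SharedToolDef.name.
console.log(searchSimilar.name); // "search_similar"
```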
4. Swift: AgentBridge extensions
File: desktop/Desktop/Sources/Chat/AgentBridge.swift
- Add assistant-mode query type (distinct from chat)
- Support passing tool definitions and system prompts per-assistant
- Support image attachment in assistant queries
Key files
| File | Role |
| --- | --- |
| `desktop/Backend-Rust/src/routes/chat_completions.rs` | Add Gemini provider translation |
| `desktop/Backend-Rust/src/models/chat_completions.rs` | Add `Provider::Gemini`, model routes |
| `desktop/agent/src/adapters/pi-mono.ts` | Add assistant mode with tool loop |
| `desktop/Desktop/Sources/Chat/AgentBridge.swift` | Extend for assistant queries |
| `desktop/Desktop/Sources/ProactiveAssistants/Core/GeminiClient.swift` | Deprecate for assistants |
| `desktop/Desktop/Sources/ProactiveAssistants/Assistants/TaskExtraction/TaskAssistant.swift` | Migrate to bridge |
| `desktop/Desktop/Sources/ProactiveAssistants/Assistants/Insight/InsightAssistant.swift` | Migrate (preserve two-phase) |
| `desktop/Desktop/Sources/ProactiveAssistants/Assistants/Memory/MemoryAssistant.swift` | Migrate to bridge |
| `desktop/Desktop/Sources/ProactiveAssistants/Assistants/Focus/FocusAssistant.swift` | Migrate to bridge |
Migration plan
Phase 1: Gemini backend support
Add `Provider::Gemini` to `/v2/chat/completions` with a translation layer. This alone enables cost logging for Gemini calls.
Phase 2: Pi-mono assistant mode
Add assistant-mode to pi-mono with tool loop management. Test with TaskAssistant (simplest loop, highest frequency).
Phase 3: Full migration
Migrate all 4 assistants. InsightAssistant last (most complex, two-phase pattern).
Cost optimization opportunities once migrated
- Image caching: Don't resend base64 image if screenshot unchanged between tool iterations
- Model tiering: Use flash for the `no_task_found` fast path (the majority of task extractions); use pro only when tools are invoked
- Prompt compression: Strip completed/deleted task lists when they haven't changed
- Server-side throttle: Rate limit per-user assistant calls at the proxy level
- Batch tool calls: Some iterations could batch multiple tool calls in one request
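The image-caching item above can be sketched as content-hash deduplication: hash the screenshot payload and attach the image only when the hash changes between iterations. The cache shape is illustrative:

```typescript
import { createHash } from "node:crypto";

// Illustrative per-conversation image cache: re-attach the screenshot
// only when its content hash differs from the last one sent.
class ImageCache {
  private lastHash: string | null = null;

  // Returns the base64 payload to attach, or null if unchanged.
  attachIfChanged(imageBase64: string): string | null {
    const hash = createHash("sha256").update(imageBase64).digest("hex");
    if (hash === this.lastHash) return null; // unchanged: skip resend
    this.lastHash = hash;
    return imageBase64;
  }
}

const cache = new ImageCache();
console.log(cache.attachIfChanged("AAAA") !== null); // true (first send)
console.log(cache.attachIfChanged("AAAA") === null); // true (skipped)
```

Hashing the base64 string is cheap relative to re-uploading the image, and the cache lives in pi-mono, so Swift never needs to know about it.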
Risk areas
- Latency: Adding pi-mono as intermediary adds ~10-50ms per round-trip (local IPC). Acceptable for 10-min interval assistants.
- Two-phase Insight: Must preserve the text-only → vision handoff. Pi-mono needs to manage phase transitions.
- Error handling: Currently Swift handles Gemini errors directly. Need equivalent error propagation through the bridge.
- Offline/fallback: If pi-mono crashes, assistants should fall back to direct Gemini (keep GeminiClient as fallback).
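The offline/fallback risk reduces to a try-through pattern: attempt the bridge path, and fall back to the direct Gemini client when the bridge fails. Sketched here with placeholder names for the Swift equivalents:

```typescript
// Placeholder fallback pattern: prefer the bridge, fall back to the
// deprecated direct path when pi-mono is down. Signatures are assumed.
type Analyzer = (prompt: string) => string;

function analyzeWithFallback(bridge: Analyzer, direct: Analyzer, prompt: string): string {
  try {
    return bridge(prompt);
  } catch {
    // pi-mono crashed or is unreachable: use the retained GeminiClient path.
    return direct(prompt);
  }
}

const fallbackResult = analyzeWithFallback(
  () => { throw new Error("bridge down"); },
  (p) => `direct:${p}`,
  "extract tasks",
);
console.log(fallbackResult); // "direct:extract tasks"
```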