
Migrate proactive assistant tool loops from direct Gemini to pi-mono agent harness #7002

@beastoin

Description


Problem

The desktop proactive assistants (Task, Insight, Memory, Focus) currently call Gemini directly through a thin proxy (/v1/proxy/gemini/*), resulting in disproportionately high API costs. Root causes:

  1. Token multiplication — each tool-call iteration resends the entire conversation history including base64 screenshots. A 5-iteration task extraction sends the same JPEG image 5×.
  2. No server-side cost control — the Gemini proxy has rate limiting but no cost logging, no model routing, no caching. We can't throttle or optimize without client updates.
  3. No unified observability — Gemini calls bypass the /v2/chat/completions path that has structured cost logging.
  4. Client-side loop overhead — Swift manages the tool loop directly, making it hard to change loop strategy, add caching, or swap models without app releases.
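The token multiplication in point 1 compounds quickly. A minimal sketch of the arithmetic, with purely illustrative token counts (the real numbers depend on screenshot size and prompt length):

```typescript
// Sketch: cumulative tokens sent when the full history (including the
// screenshot) is resent on every tool-call iteration.
// IMAGE_TOKENS and TEXT_TOKENS_PER_TURN are assumed, not measured.
const IMAGE_TOKENS = 1_500;        // one base64 JPEG screenshot (assumed)
const TEXT_TOKENS_PER_TURN = 300;  // prompt + tool result per turn (assumed)

function tokensSent(iterations: number): number {
  let total = 0;
  for (let i = 1; i <= iterations; i++) {
    // iteration i resends the image plus all i turns of text so far
    total += IMAGE_TOKENS + i * TEXT_TOKENS_PER_TURN;
  }
  return total;
}

console.log(tokensSent(5)); // 12,000 total; the image alone accounts for 7,500
```

With image caching, the screenshot would be sent once instead of five times, cutting the image share of the bill by 80% in this example.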

Current architecture

Swift Assistant → GeminiClient.sendImageToolLoop()
    → POST /v1/proxy/gemini/models/{model}:generateContent  (thin proxy, no cost log)
        → generativelanguage.googleapis.com
    ← tool_calls
Swift executes tool locally (search_similar, execute_sql, etc.)
Swift appends result to contents[] and loops (re-sending everything)

Proposed architecture

Swift Assistant → AgentBridge (existing pi-mono bridge)
    → pi-mono adapter (local Node.js, already running on Mac)
        → POST /v2/chat/completions  (unified proxy with cost logging + model routing)
            → Gemini / Claude / future models (server decides)
        ← tool_use events
    ← tool_use relayed to Swift
Swift executes tool locally (same tools, same local data access)
Swift sends tool_result back through bridge
Pi-mono manages the loop (iteration limits, caching, prompt optimization)

Why pi-mono instead of keeping the Swift loop

Pi-mono already runs locally on the user's Mac as a Node.js child process. It has the same local data access as Swift (same machine, same filesystem). The key advantages:

  1. Server-side model routing — /v2/chat/completions can route to Gemini, Claude, or any model based on cost/quality tier without app updates
  2. Unified cost logging — all LLM calls flow through one proxy with structured logging
  3. Prompt caching — pi-mono can implement conversation-level caching (don't resend unchanged images)
  4. Loop optimization — change iteration limits, add early termination heuristics, batch tool calls — all without app releases
  5. Tool bridge already exists — AgentBridge.swift already handles tool_use/tool_result relay for chat. The same protocol works for assistants.

Required changes

1. Rust backend: Add Gemini provider to /v2/chat/completions

File: desktop/Backend-Rust/src/models/chat_completions.rs

  • Add Gemini variant to Provider enum
  • Add Gemini model routes to MODEL_ROUTES (e.g., omi-gemini-flash → gemini-2.5-flash-preview-05-20)
  • Add vision support for Gemini content format

File: desktop/Backend-Rust/src/routes/chat_completions.rs

  • Add translate_to_gemini() — convert OpenAI format → Gemini generateContent format
  • Add Gemini API key resolution in chat_completions() handler
  • Handle Gemini response format → OpenAI response format (including tool_calls)
  • Add Gemini streaming support (SSE → OpenAI SSE)
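The core of translate_to_gemini() is a message-shape mapping. A sketch of that mapping, written in TypeScript for brevity (the real implementation would be Rust); field names follow the public Gemini generateContent API, and only the common cases (system prompt, text, base64 images) are shown:

```typescript
// OpenAI chat messages → Gemini generateContent body (shape sketch).
type OpenAIMessage = {
  role: "system" | "user" | "assistant";
  content: string | { type: string; text?: string; image_url?: { url: string } }[];
};

type GeminiPart = { text: string } | { inlineData: { mimeType: string; data: string } };
type GeminiContent = { role: "user" | "model"; parts: GeminiPart[] };

function translateToGemini(messages: OpenAIMessage[]) {
  let systemInstruction: { parts: { text: string }[] } | undefined;
  const contents: GeminiContent[] = [];

  for (const msg of messages) {
    if (msg.role === "system" && typeof msg.content === "string") {
      // Gemini takes the system prompt as a separate systemInstruction field
      systemInstruction = { parts: [{ text: msg.content }] };
      continue;
    }
    const role = msg.role === "assistant" ? "model" : "user";
    const parts: GeminiPart[] =
      typeof msg.content === "string"
        ? [{ text: msg.content }]
        : msg.content.map((p) =>
            p.type === "image_url" && p.image_url
              ? {
                  // assumes a data URL: "data:image/jpeg;base64,<data>"
                  inlineData: {
                    mimeType: "image/jpeg",
                    data: p.image_url.url.split(",")[1] ?? "",
                  },
                }
              : { text: p.text ?? "" }
          );
    contents.push({ role, parts });
  }
  return { systemInstruction, contents };
}
```

Tool calls and tool results map similarly (functionCall/functionResponse parts), and the streaming path applies the same translation per SSE chunk.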

2. Pi-mono: Add assistant mode

File: desktop/agent/src/adapters/pi-mono.ts

  • Add assistant-mode message type that accepts: system prompt, tools definition, image (base64/URL), context data
  • Implement tool-calling loop management (currently Swift loops; pi-mono would own the loop)
  • Add iteration limits, early termination on no_task_found/no_advice
  • Add prompt caching: detect unchanged images, skip re-encoding
  • Route assistant requests through /v2/chat/completions with Gemini model
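The loop pi-mono would own can be sketched as follows. All names here are hypothetical: callLLM stands in for the /v2/chat/completions call and relayToolToSwift for the AgentBridge relay; the iteration limit and terminal tool names come from the bullets above.

```typescript
// Sketch: pi-mono-owned assistant loop with iteration limit, early
// termination on terminal tools, and image dedup by hash.
import { createHash } from "crypto";

type ToolCall = { name: string; args: unknown };
type LLMResponse = { toolCalls: ToolCall[]; text?: string };

const MAX_ITERATIONS = 5;                           // assumed limit
const TERMINAL = new Set(["no_task_found", "no_advice"]);

async function runAssistantLoop(
  imageBase64: string,
  callLLM: (msgs: unknown[], includeImage: boolean) => Promise<LLMResponse>,
  relayToolToSwift: (call: ToolCall) => Promise<unknown>
): Promise<string | undefined> {
  const messages: unknown[] = [];
  let lastImageHash = "";

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    // Prompt caching: only attach the screenshot when its hash changed
    const hash = createHash("sha256").update(imageBase64).digest("hex");
    const includeImage = hash !== lastImageHash;
    lastImageHash = hash;

    const resp = await callLLM(messages, includeImage);
    if (resp.toolCalls.length === 0) return resp.text;

    for (const call of resp.toolCalls) {
      if (TERMINAL.has(call.name)) return resp.text; // early termination
      const result = await relayToolToSwift(call);   // Swift still executes tools
      messages.push({ role: "tool", name: call.name, content: result });
    }
  }
  return undefined; // iteration limit reached
}
```

Swift keeps executing the tools; the only thing that moves is the decision of when to call the model again.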

3. Swift: Refactor assistants to use AgentBridge

File: desktop/Desktop/Sources/ProactiveAssistants/Core/GeminiClient.swift

  • Keep as fallback but deprecate for assistant use
  • New: AssistantBridge protocol that wraps AgentBridge for assistant-specific calls

Files: TaskAssistant.swift, InsightAssistant.swift, MemoryAssistant.swift, FocusAssistant.swift

  • Replace direct geminiClient.sendImageToolLoop() calls with assistantBridge.analyze()
  • Move tool definitions to a shared format (pi-mono needs them too)
  • Keep tool execution in Swift (local data access unchanged)
  • InsightAssistant: preserve two-phase pattern (text-only Phase 1 → vision Phase 2) but let pi-mono manage both phases
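For the shared tool format, a JSON-Schema-style shape would let Swift and pi-mono consume the same definitions and map directly onto both OpenAI tools and Gemini functionDeclarations. A sketch (search_similar/execute_sql are from this issue; the descriptions and parameter names are illustrative):

```typescript
// Sketch: one tool-definition format consumed by both Swift and pi-mono.
interface SharedToolDef {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

const ASSISTANT_TOOLS: SharedToolDef[] = [
  {
    name: "search_similar",
    description: "Search locally stored items similar to a query string",
    parameters: {
      type: "object",
      properties: { query: { type: "string", description: "Search text" } },
      required: ["query"],
    },
  },
  {
    name: "execute_sql",
    description: "Run a read-only SQL query against the local database",
    parameters: {
      type: "object",
      properties: { sql: { type: "string", description: "SELECT statement" } },
      required: ["sql"],
    },
  },
];

// Gemini's functionDeclarations use the same shape, so the mapping is direct
const geminiTools = [{ functionDeclarations: ASSISTANT_TOOLS }];
```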

4. Swift: AgentBridge extensions

File: desktop/Desktop/Sources/Chat/AgentBridge.swift

  • Add assistant-mode query type (distinct from chat)
  • Support passing tool definitions and system prompts per-assistant
  • Support image attachment in assistant queries
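The extensions above amount to one new message variant on the bridge. A sketch of the message types, with all names hypothetical and the wire format assumed to be the JSON relay AgentBridge already uses:

```typescript
// Sketch: bridge message union with a new assistant_query variant,
// distinct from chat, mirroring the existing tool_use/tool_result relay.
type BridgeMessage =
  | { kind: "chat_query"; text: string }
  | {
      kind: "assistant_query";                        // new, distinct from chat
      assistant: "task" | "insight" | "memory" | "focus";
      systemPrompt: string;                           // per-assistant prompt
      tools: unknown[];                               // shared tool definitions
      imageBase64?: string;                           // optional screenshot
      context?: Record<string, unknown>;
    }
  | { kind: "tool_use"; id: string; name: string; args: unknown }
  | { kind: "tool_result"; id: string; content: unknown };

function encode(msg: BridgeMessage): string {
  return JSON.stringify(msg); // JSON over the existing bridge channel (assumed)
}

const q = encode({
  kind: "assistant_query",
  assistant: "task",
  systemPrompt: "Extract actionable tasks from the screenshot.",
  tools: [],
});
```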

Key files

File Role
desktop/Backend-Rust/src/routes/chat_completions.rs Add Gemini provider translation
desktop/Backend-Rust/src/models/chat_completions.rs Add Provider::Gemini, model routes
desktop/agent/src/adapters/pi-mono.ts Add assistant mode with tool loop
desktop/Desktop/Sources/Chat/AgentBridge.swift Extend for assistant queries
desktop/Desktop/Sources/ProactiveAssistants/Core/GeminiClient.swift Deprecate for assistants
desktop/Desktop/Sources/ProactiveAssistants/Assistants/TaskExtraction/TaskAssistant.swift Migrate to bridge
desktop/Desktop/Sources/ProactiveAssistants/Assistants/Insight/InsightAssistant.swift Migrate (preserve two-phase)
desktop/Desktop/Sources/ProactiveAssistants/Assistants/Memory/MemoryAssistant.swift Migrate to bridge
desktop/Desktop/Sources/ProactiveAssistants/Assistants/Focus/FocusAssistant.swift Migrate to bridge

Migration plan

Phase 1: Gemini backend support

Add Provider::Gemini to /v2/chat/completions with translation layer. This alone enables cost logging for Gemini calls.

Phase 2: Pi-mono assistant mode

Add assistant-mode to pi-mono with tool loop management. Test with TaskAssistant (simplest loop, highest frequency).

Phase 3: Full migration

Migrate all 4 assistants. InsightAssistant last (most complex, two-phase pattern).

Cost optimization opportunities once migrated

  • Image caching: Don't resend base64 image if screenshot unchanged between tool iterations
  • Model tiering: Use flash for no_task_found fast-path (majority of task extractions), pro only when tools are invoked
  • Prompt compression: Strip completed/deleted task lists when they haven't changed
  • Server-side throttle: Rate limit per-user assistant calls at the proxy level
  • Batch tool calls: Some iterations could batch multiple tool calls in one request
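The server-side throttle is the simplest of these to sketch. A per-user token bucket, written in TypeScript to show the mechanism only (the real limiter would live in the Rust proxy, and the rates here are illustrative):

```typescript
// Sketch: per-user token bucket for throttling assistant calls at the proxy.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.last = now;
  }

  // Returns true if the call is allowed, consuming one token
  allow(now: number = Date.now()): boolean {
    const elapsed = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// e.g. 6 assistant calls per hour per user ≈ one per 10-minute interval
const bucket = new TokenBucket(6, 6 / 3600);
```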

Risk areas

  • Latency: Adding pi-mono as intermediary adds ~10-50ms per round-trip (local IPC). Acceptable for 10-min interval assistants.
  • Two-phase Insight: Must preserve the text-only → vision handoff. Pi-mono needs to manage phase transitions.
  • Error handling: Currently Swift handles Gemini errors directly. Need equivalent error propagation through the bridge.
  • Offline/fallback: If pi-mono crashes, assistants should fall back to direct Gemini (keep GeminiClient as fallback).
