Problem
The desktop proactive assistants (Task, Insight, Memory, Focus) currently call Gemini directly through a thin proxy (`/v1/proxy/gemini/*`), resulting in disproportionately high API costs. Root causes:
- Token multiplication — each tool-call iteration resends the entire conversation history including base64 screenshots. A 5-iteration task extraction sends the same JPEG image 5×.
- No server-side cost control — the Gemini proxy has rate limiting but no cost logging, no model routing, no caching. We can't throttle or optimize without client updates.
- No unified observability — Gemini calls bypass the `/v2/chat/completions` path that has structured cost logging.
- Client-side loop overhead — Swift manages the tool loop directly, making it hard to change loop strategy, add caching, or swap models without app releases.
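The token-multiplication problem compounds linearly with iterations. A back-of-envelope sketch (the per-image token count and iteration count below are illustrative assumptions, not measured values):

```typescript
// Illustrative cost model: each tool-call iteration resends the full
// conversation history, so a fixed-size image is re-uploaded on every
// round trip. Numbers are assumptions for illustration only.
function totalImageTokensSent(imageTokens: number, iterations: number): number {
  let total = 0;
  for (let i = 1; i <= iterations; i++) {
    total += imageTokens; // same JPEG, resent on every iteration
  }
  return total;
}

// An image costing ~258 tokens over 5 iterations is billed 5x.
console.log(totalImageTokensSent(258, 5)); // 1290
```

Server-side caching (Phase 2 below) would collapse this back toward a single send per unchanged screenshot.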
Current architecture
```
Swift Assistant → GeminiClient.sendImageToolLoop()
  → POST /v1/proxy/gemini/models/{model}:generateContent (thin proxy, no cost log)
    → generativelanguage.googleapis.com
  ← tool_calls
Swift executes tool locally (search_similar, execute_sql, etc.)
Swift appends result to contents[] and loops (re-sending everything)
```
Proposed architecture
```
Swift Assistant → AgentBridge (existing pi-mono bridge)
  → pi-mono adapter (local Node.js, already running on Mac)
    → POST /v2/chat/completions (unified proxy with cost logging + model routing)
      → Gemini / Claude / future models (server decides)
    ← tool_use events
  ← tool_use relayed to Swift
Swift executes tool locally (same tools, same local data access)
Swift sends tool_result back through bridge
Pi-mono manages the loop (iteration limits, caching, prompt optimization)
```
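The round trip above can be sketched as the message shapes that cross the bridge. The type names (`AssistantQuery`, `ToolUseEvent`, `ToolResultMessage`) and field names are hypothetical, not the existing AgentBridge protocol:

```typescript
// Hypothetical message shapes for one assistant round trip.
// Field names are illustrative; the real bridge protocol may differ.
interface AssistantQuery {
  kind: "assistant"; // distinct from chat queries
  assistant: "task" | "insight" | "memory" | "focus";
  systemPrompt: string;
  tools: { name: string; description: string; inputSchema: object }[];
  imageBase64?: string; // screenshot, ideally sent once
}

interface ToolUseEvent {
  kind: "tool_use";
  id: string; // correlates the eventual tool_result
  name: string; // e.g. "search_similar", "execute_sql"
  input: Record<string, unknown>;
}

interface ToolResultMessage {
  kind: "tool_result";
  toolUseId: string; // echoes ToolUseEvent.id
  content: string; // Swift-executed result, serialized
}

// Swift executes the tool locally and answers with a correlated result:
const use: ToolUseEvent = { kind: "tool_use", id: "t1", name: "execute_sql", input: { query: "SELECT 1" } };
const result: ToolResultMessage = { kind: "tool_result", toolUseId: use.id, content: "[{\"1\":1}]" };
console.log(result.toolUseId === use.id); // true
```

The correlation id is what lets pi-mono own the loop while tool execution stays in Swift.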
Why pi-mono instead of keeping the Swift loop
Pi-mono already runs locally on the user's Mac as a Node.js child process. It has the same local data access as Swift (same machine, same filesystem). The key advantages:
- Server-side model routing — `/v2/chat/completions` can route to Gemini, Claude, or any model based on cost/quality tier without app updates
- Unified cost logging — all LLM calls flow through one proxy with structured logging
- Prompt caching — pi-mono can implement conversation-level caching (don't resend unchanged images)
- Loop optimization — change iteration limits, add early termination heuristics, batch tool calls — all without app releases
- Tool bridge already exists — `AgentBridge.swift` already handles `tool_use`/`tool_result` relay for chat. The same protocol works for assistants.
Required changes
1. Rust backend: Add Gemini provider to /v2/chat/completions
File: desktop/Backend-Rust/src/models/chat_completions.rs
- Add `Gemini` variant to `Provider` enum
- Add Gemini model routes to `MODEL_ROUTES` (e.g., `omi-gemini-flash` → `gemini-2.5-flash-preview-05-20`)
- Add vision support for Gemini content format
File: desktop/Backend-Rust/src/routes/chat_completions.rs
- Add `translate_to_gemini()` — convert OpenAI format → Gemini generateContent format
- Add Gemini API key resolution in `chat_completions()` handler
- Handle Gemini response format → OpenAI response format (including tool_calls)
- Add Gemini streaming support (SSE → OpenAI SSE)
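Although the real translation layer lives in the Rust backend, the shape of the request conversion can be sketched in TypeScript. This covers only roles, text parts, and inline images; tool declarations, tool_calls mapping, and SSE streaming are omitted. Field names follow the public Gemini REST API (`contents`, `parts`, `inlineData`, `systemInstruction`):

```typescript
// Sketch of OpenAI-chat → Gemini generateContent translation.
// Basics only: role mapping, text parts, base64 images.
type OpenAIMessage =
  | { role: "system" | "user" | "assistant"; content: string }
  | { role: "user"; content: { type: "image_url"; image_url: { url: string } }[] };

function translateToGemini(messages: OpenAIMessage[]) {
  const systemParts: { text: string }[] = [];
  const contents: { role: "user" | "model"; parts: object[] }[] = [];
  for (const m of messages) {
    if (m.role === "system") {
      // Gemini takes the system prompt out-of-band, not as a content turn.
      systemParts.push({ text: m.content });
      continue;
    }
    const role = m.role === "assistant" ? "model" : "user"; // Gemini uses "model"
    const parts =
      typeof m.content === "string"
        ? [{ text: m.content }]
        : m.content.map((p) => ({
            // data-URL payloads arrive as "data:image/jpeg;base64,<data>"
            inlineData: { mimeType: "image/jpeg", data: p.image_url.url.split(",")[1] ?? "" },
          }));
    contents.push({ role, parts });
  }
  return { systemInstruction: { parts: systemParts }, contents };
}

const req = translateToGemini([
  { role: "system", content: "You extract tasks." },
  { role: "user", content: "Find tasks on screen." },
]);
console.log(req.contents[0].role); // "user"
```

The Rust implementation would mirror this mapping, plus the reverse direction for responses and tool_calls.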
2. Pi-mono: Add assistant mode
File: desktop/agent/src/adapters/pi-mono.ts
- Add assistant-mode message type that accepts: system prompt, tools definition, image (base64/URL), context data
- Implement tool-calling loop management (currently Swift loops; pi-mono would own the loop)
- Add iteration limits, early termination on `no_task_found`/`no_advice`
- Add prompt caching: detect unchanged images, skip re-encoding
- Route assistant requests through `/v2/chat/completions` with a Gemini model
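The loop pi-mono would own can be sketched as follows. The real loop is asynchronous over bridge IPC; this version is synchronous for clarity, and `callModel`/`relayToolUse` are hypothetical stand-ins for the `/v2/chat/completions` call and the Swift bridge relay:

```typescript
// Sketch of pi-mono owning the tool loop: iteration cap plus early
// termination on the "nothing found" terminal tools.
type ModelTurn =
  | { type: "tool_use"; id: string; name: string; input: object }
  | { type: "final"; text: string };

const TERMINAL_TOOLS = new Set(["no_task_found", "no_advice"]);

function runAssistantLoop(
  callModel: (history: object[]) => ModelTurn,
  relayToolUse: (turn: { id: string; name: string; input: object }) => string,
  maxIterations = 5,
): string {
  const history: object[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const turn = callModel(history);
    if (turn.type === "final") return turn.text;
    if (TERMINAL_TOOLS.has(turn.name)) return turn.name; // early termination
    const result = relayToolUse(turn); // Swift executes the tool locally
    history.push({ type: "tool_result", toolUseId: turn.id, content: result });
  }
  return "max_iterations_reached";
}

// Simulated run: one tool call, then a final answer.
let calls = 0;
const outcome = runAssistantLoop(
  () =>
    ++calls === 1
      ? { type: "tool_use", id: "t1", name: "execute_sql", input: {} }
      : { type: "final", text: "done" },
  () => "rows",
);
console.log(outcome); // "done"
```

Because the limit and terminal-tool set live in pi-mono, both can change without an app release.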
3. Swift: Refactor assistants to use AgentBridge
File: desktop/Desktop/Sources/ProactiveAssistants/Core/GeminiClient.swift
- Keep as fallback but deprecate for assistant use
- New: `AssistantBridge` protocol that wraps AgentBridge for assistant-specific calls
Files: TaskAssistant.swift, InsightAssistant.swift, MemoryAssistant.swift, FocusAssistant.swift
- Replace direct `geminiClient.sendImageToolLoop()` calls with `assistantBridge.analyze()`
- Move tool definitions to a shared format (pi-mono needs them too)
- Keep tool execution in Swift (local data access unchanged)
- InsightAssistant: preserve two-phase pattern (text-only Phase 1 → vision Phase 2) but let pi-mono manage both phases
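Moving tool definitions to a shared format means one serializable source of truth that both the Swift executor and the pi-mono loop load. A hypothetical JSON-Schema-style shape (the tool name comes from the current assistants; the schema fields are assumptions):

```typescript
// Hypothetical shared tool definition: plain serializable JSON so Swift
// and pi-mono can consume the same file without code generation.
interface SharedToolDef {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

const searchSimilar: SharedToolDef = {
  name: "search_similar",
  description: "Search locally indexed content similar to a query.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string", description: "Search text" } },
    required: ["query"],
  },
};

// pi-mono forwards the definition to /v2/chat/completions unchanged,
// while Swift keys its local executor off SharedToolDef.name.
console.log(searchSimilar.name); // "search_similar"
```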
4. Swift: AgentBridge extensions
File: desktop/Desktop/Sources/Chat/AgentBridge.swift
- Add assistant-mode query type (distinct from chat)
- Support passing tool definitions and system prompts per-assistant
- Support image attachment in assistant queries
Key files
| File | Role |
| --- | --- |
| `desktop/Backend-Rust/src/routes/chat_completions.rs` | Add Gemini provider translation |
| `desktop/Backend-Rust/src/models/chat_completions.rs` | Add `Provider::Gemini`, model routes |
| `desktop/agent/src/adapters/pi-mono.ts` | Add assistant mode with tool loop |
| `desktop/Desktop/Sources/Chat/AgentBridge.swift` | Extend for assistant queries |
| `desktop/Desktop/Sources/ProactiveAssistants/Core/GeminiClient.swift` | Deprecate for assistants |
| `desktop/Desktop/Sources/ProactiveAssistants/Assistants/TaskExtraction/TaskAssistant.swift` | Migrate to bridge |
| `desktop/Desktop/Sources/ProactiveAssistants/Assistants/Insight/InsightAssistant.swift` | Migrate (preserve two-phase) |
| `desktop/Desktop/Sources/ProactiveAssistants/Assistants/Memory/MemoryAssistant.swift` | Migrate to bridge |
| `desktop/Desktop/Sources/ProactiveAssistants/Assistants/Focus/FocusAssistant.swift` | Migrate to bridge |
Migration plan
Phase 1: Gemini backend support
Add `Provider::Gemini` to `/v2/chat/completions` with a translation layer. This alone enables cost logging for Gemini calls.
Phase 2: Pi-mono assistant mode
Add assistant-mode to pi-mono with tool loop management. Test with TaskAssistant (simplest loop, highest frequency).
Phase 3: Full migration
Migrate all 4 assistants. InsightAssistant last (most complex, two-phase pattern).
Cost optimization opportunities once migrated
- Image caching: Don't resend base64 image if screenshot unchanged between tool iterations
- Model tiering: Use flash for the `no_task_found` fast path (the majority of task extractions); use pro only when tools are invoked
- Prompt compression: Strip completed/deleted task lists when they haven't changed
- Server-side throttle: Rate limit per-user assistant calls at the proxy level
- Batch tool calls: Some iterations could batch multiple tool calls in one request
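The image-caching item above can be sketched as content-hash deduplication: hash the screenshot payload and attach the image only when the hash changes between iterations. The cache shape is illustrative:

```typescript
import { createHash } from "node:crypto";

// Illustrative per-conversation image cache: re-attach the screenshot
// only when its content hash differs from the last one sent.
class ImageCache {
  private lastHash: string | null = null;

  // Returns the base64 payload to attach, or null if unchanged.
  attachIfChanged(imageBase64: string): string | null {
    const hash = createHash("sha256").update(imageBase64).digest("hex");
    if (hash === this.lastHash) return null; // unchanged: skip resend
    this.lastHash = hash;
    return imageBase64;
  }
}

const cache = new ImageCache();
console.log(cache.attachIfChanged("AAAA") !== null); // true (first send)
console.log(cache.attachIfChanged("AAAA") === null); // true (skipped)
```

Hashing the base64 string is cheap relative to re-uploading the image, and the cache lives in pi-mono, so Swift never needs to know about it.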
Risk areas
- Latency: Adding pi-mono as intermediary adds ~10-50ms per round-trip (local IPC). Acceptable for 10-min interval assistants.
- Two-phase Insight: Must preserve the text-only → vision handoff. Pi-mono needs to manage phase transitions.
- Error handling: Currently Swift handles Gemini errors directly. Need equivalent error propagation through the bridge.
- Offline/fallback: If pi-mono crashes, assistants should fall back to direct Gemini (keep GeminiClient as fallback).
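The offline/fallback risk reduces to a try-through pattern: attempt the bridge path, and fall back to the direct Gemini client when the bridge fails. Sketched here with placeholder names for the Swift equivalents:

```typescript
// Placeholder fallback pattern: prefer the bridge, fall back to the
// deprecated direct path when pi-mono is down. Signatures are assumed.
type Analyzer = (prompt: string) => string;

function analyzeWithFallback(bridge: Analyzer, direct: Analyzer, prompt: string): string {
  try {
    return bridge(prompt);
  } catch {
    // pi-mono crashed or is unreachable: use the retained GeminiClient path.
    return direct(prompt);
  }
}

const fallbackResult = analyzeWithFallback(
  () => { throw new Error("bridge down"); },
  (p) => `direct:${p}`,
  "extract tasks",
);
console.log(fallbackResult); // "direct:extract tasks"
```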