Floating bar chat latency: 6.6-11.1s per query — bottleneck analysis with logs #6981

@beastoin

Summary

Floating bar chat takes 6.6-11.1s (avg 8.6s) to respond to "what do you see on my screen?" queries. Tested 7 runs on Omi Beta v0.11.358 (production build, Mac Mini M4, ACP/Claude OAuth).

Sub-second is the UX target. Current architecture makes 3-4s achievable with quick wins; sub-second requires on-device vision.

Test Setup

  • App: Omi Beta v0.11.358 (production)
  • Query: "what do you see on my screen?" × 7 runs via floating bar
  • Provider: ACP (Claude OAuth) → claude-sonnet-4-6
  • Machine: Mac Mini M4, 24GB RAM, macOS Tahoe
  • System prompt: 27,387 chars (requests total ~36K tokens per the usage logs)

Results

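| Run | Total | Screenshot | Quota Check | Session | LLM API | Save/Sync | Cache |
|-----|-------|------------|-------------|---------|---------|-----------|-------|
| 1 | 9,520ms | 36ms | 1,629ms | 689ms | 6,600ms | 566ms | MISS |
| 2 | 8,986ms | 136ms | 1,215ms | 1ms | 5,917ms | 1,717ms | MISS |
| 3 | 7,918ms | 133ms | 1,885ms | 1ms | 5,242ms | 657ms | HIT |
| 4 | 7,724ms | 137ms | 1,762ms | 2ms | 5,324ms | 499ms | MISS |
| 5 | 8,077ms | 138ms | 1,794ms | 1ms | 4,765ms | 1,379ms | HIT |
| 6 | 11,114ms | 2ms | 2,608ms | 1ms | 6,450ms | 2,053ms | HIT |
| 7 | 6,563ms | 1ms | 714ms | 1ms | 5,381ms | 466ms | HIT |
| AVG | 8,557ms | 83ms | 1,658ms | 99ms | 5,668ms | 1,048ms | |

Run 5's LLM API and Save/Sync values are derived from the Run 5 timestamps in the raw logs; the AVG row includes them.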
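
The per-stage numbers in this table can be reproduced from the raw log timestamps. The sketch below shows the arithmetic for Run 5. The helper names (`millis`, `stageDurations`) are illustrative and not part of the app, and the mapping of log lines to stages is an assumption based on how the other runs line up with the table.

```swift
import Foundation

/// Converts a log timestamp such as "15:45:07.212" to milliseconds since midnight.
func millis(_ stamp: String) -> Int? {
    let parts = stamp.split(separator: ":").map(String.init)
    guard parts.count == 3,
          let h = Int(parts[0]),
          let m = Int(parts[1]),
          let s = Double(parts[2]) else { return nil }
    return (h * 3600 + m * 60) * 1000 + Int((s * 1000).rounded())
}

/// Duration of each stage, measured between consecutive log timestamps.
/// Does not handle runs that cross midnight.
func stageDurations(_ marks: [(stage: String, stamp: String)]) -> [(stage: String, ms: Int)] {
    var out: [(stage: String, ms: Int)] = []
    for (prev, next) in zip(marks, marks.dropFirst()) {
        if let a = millis(prev.stamp), let b = millis(next.stamp) {
            out.append((stage: next.stage, ms: b - a))
        }
    }
    return out
}

// Run 5 timestamps, copied from the raw logs.
let run5 = stageDurations([
    (stage: "start", stamp: "15:45:00.514"),      // screenshot captured
    (stage: "screenshot", stamp: "15:45:00.652"), // floating_bar_query_sent
    (stage: "quota", stamp: "15:45:02.446"),      // quota response logged
    (stage: "session", stamp: "15:45:02.447"),    // ACP session reused
    (stage: "llm", stamp: "15:45:07.212"),        // usage line (response complete)
    (stage: "save", stamp: "15:45:08.591"),       // chat_agent_query_completed
])
for entry in run5 { print(entry.stage, entry.ms) }
```

For Run 5 this gives 138ms, 1,794ms, 1ms, 4,765ms and 1,379ms, which sum to the 8,077ms total.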

Pipeline Waterfall (best case — Run 7, 6.6s)

T+0ms     Screenshot capture (CGDisplayCreateImage → WebP 134KB)
T+1ms     Query dispatched to ChatProvider.sendMessage()
T+1ms     AgentBridge.query() → await fetchChatUsageQuota()  ← BLOCKS HERE
T+715ms   Quota OK → JSON serialized (base64 image + 27K system prompt)
T+716ms   Sent to Node.js bridge via stdin pipe
T+716ms   ACP session reused (key=floating, pre-warmed)
T+716ms   → Claude Sonnet 4.6 API call starts  ← BLOCKS HERE
T+6097ms  LLM response complete (5,381ms inference)
T+6097ms  Save to backend + Firebase sync + analytics
T+6563ms  DONE

3 Bottlenecks

1. LLM Inference — 4.8-6.6s (82% of best-case time)

Each request carries ~36K tokens (per the usage logs); the app-built system prompt contributes 27,387 chars of that:

| Component | Size | Needed for floating bar? |
|-----------|------|--------------------------|
| base_template | 9,542 chars | Yes (persona, instructions) |
| schema | 12,280 chars (45 tables) | No (floating bar doesn't use SQL tools) |
| context/memories | 2,619 chars | Partial |
| ai_profile | 2,162 chars | Yes |
| tasks | 601 chars | Maybe |
| goals | 183 chars | Maybe |

Plus 1920×1080 screenshot (120-134 KB WebP → ~160-179 KB base64) for vision processing.
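
Base64 encodes every 3 input bytes as 4 output characters, so the encoded screenshot is about a third larger than the WebP file. A minimal sketch of that arithmetic follows, assuming the logged "KB" means 1,024 bytes (the logs do not say); `base64Length` is an illustrative helper, not an app function.

```swift
import Foundation

/// Length in bytes of the padded base64 encoding of `byteCount` raw bytes.
func base64Length(ofBytes byteCount: Int) -> Int {
    return 4 * ((byteCount + 2) / 3)
}

// Cross-check the formula against Foundation's encoder on a dummy payload.
let sample = Data(count: 122 * 1024)
assert(sample.base64EncodedString().utf8.count == base64Length(ofBytes: sample.count))

// Screenshot sizes seen in the logs.
for kb in [120, 122, 134] {
    let encodedKB = Double(base64Length(ofBytes: kb * 1024)) / 1024.0
    print("\(kb) KB WebP -> \(Int(encodedKB.rounded())) KB base64")
}
```

This prints 160, 163 and 179 KB for the three sizes. The 4/3 ratio holds regardless of how a kilobyte is defined.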

Prompt cache hit rate: 57% (4/7 runs). In these runs, cache misses averaged ~0.5s more inference time (5.9s vs 5.5s) and cost 8× more ($0.24 vs $0.03 per query).

Code: System prompt built at ChatProvider.swift:857, cached in cachedMainSystemPrompt. Same prompt used for main chat and floating bar — no differentiation.
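
One way to differentiate the two surfaces is to let each declare which prompt sections it needs. The sketch below is a rough illustration, not the app's code: `PromptSection`, `ChatSurface` and `promptLength` are hypothetical names, and it drops only the schema, since that is the one component the breakdown marks as unneeded.

```swift
/// Hypothetical prompt sections, named after the components in the breakdown.
enum PromptSection: CaseIterable {
    case baseTemplate, schema, context, aiProfile, tasks, goals
}

/// Hypothetical chat surfaces; the app's real types may differ.
enum ChatSurface {
    case mainChat, floatingBar

    /// Sections each surface includes. The floating bar omits the schema
    /// because it does not use the SQL tools.
    var sections: [PromptSection] {
        switch self {
        case .mainChat:
            return PromptSection.allCases
        case .floatingBar:
            return [.baseTemplate, .context, .aiProfile, .tasks, .goals]
        }
    }
}

/// Total prompt length in characters for a surface, given per-section sizes.
func promptLength(for surface: ChatSurface, sizes: [PromptSection: Int]) -> Int {
    return surface.sections.reduce(0) { $0 + (sizes[$1] ?? 0) }
}

// Character counts from the prompt breakdown in the logs.
let sizes: [PromptSection: Int] = [
    .baseTemplate: 9542, .schema: 12280, .context: 2619,
    .aiProfile: 2162, .tasks: 601, .goals: 183,
]
let full = promptLength(for: .mainChat, sizes: sizes)
let slim = promptLength(for: .floatingBar, sizes: sizes)
print(full, slim, full - slim)
```

With the logged sizes this gives 27,387 chars for the main chat and 15,107 for the floating bar, a saving of 12,280 chars. How much latency that removes is not something this arithmetic can show; it would need to be measured.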

2. Quota Check — 0.7-2.6s (11-23% of time)

Sequential blocking await before any query is sent:

// AgentBridge.swift:422
if let quota = await APIClient.shared.fetchChatUsageQuota(), !quota.allowed {
    throw BridgeError.quotaExceeded(...)
}

Endpoint: GET api.omi.me/v1/users/me/usage-quota

Not parallelized with screenshot capture, JSON serialization, or anything else. Pure serial wait.
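
A minimal sketch of overlapping the quota check with payload preparation using `async let` is shown below. `Quota`, `BridgeError`, `fetchChatUsageQuota`, `buildPayload` and `prepareQuery` here are simplified stand-ins with simulated delays; the app's real signatures are not shown in this issue and may differ.

```swift
// Hypothetical stand-ins for the app's types.
struct Quota { let allowed: Bool }
enum BridgeError: Error { case quotaExceeded }

func fetchChatUsageQuota() async -> Quota? {
    try? await Task.sleep(nanoseconds: 200_000_000) // simulated network round trip
    return Quota(allowed: true)
}

func buildPayload() async -> String {
    try? await Task.sleep(nanoseconds: 200_000_000) // simulated capture + serialization
    return "{\"prompt\":\"what do you see on my screen?\"}"
}

/// Starts the quota check and payload preparation together, then gates the
/// send on the quota result. A nil quota is treated as allowed, matching the
/// `if let` in the snippet above.
func prepareQuery() async throws -> String {
    async let quota = fetchChatUsageQuota()
    async let payload = buildPayload()
    if let q = await quota, !q.allowed {
        throw BridgeError.quotaExceeded
    }
    return await payload
}
```

In the logged runs, screenshot capture takes at most ~140ms, so overlapping it with the quota check recovers only a small part of the 0.7-2.6s wait. Most of the saving would have to come from not blocking on the network at all, for example with a cached result.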

3. Save/Sync — 0.5-2.1s (7-18% of time)

After LLM response is already rendered to the user:

  • POST api.omi.me/v2/desktop/messages (response persistence)
  • AgentSync push (Firebase)
  • PostHog event tracking
  • GoalsAI progress check

Code: ChatProvider.swift:2523-2570
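
The follow-up work above could be moved off the query path with a detached task, sketched below. The three step functions and `SaveLog` are placeholders that only record that they ran; the real persistence, sync and analytics calls are not shown in this issue.

```swift
/// Records which follow-up steps ran; stands in for the real side effects.
actor SaveLog {
    static let shared = SaveLog()
    private(set) var steps: [String] = []
    func record(_ step: String) { steps.append(step) }
}

// Hypothetical stand-ins for the post-response work.
func persistMessage(_ text: String) async throws { await SaveLog.shared.record("persist") }
func pushAgentSync() async throws { await SaveLog.shared.record("sync") }
func trackCompletion() async { await SaveLog.shared.record("track") }

/// Returns immediately and runs the follow-up work in a detached task.
/// The returned handle lets callers (or tests) wait for completion.
@discardableResult
func finishQuery(response: String) -> Task<Void, Never> {
    return Task.detached(priority: .utility) {
        do {
            try await persistMessage(response)
            try await pushAgentSync()
        } catch {
            // Assumption: a real implementation would retry or queue here.
            print("background save failed: \(error)")
        }
        await trackCompletion()
    }
}
```

The trade-off is that a failed save is no longer visible on the query path, so some retry or queueing strategy would be needed to avoid silently losing messages.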

Current vs Optimal Pipeline

CURRENT (sequential):
  Screenshot → Send → [WAIT quota 1.7s] → [WAIT LLM 5.7s] → [WAIT save 0.9s] → Done
                                                                                  8.6s avg

OPTIMAL (parallel + slim prompt):
  Screenshot ──┐
  Quota check ─┤ (parallel)
               └─► [LLM ~3-4s with slim prompt] → Done
                                                    └─► Save (background)
                                                    3-4s estimated

Proposed Fixes

| # | Change | Savings | Effort | Code Location |
|---|--------|---------|--------|---------------|
| 1 | Optimistic/cached quota check | 1.0-1.8s | Low | AgentBridge.swift:422 |
| 2 | Background save/sync | 0.5-1.5s | Low | ChatProvider.swift:2523-2570 |
| 3 | Slim floating bar prompt (drop schema, skills) | 1.0-2.0s | Medium | ChatProvider.swift:857 |
| 4 | Half-resolution screenshot (960×540) | 0.5-1.0s | Low | ScreenCaptureManager.swift |
| Total | | 3.0-6.3s | | |

Estimated new time: ~2-4s (down from 6.6-11.1s)
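
For fix 1, a cached quota check could look roughly like the sketch below: reuse the last quota response for a short time-to-live and only hit the network when it expires. `QuotaCache`, its `ttl` parameter and the `fetch` closure are hypothetical; `Quota` is a simplified stand-in for the app's type.

```swift
import Foundation

// Simplified stand-in for the app's quota type.
struct Quota: Sendable { let allowed: Bool }

/// Caches the last quota response for `ttl` seconds so most queries skip the
/// network round trip. The `fetch` closure stands in for the real API call.
actor QuotaCache {
    private var cached: (quota: Quota, fetchedAt: Date)?
    private let ttl: TimeInterval
    private let fetch: @Sendable () async -> Quota?

    init(ttl: TimeInterval, fetch: @escaping @Sendable () async -> Quota?) {
        self.ttl = ttl
        self.fetch = fetch
    }

    /// Returns the cached quota while it is fresh; otherwise refetches.
    /// If the refetch fails, falls back to the stale value (if any).
    func current(now: Date = Date()) async -> Quota? {
        if let entry = cached, now.timeIntervalSince(entry.fetchedAt) < ttl {
            return entry.quota
        }
        guard let fresh = await fetch() else { return cached?.quota }
        cached = (quota: fresh, fetchedAt: now)
        return fresh
    }
}
```

A stale "allowed" value can let a user run a few queries past the limit within the TTL, so this design assumes the backend still enforces the quota. Concurrent callers may also trigger duplicate fetches when the cache expires; that is harmless here but worth knowing.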

Sub-second requires on-device vision (Apple Vision + CoreML) or pre-computed responses — not achievable with cloud LLM.

Token Usage & Cost

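| Run | Total Tokens | CacheRead | CacheWrite | Cost |
|-----|--------------|-----------|------------|------|
| 1 | 36,759 | 0 | 36,692 | $0.231 |
| 2 | 38,388 | 0 | 38,331 | $0.241 |
| 3 | 40,004 | 38,331 | 1,629 | $0.030 |
| 4 | 41,613 | 0 | 41,576 | $0.261 |
| 5 | 43,227 | 41,576 | 1,609 | $0.032 |
| 6 | 44,843 | 43,185 | 1,614 | $0.033 |
| 7 | 46,449 | 44,799 | 1,616 | $0.033 |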

Cache miss: ~$0.24/query. Cache hit: ~$0.03/query (8× cheaper).
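
To see how the hit rate drives the average cost per query, the per-run costs can be blended as in the sketch below. `blendedCost` is an illustrative helper, and the 90% hit rate is a hypothetical target, not a measured value.

```swift
/// Expected cost per query for a given prompt-cache hit rate.
func blendedCost(hitRate: Double, hitCost: Double, missCost: Double) -> Double {
    return hitRate * hitCost + (1 - hitRate) * missCost
}

// Per-run costs from the token usage table.
let missCosts = [0.231, 0.241, 0.261]        // runs 1, 2, 4
let hitCosts = [0.030, 0.032, 0.033, 0.033]  // runs 3, 5, 6, 7

let avgMiss = missCosts.reduce(0, +) / Double(missCosts.count)
let avgHit = hitCosts.reduce(0, +) / Double(hitCosts.count)
let observedRate = Double(hitCosts.count) / Double(missCosts.count + hitCosts.count)

let observed = blendedCost(hitRate: observedRate, hitCost: avgHit, missCost: avgMiss)
let at90 = blendedCost(hitRate: 0.9, hitCost: avgHit, missCost: avgMiss)
print(observed, at90)
```

At the observed 4/7 hit rate the blended cost is about $0.12 per query; at a 90% hit rate it would be about $0.05.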

Raw Logs

Run 1 — Cold start (9.5s)
[15:31:32.827] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 120 KB
[15:31:32.863] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:31:32.864] [app] ChatProvider loaded 50 memories from local DB
[15:31:32.865] [app] ChatProvider loaded 4 goals from local DB
[15:31:32.866] [app] ChatProvider loaded 8 tasks for context
[15:31:32.867] [app] ChatProvider loaded AI profile (generated 2026-04-23 12:59:33 +0000)
[15:31:32.868] [app] ChatProvider loaded schema for 45 tables
[15:31:32.869] [app] AgentBridge: starting with node=...node (exists=true), bridge=...index.js (exists=true)
[15:31:32.920] [app] AgentBridge stderr: [agent] Bridge main() starting (pid=55280, node=v22.14.0)
[15:31:32.920] [app] AgentBridge stderr: [agent] Harness mode: acp
[15:31:32.922] [app] AgentBridge: bridge ready (sessionId=)
[15:31:32.926] [app] ChatProvider: prompt built — schema: yes, goals: 4, tasks: 8, ai_profile: yes, memories: 50, history: none, prompt_length: 27387 chars
[15:31:32.927] [app] ChatProvider: prompt breakdown — base_template:9542c, context:2619c, goals:183c, tasks:601c, ai_profile:2162c, schema:12280c
[15:31:32.928] [app] AgentBridge stderr: [agent] Warmup requested (cwd=default, sessions=main, floating)
[15:31:33.011] [app] AgentBridge stderr: [agent] ACP initialized
[15:31:34.492] [app] APIClient: Quota plan=Operator unit=questions used=277.0 limit=500.0 allowed=true
[15:31:34.494] [app] AgentBridge stderr: [agent] Query mode: act
[15:31:35.181] [app] AgentBridge stderr: [agent] Pre-warmed session: d8343433-... (key=floating, model=claude-sonnet-4-6)
[15:31:35.212] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:31:41.781] [app] AgentBridge stderr: [agent] Usage: model=claude-sonnet-4-6, cost=$0.23094, cacheWrite=36692, cacheRead=0, total=36759
[15:31:41.781] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:31:42.347] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 2 — Warm bridge, cache miss (9.0s)
[15:36:55.356] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 122 KB
[15:36:55.492] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:36:56.707] [app] APIClient: Quota plan=Operator unit=questions used=279.0 limit=500.0 allowed=true
[15:36:56.708] [app] AgentBridge stderr: [agent] Query mode: act
[15:36:56.708] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:37:02.624] [app] AgentBridge stderr: [agent] Usage: cost=$0.24093, cacheWrite=38331, cacheRead=0, total=38388
[15:37:02.625] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:37:04.342] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 3 — Cache hit (7.9s)
[15:38:05.801] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 124 KB
[15:38:05.934] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:38:07.819] [app] APIClient: Quota plan=Operator unit=questions used=281.0 limit=500.0 allowed=true
[15:38:07.820] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:38:13.062] [app] AgentBridge stderr: [agent] Usage: cost=$0.03039, cacheWrite=1629, cacheRead=38331, total=40004
[15:38:13.062] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:38:13.719] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 4 — Cache miss (7.7s)
[15:43:51.787] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 130 KB
[15:43:51.924] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:43:53.686] [app] APIClient: Quota plan=Operator unit=questions used=283.0 limit=500.0 allowed=true
[15:43:53.688] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:43:59.012] [app] AgentBridge stderr: [agent] Usage: cost=$0.26072, cacheWrite=41576, cacheRead=0, total=41613
[15:43:59.012] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:43:59.511] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 5 — Cache hit (8.1s)
[15:45:00.514] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 130 KB
[15:45:00.652] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:45:02.446] [app] APIClient: Quota plan=Operator unit=questions used=285.0 limit=500.0 allowed=true
[15:45:02.447] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:45:07.212] [app] AgentBridge stderr: [agent] Usage: cost=$0.03183, cacheWrite=1609, cacheRead=41576, total=43227
[15:45:08.591] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 6 — Worst case (11.1s)
[15:47:55.272] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 134 KB
[15:47:55.274] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:47:57.882] [app] APIClient: Quota plan=Operator unit=questions used=287.0 limit=500.0 allowed=true
[15:47:57.883] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:48:04.333] [app] AgentBridge stderr: [agent] Usage: cost=$0.03272, cacheWrite=1614, cacheRead=43185, total=44843
[15:48:04.333] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:48:06.386] [app] PostHog: Tracked event 'chat_agent_query_completed'
Run 7 — Best case (6.6s)
[15:49:44.856] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 134 KB
[15:49:44.857] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:49:45.571] [app] APIClient: Quota plan=Operator unit=questions used=289.0 limit=500.0 allowed=true
[15:49:45.572] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:49:50.953] [app] AgentBridge stderr: [agent] Usage: cost=$0.03329, cacheWrite=1616, cacheRead=44799, total=46449
[15:49:50.953] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:49:51.419] [app] PostHog: Tracked event 'chat_agent_query_completed'

Tested by ren (AI agent) for @beastoin — 2026-04-23
