## Summary
Floating bar chat takes 6.6-11.1s (avg 8.6s) to answer "what do you see on my screen?" queries, measured across 7 runs on Omi Beta v0.11.358 (production build, Mac Mini M4, ACP/Claude OAuth).
Sub-second is the UX target. The current architecture can reach 3-4s with quick wins; sub-second requires on-device vision.
## Test Setup
- App: Omi Beta v0.11.358 (production)
- Query: "what do you see on my screen?" × 7 runs via floating bar
- Provider: ACP (Claude OAuth) → claude-sonnet-4-6
- Machine: Mac Mini M4, 24GB RAM, macOS Tahoe
- System prompt: 27,387 chars (~36K tokens cached per request)
## Results
| Run | Total | Screenshot | Quota Check | Session | LLM API | Save/Sync | Cache |
|-----|-------|------------|-------------|---------|---------|-----------|-------|
| 1 | 9,520ms | 36ms | 1,629ms | 689ms | 6,600ms | 566ms | MISS |
| 2 | 8,986ms | 136ms | 1,215ms | 1ms | 5,917ms | 1,717ms | MISS |
| 3 | 7,918ms | 133ms | 1,885ms | 1ms | 5,242ms | 657ms | HIT |
| 4 | 7,724ms | 137ms | 1,762ms | 2ms | 5,324ms | 499ms | MISS |
| 5 | 8,077ms | 138ms | 1,794ms | 1ms | — | — | HIT |
| 6 | 11,114ms | 2ms | 2,608ms | 1ms | 6,450ms | 2,053ms | HIT |
| 7 | 6,563ms | 1ms | 714ms | 1ms | 5,381ms | 466ms | HIT |
| AVG | 8,557ms | 83ms | 1,658ms | 99ms | 4,987ms | 851ms | |
## Pipeline Waterfall (best case — Run 7, 6.6s)

```
T+0ms     Screenshot capture (CGDisplayCreateImage → WebP 134KB)
T+1ms     Query dispatched to ChatProvider.sendMessage()
T+1ms     AgentBridge.query() → await fetchChatUsageQuota()   ← BLOCKS HERE
T+715ms   Quota OK → JSON serialized (base64 image + 27K system prompt)
T+716ms   Sent to Node.js bridge via stdin pipe
T+716ms   ACP session reused (key=floating, pre-warmed)
T+716ms   → Claude Sonnet 4.6 API call starts                 ← BLOCKS HERE
T+6097ms  LLM response complete (5,381ms inference)
T+6097ms  Save to backend + Firebase sync + analytics
T+6563ms  DONE
```
## 3 Bottlenecks
### 1. LLM Inference — 5.2-6.6s (82% of best-case time)
The cached prompt is ~36K tokens; the 27,387-char system prompt breaks down as:
| Component | Size | Needed for floating bar? |
|-----------|------|--------------------------|
| base_template | 9,542 chars | Yes (persona, instructions) |
| schema | 12,280 chars (45 tables) | No — floating bar doesn't use SQL tools |
| context/memories | 2,619 chars | Partial |
| ai_profile | 2,162 chars | Yes |
| tasks | 601 chars | Maybe |
| goals | 183 chars | Maybe |
Plus a 1920×1080 screenshot (120-134 KB WebP → ~163 KB base64) for vision processing.
Prompt cache hit rate: 57% (4/7 runs). Cache misses add ~800ms to inference and cost 8× more ($0.24 vs $0.03 per query).
Code: the system prompt is built at `ChatProvider.swift:857` and cached in `cachedMainSystemPrompt`. The same prompt is used for main chat and the floating bar; there is no differentiation.
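Fix #3 below proposes a floating-bar-specific prompt. A minimal sketch of what that builder could look like, assuming the component breakdown logged above (the function and parameter names here are illustrative, not the shipped builder):

```swift
// Sketch: a floating-bar-specific prompt that drops the SQL schema,
// which the floating bar never uses. Sizes mirror the logged breakdown.
func buildFloatingBarPrompt(baseTemplate: String,   // 9,542 chars
                            aiProfile: String,      // 2,162 chars
                            memories: String,       // 2,619 chars
                            tasks: String,          //   601 chars
                            goals: String) -> String {  // 183 chars
    // schema (12,280 chars, 45 tables) intentionally omitted:
    // no SQL tools exist in this surface, so it is dead weight
    [baseTemplate, aiProfile, memories, tasks, goals]
        .filter { !$0.isEmpty }
        .joined(separator: "\n\n")
}
```

Dropping the schema alone removes ~45% of the prompt characters, which shrinks both inference latency and cache-write cost.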
### 2. Quota Check — 0.7-2.6s (11-23% of time)
A sequential, blocking `await` runs before any query is sent:

```swift
// AgentBridge.swift:422
if let quota = await APIClient.shared.fetchChatUsageQuota(), !quota.allowed {
    throw BridgeError.quotaExceeded(...)
}
```
Endpoint: `GET api.omi.me/v1/users/me/usage-quota`
It is not parallelized with screenshot capture, JSON serialization, or anything else: a pure serial wait.
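The serial wait above could be overlapped with capture using structured concurrency. A sketch under assumed names (`captureScreenshot()` and `query(_:image:)` are placeholders; `fetchChatUsageQuota()` is the real call at `AgentBridge.swift:422`):

```swift
// Sketch: run the quota check and screenshot capture concurrently with
// `async let`, so the ~0.7-2.6s network round-trip no longer serializes
// in front of the LLM call.
func sendFloatingBarQuery(_ text: String) async throws {
    async let quota = APIClient.shared.fetchChatUsageQuota()   // network, parallel
    async let image = ScreenCaptureManager.shared.captureScreenshot() // ~140ms

    let screenshot = await image
    if let q = await quota, !q.allowed {   // usually resolved by now
        throw BridgeError.quotaExceeded(q)
    }
    try await AgentBridge.shared.query(text, image: screenshot)
}
```

Fix #1 goes further: serve a cached quota result optimistically and refresh it in the background, so the check costs ~0ms on the hot path.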
### 3. Save/Sync — 0.5-2.1s (7-18% of time)
This runs after the LLM response is already rendered to the user:
- `POST api.omi.me/v2/desktop/messages` (response persistence)
- AgentSync push (Firebase)
- PostHog event tracking
- GoalsAI progress check

Code: `ChatProvider.swift:2523-2570`
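Since none of that work affects what the user sees, it can be detached from the response path. A sketch, where `render`, `saveMessage`, `push`, and `capture` stand in for the real calls at `ChatProvider.swift:2523-2570`:

```swift
// Sketch: finish the user-visible query as soon as the reply renders,
// and move persistence/analytics to a background task.
func finishResponse(_ message: ChatMessage) {
    render(message)  // user-visible completion: query is "done" here

    Task.detached(priority: .background) {
        try? await APIClient.shared.saveMessage(message)  // POST /v2/desktop/messages
        await AgentSync.shared.push(message)              // Firebase sync
        PostHog.shared.capture("chat_agent_query_completed")
    }
}
```

Failure handling (e.g. retrying a dropped save) would need a queue, but the latency win is the full 0.5-2.1s.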
## Current vs Optimal Pipeline

CURRENT (sequential):

```
Screenshot → Send → [WAIT quota 1.7s] → [WAIT LLM 5.7s] → [WAIT save 0.9s] → Done
```

8.6s avg

OPTIMAL (parallel + slim prompt):

```
Screenshot ──┐
Quota check ─┤ (parallel)
             └─► [LLM ~3-4s with slim prompt] → Done
                                                └─► Save (background)
```

3-4s estimated
## Proposed Fixes
| # | Change | Savings | Effort | Code Location |
|---|--------|---------|--------|---------------|
| 1 | Optimistic/cached quota check | 1.0-1.8s | Low | `AgentBridge.swift:422` |
| 2 | Background save/sync | 0.5-1.5s | Low | `ChatProvider.swift:2523-2570` |
| 3 | Slim floating bar prompt (drop schema, skills) | 1.0-2.0s | Medium | `ChatProvider.swift:857` |
| 4 | Half-resolution screenshot (960×540) | 0.5-1.0s | Low | `ScreenCaptureManager.swift` |
| Total | | 3.0-6.3s | | |
Estimated new time: ~2-4s (down from 6.6-11.1s)
Sub-second requires on-device vision (Apple Vision + CoreML) or pre-computed responses; it is not achievable with a cloud LLM round-trip.
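For fix #4, a half-resolution capture could be done in CoreGraphics before WebP encoding, quartering the pixel count the vision model ingests. A sketch using `CGDisplayCreateImage`, the capture call named in the waterfall above (the function itself is hypothetical, not the shipped `ScreenCaptureManager` code):

```swift
import CoreGraphics

// Sketch: capture the display, then redraw it into a half-size bitmap
// context (1920×1080 → 960×540) before handing it to the WebP encoder.
func captureHalfResolution(_ display: CGDirectDisplayID) -> CGImage? {
    guard let full = CGDisplayCreateImage(display) else { return nil }
    let w = full.width / 2
    let h = full.height / 2
    guard let ctx = CGContext(data: nil, width: w, height: h,
                              bitsPerComponent: 8, bytesPerRow: 0,
                              space: CGColorSpaceCreateDeviceRGB(),
                              bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
    else { return nil }
    ctx.interpolationQuality = .high   // smooth downscale, keeps text legible
    ctx.draw(full, in: CGRect(x: 0, y: 0, width: w, height: h))
    return ctx.makeImage()
}
```

Whether 960×540 still resolves small UI text for "what do you see?" queries should be verified by eye before shipping.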
## Token Usage & Cost
| Run | Total Tokens | CacheRead | CacheWrite | Cost |
|-----|--------------|-----------|------------|------|
| 1 | 36,759 | 0 | 36,692 | $0.231 |
| 2 | 38,388 | 0 | 38,331 | $0.241 |
| 3 | 40,004 | 38,331 | 1,629 | $0.030 |
| 4 | 41,613 | 0 | 41,576 | $0.261 |
| 5 | 43,227 | 41,576 | 1,609 | $0.032 |
| 6 | 44,843 | 43,185 | 1,614 | $0.033 |
| 7 | 46,449 | 44,799 | 1,616 | $0.033 |
Cache miss: ~$0.24/query. Cache hit: ~$0.03/query (8× cheaper).
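The 8× gap follows from prompt-caching rates. A back-of-envelope sketch using Claude Sonnet's published caching prices (cache write $3.75/MTok, cache read $0.30/MTok; verify against current Anthropic pricing, these rates are an assumption here):

```swift
// Sketch: prompt-side cost of a cache miss vs a cache hit,
// using the token counts from Runs 2 and 3 above.
let writePerTok = 3.75 / 1_000_000   // $/token, cache write
let readPerTok  = 0.30 / 1_000_000   // $/token, cache read

// Run 2 (miss): the whole 38,331-token prefix is re-written to cache.
let missPromptCost = 38_331.0 * writePerTok                           // ≈ $0.144

// Run 3 (hit): same prefix read back, only 1,629 new tokens written.
let hitPromptCost = 38_331.0 * readPerTok + 1_629.0 * writePerTok     // ≈ $0.018
```

That is ~8× on the prompt side alone; output tokens add a similar amount to both cases, landing near the logged $0.24 vs $0.03 totals.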
## Raw Logs
### Run 1 — Cold start (9.5s)

```
[15:31:32.827] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 120 KB
[15:31:32.863] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:31:32.864] [app] ChatProvider loaded 50 memories from local DB
[15:31:32.865] [app] ChatProvider loaded 4 goals from local DB
[15:31:32.866] [app] ChatProvider loaded 8 tasks for context
[15:31:32.867] [app] ChatProvider loaded AI profile (generated 2026-04-23 12:59:33 +0000)
[15:31:32.868] [app] ChatProvider loaded schema for 45 tables
[15:31:32.869] [app] AgentBridge: starting with node=...node (exists=true), bridge=...index.js (exists=true)
[15:31:32.920] [app] AgentBridge stderr: [agent] Bridge main() starting (pid=55280, node=v22.14.0)
[15:31:32.920] [app] AgentBridge stderr: [agent] Harness mode: acp
[15:31:32.922] [app] AgentBridge: bridge ready (sessionId=)
[15:31:32.926] [app] ChatProvider: prompt built — schema: yes, goals: 4, tasks: 8, ai_profile: yes, memories: 50, history: none, prompt_length: 27387 chars
[15:31:32.927] [app] ChatProvider: prompt breakdown — base_template:9542c, context:2619c, goals:183c, tasks:601c, ai_profile:2162c, schema:12280c
[15:31:32.928] [app] AgentBridge stderr: [agent] Warmup requested (cwd=default, sessions=main, floating)
[15:31:33.011] [app] AgentBridge stderr: [agent] ACP initialized
[15:31:34.492] [app] APIClient: Quota plan=Operator unit=questions used=277.0 limit=500.0 allowed=true
[15:31:34.494] [app] AgentBridge stderr: [agent] Query mode: act
[15:31:35.181] [app] AgentBridge stderr: [agent] Pre-warmed session: d8343433-... (key=floating, model=claude-sonnet-4-6)
[15:31:35.212] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:31:41.781] [app] AgentBridge stderr: [agent] Usage: model=claude-sonnet-4-6, cost=$0.23094, cacheWrite=36692, cacheRead=0, total=36759
[15:31:41.781] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:31:42.347] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 2 — Warm bridge, cache miss (9.0s)

```
[15:36:55.356] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 122 KB
[15:36:55.492] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:36:56.707] [app] APIClient: Quota plan=Operator unit=questions used=279.0 limit=500.0 allowed=true
[15:36:56.708] [app] AgentBridge stderr: [agent] Query mode: act
[15:36:56.708] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:37:02.624] [app] AgentBridge stderr: [agent] Usage: cost=$0.24093, cacheWrite=38331, cacheRead=0, total=38388
[15:37:02.625] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:37:04.342] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 3 — Cache hit (7.9s)

```
[15:38:05.801] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 124 KB
[15:38:05.934] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:38:07.819] [app] APIClient: Quota plan=Operator unit=questions used=281.0 limit=500.0 allowed=true
[15:38:07.820] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:38:13.062] [app] AgentBridge stderr: [agent] Usage: cost=$0.03039, cacheWrite=1629, cacheRead=38331, total=40004
[15:38:13.062] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:38:13.719] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 4 — Cache miss (7.7s)

```
[15:43:51.787] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 130 KB
[15:43:51.924] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:43:53.686] [app] APIClient: Quota plan=Operator unit=questions used=283.0 limit=500.0 allowed=true
[15:43:53.688] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:43:59.012] [app] AgentBridge stderr: [agent] Usage: cost=$0.26072, cacheWrite=41576, cacheRead=0, total=41613
[15:43:59.012] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:43:59.511] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 5 — Cache hit (8.1s)

```
[15:45:00.514] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 130 KB
[15:45:00.652] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:45:02.446] [app] APIClient: Quota plan=Operator unit=questions used=285.0 limit=500.0 allowed=true
[15:45:02.447] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:45:07.212] [app] AgentBridge stderr: [agent] Usage: cost=$0.03183, cacheWrite=1609, cacheRead=41576, total=43227
[15:45:08.591] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 6 — Worst case (11.1s)

```
[15:47:55.272] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 134 KB
[15:47:55.274] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:47:57.882] [app] APIClient: Quota plan=Operator unit=questions used=287.0 limit=500.0 allowed=true
[15:47:57.883] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:48:04.333] [app] AgentBridge stderr: [agent] Usage: cost=$0.03272, cacheWrite=1616, cacheRead=43185, total=44843
[15:48:04.333] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:48:06.386] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 7 — Best case (6.6s)

```
[15:49:44.856] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 134 KB
[15:49:44.857] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:49:45.571] [app] APIClient: Quota plan=Operator unit=questions used=289.0 limit=500.0 allowed=true
[15:49:45.572] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:49:50.953] [app] AgentBridge stderr: [agent] Usage: cost=$0.03329, cacheWrite=1616, cacheRead=44799, total=46449
[15:49:50.953] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:49:51.419] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
Tested by ren (AI agent) for @beastoin — 2026-04-23