## Summary
Floating bar chat takes 6.6-11.1s (avg 8.6s) to answer "what do you see on my screen?" queries, measured across 7 runs on Omi Beta v0.11.358 (production build, Mac Mini M4, ACP/Claude OAuth).
Sub-second is the UX target. The current architecture can reach 3-4s with quick wins; sub-second requires on-device vision.
## Test Setup
- App: Omi Beta v0.11.358 (production)
- Query: "what do you see on my screen?" × 7 runs via floating bar
- Provider: ACP (Claude OAuth) → claude-sonnet-4-6
- Machine: Mac Mini M4, 24GB RAM, macOS Tahoe
- System prompt: 27,387 chars (~36K tokens cached per request)
## Results
| Run | Total | Screenshot | Quota Check | Session | LLM API | Save/Sync | Cache |
|-----|-------|------------|-------------|---------|---------|-----------|-------|
| 1 | 9,520ms | 36ms | 1,629ms | 689ms | 6,600ms | 566ms | MISS |
| 2 | 8,986ms | 136ms | 1,215ms | 1ms | 5,917ms | 1,717ms | MISS |
| 3 | 7,918ms | 133ms | 1,885ms | 1ms | 5,242ms | 657ms | HIT |
| 4 | 7,724ms | 137ms | 1,762ms | 2ms | 5,324ms | 499ms | MISS |
| 5 | 8,077ms | 138ms | 1,794ms | 1ms | — | — | HIT |
| 6 | 11,114ms | 2ms | 2,608ms | 1ms | 6,450ms | 2,053ms | HIT |
| 7 | 6,563ms | 1ms | 714ms | 1ms | 5,381ms | 466ms | HIT |
| AVG | 8,557ms | 83ms | 1,658ms | 99ms | 4,987ms | 851ms | |
## Pipeline Waterfall (best case — Run 7, 6.6s)

```
T+0ms     Screenshot capture (CGDisplayCreateImage → WebP 134KB)
T+1ms     Query dispatched to ChatProvider.sendMessage()
T+1ms     AgentBridge.query() → await fetchChatUsageQuota()   ← BLOCKS HERE
T+715ms   Quota OK → JSON serialized (base64 image + 27K system prompt)
T+716ms   Sent to Node.js bridge via stdin pipe
T+716ms   ACP session reused (key=floating, pre-warmed)
T+716ms   → Claude Sonnet 4.6 API call starts                 ← BLOCKS HERE
T+6097ms  LLM response complete (5,381ms inference)
T+6097ms  Save to backend + Firebase sync + analytics
T+6563ms  DONE
```
## 3 Bottlenecks
### 1. LLM Inference — 5.2-6.6s (82% of best-case time)
The cached prompt is ~36K tokens; the 27,387-char system prompt breaks down as:
| Component | Size | Needed for floating bar? |
|-----------|------|--------------------------|
| base_template | 9,542 chars | Yes (persona, instructions) |
| schema | 12,280 chars (45 tables) | No — floating bar doesn't use SQL tools |
| context/memories | 2,619 chars | Partial |
| ai_profile | 2,162 chars | Yes |
| tasks | 601 chars | Maybe |
| goals | 183 chars | Maybe |
Plus a 1920×1080 screenshot (120-134 KB WebP → ~163 KB base64) for vision processing.
Prompt cache hit rate: 57% (4/7 runs). Cache misses add ~800ms to inference and cost 8× more ($0.24 vs $0.03 per query).
Code: the system prompt is built at `ChatProvider.swift:857` and cached in `cachedMainSystemPrompt`. The same prompt is used for main chat and the floating bar; there is no differentiation.
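Fix #3 below proposes a floating-bar-specific prompt. A minimal sketch of what that builder could look like, assuming the component breakdown logged above (the function and parameter names here are illustrative, not the shipped builder):

```swift
// Sketch: a floating-bar-specific prompt that drops the SQL schema,
// which the floating bar never uses. Sizes mirror the logged breakdown.
func buildFloatingBarPrompt(baseTemplate: String,   // 9,542 chars
                            aiProfile: String,      // 2,162 chars
                            memories: String,       // 2,619 chars
                            tasks: String,          //   601 chars
                            goals: String) -> String {  // 183 chars
    // schema (12,280 chars, 45 tables) intentionally omitted:
    // no SQL tools exist in this surface, so it is dead weight
    [baseTemplate, aiProfile, memories, tasks, goals]
        .filter { !$0.isEmpty }
        .joined(separator: "\n\n")
}
```

Dropping the schema alone removes ~45% of the prompt characters, which shrinks both inference latency and cache-write cost.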
### 2. Quota Check — 0.7-2.6s (11-23% of time)
A sequential, blocking `await` runs before any query is sent:

```swift
// AgentBridge.swift:422
if let quota = await APIClient.shared.fetchChatUsageQuota(), !quota.allowed {
    throw BridgeError.quotaExceeded(...)
}
```
Endpoint: `GET api.omi.me/v1/users/me/usage-quota`
It is not parallelized with screenshot capture, JSON serialization, or anything else: a pure serial wait.
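The serial wait above could be overlapped with capture using structured concurrency. A sketch under assumed names (`captureScreenshot()` and `query(_:image:)` are placeholders; `fetchChatUsageQuota()` is the real call at `AgentBridge.swift:422`):

```swift
// Sketch: run the quota check and screenshot capture concurrently with
// `async let`, so the ~0.7-2.6s network round-trip no longer serializes
// in front of the LLM call.
func sendFloatingBarQuery(_ text: String) async throws {
    async let quota = APIClient.shared.fetchChatUsageQuota()   // network, parallel
    async let image = ScreenCaptureManager.shared.captureScreenshot() // ~140ms

    let screenshot = await image
    if let q = await quota, !q.allowed {   // usually resolved by now
        throw BridgeError.quotaExceeded(q)
    }
    try await AgentBridge.shared.query(text, image: screenshot)
}
```

Fix #1 goes further: serve a cached quota result optimistically and refresh it in the background, so the check costs ~0ms on the hot path.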
### 3. Save/Sync — 0.5-2.1s (7-18% of time)
This runs after the LLM response is already rendered to the user:
- `POST api.omi.me/v2/desktop/messages` (response persistence)
- AgentSync push (Firebase)
- PostHog event tracking
- GoalsAI progress check

Code: `ChatProvider.swift:2523-2570`
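Since none of that work affects what the user sees, it can be detached from the response path. A sketch, where `render`, `saveMessage`, `push`, and `capture` stand in for the real calls at `ChatProvider.swift:2523-2570`:

```swift
// Sketch: finish the user-visible query as soon as the reply renders,
// and move persistence/analytics to a background task.
func finishResponse(_ message: ChatMessage) {
    render(message)  // user-visible completion: query is "done" here

    Task.detached(priority: .background) {
        try? await APIClient.shared.saveMessage(message)  // POST /v2/desktop/messages
        await AgentSync.shared.push(message)              // Firebase sync
        PostHog.shared.capture("chat_agent_query_completed")
    }
}
```

Failure handling (e.g. retrying a dropped save) would need a queue, but the latency win is the full 0.5-2.1s.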
## Current vs Optimal Pipeline

CURRENT (sequential):

```
Screenshot → Send → [WAIT quota 1.7s] → [WAIT LLM 5.7s] → [WAIT save 0.9s] → Done
```

8.6s avg

OPTIMAL (parallel + slim prompt):

```
Screenshot ──┐
Quota check ─┤ (parallel)
             └─► [LLM ~3-4s with slim prompt] → Done
                                                └─► Save (background)
```

3-4s estimated
## Proposed Fixes
| # | Change | Savings | Effort | Code Location |
|---|--------|---------|--------|---------------|
| 1 | Optimistic/cached quota check | 1.0-1.8s | Low | `AgentBridge.swift:422` |
| 2 | Background save/sync | 0.5-1.5s | Low | `ChatProvider.swift:2523-2570` |
| 3 | Slim floating bar prompt (drop schema, skills) | 1.0-2.0s | Medium | `ChatProvider.swift:857` |
| 4 | Half-resolution screenshot (960×540) | 0.5-1.0s | Low | `ScreenCaptureManager.swift` |
| Total | | 3.0-6.3s | | |
Estimated new time: ~2-4s (down from 6.6-11.1s)
Sub-second requires on-device vision (Apple Vision + CoreML) or pre-computed responses; it is not achievable with a cloud LLM round-trip.
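For fix #4, a half-resolution capture could be done in CoreGraphics before WebP encoding, quartering the pixel count the vision model ingests. A sketch using `CGDisplayCreateImage`, the capture call named in the waterfall above (the function itself is hypothetical, not the shipped `ScreenCaptureManager` code):

```swift
import CoreGraphics

// Sketch: capture the display, then redraw it into a half-size bitmap
// context (1920×1080 → 960×540) before handing it to the WebP encoder.
func captureHalfResolution(_ display: CGDirectDisplayID) -> CGImage? {
    guard let full = CGDisplayCreateImage(display) else { return nil }
    let w = full.width / 2
    let h = full.height / 2
    guard let ctx = CGContext(data: nil, width: w, height: h,
                              bitsPerComponent: 8, bytesPerRow: 0,
                              space: CGColorSpaceCreateDeviceRGB(),
                              bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
    else { return nil }
    ctx.interpolationQuality = .high   // smooth downscale, keeps text legible
    ctx.draw(full, in: CGRect(x: 0, y: 0, width: w, height: h))
    return ctx.makeImage()
}
```

Whether 960×540 still resolves small UI text for "what do you see?" queries should be verified by eye before shipping.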
## Token Usage & Cost
| Run | Total Tokens | CacheRead | CacheWrite | Cost |
|-----|--------------|-----------|------------|------|
| 1 | 36,759 | 0 | 36,692 | $0.231 |
| 2 | 38,388 | 0 | 38,331 | $0.241 |
| 3 | 40,004 | 38,331 | 1,629 | $0.030 |
| 4 | 41,613 | 0 | 41,576 | $0.261 |
| 5 | 43,227 | 41,576 | 1,609 | $0.032 |
| 6 | 44,843 | 43,185 | 1,614 | $0.033 |
| 7 | 46,449 | 44,799 | 1,616 | $0.033 |
Cache miss: ~$0.24/query. Cache hit: ~$0.03/query (8× cheaper).
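The 8× gap follows from prompt-caching rates. A back-of-envelope sketch using Claude Sonnet's published caching prices (cache write $3.75/MTok, cache read $0.30/MTok; verify against current Anthropic pricing, these rates are an assumption here):

```swift
// Sketch: prompt-side cost of a cache miss vs a cache hit,
// using the token counts from Runs 2 and 3 above.
let writePerTok = 3.75 / 1_000_000   // $/token, cache write
let readPerTok  = 0.30 / 1_000_000   // $/token, cache read

// Run 2 (miss): the whole 38,331-token prefix is re-written to cache.
let missPromptCost = 38_331.0 * writePerTok                           // ≈ $0.144

// Run 3 (hit): same prefix read back, only 1,629 new tokens written.
let hitPromptCost = 38_331.0 * readPerTok + 1_629.0 * writePerTok     // ≈ $0.018
```

That is ~8× on the prompt side alone; output tokens add a similar amount to both cases, landing near the logged $0.24 vs $0.03 totals.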
## Raw Logs
### Run 1 — Cold start (9.5s)

```
[15:31:32.827] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 120 KB
[15:31:32.863] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:31:32.864] [app] ChatProvider loaded 50 memories from local DB
[15:31:32.865] [app] ChatProvider loaded 4 goals from local DB
[15:31:32.866] [app] ChatProvider loaded 8 tasks for context
[15:31:32.867] [app] ChatProvider loaded AI profile (generated 2026-04-23 12:59:33 +0000)
[15:31:32.868] [app] ChatProvider loaded schema for 45 tables
[15:31:32.869] [app] AgentBridge: starting with node=...node (exists=true), bridge=...index.js (exists=true)
[15:31:32.920] [app] AgentBridge stderr: [agent] Bridge main() starting (pid=55280, node=v22.14.0)
[15:31:32.920] [app] AgentBridge stderr: [agent] Harness mode: acp
[15:31:32.922] [app] AgentBridge: bridge ready (sessionId=)
[15:31:32.926] [app] ChatProvider: prompt built — schema: yes, goals: 4, tasks: 8, ai_profile: yes, memories: 50, history: none, prompt_length: 27387 chars
[15:31:32.927] [app] ChatProvider: prompt breakdown — base_template:9542c, context:2619c, goals:183c, tasks:601c, ai_profile:2162c, schema:12280c
[15:31:32.928] [app] AgentBridge stderr: [agent] Warmup requested (cwd=default, sessions=main, floating)
[15:31:33.011] [app] AgentBridge stderr: [agent] ACP initialized
[15:31:34.492] [app] APIClient: Quota plan=Operator unit=questions used=277.0 limit=500.0 allowed=true
[15:31:34.494] [app] AgentBridge stderr: [agent] Query mode: act
[15:31:35.181] [app] AgentBridge stderr: [agent] Pre-warmed session: d8343433-... (key=floating, model=claude-sonnet-4-6)
[15:31:35.212] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:31:41.781] [app] AgentBridge stderr: [agent] Usage: model=claude-sonnet-4-6, cost=$0.23094, cacheWrite=36692, cacheRead=0, total=36759
[15:31:41.781] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:31:42.347] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 2 — Warm bridge, cache miss (9.0s)

```
[15:36:55.356] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 122 KB
[15:36:55.492] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:36:56.707] [app] APIClient: Quota plan=Operator unit=questions used=279.0 limit=500.0 allowed=true
[15:36:56.708] [app] AgentBridge stderr: [agent] Query mode: act
[15:36:56.708] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:37:02.624] [app] AgentBridge stderr: [agent] Usage: cost=$0.24093, cacheWrite=38331, cacheRead=0, total=38388
[15:37:02.625] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:37:04.342] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 3 — Cache hit (7.9s)

```
[15:38:05.801] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 124 KB
[15:38:05.934] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:38:07.819] [app] APIClient: Quota plan=Operator unit=questions used=281.0 limit=500.0 allowed=true
[15:38:07.820] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:38:13.062] [app] AgentBridge stderr: [agent] Usage: cost=$0.03039, cacheWrite=1629, cacheRead=38331, total=40004
[15:38:13.062] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:38:13.719] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 4 — Cache miss (7.7s)

```
[15:43:51.787] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 130 KB
[15:43:51.924] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:43:53.686] [app] APIClient: Quota plan=Operator unit=questions used=283.0 limit=500.0 allowed=true
[15:43:53.688] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:43:59.012] [app] AgentBridge stderr: [agent] Usage: cost=$0.26072, cacheWrite=41576, cacheRead=0, total=41613
[15:43:59.012] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:43:59.511] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 5 — Cache hit (8.1s)

```
[15:45:00.514] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 130 KB
[15:45:00.652] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:45:02.446] [app] APIClient: Quota plan=Operator unit=questions used=285.0 limit=500.0 allowed=true
[15:45:02.447] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:45:07.212] [app] AgentBridge stderr: [agent] Usage: cost=$0.03183, cacheWrite=1609, cacheRead=41576, total=43227
[15:45:08.591] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 6 — Worst case (11.1s)

```
[15:47:55.272] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 134 KB
[15:47:55.274] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:47:57.882] [app] APIClient: Quota plan=Operator unit=questions used=287.0 limit=500.0 allowed=true
[15:47:57.883] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:48:04.333] [app] AgentBridge stderr: [agent] Usage: cost=$0.03272, cacheWrite=1616, cacheRead=43185, total=44843
[15:48:04.333] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:48:06.386] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
### Run 7 — Best case (6.6s)

```
[15:49:44.856] [app] ScreenCaptureManager: Screenshot captured 1920x1080, WebP 134 KB
[15:49:44.857] [app] PostHog: Tracked event 'floating_bar_query_sent'
[15:49:45.571] [app] APIClient: Quota plan=Operator unit=questions used=289.0 limit=500.0 allowed=true
[15:49:45.572] [app] AgentBridge stderr: [agent] Reusing existing ACP session: d8343433-... (key=floating)
[15:49:50.953] [app] AgentBridge stderr: [agent] Usage: cost=$0.03329, cacheWrite=1616, cacheRead=44799, total=46449
[15:49:50.953] [app] AgentBridge stderr: [agent] Prompt completed: stopReason=end_turn
[15:49:51.419] [app] PostHog: Tracked event 'chat_agent_query_completed'
```
Tested by ren (AI agent) for @beastoin — 2026-04-23