## Idea
The InfluxDB Line Protocol HTTP handler (`POST /api/v1/ts/{db}/write`) currently calls `engine.appendSamples()` once per sample in the incoming batch. Each call creates its own internal nested transaction, so a batch of N samples produces N transaction begin/commit cycles executed sequentially.
## Problem
For a typical batch of 5,000 samples this means 5,000 sequential TX cycles on the server side, even though the data could be grouped by target shard and written far more efficiently.
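The transaction-count arithmetic can be stated as a toy model (5,000 and 32 are the figures used throughout this proposal; the two functions are illustrative, not engine code):

```java
public class TxCycles {
    // Today: the handler opens one nested transaction per sample.
    static int currentTxCycles(int samples) {
        return samples;
    }

    // After grouping: at most one transaction per active shard.
    static int batchedTxCycles(int samples, int shards) {
        return Math.min(samples, shards);
    }
}
```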
## Proposed optimisation
- Group by shard up front — before writing anything, assign each sample to its target shard via the existing round-robin counter and collect the sample indices per shard.
- One transaction per shard — instead of N transactions, open one nested transaction per shard and write all of that shard's samples in a single commit. For 32 shards and 5,000 samples this reduces TX cycles from 5,000 to at most 32 (one per active shard).
- Parallel shard writes — dispatch each shard's write to the existing `shardExecutor` thread pool so all active shards write concurrently. The existing per-shard `appendLock` already guarantees that concurrent HTTP requests to the same shard are serialised without MVCC conflicts.
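The three steps above can be sketched as follows. This is a simplified model, not the actual engine: the shard count, the `writeShard` stand-in (which would wrap one nested transaction), and the return value are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchWriteSketch {
    static final int SHARD_COUNT = 32;                       // assumed shard count
    static final ExecutorService shardExecutor =
            Executors.newFixedThreadPool(SHARD_COUNT);       // stand-in for the pool
    static final AtomicInteger roundRobin = new AtomicInteger();

    // Stand-in for writing all of a shard's samples inside ONE nested
    // transaction: begin TX, append every sample in sampleIndices, commit.
    static void writeShard(int shard, List<Integer> sampleIndices, long[] ts) {
    }

    // Returns the number of per-shard transactions issued (illustrative).
    static int appendBatch(long[] allTimestamps) {
        // 1. Group sample indices by target shard via the round-robin counter.
        List<List<Integer>> perShard = new ArrayList<>();
        for (int s = 0; s < SHARD_COUNT; s++) perShard.add(new ArrayList<>());
        for (int i = 0; i < allTimestamps.length; i++) {
            int shard = Math.floorMod(roundRobin.getAndIncrement(), SHARD_COUNT);
            perShard.get(shard).add(i);
        }
        // 2+3. One task (and one transaction) per active shard, run in parallel.
        List<Future<?>> pending = new ArrayList<>();
        for (int s = 0; s < SHARD_COUNT; s++) {
            if (perShard.get(s).isEmpty()) continue;
            final int shard = s;
            pending.add(shardExecutor.submit(
                    () -> writeShard(shard, perShard.get(shard), allTimestamps)));
        }
        for (Future<?> f : pending) {
            try {
                f.get();                                     // surface write errors
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        }
        return pending.size();
    }
}
```

With 5,000 samples round-robined over 32 shards, every shard is active, so exactly 32 transactions are issued instead of 5,000.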
The change lives in two places:
- A new `appendBatch(long[] allTimestamps, Object[][] allColumnValues)` method on `TimeSeriesEngine` that implements the grouping and parallel dispatch.
- The HTTP handler groups samples by measurement type and calls `appendBatch` once per measurement instead of once per sample.
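The handler-side change could look like the sketch below. The `Sample` type, its fields, and the `Engine` interface are hypothetical stand-ins for the parsed line-protocol data; only the `appendBatch` signature comes from the proposal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HandlerSketch {
    // Minimal stand-in for one parsed line-protocol sample.
    static final class Sample {
        final String measurement;
        final long timestamp;
        final Object[] columnValues;
        Sample(String measurement, long timestamp, Object[] columnValues) {
            this.measurement = measurement;
            this.timestamp = timestamp;
            this.columnValues = columnValues;
        }
    }

    // Hypothetical engine hook matching the proposed signature.
    interface Engine {
        void appendBatch(long[] allTimestamps, Object[][] allColumnValues);
    }

    // Group the incoming batch by measurement and issue one appendBatch
    // call per measurement group. Returns the number of calls issued.
    static int write(Engine engine, List<Sample> batch) {
        Map<String, List<Sample>> byMeasurement = new LinkedHashMap<>();
        for (Sample s : batch) {
            byMeasurement.computeIfAbsent(s.measurement, k -> new ArrayList<>()).add(s);
        }
        for (List<Sample> group : byMeasurement.values()) {
            long[] ts = new long[group.size()];
            Object[][] cols = new Object[group.size()][];
            for (int i = 0; i < group.size(); i++) {
                ts[i] = group.get(i).timestamp;
                cols[i] = group.get(i).columnValues;
            }
            engine.appendBatch(ts, cols);    // one call per measurement, not per sample
        }
        return byMeasurement.size();
    }
}
```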
## Expected impact
Benchmarks on a MacBook M5 Pro (8 GB JVM heap, 8 Python workers, batch size 5,000, localhost):
| Scenario | Avg throughput |
| --- | --- |
| Current (1 TX per sample) | ~60,000 metrics/sec |
| Batched (1 TX per shard, parallel) | ~433,000 metrics/sec |
That is a ~7× improvement with the same client configuration. With batch size 20,000, throughput reaches ~460,000 metrics/sec. No client-side changes are needed; the optimisation is fully transparent to callers of the HTTP endpoint.