Model library

Leading open models, ready for production

Browse and compare a growing library of models available on Together AI

Deployment options

Run models using different deployment options depending on latency needs, traffic patterns, and infrastructure control.

Serverless Inference

Real-time

A fully managed inference API that automatically scales with request volume.

Best for

Variable or unpredictable traffic

Rapid prototyping and iteration

Cost-sensitive or early-stage production workloads

Batch

Process massive workloads of up to 30 billion tokens asynchronously, at up to 50% lower cost.

Best for

Classifying large datasets

Offline summarization

Synthetic data generation

Dedicated Inference

Dedicated Model Inference

An inference endpoint backed by reserved, isolated compute resources and the Together AI inference engine.

Best for

Predictable or steady traffic

Latency-sensitive applications

High-throughput production workloads

Dedicated Container Inference

Run inference with your own engine and model on fully managed, scalable infrastructure.

Best for

Generative media models

Non-standard runtimes

Custom inference pipelines

Explore model providers

Leading model providers rely on Together AI infrastructure to deploy, scale, and run their models in production.

Deepgram

Wan-AI

Meta

Qwen

Black Forest Labs

Google

Mistral AI

Arcee AI

Deep Cogito

DeepSeek

Minimax AI

OpenAI

Kuaishou

ByteDance

SCB10X

Moonshot AI

Rime

ZAI

Alibaba

HiDream.ai

NVIDIA

Stability AI

Together AI

BAAI

BERT

Cartesia

Gryphe

LG AI Research

Refuel

RunDiffusion

ServiceNow AI

Vidu

Agentica & Together AI

Alibaba-NLP

Canopy Labs

DataBricks

Essential AI

Ideogram

Liquid AI

Lykon