Cohere
History
Founding and Early Years
Cohere was founded in 2019 in Toronto, Canada, by Aidan Gomez, Nick Frosst, and Ivan Zhang, all of whom are alumni of the University of Toronto and former researchers at Google Brain.[6][10] The company's initial focus centered on developing large language models tailored for enterprise applications, prioritizing natural language processing capabilities such as text generation, classification, and semantic search to address business-specific needs like security, privacy, and customization.[2][6] Aidan Gomez serves as CEO and is recognized as a co-inventor of the Transformer architecture for co-authoring the 2017 Google Brain paper "Attention Is All You Need," which introduced the architecture that underpins subsequent advancements in AI models.[11] Nick Frosst and Ivan Zhang, who hold roles as co-founders, contributed expertise in AI research from their Google tenure, with the trio motivated by the potential of scalable intelligence to enhance human productivity without the consumer-oriented hype of general-purpose chatbots.[12][13] In its formative period from 2019 to 2021, Cohere operated leanly from Toronto, refining proprietary models through internal research before public release. The company secured its Series A funding in September 2021, enabling infrastructure scaling.[6] This preceded the November 2021 launch of its core API, which provided developers access to foundational models and saw an 800% usage surge by the following funding round, signaling early market validation for enterprise-grade AI tools.[6]
Key Milestones and Expansion
Building on its founders' research backgrounds at Google Brain and the Vector Institute, Cohere concentrated its early work on large language models for enterprise applications.[2][6] A Series A funding round in September 2021 raised $40 million, led by Index Ventures, to support model scaling and team growth.[14] Two months later, in November 2021, the company publicly launched its API, giving developers access to its natural language processing models such as Generate and Embed and facilitating early adoption in search and classification tasks.[6] Subsequent milestones included the April 2022 Series B round of $125 million, which accelerated product development, and the 2022 launch of Cohere Labs, an open science initiative fostering community-driven research with over 4,500 members and more than 100 published papers.[2] A Series C round in April 2023 raised $270 million, enabling further infrastructure investments.[10] By July 2024, Cohere completed a $500 million Series D at a $5.5 billion valuation, funding advancements in secure, enterprise-grade AI.[10] In August 2025, the company released its North platform in general availability, a workspace tool for agentic AI workflows designed to handle sensitive data securely.[15] Cohere's expansion has emphasized global operations and talent acquisition, starting with offices in Toronto (headquarters), San Francisco, New York, and London.[2] In July 2025, it opened a Montreal office in partnership with Mila to leverage local AI expertise, followed by a Seoul hub targeting Asia-Pacific markets and a Paris office in September 2025 as its EMEA base.[16][17][18] These moves supported team scaling amid fresh funding: a September 2025 addition of $100 million to its latest round pushed the valuation to $7 billion and prioritized sovereign AI deployments.[19][20]
Recent Developments
In 2024, Cohere released Command R+ in April, a 104-billion-parameter model optimized for retrieval-augmented generation (RAG) tasks in enterprise settings.[10] Later that year, on October 24, the company launched two open-weight models under its Aya initiative to enhance performance in non-English languages, addressing gaps in multilingual AI capabilities.[21] In November, updated versions of Command R and R+ (08-2024) became available on Oracle Cloud Infrastructure, improving capabilities for enterprise deployments.[22] Early 2025 saw further advancements, including the March 3 release of Aya Vision, a multimodal model supporting non-commercial vision-language tasks,[23] and the debut of Command A (03-2025), the company's most performant chat model to date, with superior throughput compared to prior versions.[25] In August, Cohere introduced Command A Vision, an efficient vision model runnable on two GPUs that outperforms larger models on enterprise document analysis such as graphs and PDFs.[24] On the business front, Cohere secured a $500 million Series D round in July 2024, valuing the company at $5.5 billion and funding global expansion and secure AI development.[26] In September 2025, it added $100 million in a second close to that round, pushing the valuation to $7 billion and supporting security-focused enterprise AI scaling.[20] That same month, Cohere expanded its collaboration with AMD to deploy Instinct GPUs for enterprise and sovereign AI infrastructure.[27] The company also announced plans for new offices in South Korea and Montréal, alongside C-suite hires from Uber and Meta to bolster operations.[28] In October 2025, Cohere launched its Partner Program to accelerate enterprise AI adoption through ecosystem collaborations.[29] CEO Aidan Gomez indicated preparations for an initial public offering "soon," reflecting a maturing market position amid annualized revenue growth to approximately $35 million by early 2025.[30][10] However, the departure of 
former VP of AI Research Sara Hooker, who founded a startup critiquing scaling-heavy approaches, highlighted internal debates over AI development strategy.[31] In 2025, Cohere achieved approximately $240 million in annual recurring revenue (ARR), surpassing its $200 million target and underscoring strong demand for its enterprise-focused AI solutions. In March 2026, Cohere entered the speech recognition domain with the launch of Cohere Transcribe, an open-source ASR model that claimed the top spot on the Hugging Face Open ASR Leaderboard with a 5.42% average WER, outperforming leading alternatives and marking the company's expansion into voice AI technologies.[32][33]
Technology and Products
Core Technologies
Cohere's core technologies revolve around proprietary large language models (LLMs) tailored for enterprise-scale applications, focusing on generation, semantic understanding, and retrieval optimization. These include the Command family of generative models, which enable tasks such as reasoning, translation, and retrieval-augmented generation (RAG), with support for up to 256,000 input tokens and 8,000 output tokens in advanced variants like Command A.[3][34] Multimodal extensions, such as Command A Vision, process both text and images to support vision-language tasks.[3]
Complementing generation, Cohere's embedding models, exemplified by Embed v4.0, convert text and images into dense vector representations of 256 to 1,536 dimensions, enabling semantic search, clustering, and classification with contexts up to 128,000 tokens and compatibility with similarity metrics such as cosine or Euclidean distance.[3][35]
Reranking models, such as Rerank v3.5 and multilingual variants supporting over 100 languages, refine retrieval results in RAG applications by scoring and reordering documents according to their semantic relevance to the query, within a 4,096-token context per document, thereby improving accuracy and reducing hallucinations in generated responses. Implementation considerations include placing the Rerank endpoint after an initial retrieval stage, choosing between English-only and multilingual model variants, and tuning parameters such as the number of top results returned.
These components enhance precision for enterprise search and recommendation systems.[3][36][37] Cohere's models are instruction-tuned and preference-trained, incorporating techniques such as model merging to adapt to specific domains without full retraining.[3] Research-driven innovations underpin these technologies, including multilingual advancements via the Aya models, which span 101 languages through synthetic data optimization and collaborative training involving over 3,000 researchers.[38] Efficiency methods include EAGER, a training-free inference technique that reduces computational overhead by 65% while boosting performance by 37% on metrics such as Pass@k, and FusioN, which synthesizes responses to outperform traditional best-of-N sampling across multiple languages and tasks.[38] Entropy-aware generation and universal tokenizers enhance adaptability, increasing language plasticity by 20.2%.[38] For enterprise deployment, these technologies support customization through fine-tuning on proprietary datasets, alongside secure features such as private virtual private clouds (VPCs) and on-premises options to maintain data sovereignty.[39]
Model Offerings
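To make the embedding workflow concrete: once text is encoded as dense vectors, semantic search reduces to ranking documents by a similarity metric such as cosine similarity, as described above. The sketch below uses made-up 4-dimensional vectors purely for illustration (real Embed v4.0 vectors have 256 to 1,536 dimensions, and would come from the API rather than being written by hand):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, doc_vecs, top_n=2):
    """Rank document vectors by cosine similarity to the query vector."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_n]

# Hypothetical toy embeddings, not real model output.
query = [0.9, 0.1, 0.0, 0.2]
docs = [
    [0.8, 0.2, 0.1, 0.1],  # doc 0: close to the query
    [0.0, 0.9, 0.8, 0.0],  # doc 1: unrelated topic
    [0.7, 0.0, 0.1, 0.3],  # doc 2: also close
]
print(semantic_search(query, docs, top_n=2))  # docs 0 and 2 rank highest
```

In a production RAG pipeline, this nearest-neighbor stage would be followed by a reranking pass that rescores the retrieved documents against the query text itself.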
Cohere offers a range of AI models optimized for enterprise applications, including generative language models, embedding models for semantic search, reranking models for relevance refinement, multilingual models, and automatic speech recognition (ASR) models. These models support tasks such as text generation, retrieval-augmented generation (RAG), tool use, translation, image processing, and speech transcription, with context lengths ranging from 4,000 to 256,000 tokens and multilingual capabilities across up to 101 languages.[3] The company's models emphasize efficiency, scalability, and integration into secure enterprise environments, often prioritizing low-latency performance over raw parameter scale.[39]
The flagship Command family consists of instruction-following large language models designed for conversational interactions, reasoning, and long-context tasks. Command R, a 35-billion-parameter model released in early 2024, excels in RAG, summarization, and external API calling with a 128,000-token context window.[40][3] Its successor, Command R+ (refreshed in August 2024), enhances nuanced responses and multilingual support across 10 languages, handles up to 256,000 tokens, and is optimized for complex enterprise workflows like multi-step reasoning.[41][42] Recent variants include Command A (March 2025), a cost-efficient model runnable on two GPUs for business tasks, along with specialized iterations such as Command A Reasoning (August 2025) for advanced logical processing and Command A Vision (July 2025) for multimodal text-and-image inputs.[43][3] Deprecated earlier versions, such as Command Light, have been phased out in favor of these higher-performing options.[3]
The Rerank API endpoint enables integration into production RAG systems, facilitating verifiable, source-grounded responses by prioritizing relevant documents and reducing hallucinations in AI applications. 
It is priced at $2.00 per 1,000 searches (each search ranks up to 100 documents; longer documents are split into chunks).[37][44] For multilingual applications, the Aya family provides open-access models covering 23 to 101 languages, with instruction-following capabilities that outperform baselines like mT0 on non-English tasks. Aya 23 (2024) focuses on generative tasks in underrepresented languages, while the Expanse variants (8B and 32B parameters, October 2024) extend to 128,000-token contexts; Aya Vision (March 2025) adds multimodal image understanding and translation.[4][45][46] In February 2026, Cohere launched Tiny Aya, a family of open-weight multilingual models supporting over 70 languages, designed for edge devices with offline capabilities and announced at the India AI Summit.[38] These models are released via Cohere for AI's research lab, emphasizing accessibility for global language coverage.[47]
Cohere Transcribe marks Cohere's entry into automatic speech recognition. Released in March 2026 as an open-source model (cohere-transcribe-03-2026), it pairs a 2-billion-parameter Conformer-based encoder with a lightweight Transformer decoder. Trained from scratch via supervised cross-entropy on log-Mel spectrograms computed from audio waveforms, it supports transcription in 14 languages: English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Arabic, Chinese, Japanese, Korean, and Vietnamese. Cohere Transcribe achieved the #1 position on Hugging Face's Open ASR Leaderboard with an average word error rate (WER) of 5.42% across English and multilingual evaluations, outperforming OpenAI's Whisper Large v3 (7.44% WER), ElevenLabs Scribe v2, and Qwen3-ASR-1.7B. It offers real-time factors up to 3x faster than comparable models, automatic chunking for long-form audio (longer than 35 seconds), and strong performance on business audio such as multi-speaker meetings, diverse accents, and boardroom settings. 
Licensed under Apache 2.0, it is available on Hugging Face for integration with the Transformers library (manual language specification is required), but it lacks built-in noise filtering, diarization, timestamps, and automatic language detection. Human evaluations showed a 61% win rate over alternatives for accuracy, coherence, and usability.[32][48][33]
API Pricing
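The WER figures quoted above are computed as the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. A minimal illustration of that metric (the leaderboard itself uses standardized tooling with text normalization; this sketch omits normalization):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution in a five-word reference gives WER = 0.2 (i.e., 20%).
print(word_error_rate("the meeting starts at noon",
                      "the meeting started at noon"))  # 0.2
```

On this scale, Transcribe's reported 5.42% average WER means roughly one word-level error per 18 reference words across the evaluation sets.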
Cohere's API employs a pay-as-you-go model, priced primarily per token processed for generative models, with separate pricing for specialized tools such as embeddings and reranking.
Generative Models (Command Series)
Pricing per 1 million tokens (as of March 2026):
- Command R+ (including the 08-2024 variant) and Command A: Input $2.50, Output $10.00. These are flagship models for advanced enterprise tasks, agentic workflows, and multilingual support.
- Command R (08-2024): Input $0.15, Output $0.60. Balanced model suitable for production applications like chatbots, summarization, and RAG.
- Command R7B (12-2024): Input $0.0375, Output $0.15. Highly efficient small model for high-volume, simple tasks.
Embeddings
- Embed 4: $0.12 per 1M tokens for text; $0.47 per 1M for image tokens (multimodal support with 1,536 dimensions).
Rerank
- Rerank 3.5 / 4: $2.00 per 1,000 searches (each search ranks up to 100 documents; longer documents split into chunks).
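Putting the listed rates together, the cost of a call is simple arithmetic: tokens times the per-million rate for generation, plus a per-search charge for reranking. A small illustrative calculator using only the prices above (model keys are shorthand labels for this sketch, not official API identifiers):

```python
# Per-1M-token rates from the pricing list above (USD, as of March 2026).
PRICES = {
    "command-a":         {"input": 2.50,   "output": 10.00},
    "command-r-08-2024": {"input": 0.15,   "output": 0.60},
    "command-r7b":       {"input": 0.0375, "output": 0.15},
}
RERANK_PER_1000_SEARCHES = 2.00

def generation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one generative call at the listed per-1M-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def rerank_cost(searches: int) -> float:
    """USD cost of rerank searches (each ranks up to 100 documents)."""
    return searches * RERANK_PER_1000_SEARCHES / 1000

# Example: a RAG call on Command R (08-2024) with 10,000 input tokens and
# 500 output tokens, preceded by one rerank search over the retrieved docs.
print(generation_cost("command-r-08-2024", 10_000, 500))  # 0.0018
print(rerank_cost(1))                                     # 0.002
```

At these rates, the rerank step of a single RAG request can cost more than the generation step on the smaller Command models, which is why tuning the number of reranked documents matters for high-volume workloads.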