Real pipelines you can clone, run, and bend to your data. Each example is production-wired — one source, a declarative flow, a live target. Pick the one closest to what you need and change the parts that don't fit.
The most complete end-to-end example — a custom source, LLM extraction, and a queryable target.
Pulls every new HackerNews thread, extracts topics with an LLM, and keeps a Postgres index continuously fresh. Custom source + live mode = 92% fewer API calls after the first sync.
The cleanest "hello world" for CocoIndex + embeddings — index markdown, query it with natural language.
Walk a repo, split by syntax, embed, and query your codebase in English. Real-time RAG for code.
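The shape of that pipeline — chunk, embed, rank by similarity — fits in a few lines. This is a conceptual sketch only, not the CocoIndex API: the toy bag-of-words embedding stands in for a real model, and `embed`, `cosine`, and `query` are illustrative names.

```python
import math
from collections import Counter

def embed(text: str, vocab: list[str]) -> list[float]:
    # Toy embedding: normalized word counts over a shared vocabulary.
    # A real pipeline would call an embedding model here.
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def query(chunks: list[str], question: str, top_k: int = 1) -> list[str]:
    # Build one vocabulary for all texts so vectors are comparable.
    vocab = sorted({w for t in chunks + [question] for w in t.lower().split()})
    q = embed(question, vocab)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c, vocab), q), reverse=True)
    return ranked[:top_k]

chunks = [
    "def connect(url): open a database connection",
    "class Parser: tokenize markdown into an AST",
]
print(query(chunks, "how do I connect to the database"))
```

Swap the toy `embed` for a model call and the in-memory list for a vector store, and you have the example's retrieval loop.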
Extract metadata, chunk and embed abstracts, enable semantic + author-based search over academic PDFs.
Bring your own source, target, or parser. Same declarative flow.
Use an existing Postgres table as a CocoIndex source. AI transforms + data mappings flow into pgvector.
Treat any API as a first-class incremental source. A custom HN connector that stays in sync with Postgres.
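The core idea behind an incremental source is a cursor: remember the last item you ingested and only ask the API for what came after it. A minimal sketch, with hypothetical names (`fetch_since`, `IncrementalSource`) rather than the actual connector interface:

```python
def fetch_since(all_items: list[dict], cursor: int) -> list[dict]:
    # Stand-in for an API call like "give me items with id > cursor".
    return [item for item in all_items if item["id"] > cursor]

class IncrementalSource:
    def __init__(self):
        self.cursor = 0                    # last item id already ingested
        self.store: dict[int, dict] = {}   # stand-in for the Postgres target

    def sync(self, upstream: list[dict]) -> int:
        new_items = fetch_since(upstream, self.cursor)
        for item in new_items:
            self.store[item["id"]] = item  # upsert into the target
            self.cursor = max(self.cursor, item["id"])
        return len(new_items)              # work done this round

threads = [{"id": 1, "title": "Show HN"}, {"id": 2, "title": "Ask HN"}]
src = IncrementalSource()
print(src.sync(threads))   # first sync ingests everything -> 2
print(src.sync(threads))   # nothing new -> 0
threads.append({"id": 3, "title": "Launch HN"})
print(src.sync(threads))   # only the new thread -> 1
```

After the first full sync, every later round touches only new items — that is where the drop in API calls comes from.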
Export markdown files to local HTML using a custom target. The simplest file-to-file pipeline shape.
Turn loose prose into structured data with LLMs, BAML, DSPy, or Ollama.
Extract structured data from the Python manual's markdown files with a local Ollama model.
Extract nested structured data from patient intake forms with field-level transformation and data mapping.
BAML as the typed contract between LLM and code. Same intake problem, stronger guarantees.
DSPy-style prompt programming on vision models. Compare the ergonomics to the BAML variant side by side.
Bring your own parser. Google Document AI extracts, CocoIndex embeds and stores for semantic search.
Give agents a persistent, graph-shaped memory from conversations, meetings, products.
Build live knowledge for agents from documentation — incremental triple extraction with LLMs.
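Triple extraction reduces prose to (subject, predicate, object) facts that upsert cleanly into a graph. A minimal sketch of that shape — the regex here is a trivial stand-in for the LLM extraction step, and all names are illustrative:

```python
import re

def extract_triples(sentence: str) -> list[tuple[str, str, str]]:
    # Stand-in extractor: matches "X <verb> Y" for a tiny verb set.
    # A real pipeline would prompt an LLM for the triples instead.
    m = re.match(r"(\w+) (uses|contains|extends) (\w+)", sentence)
    return [m.groups()] if m else []

graph: set[tuple[str, str, str]] = set()
lines = ["CocoIndex uses Postgres", "FlowBuilder extends Builder", "hello world"]
for line in lines:
    for triple in extract_triples(line):
        graph.add(triple)  # set add = idempotent upsert; re-runs add nothing

print(sorted(graph))
```

The idempotent upsert is what makes the graph safe to rebuild incrementally: re-processing an unchanged document leaves the graph untouched.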
Turn Google Drive meeting notes into an automatically updating Neo4j knowledge graph.
Real-time recommendation engine — product taxonomy understanding via LLM, stored in a graph database.
ColPali embeddings served behind a FastAPI endpoint. Page-level multi-vector image search.
CLIP embeddings over a folder of images. Query by text or reference image.
ColPali over PDFs, images, academic papers, and slides — mixed together in the same vector space, no OCR.
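ColPali-style retrieval scores with late interaction: each page is a bag of patch vectors, each query a bag of token vectors, and the score (MaxSim) takes, for every query vector, its best-matching page vector, then sums. A miniature version with hand-made stand-in vectors, not real model outputs:

```python
def maxsim(query_vecs: list[list[float]], page_vecs: list[list[float]]) -> float:
    # For each query vector, keep only its best dot product against the
    # page's vectors, then sum those maxima across the query.
    return sum(
        max(sum(q * p for q, p in zip(qv, pv)) for pv in page_vecs)
        for qv in query_vecs
    )

page_a = [[1.0, 0.0], [0.0, 1.0]]   # patches covering two "topics"
page_b = [[0.5, 0.5]]
query = [[1.0, 0.0]]                 # one query token
print(maxsim(query, page_a))  # best patch matches exactly -> 1.0
print(maxsim(query, page_b))  # -> 0.5
```

Because each page keeps many vectors instead of one, a query token can match a single diagram or text region on the page — which is why no OCR pass is needed.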
Extract, embed, and index both text and images from PDFs — SentenceTransformers + CLIP in one vector space.
Detect, extract, and embed faces from photos. Export to a vector DB for face similarity queries.
Clone the closest example, swap the source or the target, and keep the rest. Or request a new example — we ship the ones developers ask for.