DEV Community

# evaluation

Apr 20

When "Slow Thinking" Is Just "Slow Talking"

#ai #machinelearning #llm #evaluation

3 min read

EClawbot Official

Apr 15

What Is Agent Evaluation? How EClaw Arena Benchmarks AI Agents Across 12 Dimensions

#ai #agents #benchmarks #evaluation

3 min read

ThomasP

Apr 8

LLM-as-Judge: using Claude to review a Gemini agent

#ai #llm #agents #evaluation

7 min read

Apr 4

The Evaluation Gap: Why We Don't Know If Agents Are Getting Better

#ai #agents #evaluation #engineering

2 min read

Josh T

Apr 17

Origin Part 3: The Teacher Was Scoring It Wrong

#aitraining #genesisframework #olt1 #evaluation

9 min read

kasi viswanath vandanapu

Apr 1

SQL Comparison Library Architecture

#sql #ai #evaluation #llm

14 min read

Mar 31

Building an LLM Judge That Doesn't Lie to You

#ai #evaluation #testing #machinelearning

8 min read

kasi viswanath vandanapu

Mar 30

Build a Production‑Ready SQL Evaluation Engine for LLMs

#sql #llm #evaluation #python

5 min read

Mar 30

Beyond Text: How We Built an Evaluation Framework for Multi-File AI Outputs

#ai #evaluation #testing #webdev

8 min read

Alina Trofimova

Mar 19

Evaluating Vendor Offerings: A Structured Approach to Identify High-Quality, Compatible Tools at Conferences

#devops #kubecon #evaluation #kubernetes

13 min read

Ultra Dune

Mar 17

EVAL #006: LLM Evaluation Tools — RAGAS vs DeepEval vs Braintrust vs LangSmith vs Arize Phoenix

#llm #evaluation #ai #machinelearning

10 min read

Serhii Panchyshyn

Apr 13

No Evals, No Idea. How 40% of RAG Answers Go Wrong.

#ai #rag #production #evaluation

5 min read

Ritwika Kancharla

Mar 3

Building an LLM Evaluation Framework That Actually Works

#evaluation #llm #ai

7 min read

Feb 22

Evals Aren’t a One-Time Report: Build a Living Test Suite That Ships With Every Release.

#llm #ai #evaluation

6 min read

Jamie Gray

Mar 23

How I Approach Evaluation When Building AI Features

#ai #machinelearning #testing #evaluation

6 min read