llm-evals

Because we should all have our own set of LLM evals. See the accompanying blog post.

Installation

brew install just gitleaks
just install

Run them all:

just eval-all

Run a specific one:

just eval CONFIG

where CONFIG is the name of an eval config, for example "social-media-insults".
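To run a handful of configs in one go, rather than all of them or a single one, you could wrap `just eval CONFIG` in a small script. The sketch below is hypothetical and is not part of this repo. The helper names `eval_command` and `run_evals` are made up for illustration. It assumes `just` is on your PATH and that config names contain no whitespace. "social-media-insults" is the only config name taken from this README.

```python
import shlex
import subprocess


def eval_command(config: str) -> list[str]:
    """Build the argv for `just eval CONFIG` (hypothetical helper)."""
    # Assumption: config names never contain whitespace.
    if not config or any(ch.isspace() for ch in config):
        raise ValueError(f"invalid config name: {config!r}")
    return ["just", "eval", config]


def run_evals(configs: list[str], dry_run: bool = False) -> list[list[str]]:
    """Run `just eval` for each config in order (hypothetical helper).

    With dry_run=True, the commands are only printed, not executed.
    Returns the list of commands that were built.
    """
    commands = [eval_command(c) for c in configs]
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on the first failing eval.
            subprocess.run(argv, check=True)
    return commands
```

For example, `run_evals(["social-media-insults"], dry_run=True)` prints `just eval social-media-insults` without running anything. Drop `dry_run` to actually run the evals.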

To view the dashboard (the version published at https://kschaul.com/llm-evals/):

just dev
