Because we should all have our own set of LLM evals. Blog post
Set up (requires Homebrew):
brew install just gitleaks
just install
Run them all:
just eval-all
Run a specific one:
just eval CONFIG
where CONFIG is the name of an eval config, for example "social-media-insults".
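To run a handful of specific evals rather than all of them, a small shell wrapper around `just eval` can help. This is only a sketch: the `run_evals` function and the `JUST` override are made up for illustration and are not part of this repo. It assumes `just eval` takes exactly one config name, as shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): run several eval configs in order.
# Stops at the first config whose run fails.
# JUST can be overridden (e.g. JUST=echo) to print the arguments instead of
# running them; that override is an assumption of this sketch.
run_evals() {
  for config in "$@"; do
    "${JUST:-just}" eval "$config" || return 1
  done
}
```

After sourcing this, `run_evals social-media-insults` behaves like the single command above, and any further config names you pass are run one after another.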
To view the dashboard (the same one published at https://kschaul.com/llm-evals/):
just dev