Empirical is the fastest way to test different LLMs, prompts and other model configurations, across all the scenarios that matter for your application.
With Empirical, you can:
- Run your test datasets locally against off-the-shelf models
- Test your own custom models and RAG applications (see how-to)
- View, compare, and analyze outputs in a web UI report
- Score your outputs with scoring functions
- Run tests on CI/CD
Watch demo video | See all docs
Empirical bundles together a CLI and a web app. The CLI handles running tests and the web app visualizes results.
Everything runs locally, with a JSON configuration file, `empiricalrc.json`.
Required: Node.js 20+ installed on your system.
In this example, we will ask an LLM to parse user messages to extract entities and
give us structured JSON output. For example, "I'm Alice from Maryland" will
become `{"name": "Alice", "location": "Maryland"}`.
Our test will succeed if the model outputs valid JSON.
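The pass/fail criterion above can be sketched as a plain function. This is only an illustration of the check, not Empirical's scoring API:

```typescript
// Minimal sketch of the success criterion: the model's raw output
// must parse as valid JSON. Hypothetical helper, not part of Empirical.
function isValidJson(output: string): boolean {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidJson('{"name": "Alice", "location": "Maryland"}')); // true
console.log(isValidJson("I'm Alice from Maryland")); // false
```

A response wrapped in prose or markdown fences would fail this check, which is exactly the kind of regression the test is meant to catch.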
- Use the CLI to create a sample configuration file called `empiricalrc.json`.

  ```shell
  npx empiricalrun init
  cat empiricalrc.json
  ```
- Run the test samples against the models with the `run` command. This step requires the `OPENAI_API_KEY` environment variable to authenticate with OpenAI. This execution will cost $0.0026, based on the selected models.

  ```shell
  npx empiricalrun
  ```
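The key is read from the environment, so export it in the same shell before running the tests (the value below is a placeholder, not a real key):

```shell
# Make the OpenAI key available to the CLI.
# Replace the placeholder with your actual key.
export OPENAI_API_KEY="<your-openai-api-key>"
```

Then run `npx empiricalrun` in that shell.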
- Use the `ui` command to open the reporter web app and see side-by-side results.

  ```shell
  npx empiricalrun ui
  ```
Edit the `empiricalrc.json` file to make Empirical work for your use case.
- Configure which models to use
- Configure your test dataset
- Configure scoring functions to grade output quality
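As a rough illustration, a configuration pairs models, a dataset, and scorers in one file. The field names below are assumptions for illustration only; the file generated by `npx empiricalrun init` is the authoritative reference for the actual schema:

```json
{
  "runs": [
    {
      "type": "model",
      "provider": "openai",
      "model": "gpt-3.5-turbo",
      "prompt": "Extract the name and location from: {{user_message}}"
    }
  ],
  "dataset": {
    "samples": [
      { "inputs": { "user_message": "I'm Alice from Maryland" } }
    ]
  },
  "scorers": [{ "type": "is-json" }]
}
```

Adding another entry under `runs` (say, a different model or prompt) makes the next `npx empiricalrun` compare both side by side in the report.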