Benchmark Client

A small tool you run on your own computer to measure how fast your local AI models are. Results are shared anonymously with the community.

Download

Pick your platform. No installation needed — just download and run.

On Linux or macOS, make the file executable first: chmod +x llm-benchmark*
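In a terminal, that step might look like the following. The filename is a hypothetical example; substitute the name of the build you actually downloaded (the `touch` line only stands in for that download so the snippet is self-contained):

```shell
# Hypothetical filename for a Linux x86-64 build; yours will differ.
f=llm-benchmark-linux-amd64
touch "$f"              # stand-in for the file you downloaded
chmod +x "$f"           # grant execute permission, as described above
test -x "$f" && echo "ready: ./$f"
```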

How it works

You need Ollama, LM Studio, vLLM, or HF TGI running locally with at least one model loaded. The whole process takes about 5–15 minutes.
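Each of these tools serves a local HTTP API, so a client can check whether one is running by probing its usual port. The sketch below shows one way to do that; the port numbers are the common defaults and are assumptions, since users can configure different ones:

```python
import socket

# Common default ports for each tool's local API server (assumptions;
# your installation may use different ports).
CANDIDATES = {
    "Ollama": 11434,
    "LM Studio": 1234,
    "vLLM": 8000,
    "HF TGI": 8080,
}

def detect(host="127.0.0.1", timeout=0.2):
    """Return the names of tools whose default port accepts a TCP connection."""
    found = []
    for name, port in CANDIDATES.items():
        with socket.socket() as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:
                found.append(name)
    return found

print(detect())  # e.g. ["Ollama"] if only Ollama is running
```

A real client would follow this up with an API call (such as a model-list request) to confirm what is actually listening on the port.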

1
Run the client
Open a terminal, navigate to where you downloaded the file, and run it. It starts an interactive prompt.
2
Select your AI tool and model
The client automatically finds any running AI tools on your machine and lists the available models.
3
Wait for the benchmark
It sends a series of test prompts to your model and measures how fast it responds.
4
Results are submitted
Once done, the results are automatically added to the community leaderboard so everyone can compare.
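The timing part of step 3 can be sketched as follows: watch a stream of generated tokens, record how long the first one takes to arrive (time to first token), and divide the token count by the total elapsed time (generation speed). The `fake_stream` generator below simulates a model's output so the sketch is self-contained; a real client would consume the streaming response of the selected tool instead:

```python
import time

def benchmark(stream):
    """Measure time-to-first-token and tokens/second over a token stream."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in stream:
        if first is None:
            first = time.perf_counter() - start  # time to first token
        count += 1
    total = time.perf_counter() - start
    return {"ttft_s": first, "tokens_per_s": count / total}

def fake_stream(n=50, delay=0.001):
    """Simulated token stream standing in for a local model's output."""
    for _ in range(n):
        time.sleep(delay)
        yield "tok"

print(benchmark(fake_stream()))
```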

What data is collected

What is sent

  • GPU model and video memory size
  • CPU model and number of cores
  • Total system RAM
  • Operating system name and version
  • AI tool name and version (e.g. Ollama 0.3)
  • Model name and parameter size (e.g. llama3 8B)
  • Benchmark results: generation speed, memory usage, time to first token
  • The benchmark prompt texts and your model's responses to them (for server-side output quality scoring)
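Put together, a submitted record might look roughly like this. All field names and values are illustrative, not the client's actual schema:

```python
# Illustrative payload shape only; the real schema is an assumption.
payload = {
    "gpu": {"model": "RTX 4090", "vram_gb": 24},
    "cpu": {"model": "Ryzen 9 7950X", "cores": 16},
    "ram_gb": 64,
    "os": "Ubuntu 24.04",
    "tool": {"name": "Ollama", "version": "0.3"},
    "model": {"name": "llama3", "params": "8B"},
    "results": {"tokens_per_s": 92.4, "memory_gb": 6.1, "ttft_ms": 210},
    # Prompt/response pairs, used for server-side output quality scoring.
    "samples": [{"prompt": "Explain TCP in one paragraph.", "response": "..."}],
}
```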

What is never sent

  • Your name, email, or any personal information
  • Your IP address (seen by the server during upload, but never stored)
  • Your model weights or any local files
  • Your own conversations or chat history

The only purpose of data collection is to power the community comparison tables on this site. All submitted results are public and visible to everyone.