SvelteBench
An LLM benchmark for Svelte 5 based on the methodology from OpenAI's paper "Evaluating Large Language Models Trained on Code", using a structure similar to the HumanEval dataset.
Work in progress
Overview
SvelteBench evaluates LLM-generated Svelte components against predefined test suites. It sends prompts to an LLM, saves the generated Svelte components, and verifies their functionality through automated tests.
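The flow can be pictured as a prompt → generate → test loop. The sketch below is only a conceptual illustration of that loop; the helper names (`sendPromptToLLM`, `runVitest`, the `Component.svelte` output path) are hypothetical stand-ins, not SvelteBench's actual API.

```ts
// Conceptual sketch of the benchmark loop. All helpers here are
// hypothetical stand-ins for illustration, not SvelteBench's real API.
import { writeFile } from "node:fs/promises";

async function sendPromptToLLM(prompt: string): Promise<string> {
  // Placeholder: call the configured provider and return Svelte source code.
  return "<script>let count = $state(0);</script>";
}

async function runVitest(testFile: string): Promise<boolean> {
  // Placeholder: run the Vitest suite for this test and report pass/fail.
  return true;
}

async function runTestCase(name: string, prompt: string): Promise<boolean> {
  const source = await sendPromptToLLM(prompt);                   // 1. prompt the LLM
  await writeFile(`src/tests/${name}/Component.svelte`, source);  // 2. save the generated component
  return runVitest(`src/tests/${name}/test.ts`);                  // 3. verify with automated tests
}
```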
Supported Providers
SvelteBench supports multiple LLM providers:
- OpenAI - GPT-4, GPT-4o, o1, o3, o4 models
- Anthropic - Claude 3.5, Claude 4 models
- Google - Gemini 2.5 models
- OpenRouter - Access to multiple providers through a single API
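Provider credentials are typically supplied through environment variables. The snippet below is a minimal sketch of checking which providers are configured; the variable names shown are assumptions for illustration and may differ from the project's actual configuration keys.

```ts
// Minimal sketch of detecting configured providers via environment variables.
// The variable names are assumptions, not SvelteBench's documented config.
const providers = {
  openai: process.env.OPENAI_API_KEY,
  anthropic: process.env.ANTHROPIC_API_KEY,
  google: process.env.GEMINI_API_KEY,
  openrouter: process.env.OPENROUTER_API_KEY,
} as const;

const available = Object.entries(providers)
  .filter(([, key]) => Boolean(key))
  .map(([name]) => name);

console.log(`Configured providers: ${available.join(", ") || "none"}`);
```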
Adding New Tests
To add a new test:
- Create a new directory in `src/tests/` with the name of your test
- Add a `prompt.md` file with instructions for the LLM
- Add a `test.ts` file with Vitest tests for the generated component (a sketch of such a test follows the example structure below)
Example structure:
src/tests/your-test/
├── prompt.md # Instructions for the LLM
└── test.ts # Tests for the generated component
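As an illustration of what a `test.ts` file might look like, here is a hedged sketch for a hypothetical counter test. The component import path and the expected markup (`Count: 0`, a single button) are assumptions about what the matching `prompt.md` would ask for, not part of SvelteBench's actual suites.

```ts
import { describe, it, expect } from "vitest";
import { render, screen, fireEvent } from "@testing-library/svelte";
// Assumption: the harness writes the LLM-generated component next to
// this file as Component.svelte before the tests run.
import Component from "./Component.svelte";

describe("counter component", () => {
  it("renders an initial count of 0", () => {
    render(Component);
    expect(screen.getByText("Count: 0")).toBeTruthy();
  });

  it("increments the count when the button is clicked", async () => {
    render(Component);
    await fireEvent.click(screen.getByRole("button"));
    expect(screen.getByText("Count: 1")).toBeTruthy();
  });
});
```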
Benchmark Results
After running the benchmark, results are saved to a JSON file in the `benchmarks` directory, named `benchmark-results-{timestamp}.json`.

When running with a context file, the results filename includes "with-context": `benchmark-results-with-context-{timestamp}.json`.
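To locate the most recent results file programmatically, something like the following would work. This is a hypothetical helper based only on the naming convention above, not a script shipped with the project.

```ts
// Hypothetical helper: find the newest benchmark results file by name,
// based only on the naming convention described above.
import { readdir } from "node:fs/promises";
import { join } from "node:path";

async function latestResults(dir = "benchmarks"): Promise<string | null> {
  const files = (await readdir(dir))
    .filter((f) => f.startsWith("benchmark-results-") && f.endsWith(".json"))
    .sort(); // timestamped names sort chronologically if the timestamp format is lexicographically ordered
  return files.length ? join(dir, files.at(-1)!) : null;
}

latestResults().then((path) => console.log(path ?? "no results yet"));
```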