SvelteBench
An LLM benchmark for Svelte 5 based on the methodology from OpenAI's paper "Evaluating Large Language Models Trained on Code", using a structure similar to the HumanEval dataset.
Work in progress
Overview
SvelteBench evaluates LLM-generated Svelte components against predefined test suites. It sends prompts to an LLM, saves the generated Svelte components, and verifies their functionality through automated tests.
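The flow can be pictured as a prompt → generate → test loop. The sketch below is only a conceptual illustration of that loop; the helper names (`sendPromptToLLM`, `runVitest`, the `Component.svelte` output path) are hypothetical stand-ins, not SvelteBench's actual API.

```ts
// Conceptual sketch of the benchmark loop. All helpers here are
// hypothetical stand-ins for illustration, not SvelteBench's real API.
import { writeFile } from "node:fs/promises";

async function sendPromptToLLM(prompt: string): Promise<string> {
  // Placeholder: call the configured provider and return Svelte source code.
  return "<script>let count = $state(0);</script>";
}

async function runVitest(testFile: string): Promise<boolean> {
  // Placeholder: run the Vitest suite for this test and report pass/fail.
  return true;
}

async function runTestCase(name: string, prompt: string): Promise<boolean> {
  const source = await sendPromptToLLM(prompt);                   // 1. prompt the LLM
  await writeFile(`src/tests/${name}/Component.svelte`, source);  // 2. save the generated component
  return runVitest(`src/tests/${name}/test.ts`);                  // 3. verify with automated tests
}
```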
Supported Providers
SvelteBench supports multiple LLM providers:
- OpenAI - GPT-4, GPT-4o, o1, o3, o4 models
- Anthropic - Claude 3.5, Claude 4 models
- Google - Gemini 2.5 models
- OpenRouter - Access to multiple providers through a single API
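Provider credentials are typically supplied through environment variables. The snippet below is a minimal sketch of checking which providers are configured; the variable names shown are assumptions for illustration and may differ from the project's actual configuration keys.

```ts
// Minimal sketch of detecting configured providers via environment variables.
// The variable names are assumptions, not SvelteBench's documented config.
const providers = {
  openai: process.env.OPENAI_API_KEY,
  anthropic: process.env.ANTHROPIC_API_KEY,
  google: process.env.GEMINI_API_KEY,
  openrouter: process.env.OPENROUTER_API_KEY,
} as const;

const available = Object.entries(providers)
  .filter(([, key]) => Boolean(key))
  .map(([name]) => name);

console.log(`Configured providers: ${available.join(", ") || "none"}`);
```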
Adding New Tests
To add a new test:
- Create a new directory in `src/tests/` with the name of your test
- Add a `prompt.md` file with instructions for the LLM
- Add a `test.ts` file with Vitest tests for the generated component (a sketch of such a test follows the example structure below)
Example structure:
src/tests/your-test/
├── prompt.md # Instructions for the LLM
└── test.ts # Tests for the generated component
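As an illustration of what a `test.ts` file might look like, here is a hedged sketch for a hypothetical counter test. The component import path and the expected markup (`Count: 0`, a single button) are assumptions about what the matching `prompt.md` would ask for, not part of SvelteBench's actual suites.

```ts
import { describe, it, expect } from "vitest";
import { render, screen, fireEvent } from "@testing-library/svelte";
// Assumption: the harness writes the LLM-generated component next to
// this file as Component.svelte before the tests run.
import Component from "./Component.svelte";

describe("counter component", () => {
  it("renders an initial count of 0", () => {
    render(Component);
    expect(screen.getByText("Count: 0")).toBeTruthy();
  });

  it("increments the count when the button is clicked", async () => {
    render(Component);
    await fireEvent.click(screen.getByRole("button"));
    expect(screen.getByText("Count: 1")).toBeTruthy();
  });
});
```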
Benchmark Results
After running the benchmark, results are saved to a JSON file in the `benchmarks` directory, named `benchmark-results-{timestamp}.json`.

When running with a context file, the results filename includes "with-context": `benchmark-results-with-context-{timestamp}.json`.
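To locate the most recent results file programmatically, something like the following would work. This is a hypothetical helper based only on the naming convention above, not a script shipped with the project.

```ts
// Hypothetical helper: find the newest benchmark results file by name,
// based only on the naming convention described above.
import { readdir } from "node:fs/promises";
import { join } from "node:path";

async function latestResults(dir = "benchmarks"): Promise<string | null> {
  const files = (await readdir(dir))
    .filter((f) => f.startsWith("benchmark-results-") && f.endsWith(".json"))
    .sort(); // timestamped names sort chronologically if the timestamp format is lexicographically ordered
  return files.length ? join(dir, files.at(-1)!) : null;
}

latestResults().then((path) => console.log(path ?? "no results yet"));
```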