
SvelteBench

An LLM benchmark for Svelte 5 based on the methodology from OpenAI's paper "Evaluating Large Language Models Trained on Code", using a structure similar to the HumanEval dataset.

Work in progress

Overview

SvelteBench evaluates LLM-generated Svelte components by testing them against predefined test suites. It works by sending a prompt to an LLM, saving the Svelte component the model generates, and verifying its functionality with automated tests.
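As a rough sketch of that loop (the function names and types below are illustrative assumptions, not SvelteBench's actual API):

```ts
// Simplified, illustrative benchmark loop. The types and helpers are
// assumptions for explanation only, not SvelteBench's real internals.
type GenerateComponent = (prompt: string) => Promise<string>; // returns Svelte source
type RunTests = (componentSource: string) => Promise<boolean>; // true = suite passed

async function runBenchmark(
  prompts: Record<string, string>, // test name -> prompt.md contents
  generate: GenerateComponent,
  runTests: RunTests,
): Promise<Record<string, boolean>> {
  const results: Record<string, boolean> = {};
  for (const [name, prompt] of Object.entries(prompts)) {
    const componentSource = await generate(prompt); // ask the LLM for a component
    results[name] = await runTests(componentSource); // verify it with the test suite
  }
  return results;
}
```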

Supported Providers

SvelteBench supports multiple LLM providers:

  • OpenAI - GPT-4, GPT-4o, o1, o3, o4 models
  • Anthropic - Claude 3.5, Claude 4 models
  • Google - Gemini 2.5 models
  • OpenRouter - Access to multiple providers through a single API
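Each provider is used via its API key. As a loose illustration of how a harness might decide which providers to run, the snippet below checks for keys in environment variables; the variable names are assumptions, not SvelteBench's documented configuration:

```ts
// Illustration only: enable a provider when its (assumed) API key env var is set.
const providerKeys: Record<string, string | undefined> = {
  openai: process.env.OPENAI_API_KEY,
  anthropic: process.env.ANTHROPIC_API_KEY,
  google: process.env.GEMINI_API_KEY,
  openrouter: process.env.OPENROUTER_API_KEY,
};

const enabledProviders = Object.entries(providerKeys)
  .filter(([, key]) => Boolean(key))
  .map(([name]) => name);

console.log("Enabled providers:", enabledProviders);
```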

Adding New Tests

To add a new test:

  1. Create a new directory in src/tests/ with the name of your test
  2. Add a prompt.md file with instructions for the LLM
  3. Add a test.ts file with Vitest tests for the generated component

Example structure:

src/tests/your-test/
├── prompt.md    # Instructions for the LLM
└── test.ts      # Tests for the generated component
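For example, assuming the harness writes the generated component next to the test file and that @testing-library/svelte is available, a test.ts for a hypothetical counter task might look like this (the import path, test ID, and button label are assumptions about what prompt.md would ask for):

```ts
// Illustrative Vitest suite for a hypothetical counter component.
import { describe, it, expect } from "vitest";
import { render, screen, fireEvent } from "@testing-library/svelte";
import Component from "./Component.svelte"; // assumed location of the generated component

describe("counter component", () => {
  it("renders an initial count of 0", () => {
    render(Component);
    expect(screen.getByTestId("count").textContent).toBe("0");
  });

  it("increments the count when the button is clicked", async () => {
    render(Component);
    await fireEvent.click(screen.getByRole("button", { name: /increment/i }));
    expect(screen.getByTestId("count").textContent).toBe("1");
  });
});
```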

Benchmark Results

After running the benchmark, results are saved to a JSON file in the benchmarks directory. The file is named benchmark-results-{timestamp}.json.

When running with a context file, the results filename includes "with-context": benchmark-results-with-context-{timestamp}.json.
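For quick inspection, a short script along these lines could load the most recent results file; it assumes the {timestamp} part of the filename sorts lexicographically and makes no assumptions about the JSON structure inside:

```ts
// Illustrative helper: find and load the newest benchmark-results-*.json file.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const dir = "benchmarks";
const latest = readdirSync(dir)
  .filter((f) => f.startsWith("benchmark-results-") && f.endsWith(".json"))
  .sort() // relies on the timestamp sorting lexicographically (assumption)
  .at(-1);

if (latest) {
  const results = JSON.parse(readFileSync(join(dir, latest), "utf8"));
  console.log(`Loaded ${latest}`);
  console.dir(results, { depth: 2 });
}
```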
