Open source · Bring your own keys

Git for your prompts.

Version, diff, and ship AI prompts with the same discipline you bring to code, so changes go live deliberately, not by accident.

Get started, it's free · View on GitHub

Providers: OpenAI
Envs: Dev to prod
Self-host: Always

Prompts that adapt. Quietly.

SDK · live on npm

Five lines to live prompts.

Ship a prompt change without redeploying your app. The SDK fetches the active version for the environment you ask for, caches it in memory, and renders {{variables}} in one call.

Read SDK docs · View on npm

app.ts

import { createClient } from "lexem-sdk";

const lexem = createClient({
  apiKey: process.env.LEXEM_API_KEY!,
});

const prompt = await lexem.render(
  "summarizer",
  { article: text },
  { env: "production" },
);
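To make the "fetch, cache, render" flow concrete, here is a minimal sketch of how {{variables}} substitution and an in-memory cache could work. It is not the lexem-sdk source: `renderTemplate`, `PromptCache`, and the `fetchTemplate` callback are hypothetical names, and the time-to-live policy is an assumption.

```typescript
// Hypothetical sketch, not the lexem-sdk implementation.
type Variables = Record<string, string>;

// Replace each {{name}} placeholder with its value; throw if a variable is missing.
function renderTemplate(template: string, vars: Variables): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new Error(`Missing variable: ${name}`);
    }
    return vars[name];
  });
}

// In-memory cache keyed by "promptName:env" with a time-to-live (assumed policy).
class PromptCache {
  private entries = new Map<string, { template: string; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Return the cached template if still fresh; otherwise fetch and store it.
  async get(
    name: string,
    env: string,
    fetchTemplate: (name: string, env: string) => Promise<string>,
    now: number = Date.now(),
  ): Promise<string> {
    const key = `${name}:${env}`;
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > now) {
      return hit.template;
    }
    const template = await fetchTemplate(name, env);
    this.entries.set(key, { template, expiresAt: now + this.ttlMs });
    return template;
  }
}
```

In this sketch, a missing variable raises an error instead of rendering an empty string, so a typo in a variable name surfaces before the prompt reaches a model.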

What's inside

The whole loop, finally sane.

No more silent prompt edits. No more "wait, what changed last week?". Every change is committed, tested, and reversible.

Real version control

Commit prompts with messages. Diff any two versions, branch for experiments, merge with conflict resolution, roll back in one click.

Evals that catch regressions

Build test suites with typed inputs. Score with exact match, regex, or LLM-as-judge. Auto-flag scores that drop ≥5 points from the previous version.
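As a rough illustration of the scoring and regression flag described above, the sketch below scores cases by exact match or regex and flags a drop of 5 or more points. The type and function names are hypothetical, and it assumes suite scores sit on a 0 to 100 scale; Lexem's actual scoring may differ.

```typescript
// Hypothetical sketch: names and the 0-100 scale are assumptions.
type Scorer =
  | { kind: "exact"; expected: string }
  | { kind: "regex"; pattern: string };

// Score one case: 1 for a pass, 0 for a fail.
function scoreCase(output: string, scorer: Scorer): number {
  if (scorer.kind === "exact") {
    return output.trim() === scorer.expected.trim() ? 1 : 0;
  }
  return new RegExp(scorer.pattern).test(output) ? 1 : 0;
}

// Suite score as the percentage of passed cases (0 for an empty suite).
function suiteScore(caseScores: number[]): number {
  if (caseScores.length === 0) {
    return 0;
  }
  const passed = caseScores.reduce((sum, s) => sum + s, 0);
  return (passed / caseScores.length) * 100;
}

// Flag a regression when the score drops by at least `threshold` points.
function isRegression(previous: number, current: number, threshold: number = 5): boolean {
  return previous - current >= threshold;
}
```

An LLM-as-judge scorer would slot in as a third `Scorer` variant that calls a model; it is left out here to keep the sketch self-contained.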

AI writes your test cases

Skip the blank page. Lexem reads your prompt and variables, then brainstorms a diverse set of judge-scored cases — happy paths, edge cases, refusals — in one click.

AI diff impact

Every diff comes with a plain-English summary plus flagged behaviour risks: tone shifts, scope creep, weakened guardrails. Catch the thing your team would have shipped by accident.

Eval-gated deploys

Three environments per project, strictly promoted dev → staging → prod. Block promotions below your eval-score threshold. Optional teammate approval on any env.
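The gating rules above can be sketched as a small decision function. Everything here is illustrative: `canPromote`, the `PromotionRequest` shape, and the order of the checks are assumptions, not Lexem's API.

```typescript
// Hypothetical sketch of eval-gated promotion; not Lexem's implementation.
const ENV_ORDER = ["dev", "staging", "prod"] as const;
type Env = (typeof ENV_ORDER)[number];

interface PromotionRequest {
  from: Env;
  to: Env;
  evalScore: number;         // latest eval score for the version being promoted
  minScore: number;          // threshold configured for the target environment
  approvalRequired: boolean; // whether the target environment needs sign-off
  approved: boolean;         // whether a teammate has approved
}

interface Decision {
  allowed: boolean;
  reason: string;
}

function canPromote(req: PromotionRequest): Decision {
  // Strict promotion: only one step forward, dev -> staging or staging -> prod.
  if (ENV_ORDER.indexOf(req.to) !== ENV_ORDER.indexOf(req.from) + 1) {
    return { allowed: false, reason: `cannot promote ${req.from} to ${req.to}` };
  }
  if (req.evalScore < req.minScore) {
    return { allowed: false, reason: "eval score below threshold" };
  }
  if (req.approvalRequired && !req.approved) {
    return { allowed: false, reason: "awaiting teammate approval" };
  }
  return { allowed: true, reason: "ok" };
}
```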

Token analytics built in

Per-version average tokens, 30-day daily trend, per-prompt totals. Powered by eval runs and a one-line lexem.logUsage() call — no extra observability stack to plumb.
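For a sense of what "per-version average tokens" means in practice, here is a minimal aggregation sketch. The `UsageRecord` shape is hypothetical and does not reflect what lexem.logUsage() actually accepts or stores.

```typescript
// Hypothetical sketch: the record shape is an assumption, not Lexem's schema.
interface UsageRecord {
  version: string;     // prompt version identifier
  totalTokens: number; // tokens consumed by one call or eval run
}

// Average token count per prompt version.
function averageTokensByVersion(records: UsageRecord[]): Map<string, number> {
  const sums = new Map<string, { total: number; count: number }>();
  for (const record of records) {
    const entry = sums.get(record.version) ?? { total: 0, count: 0 };
    entry.total += record.totalTokens;
    entry.count += 1;
    sums.set(record.version, entry);
  }
  const averages = new Map<string, number>();
  sums.forEach((entry, version) => {
    averages.set(version, entry.total / entry.count);
  });
  return averages;
}
```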

How it works

Three moves. No magic.

01
Commit your prompts
Write a prompt, hit commit with a message. Every change is a versioned snapshot — typed variables, tags, branches and all.
02
Run evals against any version
Write your own suite — or let Lexem auto-generate one from the prompt — then run against the model and key of your choice. See score, pass/fail, tokens, latency.
03
Diff, roll back, ship
When something regresses, the AI-flagged diff impact shows what shifted. Roll back in one click, or merge an experimental branch when its eval score beats main.

100% open source

Self-host it. Own the loop.

Lexem runs on your Postgres, with your team's keys. No vendor lock-in, no usage caps, no telemetry. The whole stack is released under an MIT-style permissive license, so you can fork it tomorrow.

Star on GitHub · Try the hosted demo