LLM Observability & Cost Accounting

Every LLM call.
Captured. Priced.

A single binary ingests every prompt and completion, auto-computes cost from a 2,263-model pricing catalog, and serves real-time aggregations. Multi-tenant with configurable retention tiers. Zero dependencies to start; S3/R2 when you scale.

How it works

1. Your app calls an LLM

OpenAI, Anthropic, Gemini, Bedrock, or any of 12 supported providers. Your code doesn't change.

2. POST the event to Keplor

Two required fields: model and provider. Add token counts and Keplor computes cost automatically. Each key gets its own retention tier.

3. Query costs and usage

Real-time quotas, daily rollups, per-model breakdowns. Filter by user, key, provider, or time range. Export as JSON Lines.
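The steps above can be sketched from the client side in Python. The /v1/events path matches the curl call shown under Integration; the helper names here are illustrative, not part of Keplor's API.

```python
import json
import urllib.request

KEPLOR_URL = "http://localhost:8080/v1/events"  # default local endpoint

def build_event(model, provider, input_tokens=None, output_tokens=None):
    """Build the minimal event payload: model and provider are the only
    required fields; token counts enable automatic cost computation."""
    event = {"model": model, "provider": provider}
    if input_tokens is not None or output_tokens is not None:
        event["usage"] = {
            "input_tokens": input_tokens or 0,
            "output_tokens": output_tokens or 0,
        }
    return event

def post_event(event, url=KEPLOR_URL):
    """POST one event to Keplor and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

post_event(build_event("gpt-4o", "openai", 500, 200)) mirrors the curl example below.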

Capabilities

Everything you need.
Nothing you don't.

Automatic cost accounting

Bundled LiteLLM pricing catalog covers 2,263 models across all major providers. Handles cache discounts, reasoning tokens, batch pricing, and audio/image tokens. Cost stored as int64 nanodollars for precision.
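A sketch of why integer nanodollars keep the accounting exact: store prices as integer nanodollars per million tokens (1 USD = 1e9 nanodollars) and per-event cost becomes pure integer math, with no float rounding. The prices below are made up for illustration, not figures from the catalog.

```python
# Illustrative prices in nanodollars per 1M tokens.
# A $2.50 per-million-token input price is 2_500_000_000 nanodollars/Mtok.
PRICING = {
    "example-model": {"input": 2_500_000_000, "output": 10_000_000_000},
}

def cost_nanodollars(model, input_tokens, output_tokens):
    """Compute event cost as an exact integer, with no floats involved."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) // 1_000_000
```

For 500 input and 200 output tokens at these made-up prices, the result is 3,250,000 nanodollars, i.e. $0.00325.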

Full request & response capture

Every prompt and completion stored alongside event metadata. Optionally archive old events to Cloudflare R2, AWS S3, or MinIO as compressed JSONL files to keep SQLite lean.

Multi-tenant with tiered retention

Assign API keys to named retention tiers: free (7 days), pro (90 days), team (180 days), or any custom tier. GC runs per-tier automatically. Tier names and durations are fully configurable.
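A per-tier GC pass boils down to computing one cutoff per tier and dropping older events. The tier table below mirrors the default durations named above; the event shape ('tier', 'ts' fields) is an assumption for illustration.

```python
from datetime import datetime, timedelta, timezone

# Default tier durations from above; custom tiers slot in the same way.
RETENTION_DAYS = {"free": 7, "pro": 90, "team": 180}

def gc_cutoffs(now=None):
    """One deletion cutoff per tier: events older than it get collected."""
    now = now or datetime.now(timezone.utc)
    return {tier: now - timedelta(days=d) for tier, d in RETENTION_DAYS.items()}

def sweep(events, now=None):
    """Keep only events younger than their tier's cutoff.
    Each event is assumed to carry a 'tier' and a 'ts' (datetime)."""
    cutoffs = gc_cutoffs(now)
    return [e for e in events if e["ts"] >= cutoffs[e["tier"]]]
```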

Real-time aggregation API

Quota checks, daily rollups, and period statistics via REST. Filter by user, API key, model, provider, or time range. Cursor-based pagination for large result sets.
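Cursor pagination on the client side is a simple loop: request a page, follow the returned cursor until none comes back. The field names here ('items', 'next_cursor') are assumptions about the wire format, not confirmed API details.

```python
def paginate(fetch_page, page_size=100):
    """Drain a cursor-paginated endpoint.

    `fetch_page(cursor, limit)` is any callable returning a dict with
    'items' and an optional 'next_cursor' (field names assumed)."""
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            break
```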

Event archival to S3/R2

Archive old events as compressed JSONL to Cloudflare R2, AWS S3, or MinIO. Age-based and size-based triggers. Daily rollups preserved in SQLite. Automatic 6-hour archive cycles with per-chunk error isolation.

Zero-dep single binary

Static musl binary under 10 MB. SQLite with WAL mode and connection pooling. One-command Docker deploy. No JVM, no runtime, no cloud account required.

Server-side key attribution

Authenticated keys are injected server-side, preventing clients from spoofing cost attribution. Each key carries a tier, so billing and retention are always tied to the actual caller.
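The attribution rule can be sketched as: resolve the key from the Authorization header on the server and overwrite any client-supplied attribution fields. The key registry and field names below are illustrative.

```python
# Illustrative key registry: token -> (key_id, tier).
API_KEYS = {"sk-test-123": ("key_42", "pro")}

def attribute(event, auth_header):
    """Inject key_id and tier from the authenticated key, discarding any
    client-supplied values so cost attribution cannot be spoofed."""
    token = auth_header.removeprefix("Bearer ").strip()
    key_id, tier = API_KEYS[token]  # raises KeyError on unknown keys
    clean = {k: v for k, v in event.items() if k not in ("key_id", "tier")}
    return {**clean, "key_id": key_id, "tier": tier}
```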

12 providers, one API

OpenAI, Anthropic, Gemini, Bedrock, Azure, Mistral, Groq, xAI, DeepSeek, Cohere, Ollama, and any OpenAI-compatible endpoint. Provider-specific token handling built in.

Integration

Three lines to start.

$ curl -X POST http://localhost:8080/v1/events \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","provider":"openai",
      "usage":{"input_tokens":500,"output_tokens":200}}'

Returns cost in nanodollars, event ID, and normalized model/provider. See the integration guide for Python, Node.js, and LiteLLM examples.
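Since the response reports cost in integer nanodollars, converting for display is a division by 1e9. A small helper (name illustrative):

```python
def nanodollars_to_usd(nd):
    """Format an integer nanodollar cost as a USD string (1 USD = 1e9 nd)."""
    return f"${nd / 1_000_000_000:.6f}"
```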

- Binary size: <10 MB (single static musl binary)
- Throughput: 261K events/s per core (fire-and-forget)
- Models priced: 2,263 (LiteLLM catalog, auto-refreshed)
- Ingestion overhead: <1 ms (p99 latency)

Supported providers

Every major LLM provider.

OpenAI · Anthropic · Gemini · Vertex AI · AWS Bedrock · Azure OpenAI · Mistral · Groq · xAI Grok · DeepSeek · Cohere · Ollama + any OpenAI-compatible

Start observing.

docker compose up or build from source. No account, no API key, no credit card.