LLM Observability & Cost Accounting

Every LLM call.
Captured. Priced.

A single binary ingests every prompt and completion, auto-computes cost from a 2,263-model pricing catalog, and serves real-time aggregations. Multi-tenant with configurable retention tiers. Zero dependencies to start; S3/R2 when you scale.

How it works

1. Your app calls an LLM

OpenAI, Anthropic, Gemini, Bedrock, or any of 12 supported providers. Your code doesn't change.

2. POST the event to Keplor

Two required fields: model and provider. Add token counts and Keplor computes cost automatically. Each key gets its own retention tier.

3. Query costs and usage

Real-time quotas, daily rollups, per-model breakdowns. Filter by user, key, provider, or time range. Export as JSON Lines.
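The steps above can be sketched from the client side in Python. The /v1/events path matches the curl call shown under Integration; the helper names here are illustrative, not part of Keplor's API.

```python
import json
import urllib.request

KEPLOR_URL = "http://localhost:8080/v1/events"  # default local endpoint

def build_event(model, provider, input_tokens=None, output_tokens=None):
    """Build the minimal event payload: model and provider are the only
    required fields; token counts enable automatic cost computation."""
    event = {"model": model, "provider": provider}
    if input_tokens is not None or output_tokens is not None:
        event["usage"] = {
            "input_tokens": input_tokens or 0,
            "output_tokens": output_tokens or 0,
        }
    return event

def post_event(event, url=KEPLOR_URL):
    """POST one event to Keplor and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

post_event(build_event("gpt-4o", "openai", 500, 200)) mirrors the curl example below.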

Capabilities

Everything you need.
Nothing you don't.

Automatic cost accounting

Bundled LiteLLM pricing catalog covers 2,263 models across all major providers. Handles cache discounts, reasoning tokens, batch pricing, and audio/image tokens. Cost stored as int64 nanodollars for precision.
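A sketch of why integer nanodollars keep the accounting exact: store prices as integer nanodollars per million tokens (1 USD = 1e9 nanodollars) and per-event cost becomes pure integer math, with no float rounding. The prices below are made up for illustration, not figures from the catalog.

```python
# Illustrative prices in nanodollars per 1M tokens.
# A $2.50 per-million-token input price is 2_500_000_000 nanodollars/Mtok.
PRICING = {
    "example-model": {"input": 2_500_000_000, "output": 10_000_000_000},
}

def cost_nanodollars(model, input_tokens, output_tokens):
    """Compute event cost as an exact integer, with no floats involved."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) // 1_000_000
```

For 500 input and 200 output tokens at these made-up prices, the result is 3,250,000 nanodollars, i.e. $0.00325.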

Full request & response capture

Every prompt and completion stored alongside event metadata. Optionally archive old events to Cloudflare R2, AWS S3, or MinIO as compressed JSONL files to keep SQLite lean.

Multi-tenant with tiered retention

Assign API keys to named retention tiers: free (7 days), pro (90 days), team (180 days), or any custom tier. GC runs per-tier automatically. Tier names and durations are fully configurable.
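A per-tier GC pass boils down to computing one cutoff per tier and dropping older events. The tier table below mirrors the default durations named above; the event shape ('tier', 'ts' fields) is an assumption for illustration.

```python
from datetime import datetime, timedelta, timezone

# Default tier durations from above; custom tiers slot in the same way.
RETENTION_DAYS = {"free": 7, "pro": 90, "team": 180}

def gc_cutoffs(now=None):
    """One deletion cutoff per tier: events older than it get collected."""
    now = now or datetime.now(timezone.utc)
    return {tier: now - timedelta(days=d) for tier, d in RETENTION_DAYS.items()}

def sweep(events, now=None):
    """Keep only events younger than their tier's cutoff.
    Each event is assumed to carry a 'tier' and a 'ts' (datetime)."""
    cutoffs = gc_cutoffs(now)
    return [e for e in events if e["ts"] >= cutoffs[e["tier"]]]
```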

Real-time aggregation API

Quota checks, daily rollups, and period statistics via REST. Filter by user, API key, model, provider, or time range. Cursor-based pagination for large result sets.
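Cursor pagination on the client side is a simple loop: request a page, follow the returned cursor until none comes back. The field names here ('items', 'next_cursor') are assumptions about the wire format, not confirmed API details.

```python
def paginate(fetch_page, page_size=100):
    """Drain a cursor-paginated endpoint.

    `fetch_page(cursor, limit)` is any callable returning a dict with
    'items' and an optional 'next_cursor' (field names assumed)."""
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            break
```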

Event archival to S3/R2

Archive old events as compressed JSONL to Cloudflare R2, AWS S3, or MinIO. Age-based and size-based triggers. Daily rollups preserved in SQLite. Automatic 6-hour archive cycles with per-chunk error isolation.

Zero-dep single binary

Static musl binary under 10 MB. SQLite with WAL mode and connection pooling. One-command Docker deploy. No JVM, no runtime, no cloud account required.

Server-side key attribution

Authenticated keys are injected server-side, preventing clients from spoofing cost attribution. Each key carries a tier, so billing and retention are always tied to the actual caller.
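The attribution rule can be sketched as: resolve the key from the Authorization header on the server and overwrite any client-supplied attribution fields. The key registry and field names below are illustrative.

```python
# Illustrative key registry: token -> (key_id, tier).
API_KEYS = {"sk-test-123": ("key_42", "pro")}

def attribute(event, auth_header):
    """Inject key_id and tier from the authenticated key, discarding any
    client-supplied values so cost attribution cannot be spoofed."""
    token = auth_header.removeprefix("Bearer ").strip()
    key_id, tier = API_KEYS[token]  # raises KeyError on unknown keys
    clean = {k: v for k, v in event.items() if k not in ("key_id", "tier")}
    return {**clean, "key_id": key_id, "tier": tier}
```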

12 providers, one API

OpenAI, Anthropic, Gemini, Bedrock, Azure, Mistral, Groq, xAI, DeepSeek, Cohere, Ollama, and any OpenAI-compatible endpoint. Provider-specific token handling built in.

Integration

Three lines to start.

$ curl -X POST http://localhost:8080/v1/events \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","provider":"openai",
      "usage":{"input_tokens":500,"output_tokens":200}}'

Returns cost in nanodollars, event ID, and normalized model/provider. See the integration guide for Python, Node.js, and LiteLLM examples.
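Since the response reports cost in integer nanodollars, converting for display is a division by 1e9. A small helper (name illustrative):

```python
def nanodollars_to_usd(nd):
    """Format an integer nanodollar cost as a USD string (1 USD = 1e9 nd)."""
    return f"${nd / 1_000_000_000:.6f}"
```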

- Binary size: <10 MB (single static musl binary)
- Throughput: 261K events/s per core (fire-and-forget)
- Models priced: 2,263 (LiteLLM catalog, auto-refreshed)
- Ingestion overhead: <1 ms (p99 latency)

Supported providers

Every major LLM provider.

OpenAI · Anthropic · Gemini · Vertex AI · AWS Bedrock · Azure OpenAI · Mistral · Groq · xAI Grok · DeepSeek · Cohere · Ollama + any OpenAI-compatible

Start observing.

docker compose up or build from source. No account, no API key, no credit card.