LLM Observability & Cost Accounting
Every LLM call.
Captured. Priced.
A single binary ingests every prompt and completion, auto-computes cost from a 2,263-model pricing catalog, and serves real-time aggregations. Multi-tenant with configurable retention tiers. Zero dependencies to start; S3/R2 when you scale.
How it works
Your app calls an LLM
OpenAI, Anthropic, Gemini, Bedrock, or any of 12 supported providers. Your code doesn't change.
POST the event to Keplor
Two required fields: model and provider. Add token counts and Keplor computes cost automatically. Each key gets its own retention tier.
Query costs and usage
Real-time quotas, daily rollups, per-model breakdowns. Filter by user, key, provider, or time range. Export as JSON Lines.
Capabilities
Everything you need.
Nothing you don't.
Automatic cost accounting
Bundled LiteLLM pricing catalog covers 2,263 models across all major providers. Handles cache discounts, reasoning tokens, batch pricing, and audio/image tokens. Cost stored as int64 nanodollars for precision.
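Integer nanodollar accounting can be sketched in a few lines. The per-token rates below are made-up illustrative numbers, not values from the bundled catalog; the point is that all arithmetic stays in integers until display time.

```python
# Illustrative sketch of int64 nanodollar cost math (1 dollar = 1e9 nanodollars).
# The rates here are example figures, not Keplor's pricing catalog.
NANOS_PER_DOLLAR = 1_000_000_000

def cost_nanodollars(input_tokens: int, output_tokens: int,
                     input_rate: int, output_rate: int) -> int:
    """Rates are nanodollars per token; everything stays integer, so no float drift."""
    return input_tokens * input_rate + output_tokens * output_rate

def format_dollars(nanos: int) -> str:
    """Convert to a display string only at the edge."""
    return f"${nanos / NANOS_PER_DOLLAR:.6f}"

# 500 input tokens at 2,500 nanodollars/token ($2.50 per 1M tokens) plus
# 200 output tokens at 10,000 nanodollars/token ($10 per 1M tokens):
total = cost_nanodollars(500, 200, 2_500, 10_000)
print(total, format_dollars(total))  # 3250000 $0.003250
```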
Full request & response capture
Every prompt and completion stored alongside event metadata. Optionally archive old events to Cloudflare R2, AWS S3, or MinIO as compressed JSONL files to keep SQLite lean.
Multi-tenant with tiered retention
Assign API keys to named retention tiers: free (7 days), pro (90 days), team (180 days), or any custom tier. GC runs per-tier automatically. Tier names and durations are fully configurable.
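The per-tier GC described above amounts to a cutoff timestamp per tier. A minimal sketch, using the documented tier names but with data structures and logic that are illustrative rather than Keplor's actual implementation:

```python
# Hedged sketch of per-tier retention cutoffs. Tier names and durations
# match the docs; the code itself is illustrative, not Keplor's GC.
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = {"free": 7, "pro": 90, "team": 180}  # custom tiers are configurable

def gc_cutoff(tier: str, now: datetime) -> datetime:
    """Events older than this timestamp are eligible for GC (or archival)."""
    return now - timedelta(days=RETENTION_DAYS[tier])

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(gc_cutoff("free", now))  # 2025-05-25 00:00:00+00:00
```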
Real-time aggregation API
Quota checks, daily rollups, and period statistics via REST. Filter by user, API key, model, provider, or time range. Cursor-based pagination for large result sets.
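Consuming a cursor-paginated endpoint follows a standard loop. The field names (`events`, `next_cursor`) and the `cursor` query parameter below are assumptions for illustration; consult the API reference for the actual names.

```python
# Hedged sketch of draining a cursor-paginated listing endpoint.
# Response field names ("events", "next_cursor") are illustrative assumptions.

def iter_events(fetch, params: dict):
    """Yield every event. `fetch(query_dict)` returns one parsed JSON page."""
    cursor = None
    while True:
        q = dict(params)
        if cursor is not None:
            q["cursor"] = cursor   # resume where the last page ended
        page = fetch(q)
        yield from page.get("events", [])
        cursor = page.get("next_cursor")
        if cursor is None:
            break                  # no cursor means this was the last page

# In practice `fetch` would wrap an HTTP GET; an in-memory stub shows the flow:
pages = {None: {"events": ["e1", "e2"], "next_cursor": "abc"},
         "abc": {"events": ["e3"]}}
print(list(iter_events(lambda q: pages[q.get("cursor")], {"provider": "openai"})))
# ['e1', 'e2', 'e3']
```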
Event archival to S3/R2
Archive old events as compressed JSONL to Cloudflare R2, AWS S3, or MinIO. Age-based and size-based triggers. Daily rollups preserved in SQLite. Automatic 6-hour archive cycles with per-chunk error isolation.
Zero-dep single binary
Static musl binary under 10 MB. SQLite with WAL mode and connection pooling. One-command Docker deploy. No JVM, no runtime, no cloud account required.
Server-side key attribution
Authenticated keys are injected server-side, preventing clients from spoofing cost attribution. Each key carries a tier, so billing and retention are always tied to the actual caller.
12 providers, one API
OpenAI, Anthropic, Gemini, Bedrock, Azure, Mistral, Groq, xAI, DeepSeek, Cohere, Ollama, and any OpenAI-compatible endpoint. Provider-specific token handling built in.
Integration
Three lines to start.
$ curl -X POST http://localhost:8080/v1/events \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o","provider":"openai",
     "usage":{"input_tokens":500,"output_tokens":200}}'

Returns cost in nanodollars, event ID, and normalized model/provider. See the integration guide for Python, Node.js, and LiteLLM examples.
Supported providers
Every major LLM provider.
Start observing.
docker compose up or build from source.
No account, no API key, no credit card.