Event Archival

Archive old events to S3, R2, or MinIO as compressed JSONL.

How it works

As events age past a configurable threshold, Keplor archives them to an S3-compatible object store and deletes them from SQLite to keep the database lean.

| Data | After archival |
| --- | --- |
| Recent events (within archive_after_days) | Stay in SQLite — fully queryable |
| Old events (past threshold) | Compressed JSONL in S3/R2 — deleted from SQLite |
| Daily rollups | Always in SQLite — aggregation queries unaffected |
| Archive manifests | Tracked in SQLite for audit and status |

All query, stats, rollup, and quota endpoints continue working on data that remains in SQLite. The has_archived_data flag in query responses indicates when archived data exists for the queried time range.
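As an illustration, a query response over a partially archived range might look like the following. Only the has_archived_data field is described above; the surrounding shape is illustrative, not the actual response schema:

```json
{
  "events": [],
  "has_archived_data": true
}
```

When has_archived_data is true, events older than the archive threshold exist in object storage and are not included in the response.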

Build with S3 support

$ cargo build --release --features mimalloc,s3

Or with Docker:

# Dockerfile already includes mimalloc.
# To add S3, edit the build line:
RUN cargo build --release --locked --target x86_64-unknown-linux-musl \
    -p keplor-cli --features mimalloc,s3

Archive lifecycle

Every archive_interval_secs (default 1 hour), Keplor checks whether archival should run based on age and/or database size triggers. When triggered:

  1. Force rollup for affected days (preserves daily aggregations after deletion)
  2. Query events older than archive_after_days, ordered by (user_id, timestamp)
  3. Group by user + day, serialize to JSONL, compress with zstd, upload to S3/R2
  4. Record manifest in SQLite for audit and tracking
  5. Delete archived events from SQLite, then VACUUM to reclaim disk space
S3/R2 key format:
{prefix}/user_id=alice/day=2026-04-15/{archive_id}.jsonl.zstd
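The key layout above can be expressed as a small helper. This is a sketch of the documented format, not Keplor's internal code; the function name is illustrative:

```python
def archive_key(prefix: str, user_id: str, day: str, archive_id: str) -> str:
    """Build an S3/R2 object key following the documented layout:
    {prefix}/user_id=<user>/day=<YYYY-MM-DD>/<archive_id>.jsonl.zstd
    """
    return f"{prefix}/user_id={user_id}/day={day}/{archive_id}.jsonl.zstd"
```

The user_id= and day= segments follow Hive-style partitioning, so archived files can later be queried in place by engines such as Athena or DuckDB.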

Each chunk (user + day) is archived independently. If one upload fails, the remaining chunks continue. Events from a failed chunk remain in SQLite and are retried on the next cycle.
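Steps 2–4 of the lifecycle can be sketched as follows. This is an illustrative Python sketch, not Keplor's Rust implementation; gzip stands in for zstd to stay within the standard library, and the event field names are assumptions:

```python
import gzip
import json
from collections import defaultdict

def chunk_and_compress(events):
    """Group events by (user_id, day), serialize each group as JSONL,
    and compress it. Events are dicts with at least a user_id and an
    ISO 8601 timestamp; gzip stands in for zstd here.
    """
    chunks = defaultdict(list)
    # Order by (user_id, timestamp) as in lifecycle step 2.
    for ev in sorted(events, key=lambda e: (e["user_id"], e["timestamp"])):
        day = ev["timestamp"][:10]  # YYYY-MM-DD prefix of the timestamp
        chunks[(ev["user_id"], day)].append(ev)
    compressed = {}
    for (user_id, day), evs in chunks.items():
        jsonl = "\n".join(json.dumps(e) for e in evs).encode()
        compressed[(user_id, day)] = gzip.compress(jsonl)
    return compressed
```

Each entry in the returned dict corresponds to one uploadable chunk, which is why a single failed upload only delays the events of that one (user, day) pair.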

Cloudflare R2

R2 is the recommended choice for most deployments: 10 GB free storage, zero egress fees, S3-compatible API.

  1. Create a bucket in the Cloudflare dashboard (e.g. keplor-archive)
  2. Create an R2 API token with Object Read & Write permissions
  3. Add to keplor.toml:
[archive]
bucket = "keplor-archive"
endpoint = "https://<account-id>.r2.cloudflarestorage.com"
region = "auto"
access_key_id = "your-r2-access-key"
secret_access_key = "your-r2-secret-key"
prefix = "events"
archive_after_days = 30

AWS S3

[archive]
bucket = "keplor-archive"
endpoint = "https://s3.us-east-1.amazonaws.com"
region = "us-east-1"
access_key_id = "AKIA..."
secret_access_key = "..."
prefix = "events"
archive_after_days = 30

Standard S3 pricing applies. Consider S3 Intelligent-Tiering for infrequently accessed archives.

MinIO (self-hosted)

[archive]
bucket = "keplor-archive"
endpoint = "http://localhost:9000"
region = "us-east-1"
access_key_id = "minioadmin"
secret_access_key = "minioadmin"
path_style = true    # required for MinIO
archive_after_days = 30

Any S3-compatible service works: DigitalOcean Spaces, Backblaze B2, Wasabi, etc.

Configuration reference

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| bucket | string | — | S3 bucket name (required) |
| endpoint | string | — | S3 endpoint URL (required) |
| region | string | — | Region ("auto" for R2, "us-east-1" for AWS) |
| access_key_id | string | — | Access key (required) |
| secret_access_key | string | — | Secret key (required) |
| prefix | string | "" | Key prefix in bucket (e.g. "events") |
| path_style | bool | false | Path-style addressing (required for MinIO) |
| archive_after_days | u64 | 30 | Archive events older than this many days |
| archive_after_hours | u64 | 0 | Sub-day archival (hours). Overrides archive_after_days when non-zero. Set to 1 for hourly offload. |
| archive_threshold_mb | u64 | 0 | Also archive when SQLite exceeds this size (MB). 0 = age-only. |
| archive_batch_size | usize | 10000 | Maximum events per JSONL archive file |
| archive_interval_secs | u64 | 3600 | How often the archive loop runs (seconds). Default: 1 hour. |
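The interaction between the age and size triggers can be summarized in two small predicates. This is an illustrative sketch of the documented semantics, not Keplor's actual code; the function names and dict-based config are assumptions:

```python
def archive_cutoff_secs(cfg: dict) -> int:
    """Age threshold in seconds. archive_after_hours overrides
    archive_after_days when it is non-zero."""
    hours = cfg.get("archive_after_hours", 0)
    if hours > 0:
        return hours * 3600
    return cfg.get("archive_after_days", 30) * 86400

def size_trigger(db_size_mb: int, cfg: dict) -> bool:
    """Size trigger: archive when SQLite exceeds archive_threshold_mb.
    A threshold of 0 disables the size trigger (age-only)."""
    threshold = cfg.get("archive_threshold_mb", 0)
    return threshold > 0 and db_size_mb > threshold
```

Setting archive_after_hours = 1 together with a small archive_threshold_mb keeps the local database close to a rolling one-hour window.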

Archive vs. retention

If archive_after_days is greater than the shortest retention tier's days value, GC will delete events before they can be archived. Keplor warns about this at startup. Always set archive_after_days lower than your shortest tier's retention.
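The startup check described above amounts to a simple comparison. A minimal sketch, assuming retention tiers are each configured with a days value (the function name is illustrative):

```python
def archive_wins_over_gc(archive_after_days: int, tier_retention_days: list[int]) -> bool:
    """True when events reach the archive threshold before the shortest
    retention tier's GC would delete them; False means GC deletes events
    before they can ever be archived."""
    return archive_after_days < min(tier_retention_days)
```

For example, with tiers retaining 90 and 365 days, archive_after_days = 30 is safe; with a 14-day tier it is not, and Keplor would warn at startup.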

GC & cleanup

Archival runs before GC in the combined loop to prevent data loss. Daily rollups are force-refreshed before deletion, so aggregation queries remain accurate even after events are archived and removed from SQLite.

S3 connectivity is verified at startup. Bad credentials cause an immediate error log and disable archival, rather than silently failing hours later on the first archive cycle.

CLI commands

Archive manually (outside the automatic cycle):

$ keplor archive --config keplor.toml --older-than-days 14

Check archive status:

$ keplor archive_status --config keplor.toml

Next steps

Configuration reference — all [archive] fields.
Storage config — max_db_size_mb and other storage settings.
Integration guide — full setup with retention tiers and auth.