Event Archival

Archive old events to S3, R2, or MinIO as compressed JSONL.

How it works

As events age past a configurable threshold, Keplor archives them to an S3-compatible object store and deletes them from SQLite to keep the database lean.

| Data | After archival |
| --- | --- |
| Recent events (within archive_after_days) | Stay in SQLite — fully queryable |
| Old events (past threshold) | Compressed JSONL in S3/R2 — deleted from SQLite |
| Daily rollups | Always in SQLite — aggregation queries unaffected |
| Archive manifests | Tracked in SQLite for audit and status |

All query, stats, rollup, and quota endpoints continue working on data that remains in SQLite. The has_archived_data flag in query responses indicates when archived data exists for the queried time range.
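As an illustration, a query response over a partially archived range might look like the following. Only the has_archived_data field is described above; the surrounding shape is illustrative, not the actual response schema:

```json
{
  "events": [],
  "has_archived_data": true
}
```

When has_archived_data is true, events older than the archive threshold exist in object storage and are not included in the response.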

Build with S3 support

$ cargo build --release --features mimalloc,s3

Or with Docker:

# Dockerfile already includes mimalloc.
# To add S3, edit the build line:
RUN cargo build --release --locked --target x86_64-unknown-linux-musl \
    -p keplor-cli --features mimalloc,s3

Archive lifecycle

Every archive_interval_secs (default 1 hour), Keplor checks whether archival should run based on age and/or database size triggers. When triggered:

  1. Force rollup for affected days (preserves daily aggregations after deletion)
  2. Query events older than archive_after_days, ordered by (user_id, timestamp)
  3. Group by user + day, serialize to JSONL, compress with zstd, upload to S3/R2
  4. Record manifest in SQLite for audit and tracking
  5. Delete archived events from SQLite, then VACUUM to reclaim disk space
S3/R2 key format:
{prefix}/user_id=alice/day=2026-04-15/{archive_id}.jsonl.zstd
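The key layout above can be expressed as a small helper. This is a sketch of the documented format, not Keplor's internal code; the function name is illustrative:

```python
def archive_key(prefix: str, user_id: str, day: str, archive_id: str) -> str:
    """Build an S3/R2 object key following the documented layout:
    {prefix}/user_id=<user>/day=<YYYY-MM-DD>/<archive_id>.jsonl.zstd
    """
    return f"{prefix}/user_id={user_id}/day={day}/{archive_id}.jsonl.zstd"
```

The user_id= and day= segments follow Hive-style partitioning, so archived files can later be queried in place by engines such as Athena or DuckDB.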

Each chunk (user + day) is archived independently. If one upload fails, the remaining chunks continue. Events from a failed chunk remain in SQLite and are retried on the next cycle.
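Steps 2–4 of the lifecycle can be sketched as follows. This is an illustrative Python sketch, not Keplor's Rust implementation; gzip stands in for zstd to stay within the standard library, and the event field names are assumptions:

```python
import gzip
import json
from collections import defaultdict

def chunk_and_compress(events):
    """Group events by (user_id, day), serialize each group as JSONL,
    and compress it. Events are dicts with at least a user_id and an
    ISO 8601 timestamp; gzip stands in for zstd here.
    """
    chunks = defaultdict(list)
    # Order by (user_id, timestamp) as in lifecycle step 2.
    for ev in sorted(events, key=lambda e: (e["user_id"], e["timestamp"])):
        day = ev["timestamp"][:10]  # YYYY-MM-DD prefix of the timestamp
        chunks[(ev["user_id"], day)].append(ev)
    compressed = {}
    for (user_id, day), evs in chunks.items():
        jsonl = "\n".join(json.dumps(e) for e in evs).encode()
        compressed[(user_id, day)] = gzip.compress(jsonl)
    return compressed
```

Each entry in the returned dict corresponds to one uploadable chunk, which is why a single failed upload only delays the events of that one (user, day) pair.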

Cloudflare R2

R2 is the recommended choice for most deployments: 10 GB free storage, zero egress fees, S3-compatible API.

  1. Create a bucket in the Cloudflare dashboard (e.g. keplor-archive)
  2. Create an R2 API token with Object Read & Write permissions
  3. Add to keplor.toml:
[archive]
bucket = "keplor-archive"
endpoint = "https://<account-id>.r2.cloudflarestorage.com"
region = "auto"
access_key_id = "your-r2-access-key"
secret_access_key = "your-r2-secret-key"
prefix = "events"
archive_after_days = 30

AWS S3

[archive]
bucket = "keplor-archive"
endpoint = "https://s3.us-east-1.amazonaws.com"
region = "us-east-1"
access_key_id = "AKIA..."
secret_access_key = "..."
prefix = "events"
archive_after_days = 30

Standard S3 pricing applies. Consider S3 Intelligent-Tiering for infrequently accessed archives.

MinIO (self-hosted)

[archive]
bucket = "keplor-archive"
endpoint = "http://localhost:9000"
region = "us-east-1"
access_key_id = "minioadmin"
secret_access_key = "minioadmin"
path_style = true    # required for MinIO
archive_after_days = 30

Any S3-compatible service works: DigitalOcean Spaces, Backblaze B2, Wasabi, etc.

Configuration reference

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| bucket | string | — | S3 bucket name (required) |
| endpoint | string | — | S3 endpoint URL (required) |
| region | string | — | Region ("auto" for R2, "us-east-1" for AWS) |
| access_key_id | string | — | Access key (required) |
| secret_access_key | string | — | Secret key (required) |
| prefix | string | "" | Key prefix in bucket (e.g. "events") |
| path_style | bool | false | Path-style addressing (required for MinIO) |
| archive_after_days | u64 | 30 | Archive events older than this many days |
| archive_after_hours | u64 | 0 | Sub-day archival (hours). Overrides archive_after_days when non-zero. Set to 1 for hourly offload. |
| archive_threshold_mb | u64 | 0 | Also archive when SQLite exceeds this size (MB). 0 = age-only. |
| archive_batch_size | usize | 10000 | Maximum events per JSONL archive file |
| archive_interval_secs | u64 | 3600 | How often the archive loop runs (seconds). Default: 1 hour. |
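The interaction between the age and size triggers can be summarized in two small predicates. This is an illustrative sketch of the documented semantics, not Keplor's actual code; the function names and dict-based config are assumptions:

```python
def archive_cutoff_secs(cfg: dict) -> int:
    """Age threshold in seconds. archive_after_hours overrides
    archive_after_days when it is non-zero."""
    hours = cfg.get("archive_after_hours", 0)
    if hours > 0:
        return hours * 3600
    return cfg.get("archive_after_days", 30) * 86400

def size_trigger(db_size_mb: int, cfg: dict) -> bool:
    """Size trigger: archive when SQLite exceeds archive_threshold_mb.
    A threshold of 0 disables the size trigger (age-only)."""
    threshold = cfg.get("archive_threshold_mb", 0)
    return threshold > 0 and db_size_mb > threshold
```

Setting archive_after_hours = 1 together with a small archive_threshold_mb keeps the local database close to a rolling one-hour window.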

Archive vs. retention

If archive_after_days is greater than the shortest retention tier's days value, GC will delete events before they can be archived. Keplor warns about this at startup. Always set archive_after_days lower than your shortest tier's retention.
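The startup check described above amounts to a simple comparison. A minimal sketch, assuming retention tiers are each configured with a days value (the function name is illustrative):

```python
def archive_wins_over_gc(archive_after_days: int, tier_retention_days: list[int]) -> bool:
    """True when events reach the archive threshold before the shortest
    retention tier's GC would delete them; False means GC deletes events
    before they can ever be archived."""
    return archive_after_days < min(tier_retention_days)
```

For example, with tiers retaining 90 and 365 days, archive_after_days = 30 is safe; with a 14-day tier it is not, and Keplor would warn at startup.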

GC & cleanup

Archival runs before GC in the combined loop to prevent data loss. Daily rollups are force-refreshed before deletion, so aggregation queries remain accurate even after events are archived and removed from SQLite.

S3 connectivity is verified at startup. Bad credentials cause an immediate error log and disable archival, rather than silently failing hours later on the first archive cycle.

CLI commands

Archive manually (outside the automatic cycle):

$ keplor archive --config keplor.toml --older-than-days 14

Check archive status:

$ keplor archive_status --config keplor.toml

Next steps

Configuration reference — all [archive] fields.
Storage config — max_db_size_mb and other storage settings.
Integration guide — full setup with retention tiers and auth.