SAMIZDAT

The autovacuum system

A SAMIZDAT node is, among other things, a participating cache for the rest of the network. Every object you fetch sticks around on your disk and is served back out to anyone who asks for it, which is great for the network and slightly less great for your free space. The vacuum subsystem is what keeps that cache bounded.

Why the node needs vacuum

Every node has a storage budget, set by the max_storage field in node.toml and expressed in megabytes. All stored content counts against that budget, but vacuum treats two kinds of content very differently:

  • Bookmarked content is content the node operator has decided is valuable: your own series’ editions, content you have explicitly pinned, objects referenced by series you subscribe to. Vacuum will never evict bookmarked content. If your bookmarks already exceed max_storage, you will simply run over budget.
  • Disposable cached content is everything else: bytes that happened to flow through your node because you visited a page or helped serve a request. Vacuum is free to evict any of this when the total storage gets near the budget.

The distinction is tracked per-object in a refcount-style table; see node/src/models/bookmark.rs for the on-disk encoding.
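
To illustrate what "refcount-style" means here, the following is a minimal in-memory sketch. It is not the encoding used in node/src/models/bookmark.rs. The BookmarkTable type and its methods are hypothetical names, and the sketch assumes an object stays bookmarked as long as at least one reason (an own edition, a pin, a subscription) still references it.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the bookmark table.
/// Keys are object identifiers, shown as plain strings for brevity.
#[derive(Default)]
pub struct BookmarkTable {
    counts: HashMap<String, u32>,
}

impl BookmarkTable {
    /// Record one more reason to keep `object`.
    pub fn mark(&mut self, object: &str) {
        *self.counts.entry(object.to_string()).or_insert(0) += 1;
    }

    /// Drop one reason to keep `object`. The entry disappears once the
    /// count reaches zero; unmarking an unknown object is a no-op.
    pub fn unmark(&mut self, object: &str) {
        let remaining = match self.counts.get_mut(object) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return,
        };
        if remaining == 0 {
            self.counts.remove(object);
        }
    }

    /// Bookmarked objects are never eligible for eviction.
    pub fn is_bookmarked(&self, object: &str) -> bool {
        self.counts.contains_key(object)
    }
}
```

Under this assumption, an object that is both pinned and referenced by a subscribed series only becomes disposable after both references are gone.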

What vacuum does

Vacuum runs in two passes.

The eviction pass walks the object-statistics table, ranks each object by a usefulness score, and removes disposable objects in ascending order of usefulness (they are popped from a heap, least useful first) until total storage is back under max_storage. Bookmarked objects are skipped in this loop. If vacuum runs out of disposable objects before getting under budget, it reports Insufficient and leaves the rest alone. See node/src/vacuum.rs for the exact eviction policy.
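
The eviction loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in node/src/vacuum.rs: ObjectStat, VacuumStatus and evict are hypothetical names, only the Insufficient outcome is named in this document, sizes and the budget are both taken in bytes, and usefulness is modelled as a plain integer because the real scoring formula is not covered here.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical per-object record; the real statistics table has its own schema.
pub struct ObjectStat {
    pub id: u32,
    pub size: u64,       // bytes on disk
    pub usefulness: u64, // higher means more worth keeping (assumed integer)
    pub bookmarked: bool,
}

#[derive(Debug, PartialEq)]
pub enum VacuumStatus {
    Unnecessary,  // already within budget, nothing evicted (hypothetical name)
    Done,         // evicted enough to get within budget (hypothetical name)
    Insufficient, // ran out of disposable objects first
}

/// Returns the outcome plus the ids of evicted objects, in eviction order.
/// "Under budget" is assumed to mean total <= max_storage.
pub fn evict(objects: &[ObjectStat], max_storage: u64) -> (VacuumStatus, Vec<u32>) {
    let mut total: u64 = objects.iter().map(|o| o.size).sum();
    if total <= max_storage {
        return (VacuumStatus::Unnecessary, Vec::new());
    }

    // BinaryHeap is a max-heap; wrapping entries in Reverse makes it pop
    // the smallest usefulness first. Bookmarked objects never enter the heap.
    let mut heap: BinaryHeap<Reverse<(u64, u32, u64)>> = objects
        .iter()
        .filter(|o| !o.bookmarked)
        .map(|o| Reverse((o.usefulness, o.id, o.size)))
        .collect();

    let mut evicted = Vec::new();
    while total > max_storage {
        match heap.pop() {
            Some(Reverse((_, id, size))) => {
                total -= size;
                evicted.push(id);
            }
            // Everything left is bookmarked: give up and leave it alone.
            None => return (VacuumStatus::Insufficient, evicted),
        }
    }
    (VacuumStatus::Done, evicted)
}
```

Filtering bookmarked objects out before building the heap keeps the loop simple: the heap only ever contains candidates that are safe to delete, so running dry is exactly the Insufficient case.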

The garbage-collection pass cleans up the chunk layer underneath the object layer. Orphan chunks (chunks whose refcount has dropped to zero) get deleted, and dangling collection items whose backing object no longer exists get pruned. There is also a startup-only sweep that deletes chunks left behind by imports that crashed mid-flight; the refcount machinery that backs this lives in the CreateChunkRefCount migration in node/src/db/migrations.rs.
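
As a rough picture of the two cleanup steps in this pass, here is a small in-memory sketch. The real node works against its database, so the HashMap and Vec below, as well as the function names collect_orphan_chunks and prune_dangling_items, are stand-ins chosen for illustration only.

```rust
use std::collections::{HashMap, HashSet};

/// Deletes every chunk whose refcount is zero.
/// Returns the removed chunk hashes, sorted for deterministic output.
pub fn collect_orphan_chunks(refcounts: &mut HashMap<String, u32>) -> Vec<String> {
    let mut orphans: Vec<String> = refcounts
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(hash, _)| hash.clone())
        .collect();
    orphans.sort();
    for hash in &orphans {
        refcounts.remove(hash);
    }
    orphans
}

/// Prunes collection items whose backing object no longer exists.
/// Each item is a (name, object id) pair; returns how many were pruned.
pub fn prune_dangling_items(
    items: &mut Vec<(String, String)>,
    live_objects: &HashSet<String>,
) -> usize {
    let before = items.len();
    items.retain(|(_, object)| live_objects.contains(object));
    before - items.len()
}
```

Running this after the eviction pass makes sense in the sketch because evicting objects is what drives chunk refcounts to zero and leaves collection items dangling.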

A daemon loop in run_vacuum_daemon paces the passes adaptively so vacuum never takes more than a configurable share (currently 5%) of the node’s wall-clock time.
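
One simple way to honour such a time share is to sleep in proportion to how long the last pass took. If a pass was busy for b seconds and the share is s, then b / (b + idle) <= s holds as long as idle >= b * (1/s - 1). The sketch below applies that rule; it is an assumption about how pacing could work, not a description of run_vacuum_daemon itself, and MAX_SHARE and idle_after are hypothetical names.

```rust
use std::time::Duration;

/// Share of wall-clock time vacuum may use (5%, matching the figure above).
pub const MAX_SHARE: f64 = 0.05;

/// How long to stay idle after a pass that took `busy`, so that
/// busy / (busy + idle) does not exceed `share`.
pub fn idle_after(busy: Duration, share: f64) -> Duration {
    assert!(share > 0.0 && share <= 1.0, "share must be in (0, 1]");
    busy.mul_f64(1.0 / share - 1.0)
}
```

With a 5% share, a pass that took 2 seconds would be followed by roughly 38 seconds of sleep, so slow passes on a large cache automatically space themselves further apart.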

Manual vacuum

If you do not want to wait for the daemon’s next pass, the CLI exposes a one-shot trigger:

samizdat vacuum

This runs a single round of the same eviction-plus-GC logic the daemon uses and prints one of three outcomes: vacuum got the node under budget, vacuum could not get under budget (because too much is bookmarked), or there was nothing to do in the first place.