Why I made run trees content-addressed
Content addressing is the idea that the identity of a piece of data is derived from its content, not from where it lives or when it was created. Git does this with commits. IPFS does it with files. opentine does it with agent steps.
In opentine, every Step has an ID that is the hash of its inputs, model configuration, and tool setup. Four practical properties follow from that one decision.
First: deduplication. If two different runs happen to make the same API call with the same model and the same inputs, they produce the same step hash. It gets stored once. When you fork a run from step 7, steps 1-7 aren't copied — they're referenced. The forked run shares the same content-addressed steps.
Second: caching. If a step's hash already exists in the local store, we can skip re-executing it entirely. This is why forking is cheap. You don't re-pay for the steps you've already run.
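To make the deduplication and caching points concrete, here is a minimal sketch of a content-addressed step store. It is not opentine's code: `StepStore`, `step_hash`, and `fake_model_call` are hypothetical names, the step is modeled as a plain dict, and the standard library's `json` with sorted keys stands in for whatever canonical encoding the real store uses.

```python
import hashlib
import json
from typing import Callable, Dict, List


def step_hash(step: dict) -> str:
    # Sorted keys and fixed separators give the same bytes for the same content.
    canonical = json.dumps(step, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class StepStore:
    """Hypothetical store: results are keyed by content hash, never by position."""

    def __init__(self) -> None:
        self.results: Dict[str, str] = {}
        self.executions = 0  # counts how often we actually "paid" for a step

    def run(self, step: dict, execute: Callable[[dict], str]) -> str:
        key = step_hash(step)
        if key not in self.results:  # cache miss: execute and store once
            self.executions += 1
            self.results[key] = execute(step)
        return key  # callers hold a reference (the hash), not a copy


def fake_model_call(step: dict) -> str:
    # Stand-in for a real API call.
    return "output for " + step["prompt"]


store = StepStore()
steps = [{"prompt": f"step {i}", "model": "example-model"} for i in range(1, 8)]

# Original run: seven steps, seven executions.
original_run: List[str] = [store.run(s, fake_model_call) for s in steps]

# Fork from step 7: reuse the seven references and execute only the new step.
new_step = {"prompt": "alternative step 8", "model": "example-model"}
forked_run = list(original_run) + [store.run(new_step, fake_model_call)]

# Replaying the original steps hits the cache and executes nothing.
replayed_run = [store.run(s, fake_model_call) for s in steps]
```

In this sketch a fork is just a new list of hashes, so its cost is proportional to the steps that differ rather than to the length of the shared prefix.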
Third: integrity. You can verify that a run hasn't been tampered with by checking the hash chain. If someone modifies a step's outputs, the hash won't match. This matters for audit trails in production.
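As a rough illustration of checking a hash chain, the sketch below assumes each step's ID covers both its own payload and its parent's ID. That layout is an assumption made for the example, not a description of the .tine format, and `digest`, `build_chain`, and `first_invalid` are hypothetical helpers.

```python
import copy
import hashlib
import json
from typing import List, Optional


def digest(payload: dict, parent: Optional[str]) -> str:
    # Assumption: the ID covers the step payload plus the parent's ID.
    body = {"parent": parent, "payload": payload}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def build_chain(payloads: List[dict]) -> List[dict]:
    chain = []
    parent = None
    for payload in payloads:
        step_id = digest(payload, parent)
        chain.append({"id": step_id, "parent": parent, "payload": payload})
        parent = step_id
    return chain


def first_invalid(chain: List[dict]) -> Optional[int]:
    """Return the index of the first step that fails verification, or None."""
    parent = None
    for index, record in enumerate(chain):
        if record["parent"] != parent:
            return index  # link to the previous step is broken
        if digest(record["payload"], parent) != record["id"]:
            return index  # content no longer matches its ID
        parent = record["id"]
    return None


chain = build_chain([{"prompt": "plan"}, {"prompt": "search"}, {"prompt": "summarize"}])
intact_result = first_invalid(chain)

# Tamper with the second step's content without updating its ID.
tampered = copy.deepcopy(chain)
tampered[1]["payload"]["prompt"] = "search something else"
tampered_result = first_invalid(tampered)
```

If an attacker also recomputed the tampered step's ID, the next step's parent link would stop matching, so under this layout the mismatch moves down the chain rather than disappearing.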
Fourth: portability. A .tine file is a self-contained run tree where every step is identified by content hash. Move it between machines, share it with teammates, store it in S3 — the hashes are the same everywhere. No database IDs, no server state.
The implementation is straightforward: opentine uses SHA-256 over a canonical msgspec-serialized representation of the step's inputs, model configuration, and tool setup. The hash is computed before execution starts, so we can check the cache before making any API calls.
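A minimal sketch of that hashing step follows. To keep it runnable with only the standard library, it swaps the msgspec encoding for `json.dumps` with sorted keys. The function name `step_hash` and the field names in the payload are assumptions for illustration, not opentine's actual schema.

```python
import hashlib
import json


def step_hash(inputs: dict, model_config: dict, tool_setup: dict) -> str:
    """Return a hex SHA-256 digest over a canonical encoding of the step."""
    payload = {"inputs": inputs, "model": model_config, "tools": tool_setup}
    # Sorted keys and fixed separators make the bytes independent of dict order.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Same content, different key order: the digests match.
hash_a = step_hash(
    {"prompt": "hi", "turn": 1},
    {"name": "example-model", "temperature": 0},
    {"enabled": ["search"]},
)
hash_b = step_hash(
    {"turn": 1, "prompt": "hi"},
    {"temperature": 0, "name": "example-model"},
    {"enabled": ["search"]},
)

# Any change to the inputs yields a different digest.
hash_c = step_hash(
    {"prompt": "hello", "turn": 1},
    {"name": "example-model", "temperature": 0},
    {"enabled": ["search"]},
)
```

The canonical encoding does the real work here: without a deterministic byte representation, two logically identical steps could hash differently and the cache would miss.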
This is the same principle that makes git fast and reliable — just applied to agent runs instead of source code.