Security threat model

Audience: intermediate (security review, production deployment)

ModelVault is a local, embedded database: a single file inside your application’s trust boundary, not a network service. This page defines the threat model, security expectations, and non-goals.

Evaluating ModelVault: Why ModelVault · Operations: Runbook

Scope and non-goals

In scope
Malicious or corrupted .modelvault files opened by the engine.
Untrusted input values passed through public APIs (Rust and Python).
Denial of service via pathological inputs (excessive CPU, memory, or disk growth).
Out of scope (for now)
Network server exposure (ModelVault is not a DB server).
Multi-tenant isolation / sandboxing beyond normal OS process boundaries.
Cryptographic confidentiality guarantees (encryption-at-rest is a future consideration, not a current guarantee).

Attacker model

Assume an attacker can provide:

A crafted .modelvault file containing arbitrary bytes (including truncated files, torn writes, and attempted checksum collisions).
SQL text for the supported DB-API SELECT subset (Python), including adversarial whitespace and parameter edge cases.
Arbitrary JSON schema descriptors in Python (fields_json, indexes_json) and arbitrary row values.

Assume the attacker cannot:

Execute arbitrary code inside the process, except by exploiting a bug in ModelVault.
Bypass OS permissions (ModelVault has no elevated privileges).

Security invariants (must hold)

No unsafe: the workspace forbids unsafe ([workspace.lints.rust] unsafe_code = "forbid").
No panics from untrusted input: decoder and parser failures must return structured errors rather than panicking.
Deterministic corruption handling: checksum mismatches and decode failures yield deterministic, documented errors.
Recovery correctness: in AutoTruncate mode, the engine must recover only to a prefix that preserves durable invariants; in Strict mode, it must fail fast.
Ephemeral spill isolation: Temp segments are ignored by replay and must never influence durable state after reopen.
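The corruption-handling and recovery invariants above can be illustrated with a small sketch. This is not ModelVault’s implementation: it assumes a hypothetical frame layout `[len: u32 LE][checksum: u32 LE][payload]`, uses 32-bit FNV-1a purely as a stand-in checksum, and defines a local `RecoveryMode` enum that only mirrors the documented variant names. `ReplayError`, `encode_frame`, `decode_at`, and `replay` are hypothetical names chosen for the example.

```rust
/// Local stand-in for the documented recovery modes (hypothetical, not the engine's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryMode {
    /// Fail fast on the first damaged frame.
    Strict,
    /// Keep the longest fully verified prefix and drop the rest.
    AutoTruncate,
}

/// Structured, deterministic errors: each carries the offset of the frame that failed.
#[derive(Debug, PartialEq, Eq)]
pub enum ReplayError {
    Truncated { offset: usize },
    ChecksumMismatch { offset: usize },
}

/// 32-bit FNV-1a, used here only as a stand-in checksum.
fn fnv1a32(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &byte in data {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Appends one frame: [len: u32 LE][checksum: u32 LE][payload].
pub fn encode_frame(payload: &[u8], out: &mut Vec<u8>) {
    // A u32 length prefix cannot describe larger payloads; reject them in this sketch.
    assert!(payload.len() <= u32::MAX as usize, "payload too large for a u32 length");
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&fnv1a32(payload).to_le_bytes());
    out.extend_from_slice(payload);
}

/// Reads a little-endian u32 at `at`, or None if fewer than 4 bytes remain.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let raw = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
}

/// Verifies the frame starting at `offset`; returns its payload and the next offset.
/// Checked slicing (`get`) means hostile lengths produce errors, never panics.
fn decode_at(bytes: &[u8], offset: usize) -> Result<(&[u8], usize), ReplayError> {
    let truncated = || ReplayError::Truncated { offset };
    let len = read_u32(bytes, offset).ok_or_else(truncated)? as usize;
    let expected = read_u32(bytes, offset + 4).ok_or_else(truncated)?;
    let start = offset + 8;
    let end = start.checked_add(len).ok_or_else(truncated)?;
    let payload = bytes.get(start..end).ok_or_else(truncated)?;
    if fnv1a32(payload) != expected {
        return Err(ReplayError::ChecksumMismatch { offset });
    }
    Ok((payload, end))
}

/// Replays all frames. Returns the payloads plus the length of the verified prefix.
pub fn replay(bytes: &[u8], mode: RecoveryMode) -> Result<(Vec<Vec<u8>>, usize), ReplayError> {
    let mut frames = Vec::new();
    let mut offset = 0;
    while offset < bytes.len() {
        match decode_at(bytes, offset) {
            Ok((payload, next)) => {
                frames.push(payload.to_vec());
                offset = next;
            }
            // Strict: surface the first failure unchanged.
            Err(err) if mode == RecoveryMode::Strict => return Err(err),
            // AutoTruncate: stop at the last good frame boundary.
            Err(_) => break,
        }
    }
    Ok((frames, offset))
}
```

With a torn final frame, Strict returns `Truncated` at that frame’s offset, while AutoTruncate returns the earlier frames and the byte length of the verified prefix, which is where a real engine would truncate.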

Primary risk areas

File-format decode surfaces: header, superblocks, segment headers, and payload decoders (catalog/record/index/checkpoint).
Replay logic: transaction framing and checkpoint-assisted replay must not produce inconsistent in-memory state.
Planner/executor: must not allocate unbounded memory for supported operator shapes; spill paths should engage as intended.
Python bindings: conversions between Python values and RowValue must validate types and avoid panic paths.
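For the replay-logic risk, the sketch below shows one way transaction framing and the Temp-segment isolation invariant fit together: only `Durable` segments are replayed, each transaction’s writes are buffered until its `Commit`, and a trailing transaction with no commit is discarded. The segment kinds, entry shapes, and error variants are hypothetical, and checkpoint-assisted replay is not modeled.

```rust
use std::collections::BTreeMap;

/// Hypothetical segment kinds: only Durable segments are replayed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SegmentKind {
    Durable,
    Temp,
}

/// Hypothetical log entries with explicit transaction framing.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Entry {
    Begin(u64),
    Put { txn: u64, key: String, value: String },
    Commit(u64),
}

pub struct Segment {
    pub kind: SegmentKind,
    pub entries: Vec<Entry>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplayError {
    /// A Begin arrived while another transaction was still open.
    NestedBegin { open: u64, new: u64 },
    /// A Put referenced a transaction that is not the open one.
    PutOutsideTxn { txn: u64 },
    /// A Commit did not match the open transaction (or none was open).
    CommitMismatch { open: Option<u64>, found: u64 },
}

/// Rebuilds key/value state from durable segments only. Writes are buffered
/// per transaction and applied at Commit, so an uncommitted tail never lands.
pub fn replay(segments: &[Segment]) -> Result<BTreeMap<String, String>, ReplayError> {
    let mut state = BTreeMap::new();
    let mut pending: Option<(u64, Vec<(String, String)>)> = None;

    // Temp (spill) segments are skipped outright, so they cannot affect durable state.
    for segment in segments.iter().filter(|s| s.kind == SegmentKind::Durable) {
        for entry in &segment.entries {
            match entry {
                Entry::Begin(id) => {
                    if let Some((open, _)) = &pending {
                        return Err(ReplayError::NestedBegin { open: *open, new: *id });
                    }
                    pending = Some((*id, Vec::new()));
                }
                Entry::Put { txn, key, value } => match pending.as_mut() {
                    Some((open, writes)) if *open == *txn => {
                        writes.push((key.clone(), value.clone()))
                    }
                    _ => return Err(ReplayError::PutOutsideTxn { txn: *txn }),
                },
                Entry::Commit(id) => match pending.take() {
                    Some((open, writes)) if open == *id => {
                        for (key, value) in writes {
                            state.insert(key, value);
                        }
                    }
                    other => {
                        return Err(ReplayError::CommitMismatch {
                            open: other.map(|(open, _)| open),
                            found: *id,
                        })
                    }
                },
            }
        }
    }
    // Anything still in `pending` never committed and is dropped here.
    Ok(state)
}
```

Rejecting malformed framing with a structured error, instead of applying it partially, is what keeps the in-memory state consistent with what was durably committed.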

Mitigations in the repo

Bounded decode: segment payloads, field bytes, list/checkpoint entry counts, and SQL LIMIT are capped to bound allocation and CPU time on hostile inputs.
Bounded encode (0.16+): encode_tagged_scalar applies the same field-byte caps as decode, so oversized strings and bytes fail at write time.
Regex registration: pattern-length and nested-quantifier checks run at schema registration; the regex cache is bounded.
Index integrity: primary-key mismatches on unique-index deletes fail replay; modelvault verify rebuilds indexes from row data.
Cross-process locking: exclusive flock on the main database file (Unix) in addition to the sidecar lock file.
Property/invariant tests: snapshot roundtrips and other invariants are validated via proptest.
Coverage + doc verification: CI runs scripts/verify-doc-examples.sh to prevent doc drift in supported user workflows.
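The bounded decode/encode mitigations follow a common pattern: check a declared length against a cap before allocating, and apply the same cap on the write path. The sketch below shows that pattern for a single field, assuming a hypothetical `[len: u32 LE][bytes]` layout and an illustrative `MAX_FIELD_BYTES` value; neither is ModelVault’s actual format or limit, and `encode_field`/`decode_field` are hypothetical names, not `encode_tagged_scalar`.

```rust
/// Illustrative cap only; the engine's real limits are not reproduced here.
pub const MAX_FIELD_BYTES: usize = 1 << 20; // 1 MiB

#[derive(Debug, PartialEq, Eq)]
pub enum FieldError {
    TooLarge { len: usize, max: usize },
    Truncated { needed: usize, available: usize },
}

/// Encode side: the same cap as decode, so oversized values fail at write time.
pub fn encode_field(value: &[u8], out: &mut Vec<u8>) -> Result<(), FieldError> {
    if value.len() > MAX_FIELD_BYTES {
        return Err(FieldError::TooLarge { len: value.len(), max: MAX_FIELD_BYTES });
    }
    // Safe cast: MAX_FIELD_BYTES is far below u32::MAX.
    out.extend_from_slice(&(value.len() as u32).to_le_bytes());
    out.extend_from_slice(value);
    Ok(())
}

/// Decode side: validate the declared length *before* allocating, so a hostile
/// prefix cannot force a huge allocation. Returns the value and the remaining input.
pub fn decode_field(input: &[u8]) -> Result<(Vec<u8>, &[u8]), FieldError> {
    if input.len() < 4 {
        return Err(FieldError::Truncated { needed: 4, available: input.len() });
    }
    let len = u32::from_le_bytes([input[0], input[1], input[2], input[3]]) as usize;
    if len > MAX_FIELD_BYTES {
        return Err(FieldError::TooLarge { len, max: MAX_FIELD_BYTES });
    }
    let rest = &input[4..];
    if rest.len() < len {
        return Err(FieldError::Truncated { needed: len, available: rest.len() });
    }
    Ok((rest[..len].to_vec(), &rest[len..]))
}
```

A length prefix of `0xFFFFFFFF` is rejected as `TooLarge` without allocating anything, and the symmetric cap on the encode side means a file written within the limits can always be decoded within them.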

Supply chain and release risks

ModelVault’s attacker model centers on a malicious local file, but production deployments should also account for supply-chain threats:

Compromised GitHub Actions / CI runners
Malicious or vulnerable Rust/Python dependencies
Stolen crates.io / PyPI publishing credentials
Artifact substitution/tampering between CI and registries

Mitigations used (or recommended) in this repo:

Dependency update automation (Dependabot) and vulnerability scanning (cargo audit, pip-audit)
Prefer least-privilege release tokens and rotate them regularly
Keep release automation and checks reproducible and documented

Operational guidance

Treat .modelvault files as untrusted input when sourced externally.
Prefer RecoveryMode::Strict when you need fail-fast behavior (e.g. automated pipelines).
Use RecoveryMode::AutoTruncate when best-effort salvage is preferred and losing the truncated tail is acceptable.