Core concepts
Audience: developers who want the mental model before diving into APIs.
For why ModelVault exists and how it compares to SQLite or JSON files, read Why ModelVault and Comparisons. This page explains how the pieces fit together. Implementation detail lives under Specifications.
The big picture
ModelVault is an embedded database for application models:
- You open one database (a file or memory image) inside your process.
- You define collections—typed containers analogous to tables, but governed by a schema, not free-form documents.
- Every write is validated; durable state is an append-only log inside a single .modelvault file.
There is no separate server to install or operate.
1. Database: one embedded unit
A database is the handle you open in your application:
| Mode | API | Persistence |
|---|---|---|
| On-disk | Database.open("app.modelvault") or await AsyncDatabase.open(...) | Durable single file; ship with your app |
| In-memory | Database.open_in_memory() or await AsyncDatabase.open_in_memory() | Fast tests; use snapshots to export/import |
Use AsyncDatabase for asyncio apps (FastAPI, Starlette) so handlers can await storage without blocking the event loop. Sync Database is the default for scripts, CLIs, and desktop apps — see Async policy.
On open, ModelVault validates the file header, replays the schema catalog, reconstructs the latest row map from record segments, and restores index state. Recovery behavior depends on RecoveryMode.
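The row-map reconstruction step can be pictured with a toy replay loop. This is a minimal sketch in plain Python, not ModelVault's implementation: the `LogEntry` shape and the `replay` helper are hypothetical, and the real on-disk segments carry far more metadata. It only shows why scanning an append-only log in order yields the latest row per primary key.

```python
from typing import Any

# Hypothetical log entry: (collection name, primary key, row).
# This is an illustration only, not the real record segment layout.
LogEntry = tuple[str, Any, dict[str, Any]]


def replay(log: list[LogEntry]) -> dict[str, dict[Any, dict[str, Any]]]:
    """Rebuild the latest row per primary key by scanning the log in order."""
    row_map: dict[str, dict[Any, dict[str, Any]]] = {}
    for collection, pk, row in log:
        # A later entry for the same key replaces the earlier one.
        row_map.setdefault(collection, {})[pk] = row
    return row_map


log: list[LogEntry] = [
    ("books", 1, {"id": 1, "title": "Dune"}),
    ("books", 2, {"id": 2, "title": "Emma"}),
    ("books", 1, {"id": 1, "title": "Dune (revised)"}),
]
state = replay(log)
```

The log keeps all three entries, while the rebuilt map holds two rows for "books", with key 1 pointing at the revised title.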
2. Collection: typed container
A collection holds records that share one schema version. It is the unit you name in APIs ("books", "users").
- Each collection has a name, monotonic schema versions, and optional secondary indexes.
- Records are keyed by a declared primary key field.
- Inserts are replace-by-primary-key (last write wins for a given key).
Think “table,” but the contract is a typed schema, not arbitrary rows.
3. Schema: the contract
A schema defines what may be stored:
| Element | Role |
|---|---|
| Field paths | Identifiers such as title or nested profile.timezone |
| Types | Primitives, optionals, lists, objects, enums |
| Constraints | Min/max, length, regex, email, and similar checks |
| Primary key | Unique identifier per record within the collection |
| Indexes | Secondary lookups (unique or non-unique) |
ModelVault is schema-first: the engine rejects invalid states on write, so you do not discover type errors only after data is on disk.
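To make "rejects invalid states on write" concrete, here is a minimal sketch of a write path that validates before it stores. Everything in it is an assumption for illustration: the `SCHEMA` mapping, `lookup`, `validate`, and `insert` are hypothetical helpers in plain Python, not ModelVault APIs, and only type checks on dotted field paths are modeled.

```python
from typing import Any

# Hypothetical schema: dotted field path -> required Python type.
SCHEMA: dict[str, type] = {
    "title": str,
    "pages": int,
    "profile.timezone": str,
}


def lookup(record: dict[str, Any], path: str) -> Any:
    """Follow a dotted path; return None if any segment is missing."""
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def validate(record: dict[str, Any]) -> list[str]:
    """Return one message per failing field path (empty list means valid)."""
    errors = []
    for path, expected in SCHEMA.items():
        value = lookup(record, path)
        if not isinstance(value, expected):
            errors.append(
                f"{path}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


def insert(store: list[dict[str, Any]], record: dict[str, Any]) -> None:
    """Append the record only if it passes validation."""
    errors = validate(record)
    if errors:
        raise ValueError("; ".join(errors))
    store.append(record)


store: list[dict[str, Any]] = []
insert(store, {"title": "Dune", "pages": 412, "profile": {"timezone": "UTC"}})
bad = {"title": "Emma", "pages": "many", "profile": {}}
errors = validate(bad)
```

Calling `insert(store, bad)` raises `ValueError` and leaves `store` unchanged, so the invalid record never reaches storage.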
4. Models: how you author schemas
| Language | Recommended approach |
|---|---|
| Python | @dataclass or Pydantic v2 + modelvault.models.collection |
| Rust | register_collection with FieldDef, or #[derive(DbModel)] where applicable |
Markers such as __modelvault_primary_key__ and __modelvault_indexes__ connect your class to storage. Subset models let you read or write projections of a larger schema—see Models & collections.
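As a rough picture of a marked-up model, the sketch below declares a plain dataclass carrying both markers. The marker names come from the text above, but the value shapes used here (a field name string and a list of field names) are assumptions, and the `Book` class is hypothetical. The sketch deliberately does not call any ModelVault function; see Models & collections for the actual registration API.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Book:
    # Marker names are taken from the docs; the value shapes below are
    # assumptions for illustration and may differ from the real contract.
    __modelvault_primary_key__ = "id"
    __modelvault_indexes__ = ["author"]

    id: int
    title: str
    author: str
    subtitle: Optional[str] = None  # optional field, may be omitted


# Un-annotated class attributes are not dataclass fields, so the markers
# do not show up as stored columns.
field_names = [f.name for f in fields(Book)]
book = Book(id=1, title="Dune", author="Frank Herbert")
```

Because the markers are ordinary class attributes, they travel with the class without changing its constructor or its field list.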
5. Validation: fail fast on write
Before a row is appended, ModelVault checks:
- Types: primitives, optionals, lists, objects, enums, and nested paths
- Constraints: engine rules on declared fields
- Unique indexes: no duplicate keys where uniqueness is required. Rows with an absent or null indexed optional field are not indexed (SQL NULL semantics): multiple rows may omit the field without violating uniqueness; duplicate non-null keys are still rejected.
Failures are structured with field paths and clear messages. In Python: ModelVaultValidationError / ValueError. In Rust: DbError::Validation.
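The unique-index rule for optional fields is easy to misread, so here is a toy model of it. This is a sketch under stated assumptions, not ModelVault's index implementation: `UniqueIndex` and `check_and_add` are hypothetical names, and removal of stale entries on update is ignored.

```python
from typing import Any


class UniqueIndex:
    """Toy unique index over one optional field (illustration only)."""

    def __init__(self, field: str) -> None:
        self.field = field
        self.entries: dict[Any, Any] = {}  # indexed value -> primary key

    def check_and_add(self, pk: Any, row: dict[str, Any]) -> None:
        value = row.get(self.field)
        if value is None:
            return  # absent or null: the row is simply not indexed
        owner = self.entries.get(value)
        if owner is not None and owner != pk:
            raise ValueError(f"{self.field}: duplicate value {value!r}")
        self.entries[value] = pk


index = UniqueIndex("email")
index.check_and_add(1, {"id": 1, "email": None})   # explicit null
index.check_and_add(2, {"id": 2})                  # field omitted
index.check_and_add(3, {"id": 3, "email": "a@example.com"})
try:
    index.check_and_add(4, {"id": 4, "email": "a@example.com"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Rows 1 and 2 coexist because neither is indexed; row 4 is rejected because its non-null value is already owned by row 3.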
6. Queries: typed, not SQL-first
Primary interfaces are typed, not ad-hoc SQL strings:
| Operation | Surface |
|---|---|
| Point read | get by primary key |
| Filters | Equality, ranges, AND / OR |
| Ordering | order_by, limit |
| Python | db.collection("name").where(...).all() |
| Rust | Query AST + query_iter |
A minimal read-only SQL subset exists for modelvault.dbapi interop. Broader SQL is on the roadmap.
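For readers who want the semantics of the operations in the table spelled out, the sketch below expresses them over a plain list of dicts. It is not the ModelVault query API (the arguments accepted by `where(...)` are not specified on this page); the sample rows and variable names are made up for illustration.

```python
from operator import itemgetter

books = [
    {"id": 1, "title": "Dune", "year": 1965, "genre": "sf"},
    {"id": 2, "title": "Emma", "year": 1815, "genre": "novel"},
    {"id": 3, "title": "Neuromancer", "year": 1984, "genre": "sf"},
    {"id": 4, "title": "Ulysses", "year": 1922, "genre": "novel"},
]

# Point read: look up one row by primary key.
by_id = {row["id"]: row for row in books}
dune = by_id[1]

# Filter: equality AND range (genre == "sf" AND 1960 <= year < 1990).
matches = [r for r in books if r["genre"] == "sf" and 1960 <= r["year"] < 1990]

# Filter with OR, then order by year descending and keep the first two rows.
either = [r for r in books if r["genre"] == "novel" or r["year"] > 1980]
newest_two = sorted(either, key=itemgetter("year"), reverse=True)[:2]
titles = [r["title"] for r in newest_two]
```

A typed query surface expresses the same filter, ordering, and limit steps as checked objects rather than as free-form strings or ad-hoc comprehensions.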
File format: one versioned file
All durable state lives in a single .modelvault file: header, superblocks, append-only segments (schema, records, indexes, transactions, checkpoints), and recovery metadata. Operators and contributors should read On-disk format.
Storage modes (summary)
| Mode | When to use |
|---|---|
| On-disk | Production embedded apps; default |
| In-memory | Unit tests, scratch workflows, explicit snapshot export |
| Hybrid / streaming | Roadmap — bounded-memory operators for very large queries |
Details: Storage modes.
Where to go next
- Quickstart — install and first insert
- Python guide — full Python surface
- Types matrix — supported types and query shapes
- Operations runbook — backup, recovery, locking