Core concepts

Audience: developers who want the mental model before diving into APIs.

For why ModelVault exists and how it compares to SQLite or JSON files, read Why ModelVault and Comparisons. This page explains how the pieces fit together. Implementation detail lives under Specifications.

The big picture

ModelVault is an embedded database for application models:

  • You open one database (a file or memory image) inside your process.
  • You define collections—typed containers analogous to tables, but governed by a schema, not free-form documents.
  • Every write is validated; durable state is an append-only log inside a single .modelvault file.

There is no separate server to install or operate.

1. Database — one embedded unit

A database is the handle you open in your application:

| Mode | API | Persistence |
| --- | --- | --- |
| On-disk | Database.open("app.modelvault") or await AsyncDatabase.open(...) | Durable single file; ship with your app |
| In-memory | Database.open_in_memory() or await AsyncDatabase.open_in_memory() | Fast tests; use snapshots to export/import |

Use AsyncDatabase for asyncio apps (FastAPI, Starlette) so handlers can await storage without blocking the event loop. Sync Database is the default for scripts, CLIs, and desktop apps — see Async policy.
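
To make "without blocking the event loop" concrete, here is a generic asyncio sketch. It does not use ModelVault and makes no claim about how AsyncDatabase is implemented; `blocking_read` and `handler` are hypothetical names. It only shows the common pattern of moving a blocking call to a worker thread so that other handlers keep running.

```python
import asyncio
import time


def blocking_read(key: str) -> dict:
    """Stand-in for a synchronous storage call (hypothetical)."""
    time.sleep(0.05)
    return {"id": key}


async def handler(key: str) -> dict:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_read, key)


async def main() -> list:
    # Both reads overlap instead of running back to back.
    return await asyncio.gather(handler("a"), handler("b"))


results = asyncio.run(main())
```

Calling `blocking_read` directly inside `handler` would stall every other coroutine for the duration of the read, which is the problem an async storage API is meant to avoid.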

On open, ModelVault validates the file header, replays the schema catalog, reconstructs the latest row map from record segments, and restores index state. Recovery behavior depends on RecoveryMode.
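
The row-map reconstruction step can be pictured with a small, self-contained sketch. This is not ModelVault's recovery code: the `replay` function and the `(collection, primary_key, row)` entry shape are assumptions made for illustration. It only shows why replaying an append-only log in order yields the latest row for each key.

```python
from typing import Any

# Hypothetical log entry shape: (collection, primary_key, row).
LogEntry = tuple[str, Any, dict[str, Any]]


def replay(log: list[LogEntry]) -> dict[str, dict[Any, dict[str, Any]]]:
    """Rebuild the latest row per (collection, primary key) from an append-only log."""
    row_map: dict[str, dict[Any, dict[str, Any]]] = {}
    for collection, key, row in log:
        # Later entries overwrite earlier ones, so the map ends at the newest state.
        row_map.setdefault(collection, {})[key] = row
    return row_map


log: list[LogEntry] = [
    ("books", 1, {"id": 1, "title": "Dune"}),
    ("books", 2, {"id": 2, "title": "Emma"}),
    ("books", 1, {"id": 1, "title": "Dune (revised)"}),
]
state = replay(log)
```

The log keeps all three entries, while the rebuilt map holds two rows, with the revised title for key 1.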

2. Collection — typed container

A collection holds records that share one schema version. It is the unit you name in APIs ("books", "users").

  • Each collection has a name, monotonic schema versions, and optional secondary indexes.
  • Records are keyed by a declared primary key field.
  • Inserts are replace-by-primary-key (last write wins for a given key).

Think “table,” but the contract is a typed schema, not arbitrary rows.
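
Replace-by-primary-key can be modeled in a few lines. The `ToyCollection` class below is a hypothetical stand-in, not the ModelVault API; it assumes rows are plain dictionaries and that the primary key is a single named field.

```python
from typing import Any, Optional


class ToyCollection:
    """Illustrative stand-in for a collection keyed by a declared primary key field."""

    def __init__(self, name: str, primary_key: str) -> None:
        self.name = name
        self.primary_key = primary_key
        self._rows: dict[Any, dict[str, Any]] = {}

    def insert(self, record: dict[str, Any]) -> None:
        # Replace-by-primary-key: a second insert with the same key overwrites the first.
        self._rows[record[self.primary_key]] = dict(record)

    def get(self, key: Any) -> Optional[dict[str, Any]]:
        return self._rows.get(key)

    def __len__(self) -> int:
        return len(self._rows)


books = ToyCollection("books", primary_key="isbn")
books.insert({"isbn": "978-0", "title": "Draft title"})
books.insert({"isbn": "978-0", "title": "Final title"})
```

After both inserts the collection holds one record, and a point read on that key returns the second version.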

3. Schema — the contract

A schema defines what may be stored:

| Element | Role |
| --- | --- |
| Field paths | Identifiers such as title or nested profile.timezone |
| Types | Primitives, optionals, lists, objects, enums |
| Constraints | Min/max, length, regex, email, and similar checks |
| Primary key | Unique identifier per record within the collection |
| Indexes | Secondary lookups (unique or non-unique) |

ModelVault is schema-first: the engine rejects invalid states on write, so you do not discover type errors only after data is on disk.

4. Models — how you author schemas

| Language | Recommended approach |
| --- | --- |
| Python | @dataclass or Pydantic v2 + modelvault.models.collection |
| Rust | register_collection with FieldDef, or #[derive(DbModel)] where applicable |

Markers such as __modelvault_primary_key__ and __modelvault_indexes__ connect your class to storage. Subset models let you read or write projections of a larger schema—see Models & collections.
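
As a rough picture of what a marked-up model might look like, the sketch below uses only the standard library. The two marker names come from the paragraph above; the value shapes (a field name string and a list of field names) are assumptions, and registration through modelvault.models.collection is deliberately not shown because its signature is not described on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    # Marker names come from the text above; the value shapes below
    # (a field name and a list of field names) are assumptions.
    __modelvault_primary_key__ = "isbn"
    __modelvault_indexes__ = ["title"]

    isbn: str
    title: str
    subtitle: Optional[str] = None


book = Book(isbn="978-0", title="Core concepts")
```

Because the markers carry no type annotation, `@dataclass` treats them as ordinary class attributes, so they do not become fields of the record. See Models & collections for the authoritative marker format.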

5. Validation — fail fast on write

Before a row is appended, ModelVault checks:

  1. Types — primitives, optionals, lists, objects, enums, and nested paths
  2. Constraints — engine rules on declared fields
  3. Unique indexes — no duplicate keys where uniqueness is required. Rows with an absent or null indexed optional field are not indexed (SQL NULL semantics): multiple rows may omit the field without violating uniqueness; duplicate non-null keys are still rejected.
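
The NULL rule for unique indexes can be sketched as follows. `ToyUniqueIndex` is a hypothetical illustration, not the engine's index structure; it assumes records are dictionaries and that an omitted field and an explicit `None` are treated the same way.

```python
from typing import Any


class ToyUniqueIndex:
    """Illustrative unique index that skips absent or None values."""

    def __init__(self, field: str) -> None:
        self.field = field
        self._owners: dict[Any, Any] = {}  # indexed value -> primary key

    def check_and_add(self, primary_key: Any, record: dict[str, Any]) -> None:
        value = record.get(self.field)
        if value is None:
            return  # absent or null: not indexed, so it cannot collide
        owner = self._owners.get(value, primary_key)
        if owner != primary_key:
            raise ValueError(f"{self.field}: duplicate value {value!r}")
        self._owners[value] = primary_key


email_index = ToyUniqueIndex("email")
email_index.check_and_add(1, {"id": 1, "email": "a@example.com"})
email_index.check_and_add(2, {"id": 2})                 # field omitted
email_index.check_and_add(3, {"id": 3, "email": None})  # explicit null

try:
    email_index.check_and_add(4, {"id": 4, "email": "a@example.com"})
    rejected = False
except ValueError:
    rejected = True
```

Records 2 and 3 are accepted because they never enter the index; record 4 is rejected because its non-null value already belongs to record 1.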

Failures are structured with field paths and clear messages. In Python: ModelVaultValidationError / ValueError. In Rust: DbError::Validation.
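
To illustrate what a failure "structured with field paths" can look like, here is a minimal type checker over dotted paths. `FieldError` and `validate` are hypothetical names, and the schema is simplified to a mapping from path to Python type; the real error classes are the ones named above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FieldError:
    path: str      # dotted field path, e.g. "profile.timezone"
    message: str


def validate(record: dict[str, Any], schema: dict[str, type]) -> list[FieldError]:
    """Check each dotted path in `schema` against `record` and collect all failures."""
    errors: list[FieldError] = []
    for path, expected in schema.items():
        value: Any = record
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                errors.append(FieldError(path, "missing required field"))
                break
            value = value[part]
        else:
            # Reached only when every path segment was found.
            if not isinstance(value, expected):
                errors.append(
                    FieldError(path, f"expected {expected.__name__}, got {type(value).__name__}")
                )
    return errors


schema = {"title": str, "profile.timezone": str}
errors = validate({"title": 42, "profile": {}}, schema)
```

Collecting every failure with its path, rather than stopping at the first, lets a caller report all problems in one pass.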

6. Queries — typed, not SQL-first

Primary interfaces are typed, not ad-hoc SQL strings:

| Operation | Surface |
| --- | --- |
| Point read | get by primary key |
| Filters | Equality, ranges, AND / OR |
| Ordering | order_by, limit |
| Python | db.collection("name").where(...).all() |
| Rust | Query AST + query_iter |

A minimal read-only SQL subset exists for modelvault.dbapi interop. Broader SQL is on the roadmap.
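
The filter, ordering, and limit operations compose in a fixed order, which the plain-Python sketch below makes explicit. `run_query` is a hypothetical helper over a list of dictionaries, not the ModelVault query builder, and it scans every row instead of using indexes.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]


def run_query(
    rows: list[Row],
    where: Callable[[Row], bool],
    order_by: Optional[str] = None,
    limit: Optional[int] = None,
) -> list[Row]:
    """Filter first, then sort, then truncate."""
    matched = [row for row in rows if where(row)]
    if order_by is not None:
        matched.sort(key=lambda row: row[order_by])
    return matched if limit is None else matched[:limit]


books = [
    {"id": 1, "title": "Dune", "year": 1965},
    {"id": 2, "title": "Emma", "year": 1815},
    {"id": 3, "title": "Ulysses", "year": 1922},
]

# Range filter combined with AND, ordered by year, first match only.
result = run_query(
    books,
    where=lambda r: r["year"] >= 1900 and r["year"] < 2000,
    order_by="year",
    limit=1,
)
```

Applying the limit before sorting would return an arbitrary matching row instead of the earliest one, so the order of the three steps matters.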

File format — one versioned file

All durable state lives in a single .modelvault file: header, superblocks, append-only segments (schema, records, indexes, transactions, checkpoints), and recovery metadata. Operators and contributors should read On-disk format.

Storage modes (summary)

| Mode | When to use |
| --- | --- |
| On-disk | Production embedded apps; default |
| In-memory | Unit tests, scratch workflows, explicit snapshot export |
| Hybrid / streaming | Roadmap: bounded-memory operators for very large queries |

Details: Storage modes.

Where to go next