jac migrate — project description

Inspiration

Jac programs persist graphs in local SQLite storage as pickled anchor blobs. When you evolve a schema—adding new has fields to node or edge types—older rows still unpickle, but the restored archetype instances may not have those attributes, so code that reads the new fields can fail at runtime (for example AttributeError). That gap is familiar from SQL migrations, but here the “rows” are live archetype objects inside pickles, not flat columns. We wanted a first-party workflow so developers can discover drift, generate migration stubs, and apply upgrades without hand-editing the database or rewriting history.

What it does

The jac migrate command (registered under the CLI project group) supports three actions:

  • status — Resolves the project’s SQLite file from jac.toml and the filesystem (including <name>.db, optional shelf_db_path, anchor_store.db, a single .jac/data/*.db, or legacy ~/.jac/data/<name>.db), then prints whether it exists, lists .jac/migrations/*.py, and shows applied vs pending using .jac/migrate_applied.txt.
  • generate — Parses project .jac sources with JacProgram, builds the current node/edge has map from the AST, primes the runtime so pickles that reference __main__ archetypes can load, scans the anchors table, unpickles blobs, and diffs declared fields against the fields seen on instances. For missing fields it proposes fills (literal defaults where the source declares them, null otherwise). It writes a numbered Python migration under .jac/migrations/ (for example 0001_auto_Item.py) containing MIGRATION_ID and AUTO_FILLS (JSON embedded in the file). There is no separate jacmigrate.toml spec file in this iteration.
  • apply — Loads pending migration modules (skips IDs already listed in migrate_applied.txt), backs up the .db to a timestamped .bak copy by default, then for each anchor row: unpickle → setattr missing fields from AUTO_FILLS → write the blob back. -d / --dry_run reports how many rows would change without writing; -B / --no_backup skips the file copy. Applied migration IDs are appended to migrate_applied.txt (not a jac_migrations table or spec hash).

Fields without a literal initializer are still filled with None in the generated stub so you can edit AUTO_FILLS before apply if you need a real default.
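
For concreteness, a generated stub might look roughly like the sketch below. The archetype name, field names, and fill values are illustrative only, and the exact file layout (a plain dict versus an embedded JSON string) is an assumption based on the description above.

```python
# .jac/migrations/0001_auto_Item.py -- illustrative sketch, not real tool output
import json

# Recorded in .jac/migrate_applied.txt once `jac migrate apply` succeeds.
MIGRATION_ID = "0001_auto_Item"

# Per-archetype fills for fields declared in the current schema but missing on
# older pickled instances. Literal defaults come from the AST; fields without a
# literal initializer fall back to null until you edit them here.
AUTO_FILLS = json.loads("""
{
  "Item": {
    "quantity": 0,
    "notes": null
  }
}
""")
```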

How we built it

  • Schema + DB diff: jaclang.migrate.engine uses JacProgram.parse_str and walks UniTree Archetype nodes to collect has field names (and defaults where they are literals), merging schemas across files. scan_db_field_usage reads anchors.data, unpickles each blob, and unions the attribute names found on NodeAnchor / EdgeAnchor archetypes (see the sketch after this list).
  • Unpickle context: prime_unpickle_context uses the project's entry point from jac.toml (then other *.jac candidates) with proc_file + Jac.jac_import(..., override_name="__main__") so __main__.Item-style pickles resolve during scan and apply.
  • CLI glue: migrate.jac registers the command; migrate.impl.jac calls jaclang.migrate.engine.run_cli and returns the exit code.
  • Supporting runtime tweak: JacRuntime.base_path_dir defaults to None so persistence is not accidentally anchored to an unrelated cwd when jac run is invoked from another directory—keeping DB location aligned with what migrate resolves.
  • Tests: The intended story is covered by a small demo project (v1 seed → v2 failure → generate/apply → v2 success); automated jac test integration tests are a natural follow-up.
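
As a rough illustration of the DB-facing pieces, the sketch below combines the field-usage scan with the apply write-back. It assumes an anchors table with an id primary key and the pickled blob in a data column, and that an unpickled anchor exposes its archetype instance as .archetype; apart from the data column, those names are assumptions, and the real engine differs in detail (timestamped backups, logging, row counts).

```python
# Sketch of the scan and apply loops, assuming `anchors(id, data)` and that an
# unpickled anchor exposes its archetype instance as `.archetype` (assumed names).
import pickle
import shutil
import sqlite3

def scan_field_usage(db_path: str) -> dict[str, set[str]]:
    """Union the attribute names actually present on stored archetype instances."""
    seen: dict[str, set[str]] = {}
    con = sqlite3.connect(db_path)
    for (blob,) in con.execute("SELECT data FROM anchors"):
        try:
            obj = getattr(pickle.loads(blob), "archetype", None)
        except Exception:
            continue                                   # skip unreadable rows
        if obj is not None:
            seen.setdefault(type(obj).__name__, set()).update(vars(obj))
    con.close()
    return seen                                        # diffed against the AST `has` map

def apply_fills(db_path: str, auto_fills: dict, dry_run: bool = False,
                backup: bool = True) -> int:
    """Set missing fields from AUTO_FILLS and write each blob back in place."""
    if backup and not dry_run:
        shutil.copy2(db_path, db_path + ".bak")        # timestamped in the real tool
    con = sqlite3.connect(db_path)
    rows = con.execute("SELECT id, data FROM anchors").fetchall()
    changed = 0
    for row_id, blob in rows:
        try:
            anchor = pickle.loads(blob)                # needs __main__ primed first
        except Exception:
            continue
        obj = getattr(anchor, "archetype", anchor)
        fills = auto_fills.get(type(obj).__name__, {})
        missing = {k: v for k, v in fills.items() if not hasattr(obj, k)}
        if not missing:
            continue
        for name, value in missing.items():
            setattr(obj, name, value)                  # fill only what is absent
        changed += 1
        if not dry_run:
            con.execute("UPDATE anchors SET data = ? WHERE id = ?",
                        (pickle.dumps(anchor), row_id))
    if not dry_run:
        con.commit()
    con.close()
    return changed
```

Fetching all rows before updating keeps the write loop from mutating the table under an open read cursor, and skipping unreadable rows mirrors the skip-bad-rows behavior noted later in this write-up.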

Challenges we ran into

  • Where is the DB? Vanilla Jac versus plugins (e.g. jac-scale) may use a project-named file such as migrate-demo-issue.db, anchor_store.db, or a configured path, so migrate had to implement a clear resolution order and helpful error hints (sketched after this list).
  • Pickles need the right module context: Without importing the project's Jac entry module, pickle.loads could succeed yet yield archetypes that did not match, or class resolution would fail outright; priming __main__ from the entry file was essential.
  • Defaults in the AST: Only literal defaults are carried into AUTO_FILLS automatically; everything else becomes null in JSON until the developer edits the generated script.
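
For reference, the resolution order could be sketched as below. The precedence shown simply follows the order the candidates are listed in the status description, and names such as shelf_db_path come from that description, so treat the details as illustrative rather than the exact implementation.

```python
# Illustrative sketch of the DB resolution order; precedence and exact key
# handling in the real tool may differ.
from pathlib import Path

def resolve_db(project_root: Path, name: str, shelf_db_path: str | None) -> Path | None:
    candidates: list[Path] = [
        project_root / f"{name}.db",                   # <name>.db from jac.toml
    ]
    if shelf_db_path:                                  # optional explicit config
        candidates.append(project_root / shelf_db_path)
    candidates.append(project_root / "anchor_store.db")
    data_dbs = list((project_root / ".jac" / "data").glob("*.db"))
    if len(data_dbs) == 1:                             # only when unambiguous
        candidates.append(data_dbs[0])
    candidates.append(Path.home() / ".jac" / "data" / f"{name}.db")  # legacy
    for path in candidates:
        if path.exists():
            return path
    return None                                        # status prints a hint instead
```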

Accomplishments that we're proud of

  • Human-readable migration scripts (Python + embedded JSON) that you can review and edit before apply.
  • status that shows which DB file is in use and which scripts are pending—quick answers before touching data.
  • Automatic DB backup and dry-run for apply, in the spirit of safe SQL-style workflows.
  • End-to-end demo (v1 → v2 break → migrate → v2 works) showing the tool closes a real OSP / SQLite pickle gap.

What we learned

  • Graph persistence in Jac is powerful but version-sensitive; a small CLI beats one-off pickle scripts once schemas churn.
  • Compiler metadata (UniTree + archetype declarations) is a good source of truth for “what fields exist now”; runtime import should be used only to make unpickling faithful, not to accidentally run application entry logic in the migrate path.
  • SQLite is a fine store for read–modify–write over many BLOBs when updates are explicit, logged, and paired with backup and skip behavior for bad rows.

What's next

The core apply path should stay deterministic and reviewable. On top of that, LLM integration (including Jac’s by llm / Meaning-Typed Programming) is a natural extension:

  • Assistive generate: Given the AST diff and DB field usage, an LLM suggests richer AUTO_FILLS, rename notes, or inline comments in the migration file; humans review before apply.
  • Explainer: After status or generate, an LLM summarizes what drifted, what the migration will do, and risks in plain language.
  • Agentic wrapper: A small Jac workflow that uses jac migrate as a tool for multi-step flows such as detect drift → propose or refine migration → apply -d → then apply; a good fit for agentic demos while the engine remains rule-based.

Guardrail: LLM output should be treated as a draft; never silently rewrite pickles without a checked-in migration artifact. apply stays the trusted, non-LLM step.

Other roadmap items (non-LLM):

  • Richer operations than fill-missing (renames, removals, splits, custom hooks per archetype) with validation.
  • A dedicated plan / preview action that summarizes impact without loading full migration modules, plus optional JSON output for CI.
  • Optional SQL-style metadata (e.g. applied migrations table + checksums) for teams that outgrow a text file.
  • Tighter integration with jac run, project profiles, and docs on Jac for when to generate vs. apply in team workflows.
  • Automated tests in-tree and version-compatibility notes for long-lived databases.

Project context: Jaseci · Jac language & tooling.

Built With

  • claude
  • cursor