agent-memory: A Lightweight, Vendor-Neutral Memory + Cognitive-Loop System for Predictable AI–Human Delivery¶
Deterministic memory as a substrate; a lightweight cognitive loop as the control layer¶
Version: 1.2 (describes agent-memory v4.26.1) Status: Draft for peer / leadership review Date: June 29, 2026
Executive Summary¶
AI agents are moving from single-turn prompting to persistent, tool-using, memory-driven runtimes. Two problems dominate production use:
- Memory drift and context loss — context is re-explained every session, decisions are silently forgotten or contradicted, and different AI vendors can't share a common understanding of a project.
- The gap between intent and delivery — agents can be creative, but keeping what is built faithful to what was intended is largely unmanaged.
agent-memory addresses both with a no-code, markdown-only system layered over a
single, git-committed memory/ directory that any AI vendor can read and write. It has
two complementary layers:
- Backward layer — Evolving Memory. An event-sourced ledger (immutable session logs)
is projected into a live
continuity.md, with deterministic decay, supersession (facts that become false, not just unused), periodic invariant re-verification, write-time contradiction checks, and provenance. It answers "where are we, and why?" - Forward layer — the VBDI cognitive loop. Current State → Vision → Blueprint → Design → Implementation → Feedback, with an enforceable intent trace and a human gate at every altitude change. It answers "where are we going, and is delivery faithful to intent?"
The system is vendor-neutral (one shared memory; any agent — Claude, Gemini, Cursor, …), deterministic (no floating-point scoring — every agent reaches the same result by counting, never estimating), lightweight (you "point it at a repo"; there is no ceremony), and it migrates existing vendor AI files into the unified format.
The shared, committed layer carries three things across vendors: memory, steering, and — as of v4.1.0 — portable skills (reusable capabilities authored once and run by any agent; see §5). The skills layer has since matured into seven tool-managed built-ins spanning six vendor adapters (Claude, Gemini, Cursor, Kiro, GitHub Copilot, Google Antigravity), and is governed by a principle that emerged from real cross-vendor use: the judgment-vs-arithmetic boundary — the deterministic, mechanical parts of a memory ritual (re-tiering, the archive move, adapter regeneration, integrity checks) are mechanized as runnable helpers, while every act of judgment (what to archive, how to resolve a contradiction, what the Vision is) stays with the agent and the human (§5, §6).
A second line of maturation hardened operational reliability (§5b): the after-session
ritual no longer depends on an agent reliably self-triggering. Enable now installs a
vendor-neutral git post-commit hook + a CI floor (agent-activated, zero manual user
step), a one-command first-run init for fresh clones, Windows line-ending hardening, a
MERGE.md conflict protocol, and a safe-write discipline — all from the adoption
constraint that any manual step is a barrier once the protocol lands with untrained users.
Its central claim: pairing a deterministic memory substrate with a lightweight cognitive loop yields predictable innovation with human partnership — bold ideas, faithful delivery, a human in the loop at every altitude. This was demonstrated on a real Node.js→Rust rewrite that delivered with no drift, the tool was built using its own loop (dogfooded), and the design has been validated cross-vendor on a real product repo — GitHub Copilot (Gemini 3.1 Pro) independently exercised, critiqued, and even reinvented parts of the toolchain, converging on the same designs (§8).
1. Motivation¶
Most LLM applications begin stateless: each interaction restarts from scratch, and any persistent understanding of the project must be manually reintroduced. Recent industry and academic work converges on a view that the production bottleneck is no longer the model alone but the surrounding architecture — memory, planning coherence, and adaptive execution — and that memory is best understood as a write–manage–read loop coupled to action, not a passive context buffer.
Two consequences follow that existing tools handle poorly:
- Vendor silos. Each AI tool keeps its own steering files and history (
CLAUDE.md,.cursorrules, Aider chat logs, …). A team using more than one vendor — or migrating between them — has no shared, durable project memory. - Intent drift. Even with memory, nothing ties a delivered change back to the decision, the plan, and the goal it was meant to serve. "Confidently wrong" facts and reversed decisions accumulate silently.
agent-memory targets both: a shared, vendor-neutral memory that evolves faithfully, and a forward cognitive loop that makes intent traceable and drift detectable — without imposing a heavyweight process.
2. What agent-memory Is¶
A no-code, markdown-only system with three jobs in one repository:
- A shared memory system — a
memory/layer that persists project context across sessions and across vendors. It is committed to git and travels with the code. - An AI-enablement tool — point it at any repo and it generates a tailored memory system there ("AI enable this repo").
- A migration tool — when the target already has vendor AI files, it folds them into
the unified format (originals preserved under
legacy/, never deleted).
As of v4.1.0, the shared layer also carries portable skills (§5) — reusable
capabilities beside memory and steering — and a knowledge harvest that, at enable and on
demand, distills durable facts from the team's existing human-authored docs (ADRs, decision
logs, design specs) into the shared memory/ layer (§3, §7).
There is no build, lint, or test step: the markdown files are the product, and the
"runtime" is an AI agent reading and acting on them. Two memory layers coexist by design —
the repo's shared memory/ (team, committed) and each contributor's personal runtime store
(e.g. ~/.claude/, individual). The tool only ever touches the shared layer.
3. The Backward Layer — Evolving Memory¶
Memory here is event-sourced, not a mutable blob:
- The ledger. Each work segment writes an immutable session log containing a
## Memory Referencessection — the events (which facts were referenced, created, reactivated, superseded). Session logs are never edited. - The projection.
continuity.mdis the derived live state: facts, each carrying a metadata footer (id,created,last_used,uses,tier, optionalorigin). The projection is recomputable from the ledger at any time (full replay). - Deterministic decay. A periodic review recomputes usage by counting session
files — never a floating-point score — so any agent (Claude, Gemini, …) reaches the
same result. Facts fade through tiers (
working → active → archive-candidate → archived); nothing is ever deleted (archived, with a greppable index). The mechanical steps of this review — recomputing every fact's tier/usage, and the archive move — are now packaged as runnable helpers (refresh-metadata,archive-fact) so a capable agent can't silently skip them, while deciding what to retire stays the agent's judgment (§5). A review-cadence advisory (memory-lint) flags a layer that has gone too long without a review, or grown past a fact/line budget — so a lapsed review can't hide.
On top of that substrate sit four capabilities that make the memory trustworthy:
- Supersession (truth maintenance). When a decision is reversed or a fact becomes
false, it is marked
superseded(terminal) with asuperseded-bylink and archived flagged "superseded," not "faded." Memory can represent change, not just disuse. - Invariant re-verification. Never-decay facts (
core/ architectural invariants) are periodically surfaced for a human to re-confirm — because "never-decay" must not mean "never-checked" (the "confidently wrong" failure mode). - Write-time contradiction check. When a fact is added, the agent scans for one it contradicts → supersede it, or raise an Open Thread. The system never picks a winner.
- Provenance. Each fact can carry an
originpointer to its source session — one-hop traceability and a cheap defense against memory poisoning.
Beyond per-session writes, memory also seeds itself from the team's existing knowledge: a
knowledge harvest recursively reads human-authored docs (ADRs, decision logs, design
specs, roadmaps) and distills the durable facts into memory/ — additively,
map-don't-mirror, check-existing-first so a re-run never duplicates, conflicts raised as a
Contradiction thread (never silently resolved). It runs once at enable and on demand
thereafter (the harvest-knowledge built-in), scoped incrementally by a last_harvest
marker.
Retrieval is deliberately lexical + indexed (grep + a greppable archive index + provenance pointers), bounded by project scale — not a vector/index server. That keeps the layer no-code, human-auditable, and replayable.
4. The Forward Layer — the VBDI Cognitive Loop¶
The backward layer keeps memory faithful to what happened. The VBDI loop is its forward complement — it keeps delivery faithful to what was intended:
This integrates a generalized Agent Cognitive Framework (a lightweight, loop-based scaffold of six primitives) as the control layer above the memory substrate. The key integration insight is that most of the loop already exists in the memory layer — so only two primitives are genuinely new:
| Primitive | Realized by |
|---|---|
| Current State | continuity.md (read at session start) |
| Vision (new) | memory/vision.md — the north star (core, invariant-verified) |
| Blueprint (new) | typed (blueprint) Open Threads — the Vision↔reality gaps |
| Design | Key Decisions + Architectural Invariants |
| Implementation | code / commits, traced in session logs |
| Feedback | the review ritual + decay + supersession |
Four properties make the loop work without becoming heavyweight:
- The trace is the determinism. Implementation → Design → Blueprint (
serves: <gap>) → Vision (serves: <vision-id>), linked by stableids. A missing or broken link is drift — and it is grep-detectable. The trace and the gates are deterministic; the content (the vision, the design ideas) is the open human–AI partnership. No scoring. - Human gates. Each altitude transition (confirming the Vision, opening/closing a gap) is an Open Thread the human checks off — the agent proposes, the human approves. Not a phase review.
- Bootstrap, never fabricate. Enable and upgrade create a DRAFT Vision with only
the safe current-state context inferred — the target is left for the human, gated by a
(vision-bootstrap)thread. The Vision is the human's to set, like a user preference. - Process-neutral. The loop is the lightweight default; it neither requires nor forbids
a heavier process. A target's owner may layer SDLC / scrum on top — that is their call —
but ceremony and any scoring stay in the target's own space, never in
memory/.
5. The Capability Layer — Cross-Vendor Skills¶
Memory and steering were already shared across vendors. v4.1.0 adds the third shared
leg — skills: reusable capabilities (a name, a when-to-use description, a
procedure, optionally helper scripts) authored once and usable by any agent.
- Neutral source of truth. A committed
agent-skills/<name>/SKILL.md— vendor-neutral markdown — is the single definition; it travels with the repo, likememory/. - Universal runtime. The
AGENTS.md"Skills" section is the baseline: when a task matches a skill'sdescription, the agent reads and follows thatSKILL.md. Because the agent is the runtime, this works on any vendor with no engine. - Thin per-vendor adapters. For runtimes with a native skill/command system, the tool
regenerates pointers for native auto-trigger across six vendor targets —
.claude/skills/,.gemini/commands/,.cursor/rules/,.kiro/,.github/skills/(GitHub Copilot), and.agents/skills/(Google Antigravity). Adapters are gitignored and regenerated (never copies), so the neutral skill never drifts — and regeneration is itself a runnable script (sync-adapters, in bash / Node / Python at output parity) so it doesn't depend on an agent improvising the recipe. - Migration promotes, never flattens. A vendor's existing skills (e.g.
.claude/skills/) are promoted intoagent-skills/(originals preserved underlegacy/), not folded into steering — skills are procedures, not rules.
The layer honors the same invariants as the rest of the tool — vendor-neutral, never-pick-a-winner, additive/non-destructive — and it refined the tool's own "no-code" invariant into "no build step; agent-run": the tool runs no code, while a skill may carry optional, agent-invoked helper scripts.
Built-in skills (v4.10.0 → v4.26.x). The layer ships its own built-ins, installed into
every enabled repo and tool-managed (marked provenance: agent-memory-builtin; fork
under a new name to customize, upstream a genuine fix). There are now seven, and they
sort cleanly along the boundary the layer learned to draw — mechanize the arithmetic, leave
the judgment to the agent:
| Built-in | Role | Side of the boundary |
|---|---|---|
memory-lint |
Deterministic integrity verifier (decay counts, dangling links, over-archival, review-cadence, stale metadata, merge markers — nine checks, Python and Node at parity) | arithmetic — read-only |
refresh-metadata |
Recomputes every fact's tier/uses/last_used from the session ledger and writes the footers back |
arithmetic |
archive-fact |
Performs the archive move safely (append to archive + index, rewrite continuity, all-or-nothing, truncation structurally impossible) | arithmetic |
sync-adapters |
Regenerates the six vendor adapters from each neutral SKILL.md |
arithmetic |
harvest-knowledge |
Distills durable facts from the team's docs into memory/ (additive, dedup-guarded) |
judgment-assisted |
second-opinion |
Distils a snapshot from continuity.md + recent logs for a clean-memory reviewer |
judgment-assisted |
apply-critique |
Runs a returned critique through a bounded, validated, human-gated apply loop | judgment-assisted |
The fresh-context second opinion. second-opinion distills a compact snapshot from
continuity.md + recent session logs (never a parallel state file) and, behind a security
advisory the human must acknowledge, hands it to a reviewer with clean memory — a fresh
session or a different vendor that did not live the work. apply-critique feeds the returned
critique through a bounded, validated, human-gated loop (a few scoped fixes → build/tests
+ memory-lint → an applied-vs-rejected summary). The reviewer is a hypothesis generator,
not an authority: its critique is advisory, gated by deterministic checks and a human — the
lesson the layer learned when a clean-context reviewer once over-archived still-referenced
facts. The advisory extends target-repo-scope-only from what the tool touches to what the
human exports.
Why the arithmetic helpers exist. Real cross-vendor use surfaced a recurring failure
class: a capable agent silently does only part of a multi-step ritual. A truncate-before-read
shortcut wiped an archive; the cadence review never fired on a busy repo; an agent ran the
archive step of a review but skipped the re-tier. Each was fixed the same way — mechanize
the deterministic part (a runnable helper) + make the gap visible (a memory-lint
advisory) + leave the judgment to the agent. So refresh-metadata and archive-fact are
not automation of decisions — never-pick-a-winner still holds — they are safe execution of
the moves the agent has already decided. Like everything else, the built-ins are zero
overhead by default — installed ≠ run.
5b. Operational Reliability — Making the Ritual Happen¶
A protocol that depends on an agent reliably self-triggering fails the moment it lands with an untrained team or a less-agentic vendor. The governing adoption constraint became: any manual user step is a barrier. A line of work hardened the ritual's execution, not just its documentation:
- Vendor-neutral triggers, agent-activated. Enable installs a committed
post-commitgit hook (advisory; auto-stubs a session log when a commit did real work but carried none, and re-syncs adapters when a skill changed) and a CI floor (memory-lint+ a session-log presence check, zero per-user setup). The agent activates the local hook at enable — no manual user step in the common path.no-build-step-agent-runstill holds: git and CI invoke them in the user's environment; the tool itself runs nothing. Honest limit: git cannot auto-run a committed hook on a fresh clone, so CI is the always-on backstop. - One log per session, not per commit. The hook windows by session (the decay model counts session files, so per-commit logs would inflate the count and decay facts too fast) — a downstream report where one session minted ~6 near-identical logs drove this fix.
- First-run init + Windows hardening. A fresh clone has gitignored adapters absent and the
hook unactivated;
.githooks/init.shis a single idempotent command to regenerate adapters and activate the hook, and a.gitattributespins shell scripts to LF so Git for Windows doesn't break them.memory-lintalso catches an empty/malformed install manifest, so a botched stamp fails the lint instead of silently breaking upgrade detection. MERGE.md— conflict resolution without picking a winner. A no-code, human-gated protocol for a git conflict inmemory/: mechanical hunks reconcile deterministically (additive → union; scalar → take-later); a semantic clash preserves both + raises a Contradiction (a supersession only on the human's explicit instruction);memory-lintgates; the human approves the merge commit. Thestatusline was respecified as a short current-state line (not an accreted changelog) to stop it being a merge hotspot.- Safe-write discipline. The most-repeated bug was a truncate-before-read shortcut
(
open(f,"w").write(open(f).read()+…)) that wiped a file before reading it. The rule — append-mode or read-into-a-variable-then-write, and runmemory-lintafter any scripted memory mutation — is now in the shared protocol, and thearchive-facthelper makes truncation structurally impossible for the one move that kept hitting it. - Informed consent at enable. A fresh enable opens with a concise exec summary (what the protocol is, what it writes, what it leaves untouched, that it's committed + shared) and a cancel gate — a blind "yes" is replaced by informed consent before anything is written.
6. Design Principles¶
- No-code, markdown-only. The files are the product; the agent is the runtime.
- Vendor-neutral. One shared memory; thin per-vendor bootstrap pointers route every
agent to a single hub (
AGENTS.md). No lock-in. - Deterministic — no floating-point. Every decision reduces to counting or comparing integers, so results are reproducible across agents and runs.
- Reversible reconciliation. Immutable ledger + mutable projection + replay — the governance/audit story is just git + markdown.
- Lightweight; guide thinking, don't prescribe execution. Loop over process; simplicity over completeness.
- Never pick a winner; never fabricate intent. Contradictions and the Vision are surfaced for a human, not resolved silently.
- Mechanize the arithmetic, not the judgment. The deterministic, mechanical parts of a ritual (re-tiering, the archive move, adapter sync, integrity checks) become runnable helpers so a capable agent can't silently skip them; every act of judgment stays with the agent and the human. The boundary is judgment vs. arithmetic, never automate the decision.
- No manual user step. Once the protocol lands with untrained users, any manual op is an adoption barrier — so triggers, init, and adapter sync are agent-activated, with CI as the zero-config backstop.
- Additive, non-destructive upgrades. Versioned (
VERSION+ per-repo stamp); a repo on an older version upgrades in place via an idempotent ladder.
7. How It Works in Practice¶
- Enable a repo. "AI enable this repo
/path." It opens with an exec summary + cancel gate (informed consent), then detects any existing AI footprint and chooses a mode: Fresh (generate from analysis), Already-Ours (idempotent; upgrade in place if on an older version), or Migrate (fold vendor files in, preserving originals). It generatesmemory/, harvests durable facts from the repo's docs, installs the bootstrap pointers, the six skill adapters, the git hook + CI triggers, and a DRAFT Vision + gate — no manual user step. - A session. The agent reads Current State (
continuity.md) + the Vision, does the work tying it to a Blueprint gap and the Design it realizes, writes a session log with## Memory References, and updatescontinuity.md. - A review. On cadence (or on demand — and a
memory-lintadvisory flags an overdue one so it can't silently lapse), the review replays the ledger, re-tiers facts (refresh-metadata), archives the ones the agent judges faded (archive-fact), applies supersessions, prompts invariant re-verification, and scans for contradictions/altitude drift. The deterministic steps run via helpers; the judgment of what to retire stays with the agent. - A memory smoke test. A short, manual eval — questions a fresh agent should answer from memory alone. A failure is a memory gap to fill, not a test to soften.
8. Evidence & Validation¶
- A real rewrite, no drift. A Node.js→Rust rewrite of a TCP-proxy CLI was delivered against recorded intent (invariants, decisions, and their why were pinned and traceable) with deterministic, faithful results and no drift — the proof point the whole design rests on.
- Dogfooding. The tool builds itself: it carries its own
memory/, its own Vision and Blueprint, and the VBDI layer was designed and shipped using the VBDI loop ("using the thing to design the thing"). - Closing the known gaps. An industry-alignment self-assessment identified the real gaps (supersession, invariant re-checking, write-time contradiction, evaluation, provenance); each shipped as an additive, versioned release.
- Real-world cross-vendor validation. The skills layer (v4.1.0–4.1.1) was exercised
end-to-end by an in-place upgrade of a large, pre-existing project — promoting that
project's vendor skills into the shared
agent-skills/layer and regenerating the per-vendor adapters — confirming the migration path on a real codebase, not just a fixture. - The review loop, dogfooded. v4.10.0's fresh-context second-opinion pair was validated by
running it on its own milestone: a clean-context reviewer (a freshly-spawned agent with no
session memory) critiqued the change and surfaced a real invariant tension the in-session author
had missed — an upgrade silently overwriting a user-customized built-in vs. the additive-upgrades
invariant — which
apply-critiquethen fixed. Using the reviewer to review the reviewer. - Cross-vendor, on a real product repo. The protocol was driven end-to-end on a large
Accenture product repo (
mercury-composable) by GitHub Copilot / Gemini 3.1 Pro — a different vendor and model from the author. It exercised the second-opinion loop across vendors, ran the cadence-advisory-triggered review unprompted, and the over-archival guard caught a premature archive it then reverted. Two convergence signals stand out: a Copilot critique independently named the safe-write hardening that becamearchive-fact, and Copilot independently wrote a metadata-refresh script — converging on the same design that shipped asrefresh-metadata(the built-in is a strict, safer superset: reads the decay policy, preserves all footer fields, clamps at archive-candidate, dual-runtime with tests). - A repeating failure class, structurally closed. Three field incidents shared one shape —
a capable agent partially executes a multi-step ritual (a truncating write; a review that
never fired; a re-tier step skipped). Rather than exhort the agent to try harder, each was
closed structurally: a runnable helper for the arithmetic + a
memory-lintadvisory for the gap. This is the evidence behind the mechanize-the-arithmetic principle (§6).
9. Relationship to the Literature¶
agent-memory is not a new agent paradigm; it is a concrete, file-first, deterministic realization of patterns the field already endorses:
- Simple, composable structures over heavy frameworks (Anthropic's agent guidance) — realized as markdown + a small set of primitives.
- Memory as a write–manage–read loop (autonomous-agent memory surveys) — realized as sessions (write) → review (manage) → continuity (read), with replay.
- Interleaved reasoning and action (ReAct) — the loop's Design/Implementation/Feedback are not disjoint phases.
- Reversible reconciliation and pre-consolidation validation (recent memory-systems work) — realized as the immutable ledger + the write-time contradiction check.
Its differentiators relative to mainstream memory stacks: determinism (no scoring), truth-maintenance (supersession + contradiction + invariant re-verification), vendor-neutral shared memory, no-code/markdown, git-native governance, and the forward VBDI loop with human gates.
10. What Makes It Different¶
| Dimension | Common practice | agent-memory |
|---|---|---|
| Persistence | per-vendor files / vector store | one shared, git-committed markdown layer |
| Forgetting | similarity scores, TTLs | deterministic tiering by counting session files |
| Truth maintenance | overwrite / let stale persist | supersession + contradiction check + invariant re-verify |
| Retrieval | semantic / vector | lexical + indexed + provenance, by design |
| Intent → delivery | unmanaged | VBDI altitude trace, grep-detectable drift |
| Governance | opaque | git history + markdown + human gates |
| Vendor coupling | locked to one tool | neutral; thin pointers to one hub |
| Capabilities / skills | per-vendor skill files, not shared | neutral agent-skills/ + 7 built-ins, regenerated across 6 vendor adapters; authored once, any agent |
| Ritual execution | relies on the agent self-triggering | agent-activated git hook + CI floor + runnable helpers; deterministic arithmetic mechanized, judgment left to the agent; no manual user step |
| Second opinion / review | self-review in the same (polluted) context | fresh-context reviewer (any vendor) + bounded, deterministically-gated apply |
| Process weight | often heavy | lightweight default; SDLC is the target's opt-in |
11. Roadmap¶
Tracked as Blueprint gaps against the Vision:
- Greenfield flow — start from a Vision with no code yet (the substrate now exists).
- Multi-user hardening — first increment shipped (the
MERGE.mdconflict protocol + merge-friendlycontinuity.mdconventions + a merge-marker lint check); continue strengthening conventions for simultaneous contributors on one enabled repo. - Optional SDLC overlay — a scrum-inspired profile a target owner can opt into
(
(sprint)tagging + sprint-boundary review, no points/ceremony). Optional, never core.
12. Conclusion¶
As agents become persistent, memory-driven systems, they need scaffolds that are both operationally useful and light enough to embed in real repositories. agent-memory pairs a deterministic, event-sourced memory substrate (faithful to what happened) with a lightweight cognitive loop (faithful to what was intended), under a vendor-neutral, no-code, human-gated design.
Memory is the deterministic substrate; the loop is the lightweight control layer. Together they turn memory-aware agent work into predictable innovation with human partnership — bold ideas, faithful delivery, the human in the loop at every altitude.
References¶
- Anthropic. Building Effective Agents. 2024. https://www.anthropic.com/research/building-effective-agents
- Du, Pengfei. Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers. 2026. https://arxiv.org/html/2603.07670v1
- Hu, Yuyang et al. Memory in the Age of AI Agents. 2025/2026. https://arxiv.org/abs/2512.13564
- Yao, Shunyu et al. ReAct: Synergizing Reasoning and Acting in Language Models. 2022/2023. https://arxiv.org/abs/2210.03629
- Mem0 Engineering. State of AI Agent Memory 2026: Benchmarks, Architectures & Production Gaps. 2026. https://mem0.ai/blog/state-of-ai-agent-memory-2026
- Agent Cognitive Framework for Memory-Driven AI Systems (the framework integrated here as the forward layer). 2026.
docs/agent-cognitive-framework.md. - Internal design docs:
docs/DESIGN-evolving-memory.md(backward layer),docs/DESIGN-vbdi-lifecycle.md(forward layer),docs/assessments/2026-06-13-industry-alignment.md.