A 0.12% parameter add-on gives AI agents the working memory RAG can't

Researchers from Mind Lab and university collaborators have unveiled delta-mem, a parameter-efficient technique aimed at a persistent problem in AI agents: losing context across multi-turn interactions. The approach compresses historical information into a dynamically updated matrix without modifying the underlying model.

The efficiency gains are striking. Delta-mem's added parameters amount to only 0.12% of the backbone model, compared with 76.40% for leading alternatives. This lean architecture cuts computational overhead while delivering superior performance on memory-dependent tasks, where agents previously relied on expensive context window expansion or retrieval-augmented generation (RAG).
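To put the two percentages in absolute terms, here is a back-of-the-envelope sketch. The 8-billion-parameter backbone is a hypothetical size chosen for illustration, not a figure from the research; only the 0.12% and 76.40% overheads come from the reported results.

```python
def added_parameters(backbone_params: int, overhead_pct: float) -> int:
    """Return the number of extra parameters for a given overhead percentage."""
    return round(backbone_params * overhead_pct / 100)


# Hypothetical backbone size, assumed purely for illustration.
backbone = 8_000_000_000

delta_mem_extra = added_parameters(backbone, 0.12)     # reported delta-mem overhead
alternative_extra = added_parameters(backbone, 76.40)  # reported overhead of alternatives

print(f"delta-mem adds   {delta_mem_extra:,} parameters")
print(f"alternative adds {alternative_extra:,} parameters")
print(f"ratio: about {alternative_extra / delta_mem_extra:.0f}x")
```

Under that assumption, the add-on is in the millions of parameters, while the alternative approaches add billions.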

AI agents typically degrade as conversations lengthen. A coding assistant loses the thread of a debugging session. A data analysis agent re-ingests already-processed datasets, burning through token budgets and adding latency. Teams compensate by expanding context windows or layering RAG systems on top, both of which scale poorly and introduce reliability gaps.

Delta-mem bypasses these constraints. The technique works as a bolt-on module, leaving foundation models untouched while creating persistent working memory. The dynamically updated matrix acts as a compact representation of conversation history, letting agents reference prior interactions without re-encoding full contexts.
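The idea of a fixed-size matrix that is updated as a conversation proceeds can be illustrated with the classic delta rule for associative memory. This is a minimal sketch under that assumption, not the researchers' implementation: the `DeltaMemory` class, its `beta` write strength, and the toy key and value vectors are all hypothetical.

```python
import math


class DeltaMemory:
    """Toy fixed-size associative memory updated with the classic delta rule.

    Illustrative assumption only; not the published delta-mem architecture.
    """

    def __init__(self, dim_key: int, dim_value: int, beta: float = 1.0):
        self.beta = beta  # write strength (hypothetical parameter)
        # The matrix has shape (dim_value, dim_key) and never grows with history.
        self.M = [[0.0] * dim_key for _ in range(dim_value)]

    @staticmethod
    def _normalize(key):
        norm = math.sqrt(sum(x * x for x in key))
        if norm == 0.0:
            raise ValueError("key must be non-zero")
        return [x / norm for x in key]

    def read(self, key):
        """Return M @ k, the value currently associated with the key."""
        k = self._normalize(key)
        return [sum(m * x for m, x in zip(row, k)) for row in self.M]

    def write(self, key, value):
        """Delta-rule update: M <- M + beta * (value - M @ k) k^T."""
        k = self._normalize(key)
        predicted = [sum(m * x for m, x in zip(row, k)) for row in self.M]
        for i, row in enumerate(self.M):
            error = value[i] - predicted[i]
            for j in range(len(row)):
                row[j] += self.beta * error * k[j]


memory = DeltaMemory(dim_key=3, dim_value=2)
memory.write([1.0, 0.0, 0.0], [2.0, -1.0])  # store a fact from an early turn
memory.write([0.0, 2.0, 0.0], [0.5, 0.5])   # store an unrelated fact later
print(memory.read([1.0, 0.0, 0.0]))         # the early fact is still retrievable
memory.write([1.0, 0.0, 0.0], [3.0, 3.0])   # correct the early fact in place
print(memory.read([1.0, 0.0, 0.0]))
```

The point of the sketch is that storage stays constant however many writes occur, and a later write to the same key corrects the stored value rather than appending to a growing context.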

The research represents a timely answer to a cost problem that's become acute as enterprises scale AI agents internally. Every extra million tokens processed compounds across deployed instances. RAG systems, while popular, introduce their own brittleness through retrieval failures and ranking mistakes. Expanding context windows offers diminishing returns and hits transformer compute limits.

Delta-mem's parameter efficiency makes adoption straightforward. Teams can layer it onto existing models without retraining the backbone or overhauling infrastructure. The 0.12% overhead is negligible compared with the latency and cost savings from reduced token consumption and the elimination of redundant processing.

The research emerged from Mind Lab alongside academic partners.
