
Why AI Agents Forget — And How MIA Fixes It

There’s a quiet problem at the heart of modern AI agents: they don’t remember how they think.

Ask an agent to research something complex today, and it’ll do a reasonable job. Ask it the same type of question tomorrow, and it starts from scratch — same mistakes, same inefficiencies, no accumulated wisdom. It’s like hiring a researcher who wakes up every morning with amnesia.

A new paper out of East China Normal University and the Shanghai Artificial Intelligence Laboratory proposes a fix: Memory Intelligence Agent (MIA).


The Problem With How Agents Use Memory Today

Current deep research agents — the kind that combine LLM reasoning with web search and external tools — have a memory problem that’s actually three problems in one:

Context bloat. Most systems just dump everything into a long context window. The longer it gets, the more attention gets diluted, and the more the model loses the thread.

The wrong kind of memory. Existing systems store what was found — facts, documents, retrieved chunks. But what agents actually need is how to find things — search strategies, failure modes, reasoning paths. The how is what makes future research better.

Static knowledge. Once a system is trained, it doesn’t evolve. It can’t learn from its own experience during deployment; updating it means interrupting the whole pipeline to retrain.

These aren’t edge cases. They’re fundamental limitations that cap how good any research agent can get.


The MIA Architecture: Manager, Planner, Executor

MIA takes a different architectural approach, splitting the cognitive work across three specialized components:

Memory Manager — a non-parametric system that compresses and stores historical search trajectories. Not the raw results, but structured workflows: what the agent searched, in what order, what failed, what worked. Images get compressed to captions. Verbose reasoning traces get abstracted into numbered steps. Storage stays lean.

Planner — a trainable LLM that reads the Memory Manager’s compressed experiences and generates a step-by-step search plan for each new question. Crucially, it retrieves both successful trajectories (positive examples) and failed ones (negative constraints) — giving it a richer picture than just “here’s what worked before.”

Executor — another trainable LLM that takes the Planner’s instructions and actually does the research: searches, tool calls, reasoning over results. It operates in a ReAct loop and reports back to the Planner when it hits a dead end.

The Planner and Executor aren’t just running in sequence — they’re in dialogue. If the Executor gets stuck, the Planner can trigger a Reflect-Replan, generating a revised strategy on the fly.
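
A minimal sketch of that dialogue, with toy stand-ins for the components — every class and method name here is an illustrative assumption, not the paper’s actual API:

```python
# Toy sketch of the Planner-Executor dialogue with Reflect-Replan.
# All classes here are illustrative stand-ins, not the paper's code.
from dataclasses import dataclass

@dataclass
class Result:
    succeeded: bool
    answer: str = ""
    failure_trace: str = ""

class ToyExecutor:
    """Pretend ReAct loop: fails until the plan reflects a revision."""
    def run(self, plan: str) -> Result:
        if "revised" in plan:
            return Result(True, answer="42")
        return Result(False, failure_trace="search returned nothing")

class ToyPlanner:
    def make_plan(self, question, experiences):
        return f"plan for {question!r} using {len(experiences)} past workflows"

    def reflect_replan(self, question, plan, trace):
        # generate a revised strategy from the Executor's failure report
        return f"{plan} | revised after: {trace}"

def research(question, planner, executor, memory, max_replans=3):
    plan = planner.make_plan(question, memory)   # seeded with retrieved experiences
    result = executor.run(plan)
    for _ in range(max_replans):
        if result.succeeded:
            break
        # dead end: Executor reports back, Planner triggers Reflect-Replan
        plan = planner.reflect_replan(question, plan, result.failure_trace)
        result = executor.run(plan)
    return result.answer
```

The key design point is that replanning is cheap: the Planner only revises the strategy, while the Executor carries the full burden of searching and tool use.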


Two Types of Memory Working Together

The real insight in this paper is the distinction between two kinds of memory that work in tandem:

Non-parametric memory — explicit, stored workflows in the Memory Manager. These are like field notes: concrete examples of how past questions were researched, kept in a retrievable buffer. When a new question comes in, similar experiences are retrieved and handed to the Planner as few-shot context.

Parametric memory — knowledge baked into the Planner’s weights through continuous training. This is the slow-burn version: over time, the Planner internalizes patterns so deeply that it doesn’t even need to retrieve them — they become instinct.
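
A toy sketch of the retrieval step, using word overlap as a stand-in for whatever similarity scoring the real system uses (the function names are assumptions):

```python
# Hypothetical sketch of non-parametric retrieval: past workflows are
# ranked against the new question and handed to the Planner as few-shot
# context. Word overlap is a toy stand-in for real embedding similarity.

def word_overlap(a: str, b: str) -> int:
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve_few_shot(question, buffer, k=2, score=word_overlap):
    # both successful and failed workflows are eligible: successes serve
    # as positive examples, failures as negative constraints
    ranked = sorted(buffer, key=lambda w: score(question, w["question"]),
                    reverse=True)
    return ranked[:k]
```
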

MIA establishes a bidirectional loop between these two. Non-parametric memory feeds the Planner during inference; the Planner’s successful trajectories get compressed back into non-parametric memory; and periodically, those trajectories are used to update the Planner’s weights. After each round of weight updates, the corresponding memory units are cleared, preventing storage explosion while retaining the knowledge in parametric form.
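
That lifecycle can be sketched as a small loop, with a counter standing in for actual Planner weight updates — the threshold and names are assumptions, not values from the paper:

```python
# Illustrative sketch of the bidirectional memory loop: store compressed
# workflows, periodically distill them into the Planner's weights, then
# clear the corresponding memory units.

class MemoryLoop:
    def __init__(self, update_every=8):
        self.buffer = []          # non-parametric memory: compressed workflows
        self.update_every = update_every
        self.weight_updates = 0   # stands in for parametric memory (Planner weights)

    def store(self, workflow):
        self.buffer.append(workflow)
        if len(self.buffer) >= self.update_every:
            self._update_planner()

    def _update_planner(self):
        # periodically fold stored trajectories into the Planner's weights...
        self.weight_updates += 1
        # ...then clear those memory units to prevent storage explosion
        self.buffer.clear()
```
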


Learning While Running

Perhaps the most striking feature of MIA is its test-time learning mechanism. The Planner continues updating its parameters during deployment — not in a separate fine-tuning run, but simultaneously with inference, without interrupting the research pipeline.

For each batch of questions, the Planner generates multiple candidate plans. The Executor runs each one. The ones that succeed get compressed into non-parametric memory; the ones that fail get sampled as negative examples. Rewards are calculated, advantages are computed, and the Planner’s weights are updated via GRPO — all while the system is actively answering questions.
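
At the heart of GRPO is a group-relative advantage: each candidate plan’s reward is normalized against the other candidates for the same question, so no separate value model is needed. A minimal sketch, with made-up rewards:

```python
# Minimal sketch of GRPO's group-relative advantage. For one question's
# group of candidate plans: advantage = (reward - group mean) / group std.
import statistics

def grpo_advantages(rewards, eps=1e-6):
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four candidate plans for one question: two succeeded, two failed.
# Successful plans get positive advantage, failed ones negative.
adv = grpo_advantages([1.0, 1.0, 0.0, 0.0])
```
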

This creates a positive feedback loop: better reasoning → better training examples → better reasoning.


Unsupervised, Because the Real World Has No Answer Key

Most agent training assumes you have ground truth labels. MIA addresses the case where you don’t — which is most of the real world.

When no labels are available, MIA uses a three-reviewer framework that mimics scientific peer review: one reviewer checks logical consistency, one checks information sourcing and credibility, one checks result validity and completeness. All three are run by frozen LLMs. The majority judgment becomes the training signal.
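
A sketch of that majority-vote signal, with trivial string checks standing in for the three frozen LLM reviewers:

```python
# Sketch of the three-reviewer majority vote as an unsupervised reward.
# The lambdas are placeholders for frozen LLM judgments, not real checks.

def majority_reward(trajectory, reviewers):
    """Each reviewer returns True/False; the majority becomes a binary reward."""
    votes = [review(trajectory) for review in reviewers]
    return 1.0 if sum(votes) >= len(reviewers) // 2 + 1 else 0.0

reviewers = [
    lambda t: "contradiction" not in t,   # logical consistency
    lambda t: "source:" in t,             # information sourcing and credibility
    lambda t: t.endswith("answer"),       # result validity and completeness
]
```
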

Under these unsupervised conditions, MIA achieves performance comparable to its supervised counterpart — and continues improving across multiple training iterations.


The Numbers

  • +9% on LiveVQA, +6% on HotpotQA when wrapping GPT-5.4
  • +31% average across seven datasets with a 7B Executor
  • A 7B model with MIA outperforms a 32B model by 18%
  • +5% average over previous SOTA memory baselines across all seven benchmarks
  • Consistent improvement across training iterations in unsupervised settings

Why This Matters

The dominant paradigm in AI agent development right now is scale: bigger models, larger context windows, more retrieval. MIA suggests a different path — not more memory, but smarter memory. Not just storing what was found, but learning how to find things, and continuously getting better at it.

If agents are going to operate autonomously over extended periods — handling research, analysis, planning — they need to accumulate expertise the way humans do: through experience, reflection, and the slow internalization of what works.

MIA is an early, rigorous demonstration that this is achievable.


Paper: Memory Intelligence Agent · Qiao et al., 2026
Code: github.com/ECNU-SII/MIA