Most AI agents treat the world as a pile of documents. They retrieve text, reason over it, and return an answer. That works fine when knowledge is flat — but a huge amount of real-world data isn’t flat. It’s relational.
Citation networks. Product co-purchase graphs. Social platforms. In these environments, what a node means depends on what it’s connected to. Lexical similarity alone can’t capture that. You need to understand structure.
AgentGL is the first reinforcement learning–driven framework that teaches an LLM agent to navigate graph structure natively — not as a lookup table, but as a living environment to explore. The results are hard to ignore: up to 17.5% improvement in node classification and 28.4% in link prediction over strong baselines.
The problem with existing approaches
There are three established ways to apply LLMs to graph-structured data, and each has a fundamental limitation.
Graph Neural Networks (GNNs) model structural signals well but struggle with rich text semantics. They see the topology but not what the nodes are saying.
GraphLLMs integrate LLMs with graph information via prompted context or instruction tuning. But they extract that context once, at inference time. There’s no adaptive exploration — the model commits to a fixed view of the graph before it starts reasoning.
GraphRAG systems build large text-enriched knowledge graphs from corpora for retrieval. But these reconstructed graphs are expensive to build, and they don’t preserve the native topological correlations in real data. They also optimize for generation quality, not for solving graph-native tasks like classification or link prediction.
The gap: none of these give an agent the ability to dynamically navigate graph structure, accumulate evidence, and refine its search based on what it finds.
How AgentGL works
AgentGL reframes graph learning as an agentic decision-making process. The agent is given a task — classify this node, predict this link — and a suite of graph-native search tools. It then iterates: reason, search, observe, reason again.
The four tools cover the full information space:
- 1-hop Neighborhood Search: local grounding in a node's direct neighbors, prioritizing common neighbors between query targets
- 2-hop Neighborhood Search: the same local grounding, extended one hop further
- Structure Salience Search: globally important nodes via PageRank scores
- Graph Dense Search: semantic similarity across the graph, bridging disconnected nodes by meaning
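To make the tool set concrete, here is a minimal sketch of the four search primitives over a plain adjacency-dict graph. The function names, signatures, and the toy PageRank and cosine implementations are my assumptions; the paper describes the tools at a higher level.

```python
def one_hop(adj, node):
    """1-hop Neighborhood Search: direct neighbors of the query node."""
    return set(adj.get(node, ()))

def two_hop(adj, node):
    """2-hop Neighborhood Search: neighbors-of-neighbors, excluding the node itself."""
    hop1 = one_hop(adj, node)
    hop2 = {m for n in hop1 for m in adj.get(n, ())}
    return (hop1 | hop2) - {node}

def structure_salience(adj, k=3, damping=0.85, iters=50):
    """Structure Salience Search: top-k nodes by a simple PageRank power iteration."""
    nodes = list(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, nbrs in adj.items():
            share = damping * rank[n] / max(len(nbrs), 1)
            for m in nbrs:
                new[m] += share  # distribute rank mass to neighbors
        rank = new
    return sorted(nodes, key=rank.get, reverse=True)[:k]

def dense_search(embeddings, query_vec, k=3):
    """Graph Dense Search: cosine similarity over node embeddings, ignoring topology."""
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den if den else 0.0
    return sorted(embeddings, key=lambda n: cos(embeddings[n], query_vec), reverse=True)[:k]
```

Note the division of labor: the first three tools only see topology, while dense search only sees semantics — which is exactly what lets it bridge disconnected components.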
The agent decides which tool to use, what query to issue, and when to stop. This is trained end-to-end via reinforcement learning — no step-by-step supervision required.
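In pseudocode, the loop looks roughly like this. The `llm` policy interface and tool registry are placeholders of mine; in the actual system the trained LLM emits tool calls inside its rollout.

```python
def agent_loop(llm, tools, task, max_steps=8):
    """Interleaved reason/search/observe loop. `llm` returns either
    ("answer", text) or ("search", tool_name, query); `tools` maps
    tool names to callables. Both interfaces are illustrative."""
    evidence = []
    for _ in range(max_steps):
        action = llm(task, evidence)
        if action[0] == "answer":        # the agent decides to stop
            return action[1]
        _, tool_name, query = action     # otherwise: pick a tool, issue a query
        evidence.append((tool_name, tools[tool_name](query)))
    # step budget exhausted: force a final answer from accumulated evidence
    return llm(task, evidence, force_answer=True)
```

The RL signal comes only from the final answer's correctness, which is what makes end-to-end training possible without step-by-step supervision.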
The two-stage training strategy
A key insight in the paper is that tool use and reasoning efficiency are in tension — and you can’t optimize both at once.
Stage 1: Policy Bootstrapping. The agent learns to use all four tools reliably. A coverage reward encourages exploration of every tool type during training, preventing early collapse to a single default behavior. Without this, agents quickly degenerate to making no searches at all.
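A plausible minimal form of the coverage reward is a bonus proportional to how much of the tool set a rollout exercises; the exact weighting in the paper may differ.

```python
def coverage_reward(tools_used, all_tools, weight=0.1):
    """Bonus for the fraction of distinct tool types exercised in one
    rollout; a rollout with zero searches earns zero bonus.
    `weight` is an assumed hyperparameter."""
    covered = set(tools_used) & set(all_tools)
    return weight * len(covered) / len(all_tools)
```

Added on top of the task reward during Stage 1, this makes a no-search rollout strictly worse than one that tries each tool at least once — the anti-collapse pressure described above.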
Stage 2: Mitigating Search Overuse. Once the agent knows how to search, it learns when not to. Two mechanisms drive this:
- A Retrospective Termination Trigger injects a cognitive pause after each tool call: “Let me review the evidence I have before searching again.” This turns searching from a habit into a deliberate decision.
- Cognitive Density Regularization penalizes thin reasoning between searches, ensuring the agent is actually processing retrieved evidence rather than skipping through it.
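One way to read the two mechanisms in code: the pause string paraphrases the paper's trigger, while the token floor and penalty weight below are invented for illustration.

```python
# Retrospective Termination Trigger: text injected after every tool call,
# turning the next search into a deliberate choice rather than a reflex.
PAUSE = "Let me review the evidence I have before searching again."

def cognitive_density_penalty(reasoning_segments, min_tokens=30, weight=0.05):
    """Penalize rollouts whose reasoning between consecutive searches is
    'thin', i.e. below a token floor (floor and weight are assumptions)."""
    thin = sum(1 for seg in reasoning_segments if len(seg.split()) < min_tokens)
    return -weight * thin
```

Together the two terms reshape the reward landscape so that a search is only worth its cost if the agent actually reasons over what it retrieved.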
The result: compared to Stage 1 alone, the full two-stage training reduces tool calls by ~17.5% while improving accuracy by 2.4%. Less searching, better answers.
What the numbers show
Tested across 7 datasets spanning citation networks, Amazon products, and social graphs — with both in-domain and zero-shot transfer evaluation — AgentGL consistently outperforms all baselines.
A few highlights:
- With a 7B backbone, AgentGL outperforms the best baselines by an average of 12.7% in-domain and 24.4% zero-shot on node classification
- On link prediction, the gains are even larger: 26.3% in-domain and 22.4% zero-shot
- Scaling from a 3B to 7B backbone improves performance further, particularly on zero-shot transfer — suggesting the tool-use policy generalizes better with larger models
The paper also tests two RL algorithms (GRPO and REINFORCE++), finding a consistent tradeoff: GRPO is stronger on node classification, REINFORCE++ on link prediction. Neither dominates everywhere.
One finding worth flagging: the zero-shot transfer gains are consistently larger than the in-domain gains. This suggests that static context injection (GraphRAG, GraphLLMs) is more brittle to distribution shifts — it overfits to the training graph. AgentGL’s interleaved search-and-reason loop can adapt to graphs it’s never seen.
Why this matters
The broader implication of AgentGL isn’t graph learning specifically — it’s what happens when you give agents the right abstraction for the environment they’re operating in.
Text agents get text-native tools: search, retrieval, summarization. Graph agents need graph-native tools: neighborhood expansion, structural ranking, semantic traversal. The lesson from this paper is that matching the tool set to the data structure isn’t optional — it’s where most of the performance comes from.
The “search-constrained thinking” paradigm is also worth generalizing. The insight that more searching isn’t better searching — that depth of reasoning on retrieved evidence matters more than breadth of retrieval — applies well beyond graph tasks.
Read the paper: arxiv.org/abs/2604.05846