Top 10 AI Agent Papers from March 2025

AI Agents are rapidly advancing in intelligence, speed, and autonomy, with cutting-edge research paving the way for their future evolution.

We’ve selected the 10 most relevant papers out of a total of 545 agent papers released on arXiv in March. They tackle key challenges like governance, collaboration, reasoning, and automation. These papers introduce new frameworks, improve AI’s ability to interact with humans and systems, and explore better ways to ensure accountability and efficiency.

From enhancing AI-driven decision-making to integrating agents with web browsing and APIs, this research will shape how future AI agents operate. Let's dive in.

Weekly Research Paper Tracker

Before we dive into the papers, check out the weekly Research Paper Tracker, where we list the top research papers of the week by category, including their titles, summaries, links, and why they matter. It's a great resource to bookmark if you want to stay ahead of the curve on the latest research. Access here

1) PLAN-AND-ACT: Improving Planning of Agents for Long-Horizon Tasks

PLAN-AND-ACT is a new framework that improves LLM-based agents' performance on complex, multi-step tasks by explicitly separating planning from execution. It features a PLANNER that creates structured, high-level plans and an EXECUTOR that translates them into actions. A novel synthetic data method trains the planner using annotated trajectories, enabling better generalization. The approach achieves a 54% success rate on the WebArena-Lite benchmark.
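To make the planner/executor split concrete, here is a minimal sketch of the control flow. It is not the paper's implementation: `Step`, `toy_planner`, `toy_executor`, and `plan_and_act` are hypothetical names, and both components are plain functions standing in for what would be LLM calls in a real system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One high-level plan step (hypothetical structure, not from the paper)."""
    description: str


def toy_planner(task: str) -> List[Step]:
    """Stand-in for the PLANNER; a real system would query an LLM here."""
    return [
        Step(f"open the page relevant to: {task}"),
        Step("locate the target element"),
        Step("perform the final action"),
    ]


def toy_executor(step: Step) -> str:
    """Stand-in for the EXECUTOR; turns one plan step into a concrete action."""
    return f"ACTION[{step.description}]"


def plan_and_act(
    task: str,
    planner: Callable[[str], List[Step]] = toy_planner,
    executor: Callable[[Step], str] = toy_executor,
) -> List[str]:
    """Plan first, then execute each step; the two stages never mix."""
    plan = planner(task)
    return [executor(step) for step in plan]


if __name__ == "__main__":
    for action in plan_and_act("find the cheapest laptop"):
        print(action)
```

Because the planner and executor are passed in as separate callables, either one can be swapped or retrained without touching the other, which is the property the separation is meant to provide.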

Why it Matters:

By enhancing planning capabilities, PLAN-AND-ACT advances the ability of LLM agents to handle real-world, long-horizon tasks, a key step toward more capable and autonomous AI systems.

Read Paper Here

2) Why Do Multi-Agent LLM Systems Fail?

This study investigates why Multi-Agent Systems (MAS) show limited performance gains over single-agent models despite rising interest. Analyzing five MAS frameworks across 150+ tasks with expert annotations, the authors identify 14 failure modes grouped into three categories: design issues, agent misalignment, and evaluation flaws. They introduce a robust taxonomy, use LLM-as-a-Judge for scalable assessment, and test interventions, finding that the identified failures require deeper solutions than the fixes they tested.

Why it Matters:

By mapping out MAS failure patterns, this work provides a foundational guide for improving multi-agent collaboration, paving the way for more effective and reliable AI systems.

Read Paper Here

3) Agents Play Thousands of 3D Video Games

PORTAL is a new framework that enables AI agents to play thousands of 3D video games by turning decision-making into a language modeling task. It uses large language models to generate behavior trees via a domain-specific language, avoiding the complexity of traditional reinforcement learning. With a hybrid rule-based and neural policy structure and dual-feedback loops, PORTAL enables interpretable, adaptable, and generalizable game-playing agents.
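For readers unfamiliar with behavior trees, the sketch below shows the kind of structure such a system could generate and run. It is an illustration only: the nested-tuple format is a toy stand-in for a domain-specific language, and the node kinds, condition names, and actions are all hypothetical rather than taken from PORTAL.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]

# Hypothetical condition vocabulary; names are illustrative, not from PORTAL.
CONDITIONS: Dict[str, Callable[[State], bool]] = {
    "enemy_visible": lambda s: s.get("enemy_visible", False),
    "low_health": lambda s: s.get("health", 100) < 30,
}

# A tree written as nested tuples, the sort of text an LLM could emit.
# "sel" tries children until one succeeds; "seq" runs children until one fails.
TREE: Tuple = (
    "sel",
    ("seq", ("cond", "low_health"), ("act", "retreat")),
    ("seq", ("cond", "enemy_visible"), ("act", "attack")),
    ("act", "patrol"),
)


def tick(node: Tuple, state: State, log: List[str]) -> bool:
    """Evaluate one node; actions are appended to log and always succeed."""
    kind = node[0]
    if kind == "cond":
        return CONDITIONS[node[1]](state)
    if kind == "act":
        log.append(node[1])
        return True
    if kind == "seq":
        # all() short-circuits, so later children are skipped after a failure
        return all(tick(child, state, log) for child in node[1:])
    if kind == "sel":
        # any() short-circuits, so later children are skipped after a success
        return any(tick(child, state, log) for child in node[1:])
    raise ValueError(f"unknown node kind: {kind}")


def decide(state: State) -> List[str]:
    """Run the tree once and return the actions it chose."""
    log: List[str] = []
    tick(TREE, state, log)
    return log


if __name__ == "__main__":
    print(decide({"health": 20, "enemy_visible": True}))
    print(decide({"health": 80, "enemy_visible": True}))
    print(decide({}))
```

Since the policy is a readable tree rather than learned weights, each decision can be traced back to the branch that fired, which is where the interpretability of this style of agent comes from.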

Why it Matters:

PORTAL drastically reduces development time while boosting performance and adaptability in game AI, marking a breakthrough in creating scalable, intelligent agents for complex gaming environments.