Multi-Agent Systems

Multi-agent systems (MAS) involve multiple autonomous agents interacting in a shared environment. They can cooperate, compete, or do both.

Agent Architectures

Reactive (Brooks' subsumption): No internal model. Direct sensor-to-action mappings. Fast response. Limited reasoning. Layers of behaviors with priority.
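
As a toy illustration, a subsumption-style controller can be sketched as prioritized trigger→action layers; higher layers subsume lower ones. The percept format and layer names here are assumptions for illustration, not any specific robot API:

```python
# Minimal subsumption-style controller sketch. Layers are ordered by
# priority; the first layer whose trigger fires determines the action.
def subsumption_act(percepts):
    """Return the action of the highest-priority layer whose trigger fires."""
    layers = [  # highest priority first (hypothetical behaviors)
        (lambda p: p.get("obstacle"), "turn-away"),    # collision avoidance
        (lambda p: p.get("target"),   "move-toward"),  # goal seeking
        (lambda p: True,              "wander"),       # default behavior
    ]
    for trigger, action in layers:
        if trigger(percepts):
            return action
```

With both an obstacle and a target in view, the avoidance layer wins: `subsumption_act({"obstacle": True, "target": True})` returns `"turn-away"`.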

Deliberative (BDI — Beliefs, Desires, Intentions): Maintain an internal model. Plan actions. Beliefs = knowledge about the world. Desires = goals. Intentions = committed plans.

Hybrid: Reactive layer for fast responses + deliberative layer for planning. The dominant architecture in practice.

Communication

KQML / FIPA-ACL

Standardized agent communication languages with performatives (speech acts):

| Performative | Meaning |
|---|---|
| inform | "I believe X is true" |
| request | "Please perform action A" |
| query-ref | "What is the value of X?" |
| propose | "I offer to do A for price P" |
| accept-proposal | "I accept your offer" |
| refuse | "I will not do A" |

Coordination Mechanisms

Blackboard: Shared data structure. Agents read/write. No direct communication.

Contract Net Protocol: Manager announces tasks → agents bid → manager awards contract to best bidder.
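
A minimal sketch of the announce→bid→award loop, assuming each agent exposes a cost function and signals refusal by returning `None` (the agent names and costs are illustrative):

```python
# Contract Net sketch: the manager collects bids and awards the task
# to the lowest-cost bidder. Not the FIPA wire format.
def contract_net(task, agents):
    """agents: dict mapping agent name -> cost function (None = refuse)."""
    bids = {name: cost_fn(task) for name, cost_fn in agents.items()
            if cost_fn(task) is not None}
    if not bids:
        return None                       # no agent bid on the task
    winner = min(bids, key=bids.get)      # award contract to cheapest bid
    return winner, bids[winner]

agents = {
    "a1": lambda task: 5.0,
    "a2": lambda task: 3.0,
    "a3": lambda task: None,  # refuses to bid
}
```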

Organizational structures: Hierarchies (military), markets (auctions), teams (joint plans).

Game Theory in MAS

Nash Equilibrium

A strategy profile where no agent can improve by unilaterally changing strategy.

Prisoner's Dilemma: Both cooperating is better for both, but both defecting is the Nash equilibrium.

|              | B cooperates | B defects |
|---|---|---|
| A cooperates | (-1, -1) | (-3, 0) |
| A defects    | (0, -3)  | (-2, -2) ← Nash equilibrium |

Finding Nash equilibria: Lemke-Howson (two-player), support enumeration. PPAD-complete in general.
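
For pure strategies in a small bimatrix game, equilibria can be found by brute force: check every cell for a profitable unilateral deviation. A sketch, using the prisoner's dilemma payoffs above (index 0 = cooperate, 1 = defect):

```python
from itertools import product

def pure_nash(payoff_A, payoff_B):
    """Return all (i, j) where neither player gains by deviating unilaterally."""
    rows, cols = len(payoff_A), len(payoff_A[0])
    eq = []
    for i, j in product(range(rows), range(cols)):
        # A's row i must be a best response to B's column j, and vice versa
        a_best = all(payoff_A[i][j] >= payoff_A[k][j] for k in range(rows))
        b_best = all(payoff_B[i][j] >= payoff_B[i][k] for k in range(cols))
        if a_best and b_best:
            eq.append((i, j))
    return eq

# Prisoner's dilemma payoffs from the matrix above
A = [[-1, -3], [0, -2]]
B = [[-1, 0], [-3, -2]]
```

Here `pure_nash(A, B)` returns `[(1, 1)]`: mutual defection is the unique pure Nash equilibrium.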

Dominant Strategies

A strategy that is best regardless of what the other agents do. When one exists, the rational choice is clear: play it.

Pareto Optimality

An outcome is Pareto optimal if no agent can be made better off without making another worse off. Nash equilibria may not be Pareto optimal (prisoner's dilemma!).
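
The Pareto check can be sketched on the same payoff matrices: an outcome is dominated if some other outcome makes at least one agent strictly better off and nobody worse off.

```python
from itertools import product

def pareto_optimal(payoff_A, payoff_B):
    """Return all outcomes not Pareto-dominated by any other outcome."""
    cells = list(product(range(len(payoff_A)), range(len(payoff_A[0]))))
    def dominated(i, j):
        return any(payoff_A[k][l] >= payoff_A[i][j]
                   and payoff_B[k][l] >= payoff_B[i][j]
                   and (payoff_A[k][l] > payoff_A[i][j]
                        or payoff_B[k][l] > payoff_B[i][j])
                   for k, l in cells)
    return [(i, j) for i, j in cells if not dominated(i, j)]

# Prisoner's dilemma payoffs (index 0 = cooperate, 1 = defect)
A = [[-1, -3], [0, -2]]
B = [[-1, 0], [-3, -2]]
```

Mutual defection `(1, 1)` — the Nash equilibrium — is excluded from the Pareto set, since `(0, 0)` makes both agents strictly better off.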

Mechanism Design

Inverse game theory: Design the game (rules, incentives) so that agents' self-interested behavior leads to desirable outcomes.

Auctions

English (ascending): Open, ascending bids. Winner pays highest bid. Dominant strategy: bid up to true value.

Dutch (descending): Price starts high and drops until a bidder accepts; the first bidder to accept wins at the current price. Strategically equivalent to the first-price sealed-bid auction.

First-price sealed-bid: Highest bid wins and pays their own bid. Optimal bidding is to shade: bid below your true value.

Vickrey (second-price sealed-bid): Highest bid wins, pays second-highest bid. Dominant strategy: bid true value (incentive-compatible!).
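
The two sealed-bid rules differ only in the payment, which a short sketch makes concrete (the bid values are illustrative):

```python
def sealed_bid(bids, second_price=True):
    """Highest bidder wins; pays own bid (first-price) or the
    runner-up's bid (Vickrey second-price)."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]] if second_price and len(ranked) > 1 else bids[winner]
    return winner, price

bids = {"alice": 10, "bob": 8, "carol": 6}
```

Under the Vickrey rule alice wins but pays bob's bid of 8, so her payment does not depend on her own bid — which is why bidding her true value is dominant.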

VCG Mechanism (Vickrey-Clarke-Groves)

Generalization of Vickrey auction to multi-item/combinatorial settings.

Property: Truthful reporting is a dominant strategy. Socially optimal allocation. Each agent pays the externality they impose on others.

Limitation: Not budget-balanced (total payments may not cover costs). Susceptible to collusion.
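
A minimal VCG instance is selling k identical items to unit-demand bidders: the welfare-maximizing allocation is simply the top-k bids, and each winner's externality payment is the first excluded bid. With k = 1 this reduces to the Vickrey auction. The names and values below are illustrative:

```python
def vcg_unit_demand(values, k):
    """VCG for k identical items, unit demand: allocate to the k
    highest-value agents; each winner pays the externality they impose,
    which here is the (k+1)-th highest value (the displaced bid)."""
    ranked = sorted(values, key=values.get, reverse=True)
    winners = ranked[:k]
    # Removing any winner would hand an item to the best excluded agent,
    # so that agent's value is exactly the externality imposed.
    price = values[ranked[k]] if len(ranked) > k else 0
    return {w: price for w in winners}

values = {"a": 10, "b": 8, "c": 5, "d": 3}
```

`vcg_unit_demand(values, 2)` charges both winners 5 (c's displaced bid), and `vcg_unit_demand(values, 1)` recovers the Vickrey payment of 8.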

Social Choice Theory

Aggregate individual preferences into a collective decision.

Arrow's Impossibility Theorem: No preference-aggregation rule over ≥ 3 candidates satisfies all of: Pareto efficiency, independence of irrelevant alternatives, and non-dictatorship. A fundamental impossibility result.
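
Different aggregation rules can crown different winners on the same preference profile, which is the tension Arrow's theorem formalizes. A sketch comparing plurality and Borda count on an illustrative profile:

```python
from collections import Counter

def plurality(profile):
    """Winner = candidate with the most first-place votes."""
    return Counter(ranking[0] for ranking in profile).most_common(1)[0][0]

def borda(profile):
    """Winner = candidate with the highest total Borda score
    (n-1 points for 1st place, n-2 for 2nd, ..., 0 for last)."""
    scores = Counter()
    for ranking in profile:
        for pos, cand in enumerate(ranking):
            scores[cand] += len(ranking) - 1 - pos
    return scores.most_common(1)[0][0]

# 3 voters: A > B > C; 2 voters: B > C > A; 2 voters: C > B > A
profile = 3 * [("A", "B", "C")] + 2 * [("B", "C", "A")] + 2 * [("C", "B", "A")]
```

Plurality picks A (3 first-place votes), while Borda picks B (9 points to A's 6): the collective decision depends on the rule, not just the preferences.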

Multi-Agent Reinforcement Learning (MARL)

Multiple agents learning simultaneously in a shared environment.

Challenges: Non-stationarity (other agents are also learning → environment changes). Credit assignment (who contributed to the team reward?). Exploration vs exploitation with multiple learners.

Approaches

Independent learners: Each agent treats others as part of the environment. Simple but ignores agent interactions. May not converge.
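
A toy sketch of independent learners: two stateless Q-learners in a repeated prisoner's dilemma, each treating the other as part of the environment (payoffs match the matrix above; hyperparameters and episode count are illustrative assumptions). Since defection strictly dominates, both typically converge to mutual defection:

```python
import random

# Row/column payoffs for (agent 0's action, agent 1's action)
PAYOFF = {("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
          ("D", "C"): (0, -3),  ("D", "D"): (-2, -2)}

def train(episodes=5000, alpha=0.1, eps=0.1, seed=0):
    """Independent epsilon-greedy Q-learners; each updates only its own
    action values from its own reward, ignoring the other agent."""
    rng = random.Random(seed)
    q = [{"C": 0.0, "D": 0.0}, {"C": 0.0, "D": 0.0}]
    for _ in range(episodes):
        acts = [rng.choice("CD") if rng.random() < eps else max(qi, key=qi.get)
                for qi in q]
        rewards = PAYOFF[tuple(acts)]
        for i in range(2):  # independent update: no joint-action modeling
            q[i][acts[i]] += alpha * (rewards[i] - q[i][acts[i]])
    return q
```

From each agent's perspective the reward for a fixed action drifts as the other agent's policy changes — the non-stationarity problem noted above.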

Centralized training, decentralized execution (CTDE): Train with global information. Execute with only local observations. QMIX, MAPPO.

Communication learning: Agents learn to communicate (send/receive messages) to coordinate. CommNet, TarMAC.

Self-play: Agent trains against copies of itself. Used by AlphaGo/AlphaZero. Can discover sophisticated strategies.

Applications in CS

  • Autonomous vehicles: Multi-vehicle coordination, traffic management, intersection protocols.
  • E-commerce: Automated negotiation, auction platforms, price optimization.
  • Robotics: Multi-robot coordination (search and rescue, warehouse robots, drone swarms).
  • Resource allocation: Spectrum allocation, cloud resource management, smart grid.
  • Social simulation: Epidemiological modeling, traffic simulation, economic modeling.
  • LLM agents: Tool-using AI agents cooperating on complex tasks. Multi-agent debate for reasoning.