Adversarial Search

Adversarial search handles competitive environments where agents have conflicting goals — primarily two-player zero-sum games.

Minimax Algorithm

Minimax solves two-player zero-sum games with perfect information: MAX tries to maximize the score, while MIN tries to minimize it.

PROCEDURE MINIMAX(state, depth, maximizing) → integer
    IF depth = 0 OR IS_TERMINAL(state) THEN
        RETURN EVALUATE(state)   // heuristic evaluation

    IF maximizing THEN
        best ← -INFINITY
        FOR EACH move IN LEGAL_MOVES(state) DO
            child ← APPLY_MOVE(state, move)
            best ← MAX(best, MINIMAX(child, depth - 1, FALSE))
        RETURN best
    ELSE
        best ← +INFINITY
        FOR EACH move IN LEGAL_MOVES(state) DO
            child ← APPLY_MOVE(state, move)
            best ← MIN(best, MINIMAX(child, depth - 1, TRUE))
        RETURN best

Complete: Yes (for finite games). Optimal: Yes (against an optimal opponent). Time: O(b^m). Space: O(bm) (DFS).

Alpha-Beta Pruning

Prune branches that can't affect the minimax decision. Maintains two values:

  • α: Best (highest) value MAX can guarantee along the path so far.
  • β: Best (lowest) value MIN can guarantee along the path so far.

Prune when α ≥ β — the current branch can't produce a better result than an already-found alternative.

PROCEDURE ALPHA_BETA(state, depth, alpha, beta, maximizing) → integer
    IF depth = 0 OR IS_TERMINAL(state) THEN
        RETURN EVALUATE(state)

    IF maximizing THEN
        value ← -INFINITY
        FOR EACH move IN LEGAL_MOVES(state) DO
            child ← APPLY_MOVE(state, move)
            value ← MAX(value, ALPHA_BETA(child, depth - 1, alpha, beta, FALSE))
            alpha ← MAX(alpha, value)
            IF alpha ≥ beta THEN BREAK   // beta-cutoff
        RETURN value
    ELSE
        value ← +INFINITY
        FOR EACH move IN LEGAL_MOVES(state) DO
            child ← APPLY_MOVE(state, move)
            value ← MIN(value, ALPHA_BETA(child, depth - 1, alpha, beta, TRUE))
            beta ← MIN(beta, value)
            IF alpha ≥ beta THEN BREAK   // alpha-cutoff
        RETURN value

Effectiveness: With perfect move ordering, reduces effective branching factor from b to √b. Time: O(b^(m/2)) — doubles the searchable depth! Random ordering: O(b^(3m/4)).

Move ordering is critical: try the best moves first. Techniques: iterative deepening (use previous depth's best move), killer heuristic (moves that caused cutoffs in sibling nodes), history heuristic.
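A sketch of alpha-beta in Python on the same explicit-tree representation as above (ints are leaves, lists are internal nodes), with a leaf counter added to make the pruning visible:

```python
import math

def alpha_beta(node, alpha, beta, maximizing, stats):
    if isinstance(node, int):                     # terminal: EVALUATE(state)
        stats["leaves"] += 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alpha_beta(child, alpha, beta, False, stats))
            alpha = max(alpha, value)
            if alpha >= beta:                     # beta-cutoff
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alpha_beta(child, alpha, beta, True, stats))
            beta = min(beta, value)
            if alpha >= beta:                     # alpha-cutoff
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
stats = {"leaves": 0}
value = alpha_beta(tree, -math.inf, math.inf, True, stats)
print(value, stats["leaves"])   # 3 7: same value as minimax, 7 of 9 leaves
```

After the first subtree establishes α = 3, the second subtree is cut off as soon as its first leaf (2) proves it can never beat 3.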

Transposition Tables

Cache evaluated positions in a hash table. If the same position is reached via different move sequences, reuse the stored result.

Zobrist hashing: XOR-based incremental hash. Each (piece, square) pair gets a fixed random 64-bit number. A position's hash is the XOR of the numbers for all its occupied (piece, square) pairs. To apply a move, XOR out the piece's old square and XOR in its new square — an O(1) incremental update.
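A minimal Zobrist sketch, assuming a chess-like setup (12 piece types × 64 squares) and a board represented as a square → piece-index dict:

```python
import random

random.seed(0)
PIECES, SQUARES = 12, 64    # 6 piece types × 2 colors, 8×8 board
ZOBRIST = [[random.getrandbits(64) for _ in range(SQUARES)]
           for _ in range(PIECES)]

def full_hash(board):
    """From-scratch hash: XOR the table entry of every occupied square."""
    h = 0
    for square, piece in board.items():
        h ^= ZOBRIST[piece][square]
    return h

def move_hash(h, piece, from_sq, to_sq):
    """O(1) update: XOR the piece out of its old square, into its new one."""
    return h ^ ZOBRIST[piece][from_sq] ^ ZOBRIST[piece][to_sq]

board = {12: 0, 52: 6}                  # e.g. a white pawn and a black pawn
h = move_hash(full_hash(board), 0, 12, 28)
assert h == full_hash({28: 0, 52: 6})   # incremental == from-scratch
```

Captures and promotions follow the same pattern: XOR out every (piece, square) pair that disappears and XOR in every pair that appears.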

Iterative Deepening Minimax

Run alpha-beta with increasing depth limits: 1, 2, 3, .... Use the best move from each depth as the first move to try at the next depth (improves move ordering).

Time control: Stop when time runs out. Return the best move from the deepest completed search.
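A sketch of the driver loop, reusing the explicit-tree representation from the minimax example; the depth cutoff returns 0 here as a placeholder heuristic (an assumption purely for illustration):

```python
import time

def depth_limited_minimax(node, depth, maximizing):
    if isinstance(node, int):
        return node
    if depth == 0:
        return 0                        # placeholder EVALUATE at the cutoff
    vals = (depth_limited_minimax(c, depth - 1, not maximizing) for c in node)
    return max(vals) if maximizing else min(vals)

def best_move_iterative(tree, time_budget=0.05, max_depth=10):
    """Search depths 1, 2, 3, ...; keep the best move from the deepest
    completed search and stop when the time budget runs out."""
    deadline = time.monotonic() + time_budget
    best, depth = None, 1
    while time.monotonic() < deadline and depth <= max_depth:
        scores = [depth_limited_minimax(child, depth - 1, False)
                  for child in tree]
        best = max(range(len(tree)), key=lambda i: scores[i])
        depth += 1
    return best

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(best_move_iterative(tree))   # index of the subtree with minimum 3
```

A real engine would also reorder moves at each new depth so the previous iteration's best move is searched first, which this sketch omits.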

Expectiminimax

For games with chance nodes (dice rolls, card draws).

MAX nodes: maximize over children
MIN nodes: minimize over children
CHANCE nodes: weighted average over children (by probability)

Time: O(b^m × n^m) where n = number of outcomes per chance node. Much more expensive than deterministic minimax.
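The three node types can be sketched over an explicit tree of tagged tuples (a representation assumed for illustration):

```python
def expectiminimax(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "max":
        return max(expectiminimax(c) for c in node[1])
    if kind == "min":
        return min(expectiminimax(c) for c in node[1])
    # chance node: children are (probability, subtree) pairs
    return sum(p * expectiminimax(c) for p, c in node[1])

# MAX chooses between a risky coin flip (EV 3.0) and a safe one (EV 2.0).
tree = ("max", [
    ("chance", [(0.5, ("leaf", 10)), (0.5, ("leaf", -4))]),
    ("chance", [(0.5, ("leaf", 2)),  (0.5, ("leaf", 2))]),
])
print(expectiminimax(tree))   # → 3.0
```

Note that chance nodes make the absolute scale of the evaluation function matter, not just its ordering: a monotone rescaling of leaf values can change which move has the higher expectation.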

Monte Carlo Tree Search (MCTS)

Build the game tree selectively using random simulations. No evaluation function needed.

Algorithm (UCT)

1. SELECTION: Start at root. Descend by choosing child that maximizes
   UCB1 = win_rate + C × √(ln(parent_visits) / child_visits)
   Until reaching an unexpanded node.

2. EXPANSION: Add one new child node.

3. SIMULATION (rollout): Play a random game from the new node to completion.

4. BACKPROPAGATION: Update win counts and visit counts along the path back to root.

Repeat many times. Choose the root's most-visited child as the best move.

UCB1 formula balances:

  • Exploitation: win_rate (choose moves that win often)
  • Exploration: C × √(ln(N)/nᵢ) (choose less-explored moves)

C = √2 is the theoretically motivated default (for rewards in [0, 1]); in practice it is tuned per application.
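The four phases can be sketched end-to-end on a toy game — here Nim (take 1–3 stones; whoever takes the last stone wins), chosen purely as a small test bed. Win counts are stored from the perspective of the player who just moved into each node, so the UCB1 choice at a node maximizes the chooser's own win rate:

```python
import math, random

C = math.sqrt(2)   # UCB1 exploration constant

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones, self.parent, self.move = stones, parent, move
        self.children = []
        self.wins = self.visits = 0
        self.untried = [m for m in (1, 2, 3) if m <= stones]

def ucb1(child, parent_visits):
    return (child.wins / child.visits
            + C * math.sqrt(math.log(parent_visits) / child.visits))

def rollout(stones):
    """Random playout; True iff the player to move wins."""
    turn = 0
    while stones > 0:
        stones -= random.choice([m for m in (1, 2, 3) if m <= stones])
        turn ^= 1
    return turn == 1

def mcts(root_stones, iterations=3000):
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. SELECTION: descend fully expanded nodes via UCB1
        while not node.untried and node.children:
            node = max(node.children, key=lambda c: ucb1(c, node.visits))
        # 2. EXPANSION: add one new child
        if node.untried:
            m = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.stones - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. SIMULATION from the new node
        result = not rollout(node.stones)   # win for whoever moved INTO node
        # 4. BACKPROPAGATION, flipping the perspective at each level
        while node is not None:
            node.visits += 1
            node.wins += result
            result = not result
            node = node.parent
    return max(root.children, key=lambda c: c.visits).move

random.seed(0)
print(mcts(5))   # from 5 stones, taking 1 leaves a losing 4-stone position
```

From 5 stones the only winning move is to take 1 (a pile that is a multiple of 4 is lost for the player to move), and the visit counts concentrate there.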

MCTS Properties

  • Anytime: Can return a move at any time. More computation → better moves.
  • No evaluation function needed (learns from simulations).
  • Handles large branching factors (doesn't explore uniformly — focuses on promising branches).
  • Handles stochastic games naturally.

AlphaGo / AlphaZero

AlphaGo (2016): MCTS + deep neural networks.

  • Policy network: Predicts good moves (guides selection). Trained from human games + self-play.
  • Value network: Evaluates board positions (replaces random rollouts). Trained from self-play outcomes.

AlphaZero (2017): Tabula rasa — learns entirely from self-play, no human data. Uses a single neural network for both policy and value. Defeated the best chess, shogi, and Go programs.

MuZero (2019): Learns a model of the environment (not given game rules). Plans in a learned latent space. Applies to Atari games as well as board games.

Games of Imperfect Information

Poker and Bluffing

Information sets: Group game states that are indistinguishable to the acting player (can't see opponent's cards).

Nash equilibrium: A strategy profile where no player can improve by changing their strategy alone. For two-player zero-sum games, the minimax value = Nash equilibrium value.
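For a matrix game with a saddle point, this equality can be checked directly. A small sketch on an assumed payoff matrix for the row player (the column player receives the negation):

```python
# Row player's payoffs; this matrix has a saddle point at (row 0, col 1).
A = [[3, 1],
     [0, -2]]

# maximin: the best payoff the row player can guarantee
maximin = max(min(row) for row in A)
# minimax: the smallest payoff the column player can concede
minimax_v = min(max(row[j] for row in A) for j in range(len(A[0])))

print(maximin, minimax_v)   # → 1 1: equal, so the game value is 1
```

When the two values differ, no pure-strategy saddle point exists and the equilibrium requires mixed strategies — as in rock-paper-scissors, where the Nash equilibrium plays each action with probability 1/3.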

Counterfactual Regret Minimization (CFR): Iterative algorithm that converges to Nash equilibrium. Used by Libratus and Pluribus (superhuman poker AIs).

Pluribus (2019): Beat top human professionals in 6-player no-limit Texas Hold'em. Used modified CFR + real-time search.
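The core update inside CFR is regret matching (Hart & Mas-Colell): play each action with probability proportional to its positive cumulative regret. A minimal self-play sketch on rock-paper-scissors, whose average strategies converge toward the uniform Nash equilibrium:

```python
import random

ACTIONS = 3   # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if a beats b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def strategy_from_regrets(regrets):
    """Regret matching: normalize the positive parts of cumulative regret."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total == 0:
        return [1.0 / ACTIONS] * ACTIONS
    return [p / total for p in positives]

def train(iterations=20000):
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strat_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        moves = [random.choices(range(ACTIONS), s)[0] for s in strats]
        for p in range(2):
            got = payoff(moves[p], moves[1 - p])
            for a in range(ACTIONS):
                # regret: what action a would have earned minus what we got
                regrets[p][a] += payoff(a, moves[1 - p]) - got
                strat_sum[p][a] += strats[p][a]
    # the AVERAGE strategy, not the last one, converges to equilibrium
    return [[s / iterations for s in ss] for ss in strat_sum]

random.seed(0)
print(train()[0])   # each probability approaches 1/3
```

Full CFR runs this same regret update at every information set of an extensive-form game, weighting regrets by counterfactual reach probabilities; this single-decision version only shows the building block.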

Applications in CS

  • Game AI: Chess engines (Stockfish: alpha-beta + NNUE), Go (AlphaGo/KataGo: MCTS + DNN), poker (Pluribus: CFR).
  • Decision making: Multi-agent systems, negotiation, auction strategy.
  • Cybersecurity: Attacker-defender models as adversarial games.
  • Economics: Mechanism design, market simulations.
  • Verified AI: Adversarial robustness testing — find worst-case inputs.