5 min read
On this page

Robotics and Perception

Robotics integrates AI, control theory, and mechanical engineering to build agents that interact with the physical world.

Robot Architectures

Sense-Plan-Act (Classical)

Sensors → [Perception] → [Planning] → [Control] → Actuators
              World Model    Plan      Motor Commands

Deliberative: Build a complete world model, plan a complete action sequence, execute. Works well for structured environments. Too slow for dynamic environments.

Subsumption Architecture (Brooks)

Layered reactive behaviors with priority (higher layers subsume lower).

Layer 3: Explore (wander when idle)
Layer 2: Avoid obstacles (override explore)
Layer 1: Emergency stop (override all)

No world model. Direct sensor-to-action mappings. Fast, robust, handles dynamic environments. Limited planning ability.

Hybrid Architecture

Three-layer: Reactive (fast, reflexive) + Executive (sequences behaviors) + Deliberative (plans, learns).

Most modern robots use hybrid architectures.

Kinematics

Forward Kinematics

Given joint angles θ₁, θ₂, ..., θₙ → compute end-effector position (x, y, z) and orientation.

DH parameters (Denavit-Hartenberg): Standard parameterization of robot links. Each joint adds a 4×4 transformation matrix. Forward kinematics = product of all link matrices.

Inverse Kinematics

Given desired end-effector pose → compute joint angles.

Analytical solutions: For simple robots (6-DOF with specific geometries). Closed-form formulas.

Numerical solutions: Jacobian-based. Iterate: Δθ = J⁻¹Δx (or J† for redundant robots).

CCD (Cyclic Coordinate Descent): Iteratively adjust one joint at a time to reduce error. Simple, fast, popular in animation.

FABRIK (Forward and Backward Reaching Inverse Kinematics): Iteratively adjust joint positions. Fast convergence, intuitive.

Path Planning

Configuration Space (C-space)

The space of all possible robot configurations (joint angles or positions). Obstacles in the workspace map to forbidden regions in C-space.

Problem: Find a path in C-space from start configuration to goal configuration, avoiding obstacle regions.

Grid/Cell Decomposition

Discretize C-space into a grid. Use graph search (A*, Dijkstra) on the grid.

Trapezoidal decomposition: Decompose free space into trapezoids. Build a connectivity graph. Search for a path.

Visibility Graph

Connect the start, goal, and all obstacle vertices with lines that don't intersect obstacles. Search for shortest path in this graph.

Probabilistic Roadmap (PRM)

  1. Sample random configurations in free space.
  2. Connect nearby configurations with edges (if the path between them is collision-free).
  3. Search the resulting graph for a path.

Multi-query: Build the roadmap once, answer many queries. Good for static environments.

Rapidly-Exploring Random Tree (RRT)

  1. Start with a tree rooted at the start configuration.
  2. Sample a random configuration q_rand.
  3. Find the nearest node q_near in the tree.
  4. Extend from q_near toward q_rand by a step size → q_new.
  5. If q_new is collision-free, add it to the tree.
  6. Repeat until the tree reaches the goal.

RRT*: Rewires the tree to find shorter paths. Converges to optimal path (given enough time).

RRT-Connect: Grow two trees (one from start, one from goal). Try to connect them. Faster for narrow passages.

Properties: Probabilistically complete (will find a path if one exists, given enough time). Works in high-dimensional C-spaces. No discretization. The dominant algorithm for motion planning.

Potential Fields

Define an attractive potential pulling toward the goal and repulsive potentials pushing away from obstacles. Follow the negative gradient.

Advantage: Simple, real-time computation. Disadvantage: Local minima (robot gets stuck). Combine with global planners.

SLAM (Simultaneous Localization and Mapping)

Build a map of an unknown environment while localizing the robot within it.

Chicken-and-egg: Need a map to localize. Need localization to build a map. Solve both simultaneously.

Approaches

EKF-SLAM: Extended Kalman Filter. State = robot pose + all landmark positions. Quadratic in the number of landmarks (covariance matrix). Limited scalability.

Particle Filter SLAM (FastSLAM): Particles represent possible robot trajectories. Each particle has its own map. Scalable to large environments.

Graph-based SLAM: Poses as nodes, constraints as edges. Optimize the graph (least squares) to find consistent poses and map. Used by ORB-SLAM, Cartographer, LOAM.

Visual SLAM: Use camera images for localization. Feature-based (ORB-SLAM) or direct (LSD-SLAM, DSO). Monocular, stereo, or RGB-D.

LiDAR SLAM: Use laser scans. More robust than visual in textureless environments. Point cloud registration (ICP). LOAM, LeGO-LOAM.

Computer Vision Basics

Feature Detection

Corners: Harris corner detector, FAST.

Blobs: SIFT (scale-invariant), SURF, ORB (fast binary descriptor).

Feature matching: Compare descriptors between images. FLANN for fast approximate matching. RANSAC for robust estimation (reject outliers).

Object Recognition

Classical: HOG + SVM, Haar cascades (Viola-Jones face detection).

Deep learning: CNNs (ResNet, EfficientNet) for classification. YOLO, SSD for real-time detection. Mask R-CNN for instance segmentation.

Image Segmentation

Classical: Thresholding, watershed, graph cuts.

Deep learning: U-Net, DeepLab, Segment Anything Model (SAM).

Sensor Fusion

Combine data from multiple sensors (cameras, LiDAR, IMU, GPS) for more robust perception.

Kalman filter: Optimal for linear Gaussian systems. Fuse IMU + GPS.

Particle filter: For nonlinear, non-Gaussian systems. More general but computationally expensive.

Factor graphs: Unified framework. Each sensor measurement adds a factor. Optimize the joint distribution over all variables.

Control Theory Basics

PID Control

u(t) = Kp·e(t) + Ki·∫e(t)dt + Kd·de/dt

P (Proportional): Respond proportionally to error. Faster response but may overshoot.

I (Integral): Eliminate steady-state error. Accumulates past errors.

D (Derivative): Dampen oscillations. Responds to rate of change.

Tuning: Ziegler-Nichols, manual tuning, auto-tuning. The most widely used controller in industry.

Applications in CS

  • Autonomous vehicles: Perception (cameras, LiDAR) + planning (motion planning, route planning) + control (steering, braking).
  • Warehouse robots: Amazon's Kiva robots. Path planning in structured environments with many agents.
  • Surgical robots: Da Vinci system. Precision control with force feedback.
  • Drones: SLAM for autonomous navigation. Path planning for delivery, inspection, mapping.
  • AR/VR: Visual-inertial SLAM for headset tracking. Real-time pose estimation.