Robotics and Perception
Robotics integrates AI, control theory, and mechanical engineering to build agents that interact with the physical world.
Robot Architectures
Sense-Plan-Act (Classical)
Sensors → [Perception] → [Planning] → [Control] → Actuators
World Model Plan Motor Commands
Deliberative: Build a complete world model, plan a complete action sequence, execute. Works well for structured environments. Too slow for dynamic environments.
Subsumption Architecture (Brooks)
Layered reactive behaviors with priority (higher layers subsume lower).
Layer 3: Explore (wander when idle)
Layer 2: Avoid obstacles (override explore)
Layer 1: Emergency stop (override all)
No world model. Direct sensor-to-action mappings. Fast, robust, handles dynamic environments. Limited planning ability.
Hybrid Architecture
Three-layer: Reactive (fast, reflexive) + Executive (sequences behaviors) + Deliberative (plans, learns).
Most modern robots use hybrid architectures.
Kinematics
Forward Kinematics
Given joint angles θ₁, θ₂, ..., θₙ → compute end-effector position (x, y, z) and orientation.
DH parameters (Denavit-Hartenberg): Standard parameterization of robot links. Each joint adds a 4×4 transformation matrix. Forward kinematics = product of all link matrices.
Inverse Kinematics
Given desired end-effector pose → compute joint angles.
Analytical solutions: For simple robots (6-DOF with specific geometries). Closed-form formulas.
Numerical solutions: Jacobian-based. Iterate: Δθ = J⁻¹Δx (or J† for redundant robots).
CCD (Cyclic Coordinate Descent): Iteratively adjust one joint at a time to reduce error. Simple, fast, popular in animation.
FABRIK (Forward and Backward Reaching Inverse Kinematics): Iteratively adjust joint positions. Fast convergence, intuitive.
Path Planning
Configuration Space (C-space)
The space of all possible robot configurations (joint angles or positions). Obstacles in the workspace map to forbidden regions in C-space.
Problem: Find a path in C-space from start configuration to goal configuration, avoiding obstacle regions.
Grid/Cell Decomposition
Discretize C-space into a grid. Use graph search (A*, Dijkstra) on the grid.
Trapezoidal decomposition: Decompose free space into trapezoids. Build a connectivity graph. Search for a path.
Visibility Graph
Connect the start, goal, and all obstacle vertices with lines that don't intersect obstacles. Search for shortest path in this graph.
Probabilistic Roadmap (PRM)
- Sample random configurations in free space.
- Connect nearby configurations with edges (if the path between them is collision-free).
- Search the resulting graph for a path.
Multi-query: Build the roadmap once, answer many queries. Good for static environments.
Rapidly-Exploring Random Tree (RRT)
- Start with a tree rooted at the start configuration.
- Sample a random configuration q_rand.
- Find the nearest node q_near in the tree.
- Extend from q_near toward q_rand by a step size → q_new.
- If q_new is collision-free, add it to the tree.
- Repeat until the tree reaches the goal.
RRT*: Rewires the tree to find shorter paths. Converges to optimal path (given enough time).
RRT-Connect: Grow two trees (one from start, one from goal). Try to connect them. Faster for narrow passages.
Properties: Probabilistically complete (will find a path if one exists, given enough time). Works in high-dimensional C-spaces. No discretization. The dominant algorithm for motion planning.
Potential Fields
Define an attractive potential pulling toward the goal and repulsive potentials pushing away from obstacles. Follow the negative gradient.
Advantage: Simple, real-time computation. Disadvantage: Local minima (robot gets stuck). Combine with global planners.
SLAM (Simultaneous Localization and Mapping)
Build a map of an unknown environment while localizing the robot within it.
Chicken-and-egg: Need a map to localize. Need localization to build a map. Solve both simultaneously.
Approaches
EKF-SLAM: Extended Kalman Filter. State = robot pose + all landmark positions. Quadratic in the number of landmarks (covariance matrix). Limited scalability.
Particle Filter SLAM (FastSLAM): Particles represent possible robot trajectories. Each particle has its own map. Scalable to large environments.
Graph-based SLAM: Poses as nodes, constraints as edges. Optimize the graph (least squares) to find consistent poses and map. Used by ORB-SLAM, Cartographer, LOAM.
Visual SLAM: Use camera images for localization. Feature-based (ORB-SLAM) or direct (LSD-SLAM, DSO). Monocular, stereo, or RGB-D.
LiDAR SLAM: Use laser scans. More robust than visual in textureless environments. Point cloud registration (ICP). LOAM, LeGO-LOAM.
Computer Vision Basics
Feature Detection
Corners: Harris corner detector, FAST.
Blobs: SIFT (scale-invariant), SURF, ORB (fast binary descriptor).
Feature matching: Compare descriptors between images. FLANN for fast approximate matching. RANSAC for robust estimation (reject outliers).
Object Recognition
Classical: HOG + SVM, Haar cascades (Viola-Jones face detection).
Deep learning: CNNs (ResNet, EfficientNet) for classification. YOLO, SSD for real-time detection. Mask R-CNN for instance segmentation.
Image Segmentation
Classical: Thresholding, watershed, graph cuts.
Deep learning: U-Net, DeepLab, Segment Anything Model (SAM).
Sensor Fusion
Combine data from multiple sensors (cameras, LiDAR, IMU, GPS) for more robust perception.
Kalman filter: Optimal for linear Gaussian systems. Fuse IMU + GPS.
Particle filter: For nonlinear, non-Gaussian systems. More general but computationally expensive.
Factor graphs: Unified framework. Each sensor measurement adds a factor. Optimize the joint distribution over all variables.
Control Theory Basics
PID Control
u(t) = Kp·e(t) + Ki·∫e(t)dt + Kd·de/dt
P (Proportional): Respond proportionally to error. Faster response but may overshoot.
I (Integral): Eliminate steady-state error. Accumulates past errors.
D (Derivative): Dampen oscillations. Responds to rate of change.
Tuning: Ziegler-Nichols, manual tuning, auto-tuning. The most widely used controller in industry.
Applications in CS
- Autonomous vehicles: Perception (cameras, LiDAR) + planning (motion planning, route planning) + control (steering, braking).
- Warehouse robots: Amazon's Kiva robots. Path planning in structured environments with many agents.
- Surgical robots: Da Vinci system. Precision control with force feedback.
- Drones: SLAM for autonomous navigation. Path planning for delivery, inspection, mapping.
- AR/VR: Visual-inertial SLAM for headset tracking. Real-time pose estimation.