
Input Modalities

Keyboard and Mouse

The oldest and best-studied digital input modalities, optimized for precise, high-throughput interaction at a desk.

Keyboard

  • QWERTY layout: Designed in 1873, now a de facto standard despite alternatives (Dvorak, Colemak)
  • Throughput: roughly 40 WPM for average users, 70-100 WPM for proficient typists, 150+ WPM for the fastest professionals
  • Keyboard shortcuts: Critical for power users. Reduce mouse travel and speed up repetitive tasks

Shortcut design principles:

  • Mnemonics: Ctrl+B for Bold, Ctrl+S for Save
  • Consistency: Same shortcut across applications for the same action
  • Discoverability: Show shortcuts in menus and tooltips
  • Modifier hierarchy: Ctrl for commands, Shift for modification, Alt for alternatives

Mouse / Trackpad

  • Direct manipulation: Point, click, drag --- maps to physical action metaphors
  • Precision: Sub-pixel accuracy, suitable for design tools and text editing
  • Right-click context menus: Secondary actions scoped to the target element
  • Scroll: Vertical primary, horizontal secondary. Scroll direction conventions differ by platform

Cursor states communicate affordance:

| Cursor | Meaning |
|--------|---------|
| Arrow | Default, no special interaction |
| Pointer (hand) | Clickable link or button |
| I-beam | Text is selectable/editable |
| Crosshair | Precision selection |
| Grab / Grabbing | Draggable element |
| Not-allowed | Action is disabled |
| Wait / Progress | System is busy |
| Resize arrows | Edge or corner can be dragged to resize |


Touch Gestures

Touch is the dominant input for mobile devices. Interaction is direct (finger on content) rather than indirect (mouse moves cursor).

Standard Touch Gestures

| Gesture | Action | Example |
|---------|--------|---------|
| Tap | Select, activate | Open an app |
| Double tap | Zoom, select word | Zoom into a photo |
| Long press | Secondary action, context menu | Select an item in a list |
| Swipe | Scroll, navigate, dismiss | Scroll a list, go back |
| Pinch / Spread | Zoom out / Zoom in | Map zoom |
| Rotate | Rotate object | Rotate a photo |
| Drag | Move object | Reorder a list |
| Edge swipe | System navigation | iOS swipe from left edge = back |

Touch Design Constraints

  • Fat finger problem: A human fingertip is roughly 8-10mm wide; 10mm covers about 63 pixels at 160 DPI. Minimum target: 44x44pt (Apple), 48x48dp (Google).
  • Occlusion: Finger covers the target during touch. Consider placement of tooltips and feedback above/beside the touch point.
  • No hover state: Touch interfaces lack hover. Information revealed on hover must be accessible through tap or visible by default.
  • Accidental touch: Require deliberate gestures for destructive actions. Distinguish tap from scroll start.
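The fat-finger arithmetic behind those platform minimums is just a density conversion. A minimal sketch (the 163 DPI note refers to early iPhone displays; all constants are only illustrative):

```python
MM_PER_INCH = 25.4

def mm_to_px(mm: float, dpi: float) -> float:
    """Convert a physical size to pixels at a given screen density."""
    return mm / MM_PER_INCH * dpi

def px_to_mm(px: float, dpi: float) -> float:
    """Convert a pixel count back to a physical size."""
    return px / dpi * MM_PER_INCH

# A ~10 mm fingertip at 160 DPI covers about 63 px:
fingertip_px = mm_to_px(10, 160)

# Google's 48 dp minimum (1 dp = 1 px at 160 DPI) is 7.62 mm of glass;
# Apple's 44 pt at the original 163 DPI lands in the same ballpark.
android_target_mm = px_to_mm(48, 160)
```

Because dp and pt are density-independent units, both minimums stay near 7-9 mm of physical glass regardless of screen resolution, which is the point: the constraint is the finger, not the pixels.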

Gesture Design Guidelines

  1. Use standard platform gestures --- do not reinvent swipe or pinch
  2. Gestures must be discoverable (not the only way to perform an action)
  3. Provide visual feedback during gesture (element follows finger)
  4. Support gesture cancellation (drag back to original position)
  5. Avoid gestures requiring more than two fingers (motor difficulty, discoverability)
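Distinguishing a tap from a scroll start (constraint 4 above) usually comes down to movement and timing thresholds. A minimal single-finger classifier, with illustrative threshold values (real platforms tune these per device density):

```python
import math

TAP_SLOP_PX = 10        # max net movement for a tap
LONG_PRESS_MS = 500     # min duration for a long press
SWIPE_MIN_PX = 50       # min travel to count as a swipe

def classify_gesture(dx: float, dy: float, duration_ms: float) -> str:
    """Classify a completed single-finger gesture from net movement and duration."""
    distance = math.hypot(dx, dy)
    if distance <= TAP_SLOP_PX:
        # Stationary touch: duration separates tap from long press.
        return "long_press" if duration_ms >= LONG_PRESS_MS else "tap"
    if distance >= SWIPE_MIN_PX:
        # Dominant axis decides the swipe direction.
        if abs(dx) >= abs(dy):
            return "swipe_right" if dx > 0 else "swipe_left"
        return "swipe_down" if dy > 0 else "swipe_up"
    return "drag"
```

The "slop" radius is what prevents the accidental-touch problem: a finger that wobbles a few pixels while tapping should still register as a tap, not the start of a scroll.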

Pen and Stylus

Pen input combines the precision of a mouse with the naturalness of handwriting.

Advantages over touch:

  • Finer precision (1-2mm tip vs 10mm finger)
  • Pressure sensitivity (line weight variation)
  • Palm rejection (rest hand on screen while writing)
  • Hover detection (cursor feedback before contact)
  • Tilt detection (shading in drawing apps)

Applications:

  • Digital note-taking (handwriting recognition, annotation)
  • Illustration and design (pressure-sensitive brushes)
  • Signature capture (legal documents)
  • Precision selection in data-dense interfaces

Design considerations:

  • Support both pen and touch simultaneously
  • Map pen buttons to contextual actions (eraser, right-click)
  • Provide ink smoothing and latency compensation (< 20ms ideal)
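Ink smoothing at its simplest is an exponential filter: each reported pen point is blended with the previous smoothed point. This is only a sketch; production systems use more elaborate filters (e.g. the 1€ filter) plus stroke prediction to hide latency:

```python
def smooth_stroke(points, alpha=0.5):
    """Exponentially smooth a pen stroke.

    points: list of (x, y) samples in input order.
    alpha:  in (0, 1]; higher = less smoothing, lower = more lag.
    """
    if not points:
        return []
    smoothed = [points[0]]
    for x, y in points[1:]:
        px, py = smoothed[-1]
        # Move a fraction alpha of the way toward the new sample.
        smoothed.append((px + alpha * (x - px), py + alpha * (y - py)))
    return smoothed
```

The trade-off is direct: more smoothing removes jitter but adds perceived lag, which is why the < 20ms latency budget matters; filtering eats into it.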

Voice Interaction (VUI Design)

Voice user interfaces enable hands-free, eyes-free interaction through speech.

Voice Interaction Model

User speaks utterance
        |
        v
Automatic Speech Recognition (ASR) -> text
        |
        v
Natural Language Understanding (NLU) -> intent + entities
        |
        v
Dialog Management -> decide response
        |
        v
Natural Language Generation (NLG) -> response text
        |
        v
Text-to-Speech (TTS) -> audio output
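The pipeline above can be sketched end to end as a chain of functions. ASR and TTS are stubbed as identity functions and NLU is a single keyword rule; every name here is illustrative, not a real framework:

```python
import re

def asr(audio: str) -> str:
    """Automatic Speech Recognition (stubbed: audio already arrives as text)."""
    return audio

def nlu(text: str) -> dict:
    """Natural Language Understanding: extract intent + entities by keyword rule."""
    m = re.search(r"set a timer for (\d+) minutes?", text)
    if m:
        return {"intent": "set_timer", "minutes": int(m.group(1))}
    return {"intent": "unknown"}

def dialog(intent: dict) -> str:
    """Dialog Management + NLG: decide and phrase the response."""
    if intent["intent"] == "set_timer":
        # Echo back the key slot value to confirm understanding.
        return f"Setting a timer for {intent['minutes']} minutes."
    return "Sorry, I didn't catch that."

def tts(text: str) -> str:
    """Text-to-Speech (stubbed: returns the response text verbatim)."""
    return text

def handle_utterance(audio: str) -> str:
    return tts(dialog(nlu(asr(audio))))
```

Even in this toy form the stage boundaries matter: the dialog layer never sees raw text, only intent + entities, which is what lets ASR and NLU be swapped independently.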

VUI Design Principles

  1. Set expectations: Tell users what they can say. "You can say 'play music,' 'set a timer,' or 'what's the weather?'"
  2. Confirm understanding: Echo back key information. "Setting a timer for 10 minutes."
  3. Handle errors gracefully: Mishearing is inevitable. Offer reprompts with guidance.
  4. Keep responses brief: Audio is sequential; users cannot scan. 1-2 sentences per response.
  5. Provide escape hatches: "Stop," "Cancel," "Go back" should always work.
  6. Progressive disclosure: Start with a summary, offer details on request.
  7. Persona: Consistent voice character (tone, vocabulary, personality).

VUI Error Handling

Levels of error recovery:

1st failure: Simple reprompt
   "Sorry, I didn't catch that. What city?"

2nd failure: Reprompt with guidance
   "I didn't understand. You can say a city name, like 'San Francisco.'"

3rd failure: Offer alternative modality
   "I'm having trouble understanding. Would you like to type your answer?"
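The escalation ladder above reduces to a lookup keyed by the consecutive-failure count, falling through to the modality switch. Wording mirrors the examples in the text:

```python
REPROMPTS = {
    1: "Sorry, I didn't catch that. What city?",
    2: "I didn't understand. You can say a city name, like 'San Francisco.'",
}
FALLBACK = "I'm having trouble understanding. Would you like to type your answer?"

def reprompt(failure_count: int) -> str:
    """Return the recovery prompt for the nth consecutive recognition failure."""
    return REPROMPTS.get(failure_count, FALLBACK)
```

The counter should reset on any successful recognition; escalating on lifetime failures rather than consecutive ones would punish users for earlier mishearings.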

Voice vs. Screen: When to Use Voice

| Voice Works Well | Voice Works Poorly |
|-----------------|-------------------|
| Hands/eyes busy (driving, cooking) | Browsing or comparing options |
| Short, specific queries | Complex data entry |
| Device control (lights, music) | Private/sensitive information |
| Accessibility (vision, motor) | Noisy environments |


Eye Tracking

Eye tracking measures gaze position, fixation duration, and saccade patterns.

Research Applications

  • Heatmaps: Visualize where users look most frequently
  • Gaze plots: Show the sequence and duration of fixations
  • Areas of interest (AOI): Measure attention on specific regions
  • Time to first fixation: How quickly users notice an element
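Two of these metrics can be computed directly from a fixation log. A minimal sketch, where a fixation is a `(x, y, start_ms, duration_ms)` tuple and an AOI is an axis-aligned rectangle (the data here is invented for illustration):

```python
def in_aoi(fix, aoi):
    """True if the fixation point falls inside the AOI rectangle."""
    x, y, _, _ = fix
    left, top, right, bottom = aoi
    return left <= x <= right and top <= y <= bottom

def dwell_time(fixations, aoi):
    """Total fixation duration inside the AOI, in ms."""
    return sum(f[3] for f in fixations if in_aoi(f, aoi))

def time_to_first_fixation(fixations, aoi):
    """Start time of the earliest fixation inside the AOI, or None if never fixated."""
    hits = [f[2] for f in fixations if in_aoi(f, aoi)]
    return min(hits) if hits else None

fixations = [
    (50, 50, 0, 200),     # outside the AOI
    (120, 80, 250, 300),  # inside
    (130, 90, 600, 150),  # inside
]
AOI = (100, 60, 200, 120)  # left, top, right, bottom
```

Heatmaps and gaze plots are aggregations of the same log: a heatmap bins fixation durations over space, a gaze plot orders them by `start_ms`.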

Interaction Applications

  • Gaze-based selection: Dwell time on a target triggers selection (accessibility)
  • Foveated rendering: Render highest detail only where the user is looking (VR optimization)
  • Attention-aware interfaces: Adapt content based on whether the user is looking
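Gaze-based selection usually works by accumulating continuous dwell time on one target. A sketch of that state machine (the 800 ms default and all names are illustrative, not from a specific product):

```python
DWELL_THRESHOLD_MS = 800  # a common accessibility default; tunable per user

class DwellSelector:
    def __init__(self, threshold_ms: float = DWELL_THRESHOLD_MS):
        self.threshold = threshold_ms
        self.current_target = None
        self.dwell_ms = 0.0

    def update(self, target, dt_ms: float):
        """Feed the currently gazed-at target (or None) once per frame.

        Returns the selected target when the dwell threshold is crossed,
        otherwise None.
        """
        if target != self.current_target:
            # Gaze moved to a new target (or away): restart the timer.
            self.current_target = target
            self.dwell_ms = 0.0
            return None
        if target is None:
            return None
        self.dwell_ms += dt_ms
        if self.dwell_ms >= self.threshold:
            self.dwell_ms = 0.0  # reset so the target doesn't repeat-fire
            return target
        return None
```

The threshold is the designer's lever on the "Midas touch" problem: too short and everything the user looks at gets activated, too long and selection feels sluggish.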

Design Insights from Eye Tracking Research

  • Users follow F-patterns on text-heavy pages
  • Banner areas are systematically ignored (banner blindness)
  • Users look at faces in images, then follow the face's gaze direction
  • Left-side navigation receives more visual attention in LTR cultures

Motion and Gesture

Body and hand movements as input, captured by cameras or sensors.

Technologies

| Technology | Sensing Method | Example |
|------------|---------------|---------|
| Depth cameras | Structured light / ToF | Microsoft Kinect, Azure Kinect |
| Hand tracking | Computer vision | Meta Quest hand tracking, Leap Motion |
| IMU sensors | Accelerometer + gyroscope | Wii Remote, phone gestures |
| Skeletal tracking | Full body pose estimation | Kinect, MediaPipe Pose |

Gesture Design Considerations

  • Fatigue: "Gorilla arm" --- extended arm gestures cause rapid fatigue. Prefer subtle motions.
  • Social acceptability: Users resist conspicuous gestures in public.
  • Precision: Mid-air gestures are less precise than touch. Use larger targets and tolerance zones.
  • Feedback: Without physical contact, haptic feedback is absent. Visual and audio feedback are essential.
  • Discoverability: Non-obvious gestures need explicit teaching or onboarding.

Brain-Computer Interfaces (BCI)

BCIs translate neural signals into computer commands.

Types

| Type | Method | Invasiveness | Signal Quality |
|------|--------|-------------|---------------|
| EEG | Scalp electrodes | Non-invasive | Low (noisy) |
| ECoG | Cortical surface electrodes | Surgical | Medium |
| Intracortical | Implanted electrode arrays | Highly invasive | High |
| fNIRS | Near-infrared spectroscopy | Non-invasive | Low-medium |

Current Applications

  • Assistive communication: Patients with ALS or locked-in syndrome can select letters/words
  • Prosthetic control: Neural signals drive robotic limbs
  • Neurofeedback: Training attention or relaxation through real-time brain activity display
  • P300 speller: User focuses on target letter in a flashing grid; EEG detects the P300 response

Limitations

  • Low information transfer rate (~5-25 bits/minute for non-invasive)
  • Extensive calibration required per user
  • Signal noise from muscle movement, electrode drift
  • Ethical considerations around neural data privacy and autonomy
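The ~5-25 bits/minute figure can be reproduced with the standard Wolpaw information-transfer-rate formula, which combines the number of selectable targets, the classification accuracy, and the time per selection. The example parameters below are plausible P300-speller values, not measurements:

```python
import math

def itr_bits_per_trial(n_targets: int, accuracy: float) -> float:
    """Wolpaw information transfer rate per selection, in bits/trial."""
    n, p = n_targets, accuracy
    if p >= 1.0:
        return math.log2(n)           # perfect accuracy: full log2(N) bits
    if p <= 1.0 / n:
        return 0.0                    # at or below chance: no information
    return (math.log2(n) + p * math.log2(p)
            + (1 - p) * math.log2((1 - p) / (n - 1)))

def itr_bits_per_minute(n_targets, accuracy, trial_seconds):
    return itr_bits_per_trial(n_targets, accuracy) * 60 / trial_seconds

# A P300 speller on a 36-character grid at 80% accuracy, ~10 s per selection:
rate = itr_bits_per_minute(36, 0.80, 10.0)  # ~20.5 bits/minute
```

Note how the formula penalizes errors twice: a wrong selection both wastes a trial and (in practice) costs a correction trial, which is why accuracy improvements matter more than raw speed.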

Haptic Feedback

Haptic feedback uses touch sensations (vibration, force, texture) to communicate information.

Types of Haptic Feedback

| Type | Mechanism | Example |
|------|-----------|---------|
| Vibrotactile | Vibration motors (LRA, ERM) | Phone vibration on keypress |
| Force feedback | Motors resisting movement | Gaming steering wheel resistance |
| Surface haptics | Electrostatic friction on screen | Texture simulation on touchscreen |
| Ultrasonic | Focused ultrasound in air | Mid-air haptic feedback (Ultraleap) |

Design Applications

  • Confirmation: Short vibration pulse on successful action
  • Warning: Distinct vibration pattern for errors or alerts
  • Texture: Simulating material properties on touchscreens
  • Navigation: Directional vibration for wayfinding (wristband buzz patterns)
  • Accessibility: Haptic output for deaf or deaf-blind users
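Distinct vibrotactile patterns like those above are typically encoded as alternating on/off durations in milliseconds, the same convention the Web Vibration API's `navigator.vibrate` uses. The pattern values here are illustrative, not platform defaults:

```python
# Alternating on/off durations in ms: [on, off, on, off, on, ...]
PATTERNS = {
    "confirm": [30],                  # one short pulse: success
    "warning": [80, 60, 80, 60, 80],  # three longer pulses: error/alert
    "nav_turn": [40, 40, 40],         # double buzz: upcoming turn cue
}

def pattern_duration_ms(pattern):
    """Total wall-clock time to play a pattern, including the gaps."""
    return sum(pattern)
```

Keeping patterns short and rhythmically distinct is what makes them distinguishable without vision, which is the whole accessibility case for haptics.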

Multimodal Interaction

Combining multiple input and output modalities for richer, more robust interaction.

Input Fusion Strategies

| Strategy | Description | Example |
|----------|-------------|---------|
| Redundant | Same command via multiple modalities | Tap + voice "delete" |
| Complementary | Different modalities provide different parts | Point at map + say "navigate here" |
| Sequential | Modalities used in sequence | Wake word (voice) then touch selection |
| Simultaneous | Modalities used at the same time | Gesture + speech (Bolt: "put that there") |

Design Principles for Multimodal Systems

  1. Modality equivalence: Critical functions should not depend on a single modality
  2. Graceful degradation: System works when one modality is unavailable
  3. User choice: Let users choose their preferred modality
  4. Mutual disambiguation: If one modality is ambiguous, the other clarifies
  5. Minimize cognitive load: Do not force users to manage multiple modalities simultaneously unless it is natural

CASA Paradigm

Computers Are Social Actors (Nass & Reeves): Users unconsciously apply social rules to computers. Multimodal systems that use voice, gesture, and facial expression are perceived more socially, raising expectations for human-like behavior. Design must manage these expectations to avoid the uncanny valley of interaction.