
Input Modalities

Keyboard and Mouse

The oldest and best-studied digital input modalities, optimized for precise, high-throughput interaction at a desk.

Keyboard

  • QWERTY layout: Designed in 1873 and still the de facto standard despite alternatives such as Dvorak and Colemak
  • Typing throughput: roughly 40-60 WPM for typical users; skilled typists reach 100+ WPM, and elite typists exceed 150 WPM
  • Keyboard shortcuts: Critical for power users; they reduce mouse travel and speed up repetitive tasks

Shortcut design principles:

  • Mnemonics: Ctrl+B for Bold, Ctrl+S for Save
  • Consistency: Same shortcut across applications for the same action
  • Discoverability: Show shortcuts in menus and tooltips
  • Modifier hierarchy: Ctrl (Cmd on macOS) for commands, Shift to modify a command, Alt (Option on macOS) for alternatives
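The consistency and discoverability principles can be made concrete with a small shortcut registry. This is a minimal sketch, not a real framework API: the `Shortcut` class, `find_conflicts`, and the platform names are hypothetical. Each action is bound once, rendered with the platform's primary modifier for menus and tooltips, and checked for clashes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Primary command modifier per platform (Cmd on macOS, Ctrl elsewhere).
PRIMARY_MODIFIER: Dict[str, str] = {"mac": "Cmd", "windows": "Ctrl", "linux": "Ctrl"}

@dataclass(frozen=True)
class Shortcut:
    """Hypothetical binding: one action, one key, optional Shift variant."""
    action: str
    key: str
    shift: bool = False  # Shift modifies a base command (e.g. Save -> Save As)

    def label(self, platform: str) -> str:
        """Text shown next to the menu item or in a tooltip (discoverability)."""
        parts = [PRIMARY_MODIFIER[platform]]
        if self.shift:
            parts.append("Shift")
        parts.append(self.key.upper())
        return "+".join(parts)

def find_conflicts(shortcuts: List[Shortcut]) -> List[Tuple[str, str]]:
    """Return (first_action, clashing_action) pairs that share a key combination."""
    seen: Dict[Tuple[str, bool], str] = {}
    conflicts: List[Tuple[str, str]] = []
    for s in shortcuts:
        combo = (s.key.upper(), s.shift)
        if combo in seen:
            conflicts.append((seen[combo], s.action))
        else:
            seen[combo] = s.action
    return conflicts

bindings = [
    Shortcut("Bold", "b"),
    Shortcut("Save", "s"),
    Shortcut("Save As", "s", shift=True),
    Shortcut("Strikethrough", "s"),  # deliberately clashes with Save
]

print(bindings[2].label("mac"))   # Cmd+Shift+S
print(find_conflicts(bindings))   # [('Save', 'Strikethrough')]
```

Binding each action once and deriving every label from it keeps menus, tooltips, and documentation in agreement across platforms.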

Mouse / Trackpad

  • Direct manipulation: Pointing, clicking, and dragging map onto physical action metaphors
  • Precision: Pixel-level accuracy, suitable for design tools and text editing
  • Right-click context menus: Secondary actions scoped to the target element
  • Scroll: Vertical scrolling is primary, horizontal secondary. Scroll direction conventions differ by platform (e.g., macOS defaults to "natural" scrolling, where content follows the fingers on a trackpad)

Cursor states communicate affordance:

| Cursor | Meaning |
|---|---|
| Arrow | Default, no special interaction |
| Pointer (hand) | Clickable link or button |
| I-beam | Text is selectable/editable |
| Crosshair | Precision selection |
| Grab / Grabbing | Draggable element |
| Not-allowed | Action is disabled |
| Wait / Progress | System is busy |
| Resize arrows | Edge or corner can be dragged to resize |
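In web interfaces these states correspond to standard CSS `cursor` keywords. The sketch below assumes hypothetical state and edge names (`"link"`, `"draggable"`, `"top-left"`, and so on); the keyword values themselves are standard CSS. Note that CSS distinguishes `wait` (the application is blocked) from `progress` (busy, but the user can still interact).

```python
from typing import Optional

# Hypothetical element states mapped to standard CSS cursor keywords.
CURSOR_FOR_STATE = {
    "idle": "default",         # arrow
    "link": "pointer",         # hand
    "text": "text",            # I-beam
    "precision": "crosshair",
    "draggable": "grab",
    "dragging": "grabbing",
    "disabled": "not-allowed",
    "busy": "wait",            # blocked: user cannot interact
    "working": "progress",     # busy in the background: user can still interact
}

# Resize cursors depend on which edge or corner is under the pointer.
RESIZE_CURSOR = {
    "left": "ew-resize", "right": "ew-resize",
    "top": "ns-resize", "bottom": "ns-resize",
    "top-left": "nwse-resize", "bottom-right": "nwse-resize",
    "top-right": "nesw-resize", "bottom-left": "nesw-resize",
}

def cursor_for(state: str, edge: Optional[str] = None) -> str:
    """Pick the CSS cursor for an element; a hovered resize edge takes priority."""
    if edge is not None:
        return RESIZE_CURSOR[edge]
    return CURSOR_FOR_STATE.get(state, "default")
```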

Touch Gestures

Touch is the dominant input for mobile devices. Interaction is direct (finger on content) rather than indirect (mouse moves cursor).

Standard Touch Gestures

| Gesture | Action | Example |
|---|---|---|
| Tap | Select, activate | Open an app |
| Double tap | Zoom, select word | Zoom into a photo |
| Long press | Secondary action, context menu | Select an item in a list |
| Swipe | Scroll, navigate, dismiss | Scroll a list, go back |
| Pinch / Spread | Zoom out / Zoom in | Map zoom |
| Rotate | Rotate object | Rotate a photo |
| Drag | Move object | Reorder a list |
| Edge swipe | System navigation | iOS swipe from left edge = back |
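Most single-finger gestures in this table can be told apart from just the duration and displacement of a touch. A minimal sketch, assuming illustrative thresholds (`TOUCH_SLOP_PX`, `LONG_PRESS_MS`, `DOUBLE_TAP_GAP_MS`) rather than any platform's actual values: movement beyond the slop radius becomes a swipe, a long stationary hold becomes a long press, and a quick second tap becomes a double tap. Screen coordinates are assumed, with y increasing downward.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Illustrative thresholds (assumptions, not platform values).
TOUCH_SLOP_PX = 16       # movement below this still counts as "stationary"
LONG_PRESS_MS = 500      # stationary hold at least this long = long press
DOUBLE_TAP_GAP_MS = 300  # max gap between a tap's release and the next touch

@dataclass
class Touch:
    down_ms: float
    up_ms: float
    x0: float
    y0: float
    x1: float
    y1: float

def classify(touch: Touch, prev_tap_up_ms: Optional[float] = None) -> str:
    dx, dy = touch.x1 - touch.x0, touch.y1 - touch.y0
    if math.hypot(dx, dy) > TOUCH_SLOP_PX:
        # Moved beyond the slop radius: a swipe in the dominant direction.
        if abs(dx) >= abs(dy):
            return "swipe-right" if dx > 0 else "swipe-left"
        return "swipe-down" if dy > 0 else "swipe-up"  # y grows downward
    if touch.up_ms - touch.down_ms >= LONG_PRESS_MS:
        return "long-press"
    if prev_tap_up_ms is not None and touch.down_ms - prev_tap_up_ms <= DOUBLE_TAP_GAP_MS:
        return "double-tap"
    return "tap"
```

A production recognizer also has to delay confirming a single tap until the double-tap window has passed, and would track multi-finger gestures (pinch, rotate) separately.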

Touch Design Constraints

  • Fat finger problem: A human fingertip is ~10mm wide, covering ~63 pixels at 160 DPI. Minimum target: 44x44pt (Apple), 48x48dp (Google).
  • Occlusion: Finger covers the target during touch. Consider placement of tooltips and feedback above/beside the touch point.
  • No hover state: Touch interfaces lack hover. Information revealed on hover must be accessible through tap or visible by default.
  • Accidental touch: Require deliberate gestures for destructive actions. Distinguish tap from scroll start.
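The fat-finger figures follow from simple unit conversion: pixels = millimetres / 25.4 × DPI, and on Android one dp equals one pixel at the 160 dpi baseline. A minimal sketch; the function names are hypothetical.

```python
MM_PER_INCH = 25.4
ANDROID_BASELINE_DPI = 160  # 1 dp == 1 px at Android's 160 dpi baseline

def mm_to_px(mm: float, dpi: float) -> float:
    """Physical size in millimetres -> pixels on a screen of the given density."""
    return mm / MM_PER_INCH * dpi

def dp_to_px(dp: float, dpi: float) -> float:
    """Android density-independent pixels -> physical pixels."""
    return dp * dpi / ANDROID_BASELINE_DPI

def dp_to_mm(dp: float) -> float:
    """Nominal physical size of a dp length (independent of screen density)."""
    return dp / ANDROID_BASELINE_DPI * MM_PER_INCH

def meets_touch_minimum(width_dp: float, height_dp: float, minimum_dp: float = 48) -> bool:
    """Check a target against the 48x48dp guideline."""
    return width_dp >= minimum_dp and height_dp >= minimum_dp
```

For example, a 48dp target is about 7.6 mm wide on any screen, still smaller than the ~10 mm fingertip, while on a 480 dpi display it spans 144 physical pixels.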

Gesture Design Guidelines

  1. Use standard platform gestures; do not reinvent swipe or pinch
  2. Do not rely on gestures alone: they are hard to discover, so every gesture needs a visible alternative (a button or menu item)
  3. Provide visual feedback during the gesture (the element follows the finger)
  4. Support gesture cancellation (dragging back to the original position aborts the action)
  5. Avoid gestures requiring more than two fingers (they are harder to perform and harder to discover)
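Guidelines 3 and 4 combine naturally in a swipe-to-dismiss interaction: the row tracks the finger during the drag, and the action commits only if the finger is released past a threshold. A minimal sketch, assuming a hypothetical 40% commit threshold; dragging back below it before lifting the finger cancels the gesture and snaps the row home.

```python
from dataclasses import dataclass
from typing import Optional

DISMISS_FRACTION = 0.4  # assumed: commit once dragged past 40% of the row width

@dataclass
class SwipeToDismiss:
    row_width: float
    offset: float = 0.0             # current horizontal offset drawn on screen
    start_x: Optional[float] = None

    def on_down(self, x: float) -> None:
        self.start_x = x
        self.offset = 0.0

    def on_move(self, x: float) -> None:
        if self.start_x is None:
            return
        # Visual feedback: the row follows the finger as it moves.
        self.offset = x - self.start_x

    def on_up(self) -> str:
        committed = abs(self.offset) >= DISMISS_FRACTION * self.row_width
        if not committed:
            # Released below the threshold: cancel and snap back to the origin.
            self.offset = 0.0
        self.start_x = None
        return "dismissed" if committed else "cancelled"
```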

Pen and Stylus

Pen input combines the precision of a mouse with the naturalness of handwriting.

Advantages over touch:

  • Finer precision (1-2mm tip vs 10mm finger)
  • Pressure sensitivity (line weight variation)
  • Palm rejection (rest hand on screen while writing)
  • Hover detection (cursor feedback before contact)
  • Tilt detection (shading in drawing apps)

Applications:

  • Digital note-taking (handwriting recognition, annotation)
  • Illustration and design (pressure-sensitive brushes)
  • Signature capture (legal documents)
  • Precision selection in data-dense interfaces

Design considerations:

  • Support both pen and touch simultaneously
  • Map pen buttons to contextual actions (eraser, right-click)
  • Provide ink smoothing and latency compensation (< 20ms ideal)

Voice Interaction (VUI Design)

Voice user interfaces enable hands-free, eyes-free interaction through speech.

Voice Interaction Model

User speaks utterance
        |
        v
Automatic Speech Recognition (ASR) -> text
        |
        v
Natural Language Understanding (NLU) -> intent + entities
        |
        v
Dialog Management -> decide response
        |
        v
Natural Language Generation (NLG) -> response text
        |
        v
Text-to-Speech (TTS) -> audio output
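The pipeline can be traced end to end with toy stand-ins for each stage. This is a sketch, not a production architecture: ASR and TTS are stubbed as text pass-throughs, NLU is a pair of regular expressions, and every function and intent name is hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

def asr(audio: str) -> str:
    # Stub: pretend the "audio" is already a transcript; just normalize it.
    return audio.lower().strip()

INTENT_PATTERNS = {
    "set_timer": re.compile(r"set a timer for (?P<minutes>\d+) minutes?"),
    "play_music": re.compile(r"^play (?P<artist>.+)$"),
}

def nlu(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (intent, entities); (None, {}) when nothing matches."""
    for intent, pattern in INTENT_PATTERNS.items():
        m = pattern.search(text)
        if m:
            return intent, m.groupdict()
    return None, {}

def dialog_manager(intent: Optional[str], entities: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    # Decide the response act: confirm a recognized intent, otherwise reprompt.
    if intent is None:
        return "reprompt", {}
    return f"confirm_{intent}", entities

NLG_TEMPLATES = {
    "confirm_set_timer": "Setting a timer for {minutes} minutes.",
    "confirm_play_music": "Playing {artist}.",
    "reprompt": "Sorry, I didn't catch that.",
}

def nlg(act: str, slots: Dict[str, str]) -> str:
    return NLG_TEMPLATES[act].format(**slots)

def tts(text: str) -> str:
    return text  # stub: a real TTS engine would return audio

def handle(utterance: str) -> str:
    intent, entities = nlu(asr(utterance))
    act, slots = dialog_manager(intent, entities)
    return tts(nlg(act, slots))
```

For example, `handle("Set a timer for 10 minutes")` returns the confirmation "Setting a timer for 10 minutes.", echoing back the key information as the design principles below recommend.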

VUI Design Principles

  1. Set expectations: Tell users what they can say. "You can say 'play music,' 'set a timer,' or 'what's the weather?'"
  2. Confirm understanding: Echo back key information. "Setting a timer for 10 minutes."
  3. Handle errors gracefully: Mishearing is inevitable. Offer reprompts with guidance.
  4. Keep responses brief: Audio is sequential; users cannot scan. 1-2 sentences per response.
  5. Provide escape hatches: "Stop," "Cancel," "Go back" should always work.
  6. Progressive disclosure: Start with a summary, offer details on request.
  7. Persona: Consistent voice character (tone, vocabulary, personality).

VUI Error Handling

Levels of error recovery:

1st failure: Simple reprompt
   "Sorry, I didn't catch that. What city?"

2nd failure: Reprompt with guidance
   "I didn't understand. You can say a city name, like 'San Francisco.'"

3rd failure: Offer alternative modality
   "I'm having trouble understanding. Would you like to type your answer?"
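The escalation above is a small state machine keyed on the consecutive-failure count. A minimal sketch, assuming a hypothetical city slot with a tiny vocabulary (`KNOWN_CITIES`); the count resets once the slot is filled, and the last level repeats rather than looping back to the first.

```python
from typing import Optional

REPROMPTS = [
    "Sorry, I didn't catch that. What city?",
    "I didn't understand. You can say a city name, like 'San Francisco.'",
    "I'm having trouble understanding. Would you like to type your answer?",
]

KNOWN_CITIES = {"san francisco", "new york", "london"}  # hypothetical slot vocabulary

class CityPrompt:
    """Escalating reprompts for one slot; the failure count resets on success."""

    def __init__(self) -> None:
        self.failures = 0

    def respond(self, recognized: Optional[str]) -> str:
        if recognized and recognized.lower() in KNOWN_CITIES:
            self.failures = 0
            return f"Got it: {recognized.title()}."
        self.failures += 1
        # Cap at the last level: keep offering the alternative modality.
        level = min(self.failures, len(REPROMPTS)) - 1
        return REPROMPTS[level]
```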

Voice vs. Screen: When to Use Voice

| Voice Works Well | Voice Works Poorly |
|---|---|
| Hands/eyes busy (driving, cooking) | Browsing or comparing options |
| Short, specific queries | Complex data entry |
| Device control (lights, music) | Private/sensitive information |
| Accessibility (vision, motor) | Noisy environments |

Eye Tracking

Eye tracking measures gaze position, fixation duration, and saccade patterns.

Research Applications

  • Heatmaps: Visualize where users look most frequently
  • Gaze plots: Show the sequence and duration of fixations
  • Areas of interest (AOI): Measure attention on specific regions
  • Time to first fixation: How quickly users notice an element
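Heatmaps, gaze plots, and time to first fixation all start from fixations detected in raw gaze samples. One classic approach is dispersion-threshold identification (I-DT, described by Salvucci and Goldberg): a window of samples counts as a fixation if its spread stays below a limit for at least a minimum duration. A sketch, assuming time-ordered `(t_ms, x, y)` samples; the 25-pixel and 100 ms thresholds are illustrative, not standard values.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]           # (t_ms, x, y)
Fixation = Tuple[float, float, float, float]  # (start_ms, duration_ms, cx, cy)

def dispersion(window: List[Sample]) -> float:
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def idt_fixations(samples: List[Sample], max_dispersion: float = 25.0,
                  min_duration_ms: float = 100.0) -> List[Fixation]:
    """Dispersion-threshold (I-DT) fixation detection."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Initial window: spans at least min_duration_ms from sample i.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # not enough data left for a full window
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Grow the window while the points stay within the dispersion limit.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0] - window[0][0], cx, cy))
            i = j + 1
        else:
            i += 1  # first sample belongs to a saccade; slide forward
    return fixations
```

Time to first fixation on an AOI is then the start time of the first fixation whose centroid falls inside that region.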

Interaction Applications

  • Gaze-based selection: Dwell time on a target triggers selection (accessibility)
  • Foveated rendering: Render highest detail only where the user is looking (VR optimization)
  • Attention-aware interfaces: Adapt content based on whether the user is looking
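Dwell-based selection needs a timer that restarts whenever gaze leaves the target and fires only once per dwell, so that simply continuing to look at something does not trigger it repeatedly (the "Midas touch" problem). A minimal sketch, assuming rectangular targets and a hypothetical 800 ms default dwell; shorter dwells are faster but cause more accidental selections.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, width, height)

class DwellSelector:
    """Fires a selection when gaze stays on one target for dwell_ms."""

    def __init__(self, targets: Dict[str, Rect], dwell_ms: float = 800.0) -> None:
        self.targets = targets  # name -> Rect (hypothetical layout)
        self.dwell_ms = dwell_ms
        self.current: Optional[str] = None
        self.since_ms = 0.0
        self.fired = False

    def _hit(self, x: float, y: float) -> Optional[str]:
        for name, (left, top, w, h) in self.targets.items():
            if left <= x < left + w and top <= y < top + h:
                return name
        return None

    def update(self, t_ms: float, x: float, y: float) -> Optional[str]:
        """Feed one gaze sample; returns a target name once per completed dwell."""
        target = self._hit(x, y)
        if target != self.current:
            # Gaze moved to another target (or off all targets): restart the timer.
            self.current, self.since_ms, self.fired = target, t_ms, False
            return None
        if target is not None and not self.fired and t_ms - self.since_ms >= self.dwell_ms:
            self.fired = True  # no repeat selections while gaze stays put
            return target
        return None
```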

Design Insights from Eye Tracking Research

  • Users often scan text-heavy pages in an F-shaped pattern
  • Users tend to ignore areas that look like banner ads (banner blindness)
  • Users look at faces in images and tend to follow the face's gaze direction
  • Left-side navigation receives more visual attention in LTR cultures

Motion and Gesture

Body and hand movements as input, captured by cameras or sensors.

Technologies

| Technology | Sensing Method | Example |
|---|---|---|
| Depth cameras | Structured light / ToF | Microsoft Kinect, Azure Kinect |
| Hand tracking | Computer vision | Meta Quest hand tracking, Leap Motion |
| IMU sensors | Accelerometer + gyroscope | Wii Remote, phone gestures |
| Skeletal tracking | Full body pose estimation | Kinect, MediaPipe Pose |

Gesture Design Considerations

  • Fatigue: Holding the arm extended for mid-air gestures quickly causes fatigue ("gorilla arm"). Prefer small, subtle motions.
  • Social acceptability: Users resist conspicuous gestures in public.
  • Precision: Mid-air gestures are less precise than touch. Use larger targets and tolerance zones.
  • Feedback: Without physical contact, haptic feedback is absent. Visual and audio feedback are essential.
  • Discoverability: Non-obvious gestures need explicit teaching or onboarding.
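Tolerance zones matter because tracked hand positions jitter from frame to frame. One common remedy is hysteresis: a pinch starts below one thumb-to-index distance and ends only above a larger one, so noise near a single threshold cannot toggle it on and off. A sketch, assuming 3D fingertip positions in metres from some hand tracker; the 2 cm and 4 cm thresholds are illustrative.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]  # fingertip position in metres

class PinchDetector:
    """Pinch detection with separate enter/exit thresholds (hysteresis)."""

    def __init__(self, enter_m: float = 0.02, exit_m: float = 0.04) -> None:
        if not enter_m < exit_m:
            raise ValueError("enter threshold must be below exit threshold")
        self.enter_m = enter_m
        self.exit_m = exit_m
        self.pinching = False

    def update(self, thumb_tip: Point3, index_tip: Point3) -> bool:
        d = math.dist(thumb_tip, index_tip)
        if not self.pinching and d < self.enter_m:
            self.pinching = True   # fingers closed firmly: start the pinch
        elif self.pinching and d > self.exit_m:
            self.pinching = False  # fingers clearly apart: release
        return self.pinching
```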

Brain-Computer Interfaces (BCI)

BCIs translate neural signals into computer commands.

Types

| Type | Method | Invasiveness | Signal Quality |
|---|---|---|---|
| EEG | Scalp electrodes | Non-invasive | Low (noisy) |
| ECoG | Cortical surface electrodes | Surgical | Medium |
| Intracortical | Implanted electrode arrays | Highly invasive | High |
| fNIRS | Near-infrared spectroscopy | Non-invasive | Low-medium |

Current Applications

  • Assistive communication: Patients with ALS or locked-in syndrome can select letters/words
  • Prosthetic control: Neural signals drive robotic limbs
  • Neurofeedback: Training attention or relaxation through real-time brain activity display
  • P300 speller: User focuses on target letter in a flashing grid; EEG detects the P300 response

Limitations

  • Low information transfer rate (typically ~5-25 bits/minute for non-invasive systems)
  • Extensive calibration required per user
  • Signal noise from muscle movement, electrode drift
  • Ethical considerations around neural data privacy and autonomy
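Transfer rates like these are usually reported with the Wolpaw formula, which converts the number of choices N and the selection accuracy P into bits per selection: B = log2 N + P log2 P + (1 - P) log2((1 - P) / (N - 1)). A sketch; it assumes equally likely targets and errors spread evenly across the wrong choices, and the formula is only meaningful for accuracy above chance (P > 1/N).

```python
import math

def wolpaw_bits_per_selection(n: int, p: float) -> float:
    """Wolpaw information transfer per selection for n targets at accuracy p."""
    if n < 2:
        raise ValueError("need at least two targets")
    if not 0.0 <= p <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    bits = math.log2(n)
    if p > 0:
        bits += p * math.log2(p)  # the P log2 P term is 0 when P == 0
    if p < 1:
        bits += (1 - p) * math.log2((1 - p) / (n - 1))  # 0 when P == 1
    return bits

def bits_per_minute(n: int, p: float, seconds_per_selection: float) -> float:
    return wolpaw_bits_per_selection(n, p) * 60.0 / seconds_per_selection
```

For example, a 6 × 6 P300 speller (36 symbols) at 90% accuracy yields about 4.19 bits per selection; at a hypothetical 10 seconds per selection, `bits_per_minute(36, 0.9, 10)` gives about 25 bits/minute.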

Haptic Feedback

Haptic feedback uses touch sensations (vibration, force, texture) to communicate information.

Types of Haptic Feedback

| Type | Mechanism | Example |
|---|---|---|
| Vibrotactile | Vibration motors (LRA, ERM) | Phone vibration on keypress |
| Force feedback | Motors resisting movement | Gaming steering wheel resistance |
| Surface haptics | Electrostatic friction on screen | Texture simulation on touchscreen |
| Ultrasonic | Focused ultrasound in air | Mid-air haptic feedback (Ultraleap) |

Design Applications

  • Confirmation: Short vibration pulse on successful action
  • Warning: Distinct vibration pattern for errors or alerts
  • Texture: Simulating material properties on touchscreens
  • Navigation: Directional vibration for wayfinding (wristband buzz patterns)
  • Accessibility: Haptic output for deaf or deaf-blind users
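Distinct confirmation, warning, and navigation cues come down to distinct timing patterns. The sketch below writes them in the alternating vibrate/pause millisecond format accepted by the Web Vibration API's `navigator.vibrate()`; the durations and the left/right wristband encoding are hypothetical.

```python
from typing import Dict, List, Tuple

# Alternating [vibrate, pause, vibrate, ...] durations in milliseconds.
PATTERNS: Dict[str, List[int]] = {
    "confirm": [20],                  # one short tick
    "warning": [80, 60, 80, 60, 80],  # three firm pulses
    "turn": [150, 100, 150],          # two long pulses
}

def total_duration_ms(pattern: List[int]) -> int:
    return sum(pattern)

def vibration_ms(pattern: List[int]) -> int:
    """Time the motor is actually on (even indices are vibration segments)."""
    return sum(pattern[0::2])

def wayfinding_cue(relative_bearing_deg: float) -> Tuple[str, List[int]]:
    """Hypothetical wristband encoding; positive bearings are to the right."""
    b = (relative_bearing_deg + 180.0) % 360.0 - 180.0  # normalize to [-180, 180)
    if abs(b) <= 15.0:
        return ("both", PATTERNS["confirm"])  # roughly straight ahead
    return ("right" if b > 0 else "left", PATTERNS["turn"])
```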

Multimodal Interaction

Combining multiple input and output modalities for richer, more robust interaction.

Input Fusion Strategies

| Strategy | Description | Example |
|---|---|---|
| Redundant | Same command via multiple modalities | Tap + voice "delete" |
| Complementary | Different modalities provide different parts | Point at map + say "navigate here" |
| Sequential | Modalities used in sequence | Wake word (voice) then touch selection |
| Simultaneous | Modalities used at the same time | Gesture + speech (Bolt: "put that there") |
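Complementary fusion, as in "navigate here", needs to pair each deictic word with the pointing gesture nearest to it in time. A minimal sketch, assuming timestamped speech and pointing events and a hypothetical 1.5-second fusion window; when no gesture falls inside the window, the system should ask rather than guess.

```python
from dataclasses import dataclass
from typing import List, Optional

FUSION_WINDOW_MS = 1500  # assumed max gap between the spoken word and the point

@dataclass
class PointEvent:
    t_ms: float
    lat: float
    lon: float

@dataclass
class SpeechEvent:
    t_ms: float  # time the deictic word ("here") was spoken
    intent: str  # e.g. "navigate"

def fuse(speech: SpeechEvent, points: List[PointEvent]) -> Optional[dict]:
    """Speech supplies the action; the closest-in-time pointing supplies the place."""
    candidates = [p for p in points if abs(p.t_ms - speech.t_ms) <= FUSION_WINDOW_MS]
    if not candidates:
        return None  # nothing resolves "here": ask the user instead
    nearest = min(candidates, key=lambda p: abs(p.t_ms - speech.t_ms))
    return {"intent": speech.intent, "lat": nearest.lat, "lon": nearest.lon}
```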

Design Principles for Multimodal Systems

  1. Modality equivalence: Critical functions should not depend on a single modality
  2. Graceful degradation: System works when one modality is unavailable
  3. User choice: Let users choose their preferred modality
  4. Mutual disambiguation: If one modality is ambiguous, the other clarifies
  5. Minimize cognitive load: Do not force users to manage multiple modalities simultaneously unless it is natural

CASA Paradigm

Computers Are Social Actors (Nass & Reeves): Users unconsciously apply social rules to computers. Multimodal systems that use voice, gesture, and facial expression are perceived as more social, which raises expectations of human-like behavior. Design must manage these expectations to avoid the uncanny valley of interaction.