Input Modalities
Keyboard and Mouse
The oldest and most thoroughly studied digital input modalities, optimized for precise, high-throughput interaction at a desk.
Keyboard
- QWERTY layout: Designed in 1873, now a de facto standard despite alternatives (Dvorak, Colemak)
- Typing throughput: roughly 40-80 WPM for typical users; trained typists exceed 100 WPM
- Keyboard shortcuts: Critical for power users. Reduce mouse travel and speed up repetitive tasks
Shortcut design principles:
- Mnemonics: Ctrl+B for Bold, Ctrl+S for Save
- Consistency: Same shortcut across applications for the same action
- Discoverability: Show shortcuts in menus and tooltips
- Modifier hierarchy: Ctrl for commands, Shift for modification, Alt for alternatives
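The consistency principle depends on treating `Shift+Ctrl+S` and `Ctrl+Shift+S` as the same binding. A minimal sketch of a shortcut registry that normalizes modifier order and key case (the registry class and ordering are illustrative, not a platform API):

```typescript
type Handler = () => string;

// Canonical modifier order: commands, alternatives, modification
const MODIFIER_ORDER = ["Ctrl", "Alt", "Shift"];

// Normalize a combo string so equivalent spellings compare equal
function normalize(combo: string): string {
  const parts = combo.split("+").map((p) => p.trim());
  const key = parts[parts.length - 1];
  const mods = parts.slice(0, -1);
  const ordered = MODIFIER_ORDER.filter((m) => mods.includes(m));
  return [...ordered, key.toUpperCase()].join("+");
}

class ShortcutRegistry {
  private bindings = new Map<string, Handler>();
  register(combo: string, handler: Handler): void {
    this.bindings.set(normalize(combo), handler);
  }
  dispatch(combo: string): string | undefined {
    return this.bindings.get(normalize(combo))?.();
  }
}
```

With this, `register("Ctrl+Shift+S", …)` fires for a `Shift+Ctrl+s` keystroke, keeping the same action bound to the same logical shortcut.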
Mouse / Trackpad
- Direct manipulation: Point, click, drag --- maps to physical action metaphors
- Precision: Sub-pixel accuracy, suitable for design tools and text editing
- Right-click context menus: Secondary actions scoped to the target element
- Scroll: Vertical primary, horizontal secondary. Scroll direction conventions differ by platform
Cursor states communicate affordance:
| Cursor | Meaning |
|--------|---------|
| Arrow | Default, no special interaction |
| Pointer (hand) | Clickable link or button |
| I-beam | Text is selectable/editable |
| Crosshair | Precision selection |
| Grab / Grabbing | Draggable element |
| Not-allowed | Action is disabled |
| Wait / Progress | System is busy |
| Resize arrows | Edge or corner can be dragged to resize |
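In code, the cursor is typically derived from element state in priority order (busy and disabled override everything else). A sketch using CSS cursor keywords; the `ElementState` shape is hypothetical:

```typescript
interface ElementState {
  disabled?: boolean;
  busy?: boolean;
  clickable?: boolean;
  editableText?: boolean;
  draggable?: boolean;
}

// Map element state to a CSS cursor keyword, highest priority first
function cursorFor(state: ElementState): string {
  if (state.busy) return "progress";        // system is busy
  if (state.disabled) return "not-allowed"; // action is disabled
  if (state.draggable) return "grab";       // draggable element
  if (state.editableText) return "text";    // rendered as an I-beam
  if (state.clickable) return "pointer";    // hand cursor
  return "default";                         // arrow
}
```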
Touch Gestures
Touch is the dominant input for mobile devices. Interaction is direct (finger on content) rather than indirect (mouse moves cursor).
Standard Touch Gestures
| Gesture | Action | Example |
|---------|--------|---------|
| Tap | Select, activate | Open an app |
| Double tap | Zoom, select word | Zoom into a photo |
| Long press | Secondary action, context menu | Select an item in a list |
| Swipe | Scroll, navigate, dismiss | Scroll a list, go back |
| Pinch / Spread | Zoom out / Zoom in | Map zoom |
| Rotate | Rotate object | Rotate a photo |
| Drag | Move object | Reorder a list |
| Edge swipe | System navigation | iOS swipe from left edge = back |
Touch Design Constraints
- Fat finger problem: A fingertip contact patch is ~10mm wide, roughly 63 pixels at 160 DPI. Minimum target: 44x44pt (Apple), 48x48dp (Google).
- Occlusion: Finger covers the target during touch. Consider placement of tooltips and feedback above/beside the touch point.
- No hover state: Touch interfaces lack hover. Information revealed on hover must be accessible through tap or visible by default.
- Accidental touch: Require deliberate gestures for destructive actions. Distinguish tap from scroll start.
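Distinguishing a tap from the start of a scroll (or a long press) usually comes down to a movement "slop" radius and a timing threshold. A minimal sketch; the 10 px slop and 500 ms long-press cutoff are illustrative values, not platform-mandated ones:

```typescript
interface TouchSample { x: number; y: number; t: number } // t in ms

// Classify a completed touch from its down and up samples
function classifyTouch(
  down: TouchSample,
  up: TouchSample,
  slopPx = 10,
  longPressMs = 500
): "tap" | "scroll" | "long-press" {
  const moved = Math.hypot(up.x - down.x, up.y - down.y);
  if (moved > slopPx) return "scroll";                       // finger travelled: treat as scroll/drag
  return up.t - down.t >= longPressMs ? "long-press" : "tap"; // held in place: time decides
}
```

Real toolkits run this incrementally as move events arrive so scrolling can begin before the finger lifts, but the thresholds play the same role.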
Gesture Design Guidelines
- Use standard platform gestures --- do not reinvent swipe or pinch
- Gestures must be discoverable (not the only way to perform an action)
- Provide visual feedback during gesture (element follows finger)
- Support gesture cancellation (drag back to original position)
- Avoid gestures requiring more than two fingers (motor difficulty, discoverability)
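As an example of a standard gesture's mechanics, pinch zoom is just the ratio of the current finger spread to the spread when the gesture began. A sketch with illustrative clamping bounds:

```typescript
interface Point { x: number; y: number }

// Distance between two touch points
const spread = (a: Point, b: Point) => Math.hypot(b.x - a.x, b.y - a.y);

// Zoom factor: current spread over starting spread, clamped to sane bounds
function pinchScale(
  start: [Point, Point],
  current: [Point, Point],
  min = 0.5,
  max = 4
): number {
  const raw = spread(current[0], current[1]) / spread(start[0], start[1]);
  return Math.min(max, Math.max(min, raw));
}
```

Applying the scale continuously as the fingers move gives the "element follows finger" feedback recommended above.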
Pen and Stylus
Pen input combines the precision of a mouse with the naturalness of handwriting.
Advantages over touch:
- Finer precision (1-2mm tip vs 10mm finger)
- Pressure sensitivity (line weight variation)
- Palm rejection (rest hand on screen while writing)
- Hover detection (cursor feedback before contact)
- Tilt detection (shading in drawing apps)
Applications:
- Digital note-taking (handwriting recognition, annotation)
- Illustration and design (pressure-sensitive brushes)
- Signature capture (legal documents)
- Precision selection in data-dense interfaces
Design considerations:
- Support both pen and touch simultaneously
- Map pen buttons to contextual actions (eraser, right-click)
- Provide ink smoothing and latency compensation (< 20ms ideal)
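A minimal sketch of ink smoothing via an exponential moving average: each reported stylus point is blended with the previous smoothed point, with `alpha` trading smoothness against lag (`alpha = 1` means no smoothing). Production systems use more elaborate filters (e.g. the 1€ filter), but the idea is the same:

```typescript
interface Pt { x: number; y: number }

// Smooth a raw stylus stroke with an exponential moving average
function smoothStroke(points: Pt[], alpha = 0.5): Pt[] {
  if (points.length === 0) return [];
  const out: Pt[] = [points[0]];
  for (let i = 1; i < points.length; i++) {
    const prev = out[i - 1];
    out.push({
      x: alpha * points[i].x + (1 - alpha) * prev.x,
      y: alpha * points[i].y + (1 - alpha) * prev.y,
    });
  }
  return out;
}
```

Latency compensation goes the other way: predicting slightly ahead of the last sample so the rendered ink keeps up with the pen tip.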
Voice Interaction (VUI Design)
Voice user interfaces enable hands-free, eyes-free interaction through speech.
Voice Interaction Model
1. User speaks an utterance
2. Automatic Speech Recognition (ASR) -> text
3. Natural Language Understanding (NLU) -> intent + entities
4. Dialog Management -> decide response
5. Natural Language Generation (NLG) -> response text
6. Text-to-Speech (TTS) -> audio output
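A toy sketch of the NLU and dialog stages of this pipeline, with keyword matching standing in for a statistical NLU model (the intents and phrases are invented for illustration):

```typescript
interface Understanding { intent: string; entities: Record<string, string> }

// NLU: map recognized text to an intent plus extracted entities
function nlu(text: string): Understanding {
  const t = text.toLowerCase();
  const timer = t.match(/timer for (\d+) minutes?/);
  if (timer) return { intent: "set_timer", entities: { minutes: timer[1] } };
  if (t.includes("weather")) return { intent: "get_weather", entities: {} };
  return { intent: "unknown", entities: {} };
}

// Dialog Management + NLG: choose and phrase the response
function dialog(u: Understanding): string {
  switch (u.intent) {
    case "set_timer":
      // Echo back key information to confirm understanding
      return `Setting a timer for ${u.entities.minutes} minutes.`;
    case "get_weather":
      return "Here is today's forecast.";
    default:
      return "Sorry, I didn't catch that.";
  }
}
```

Note the `set_timer` response echoes the recognized entity back to the user, the confirmation principle described below.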
VUI Design Principles
- Set expectations: Tell users what they can say. "You can say 'play music,' 'set a timer,' or 'what's the weather?'"
- Confirm understanding: Echo back key information. "Setting a timer for 10 minutes."
- Handle errors gracefully: Mishearing is inevitable. Offer reprompts with guidance.
- Keep responses brief: Audio is sequential; users cannot scan. 1-2 sentences per response.
- Provide escape hatches: "Stop," "Cancel," "Go back" should always work.
- Progressive disclosure: Start with a summary, offer details on request.
- Persona: Consistent voice character (tone, vocabulary, personality).
VUI Error Handling
Levels of error recovery:
1. First failure: simple reprompt. "Sorry, I didn't catch that. What city?"
2. Second failure: reprompt with guidance. "I didn't understand. You can say a city name, like 'San Francisco.'"
3. Third failure: offer an alternative modality. "I'm having trouble understanding. Would you like to type your answer?"
Voice vs. Screen: When to Use Voice
| Voice Works Well | Voice Works Poorly |
|------------------|--------------------|
| Hands/eyes busy (driving, cooking) | Browsing or comparing options |
| Short, specific queries | Complex data entry |
| Device control (lights, music) | Private/sensitive information |
| Accessibility (vision, motor) | Noisy environments |
Eye Tracking
Eye tracking measures gaze position, fixation duration, and saccade patterns.
Research Applications
- Heatmaps: Visualize where users look most frequently
- Gaze plots: Show the sequence and duration of fixations
- Areas of interest (AOI): Measure attention on specific regions
- Time to first fixation: How quickly users notice an element
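Time to first fixation on an area of interest (AOI) can be sketched directly from a gaze sample stream. This simplifies fixation detection to "sample falls inside the AOI"; real pipelines first cluster raw samples into fixations:

```typescript
interface Gaze { x: number; y: number; t: number }          // t in ms
interface AOI { x: number; y: number; w: number; h: number }

// Elapsed time from the first sample until gaze first lands in the AOI
function timeToFirstFixation(samples: Gaze[], aoi: AOI): number | null {
  for (const s of samples) {
    const inside =
      s.x >= aoi.x && s.x < aoi.x + aoi.w &&
      s.y >= aoi.y && s.y < aoi.y + aoi.h;
    if (inside) return s.t - samples[0].t;
  }
  return null; // the AOI was never fixated
}
```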
Interaction Applications
- Gaze-based selection: Dwell time on a target triggers selection (accessibility)
- Foveated rendering: Render highest detail only where the user is looking (VR optimization)
- Attention-aware interfaces: Adapt content based on whether the user is looking
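Gaze-based selection hinges on the dwell timer: the gaze must stay on the same target continuously before the selection fires, so glances don't trigger actions (the "Midas touch" problem). A sketch with an illustrative 600 ms dwell threshold:

```typescript
class DwellSelector {
  private target: string | null = null;
  private since = 0;
  constructor(private dwellMs = 600) {}

  // Feed one gaze sample (target under gaze, timestamp in ms);
  // returns the selected target id once dwell completes, else null.
  update(targetId: string | null, t: number): string | null {
    if (targetId !== this.target) {
      this.target = targetId; // gaze moved: restart the dwell timer
      this.since = t;
      return null;
    }
    if (targetId !== null && t - this.since >= this.dwellMs) {
      this.target = null;     // reset so the selection fires only once
      return targetId;
    }
    return null;
  }
}
```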
Design Insights from Eye Tracking Research
- Users follow F-patterns on text-heavy pages
- Banner areas are systematically ignored (banner blindness)
- Users look at faces in images, then follow the face's gaze direction
- Left-side navigation receives more visual attention in LTR cultures
Motion and Gesture
Body and hand movements as input, captured by cameras or sensors.
Technologies
| Technology | Sensing Method | Example |
|------------|----------------|---------|
| Depth cameras | Structured light / ToF | Microsoft Kinect, Azure Kinect |
| Hand tracking | Computer vision | Meta Quest hand tracking, Leap Motion |
| IMU sensors | Accelerometer + gyroscope | Wii Remote, phone gestures |
| Skeletal tracking | Full body pose estimation | Kinect, MediaPipe Pose |
Gesture Design Considerations
- Fatigue: "Gorilla arm" --- extended arm gestures cause rapid fatigue. Prefer subtle motions.
- Social acceptability: Users resist conspicuous gestures in public.
- Precision: Mid-air gestures are less precise than touch. Use larger targets and tolerance zones.
- Feedback: Without physical contact, haptic feedback is absent. Visual and audio feedback are essential.
- Discoverability: Non-obvious gestures need explicit teaching or onboarding.
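The "tolerance zone" idea can be sketched as snapping an imprecise mid-air pointer to the nearest target within an expanded radius, instead of requiring an exact hit. The target shape and 80 px radius are illustrative:

```typescript
interface Target { id: string; x: number; y: number }

// Snap a pointer position to the nearest target within `radius`;
// null if nothing is close enough.
function snapToTarget(
  px: number,
  py: number,
  targets: Target[],
  radius = 80
): string | null {
  let best: string | null = null;
  let bestDist = radius;
  for (const t of targets) {
    const d = Math.hypot(t.x - px, t.y - py);
    if (d <= bestDist) {
      bestDist = d;
      best = t.id;
    }
  }
  return best;
}
```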
Brain-Computer Interfaces (BCI)
BCIs translate neural signals into computer commands.
Types
| Type | Method | Invasiveness | Signal Quality |
|------|--------|--------------|----------------|
| EEG | Scalp electrodes | Non-invasive | Low (noisy) |
| ECoG | Cortical surface electrodes | Surgical | Medium |
| Intracortical | Implanted electrode arrays | Highly invasive | High |
| fNIRS | Near-infrared spectroscopy | Non-invasive | Low-medium |
Current Applications
- Assistive communication: Patients with ALS or locked-in syndrome can select letters/words
- Prosthetic control: Neural signals drive robotic limbs
- Neurofeedback: Training attention or relaxation through real-time brain activity display
- P300 speller: User focuses on target letter in a flashing grid; EEG detects the P300 response
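The decoding step of a P300 speller can be sketched simply: rows and columns of the letter grid flash in sequence, and the row flash and column flash that evoke the strongest P300-like response intersect at the attended letter. The per-flash scores here stand in for outputs of an EEG classifier:

```typescript
// 6x6 speller grid (a common layout; contents are illustrative)
const GRID = ["ABCDEF", "GHIJKL", "MNOPQR", "STUVWX", "YZ1234", "56789_"];

// Pick the letter at the intersection of the best-scoring row and column
function decodeLetter(rowScores: number[], colScores: number[]): string {
  const argmax = (a: number[]) => a.indexOf(Math.max(...a));
  return GRID[argmax(rowScores)][argmax(colScores)];
}
```

In practice each row and column flashes many times and scores are averaged across repetitions, which is one reason the information transfer rate is so low.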
Limitations
- Low information transfer rate (~5-25 bits/minute for non-invasive)
- Extensive calibration required per user
- Signal noise from muscle movement, electrode drift
- Ethical considerations around neural data privacy and autonomy
Haptic Feedback
Haptic feedback uses touch sensations (vibration, force, texture) to communicate information.
Types of Haptic Feedback
| Type | Mechanism | Example |
|------|-----------|---------|
| Vibrotactile | Vibration motors (LRA, ERM) | Phone vibration on keypress |
| Force feedback | Motors resisting movement | Gaming steering wheel resistance |
| Surface haptics | Electrostatic friction on screen | Texture simulation on touchscreen |
| Ultrasonic | Focused ultrasound in air | Mid-air haptic feedback (Ultraleap) |
Design Applications
- Confirmation: Short vibration pulse on successful action
- Warning: Distinct vibration pattern for errors or alerts
- Texture: Simulating material properties on touchscreens
- Navigation: Directional vibration for wayfinding (wristband buzz patterns)
- Accessibility: Haptic output for deaf or deaf-blind users
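Distinct confirmation and warning signals are usually encoded as vibration patterns: alternating vibrate/pause durations in milliseconds, the format the Web Vibration API's `navigator.vibrate` accepts. The specific patterns below are illustrative, not a standard vocabulary:

```typescript
// Alternating [vibrate, pause, vibrate, ...] durations in ms
const HAPTIC_PATTERNS: Record<string, number[]> = {
  confirm: [30],                 // single short pulse on success
  warning: [80, 60, 80, 60, 80], // three longer pulses for alerts
  navLeft: [40, 40, 40],         // double pulse, e.g. left wristband cue
};

// Total duration of a pattern, useful for scheduling/debouncing
const patternDuration = (p: number[]) => p.reduce((a, b) => a + b, 0);
```

Keeping patterns short and clearly distinguishable matters more than richness: users learn only a handful of tactile "words."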
Multimodal Interaction
Combining multiple input and output modalities for richer, more robust interaction.
Input Fusion Strategies
| Strategy | Description | Example |
|----------|-------------|---------|
| Redundant | Same command via multiple modalities | Tap + voice "delete" |
| Complementary | Different modalities provide different parts | Point at map + say "navigate here" |
| Sequential | Modalities used in sequence | Wake word (voice) then touch selection |
| Simultaneous | Modalities used at the same time | Gesture + speech (Bolt: "put that there") |
Design Principles for Multimodal Systems
- Modality equivalence: Critical functions should not depend on a single modality
- Graceful degradation: System works when one modality is unavailable
- User choice: Let users choose their preferred modality
- Mutual disambiguation: If one modality is ambiguous, the other clarifies
- Minimize cognitive load: Do not force users to manage multiple modalities simultaneously unless it is natural
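Complementary fusion in the "put that there" style can be sketched as temporal alignment: the speech stream supplies the action, and it is fused with the pointing event closest in time within a fusion window. All names and the 1-second window are illustrative:

```typescript
interface SpeechEvent { verb: "put" | "delete"; t: number } // t in ms
interface PointEvent { targetId: string; t: number }

// Fuse a speech command with the nearest-in-time pointing event
function fuse(
  speech: SpeechEvent,
  points: PointEvent[],
  windowMs = 1000
): { verb: string; targetId: string } | null {
  let best: PointEvent | null = null;
  for (const p of points) {
    const dt = Math.abs(p.t - speech.t);
    if (dt <= windowMs && (best === null || dt < Math.abs(best.t - speech.t))) {
      best = p;
    }
  }
  return best ? { verb: speech.verb, targetId: best.targetId } : null;
}
```

This is also where mutual disambiguation lives: an ambiguous spoken referent ("that") is resolved by the unambiguous pointing event, and vice versa.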
CASA Paradigm
Computers Are Social Actors (Nass & Reeves): Users unconsciously apply social rules to computers. Multimodal systems that use voice, gesture, and facial expression are perceived more socially, raising expectations for human-like behavior. Design must manage these expectations to avoid the uncanny valley of interaction.