Input Modalities

Keyboard and Mouse

The oldest and best-studied digital input modalities, optimized for precise, high-throughput interaction at a desk.

Keyboard

QWERTY layout: Designed in 1873, now a de facto standard despite alternatives (Dvorak, Colemak)
Throughput: roughly 40-50 WPM for typical users, 80-100 WPM for skilled typists; 150+ WPM is reached only by exceptional typists
Keyboard shortcuts: Critical for power users. Reduce mouse travel and speed up repetitive tasks

Shortcut design principles:

Mnemonics: Ctrl+B for Bold, Ctrl+S for Save
Consistency: Same shortcut across applications for the same action
Discoverability: Show shortcuts in menus and tooltips
Modifier hierarchy: Ctrl for commands, Shift for modification, Alt for alternatives
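The consistency and discoverability principles above can be sketched as a small shortcut registry. This is a minimal illustration, not a real toolkit API: `normalize`, `ShortcutRegistry`, and the fixed modifier order are hypothetical names and choices made for this example.

```python
# Hypothetical sketch: canonical shortcut names, conflict detection, menu labels.
MODIFIER_ORDER = ("Ctrl", "Alt", "Shift")  # assumed display order


def normalize(shortcut: str) -> str:
    """Return a canonical form such as 'Ctrl+Shift+S', whatever the input order or case."""
    parts = [p.strip() for p in shortcut.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty shortcut")
    *mods, key = parts  # the last part is treated as the non-modifier key
    mod_set = {m.capitalize() for m in mods}
    unknown = mod_set - set(MODIFIER_ORDER)
    if unknown:
        raise ValueError(f"unknown modifier(s): {sorted(unknown)}")
    key = key.upper() if len(key) == 1 else key.capitalize()
    return "+".join([m for m in MODIFIER_ORDER if m in mod_set] + [key])


class ShortcutRegistry:
    def __init__(self):
        self._by_shortcut = {}

    def register(self, shortcut: str, action: str) -> str:
        """Bind a shortcut to an action; refuse to rebind it to a different action."""
        canonical = normalize(shortcut)
        existing = self._by_shortcut.get(canonical)
        if existing is not None and existing != action:
            raise ValueError(f"{canonical} is already bound to {existing!r}")
        self._by_shortcut[canonical] = action
        return canonical

    def menu_label(self, action: str) -> str:
        """Menu text that shows the shortcut next to the action name."""
        for shortcut, bound in self._by_shortcut.items():
            if bound == action:
                return f"{action} ({shortcut})"
        return action
```

Registering "ctrl+s" for Save and then asking for the menu label yields "Save (Ctrl+S)"; a later attempt to bind Ctrl+S to another action raises an error instead of silently overriding it.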

Mouse / Trackpad

Direct manipulation: Point, click, drag --- maps to physical action metaphors
Precision: Pixel-level accuracy, suitable for design tools and text editing
Right-click context menus: Secondary actions scoped to the target element
Scroll: Vertical primary, horizontal secondary. Scroll direction conventions differ by platform

Cursor states communicate affordance:

Cursor	Meaning
Arrow	Default, no special interaction
Pointer (hand)	Clickable link or button
I-beam	Text is selectable/editable
Crosshair	Precision selection
Grab / Grabbing	Draggable element
Not-allowed	Action is disabled
Wait / Progress	System is busy
Resize arrows	Edge or corner can be dragged to resize

Touch Gestures

Touch is the dominant input for mobile devices. Interaction is direct (finger on content) rather than indirect (mouse moves cursor).

Standard Touch Gestures

Gesture	Action	Example
Tap	Select, activate	Open an app
Double tap	Zoom, select word	Zoom into a photo
Long press	Secondary action, context menu	Select an item in a list
Swipe	Scroll, navigate, dismiss	Scroll a list, go back
Pinch / Spread	Zoom out / Zoom in	Map zoom
Rotate	Rotate object	Rotate a photo
Drag	Move object	Reorder a list
Edge swipe	System navigation	iOS swipe from left edge = back
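A rough sketch of how a recognizer might tell a tap, a long press, and a swipe apart from a single touch-down/touch-up pair. The thresholds (`slop_px`, `long_press_ms`) and the `TouchPoint` type are illustrative assumptions; real platforms define their own values and handle multi-touch gestures such as pinch and rotate separately.

```python
import math
from dataclasses import dataclass


@dataclass
class TouchPoint:
    x: float     # pixels, origin at top-left, y grows downward
    y: float
    t_ms: float  # timestamp in milliseconds


def classify(down: TouchPoint, up: TouchPoint,
             slop_px: float = 10.0, long_press_ms: float = 500.0) -> str:
    """Classify one finger-down/finger-up pair (assumed thresholds)."""
    dx, dy = up.x - down.x, up.y - down.y
    if math.hypot(dx, dy) > slop_px:
        # Movement beyond the slop radius: treat as a swipe along the dominant axis.
        if abs(dx) >= abs(dy):
            return "swipe right" if dx > 0 else "swipe left"
        return "swipe down" if dy > 0 else "swipe up"
    # The finger stayed (almost) in place: duration separates tap from long press.
    return "long press" if up.t_ms - down.t_ms >= long_press_ms else "tap"
```

The slop radius is what keeps a slightly shaky tap from being read as the start of a scroll.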

Touch Design Constraints

Fat finger problem: A fingertip contact area is ~10mm wide, covering ~63 pixels at 160 DPI. Minimum target: 44x44pt (Apple), 48x48dp (Google).
Occlusion: Finger covers the target during touch. Consider placement of tooltips and feedback above/beside the touch point.
No hover state: Touch interfaces lack hover. Information revealed on hover must be accessible through tap or visible by default.
Accidental touch: Require deliberate gestures for destructive actions. Distinguish tap from scroll start.
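The size arithmetic behind touch targets is simple enough to check in code. The helpers below are a small sketch; the function names are made up for this example, and `dp_to_px` relies on the Android definition of 1dp as one pixel at 160 DPI.

```python
MM_PER_INCH = 25.4


def mm_to_px(mm: float, dpi: float) -> float:
    """Physical length in millimetres -> pixels at the given screen density."""
    return mm / MM_PER_INCH * dpi


def dp_to_px(dp: float, dpi: float) -> float:
    """Density-independent pixels -> physical pixels (1dp = 1px at 160 DPI)."""
    return dp * dpi / 160.0


fingertip_px = mm_to_px(10, 160)   # about 63 px for a 10mm fingertip
target_px = dp_to_px(48, 320)      # a 48dp target is 96 px on a 320 DPI screen
```

Working in pt or dp rather than raw pixels keeps the physical target size roughly constant across screen densities.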

Gesture Design Guidelines

Use standard platform gestures --- do not reinvent swipe or pinch
Gestures are hard to discover: teach them through hints or onboarding, and never make a gesture the only way to perform an action
Provide visual feedback during gesture (element follows finger)
Support gesture cancellation (drag back to original position)
Avoid gestures requiring more than two fingers (motor difficulty, discoverability)

Pen and Stylus

Pen input combines the precision of a mouse with the naturalness of handwriting.

Advantages over touch:

Finer precision (1-2mm tip vs 10mm finger)
Pressure sensitivity (line weight variation)
Palm rejection (rest hand on screen while writing)
Hover detection (cursor feedback before contact)
Tilt detection (shading in drawing apps)

Applications:

Digital note-taking (handwriting recognition, annotation)
Illustration and design (pressure-sensitive brushes)
Signature capture (legal documents)
Precision selection in data-dense interfaces

Design considerations:

Support both pen and touch simultaneously
Map pen buttons to contextual actions (eraser, right-click)
Provide ink smoothing and latency compensation (< 20ms ideal)
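One simple way to approach ink smoothing is an exponential moving average over the raw stroke points. This is only one option (production inking pipelines often use curve fitting and motion prediction instead), and the function name and `alpha` parameter are assumptions for this sketch.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def smooth_stroke(points: List[Point], alpha: float = 0.5) -> List[Point]:
    """Exponential moving average over stroke points.

    alpha = 1.0 returns the raw input; smaller values smooth more but make
    the rendered ink trail further behind the pen tip.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[Point] = []
    for x, y in points:
        if not smoothed:
            smoothed.append((x, y))  # first point is kept as-is
        else:
            px, py = smoothed[-1]
            smoothed.append((px + alpha * (x - px), py + alpha * (y - py)))
    return smoothed
```

The trade-off is visible in the parameter: heavier smoothing hides jitter but adds perceived latency, which works against the latency target in the list above.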

Voice Interaction (VUI Design)

Voice user interfaces enable hands-free, eyes-free interaction through speech.

Voice Interaction Model

User speaks utterance
        |
        v
Automatic Speech Recognition (ASR) -> text
        |
        v
Natural Language Understanding (NLU) -> intent + entities
        |
        v
Dialog Management -> decide response
        |
        v
Natural Language Generation (NLG) -> response text
        |
        v
Text-to-Speech (TTS) -> audio output
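The pipeline above can be sketched as a chain of functions. Every stage here is a stand-in: `asr` receives a transcript instead of audio, `nlu` is a single regular expression, and `tts` just encodes bytes. All names and the timer intent are hypothetical; the point is only the shape of the data passed from stage to stage.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Intent:
    name: str
    entities: dict = field(default_factory=dict)


def asr(audio: str) -> str:
    # Stand-in: real ASR maps audio to text; here the "audio" is already a transcript.
    return audio.lower().strip()


def nlu(text: str) -> Intent:
    # Toy understanding step: one pattern, one intent, one entity.
    match = re.search(r"timer for (\d+) minutes?", text)
    if match:
        return Intent("set_timer", {"minutes": int(match.group(1))})
    return Intent("unknown")


def dialog_manager(intent: Intent) -> dict:
    # Decide what to do next based on the intent.
    if intent.name == "set_timer":
        return {"act": "confirm_timer", "minutes": intent.entities["minutes"]}
    return {"act": "reprompt"}


def nlg(decision: dict) -> str:
    # Turn the decision into response text.
    if decision["act"] == "confirm_timer":
        n = decision["minutes"]
        unit = "minute" if n == 1 else "minutes"
        return f"Setting a timer for {n} {unit}."
    return "Sorry, I didn't catch that."


def tts(text: str) -> bytes:
    # Stand-in for speech synthesis.
    return text.encode("utf-8")


def handle(audio: str) -> bytes:
    return tts(nlg(dialog_manager(nlu(asr(audio)))))
```

Calling `handle("Set a timer for 10 minutes")` produces the confirmation "Setting a timer for 10 minutes." as bytes, and anything the toy NLU does not recognize falls through to a reprompt.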

VUI Design Principles

Set expectations: Tell users what they can say. "You can say 'play music,' 'set a timer,' or 'what's the weather?'"
Confirm understanding: Echo back key information. "Setting a timer for 10 minutes."
Handle errors gracefully: Mishearing is inevitable. Offer reprompts with guidance.
Keep responses brief: Audio is sequential; users cannot scan. 1-2 sentences per response.
Provide escape hatches: "Stop," "Cancel," "Go back" should always work.
Progressive disclosure: Start with a summary, offer details on request.
Persona: Consistent voice character (tone, vocabulary, personality).

VUI Error Handling

Levels of error recovery:

1st failure: Simple reprompt
   "Sorry, I didn't catch that. What city?"

2nd failure: Reprompt with guidance
   "I didn't understand. You can say a city name, like 'San Francisco.'"

3rd failure: Offer alternative modality
   "I'm having trouble understanding. Would you like to type your answer?"

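The three recovery levels map naturally onto a function of the consecutive failure count. This is a minimal sketch; `reprompt`, `slot`, and `example` are hypothetical names, and the wording simply reuses the prompts shown above.

```python
def reprompt(failures: int, slot: str = "city", example: str = "San Francisco") -> str:
    """Return the recovery prompt for the given number of consecutive failures."""
    if failures < 1:
        raise ValueError("failures must be at least 1")
    if failures == 1:
        # Level 1: simple reprompt.
        return f"Sorry, I didn't catch that. What {slot}?"
    if failures == 2:
        # Level 2: reprompt with guidance and an example.
        return f"I didn't understand. You can say a {slot} name, like '{example}.'"
    # Level 3 and beyond: offer another modality instead of repeating.
    return "I'm having trouble understanding. Would you like to type your answer?"
```

The counter should reset once an utterance is understood, so that a later mistake starts again at the gentlest prompt.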
Voice vs. Screen: When to Use Voice

Voice Works Well	Voice Works Poorly
Hands/eyes busy (driving, cooking)	Browsing or comparing options
Short, specific queries	Complex data entry
Device control (lights, music)	Private/sensitive information
Accessibility (vision, motor)	Noisy environments

Eye Tracking

Eye tracking measures gaze position, fixation duration, and saccade patterns.

Research Applications

Heatmaps: Visualize where users look most frequently
Gaze plots: Show the sequence and duration of fixations
Areas of interest (AOI): Measure attention on specific regions
Time to first fixation: How quickly users notice an element
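Two of these metrics are straightforward to compute once fixations have been detected. The sketch below assumes fixations arrive as (x, y, start, duration) records and that an AOI is an axis-aligned rectangle; the `Fixation` and `AOI` types are hypothetical and not tied to any eye-tracker SDK.

```python
from typing import List, NamedTuple, Optional


class Fixation(NamedTuple):
    x: float
    y: float
    start_ms: float
    duration_ms: float


class AOI(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, f: Fixation) -> bool:
        return self.left <= f.x <= self.right and self.top <= f.y <= self.bottom


def time_to_first_fixation(fixations: List[Fixation], aoi: AOI) -> Optional[float]:
    """Start time of the earliest fixation inside the AOI, or None if never fixated."""
    hits = [f.start_ms for f in fixations if aoi.contains(f)]
    return min(hits) if hits else None


def dwell_time(fixations: List[Fixation], aoi: AOI) -> float:
    """Total fixation duration spent inside the AOI."""
    return sum(f.duration_ms for f in fixations if aoi.contains(f))
```

A `None` result for time to first fixation is itself a finding: the element was never looked at during the recording.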

Interaction Applications

Gaze-based selection: Dwell time on a target triggers selection (accessibility)
Foveated rendering: Render highest detail only where the user is looking (VR optimization)
Attention-aware interfaces: Adapt content based on whether the user is looking
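Dwell-based selection can be sketched as a small state machine over gaze samples. The sample format (timestamp plus the id of the target under the gaze, or `None`) and the 800 ms default are assumptions for illustration; suitable dwell times vary by user and task.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, Optional[str]]  # (timestamp in ms, target under gaze or None)


def dwell_selections(samples: List[Sample], dwell_ms: float = 800) -> List[Tuple[float, str]]:
    """Emit (time, target) once per uninterrupted gaze that lasts at least dwell_ms."""
    selections = []
    current: Optional[str] = None   # target currently being looked at
    since = 0.0                     # when the gaze arrived on it
    fired = False                   # whether this gaze already triggered a selection
    for t, target in samples:
        if target != current:
            # Gaze moved to another target (or off all targets): restart the timer.
            current, since, fired = target, t, False
        elif target is not None and not fired and t - since >= dwell_ms:
            selections.append((t, target))
            fired = True            # do not re-trigger until the gaze leaves
    return selections
```

The `fired` flag is what prevents the classic "Midas touch" repeat, where staring at a button keeps activating it.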

Design Insights from Eye Tracking Research

Users follow F-patterns on text-heavy pages
Banner areas are systematically ignored (banner blindness)
Users look at faces in images, then follow the face's gaze direction
Left-side navigation receives more visual attention in LTR cultures

Motion and Gesture

Body and hand movements as input, captured by cameras or sensors.

Technologies

Technology	Sensing Method	Example
Depth cameras	Structured light / ToF	Microsoft Kinect, Azure Kinect
Hand tracking	Computer vision	Meta Quest hand tracking, Leap Motion
IMU sensors	Accelerometer + gyroscope	Wii Remote, phone gestures
Skeletal tracking	Full body pose estimation	Kinect, MediaPipe Pose

Gesture Design Considerations

Fatigue: "Gorilla arm" --- extended arm gestures cause rapid fatigue. Prefer subtle motions.
Social acceptability: Users resist conspicuous gestures in public.
Precision: Mid-air gestures are less precise than touch. Use larger targets and tolerance zones.
Feedback: Mid-air gestures usually involve no physical contact and therefore no haptic feedback, so visual and audio feedback are essential.
Discoverability: Non-obvious gestures need explicit teaching or onboarding.

Brain-Computer Interfaces (BCI)

BCIs translate neural signals into computer commands.

Types

Type	Method	Invasiveness	Signal Quality
EEG	Scalp electrodes	Non-invasive	Low (noisy)
ECoG	Cortical surface electrodes	Surgical	Medium
Intracortical	Implanted electrode arrays	Highly invasive	High
fNIRS	Near-infrared spectroscopy	Non-invasive	Low-medium

Current Applications

Assistive communication: Patients with ALS or locked-in syndrome can select letters/words
Prosthetic control: Neural signals drive robotic limbs
Neurofeedback: Training attention or relaxation through real-time brain activity display
P300 speller: User focuses on target letter in a flashing grid; EEG detects the P300 response
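As a rough illustration of the P300 speller's final decoding step, assume the grid is flashed row by row and column by column, and that an upstream classifier has already produced one averaged response score per row and per column. Both the flashing scheme and the scoring are assumptions here; the signal processing that produces the scores is out of scope.

```python
from typing import List, Sequence


def decode_letter(grid: Sequence[str], row_scores: List[float], col_scores: List[float]) -> str:
    """Pick the character at the intersection of the best-scoring row and column."""
    if len(row_scores) != len(grid) or any(len(row) != len(col_scores) for row in grid):
        raise ValueError("score lists must match the grid dimensions")
    best_row = max(range(len(row_scores)), key=row_scores.__getitem__)
    best_col = max(range(len(col_scores)), key=col_scores.__getitem__)
    return grid[best_row][best_col]


# Hypothetical 6x6 layout, one string per row.
GRID = ["ABCDEF", "GHIJKL", "MNOPQR", "STUVWX", "YZ1234", "56789_"]
```

With the strongest response on row 2 and column 3 (zero-based), the decoded character is "P". Averaging over more flash repetitions improves accuracy but slows each selection.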

Limitations

Low information transfer rate (~5-25 bits/minute for non-invasive)
Extensive calibration required per user
Signal noise from muscle movement, electrode drift
Ethical considerations around neural data privacy and autonomy
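Information transfer rate is commonly estimated with the Wolpaw formula, which gives bits per selection from the number of choices N and the selection accuracy P. The sketch below implements that formula; the function names and the example numbers are illustrative and not measurements from any particular system.

```python
import math


def bits_per_selection(n_choices: int, accuracy: float) -> float:
    """Wolpaw ITR: log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1))."""
    if n_choices < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_choices >= 2 and accuracy in [0, 1]")
    bits = math.log2(n_choices)
    if accuracy > 0.0:  # skip terms that would evaluate 0 * log2(0)
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_choices - 1))
    return bits


def itr_bits_per_minute(n_choices: int, accuracy: float, selections_per_minute: float) -> float:
    return bits_per_selection(n_choices, accuracy) * selections_per_minute
```

For a 36-character speller at 90% accuracy this gives about 4.19 bits per selection, so four selections per minute would be roughly 17 bits/minute. The formula is only meaningful when accuracy is above chance (1/N).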

Haptic Feedback

Haptic feedback uses touch sensations (vibration, force, texture) to communicate information.

Types of Haptic Feedback

Type	Mechanism	Example
Vibrotactile	Vibration motors (LRA, ERM)	Phone vibration on keypress
Force feedback	Motors resisting movement	Gaming steering wheel resistance
Surface haptics	Electrostatic friction on screen	Texture simulation on touchscreen
Ultrasonic	Focused ultrasound in air	Mid-air haptic feedback (Ultraleap)

Design Applications

Confirmation: Short vibration pulse on successful action
Warning: Distinct vibration pattern for errors or alerts
Texture: Simulating material properties on touchscreens
Navigation: Directional vibration for wayfinding (wristband buzz patterns)
Accessibility: Haptic output for deaf or deaf-blind users

Multimodal Interaction

Combining multiple input and output modalities for richer, more robust interaction.

Input Fusion Strategies

Strategy	Description	Example
Redundant	Same command via multiple modalities	Tap + voice "delete"
Complementary	Different modalities provide different parts	Point at map + say "navigate here"
Sequential	Modalities used in sequence	Wake word (voice) then touch selection
Simultaneous	Modalities used at the same time	Gesture + speech (Bolt: "put that there")
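Complementary fusion can be sketched as pairing a spoken command that contains a deictic word ("here") with the pointing event closest to it in time. The event types, the `window_ms` tolerance, and the single-keyword test are simplifying assumptions; real systems use richer language understanding and confidence scores.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PointEvent:
    t_ms: float
    x: float
    y: float


@dataclass
class SpeechEvent:
    t_ms: float
    text: str


def fuse(speech: SpeechEvent, points: List[PointEvent], window_ms: float = 1500) -> Optional[dict]:
    """Resolve 'here' in a spoken command using the nearest-in-time pointing event."""
    words = re.findall(r"[a-z']+", speech.text.lower())
    if "here" not in words:
        # Nothing to resolve: the speech stands on its own.
        return {"command": speech.text, "location": None}
    candidates = [p for p in points if abs(p.t_ms - speech.t_ms) <= window_ms]
    if not candidates:
        return None  # unresolved reference: the caller should ask the user
    nearest = min(candidates, key=lambda p: abs(p.t_ms - speech.t_ms))
    return {"command": speech.text, "location": (nearest.x, nearest.y)}
```

Returning `None` for an unresolved reference leaves room for a clarifying prompt instead of a guess.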

Design Principles for Multimodal Systems

Modality equivalence: Critical functions should not depend on a single modality
Graceful degradation: System works when one modality is unavailable
User choice: Let users choose their preferred modality
Mutual disambiguation: If one modality is ambiguous, the other clarifies
Minimize cognitive load: Do not force users to manage multiple modalities simultaneously unless it is natural

Computers Are Social Actors (Nass & Reeves): Users unconsciously apply social rules to computers. Multimodal systems that use voice, gesture, and facial expression are perceived as more social, which raises expectations of human-like behavior. Design must manage these expectations to avoid an uncanny valley of interaction.