HTML Character References & XSS Prevention

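In HTML, certain characters have special meaning and must be escaped when displayed as text. HTML character references provide a way to represent characters using named entities (like &copy;) or numeric references (like &#169; or &#xA9;). Numeric references can represent any Unicode character; named entities exist only for a defined set of characters.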

Named vs. Numeric References

  • Named entities: Human-readable references like &amp; (for &), &lt; (for <), &gt; (for >), and &quot; (for "). Preferred when available.
  • Decimal numeric references: Format &#nnn; where nnn is the decimal Unicode code point. Example: &#60; for <.
  • Hexadecimal numeric references: Format &#xnnn; where nnn is the hex Unicode code point. Example: &#x3C; for <.
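
As a small illustration of the three forms above, the following Python sketch uses the standard library's html.unescape to show that the named, decimal, and hexadecimal references for < all decode to the same character. The helper names to_decimal_ref and to_hex_ref are hypothetical, introduced here only for the example.

```python
import html

# The named, decimal, and hexadecimal forms of the same character.
forms = ["&lt;", "&#60;", "&#x3C;"]

# html.unescape converts character references back into characters.
decoded = [html.unescape(form) for form in forms]


def to_decimal_ref(ch: str) -> str:
    """Build a decimal numeric reference from a single character (illustrative helper)."""
    return f"&#{ord(ch)};"


def to_hex_ref(ch: str) -> str:
    """Build a hexadecimal numeric reference from a single character (illustrative helper)."""
    return f"&#x{ord(ch):X};"


if __name__ == "__main__":
    print(decoded)
    print(to_decimal_ref("<"), to_hex_ref("<"))
```

Because both numeric forms are derived directly from the code point, they work for any character, including those with no named entity.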

XSS Prevention

Cross-Site Scripting (XSS) attacks occur when malicious scripts are injected into web pages. HTML encoding is a critical defense layer:

  • Always encode user input before rendering in HTML contexts
  • The five characters that MUST be encoded: &, <, >, ", and '
  • In attribute contexts, always quote the attribute value and encode the quote character used as the delimiter; an unquoted value can be terminated by a space
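
The rules above can be sketched with Python's standard html.escape, which encodes all five characters when quote=True. This is a minimal illustration, not the encoder this document describes; escape_html and render_comment are hypothetical names, and the <p title="..."> template is an assumed example of element and attribute contexts.

```python
import html


def escape_html(text: str) -> str:
    """Encode &, <, >, " and ' so the text is inert in HTML text and quoted attributes."""
    # quote=True makes html.escape encode both quote characters as well.
    return html.escape(text, quote=True)


def render_comment(user_input: str) -> str:
    """Place untrusted input in element content and in a double-quoted attribute."""
    safe = escape_html(user_input)
    # The attribute value is always quoted, so encoded quotes cannot end it early.
    return f'<p title="{safe}">{safe}</p>'


if __name__ == "__main__":
    print(render_comment('<script>alert("x")</script>'))
```

Note that html.escape emits the numeric reference &#x27; for the single quote rather than a named entity. Encoding happens at the point of output, so the stored value stays unmodified.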

HTML5 Specification

Per the HTML5 specification, user agents must recognize every named character reference that the specification defines. Our encoder follows the specification to ensure maximum compatibility.
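
For readers who want to inspect the named entity set, Python's standard library exposes it as the html.entities.html5 dictionary, which maps entity names to their characters. The sketch below is only an illustration of looking up a name in that table; lookup_named_reference is a hypothetical helper and says nothing about how the encoder described here is implemented.

```python
from html.entities import html5
from typing import Optional


def lookup_named_reference(name: str) -> Optional[str]:
    """Return the character(s) for a named entity, or None if the name is not defined."""
    # Keys in the html5 table carry the trailing semicolon, e.g. "copy;".
    key = name if name.endswith(";") else name + ";"
    return html5.get(key)


if __name__ == "__main__":
    for name in ("copy", "lt", "amp", "notarealentity"):
        print(name, repr(lookup_named_reference(name)))
```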