Domain-Specific Languages

Overview

A domain-specific language (DSL) is a programming language tailored to a particular application domain. DSLs trade generality for expressiveness, safety, and optimization opportunities within their domain.

Examples: SQL (databases), HTML/CSS (web layout), regex (pattern matching), Make (build systems), TikZ (diagrams), Verilog (hardware description).

Internal vs. External DSLs

External DSLs

Standalone language with its own syntax, parser, and tooling
Full control over syntax and error messages
Requires building a complete language pipeline (lexer, parser, type checker, evaluator/compiler)
Examples: SQL, GraphQL, Protocol Buffers, Terraform HCL

Internal (Embedded) DSLs

Hosted within a general-purpose language (the host)
Reuse the host language's parser, type system, and tooling
Constrained by the host language's syntax
Examples: Rails ActiveRecord (Ruby), ScalaTest (Scala), Parsec (Haskell), React JSX (JavaScript, with transpilation)

Tradeoffs

Aspect	External	Internal
Syntax freedom	Full control	Limited by host
Tooling cost	High (build from scratch)	Low (reuse host)
Error messages	Custom, domain-specific	Often confusing (host errors)
Learning curve	New language to learn	Leverages host knowledge
Optimization	Domain-specific transforms	Host optimizations; domain-specific transforms only with a deep embedding
Interop	Requires FFI or codegen	Native (same language)

Embedded DSL Techniques

Shallow Embedding

DSL constructs are directly interpreted as host language values. Each construct immediately computes its result.

-- Shallow embedding of a simple expression language
type Expr = Int

lit :: Int -> Expr
lit n = n

add :: Expr -> Expr -> Expr
add x y = x + y

mul :: Expr -> Expr -> Expr
mul x y = x * y

-- Usage: add (lit 3) (mul (lit 2) (lit 5))  =>  13

Simple and direct
Only one interpretation: cannot inspect, transform, or optimize the DSL program
Adding new constructs is easy (new functions); adding a new interpretation means changing the semantic type and rewriting every construct
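
To make the last point concrete, the sketch below adds pretty-printing to the shallow embedding by widening the semantic domain from Int to a pair. Every construct has to be rewritten to maintain both components. The primed names (Expr', lit', add', mul') and example are illustrative and not part of the original code.

```haskell
-- Shallow embedding whose semantic domain carries two interpretations at once.
-- The primed names are hypothetical, chosen so they do not clash with the
-- single-interpretation version above.
type Expr' = (Int, String)   -- (value, rendered text)

lit' :: Int -> Expr'
lit' n = (n, show n)

add' :: Expr' -> Expr' -> Expr'
add' (x, sx) (y, sy) = (x + y, "(" ++ sx ++ " + " ++ sy ++ ")")

mul' :: Expr' -> Expr' -> Expr'
mul' (x, sx) (y, sy) = (x * y, "(" ++ sx ++ " * " ++ sy ++ ")")

example :: Expr'
example = add' (lit' 3) (mul' (lit' 2) (lit' 5))
```

Here example evaluates to (13, "(3 + (2 * 5))"). A third interpretation would require touching all three functions again, which is the cost that deep embeddings avoid.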

Deep Embedding

DSL constructs build an AST (data structure) representing the program. Interpretation is separate.

data Expr = Lit Int | Add Expr Expr | Mul Expr Expr

eval :: Expr -> Int
eval (Lit n)   = n
eval (Add x y) = eval x + eval y
eval (Mul x y) = eval x * eval y

pretty :: Expr -> String
pretty (Lit n)   = show n
pretty (Add x y) = "(" ++ pretty x ++ " + " ++ pretty y ++ ")"
pretty (Mul x y) = "(" ++ pretty x ++ " * " ++ pretty y ++ ")"

Multiple interpretations: evaluation, pretty-printing, optimization, compilation
Adding new constructors requires modifying all interpreters (the expression problem)
Enables analysis and transformation of DSL programs
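
As one illustration of a further interpretation, the following sketch compiles the same Expr type to a small stack machine and then executes the result. The instruction set (Instr), compile, exec, and program are assumptions introduced for this example only.

```haskell
-- Deep embedding from the text, plus a compiler to a small stack machine.
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr

-- Hypothetical target instruction set (not part of the original DSL).
data Instr = Push Int | AddI | MulI
  deriving (Eq, Show)

-- Post-order traversal: emit code for both operands, then the operator.
compile :: Expr -> [Instr]
compile (Lit n)   = [Push n]
compile (Add x y) = compile x ++ compile y ++ [AddI]
compile (Mul x y) = compile x ++ compile y ++ [MulI]

-- Run instructions against a stack. Nothing signals a malformed program
-- (stack underflow, or anything other than one value left at the end).
exec :: [Instr] -> Maybe Int
exec = go []
  where
    go [result] []                   = Just result
    go stack (Push n : rest)         = go (n : stack) rest
    go (y : x : stack) (AddI : rest) = go (x + y : stack) rest
    go (y : x : stack) (MulI : rest) = go (x * y : stack) rest
    go _ _                           = Nothing

program :: Expr
program = Add (Lit 3) (Mul (Lit 2) (Lit 5))
```

For program, compile yields [Push 3, Push 2, Push 5, MulI, AddI] and exec on that list yields Just 13. Neither eval nor pretty had to change, but a new constructor would require a new case in every one of these functions.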

Tagless Final (Finally Tagless)

Resolves the expression problem by representing DSL terms via type class methods rather than data types.

class ExprSym repr where
  lit :: Int -> repr
  add :: repr -> repr -> repr
  mul :: repr -> repr -> repr

-- Interpretation 1: evaluation
newtype Eval = Eval { runEval :: Int }
instance ExprSym Eval where
  lit n   = Eval n
  add x y = Eval (runEval x + runEval y)
  mul x y = Eval (runEval x * runEval y)

-- Interpretation 2: pretty-printing
newtype Pretty = Pretty { runPretty :: String }
instance ExprSym Pretty where
  lit n   = Pretty (show n)
  add x y = Pretty ("(" ++ runPretty x ++ " + " ++ runPretty y ++ ")")
  mul x y = Pretty ("(" ++ runPretty x ++ " * " ++ runPretty y ++ ")")

Extensible in both dimensions: new constructs (new type classes, optionally with ExprSym as a superclass) and new interpretations (new instances)
No intermediate data structure; terms are built directly in the target representation
With a type-indexed representation (repr a instead of repr), the host type checker rejects ill-typed DSL programs
Pioneered by Carette, Kiselyov, and Shan
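
The sketch below shows both kinds of extension on the example above. It repeats the class and the two instances so that it is self-contained, then adds a negation construct in a separate class. NegSym, neg, and term are hypothetical names introduced here.

```haskell
class ExprSym repr where
  lit :: Int -> repr
  add :: repr -> repr -> repr
  mul :: repr -> repr -> repr

newtype Eval = Eval { runEval :: Int }
instance ExprSym Eval where
  lit n   = Eval n
  add x y = Eval (runEval x + runEval y)
  mul x y = Eval (runEval x * runEval y)

newtype Pretty = Pretty { runPretty :: String }
instance ExprSym Pretty where
  lit n   = Pretty (show n)
  add x y = Pretty ("(" ++ runPretty x ++ " + " ++ runPretty y ++ ")")
  mul x y = Pretty ("(" ++ runPretty x ++ " * " ++ runPretty y ++ ")")

-- New construct: lives in its own class, so ExprSym and its instances
-- are left untouched (NegSym and neg are hypothetical names).
class NegSym repr where
  neg :: repr -> repr

instance NegSym Eval where
  neg x = Eval (negate (runEval x))

instance NegSym Pretty where
  neg x = Pretty ("-" ++ runPretty x)

-- A term is polymorphic in its representation; the constraints list
-- exactly the constructs it uses.
term :: (ExprSym repr, NegSym repr) => repr
term = add (lit 3) (neg (mul (lit 2) (lit 5)))
```

Choosing the result type selects the interpretation: runEval term gives -7 and runPretty term gives "(3 + -(2 * 5))". Terms that do not use neg keep the smaller ExprSym constraint and still work with interpreters that have no NegSym instance.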

Free Monads

Build a DSL as a free monad over a functor describing the operations.

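{-# LANGUAGE DeriveFunctor #-}
import Control.Monad.Free (Free (..), liftF)  -- from the free package

data TeletypeF next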
  = PutStr String next
  | GetLine (String -> next)
  deriving Functor

type Teletype = Free TeletypeF

putStr' :: String -> Teletype ()
putStr' s = liftF (PutStr s ())

getLine' :: Teletype String
getLine' = liftF (GetLine id)

-- Program as data:
greet :: Teletype ()
greet = do
  putStr' "Name? "
  name <- getLine'
  putStr' ("Hello, " ++ name)

-- Interpret to IO, to test mock, etc.
runIO :: Teletype a -> IO a
runIO (Pure a) = return a
runIO (Free (PutStr s next)) = putStrLn s >> runIO next
runIO (Free (GetLine k))     = getLine >>= runIO . k

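Programs are data: can be inspected, optimized, and interpreted in multiple ways. Inspection stops at each continuation (such as the function inside GetLine), which stays opaque until it is applied
Performance concern: left-nested uses of >>= re-traverse the program built so far, so construction can become quadratic. Solutions: the Codensity transform, freer monads, or effect systems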
Freer monads (extensible effects): avoid the Functor requirement, use open unions of effect types
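
The "test mock" interpretation mentioned in the code comment above can be sketched as a pure function that feeds scripted input and collects output. To stay free of dependencies, this version defines a minimal Free type locally instead of importing it, and writes the Functor instance by hand. runPure is a hypothetical name, and reading "" when the input runs out is an arbitrary choice for this sketch.

```haskell
-- Minimal free monad, defined locally so the sketch has no dependencies.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

liftF :: Functor f => f a -> Free f a
liftF = Free . fmap Pure

data TeletypeF next
  = PutStr String next
  | GetLine (String -> next)

instance Functor TeletypeF where
  fmap g (PutStr s next) = PutStr s (g next)
  fmap g (GetLine k)     = GetLine (g . k)

type Teletype = Free TeletypeF

putStr' :: String -> Teletype ()
putStr' s = liftF (PutStr s ())

getLine' :: Teletype String
getLine' = liftF (GetLine id)

greet :: Teletype ()
greet = do
  putStr' "Name? "
  name <- getLine'
  putStr' ("Hello, " ++ name)

-- Pure interpreter (hypothetical): consumes scripted input lines and
-- returns the result together with everything the program printed.
runPure :: [String] -> Teletype a -> (a, [String])
runPure _        (Pure a)               = (a, [])
runPure ins      (Free (PutStr s next)) =
  let (a, out) = runPure ins next in (a, s : out)
runPure (i : is) (Free (GetLine k))     = runPure is (k i)
runPure []       (Free (GetLine k))     = runPure [] (k "")  -- input exhausted
```

With this interpreter, runPure ["Ada"] greet returns ((), ["Name? ", "Hello, Ada"]), so the same greet program can be checked in a unit test without performing any IO.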

Language Workbenches

Integrated environments for designing, implementing, and evolving DSLs.

Key Features

Grammar/syntax definition with projectional or textual editing
Automatic generation of parsers, editors, type checkers
IDE support (syntax highlighting, completion, refactoring) derived from language definition

Examples

JetBrains MPS: projectional editor; AST is the primary representation, not text. Supports language composition
Xtext: generates Eclipse-based IDE from EBNF-like grammar definitions; integrates with EMF
Spoofax: based on SDF (syntax definition formalism) and Stratego (term rewriting); supports declarative specification of scoping and typing
Racket: language-oriented programming; #lang mechanism for defining new languages with full macro support

Projectional Editing

Users edit the AST directly through a projection (visual representation)
No parsing needed; eliminates syntactic ambiguity
Supports non-textual notations (tables, diagrams, math)
Tradeoff: unfamiliar editing experience; version control on structured data is harder

Macro-Based DSLs

Use the host language's macro system to extend syntax.

Scheme/Racket Macros

(define-syntax-rule (when condition body ...)
  (if condition (begin body ...) (void)))

;; Pattern-matching DSL (sketch: matches? is an undefined helper, assumed to return bindings or #f)
(define-syntax match
  (syntax-rules ()
    [(match val [pat body] ...)
     (cond [(matches? val 'pat) => (lambda (bindings) body)] ...)]))

Hygienic macros prevent accidental variable capture
syntax-parse provides pattern matching on syntax with error reporting
Racket's #lang allows full language redefinition

Rust Procedural Macros

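// Derive macro for serialization (compile-time code generation)
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Point { x: f64, y: f64 }

// Macro for SQL queries, checked against the database schema at compile time.
// Fragment from inside an async function; `pool` and `user_id` are assumed to be in scope.
let results = sqlx::query!("SELECT * FROM users WHERE id = ?", user_id)
    .fetch_all(&pool)
    .await?;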

Operate on token streams; can generate arbitrary code
Run at compile time on syntax only, before type checking; they have no access to type information
Used extensively: serde, sqlx, rocket, clap

Lisp Macros vs. Template-Based Macros

Lisp: code as data (homoiconicity); macros are arbitrary code transformations
C/C++ preprocessor macros: token substitution before compilation; no hygiene, no awareness of syntactic structure
Template Haskell: typed, staged metaprogramming; AST quotation and splicing

Parser Combinators for DSLs

Build parsers by composing small parsing functions.

Core Combinators

-- Monadic parser type (a newtype, so that it can be given a Monad instance)
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

-- Primitives
item :: Parser Char             -- consume one character
sat :: (Char -> Bool) -> Parser Char  -- conditional consume

-- Combinators
(<|>) :: Parser a -> Parser a -> Parser a    -- choice
(>>=) :: Parser a -> (a -> Parser b) -> Parser b  -- sequence
many  :: Parser a -> Parser [a]              -- repetition
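
The signatures above can be filled in roughly as follows. This is a minimal sketch in the list-of-successes style, not how Parsec or Megaparsec are implemented. It wraps the function in a newtype so that instances can be declared, takes many and some from the Alternative class, and makes choice left-biased (the second parser runs only if the first returns no results). number is a hypothetical example parser.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> [(f a, rest) | (a, rest) <- p s]

instance Applicative Parser where
  pure a = Parser $ \s -> [(a, s)]
  Parser pf <*> Parser pa =
    Parser $ \s -> [(f a, s2) | (f, s1) <- pf s, (a, s2) <- pa s1]

-- Sequencing: run the first parser, feed each result to the continuation.
instance Monad Parser where
  Parser p >>= k =
    Parser $ \s -> concat [runParser (k a) rest | (a, rest) <- p s]

-- Left-biased choice: try the second parser only if the first yields nothing.
instance Alternative Parser where
  empty = Parser (const [])
  Parser p <|> Parser q = Parser $ \s -> case p s of
    [] -> q s
    rs -> rs

-- Consume one character; fail on empty input.
item :: Parser Char
item = Parser $ \s -> case s of
  []       -> []
  (c : cs) -> [(c, cs)]

-- Consume one character that satisfies the predicate.
sat :: (Char -> Bool) -> Parser Char
sat p = do
  c <- item
  if p c then pure c else empty

-- Hypothetical example: one or more digits read as a natural number.
number :: Parser Int
number = read <$> some (sat isDigit)
```

For example, runParser number "42abc" returns [(42, "abc")], and runParser number "abc" returns [] because some requires at least one digit.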

Libraries

Parsec/Megaparsec (Haskell): monadic, excellent error messages, widely used
FParsec (F#): Parsec port for .NET
Nom (Rust): zero-copy, streaming parser combinators
pyparsing (Python): operator overloading for combinator syntax

Advantages for DSLs

DSL grammar is expressed in the host language: no separate grammar file
Composable: combine parsers from different DSLs
First-class parsers: can abstract, parameterize, and reuse
Incremental development: test parsers interactively

Limitations

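Left recursion makes top-down combinator parsers loop forever; rewrite the grammar (for example with chainl1-style combinators) or use a packrat parser extended to support left recursion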
Error messages need careful engineering (committed alternatives, labels)
Performance: backtracking can be expensive without cuts/commits
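
The usual workaround for left recursion can be sketched with ReadP, a parser combinator module that ships with GHC's base library. The left-recursive rule expr ::= expr '+' term is replaced by a chainl1 loop, which still builds a left-nested tree. The Expr type mirrors the deep embedding earlier in the document; token, symbol, and parseExpr are hypothetical helper names.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

data Expr = Lit Int | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

-- Run a parser, then skip any whitespace that follows it.
token :: ReadP a -> ReadP a
token p = p <* skipSpaces

symbol :: Char -> ReadP Char
symbol = token . char

-- expr   ::= term ('+' term)*       instead of  expr ::= expr '+' term
-- term   ::= factor ('*' factor)*
-- factor ::= number | '(' expr ')'
expr, term, factor :: ReadP Expr
expr   = term   `chainl1` (Add <$ symbol '+')
term   = factor `chainl1` (Mul <$ symbol '*')
factor = token (Lit . read <$> munch1 isDigit)
     +++ (symbol '(' *> expr <* symbol ')')

-- Accept only a parse that consumes the whole input.
parseExpr :: String -> Maybe Expr
parseExpr s = case readP_to_S (skipSpaces *> expr <* eof) s of
  [(e, "")] -> Just e
  _         -> Nothing
```

With these definitions, parseExpr "1 + 2 + 3" returns Just (Add (Add (Lit 1) (Lit 2)) (Lit 3)), so addition is left-associative even though no rule is left-recursive. parseExpr "1 + 2 * 3" returns Just (Add (Lit 1) (Mul (Lit 2) (Lit 3))), and input with an unknown operator such as "1 - 2" returns Nothing.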

DSL Design Guidelines

Start with the domain model: understand the concepts, relationships, and operations before designing syntax
Prefer internal DSLs when: tight integration with the host language is needed, rapid development is important, or the domain is well served by host syntax
Prefer external DSLs when: users are non-programmers, domain-specific notation is critical, or there are strong optimization opportunities
Keep it small: a DSL should do one thing well. Avoid feature creep toward a general-purpose language
Composability: design DSLs to compose with each other and with the host language
Formal semantics: even a small DSL benefits from precise semantic specification

Summary

Technique	Extensibility	Performance	Analysis
Shallow embedding	Easy to add constructs	Direct execution	None
Deep embedding	Easy to add interpretations	Overhead from AST	Full
Tagless final	Both dimensions	Direct (no AST)	Limited
Free monads	Both + effects	Overhead (fixable)	Partial (continuations are opaque)
Macros	Syntax extension	Compile-time	Varies
External DSL	Full control	Custom optimizations	Full

The choice depends on the domain requirements, the host language capabilities, and the desired balance between flexibility and implementation effort.