Domain-Specific Languages

Overview

A domain-specific language (DSL) is a programming language tailored to a particular application domain. DSLs trade generality for expressiveness, safety, and optimization opportunities within their domain.

Examples: SQL (databases), HTML/CSS (web layout), regex (pattern matching), Make (build systems), TikZ (diagrams), Verilog (hardware description).


Internal vs. External DSLs

External DSLs

  • Standalone language with its own syntax, parser, and tooling
  • Full control over syntax and error messages
  • Requires building a complete language pipeline (lexer, parser, type checker, evaluator/compiler)
  • Examples: SQL, GraphQL, Protocol Buffers, Terraform HCL

Internal (Embedded) DSLs

  • Hosted within a general-purpose language (the host)
  • Reuse the host language's parser, type system, and tooling
  • Constrained by the host language's syntax
  • Examples: Rails ActiveRecord (Ruby), ScalaTest (Scala), Parsec (Haskell), React JSX (JavaScript, with transpilation)

Tradeoffs

| Aspect | External | Internal |
|---|---|---|
| Syntax freedom | Full control | Limited by host |
| Tooling cost | High (build from scratch) | Low (reuse host) |
| Error messages | Custom, domain-specific | Often confusing (host errors) |
| Learning curve | New language to learn | Leverages host knowledge |
| Optimization | Domain-specific transforms | Limited to host optimizations |
| Interop | Requires FFI or codegen | Native (same language) |


Embedded DSL Techniques

Shallow Embedding

DSL constructs are directly interpreted as host language values. Each construct immediately computes its result.

-- Shallow embedding of a simple expression language
type Expr = Int

lit :: Int -> Expr
lit n = n

add :: Expr -> Expr -> Expr
add x y = x + y

mul :: Expr -> Expr -> Expr
mul x y = x * y

-- Usage: add (lit 3) (mul (lit 2) (lit 5))  =>  13
  • Simple and direct
  • Only one interpretation: cannot inspect, transform, or optimize the DSL program
  • Adding new constructs is easy (just define new functions); adding a new interpretation means redefining every construct over a different semantic domain
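
To make the single-interpretation limitation concrete, here is a minimal sketch of what a pretty-printing interpretation looks like under shallow embedding. The names `ExprP`, `litP`, `addP`, and `mulP` are hypothetical and not part of the original example; the point is only that each construct has to be written again for the new semantic domain.

```haskell
-- A second shallow embedding of the same three constructs, this time
-- with String (a rendering) as the semantic domain instead of Int.
-- All names ending in P are illustrative assumptions.
type ExprP = String

litP :: Int -> ExprP
litP = show

addP :: ExprP -> ExprP -> ExprP
addP x y = "(" ++ x ++ " + " ++ y ++ ")"

mulP :: ExprP -> ExprP -> ExprP
mulP x y = "(" ++ x ++ " * " ++ y ++ ")"

main :: IO ()
main = putStrLn (addP (litP 3) (mulP (litP 2) (litP 5)))
-- prints "(3 + (2 * 5))"
```

A program written against `lit`, `add`, and `mul` cannot be reused here: it has to be rewritten with the `P` variants, which is the cost the deep and tagless final encodings avoid.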

Deep Embedding

DSL constructs build an AST (data structure) representing the program. Interpretation is separate.

data Expr = Lit Int | Add Expr Expr | Mul Expr Expr

eval :: Expr -> Int
eval (Lit n)   = n
eval (Add x y) = eval x + eval y
eval (Mul x y) = eval x * eval y

pretty :: Expr -> String
pretty (Lit n)   = show n
pretty (Add x y) = "(" ++ pretty x ++ " + " ++ pretty y ++ ")"
pretty (Mul x y) = "(" ++ pretty x ++ " * " ++ pretty y ++ ")"
  • Multiple interpretations: evaluation, pretty-printing, optimization, compilation
  • Adding new constructors requires modifying all interpreters (the expression problem)
  • Enables analysis and transformation of DSL programs
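
Because a deep embedding is ordinary data, a transformation pass is just another function over the AST. The sketch below is one possible simplifier, not a canonical one. It assumes an extra `Var` constructor (not in the original `Expr`) so that some terms remain symbolic, and the function name `simplify` is likewise an illustrative choice.

```haskell
-- Deep embedding extended with a hypothetical Var constructor so that
-- simplification does not collapse every term to a literal.
data Expr
  = Lit Int
  | Var String
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- Bottom-up simplifier: folds constants and applies the identities
-- x + 0 = x, x * 1 = x, and x * 0 = 0.
simplify :: Expr -> Expr
simplify (Add x y) = case (simplify x, simplify y) of
  (Lit 0, y')    -> y'
  (x', Lit 0)    -> x'
  (Lit a, Lit b) -> Lit (a + b)
  (x', y')       -> Add x' y'
simplify (Mul x y) = case (simplify x, simplify y) of
  (Lit 0, _)     -> Lit 0
  (_, Lit 0)     -> Lit 0
  (Lit 1, y')    -> y'
  (x', Lit 1)    -> x'
  (Lit a, Lit b) -> Lit (a * b)
  (x', y')       -> Mul x' y'
simplify e = e

main :: IO ()
main = print (simplify (Add (Mul (Lit 1) (Var "x")) (Mul (Lit 2) (Lit 0))))
-- prints: Var "x"
```

Note that adding `Var` also illustrates the cost mentioned above: every existing interpreter (`eval`, `pretty`) would need a new case.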

Tagless Final (Finally Tagless)

Resolves the expression problem by representing DSL terms via type class methods rather than data types.

class ExprSym repr where
  lit :: Int -> repr
  add :: repr -> repr -> repr
  mul :: repr -> repr -> repr

-- Interpretation 1: evaluation
newtype Eval = Eval { runEval :: Int }
instance ExprSym Eval where
  lit n   = Eval n
  add x y = Eval (runEval x + runEval y)
  mul x y = Eval (runEval x * runEval y)

-- Interpretation 2: pretty-printing
newtype Pretty = Pretty { runPretty :: String }
instance ExprSym Pretty where
  lit n   = Pretty (show n)
  add x y = Pretty ("(" ++ runPretty x ++ " + " ++ runPretty y ++ ")")
  mul x y = Pretty ("(" ++ runPretty x ++ " * " ++ runPretty y ++ ")")
  • Extensible in both dimensions: new operations (additional type classes alongside the original one) and new interpretations (new instances)
  • No intermediate data structure; terms are built directly in the target representation
  • Type safety ensures only well-formed DSL programs are expressible
  • Pioneered by Carette, Kiselyov, and Shan
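
The following sketch shows both kinds of extension in one file. A single polymorphic term is run under two interpretations, and a negation operation is added through a separate class without touching `ExprSym`. The class `NegSym`, its method `neg`, and the term name `term` are assumptions made for illustration.

```haskell
class ExprSym repr where
  lit :: Int -> repr
  add :: repr -> repr -> repr
  mul :: repr -> repr -> repr

newtype Eval = Eval { runEval :: Int }
instance ExprSym Eval where
  lit n   = Eval n
  add x y = Eval (runEval x + runEval y)
  mul x y = Eval (runEval x * runEval y)

newtype Pretty = Pretty { runPretty :: String }
instance ExprSym Pretty where
  lit n   = Pretty (show n)
  add x y = Pretty ("(" ++ runPretty x ++ " + " ++ runPretty y ++ ")")
  mul x y = Pretty ("(" ++ runPretty x ++ " * " ++ runPretty y ++ ")")

-- New operation, added in its own class (hypothetical name NegSym).
-- Existing instances and existing terms are left unchanged.
class NegSym repr where
  neg :: repr -> repr

instance NegSym Eval where
  neg x = Eval (negate (runEval x))

instance NegSym Pretty where
  neg x = Pretty ("-" ++ runPretty x)

-- One term, polymorphic in its interpretation.
term :: (ExprSym repr, NegSym repr) => repr
term = add (lit 3) (neg (mul (lit 2) (lit 5)))

main :: IO ()
main = do
  print (runEval term)        -- prints -7
  putStrLn (runPretty term)   -- prints (3 + -(2 * 5))
```

The interpretation is selected purely by the type at which `term` is used, so no AST is ever built.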

Free Monads

Build a DSL as a free monad over a functor describing the operations.

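{-# LANGUAGE DeriveFunctor #-}
import Control.Monad.Free (Free (..), liftF)  -- from the free package

data TeletypeF next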
  = PutStr String next
  | GetLine (String -> next)
  deriving Functor

type Teletype = Free TeletypeF

putStr' :: String -> Teletype ()
putStr' s = liftF (PutStr s ())

getLine' :: Teletype String
getLine' = liftF (GetLine id)

-- Program as data:
greet :: Teletype ()
greet = do
  putStr' "Name? "
  name <- getLine'
  putStr' ("Hello, " ++ name)

-- One interpreter: run in IO (a pure mock for tests would be another)
runIO :: Teletype a -> IO a
runIO (Pure a) = return a
runIO (Free (PutStr s next)) = putStr s >> runIO next   -- PutStr adds no newline
runIO (Free (GetLine k))     = getLine >>= runIO . k
  • Programs are data: can be inspected, optimized, and interpreted in multiple ways
  • Performance concern: left-nested >>= chains take quadratic time with the naive representation. Remedies: the Codensity transform, freer monads, or a dedicated effect system
  • Freer monads (extensible effects): avoid the Functor requirement, use open unions of effect types
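
As a sketch of the "multiple interpreters" point, the same `greet` program can be run purely against scripted input, which is convenient for tests. To keep the example dependency-free, it hand-rolls a minimal `Free` type and `liftF` that mirror the ones in the `free` package's `Control.Monad.Free`. The interpreter name `runPure` and its behaviour on exhausted input (it supplies an empty string) are assumptions.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Minimal stand-in for Control.Monad.Free, defined locally so the
-- sketch needs only base.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

liftF :: Functor f => f a -> Free f a
liftF = Free . fmap Pure

data TeletypeF next
  = PutStr String next
  | GetLine (String -> next)
  deriving Functor

type Teletype = Free TeletypeF

putStr' :: String -> Teletype ()
putStr' s = liftF (PutStr s ())

getLine' :: Teletype String
getLine' = liftF (GetLine id)

greet :: Teletype ()
greet = do
  putStr' "Name? "
  name <- getLine'
  putStr' ("Hello, " ++ name)

-- Pure interpreter (hypothetical): consumes scripted input lines and
-- collects every PutStr payload in order. Exhausted input reads as "".
runPure :: [String] -> Teletype a -> ([String], a)
runPure _   (Pure a)               = ([], a)
runPure ins (Free (PutStr s next)) =
  let (out, a) = runPure ins next in (s : out, a)
runPure (i : ins) (Free (GetLine k)) = runPure ins (k i)
runPure []        (Free (GetLine k)) = runPure [] (k "")

main :: IO ()
main = print (fst (runPure ["Ada"] greet))
-- prints ["Name? ","Hello, Ada"]
```

Nothing in `greet` changed between the IO and pure runs; only the interpreter did.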

Language Workbenches

Integrated environments for designing, implementing, and evolving DSLs.

Key Features

  • Grammar/syntax definition with projectional or textual editing
  • Automatic generation of parsers, editors, type checkers
  • IDE support (syntax highlighting, completion, refactoring) derived from language definition

Examples

  • JetBrains MPS: projectional editor; AST is the primary representation, not text. Supports language composition
  • Xtext: generates Eclipse-based IDE from EBNF-like grammar definitions; integrates with EMF
  • Spoofax: based on SDF (syntax definition formalism) and Stratego (term rewriting); supports declarative specification of scoping and typing
  • Racket: language-oriented programming; #lang mechanism for defining new languages with full macro support

Projectional Editing

  • Users edit the AST directly through a projection (visual representation)
  • No parsing needed; eliminates syntactic ambiguity
  • Supports non-textual notations (tables, diagrams, math)
  • Tradeoff: unfamiliar editing experience; version control on structured data is harder

Macro-Based DSLs

Use the host language's macro system to extend syntax.

Scheme/Racket Macros

(define-syntax-rule (when condition body ...)
  (if condition (begin body ...) (void)))

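;; Pattern-matching DSL (sketch only: matches? is an assumed helper that
;; returns a bindings value or #f; body cannot yet refer to pattern variables)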
(define-syntax match
  (syntax-rules ()
    [(match val [pat body] ...)
     (cond [(matches? val 'pat) => (lambda (bindings) body)] ...)]))
  • Hygienic macros prevent accidental variable capture
  • syntax-parse provides pattern matching on syntax with error reporting
  • Racket's #lang allows full language redefinition

Rust Procedural Macros

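// Derive macros generate trait implementations at compile time (serde)
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Point { x: f64, y: f64 }

// sqlx's query! macro checks the SQL against the database schema at compile time
// (fragment: belongs inside an async fn that returns a Result)
let results = sqlx::query!("SELECT * FROM users WHERE id = ?", user_id)
    .fetch_all(&pool)
    .await?;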
  • Operate on token streams; can generate arbitrary code
  • Run at compile time, before type checking: a procedural macro sees only tokens, not resolved types, although the code it generates is type-checked as usual
  • Used extensively: serde, sqlx, rocket, clap

Lisp Macros vs. Template-Based Macros

  • Lisp: code as data (homoiconicity); macros are arbitrary code transformations
  • C/C++ macros: textual substitution; no hygiene, no structure awareness
  • Template Haskell: typed, staged metaprogramming; AST quotation and splicing

Parser Combinators for DSLs

Build parsers by composing small parsing functions.

Core Combinators

-- Monadic parser type (a real implementation wraps this in a newtype
-- so that it can be given a Monad instance)
type Parser a = String -> [(a, String)]

-- Primitives
item :: Parser Char             -- consume one character
sat :: (Char -> Bool) -> Parser Char  -- conditional consume

-- Combinators
(<|>) :: Parser a -> Parser a -> Parser a    -- choice
(>>=) :: Parser a -> (a -> Parser b) -> Parser b  -- sequence
many  :: Parser a -> Parser [a]              -- repetition
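
The signatures above can be turned into a small runnable library. The following is a minimal sketch, not how Parsec or any other library is implemented. It assumes a `newtype` wrapper, a left-biased choice operator (the second alternative runs only if the first yields no result), and the helper names `many1` and `number`.

```haskell
import Data.Char (isDigit)

newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> [(f a, rest) | (a, rest) <- p s]

instance Applicative Parser where
  pure a = Parser $ \s -> [(a, s)]
  Parser pf <*> Parser pa = Parser $ \s ->
    [(f a, s2) | (f, s1) <- pf s, (a, s2) <- pa s1]

instance Monad Parser where
  Parser p >>= k = Parser $ \s ->
    concat [runParser (k a) rest | (a, rest) <- p s]

-- Consume one character; fail on empty input.
item :: Parser Char
item = Parser $ \s -> case s of
  []       -> []
  (c : cs) -> [(c, cs)]

-- Consume one character only if it satisfies the predicate.
sat :: (Char -> Bool) -> Parser Char
sat ok = do
  c <- item
  if ok c then pure c else Parser (const [])

-- Left-biased choice (an assumption; a fully nondeterministic
-- variant would concatenate both result lists instead).
infixl 3 <|>
(<|>) :: Parser a -> Parser a -> Parser a
Parser p <|> Parser q = Parser $ \s -> case p s of
  [] -> q s
  rs -> rs

-- Zero or more repetitions.
many :: Parser a -> Parser [a]
many p = many1 p <|> pure []

-- One or more repetitions.
many1 :: Parser a -> Parser [a]
many1 p = do
  x  <- p
  xs <- many p
  pure (x : xs)

number :: Parser Int
number = read <$> many1 (sat isDigit)

main :: IO ()
main = print (runParser number "42abc")
-- prints [(42,"abc")]
```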

Libraries

  • Parsec/Megaparsec (Haskell): monadic, excellent error messages, widely used
  • FParsec (F#): Parsec port for .NET
  • Nom (Rust): zero-copy, streaming parser combinators
  • pyparsing (Python): operator overloading for combinator syntax

Advantages for DSLs

  • DSL grammar is expressed in the host language: no separate grammar file
  • Composable: combine parsers from different DSLs
  • First-class parsers: can abstract, parameterize, and reuse
  • Incremental development: test parsers interactively

Limitations

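  • Left recursion makes a naive combinator parser loop forever; the grammar must be rewritten (for example with a chainl1-style combinator), or a packrat parser extended to support left recursion must be used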
  • Error messages need careful engineering (committed alternatives, labels)
  • Performance: backtracking can be expensive without cuts/commits
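
As an illustration of the left-recursion workaround, the sketch below parses arithmetic into the deep-embedded `Expr` type from earlier using `Text.ParserCombinators.ReadP`, which ships with GHC's base library. ReadP is used here only because it needs no extra packages; it is not one of the libraries listed above. The left-recursive rule `expr ::= expr '+' term` is replaced by `chainl1`, which still builds a left-associated tree. The helper names `lexeme`, `sym`, and `parseExpr` are assumptions.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

data Expr = Lit Int | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

-- Run a parser, then skip any trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

sym :: Char -> ReadP Char
sym = lexeme . char

-- expr ::= term ('+' term)*     (instead of expr ::= expr '+' term)
expr :: ReadP Expr
expr = term `chainl1` (Add <$ sym '+')

-- term ::= factor ('*' factor)*
term :: ReadP Expr
term = factor `chainl1` (Mul <$ sym '*')

-- factor ::= number | '(' expr ')'
factor :: ReadP Expr
factor = literal <++ between (sym '(') (sym ')') expr
  where
    literal = Lit . read <$> lexeme (munch1 isDigit)

-- Succeeds only when the whole input is consumed by exactly one parse.
parseExpr :: String -> Maybe Expr
parseExpr s = case readP_to_S (skipSpaces *> expr <* eof) s of
  [(e, "")] -> Just e
  _         -> Nothing

main :: IO ()
main = print (parseExpr "1 + 2 * 3")
-- prints Just (Add (Lit 1) (Mul (Lit 2) (Lit 3)))
```

Splitting the grammar into `expr`, `term`, and `factor` also encodes precedence, so `*` binds tighter than `+` without any separate precedence table.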

DSL Design Guidelines

  1. Start with the domain model: understand the concepts, relationships, and operations before designing syntax
  2. Prefer internal DSLs when tight integration with the host language is needed, rapid development is important, or the domain is well served by host syntax
  3. Prefer external DSLs when the users are non-programmers, domain-specific notation is critical, or domain-specific optimization matters
  4. Keep it small: a DSL should do one thing well. Avoid feature creep toward a general-purpose language
  5. Composability: design DSLs to compose with each other and with the host language
  6. Formal semantics: even a small DSL benefits from precise semantic specification

Summary

| Technique | Extensibility | Performance | Analysis |
|---|---|---|---|
| Shallow embedding | Easy to add ops | Direct execution | None |
| Deep embedding | Easy to add interps | Overhead from AST | Full |
| Tagless final | Both dimensions | Direct (no AST) | Limited |
| Free monads | Both + effects | Overhead (fixable) | Full |
| Macros | Syntax extension | Compile-time | Varies |
| External DSL | Full control | Custom optimizations | Full |

The choice depends on the domain requirements, the host language capabilities, and the desired balance between flexibility and implementation effort.