Domain-Specific Languages
Overview
A domain-specific language (DSL) is a programming language tailored to a particular application domain. DSLs trade generality for expressiveness, safety, and optimization opportunities within their domain.
Examples: SQL (databases), HTML/CSS (web layout), regex (pattern matching), Make (build systems), TikZ (diagrams), Verilog (hardware description).
Internal vs. External DSLs
External DSLs
- Standalone language with its own syntax, parser, and tooling
- Full control over syntax and error messages
- Requires building a complete language pipeline (lexer, parser, type checker, evaluator/compiler)
- Examples: SQL, GraphQL, Protocol Buffers, Terraform HCL
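To make the cost of the pipeline concrete, here is a minimal sketch of an external arithmetic DSL with a lexer, a recursive-descent parser, and an evaluator (no type checker). The token names, the grammar, and the run function are assumptions made for illustration and are not taken from any of the languages listed above.

```haskell
import Data.Char (isDigit, isSpace)

-- Stage 1: lexer. Turns raw text into tokens.
data Token = TNum Int | TPlus | TStar | TLParen | TRParen
  deriving (Eq, Show)

tokenize :: String -> Either String [Token]
tokenize [] = Right []
tokenize (c : cs)
  | isSpace c = tokenize cs
  | isDigit c =
      let (ds, rest) = span isDigit (c : cs)
       in (TNum (read ds) :) <$> tokenize rest
  | c == '+' = (TPlus :) <$> tokenize cs
  | c == '*' = (TStar :) <$> tokenize cs
  | c == '(' = (TLParen :) <$> tokenize cs
  | c == ')' = (TRParen :) <$> tokenize cs
  | otherwise = Left ("unexpected character: " ++ [c])

-- Stage 2: parser. Recursive descent over the grammar
--   expr   ::= term ('+' term)*
--   term   ::= factor ('*' factor)*
--   factor ::= number | '(' expr ')'
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

parseExpr :: [Token] -> Either String (Expr, [Token])
parseExpr ts = do
  (t, rest) <- parseTerm ts
  go t rest
  where
    go acc (TPlus : rest) = do
      (t, rest') <- parseTerm rest
      go (Add acc t) rest'
    go acc rest = Right (acc, rest)

parseTerm :: [Token] -> Either String (Expr, [Token])
parseTerm ts = do
  (f, rest) <- parseFactor ts
  go f rest
  where
    go acc (TStar : rest) = do
      (f, rest') <- parseFactor rest
      go (Mul acc f) rest'
    go acc rest = Right (acc, rest)

parseFactor :: [Token] -> Either String (Expr, [Token])
parseFactor (TNum n : rest) = Right (Lit n, rest)
parseFactor (TLParen : rest) = do
  (e, rest') <- parseExpr rest
  case rest' of
    TRParen : rest'' -> Right (e, rest'')
    _ -> Left "expected ')'"
parseFactor ts = Left ("unexpected input: " ++ show (take 1 ts))

-- Stage 3: evaluator.
eval :: Expr -> Int
eval (Lit n) = n
eval (Add x y) = eval x + eval y
eval (Mul x y) = eval x * eval y

-- The whole pipeline: text -> tokens -> AST -> value.
run :: String -> Either String Int
run src = do
  toks <- tokenize src
  (e, rest) <- parseExpr toks
  if null rest then Right (eval e) else Left "trailing input"
```

In GHCi, run "3 + 2 * 5" gives Right 13 and run "(3 + 2) * 5" gives Right 25. Even this toy needs three hand-written stages and its own error messages, which is the cost the list above describes.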
Internal (Embedded) DSLs
- Hosted within a general-purpose language (the host)
- Reuse the host language's parser, type system, and tooling
- Constrained by the host language's syntax
- Examples: Rails ActiveRecord (Ruby), ScalaTest (Scala), Parsec (Haskell), React JSX (JavaScript, with transpilation)
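As an illustration of reusing the host's parser and type system, the sketch below gives a small arithmetic AST a Num instance, so ordinary Haskell literals and operators become DSL syntax. The Expr type here is a hypothetical example, and the abs and signum stubs show the other side of the bargain: the host's class forces methods the domain does not need.

```haskell
-- Hypothetical AST for a small arithmetic DSL.
data Expr = Lit Integer | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

-- Reuse the host's numeric literals and operators as DSL syntax.
instance Num Expr where
  fromInteger = Lit
  (+) = Add
  (*) = Mul
  negate e = Mul (Lit (-1)) e
  -- Forced on us by the host's Num class; not meaningful in this DSL.
  abs = error "abs: not part of this DSL"
  signum = error "signum: not part of this DSL"

-- Parsed and type-checked by GHC, yet it denotes a DSL term:
-- Add (Lit 3) (Mul (Lit 2) (Lit 5))
example :: Expr
example = 3 + 2 * 5
```

Operator precedence comes from the host for free, but so does its fixed syntax: the DSL cannot introduce notation that Haskell's grammar does not already accept.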
Tradeoffs
| Aspect | External | Internal |
|---|---|---|
| Syntax freedom | Full control | Limited by host |
| Tooling cost | High (build from scratch) | Low (reuse host) |
| Error messages | Custom, domain-specific | Often confusing (host errors) |
| Learning curve | New language to learn | Leverages host knowledge |
| Optimization | Domain-specific transforms | Limited to host optimizations |
| Interop | Requires FFI or codegen | Native (same language) |
Embedded DSL Techniques
Shallow Embedding
DSL constructs are directly interpreted as host language values. Each construct immediately computes its result.
-- Shallow embedding of a simple expression language
type Expr = Int
lit :: Int -> Expr
lit n = n
add :: Expr -> Expr -> Expr
add x y = x + y
mul :: Expr -> Expr -> Expr
mul x y = x * y
-- Usage: add (lit 3) (mul (lit 2) (lit 5)) => 13
- Simple and direct
- Only one interpretation: cannot inspect, transform, or optimize the DSL program
- Adding new constructs is easy (new functions); adding a new interpretation means changing the semantic type (here Int) and rewriting every construct function
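The last point can be seen in a short sketch. Suppose, as an assumption for illustration, that we also want a textual rendering of each expression. The semantic type has to change from Int to a pair, and lit, add, and mul all have to be rewritten, whereas a new construct such as the hypothetical neg is one extra function.

```haskell
-- The semantic domain now carries a value and its rendering.
type Expr = (Int, String)

lit :: Int -> Expr
lit n = (n, show n)

add :: Expr -> Expr -> Expr
add (x, px) (y, py) = (x + y, "(" ++ px ++ " + " ++ py ++ ")")

mul :: Expr -> Expr -> Expr
mul (x, px) (y, py) = (x * y, "(" ++ px ++ " * " ++ py ++ ")")

-- A new construct, by contrast, is just one more function.
neg :: Expr -> Expr
neg (x, px) = (negate x, "(-" ++ px ++ ")")

-- Evaluates to (13, "(3 + (2 * 5))")
example :: Expr
example = add (lit 3) (mul (lit 2) (lit 5))
```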
Deep Embedding
DSL constructs build an AST (data structure) representing the program. Interpretation is separate.
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr
eval :: Expr -> Int
eval (Lit n) = n
eval (Add x y) = eval x + eval y
eval (Mul x y) = eval x * eval y
pretty :: Expr -> String
pretty (Lit n) = show n
pretty (Add x y) = "(" ++ pretty x ++ " + " ++ pretty y ++ ")"
pretty (Mul x y) = "(" ++ pretty x ++ " * " ++ pretty y ++ ")"
- Multiple interpretations: evaluation, pretty-printing, optimization, compilation
- Adding new constructors requires modifying all interpreters (the expression problem)
- Enables analysis and transformation of DSL programs
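Because the program is a data structure, a transformation is just another function over it. The following sketch adds an algebraic simplifier and a node-count analysis. The Var constructor is an assumption added here so that there is something the simplifier cannot fold away; simplify and size are hypothetical names.

```haskell
-- Var is an assumed extension of the Expr type shown above.
data Expr = Lit Int | Var String | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

-- Transformation: bottom-up simplification with algebraic identities
-- and constant folding.
simplify :: Expr -> Expr
simplify (Add x y) =
  case (simplify x, simplify y) of
    (Lit 0, y') -> y'
    (x', Lit 0) -> x'
    (Lit a, Lit b) -> Lit (a + b)
    (x', y') -> Add x' y'
simplify (Mul x y) =
  case (simplify x, simplify y) of
    (Lit 0, _) -> Lit 0
    (_, Lit 0) -> Lit 0
    (Lit 1, y') -> y'
    (x', Lit 1) -> x'
    (Lit a, Lit b) -> Lit (a * b)
    (x', y') -> Mul x' y'
simplify e = e

-- Analysis: number of AST nodes.
size :: Expr -> Int
size (Add x y) = 1 + size x + size y
size (Mul x y) = 1 + size x + size y
size _ = 1

-- (1 * x) + (2 * 0), which simplifies to just x
example :: Expr
example = Add (Mul (Lit 1) (Var "x")) (Mul (Lit 2) (Lit 0))
```

Here size example is 7, while simplify example is Var "x" with size 1. Neither function could be written against the shallow embedding, where an expression has already collapsed to its value.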
Tagless Final (Finally Tagless)
Resolves the expression problem by representing DSL terms via type class methods rather than data types.
class ExprSym repr where
  lit :: Int -> repr
  add :: repr -> repr -> repr
  mul :: repr -> repr -> repr
-- Interpretation 1: evaluation
newtype Eval = Eval { runEval :: Int }
instance ExprSym Eval where
  lit n = Eval n
  add x y = Eval (runEval x + runEval y)
  mul x y = Eval (runEval x * runEval y)
-- Interpretation 2: pretty-printing
newtype Pretty = Pretty { runPretty :: String }
instance ExprSym Pretty where
  lit n = Pretty (show n)
  add x y = Pretty ("(" ++ runPretty x ++ " + " ++ runPretty y ++ ")")
  mul x y = Pretty ("(" ++ runPretty x ++ " * " ++ runPretty y ++ ")")
- Extensible in both dimensions: new operations (new class methods via subclassing) and new interpretations (new instances)
- No intermediate data structure; terms are built directly in the target representation
- Type safety ensures only well-formed DSL programs are expressible
- Pioneered by Carette, Kiselyov, and Shan
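The two-dimensional extensibility can be sketched as follows. The base class and its two interpretations are repeated so the example stands alone; NegSym, neg, and term are hypothetical names introduced for illustration. The new construct is added in a separate class, without editing ExprSym or its existing instances.

```haskell
class ExprSym repr where
  lit :: Int -> repr
  add :: repr -> repr -> repr
  mul :: repr -> repr -> repr

newtype Eval = Eval { runEval :: Int }

instance ExprSym Eval where
  lit n = Eval n
  add x y = Eval (runEval x + runEval y)
  mul x y = Eval (runEval x * runEval y)

newtype Pretty = Pretty { runPretty :: String }

instance ExprSym Pretty where
  lit n = Pretty (show n)
  add x y = Pretty ("(" ++ runPretty x ++ " + " ++ runPretty y ++ ")")
  mul x y = Pretty ("(" ++ runPretty x ++ " * " ++ runPretty y ++ ")")

-- New construct (assumed extension): a subclass, so existing code is untouched.
class ExprSym repr => NegSym repr where
  neg :: repr -> repr

instance NegSym Eval where
  neg x = Eval (negate (runEval x))

instance NegSym Pretty where
  neg x = Pretty ("(-" ++ runPretty x ++ ")")

-- A term is a polymorphic value; the type requested at the use site
-- selects the interpretation.
term :: NegSym repr => repr
term = add (lit 3) (neg (mul (lit 2) (lit 5)))
```

With these definitions, runEval term is -7 and runPretty term is "(3 + (-(2 * 5)))". A third interpretation would be one more newtype with its instances, again without touching existing code.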
Free Monads
Build a DSL as a free monad over a functor describing the operations.
{-# LANGUAGE DeriveFunctor #-}
-- Free (with constructors Pure and Free) and liftF come from the free package
import Control.Monad.Free (Free (..), liftF)
data TeletypeF next
  = PutStr String next
  | GetLine (String -> next)
  deriving Functor
type Teletype = Free TeletypeF
putStr' :: String -> Teletype ()
putStr' s = liftF (PutStr s ())
getLine' :: Teletype String
getLine' = liftF (GetLine id)
-- Program as data:
greet :: Teletype ()
greet = do
  putStr' "Name? "
  name <- getLine'
  putStr' ("Hello, " ++ name)
-- Interpret to IO, to test mock, etc.
runIO :: Teletype a -> IO a
runIO (Pure a) = return a
runIO (Free (PutStr s next)) = putStr s >> runIO next
runIO (Free (GetLine k)) = getLine >>= runIO . k
- Programs are data: can be inspected, optimized, and interpreted in multiple ways
- Performance concern: left-nested uses of >>= cost quadratic time with the naive representation. Solutions: Codensity transform, freer monads, or effect systems
- Freer monads (extensible effects): avoid the Functor requirement, use open unions of effect types
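To show what a second interpretation looks like, the sketch below runs the same greet program purely, against canned input, and collects the output. To keep it dependent on base only, it defines a minimal Free type locally instead of importing one from a library; runPure is a hypothetical name, and treating exhausted input as the empty string is an arbitrary choice made for the example.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Minimal free monad, defined locally so the sketch needs only base.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a) = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

liftF :: Functor f => f a -> Free f a
liftF = Free . fmap Pure

data TeletypeF next
  = PutStr String next
  | GetLine (String -> next)
  deriving Functor

type Teletype = Free TeletypeF

putStr' :: String -> Teletype ()
putStr' s = liftF (PutStr s ())

getLine' :: Teletype String
getLine' = liftF (GetLine id)

greet :: Teletype ()
greet = do
  putStr' "Name? "
  name <- getLine'
  putStr' ("Hello, " ++ name)

-- Pure interpreter: reads from a list of canned input lines and returns
-- the result together with everything written, in order.
-- Assumption: reading past the end of the input yields "".
runPure :: [String] -> Teletype a -> (a, [String])
runPure _ (Pure a) = (a, [])
runPure ins (Free (PutStr s next)) =
  let (a, out) = runPure ins next in (a, s : out)
runPure ins (Free (GetLine k)) =
  case ins of
    [] -> runPure [] (k "")
    (l : ls) -> runPure ls (k l)
```

Evaluating runPure ["Ada"] greet yields ((), ["Name? ", "Hello, Ada"]), so the program can be tested without performing any IO.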
Language Workbenches
Integrated environments for designing, implementing, and evolving DSLs.
Key Features
- Grammar/syntax definition with projectional or textual editing
- Automatic generation of parsers, editors, type checkers
- IDE support (syntax highlighting, completion, refactoring) derived from language definition
Examples
- JetBrains MPS: projectional editor; AST is the primary representation, not text. Supports language composition
- Xtext: generates Eclipse-based IDE from EBNF-like grammar definitions; integrates with EMF
- Spoofax: based on SDF (syntax definition formalism) and Stratego (term rewriting); supports declarative specification of scoping and typing
- Racket: language-oriented programming; the #lang mechanism for defining new languages with full macro support
Projectional Editing
- Users edit the AST directly through a projection (visual representation)
- No parsing needed; eliminates syntactic ambiguity
- Supports non-textual notations (tables, diagrams, math)
- Tradeoff: unfamiliar editing experience; version control on structured data is harder
Macro-Based DSLs
Use the host language's macro system to extend syntax.
Scheme/Racket Macros
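(define-syntax-rule (when condition body ...)
  (if condition (begin body ...) (void)))
;; Pattern-matching DSL (sketch only: matches? is a hypothetical helper that
;; returns the bindings for a quoted pattern, or #f when it does not match)
(define-syntax match
  (syntax-rules ()
    [(match val [pat body] ...)
     (cond [(matches? val 'pat) => (lambda (bindings) body)] ...)]))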
- Hygienic macros prevent accidental variable capture
- syntax-parse provides pattern matching on syntax with error reporting
- Racket's #lang allows full language redefinition
Rust Procedural Macros
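// Derive macros for serialization (compile-time code generation)
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Point { x: f64, y: f64 }

// Macro for SQL-like queries (checked against the database at compile time).
// Fragment: assumes an async fn returning a Result, with pool and user_id in scope.
let results = sqlx::query!("SELECT * FROM users WHERE id = ?", user_id)
    .fetch_all(&pool)
    .await?;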
- Operate on token streams; can generate arbitrary code
- Run at compile time, but on syntax only: they see tokens, not resolved type information
- Used extensively: serde, sqlx, rocket, clap
Lisp Macros vs. Template-Based Macros
- Lisp: code as data (homoiconicity); macros are arbitrary code transformations
- C/C++ macros: textual substitution; no hygiene, no structure awareness
- Template Haskell: typed, staged metaprogramming; AST quotation and splicing
Parser Combinators for DSLs
Build parsers by composing small parsing functions.
Core Combinators
-- Monadic parser type
type Parser a = String -> [(a, String)]
-- Primitives
item :: Parser Char -- consume one character
sat :: (Char -> Bool) -> Parser Char -- conditional consume
-- Combinators
(<|>) :: Parser a -> Parser a -> Parser a -- choice
(>>=) :: Parser a -> (a -> Parser b) -> Parser b -- sequence
many :: Parser a -> Parser [a] -- repetition
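The signatures above use a type synonym, which cannot be given class instances. A runnable variant, sketched here under that caveat, wraps the same function type in a newtype and obtains sequencing from Monad and choice and repetition from Alternative. The left-biased choice and the number parser are assumptions made for this example.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

-- List-of-successes parser, wrapped in a newtype so it can have instances.
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> [(f a, rest) | (a, rest) <- p s]

instance Applicative Parser where
  pure a = Parser $ \s -> [(a, s)]
  Parser pf <*> Parser pa =
    Parser $ \s -> [(f a, s2) | (f, s1) <- pf s, (a, s2) <- pa s1]

instance Monad Parser where
  Parser p >>= k =
    Parser $ \s -> concat [runParser (k a) rest | (a, rest) <- p s]

instance Alternative Parser where
  empty = Parser (const [])
  -- Left-biased choice: the second parser runs only if the first fails.
  Parser p <|> Parser q = Parser $ \s -> case p s of
    [] -> q s
    rs -> rs

-- Consume one character.
item :: Parser Char
item = Parser $ \s -> case s of
  [] -> []
  (c : cs) -> [(c, cs)]

-- Consume one character if it satisfies the predicate.
sat :: (Char -> Bool) -> Parser Char
sat ok = do
  c <- item
  if ok c then pure c else empty

-- some (one or more) and many (zero or more) are the default
-- definitions provided by the Alternative class.
number :: Parser Int
number = read <$> some (sat isDigit)
```

For example, runParser number "123abc" returns [(123, "abc")], and runParser number "abc" returns the empty list.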
Libraries
- Parsec/Megaparsec (Haskell): monadic, excellent error messages, widely used
- FParsec (F#): Parsec port for .NET
- Nom (Rust): zero-copy, streaming parser combinators
- pyparsing (Python): operator overloading for combinator syntax
Advantages for DSLs
- DSL grammar is expressed in the host language: no separate grammar file
- Composable: combine parsers from different DSLs
- First-class parsers: can abstract, parameterize, and reuse
- Incremental development: test parsers interactively
Limitations
- Left recursion must be removed by grammar transformation (for example with a chainl1-style combinator) or handled by a packrat parser extended to support it
- Error messages need careful engineering (committed alternatives, labels)
- Performance: backtracking can be expensive without cuts/commits
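The left-recursion limitation and its usual workaround can be sketched with ReadP from base, used here only as a convenient stand-in for the libraries listed earlier. The grammar, the parser names, and the parseAll helper are assumptions made for illustration.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, chainl1, char, eof, munch1, readP_to_S, skipSpaces)

-- The left-recursive grammar
--   expr ::= expr '-' number | number
-- would loop if transcribed directly, because expr would call itself
-- before consuming any input. chainl1 instead parses
--   number ('-' number)*
-- and folds the results to the left, which keeps left associativity.
number :: ReadP Int
number = read <$> munch1 isDigit <* skipSpaces

minus :: ReadP (Int -> Int -> Int)
minus = (-) <$ char '-' <* skipSpaces

expr :: ReadP Int
expr = chainl1 number minus

-- Hypothetical helper: succeed only if the whole input is consumed.
parseAll :: String -> Maybe Int
parseAll s = case readP_to_S (skipSpaces *> expr <* eof) s of
  [(n, "")] -> Just n
  _ -> Nothing
```

Here parseAll "10 - 3 - 2" is Just 5, that is (10 - 3) - 2, whereas a right-recursive rewrite of the grammar would compute 10 - (3 - 2) = 9.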
DSL Design Guidelines
- Start with the domain model: understand the concepts, relationships, and operations before designing syntax
- Prefer internal DSLs when: tight integration with host language is needed, rapid development is important, domain is well-served by host syntax
- Prefer external DSLs when: non-programmer users, domain-specific notation is critical, strong optimization opportunities
- Keep it small: a DSL should do one thing well. Avoid feature creep toward general-purpose
- Composability: design DSLs to compose with each other and with the host language
- Formal semantics: even a small DSL benefits from precise semantic specification
Summary
| Technique | Extensibility | Performance | Analysis |
|---|---|---|---|
| Shallow embedding | Easy to add ops | Direct execution | None |
| Deep embedding | Easy to add interps | Overhead from AST | Full |
| Tagless final | Both dimensions | Direct (no AST) | Limited |
| Free monads | Both + effects | Overhead (fixable) | Full |
| Macros | Syntax extension | Compile-time | Varies |
| External DSL | Full control | Custom optimizations | Full |
The choice depends on the domain requirements, the host language capabilities, and the desired balance between flexibility and implementation effort.