The pretty_expressive library

This library implements a pretty expressive printer, following the algorithm presented in A Pretty Expressive Printer (OOPSLA'23). The pretty printer is expressive, provably optimal, and practically efficient.

Getting Started

open Pretty_expressive
(* Sets the page width limit to 80 *)
let cf = Printer.default_cost_factory ~page_width:80 ()
module P = Printer.Make (val cf)
open P
# pretty_format (text "Hello" ^^ text " World!");;
- : string = "Hello World!"

API Reference

See the documentation for the module Pretty_expressive.

Pretty Expressive Guide

General-purpose pretty printing is a process that produces human readable text from structured data. Users encode the structured data together with styling choices in an abstract document doc. This document contains printing instructions: things like text, newlines, and indentation. It can also contain choices (<|>) between two or more alternatives, resulting in many possible layouts for a document. Pretty_expressive’s job is to pick a prettiest layout (according to a specified optimality objective) from among all of the choices. E.g., the one that minimizes the number of lines while not exceeding the page width limit.

Here’s a simple example that pretty prints a document encoding a fragment of code.

open Pretty_expressive

(** Prints the example document [d] with the page width limit [w] *)
let print_doc (w : int) =
  let cf = Printer.default_cost_factory ~page_width:w () in
  let module P = Printer.Make (val cf) in
  let open P in

  let d = text "while (true) {" ^^
          nest 4
            (nl ^^ text "f();" ^^ nl ^^ text "if (done())" ^^
             (let exit_d = text "exit();" in
              (space ^^ exit_d) <|> nest 4 (nl ^^ exit_d))) ^^
          nl ^^ text "}"
  in
  pretty_print print_string d

There is a lot of code above, so let's unpack it.

Document Construction

In the above code, let d = text "while (true) {" ^^ .... defines a document d.

As a result, the above document d encodes two possible layouts:

while (true) {
    f();
    if (done()) exit();
}

and

while (true) {
    f();
    if (done())
        exit();
}

Printer Construction

Most pretty printers are parameterized by a page width limit, which indicates the number of characters that each line should not exceed. Pretty_expressive is instead parameterized by a cost factory, which can control not only the page width limit, but also other aspects of prettiness. For the sake of simplicity, we will for now use a pre-defined cost factory Printer.default_cost_factory, which only allows the page width limit adjustment through the labeled argument ~page_width. Thus, let cf = Printer.default_cost_factory ~page_width:w () creates a (first-class module) cost factory cf that sets the page width limit to w.

The pretty printer can then be instantiated by using Printer.Make. It is a functor that consumes a CostFactory module. Here, let module P = Printer.Make (val cf) creates a pretty printer P with the above cost factory.

We then let open P so that combinators like text, ^^, and the pretty printing function pretty_print are in scope.

Putting them all together

With the above setup, we can pretty-print d with the cost factory cf by calling pretty_print print_string d, which uses print_string to output content.

Let's now actually use the pretty printer with page width limit of 80.

# print_doc 80;;
while (true) {
    f();
    if (done()) exit();
}
- : unit = ()

It outputs the first layout because the layout fits the page width limit, while having fewer lines compared to the second layout. By contrast, with the page width limit of 20, the second layout is now chosen.

# print_doc 20;;
while (true) {
    f();
    if (done())
        exit();
}
- : unit = ()

This is because the first layout does not fit the page width limit, leaving the second layout as the only option.

Alternative Document Construction

There are many ways to construct a document that encodes the same set of layouts. Some may be easier than the other.

For example, another way to construct a document for the above example could be:

let d = text "while (true) {" ^^
        nest 4
          (nl ^^ text "f();" ^^ nl ^^ text "if (done())" ^^
           group (nest 4 (nl ^^ text "exit();"))) ^^
        nl ^^ text "}"

Here, the group combinator is used. It creates a choice: whether to collapse nl to a single space or not.

See Printer.Make for the full list of combinators that we provide. Since combinators are simply regular functions, users may also compose the existing combinators together to create new user-defined combinators.

Explainers

In this section, we explain some important concepts. The full design consideration of Pretty_expressive can be found in the paper.

Best Practice for Document Construction

While we can put arbitrary sub-documents as the operands of the choice combinator <|>, we should share sub-documents across choices. This matters because the performance of Pretty_expressive depends on the DAG size of the input document. Without sharing, the input document would get unfolded into a tree, whose size could be exponentially large compared to the possible DAG size.

As an example, say we want to pretty print an S-expression with three possible styles for each list node: the horizontal style, the vertical style, and the argument list style. That is:

(a b c d)

could be printed as itself or

(a
 b
 c
 d)

or

(a b
   c
   d)

We can construct a function to convert an S-expression to a document as follows:

type sexp = Atom of string | List of sexp list

let print_sexp (s : sexp) (w : int) =
  let cf = Printer.default_cost_factory ~page_width:w () in
  let module P = Printer.Make (val cf) in
  let open P in

  let acat = fold_doc (fun x y -> x <+> space <+> y) in

  let rec pretty (s : sexp) =
    match s with
    | Atom s -> text s
    | List [] -> lparen <+> rparen
    | List [x] -> lparen <+> pretty x <+> rparen
    | List (x :: xs) ->
      let x_d = pretty x in
      let xs_d = List.map pretty xs in
      lparen <+>
      (* Share x_d and xs_d across choices *)
      (acat (x_d :: xs_d) <|> (* the horizontal style *)
       vcat (x_d :: xs_d) <|> (* the vertical style *)
       (x_d <+> space <+> vcat xs_d)) <+> (* the argument list style *)
      rparen
  in
  pretty_print print_string (pretty s)

The important point is that we reuse x_d and xs_d across <|>. Had we written the following code instead, the document construction could take exponential time, and the resulting document whose DAG size is very large would also cause pretty-printing to be inefficient.

(* Don't do this! *)
lparen <+>
(acat (pretty x :: List.map pretty xs) <|> (* the horizontal style *)
 vcat (pretty x :: List.map pretty xs) <|> (* the vertical style *)
 (pretty x <+> space <+> vcat List.map pretty xs)) <+> (* the argument list style *)
rparen

The Computation Width Limit

Unlike other pretty printers, Pretty_expressive needs an additional parameter: the computation width limit. Regular users should not need to concern much with this parameter (default_cost_factory will already provide a sensible value for the parameter), but advanced users who need a fine-grained control may want to adjust this parameter.

The parameter is used by the pretty printer to bound the computation. On the flip side, the pretty printer is only guaranteed to return an optimal layout among layout printings that do not exceed the parameter. As a result, if the parameter is too high, the performance could be negatively impacted, but if the parameter is too low, the output layout quality could be negatively impacted.

Pretty_expressive employs various heuristics to make the output layout pretty even when the computation width limit is exceeded, however. In most applications, the value of 1.2 \times w should suffice, where w is the page width limit.

Technical Notes

A layout printing exceeds the computation width limit when either the column position or indentation level exceeds the limit. For example:

text "Racket"

exceeds the computation width limit of 5, since there are 6 characters in a line. Similarly:

nest 6 (text "Rack")

exceeds the computation width limit of 5, since the indentation level exceeds 5 (even though this indentation level is completely unused).

When all possible layout printing due to a document exceeds the computation width limit, Pretty_expressive will still output a layout, with no guarantee that the layout is optimal. In such case, we say that the output layout is tainted. The info record and functions such as pretty_print_info can be used to find if the output layout is tainted or not.

Cost Factory

Pretty printers choose an optimal layout from a document by minimizing an optimality objective. Unlike other pretty printers, which have built-in optimality objectives, Pretty_expressive allows users to customize an optimality objective via the cost factory.

The cost factory interface is given in CostFactory. A valid cost factory must also satisfy the interface, as well as various contracts specified in the documentation of CostFactory. These contracts ensure that the concept of a cost for a layout is well-defined, and make it possible to efficiently find a layout with minimal cost.

See default_cost_factory and the paper for examples of cost factories.

Custom Cost Factory

Consider the example in Best Practice for Document Construction. Each list node can be rendered with three possible styles: the horizontal style, the vertical style, and the argument list style.

let example_sexp =
  List [Atom "abc"; Atom "def"; List [Atom "ghi"; Atom "jkl"; Atom "mno"]]
# print_sexp example_sexp 15;;
(abc
 def
 (ghi jkl mno))
- : unit = ()

Indeed, this is an optimal layout according to default_cost_factory (though not the only optimal layout), because it does not have any badness, and two newlines are minimal.

However, let’s say that we consider the vertical style to be not as pretty. The vertical style should still be a possibility however, since it can help us avoid going over the page width limit and minimize the number of newlines in many situations. We simply would prefer other styles when all else is equal. In this case, we would prefer the output:

(abc def
     (ghi jkl
          mno))

To address this issue, we construct a new cost factory that is similar to default_cost_factory, but with an extra component: the style cost.

let my_cost_factory ~page_width ?computation_width () =
  (module struct
    let cf = Printer.default_cost_factory ~page_width:page_width ?computation_width:computation_width ()
    module F = (val cf)

    type t = F.t * int

    let limit = F.limit

    let text pos len = F.text pos len, 0

    let newline _ = F.newline 0, 0

    let combine (c1, s1) (c2, s2) =
      F.combine c1 c2, s1 + s2

    let le (c1, s1) (c2, s2) =
      if c1 = c2 then
        s1 <= s2
      else
        F.le c1 c2

    let two_columns_overflow w = F.two_columns_overflow w, 0

    let two_columns_bias w = F.two_columns_bias w, 0

    let string_of_cost (c, s) = Printf.sprintf "(%s %d)" (F.string_of_cost c) s

    let debug_format = F.debug_format
  end: Signature.CostFactory with type t = (int * int * int) * int)

We now construct a function to convert an S-expression into a document, and utilize the cost construct to add a style cost to the vertical style document, thus discouraging it.

let revised_print_sexp (s : sexp) (w : int) =
  let cf = my_cost_factory ~page_width:w () in
  let module P = Printer.Make (val cf) in
  let open P in

  let acat = fold_doc (fun x y -> x <+> space <+> y) in

  let rec pretty (s : sexp) =
    match s with
    | Atom s -> text s
    | List [] -> lparen <+> rparen
    | List [x] -> lparen <+> pretty x <+> rparen
    | List (x :: xs) ->
      let x_d = pretty x in
      let xs_d = List.map pretty xs in
      lparen <+>
      (acat (x_d :: xs_d) <|> (* the horizontal style *)
       (cost ((0, 0, 0), 1) (vcat (x_d :: xs_d))) <|> (* the vertical style -- penalized *)
       (x_d <+> space <+> vcat xs_d)) <+> (* the argument list style *)
      rparen
  in
  pretty_print print_string (pretty s)

Now we can pretty print as we desired:

# revised_print_sexp example_sexp 15;;
(abc def
     (ghi jkl
          mno))
- : unit = ()

This does not mean that the vertical style won't ever be used, however. It is simply discouraged. With an even lower page width limit, the vertical style is the only way to avoid overflow, so it is employed.

# revised_print_sexp example_sexp 10;;
(abc
 def
 (ghi
  jkl
  mno))
- : unit = ()