pretty_expressive
libraryThis library implements a pretty expressive printer, following the algorithm presented in A Pretty Expressive Printer (OOPSLA'23). The pretty printer is expressive, provably optimal, and practically efficient.
open Pretty_expressive
(* Sets the page width limit to 80 *)
let cf = Printer.default_cost_factory ~page_width:80 ()
module P = Printer.Make (val cf)
open P
# pretty_format (text "Hello" ^^ text " World!");;
- : string = "Hello World!"
See the documentation for the module Pretty_expressive
.
General-purpose pretty printing is a process that produces human readable text from structured data. Users encode the structured data together with styling choices in an abstract document doc
. This document contains printing instructions: things like text, newlines, and indentation. It can also contain choices (<|>
) between two or more alternatives, resulting in many possible layouts for a document. Pretty_expressive
’s job is to pick a prettiest layout (according to a specified optimality objective) from among all of the choices. E.g., the one that minimizes the number of lines while not exceeding the page width limit.
Here’s a simple example that pretty prints a document encoding a fragment of code.
open Pretty_expressive
(** Prints the example document [d] with the page width limit [w] *)
let print_doc (w : int) =
let cf = Printer.default_cost_factory ~page_width:w () in
let module P = Printer.Make (val cf) in
let open P in
let d = text "while (true) {" ^^
nest 4
(nl ^^ text "f();" ^^ nl ^^ text "if (done())" ^^
(let exit_d = text "exit();" in
(space ^^ exit_d) <|> nest 4 (nl ^^ exit_d))) ^^
nl ^^ text "}"
in
pretty_print print_string d
There is a lot of code above, so let's unpack it.
In the above code, let d = text "while (true) {" ^^ ....
defines a document d
.
text
prints a string.^^
prints a concatenation of two documents.nl
prints a newline.nest
adds indentation level so that nl
adds indentation spaces.<|>
creates a choice.As a result, the above document d
encodes two possible layouts:
while (true) { f(); if (done()) exit(); }
and
while (true) { f(); if (done()) exit(); }
Most pretty printers are parameterized by a page width limit, which indicates the number of characters that each line should not exceed. Pretty_expressive
is instead parameterized by a cost factory, which can control not only the page width limit, but also other aspects of prettiness. For the sake of simplicity, we will for now use a pre-defined cost factory Printer.default_cost_factory
, which only allows the page width limit adjustment through the labeled argument ~page_width
. Thus, let cf = Printer.default_cost_factory ~page_width:w ()
creates a (first-class module) cost factory cf
that sets the page width limit to w
.
The pretty printer can then be instantiated by using Printer.Make
. It is a functor that consumes a CostFactory
module. Here, let module P = Printer.Make (val cf)
creates a pretty printer P
with the above cost factory.
We then let open P
so that combinators like text
, ^^
, and the pretty printing function pretty_print
are in scope.
With the above setup, we can pretty-print d
with the cost factory cf
by calling pretty_print print_string d
, which uses print_string
to output content.
Let's now actually use the pretty printer with page width limit of 80.
# print_doc 80;;
while (true) {
f();
if (done()) exit();
}
- : unit = ()
It outputs the first layout because the layout fits the page width limit, while having fewer lines compared to the second layout. By contrast, with the page width limit of 20, the second layout is now chosen.
# print_doc 20;;
while (true) {
f();
if (done())
exit();
}
- : unit = ()
This is because the first layout does not fit the page width limit, leaving the second layout as the only option.
There are many ways to construct a document that encodes the same set of layouts. Some may be easier than the other.
For example, another way to construct a document for the above example could be:
let d = text "while (true) {" ^^
nest 4
(nl ^^ text "f();" ^^ nl ^^ text "if (done())" ^^
group (nest 4 (nl ^^ text "exit();"))) ^^
nl ^^ text "}"
Here, the group
combinator is used. It creates a choice: whether to collapse nl
to a single space or not.
See Printer.Make
for the full list of combinators that we provide. Since combinators are simply regular functions, users may also compose the existing combinators together to create new user-defined combinators.
In this section, we explain some important concepts. The full design consideration of Pretty_expressive
can be found in the paper.
While we can put arbitrary sub-documents as the operands of the choice combinator <|>
, we should share sub-documents across choices. This matters because the performance of Pretty_expressive
depends on the DAG size of the input document. Without sharing, the input document would get unfolded into a tree, whose size could be exponentially large compared to the possible DAG size.
As an example, say we want to pretty print an S-expression with three possible styles for each list node: the horizontal style, the vertical style, and the argument list style. That is:
(a b c d)
could be printed as itself or
(a b c d)
or
(a b c d)
We can construct a function to convert an S-expression to a document as follows:
type sexp = Atom of string | List of sexp list
let print_sexp (s : sexp) (w : int) =
let cf = Printer.default_cost_factory ~page_width:w () in
let module P = Printer.Make (val cf) in
let open P in
let acat = fold_doc (fun x y -> x <+> space <+> y) in
let rec pretty (s : sexp) =
match s with
| Atom s -> text s
| List [] -> lparen <+> rparen
| List [x] -> lparen <+> pretty x <+> rparen
| List (x :: xs) ->
let x_d = pretty x in
let xs_d = List.map pretty xs in
lparen <+>
(* Share x_d and xs_d across choices *)
(acat (x_d :: xs_d) <|> (* the horizontal style *)
vcat (x_d :: xs_d) <|> (* the vertical style *)
(x_d <+> space <+> vcat xs_d)) <+> (* the argument list style *)
rparen
in
pretty_print print_string (pretty s)
The important point is that we reuse x_d
and xs_d
across <|>
. Had we written the following code instead, the document construction could take exponential time, and the resulting document whose DAG size is very large would also cause pretty-printing to be inefficient.
(* Don't do this! *)
lparen <+>
(acat (pretty x :: List.map pretty xs) <|> (* the horizontal style *)
vcat (pretty x :: List.map pretty xs) <|> (* the vertical style *)
(pretty x <+> space <+> vcat List.map pretty xs)) <+> (* the argument list style *)
rparen
Unlike other pretty printers, Pretty_expressive
needs an additional parameter: the computation width limit. Regular users should not need to concern much with this parameter (default_cost_factory
will already provide a sensible value for the parameter), but advanced users who need a fine-grained control may want to adjust this parameter.
The parameter is used by the pretty printer to bound the computation. On the flip side, the pretty printer is only guaranteed to return an optimal layout among layout printings that do not exceed the parameter. As a result, if the parameter is too high, the performance could be negatively impacted, but if the parameter is too low, the output layout quality could be negatively impacted.
Pretty_expressive
employs various heuristics to make the output layout pretty even when the computation width limit is exceeded, however. In most applications, the value of 1.2 \times w
should suffice, where w
is the page width limit.
A layout printing exceeds the computation width limit when either the column position or indentation level exceeds the limit. For example:
text "Racket"
exceeds the computation width limit of 5, since there are 6 characters in a line. Similarly:
nest 6 (text "Rack")
exceeds the computation width limit of 5, since the indentation level exceeds 5 (even though this indentation level is completely unused).
When all possible layout printing due to a document exceeds the computation width limit, Pretty_expressive
will still output a layout, with no guarantee that the layout is optimal. In such case, we say that the output layout is tainted. The info
record and functions such as pretty_print_info
can be used to find if the output layout is tainted or not.
Pretty printers choose an optimal layout from a document by minimizing an optimality objective. Unlike other pretty printers, which have built-in optimality objectives, Pretty_expressive
allows users to customize an optimality objective via the cost factory.
The cost factory interface is given in CostFactory
. A valid cost factory must also satisfy the interface, as well as various contracts specified in the documentation of CostFactory
. These contracts ensure that the concept of a cost for a layout is well-defined, and make it possible to efficiently find a layout with minimal cost.
See default_cost_factory
and the paper for examples of cost factories.
Consider the example in Best Practice for Document Construction. Each list node can be rendered with three possible styles: the horizontal style, the vertical style, and the argument list style.
let example_sexp =
List [Atom "abc"; Atom "def"; List [Atom "ghi"; Atom "jkl"; Atom "mno"]]
# print_sexp example_sexp 15;;
(abc
def
(ghi jkl mno))
- : unit = ()
Indeed, this is an optimal layout according to default_cost_factory
(though not the only optimal layout), because it does not have any badness, and two newlines are minimal.
However, let’s say that we consider the vertical style to be not as pretty. The vertical style should still be a possibility however, since it can help us avoid going over the page width limit and minimize the number of newlines in many situations. We simply would prefer other styles when all else is equal. In this case, we would prefer the output:
(abc def (ghi jkl mno))
To address this issue, we construct a new cost factory that is similar to default_cost_factory
, but with an extra component: the style cost.
let my_cost_factory ~page_width ?computation_width () =
(module struct
let cf = Printer.default_cost_factory ~page_width:page_width ?computation_width:computation_width ()
module F = (val cf)
type t = F.t * int
let limit = F.limit
let text pos len = F.text pos len, 0
let newline _ = F.newline 0, 0
let combine (c1, s1) (c2, s2) =
F.combine c1 c2, s1 + s2
let le (c1, s1) (c2, s2) =
if c1 = c2 then
s1 <= s2
else
F.le c1 c2
let two_columns_overflow w = F.two_columns_overflow w, 0
let two_columns_bias w = F.two_columns_bias w, 0
let string_of_cost (c, s) = Printf.sprintf "(%s %d)" (F.string_of_cost c) s
let debug_format = F.debug_format
end: Signature.CostFactory with type t = (int * int * int) * int)
We now construct a function to convert an S-expression into a document, and utilize the cost
construct to add a style cost to the vertical style document, thus discouraging it.
let revised_print_sexp (s : sexp) (w : int) =
let cf = my_cost_factory ~page_width:w () in
let module P = Printer.Make (val cf) in
let open P in
let acat = fold_doc (fun x y -> x <+> space <+> y) in
let rec pretty (s : sexp) =
match s with
| Atom s -> text s
| List [] -> lparen <+> rparen
| List [x] -> lparen <+> pretty x <+> rparen
| List (x :: xs) ->
let x_d = pretty x in
let xs_d = List.map pretty xs in
lparen <+>
(acat (x_d :: xs_d) <|> (* the horizontal style *)
(cost ((0, 0, 0), 1) (vcat (x_d :: xs_d))) <|> (* the vertical style -- penalized *)
(x_d <+> space <+> vcat xs_d)) <+> (* the argument list style *)
rparen
in
pretty_print print_string (pretty s)
Now we can pretty print as we desired:
# revised_print_sexp example_sexp 15;;
(abc def
(ghi jkl
mno))
- : unit = ()
This does not mean that the vertical style won't ever be used, however. It is simply discouraged. With an even lower page width limit, the vertical style is the only way to avoid overflow, so it is employed.
# revised_print_sexp example_sexp 10;;
(abc
def
(ghi
jkl
mno))
- : unit = ()