<h1>What it was like working for GitLab</h1><p>2024-02-08</p><p>
I joined GitLab in October 2015, and left in December 2021 after working there
for a little more than six years.</p><p>While I previously wrote <a href="/articles/im-leaving-gitlab-to-work-on-inko-full-time/">about leaving
GitLab</a> to work on
<a href="https://inko-lang.org/">Inko</a>, I never discussed what it was like working for
GitLab between 2015 and 2021. There are two reasons for this:</p><ol><li>I was suffering from burnout, and didn't have the energy to revisit the last
six years of my life (at that time)</li><li>I was still under an NDA for another 24 months, and I wasn't sure how much I
could discuss without violating it, even though it probably wouldn't have
caused any problems</li></ol><p>The NDA expired last December, and while I suspect I'm still dealing with the
fallout of a burnout, I have a bit more energy to revisit my time at GitLab.</p><p>I'll structure this article into two main sections: an overview of my time at
GitLab based on what I can still remember, and a collection of things I've
learned as a result of my work and experiences.</p><h2 id="table-of-contents">Table of contents</h2><ul class="toc"><li><a href="#before-gitlab">Before GitLab</a></li><li><a href="#2015-2017">2015-2017</a></li><li><a href="#2017-2018">2017-2018</a></li><li><a href="#2019-2021">2019-2021</a></li><li><a href="#what-ive-learned">What I've learned</a><ul><li><a href="#scalability-needs-to-be-part-of-a-companys-culture">Scalability needs to be part of a company's culture</a></li><li><a href="#make-teams-more-data-and-developer-driven">Make teams more data and developer driven</a></li><li><a href="#you-cant-determine-what-is-minimal-viable-without-data">You can't determine what is "minimal viable" without data</a></li><li><a href="#a-saas-and-self-hosting-dont-go-well-together">A SaaS and self-hosting don't go well together</a></li><li><a href="#more-people-doesnt-equal-better-results">More people doesn't equal better results</a></li><li><a href="#im-conflicted-on-the-use-of-ruby-on-rails">I'm conflicted on the use of Ruby on Rails</a></li><li><a href="#the-time-it-takes-to-deploy-code-is-vital-to-the-success-of-an-organization">The time it takes to deploy code is vital to the success of an organization</a></li><li><a href="#location-based-salaries-are-discriminatory">Location based salaries are discriminatory</a></li></ul></li><li><a href="#conclusion">Conclusion</a></li></ul><h2 id="before-gitlab">Before GitLab</h2><p>Before joining GitLab, I was working for a small startup based in Amsterdam.
Like many startups, the company started to run out of money in the months
leading up to my departure, and had to resort to desperate measures, such as
renting out part of the office space to cover the costs. At the same time, I
felt I had done all the things I wanted to and could do at a technical level.</p><p>In parallel to this, I was also working on
<a href="https://github.com/rubinius/rubinius">Rubinius</a> in my spare time, and we had
considered using it on various occasions, going as far as making sure all our
code ran on it without any problems. This also led to the creation of
<a href="https://github.com/yorickpeterse/oga">Oga</a>, an XML/HTML parsing library acting
as an alternative to Nokogiri.</p><p>Unfortunately, the lack of funding combined with various technical problems
meant that we never pursued the use of Rubinius further. Because of all these
factors, I started looking for a job where I could spend at least some more time
working on Rubinius in hopes of making it stable enough for people to use in a
production environment.</p><p>During this time I attended various Ruby meetups in Amsterdam, and helped out
with a few <a href="https://railsgirls.com/">Rails Girls</a> workshops. At one of these
workshops I ran into <a href="https://sytse.com/">Sytse</a> and his wife, and I met them again at
a later workshop or meetup (I can't quite remember which, as it's been a long
time). Through this I learned about GitLab, and developed an interest in working
there.</p><p>Some time in the summer of 2015 I sent Sytse an email, stating I wanted to work
for GitLab and asking if they were willing to sponsor me working on Rubinius one
day per week. The conversation and interviews that followed resulted in me
starting at GitLab in October 2015 as employee #28. My task was to improve
GitLab's performance, and the arrangement allowed me to spend 20% of my time on
Rubinius.</p><p>During my time I was a part of various teams, had a lot of autonomy, reported to
10 different managers over the years, nearly wiped the company out of existence,
built various critical components that GitLab still uses to this day, saw the
company grow from 30-something employees to around 2000 employees, and ended up
with a burnout. Or as the Dutch saying goes: "Lekker gewerkt, pik!" (roughly
"nice work, mate!", though the crude tone doesn't quite translate).</p><h2 id="2015-2017">2015-2017</h2><p>My last day at my previous company was Wednesday, September 30; I started at
GitLab the next day. This meant I went from working in an office five days per
week to working remotely five days per week. While I had worked from
home before, mainly when the trains weren't running due to a tiny amount of snow
or leaves on the tracks, it took a bit of adjusting to the new setup.</p><p>A particular memory from this time that stands out is carrying a bag of
groceries home during the day, and realizing how nice it is to do that during the
day instead of in the evening after coming home from work.</p><p>Another memory is taking a nap on my sofa with my cat, of which I took this
picture at the time:</p><p><img src="/images/what-it-was-like-working-for-gitlab/sofa_cat.jpg" alt="My cat judging me while I try to take a nap" /></p><p>Yes, those are Homer Simpson slippers.</p><p>The apartment I was renting at the time wasn't large and only had a small
kitchen area, a small living room, and a similarly small attic. This meant that
my living room functioned as my bedroom, living room, and office all at once. It
wasn't a great setup, but it was all I could afford at the time. Perhaps the
expensive Aeron chair had something to do with it.</p><p>In spite of being an all-remote company, GitLab was a social one, with
frequent meetups and events taking place over the years. For example, a few
weeks after I joined there was a company gathering in Amsterdam, involving
various activities during the day and dinners in the evening:</p><p><img src="/images/what-it-was-like-working-for-gitlab/amsterdam_dinner.jpg" alt="A dinner with everybody at GitLab" /></p><p>Back then you could still fit the entire company in one corner of a restaurant.</p><p>Not long after, GitLab had its first growth spurt, resulting in somewhere around
100 new employees (I think? My memories are a bit fuzzy at this point). At the
next company gathering in Austin in 2016, a single corner in a restaurant was no
longer enough:</p><p><img src="/images/what-it-was-like-working-for-gitlab/austin_gathering.jpg" alt="The company gathering in Austin, Texas" /></p><p>During this time there were also plenty of negative experiences. GitLab suffered
from terrible performance, frequent outages (almost weekly at some point), poor
management, and many other problems that startups face. This led to "GitLab is
slow" being the number one complaint voiced by users. Especially on Hacker News
people just <em>loved</em> to complain about it, no matter what the original topic
(e.g. some new feature announcement) might've been. Of course GitLab was aware
of this, and in fact one of the reasons GitLab hired me was to resolve these
problems.</p><p>Resolving these problems proved a real challenge, in particular because GitLab
had no adequate performance monitoring infrastructure. That's not an
exaggeration by the way: the only service running at the time was a New Relic
trial account that only allowed monitoring of one, <em>maybe</em> two servers out of
the (I think) total of 15-20 servers we had at the time. This meant that
whatever data did come in wasn't an accurate representation, and made measuring
and solving performance a challenge.</p><p>What made solving these problems extra difficult was GitLab's requirement that
whatever tooling we'd use had to be available to self-hosted customers, and
preferably be open source (or perhaps this was even a hard requirement, I can't
remember). This meant I had to not only improve performance, but also build the
tools to improve performance in the first place. At the same time, writing
performant code (or code that at least isn't horribly slow) wasn't at all
considered a priority for the rest of the company. GitLab also had a tendency to
listen more to complaints on Hacker News than internal complaints. This led to
an internal running joke that if you wanted something to change, you'd have
better luck complaining about it anonymously on Hacker News instead of bringing
it up through the proper channels.</p><p>What followed was several months of me trying to improve performance, build the
tooling necessary for this, try to change the company culture/attitude towards
performance such that things would actually improve over time, and deal with
GitLab not being happy with the improvements made. I distinctly remember
there being at least several video calls in which I was close to yelling at
Sytse, though it fortunately never came to that.</p><p>In spite of these challenges I did manage to build the necessary tooling, and
improve performance in various parts (some of which were significant, others not
so much). This tooling became an official GitLab feature known as <a href="https://docs.gitlab.com/ee/administration/monitoring/performance/">"GitLab
Performance Monitoring"</a>,
though it has changed quite a bit over the years. Another tool I built was
<a href="https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/1749">"Sherlock"</a>, a
heavy-weight profiler meant to be used in a development environment.</p><p>During this time, GitLab started to realize you can't solve this sort of
problem by just hiring one person, especially if performance isn't a priority
for the rest of the company. One of the changes this led to was that instead of
reporting directly to Sytse, I would report to a dedicated manager as part of
the new "Performance" team, and the team had a budget to hire more people. I
don't remember exactly what the budget was, but it wasn't much: two, <em>maybe</em>
three people I think. This wasn't nearly enough given the total size of the
company and its primary focus on producing as many features as possible, but
it was a start.</p><p>Much of my second year I spent as part of this team, now with a bit more room to
breathe. I continued campaigning for more resources and making good performance
a requirement for new code, but with mixed results, and of course I and the team
as a whole continued improving performance.</p><p>During this time GitLab also saw its first wave of lay-offs and people leaving
of their own accord, mainly as a result of GitLab hiring the wrong people in the
first place. This meant that GitLab grew from 30-something to (I think)
130-something people, only to shrink back to 80-something people, only to start
growing again in the months to come.</p><p>As for Rubinius: while we tried to get GitLab to work on Rubinius, we never
succeeded. Combined with the maintainer of Rubinius wanting to take the project
in a different direction and the controversies this led to within the Ruby
community, we ultimately decided to give up on Rubinius, and I stopped working
on it entirely. It's unfortunate, as Rubinius had a lot going for it over the
years but was ultimately held back by the maintainers running the project in a
way different from what was necessary for it to become successful.</p><h2 id="2017-2018">2017-2018</h2><p><img src="/images/what-it-was-like-working-for-gitlab/sytse_giraffe.jpg" alt="South Africa summit in 2018" /></p><p>After the first rocky one and a half years, things started to improve. Performance had
improved greatly, and GitLab was starting to take it more seriously. Hiring
processes were much improved, and like a game of chess GitLab started moving the
right people into the right places. The scope of the performance team also
changed: instead of focusing on performance in general, the team would focus on
database performance, and as part of this it was renamed to the creatively
named "Database team". With this change also came a bigger budget for hiring
people, and infrastructure engineers assigned to help us out with e.g. setting
up new databases.</p><p>A critically important feature I built during this time is <a href="https://docs.gitlab.com/ee/administration/postgresql/database_load_balancing.html">GitLab's database
load balancer</a>
(<a href="https://about.gitlab.com/blog/2017/10/02/scaling-the-gitlab-database/">announced here</a>).
This feature allowed developers to continue to write their database queries as
usual, while the load balancer took care of directing these queries to
either a replica or the primary. After performing a write, the load balancer
ensures the primary database is used until the written changes are available to
all replicas, an act commonly referred to as "sticking". The introduction of the
load balancer had a significant and positive impact on performance, and I'm
certain GitLab would've been in a lot of trouble if it wasn't for this load
balancer. What I'm most proud of is being able to introduce this system
transparently. To date I've not seen a database load balancer (let alone for
Ruby on Rails) that you can just add to your project and you're good to go.
Instead, existing solutions are more like frameworks that only provide a small
piece of the puzzle, requiring you to glue everything together yourself, often
without any form of sticking support. It's a shame we never got to extract it to
a standalone library.</p>
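<p>To illustrate, here's a heavily simplified sketch of the sticking idea in
Ruby. This is not GitLab's actual implementation, and <code>caught_up_to?</code> and
<code>current_wal_location</code> are hypothetical helpers that would wrap PostgreSQL's
<code>pg_last_wal_replay_lsn()</code> and <code>pg_current_wal_lsn()</code> functions:</p><div class="highlight"><pre class="highlight"><code># Reads go to a replica, unless the session wrote recently; after a write,
# the primary is used until the chosen replica has replayed that write.
class LoadBalancer
  def initialize(primary, replicas)
    @primary = primary
    @replicas = replicas
    @last_write_location = {} # session ID => WAL position of the last write
  end

  def read(session_id)
    position = @last_write_location[session_id]
    replica = @replicas.sample

    if position.nil? || replica.caught_up_to?(position)
      yield replica
    else
      # "Sticking": the replica hasn't replayed the write yet.
      yield @primary
    end
  end

  def write(session_id)
    result = yield @primary

    # Remember how far the primary got, so later reads can check replicas.
    @last_write_location[session_id] = @primary.current_wal_location
    result
  end
end
</code></pre></div><p>A production version needs per-replica bookkeeping, timeouts, and thread
safety on top of this, but the core idea is small enough to hide behind the
existing query interface, which is what made the transparent rollout
possible.</p><p>This period wasn't just one of incredible productivity and improvements, it also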
marked the lowest and scariest moment of my time at GitLab and my career as a
whole: on January 31st, after a long and stressful day of dealing with many
issues that continued well into the late evening, I <del>solved GitLab's performance
problems</del> <a href="https://about.gitlab.com/blog/2017/02/01/gitlab-dot-com-database-incident/">removed GitLab's production database by
accident</a>.
This then led to the discovery that we didn't have any backups: the backup
system hadn't been working for a long time, and the system meant to notify us
of backup errors wasn't working either. In the end we did recover, as I had
copied the production data to our staging environment about six hours prior as
part of the work I was doing that day, though the recovery process took around
24 hours. While about six hours of data loss is by all accounts terrible, I'm
not sure what would've happened if I hadn't made that backup. Suffice it to say, my
heart skipped a few beats that day, and I'm certain I instantly grew a few extra
grey hairs.</p><p>A recurring source of frustration during this time was GitLab's desire to shard
the database, even after the introduction of the database load balancer. Not
only did I, the other engineers, and my manager believe this to be the wrong
solution to our problems, we also had the data to back this up. For example,
sharding is useful if writes heavily outnumber reads, but in the case of GitLab
reads dominated writes by a ratio along the lines of 10:1. Further, the amount
of data we were storing wasn't nearly enough to justify the complexity of
introducing sharding. I distinctly remember a consultant we'd hired saying
something along the lines of "No offence, but we have customers with several
orders of magnitude more load and storage, and even for them sharding is
overkill". In spite of this, GitLab would continue to push for this over the
years, until management made the decision to leave it be, only for GitLab to
bring it up <em>again</em> (just using a slightly different name and idea this time)
towards the end of my time at GitLab.</p><h2 id="2019-2021">2019-2021</h2><p><img src="/images/what-it-was-like-working-for-gitlab/new_orleans.jpg" alt="New Orleans summit in 2019" /></p><p>Some time in 2018-2019 I transitioned from the database team into a newly
founded "Delivery" team, as I had grown tired of working on performance and
reliability for the last four years. Furthermore, multiple people were now
working on performance and reliability, so I felt it was the right time
for me to move on to something new. The goal of this new team was to improve the
release process and tooling of GitLab, as the state of this was at the time best
described as messy.</p><p>For example, we looked into how much time there was between a commit landing in
the main branch and deploying the change to GitLab.com. The resulting data showed
that on average it would take several days, but in the worst cases it could take
up to <em>three weeks</em>. The main bottleneck here was the split between GitLab
Community Edition and GitLab Enterprise Edition, both existing as separate Git
repositories, requiring manual merges and conflict resolution on a regular
basis. This led to a multi-month effort to <a href="https://about.gitlab.com/blog/2019/02/21/merging-ce-and-ee-codebases/">merge the two projects into
one</a>.
While we divided the work into frontend and backend work, and made various teams
responsible for contributing their share towards the effort, I ended up
implementing most of the backend related changes, with another colleague taking
care of most of the frontend work.</p><p>Together with the rest of the team we made significant improvements to the
release process during this period, and we reached a point where we could deploy
changes in a matter of hours. While this is nowhere near as quick as it
should've been, going from a worst-case of three weeks to a worst-case of
<em>maybe</em> a day is a <em>massive</em> improvement.</p><p>Like the previous periods, this period was not free of turmoil and changes.</p><p>2018 was the last year we had a GitLab summit focused on employees, with 2019
and later years adopting a format more like a traditional conference, aimed
more at customers and less at employees. From a financial perspective this was
understandable as organizing a gathering of 2000+ people is incredibly
expensive. From a social perspective it was a loss, as the more corporate
setting of the summits wasn't nearly as fun as the old format. I have fond
memories of <a href="https://youtu.be/39chczWRKws?feature=shared&t=1751">Sytse dancing on stage in response to a team winning a
contest</a>, or Sytse and his
wife giving a fitness class while Sytse wore a giraffe costume. These sorts
of goofy events wouldn't happen any more in the following years.</p><p>Then there was the issue of laptop management: people would request a company
Mac laptop and were more or less free to use it how they saw fit, or you'd use
your own hardware, as I did. Over the years GitLab's management started
discussions about using software to be able to remotely manage the laptops. A
recurring problem in these discussions was that the proposed tools were invasive
(e.g. they could be used to record user activity), didn't contain any guarantees
against abuse, and feedback from employees (of which there was <em>a lot</em>) would be
ignored until key employees started applying pressure on management. The plans
would then be shelved, only for the discussion to start all over again months
later.</p><p>What stood out the most was not the proposed changes, but rather the way
management handled the feedback, and how the changes in general gave off a vibe
of solutions in search of problems to justify their existence. It's worth
mentioning that most people involved in these discussions (myself included)
understood the need for some form of laptop management (e.g. against theft), but
felt that the invasive solutions proposed went too far.</p><p>GitLab did settle on a laptop management solution using
<a href="https://nl.sentinelone.com/">SentinelOne</a>. While GitLab made it a requirement
for employees to install this software on hardware used to access GitLab
resources, including your personal hardware (or at least was considering
requiring that), I (using my own desktop computer) somehow managed to stay under
the radar and was never asked to install the software in question. Perhaps
because I wasn't using a company issued laptop, GitLab just forgot to check up
on me.</p><p>These cultural changes combined with various changes in my personal life
resulted in a loss of motivation and productivity, an increase in stress, and
less consistent working hours. The team's manager (whom I'd consider the best
manager I've ever had) also transitioned to a different role, with a newly hired
manager now leading the team. I didn't get along well with this manager, and
the resulting conflict led to a "performance enablement plan", a procedure meant to
get things back on track before the need for a "performance improvement plan"
(PIP). A PIP is meant to be used as a last attempt at improving the relationship
between an employee, their work, and their employer.</p><p>What rubbed me the wrong way was how GitLab handled the PEP: I acknowledged
there were areas I needed to improve upon, but I felt that part of the problem
was the new manager's way of working. Management assured me that the PEP was meant
to improve the state of things on both ends, i.e. it wouldn't just focus on <em>me</em>
improving but also the manager. That didn't happen, and the PEP focused solely
on what <em>I</em> needed to do differently. The PEP was also a bit vague about what
requirements had to be met. The original plan was for the PEP to last one month,
but by the end of the first month my manager decided to extend the PEP by
another month because they felt this to be necessary, the reasons for which
weren't well specified. I decided to just go along with it, and after two months
passed I completed the PEP and management deemed the results satisfactory.</p><p>The optimist in me likes to believe I was just the first employee to be put on a
PEP and thus management had to figure things out as we went along. The pessimist
in me has a far more negative opinion on this series of events, but I'll keep
that to myself.</p><p>After this experience I realized that perhaps it was time for me to leave, as
both GitLab and I were heading in different directions, and I was unhappy with
the state of things at the time.</p><p>The opportunity for this presented itself towards the end of 2021: GitLab was
going public, and taking into account the time I had to wait before I could
exercise my stock options, I'd be able to leave in December 2021. I
couldn't leave earlier due to how stock option taxes worked in The Netherlands
at the time: exercising stock options meant having to pay full income tax
(52%) on the difference between the exercise price and the valuation, even if the
stock isn't liquid. In my case the amount of taxes would be so high I wouldn't
be able to afford it, forcing me to wait until GitLab went public. A few months
later the law changed, and you can now choose to pay the taxes either at the
time of exercise, or when the stock is liquid. The caveat is that if you defer
taxes until the stock is liquid, you pay taxes based on the value at that time,
not based on the value at the time of exercising your stock options. This
certainly isn't ideal and presents a huge financial risk, but at least you have
a choice.</p>
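<p>To illustrate why waiting was the only realistic option, here's a
back-of-the-envelope calculation with made-up numbers (mine were different):</p><div class="highlight"><pre class="highlight"><code># Hypothetical numbers, purely for illustration.
options        = 10_000
exercise_price = 1.0  # EUR paid per share when exercising
valuation      = 20.0 # EUR per share according to the latest valuation
tax_rate       = 0.52 # Dutch top income tax rate at the time

paper_gain = options * (valuation - exercise_price) # => 190000.0
tax_due    = paper_gain * tax_rate                  # => 98800.0

# Nearly EUR 100 000 in taxes, due immediately, on stock you can't sell yet.
</code></pre></div><p>And so with my stocks acquired, I left in December 2021 to work on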
<a href="https://inko-lang.org/">Inko</a> full-time, using my savings to cover my bills.</p><h2 id="what-ive-learned">What I've learned</h2><p>With the history out of the way, let's take a look at some things I've learned
from my time at GitLab. One thing to keep in mind is that I'm basing these
findings on my personal experiences, and as such it's not unlikely I'm wrong in
some areas.</p><h3 id="scalability-needs-to-be-part-of-a-companys-culture">Scalability needs to be part of a company's culture</h3><p>A mistake GitLab made, and continued to make when I left, was not caring enough
about scalability. Yes, directors would say it was important and improvements
were certainly made, but it was never as much of a priority as other goals. At
the heart of this problem lies the way GitLab makes money: it primarily earns
money from customers self-hosting GitLab Enterprise Edition, not GitLab.com. In
fact, GitLab.com always cost <em>much</em> more money than it brought in. This
naturally resulted in a focus on the self-hosted market, and many of the
performance problems we ran into on GitLab.com didn't apply to many self-hosted
customers.</p><p>What was even more frustrating was that many developers in fact <em>wanted</em> to
improve performance, but weren't given the time and resources to do so.</p><h3 id="make-teams-more-data-and-developer-driven">Make teams more data and developer driven</h3><p>Another factor is GitLab's product manager driven nature. While some key
developers may have had the ability to influence product decisions (given enough
screaming and kicking), it was mainly product managers and directors deciding
what needed to be implemented. Sometimes these decisions made a lot of sense,
other times they seemed to be based solely on the equivalent of "I read on
Hacker News this is a good idea, so we have to build it".</p><p>I believe GitLab would've been able to perform better as a company if it adopted
a simpler hierarchy early on, instead of the traditional multi-layer hierarchy
it has today. In particular, I think the idea of product managers needs to go in
favour of giving team leads more power and having them interact more with users.
To me, that's ultimately what a "product manager" should do: help build the
product at a technical level, but also act as a liaison between the team and its
users.</p><h3 id="you-cant-determine-what-is-minimal-viable-without-data">You can't determine what is "minimal viable" without data</h3><p>One of GitLab's core principles is to always start with a "minimal viable
change". The idea is to deliver the smallest possible unit of work that delivers
value to the user. On paper that sounds great, but in practice the definition of
"minimal" is inconsistent between people. The result is that one team might
consider performance or good usability a requirement for something to be viable,
while another team couldn't care less.</p><p>In practice this led to GitLab building many features over the years that just
weren't useful: a serverless platform nobody asked for and that was ultimately
killed off, support for managing Kubernetes clusters that didn't work for three
weeks without anybody noticing, a chatops solution we had to build on top of our
CI offering (thus introducing significant latency) instead of using existing
solutions, or a requirements management feature that only supported creating and
viewing data (not even updating or deleting); these are just a few examples from
recent years.</p><p>To determine what makes something viable, you need a deep understanding of the
desires of your target audience. While GitLab does perform <a href="https://handbook.gitlab.com/handbook/product/ux/performance-indicators/paid-nps/">user surveys every
quarter</a>,
and some teams have access to data about user engagement, from what I remember
and learned from talking to other former colleagues it seems this data was more
incidentally used, instead of being a core part of each team's workflow.</p><h3 id="a-saas-and-self-hosting-dont-go-well-together">A SaaS and self-hosting don't go well together</h3><p>GitLab offers two types of product: self-hosted installations and a software as
a service (SaaS) offering. I believe most companies won't be able to effectively
offer such a setup, including GitLab. Not only do you get a conflict of interest
based on what earns you the most money (as mentioned above), but the two types
of setups also come with different requirements and ways of applying updates.</p><p>For example, for a SaaS you want to be able to deploy quickly and have to
handle large amounts of data and workloads taking place on a centralized
infrastructure. Given most self-hosted instances tend to be tiny in comparison
to the SaaS offering, many of the problems you encounter as a SaaS, and their
corresponding solutions, just don't apply to self-hosted
installations. This effectively results in two code paths in many parts of your
platform: one for the SaaS version, and one for the self-hosted version. Even if
the code is physically the same (i.e. you provide some sort of easy to use
wrapper for self-hosted installations), you still need to think about the
differences.</p><p>In contrast, when you focus on <em>either</em> a SaaS or self-hosted setup you get to
dedicate all your attention to providing the best experience for the setup in
question. There are of course exceptions, but they are exactly that: exceptions,
and exceptions are rare.</p><h3 id="more-people-doesnt-equal-better-results">More people doesn't equal better results</h3><p>Like many other companies before it, GitLab hired large numbers of people over
the years and today employs over 2000 people. I don't know how many of those are
developers today, but I'm guessing at least a few hundred based on a quick
glance at their team page.</p><p>It's well known that adding more people to a project doesn't necessarily improve
productivity and results (see also "The Mythical Man-Month"), and yet almost
every western startup with venture capital seems to ignore this, hiring hundreds
of developers even if the product doesn't need nearly that many developers.</p><p>I don't have any data to back this up, but I suspect that most companies don't
need more than 20 developers, with some needing 20 to 50 developers, and only a
handful needing between 50 and 100 developers. Once you cross the 100 developer
mark, I think you need to start thinking about whether the scope of your
product(s) is getting out of hand before hiring even more people.</p><p>Note that I'm specifically talking about software developers here. For example,
if you're building custom hardware, you'll probably need more people to scale up
the production process. Sales and support are also two areas where you generally
do benefit from having more people, as these types of work require less
synchronisation between people.</p><h3 id="im-conflicted-on-the-use-of-ruby-on-rails">I'm conflicted on the use of Ruby on Rails</h3><p>GitLab is built using Ruby and Ruby on Rails, and this is a big part of what
allowed it to reach the success it enjoys today. At the same time, this
combination presents its challenges when the project reaches a large size with
many contributors of different experience levels. Rails in particular makes it
too easy to introduce code that doesn't perform well.</p><p>For example, if you want to display a list of projects along with a counter
showing the number of project members, it's far too easy to introduce the <a href="https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping">N+1
query problem</a>
by accident. While Rails (or more specifically, ActiveRecord) provides
functionality to solve this, it's an opt-in mechanism, and developers
inevitably forget to use it. Many of the performance problems I solved during
my first few years at GitLab were N+1 query problems.</p>
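<p>To make this concrete, here's what both variants look like with a
hypothetical <code>Project</code> model that has a <code>members</code> association (the model names
are illustrative, not GitLab's actual schema):</p><div class="highlight"><pre class="highlight"><code># The N+1 version: one query to fetch the projects, then one extra COUNT
# query for every single project.
projects = Project.limit(20)
projects.each { |project| puts project.members.count }

# The opt-in fix: eager load the association up front, then use #size, which
# counts the records already in memory instead of issuing more queries.
projects = Project.includes(:members).limit(20)
projects.each { |project| puts project.members.size }
</code></pre></div><p>Other frameworks have learned from this over the years and provide better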
alternatives. The usual approach is that instead of being able to arbitrarily
query associated data, you have to pass in the data ahead of time. The benefit
here is that if you were to forget passing the data in, you'd run into some sort
of error rather than the code querying the data for you on a per-row basis,
introducing performance problems along the way.</p>
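<p>Sketched in the same hypothetical Ruby terms, that approach might look
something like this (an illustration of the pattern, not any specific
framework's API):</p><div class="highlight"><pre class="highlight"><code># The rendering code takes the member counts as an explicit argument instead
# of querying them per project.
def render_projects(projects, member_counts)
  projects.each do |project|
    # Default to zero for projects without any members.
    puts "#{project.name}: #{member_counts.fetch(project.id, 0)} members"
  end
end

# One aggregate query producing a hash of project ID => member count.
counts = Member.group(:project_id).count

# Forgetting to compute and pass the counts is an immediate ArgumentError,
# instead of a silent query per project.
render_projects(Project.limit(20), counts)
</code></pre></div><p>Ruby itself is also a choice I have mixed opinions on. On the one hand, it's a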
wonderful language I enjoyed using for a little under 10 years. On the other
hand, its heavy use of metaprogramming makes it difficult to use in large
projects, even with the introduction of optional typing. I'm not just saying
that for the sake of saying it, I experienced it first hand when writing <a href="https://github.com/yorickpeterse/ruby-lint">a
static analysis tool for Ruby</a> years
ago.</p><p>In spite of all this, I'm not sure what alternative I would recommend instead of
the combination of Ruby and Ruby on Rails. Languages and runtimes such as Go, Rust or Node.js
might be more efficient than Ruby, but none have a framework as capable as Ruby
on Rails. Python and Django <em>might</em> be an option, but I suspect you'll run into
similar problems as with Ruby and Ruby on Rails, at least to some degree. It would
probably help if new web frameworks stopped obsessing over how to define your
routing tree, and instead focused more on productivity as a whole.</p><p>I have some vague ideas on how I'd approach this with
<a href="https://inko-lang.org/">Inko</a>, but there's a lot of other work that needs doing
before I can start writing a web framework in Inko.</p><h3 id="the-time-it-takes-to-deploy-code-is-vital-to-the-success-of-an-organization">The time it takes to deploy code is vital to the success of an organization</h3><p>This is something I already knew before joining GitLab, having spent a
significant amount of time setting up good deployment and testing pipelines at
my previous job, but working for GitLab reinforced this belief: you need to be
able to deploy your code <em>fast</em>, i.e. within at most an hour of pushing the
changes to whatever branch/tag/thing you deploy from. At GitLab it took
somewhere around four years for us to get even close to that, and we still had a
long way to go.</p><p>Apart from the obvious benefits, such as being able to respond to incidents more
efficiently (without having to hot-patch code to account for your deploys taking
hours), there's also a motivational benefit: being able to see your changes live
is <em>nice</em> because you actually get to see and make use of your work. Nothing is
more demotivating than spending weeks on a set of changes, only for it to take
another two weeks for them to be deployed.</p><p>For this to work, you need to be incredibly aggressive about cutting down deploy
times and the time it takes to run your test suite as part of your deployments.
Depending on the type of application and the types of services you're testing,
you may inherently need a certain amount of time to run the tests. The point
here is not "tests and deployments must never take more than X minutes", but
rather to (as an organization) make it a priority to be able to deploy as fast
as your business requirements allow you to. As obvious as this may seem, I
suspect many organizations aren't doing nearly as good a job in this area as
they could.</p><h3 id="location-based-salaries-are-discriminatory">Location based salaries are discriminatory</h3><p>The salary you earn at GitLab is influenced by various variables, one of which
is location. The influence of your location isn't insignificant either. When you
are a company with a physical office and have a need to hire people in specific
areas, it might make sense to adjust pay based on location as you otherwise
might not be able to hire the necessary people in the required areas. But for an
all-remote company without a physical office, and legal entities across the
whole world, there's no legitimate reason to pay two different people with the
same experience and responsibilities different salaries purely based on where
they live.</p><p>To illustrate: when I left GitLab my salary was around €120 000 per year, or
around €8500 per month, before taxes. For The Netherlands this is a good salary,
and you'll have a hard time finding companies that offer better <em>and</em> let you
work from home full time. But if I had instead lived in the Bay Area, I would've
earned at least twice that amount, possibly even more. Not because I am somehow
able to do my job better in the Bay Area, or because of any other valid
reason for that matter, but because I would be living in the Bay Area instead of
in The Netherlands.</p><p>No matter how you try to spin this, it's by all accounts an act of
discrimination to pay one person less than another purely based on where they
live. Think about it: if a company pays a person less because of the color of
their skin or their gender, the company would be in big trouble. But somehow
it's OK to pay a person less based on their location?</p><p>As for how to solve this, for companies it's easy: just pay based on the
position's requirements, not the location of the applicant. It doesn't matter
whether you're paying somebody in the Bay Area $100 000 per year, or somebody in
the Philippines, because the cost for you as a business is the same. For
employees my only advice is to try and negotiate a better salary, but this may
prove difficult as companies paying based on locations also tend to be stubborn
about it. I hope one day our laws catch up with this practice.</p><p>A company that seems to do a good job at this is the <a href="https://oxide.computer/">Oxide Computer
Company</a>. Instead of paying employees based on their
location, Oxide pays employees the same amount (see <a href="https://oxide.computer/blog/compensation-as-a-reflection-of-values">this
post</a> for
more details), something I deeply admire and believe more companies should do.</p><h2 id="conclusion">Conclusion</h2><p>Looking back, my time at GitLab is a mix of both incredibly positive and
negative experiences. I'm incredibly proud of the achievements the various teams
I was on made, and the people I got to work with, but I'm also saddened by the
last year or so souring an otherwise great experience. I don't have any regrets
working for GitLab, and would do it all over again if I could, just a little
differently thanks to the benefit of hindsight. I also still recommend it as a
company to work for, because in spite of its flaws I think it does <em>much</em> better
than many other companies.</p><h1>A decade of developing a programming language</h1><p>2023-11-14</p><p>
In 2013, I had an idea: "what if I were to build my own programming language?". Back
then my idea came down to "an interpreted language that mixes elements from Ruby
and Smalltalk", and not much more.</p><p>Between 2013 and 2015 I spent time on and off trying different languages (C,
C++, <a href="https://dlang.org/">D</a> and various others I can't remember) to see which
one I would use to build my language in. While this didn't help me find a
language I <em>did</em> want to use, it did help me eliminate others. For example, C
proved to be too difficult to work with. D seemed more interesting and I managed
to implement something that vaguely resembled a virtual machine, but I
ultimately decided against using it. I don't remember exactly why, but I believe
it was due to the rift caused by the differences between D version 1 and 2, the
general lack of learning resources and packages, and the presence of a garbage
collector.</p><p>Somewhere towards the end of 2014 I discovered
<a href="https://www.rust-lang.org/">Rust</a>. While the state Rust was in at the time is
best described as "rough", and learning it (especially at the time with the lack
of guides) was difficult, I enjoyed using it; much more so than the other
languages I had experimented with until that point.</p><p>2015 saw the release of <a href="https://blog.rust-lang.org/2015/05/15/Rust-1.0.html">Rust 1.0</a>,
and that same year I committed the <a href="https://github.com/inko-lang/inko/commit/f8cf2530e26042515ed4a6b06eabf46c425bc87e">first few lines of Rust
code</a>
for <a href="https://inko-lang.org/">Inko</a>, though it would take another two months or
so before the code started to (vaguely) resemble that of a programming language.</p><p>Fast-forward to 2023, and Inko is in a state where one can write meaningful
programs in it (e.g. <a href="https://github.com/yorickpeterse/openflow">HVAC automation
software</a>, a <a href="https://github.com/yorickpeterse/inko-markdown">Markdown
parser</a>, a <a href="https://github.com/yorickpeterse/clogs">changelog
generator</a> and more). Inko has also
changed considerably over the years: whereas it was once a gradually typed
interpreted language, it's now statically typed and compiles to machine code
using <a href="https://llvm.org/">LLVM</a>. And whereas Inko used to draw inspiration
heavily from Ruby and Smalltalk, these days it's closer to Rust,
<a href="https://www.erlang.org/">Erlang</a> and <a href="https://www.ponylang.io/">Pony</a> than it
is to Ruby or Smalltalk.</p><p>Given it's been 10 years since I first started working towards Inko, I'd like to
highlight (in no particular order) a few of the things I've learned about
building a programming language since first starting work on Inko. This is by no
means an exhaustive list, rather it's what I can remember at the time of
writing.</p><div class="admonition discuss"><i class="icon"></i><div class="text"><p>You can find discussions about this article on Reddit
<a href="https://www.reddit.com/r/ProgrammingLanguages/comments/17v05xd/a_decade_of_developing_a_programming_language/">here</a>
and <a href="https://www.reddit.com/r/programming/comments/17v0avj/a_decade_of_developing_a_programming_language/">here</a>,
on <a href="https://news.ycombinator.com/item?id=38261982">Hacker News</a>, and on
<a href="https://lobste.rs/s/wyeffq/decade_developing_programming_language">Lobsters</a>.</p></div></div><h2 id="table-of-contents">Table of contents</h2><ul class="toc"><li><a href="#avoid-gradual-typing">Avoid gradual typing</a></li><li><a href="#avoid-self-hosting-your-compiler">Avoid self-hosting your compiler</a></li><li><a href="#avoid-writing-your-own-code-generator-linker-etc">Avoid writing your own code generator, linker, etc</a></li><li><a href="#avoid-bike-shedding-about-syntax">Avoid bike shedding about syntax</a></li><li><a href="#cross-platform-support-is-a-challenge">Cross-platform support is a challenge</a></li><li><a href="#compiler-books-arent-worth-the-money">Compiler books aren't worth the money</a></li><li><a href="#growing-a-language-is-hard">Growing a language is hard</a></li><li><a href="#the-best-test-suite-is-a-real-application">The best test suite is a real application</a></li><li><a href="#dont-prioritize-performance-over-functionality">Don't prioritize performance over functionality</a></li><li><a href="#building-a-language-takes-time">Building a language takes time</a></li></ul><h2 id="avoid-gradual-typing">Avoid gradual typing</h2><p>A big change I made was to switch Inko from being a gradually typed language to
a statically typed language. The idea behind gradual typing was that it would
allow you to build a prototype or simple scripts in a short amount of time using
dynamic typing, then over time turn the program into a statically typed program
(where beneficial).</p><p>In reality, gradual typing ends up giving you the worst of both dynamic and
static typing: you get the uncertainty and lack of safety (in dynamically typed
contexts) of dynamic typing, and the cost of trying to fit your ideas into a
statically typed type system. I also found that the use of gradual typing didn't
actually make me more productive compared to using static typing. The result was
that I found myself avoiding dynamic typing in both Inko's standard library and
the programs I wrote. In fact, the few places where dynamic typing <em>was</em> used in
the standard library were due to the type system not being powerful enough to
provide a better alternative.</p><p>Gradual typing also has performance implications. Consider this example using
keyword arguments:</p><div class="highlight"><pre class="highlight"><code><span class="k">let</span> x: Any = some_value
x.foo(b: <span class="mi">42</span>, a: <span class="mi">10</span>)
</code></pre></div><p>Here <code>x</code> is typed as <code>Any</code>, which used to mean the value is dynamically typed.
Because we don't know the type of <code>x</code> in <code>x.foo(...)</code>, we can't resolve the
keyword arguments to positional arguments at compile-time. This meant Inko's
virtual machine had to provide a runtime fallback, and the keyword arguments had
to be encoded into the bytecode. While the cost wasn't significant, in a
statically typed language the cost is zero because we can resolve the arguments
at compile-time.</p><p>Another issue is that the presence of dynamic types can inhibit compile-time
optimizations, such as compile-time inlining (and all the optimizations that
depend on it). If a language uses a Just In Time (JIT) compiler, such as
JavaScript (and by extension <a href="https://www.typescriptlang.org/">TypeScript</a>), you
can optimize the code at runtime, but that means having to write a JIT compiler
which itself is a massive undertaking.</p><p>The presence of dynamic types also means that even statically typed code may
be incorrect, though this depends on how you approach casting dynamically typed
values to statically typed values. If such a cast doesn't require a runtime
check, you may end up passing incorrectly typed data to statically typed code.
If you do perform some sort of runtime check, this may affect performance when
such casts are common.</p><p><strong>Recommendation:</strong> either make your language statically typed or
dynamically typed (preferably statically typed, but that's a different topic),
as gradual typing just doesn't make sense for new languages.</p><div class="admonition info"><i class="icon"></i><div class="text"><p>The emphasis here is on <em>new</em> languages, as applying gradual typing to an
existing language <em>can</em> be useful, especially as an intermediate step towards
the language becoming fully statically typed.</p></div></div><h2 id="avoid-self-hosting-your-compiler">Avoid self-hosting your compiler</h2><p>Early in the development of Inko, I decided that I wanted to write the compiler
in Inko itself, commonly referred to as a "self-hosted compiler". The idea was
that by doing so, the compiler could be exposed through the standard library,
and to have a sufficiently complicated program to test everything Inko has to
offer.</p><p>While this seems great on paper, in practice it turns into a real challenge.
Maintaining a single compiler is already a challenge, but maintaining two
compilers (one to bootstrap your self-hosted compiler, and the self-hosted
compiler itself) is even more difficult. The process of building the compiler is
also more complicated: first you have to build the bootstrapping compiler, then
you can use that to build the self-hosted compiler. Ideally you then use that
self-hosted compiler to compile itself a second time, so you can ensure the
behaviour doesn't subtly change depending on what compiler (the bootstrapping or
self-hosted compiler) is used to compile your self-hosted compiler.</p><p>Because of these challenges, I abandoned this idea in favour of writing the
compiler in Rust, and keeping it that way for the foreseeable future.</p><p><strong>Recommendation:</strong> defer writing a self-hosted compiler until you have a solid
language and ecosystem. A solid language and ecosystem is infinitely more useful
to your users than a self-hosted compiler.</p><h2 id="avoid-writing-your-own-code-generator-linker-etc">Avoid writing your own code generator, linker, etc</h2><p>When writing a language, it's tempting to take on more than you can or probably
should handle. In particular, it may be tempting to write your own native code
generator, linker, C standard library, and so on (i.e. what languages such as
<a href="https://ziglang.org/">Zig</a> and <a href="https://www.roc-lang.org/">Roc</a> are doing).</p><p>My general recommendation is to avoid this unless you have established a clear
need for this. And when you do think there's a need, I'd still avoid it. Writing
a language is hard enough as-is and can easily take years. For every such
component (a linker, a code generator, etc) you add on top, it will take
several more years before the stack as a whole becomes useful. That's ignoring
the painful fact that such bespoke components are highly unlikely to outperform
the established alternatives.</p><p><strong>Recommendation:</strong> there are many developers who think they can write a
better linker, code generator, and so on, but few developers who actually
succeed in doing so. As harsh as it may sound, you are probably not one of them.
Of course once you have an established language, you're free to reinvent as many
of these wheels as you see fit.</p><div class="admonition info"><i class="icon"></i><div class="text"><p>If you're writing an interpreted language, it's fine and probably even needed to
write your own (byte)code generator (unless you target an existing virtual
machine such as the JVM), as bytecode generators are typically not that
complicated to implement.</p></div></div><h2 id="avoid-bike-shedding-about-syntax">Avoid bike shedding about syntax</h2><p>The syntax of a language and how it's parsed is one of the most boring aspects of
building a language. Writing parsers in general is pretty dull, and there's not
a lot you can innovate upon.</p><p>And yet, it's a subject many developers building their own language seem to
spend <em>way</em> too much time on. There are also plenty of articles titled something
along the lines of "How to build your own programming language", only covering
the basics of writing a parser and nothing more.</p><p>For Inko I took a different approach in its early days: I used an
<a href="https://en.wikipedia.org/wiki/S-expression">S-expression</a> syntax, instead of
designing my own syntax and writing a parser for it. This meant I was able to
experiment with the semantics and virtual machine of the language, instead of
worrying over what keyword to use for function definitions.</p>
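<p>To give an idea of how little work this requires, here's a toy S-expression
reader in Ruby (a deliberately minimal sketch, not the parser Inko actually
used):</p><div class="highlight"><pre class="highlight"><code>def tokenize(source)
  # Pad the parentheses with spaces, then split on whitespace.
  source.gsub('(', ' ( ').gsub(')', ' ) ').split
end

def parse(tokens)
  token = tokens.shift

  if token == '('
    list = []
    list.push(parse(tokens)) until tokens.first == ')'
    tokens.shift # consume the closing ')'
    list
  else
    # Integers become Integer objects, everything else becomes a symbol.
    token.match?(/\A\d+\z/) ? Integer(token) : token.to_sym
  end
end

# (def add (a b) (+ a b)) parses to [:def, :add, [:a, :b], [:+, :a, :b]]
p parse(tokenize('(def add (a b) (+ a b))'))
</code></pre></div><p>With a reader this small, all the interesting time goes into what the lists
<em>mean</em>, which is exactly where it should go early on.</p><p><strong>Recommendation:</strong> use an existing syntax and parser when prototyping your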
language, allowing you to focus on the semantics instead of the syntax. Once you
develop a better understanding of your language you can switch to your own
syntax.</p><h2 id="cross-platform-support-is-a-challenge">Cross-platform support is a challenge</h2><p>This shouldn't be entirely surprising, but supporting different platforms
(Linux, macOS, Windows, etc) is <em>hard</em>. For example, Inko used to support
Windows when it used an interpreter. When switching to a compiled language, I
had to drop support for Windows as I couldn't get certain things to work (e.g.
the assembly used for switching thread stacks).</p><p>Running tests on different platforms is also not nearly as easy as it should be.
Take <a href="https://github.com/features/actions">GitHub Actions</a>: you can use it to
run tests on Linux, macOS, and Windows. Unfortunately, the free tier (at the
time of writing) only supports AMD64 runners, and while it <em>does</em> support macOS
ARM64 runners, these cost $0.16 per minute.</p><p>The cost isn't even the biggest problem here, because depending on how often
tests run it may not be that big. Rather, the problem is that paid runners
typically aren't available for forks, meaning pull requests from third-party
contributors won't be able to run the tests using these runners.</p><p>And this is ignoring the problem of supporting platforms not supported by your
continuous integration platform (e.g. GitHub Actions) of choice. FreeBSD is a
good example of this: GitHub Actions just doesn't support it, so you need to use
<a href="https://www.qemu.org/">qemu</a> or similar software to run FreeBSD in a VM.</p><p>Even if you <em>just</em> support Linux, you still have to deal with the differences
between Linux distributions. For example, Inko uses a Rust wrapper for LLVM
(<a href="https://github.com/TheDan64/inkwell">Inkwell</a>), but the low-level LLVM wrapper
(<a href="https://gitlab.com/taricorp/llvm-sys.rs">llvm-sys</a>) it uses <a href="https://gitlab.com/taricorp/llvm-sys.rs/-/issues/44">doesn't compile
on Alpine Linux</a>, and so
Inko doesn't support Alpine Linux for the time being.</p><p>The extent to which this is a problem depends on the language you're trying to
build. For example, if you're building an interpreter written in Rust it
probably won't be that bad (though Windows is always going to be a challenge),
but it <em>is</em> something you need to be prepared for.</p><p><strong>Recommendation:</strong> if you're uncertain about supporting a certain platform, err
on the side of not supporting it and document this, instead of
sort-of-but-not-quite supporting it.</p><h2 id="compiler-books-arent-worth-the-money">Compiler books aren't worth the money</h2><p>While there are plenty of books on compiler development, they tend to not be
that useful. In particular, such books tend to dedicate a significant amount of
time to parsing, arguably the most boring part of a compiler, then only briefly
cover the more interesting topics such as optimizations. Oh, and good luck
finding a book that explains how to write a type-checker, let alone
one that covers more practical topics such as supporting sub-typing, generics,
and so on.</p><p><strong>Recommendation:</strong> start with reading <a href="https://craftinginterpreters.com/">Crafting
Interpreters</a>, and read through
<a href="https://www.reddit.com/r/ProgrammingLanguages/">/r/ProgrammingLanguages</a> on
Reddit. If you're interested in learning more about pattern matching, <a href="https://github.com/yorickpeterse/pattern-matching-in-rust">this Git
repository may prove
useful</a>.</p><h2 id="growing-a-language-is-hard">Growing a language is hard</h2><p>Building a language is a significant challenge on its own. Growing the number of
people using your language and the libraries written in it? That's
even more difficult. In particular, it seems languages either explode in terms
of popularity/interest, even if that may not be warranted (looking at you,
<a href="https://vlang.io/">V</a>), or it takes <em>years</em> for them to get even a handful of
users.</p><p>Making a living off a programming language is <em>exceptionally</em> difficult, as the
number of people willing to donate money is even smaller than those willing to
try out your new language. This means either dedicating a lot of spare time
towards building your language, or quitting your job and funding the development
yourself (e.g. using your savings). This is what I did by the end of 2021 and
while I don't regret doing so, it's a bit painful to watch your wallet shrink
over time.</p><p>As far as advice goes, I'm not sure how to approach this as I'm still figuring
that out myself. What I do know is that a lot of existing advice isn't helpful
at all, as it amounts to "Just get more users, LOL". Perhaps in another 10 years
from now I'll know the answer.</p><h2 id="the-best-test-suite-is-a-real-application">The best test suite is a real application</h2><p>This one is a bit obvious, but worth highlighting regardless: writing unit tests
for your language (e.g. for the standard library functions) is important and
useful, but nowhere near as useful as writing a real application in the
language. For example, I wrote a program to <a href="https://github.com/yorickpeterse/openflow">control my house's HVAC
system</a> in Inko, revealing various
bugs and areas of improvement in the process. Such applications also act as a
showcase for your language, making it easier for potential users to develop an
understanding of what an average project in your language might look like.</p><p><strong>Recommendation:</strong> write a few sufficiently complicated programs that are
actually useful in your language, then use these as a way of testing
functionality and stability of your language. If you can't think of any programs
to write, consider porting <a href="https://github.com/yorickpeterse/clogs">this changelog generator written in
Inko</a>, as it's complex enough to
act as a good stress test for your language, but not so complex it will take
weeks to port.</p><h2 id="dont-prioritize-performance-over-functionality">Don't prioritize performance over functionality</h2><p>When building a language, it can be tempting to focus heavily on providing a
fast implementation, such as a fast and memory efficient compiler, and one can
easily spend months working on this. Potential users of your language may care
about performance to some degree, but what they care about more is being able to
<em>use</em> your language, write libraries in it, and not having to reimplement every
basic feature themselves because of a lacking standard library.</p><p>To put it differently: the value of good performance is proportional to the
amount of meaningful code (= real applications) written in a language.</p><p><strong>Recommendation:</strong> as the saying goes: first make it work, then make it fast.
This doesn't mean you shouldn't care about performance at all; rather, 70-80% of
your energy should be directed towards functionality, with the remaining 20-30%
directed towards making the language not unreasonably slow.</p><h2 id="building-a-language-takes-time">Building a language takes time</h2><p>To wrap things up, here's another observation that should be obvious but is
worth bringing up regardless: building a simple language for yourself in a short
amount of time is doable. Building a language meant to be used by many for many
years to come is going to take a <em>long</em> time. To illustrate, here are some
examples of a few languages and when they released their first stable release (a
<code>?</code> indicates no stable release is available at the time of writing):</p><table><thead><tr><th>Language</th><th>Started in</th><th>Release of 1.0.0</th></tr></thead><tbody><tr><td>Python</td><td>1989</td><td>1994</td></tr><tr><td>Ruby</td><td>1993</td><td>1996</td></tr><tr><td>Scala</td><td>2001</td><td>2004</td></tr><tr><td>Rust</td><td>2006</td><td>2015</td></tr><tr><td>Go</td><td>2007</td><td>2012</td></tr><tr><td>Elixir</td><td>2011</td><td>2014</td></tr><tr><td>Crystal</td><td>2011</td><td>2021</td></tr><tr><td>Vale</td><td>2012</td><td>?</td></tr><tr><td>Inko</td><td>2013</td><td>?</td></tr><tr><td>Gleam</td><td>2016</td><td>?</td></tr><tr><td>V</td><td>2019</td><td>?</td></tr></tbody></table><p>On top of that, there can be significant time between a language becoming stable
and it becoming popular. Ruby 1.0 released in 1996, but it wouldn't be until
2005 or so that Ruby became popular with the release of Ruby on Rails. Rust in
turn saw a rise in popularity following its first stable release, but it would
still take a few years for the language to take off. Scala released version
1.0.0 in 2004, but didn't see widespread adoption until some time between 2010
and 2015.</p><p>Based on these patterns, I suspect that most languages will need at least 5-10
years of development before reaching their first stable release, followed by
another 5 years or so before they start to take off. That's all assuming you end
up lucky enough for it to actually take off, as there are many languages that
instead fade into obscurity.</p><p><strong>Recommendation:</strong> if you want your language to succeed, be prepared for it to
take at least 10-15 years. If you expect it to take the world by storm in just a
year, you'll be sorely disappointed.</p>https://yorickpeterse.com/articles/switching-to-fedora-silverblue/Switching to Fedora Silverblue2023-03-02T23:43:38Z2023-03-02T23:43:38Z<p>
For the last 10 years or so, <a href="https://archlinux.org/">Arch Linux</a> has been my
Linux distribution of choice. The early years were a bit rough, and the process
of moving to systemd wasn't without its challenges either, though the experience
has improved dramatically since then. In spite of these improvements, certain
issues persisted, such as having to manually perform update-related steps every
now and then, fixing broken packages after an update, updating packages in a
particular order (e.g. <code>archlinux-keyring</code> requiring an update before you can
update other packages), and more.</p><p>Arch being a rolling release distribution also means that you're not supposed to
install a new package without first updating your existing packages (at least
for libraries). That is, <code>sudo pacman -S some-package</code> <em>may</em> lead to problems,
so it's recommended to use <code>sudo pacman -Syu some-package</code> instead (see <a href="https://wiki.archlinux.org/title/System_maintenance#Partial_upgrades_are_unsupported">this
section</a>
for more details). It's not a deal breaker, but it's yet another thing to keep
in mind.</p><p>Perhaps the most annoying part is that package updates aren't tested all that
well, if at all; or at least it feels that way. Linux kernel updates in
particular had a tendency to cause issues on my laptop. I remember one
particular instance where a bug in the Intel drivers (or something in the kernel
itself, I can't quite remember) resulted in weird screen flickering/artifacts,
requiring a rollback to a previous kernel version. Pinning packages using
<code>IgnorePkg</code> was the usual workaround, but it's not a suitable long-term solution
as updated packages may not work with older versions of the packages you're
ignoring/pinning.</p><p>Long story short, over the years I realised I care more for a reliable and easy
(or easier) to use distribution, instead of a distribution that gives you
maximum control.</p><p>This is where <a href="https://getfedora.org/">Fedora</a> comes in, and specifically
<a href="https://silverblue.fedoraproject.org/">Fedora Silverblue</a>. Fedora has been
around for years, and I'd been keeping an eye on it for a while. Some time ago
I built a tiny computer to run some home automation software, and I decided to
use <a href="https://getfedora.org/en/server/">Fedora Server</a> for it. This gave me the
chance to try Fedora without it getting in the way.</p><p>I ended up enjoying this enough that I decided to move my Linux installations to
Fedora. As I mainly work on my desktop (still running Arch Linux at the time of
writing), I decided to migrate my laptop first. I decided to go with Silverblue
as I like the idea of an immutable desktop and the ability to roll back updates
<em>without</em> leaving behind a dirty state.</p><p>The first step was to do some research into potential issues I might encounter.
Through this I found the following issues/challenges to deal with:</p><ul><li>Fedora ships a mirror of Flathub instead of using Flathub directly. You can
and probably should disable this. I found and used <a href="https://www.reddit.com/r/Fedora/comments/z2kk88/fedora_silverblue_replace_the_fedora_flatpak_repo/">this Reddit
post</a>
as a reference to do so.</li><li>Fedora ships with
<a href="https://man.archlinux.org/man/systemd-oomd.8">systemd-oomd</a>, and apparently
this has a tendency to cause more problems than it solves
(see <a href="https://www.reddit.com/r/Fedora/comments/tcsen3/is_there_a_way_to_permanently_disable_systemdoomd/">here</a>
and <a href="https://www.reddit.com/r/Fedora/comments/10s06fd/why_is_systemdoomd_still_a_thing/">here</a>).
I ended up disabling it using
<code>sudo systemctl stop systemd-oomd && sudo systemctl disable systemd-oomd && sudo systemctl mask systemd-oomd</code>.</li><li><a href="https://www.reddit.com/r/Fedora/comments/l944bb/is_it_possible_to_install_silverblue_on_an/gnmktzx/">Apparently TRIM support isn't handled properly when using full disk
encryption</a>
on Silverblue. The solution is to add <code>rd.luks.options=discard</code> to your kernel
arguments.</li><li>A few packages I needed aren't available in the official repositories or
<a href="https://copr.fedorainfracloud.org/coprs/">copr</a>, more on that later.</li><li>I read something about Flatpak (and thus the Firefox Flatpak) not supporting
<a href="https://en.wikipedia.org/wiki/Universal_2nd_Factor">U2F</a>, meaning I wouldn't
be able to use my YubiKey with Firefox. This turned out to work just fine.</li></ul><p>Having determined these issues had workarounds that I could live with, I
proceeded with the installation. The installation itself was easy and ran
without any issues.</p><p>After the installation finished I applied the necessary workarounds/fixes for
the above issues, such as disabling <code>systemd-oomd</code>. Unfortunately, this is where
I ran into some new and unexpected problems, discussed below in no particular
order; not all of these are exclusive to
Silverblue.</p><h2 id="getting-my-keyboard-layout-to-work">Getting my keyboard layout to work</h2><p>For my desktop I use a split keyboard that uses the <a href="https://colemakmods.github.io/mod-dh/">Colemak
Mod-DH</a> ortholinear layout. On my laptop
I use the same layout, through a combination of a custom
<a href="https://gitlab.freedesktop.org/xkeyboard-config/xkeyboard-config">xkb</a> keyboard
layout and remapping the keycaps on my keyboard:</p><p><img src="/images/switching-to-fedora-silverblue/keyboard.jpg" alt="Laptop keyboard" /></p><p>While the xkb project includes support for the Colemak Mod-DH layout, it only
supports the variant where the bottom-left keys are XCDVZ, whereas the
ortholinear version uses ZXCDV. I don't quite remember why the ZXCDV version
isn't included, but I recall the reason being along the lines of "the XCDVZ
layout is better for staggered keyboards". I guess I'm the only person wanting
to use the same layout everywhere? Either way, my solution was to create a
custom layout and be done with it.</p><p>For the Arch installation I just created the necessary files (based on <a href="http://who-t.blogspot.com/2020/09/user-specific-xkb-configuration-putting.html">this
article</a>)
in the right place. I then performed the necessary magical incantations (which I
of course couldn't remember) to get this working everywhere.</p><p>For Silverblue I started off with the same setup, placing the files in
<code>~/.config/xkb</code> instead of placing them in <code>/usr/share/X11/xkb</code>. While GNOME
picked up the files just fine, I couldn't get this to work for the LUKS unlock
screen or when using a console/TTY started using <code>Alt</code> and a function key. I
also wasn't able to get GDM to use the layout. Placing the files in <code>/usr/share</code>
wasn't an option either, as it's read-only on Silverblue.</p><p>Getting this to work took an entire evening, and required a few distinct steps.
First, I built an <a href="https://copr.fedorainfracloud.org/coprs/yorickpeterse/colemak-dh-ortho/">RPM
package</a>
to move these files into the right place in <code>/usr/share</code>. I then used
<code>rpm-ostree</code> to <a href="https://docs.fedoraproject.org/en-US/iot/add-layered/">layer the
package</a> onto the base
image.</p><p>To get the console working I set <code>KEYMAP</code> in <code>/etc/vconsole.conf</code> to
<code>colemak_dh_ortho</code>. The default initramfs of Silverblue ignores changes to this
file, so to get this working I had to run <code>rpm-ostree initramfs --enable</code>. This
enables regenerating the initramfs every time you create a new rpm-ostree
deployment, ensuring the necessary files are part of the initramfs. The downside
is that commands such as <code>rpm-ostree install</code> and <code>rpm-ostree update</code> take quite
a bit longer to finish. I also added <code>vconsole.keymap=colemak_dh_ortho</code> to my
kernel arguments for good measure, but I'm not sure this is necessary.</p>
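<p>For reference, the resulting <code>/etc/vconsole.conf</code> is a single line:</p><pre><code>KEYMAP=colemak_dh_ortho
</code></pre><p>The final piece of the puzzle was to get GDM working, which for some reason just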
<em>refused</em> to use this layout. I'm still not sure what exactly solved it, but I
think it was running <code>gsettings set org.gnome.libgnomekbd.keyboard layouts '["colemak_dh_ortho","us"]'</code>
followed by another reboot.</p><p>All of that took well over six hours.</p><h2 id="getting-rid-of-gnome-software">Getting rid of GNOME Software</h2><p>GNOME Software is the primary way of installing software through a GUI on
Fedora. I ran into two issues with it, though neither is that big of a deal.</p><p>First, it's quite clunky to use when it comes to uninstalling software: when you
remove a program, the list of installed programs is refreshed a few seconds
after the removal finishes, showing a spinner while doing so. This made
removing multiple programs a pain, as the spinner would typically show up just
as I was about to click on the "remove" button of the next program I wanted to
remove.</p><p>The second problem is that GNOME Software leaks memory like a sieve, and after
several hours of using my laptop (I wasn't even using GNOME Software during that
time) I found it had eaten up close to 1 GiB of memory.</p><p><a href="https://grugbrain.dev/">grug</a> tired of software leak memory. grug want reach
for club, but grug remember easier just remove GNOME software and use terminal,
so grug run <code>rpm-ostree remove gnome-software gnome-software-rpm-ostree</code>. Memory
leak not worth grug's time and energy.</p><h2 id="rpm-ostree-and-dnf-are-slow">rpm-ostree and dnf are slow</h2><p>DNF being slow is well known in the Fedora community. While DNF5 is supposed to
improve this, I'll believe it when I see it. For me the process of installing
and removing packages is fast enough, but refreshing mirror/package metadata is
frustratingly slow.</p><p>What I didn't expect is for rpm-ostree to also be as slow as a snail. While you
can stage updates in the background and will do most of your package-related
work in a container, you still have to interact with rpm-ostree every now and
then. Coming from Arch Linux where <code>pacman</code> is super fast, the experience leaves
a lot to be desired. To illustrate, for this article I ran <code>rpm-ostree update</code>
and it took just over two minutes to upgrade a mere two packages. Of course I'm
aware rpm-ostree does more than just upgrading two packages, but I'm not
convinced this can't be done any faster.</p><h2 id="building-packages-for-fedora-is-frustrating">Building packages for Fedora is frustrating</h2><p>A few packages I needed were missing: <a href="https://github.com/LuaLS/lua-language-server">Lua language
server</a>,
<a href="https://github.com/JohnnyMorganz/StyLua">Stylua</a>, the Source Code Pro fonts
with support for Nerd Fonts, <a href="https://github.com/Lyude/neovim-gtk/">neovim-gtk</a>,
and an up-to-date <a href="https://github.com/postmodern/ruby-install/">ruby-install</a>.</p><p>Wanting to do the right thing I decided to read up on creating RPM packages and
setting up a copr repository; something I had to do for my keyboard layout
anyway. The experience was deeply frustrating: documentation on RPM packages is
scattered across different websites, some new and some ancient. These websites
also manage to somehow present you with a <em>ton</em> of text, without actually explaining
anything useful at all.</p><div class="admonition info"><i class="icon"></i><div class="text"><p>The following is a brief rant on RPM packaging. If you're not interested
in reading it, the summary is this:</p><p>The process of building an RPM is confusing and frustrating, especially compared
to how easy it is to build a package for Arch Linux. This only affects those
actually interested in building packages.</p></div></div><p>To illustrate how frustrating this process is: through reading some tutorials I
came across the RPM <code>%package</code> macro, but finding out what it did was near
impossible. If you search for "RPM package macro" on Google, the first result
<a href="https://docs.fedoraproject.org/en-US/packaging-guidelines/RPMMacros/">points to this
page</a> that
doesn't mention the macro at all. The <a href="https://rpm-software-management.github.io/rpm/manual/macros.html">second
result</a>
doesn't mention it either. In fact, none of the results seem to mention this
macro, and searching for "RPM %package macro" doesn't work either as the <code>%</code> is
ignored. At some point I found <a href="https://rpm-software-management.github.io/rpm/manual/spec.html">this
page</a> which
briefly mentions what it does, but to do that I had to:</p><ol><li>Go to <a href="https://rpm.org/index.html">https://rpm.org/index.html</a></li><li>Click on "Documentation" and end up at <a href="https://rpm.org/documentation.html">https://rpm.org/documentation.html</a></li><li>Click on "RPM Reference Manual" and end up at
<a href="https://rpm-software-management.github.io/rpm/manual/">https://rpm-software-management.github.io/rpm/manual/</a></li><li>Click on "Spec Syntax" and end up at
<a href="https://rpm-software-management.github.io/rpm/manual/spec.html">https://rpm-software-management.github.io/rpm/manual/spec.html</a></li><li>Search for <code>%package</code> on the page</li></ol><p>While this may seem like a weirdly specific issue to mention, I ran into issues
like this <em>constantly</em> while trying to figure out the idiomatic/modern
way of building an RPM.</p><p>Of course it gets worse. What would make sense is having just one tool to build
a package, and <em>maybe</em> a separate tool to upload it to copr and start a build.
Of course there isn't just one tool: this is Linux where people disagree on
just about everything.</p><p>Building RPM packages involves two low-level programs: <code>spectool</code> and
<code>rpmbuild</code>. <code>spectool</code> is used for just listing and downloading sources from an
RPM <code>.spec</code> file, which describes how to build a package. Of course in typical
Linux fashion it only downloads external sources, so if you list a local file as
a source (e.g. an icon to install), you'll need to move it into the right
place yourself. <code>rpmbuild</code> only concerns itself with building a package, and
straight up ignores any sources listed in your spec file.</p><p>Of course people using these tools realised this isn't nice and decided to fix
it by unifying the two into one program that everybody uses. Right? No, of
course not, that would make too much sense.</p><p>First we have <a href="https://pagure.io/rpkg-util">rpkg-util</a>, which builds
upon the two aforementioned tools and adds some templating capabilities. It's the
default build strategy for copr when building from a VCS repository, so you'd
think it's <em>the</em> way to build a package. But of course it's no longer maintained
per their README, and looking at existing packages on copr it seems it's not
used a lot. Oh and it also spits out the most useless error messages I've ever
seen, such as this:</p><div class="highlight"><pre class="highlight"><code>$ rpkg <span class="k">local</span> --spec ~/path/to/spec/outside/of/the/current/dir
git_dir_version failed with value <span class="mi">1</span>
</code></pre></div><p>Then there's <a href="https://github.com/rpm-software-management/tito">tito</a>, which
tries to do a whole bunch of things related to packaging and releasing, but
somehow doesn't actually make the process easier. Its default output is
incredibly verbose and makes debugging build errors near impossible, it <a href="https://github.com/rpm-software-management/tito/issues/446">doesn't
handle patch files</a>,
and its documentation is sorely lacking. Similar to rpkg-util I also wasn't able
to find any big projects that use it, even though tito has been around for over
a decade.</p><p>For the record, I understand how one ends up with a situation like this, and I
have nothing against the people working on these tools, but having gone through
this process I think I now understand why RPM packages are less commonly
available compared to those for other distributions.</p><p>As for my own packages, I resorted to using <code>spectool</code> and <code>rpmbuild</code> directly
through a <code>Makefile</code>. For example, for lua-language-server I use the following
<code>Makefile</code>:</p><div class="highlight"><pre class="highlight"><code>SPEC := lua-language-server.spec
TOP := ${PWD}/build
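# prepare: download remote sources and copy local ones into build/SOURCES
# srpm:    build a source RPM, for uploading to copr
# rpm:     build a binary RPM locally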
prepare:
	rm -rf build
	spectool --define <span class="s">"_topdir ${TOP}"</span> -gR ${SPEC}
	cp -p sources/* build/SOURCES/
srpm: prepare
	rpmbuild --define <span class="s">"_topdir ${TOP}"</span> -bs ${SPEC}
rpm: prepare
	rpmbuild --define <span class="s">"_topdir ${TOP}"</span> -bb ${SPEC}
clean:
	rm -rf build
<span class="k">.PHONY</span>: srpm prepare rpm
</code></pre></div><p>The <code>--define</code> flags are there so the RPM files and directories end up in
<code>./build</code> instead of in your home directory. This way you can build multiple
packages without their source files potentially conflicting.</p><p>To publish a new package I then update the <code>.spec</code> file by hand (e.g. adjusting
the version), run <code>make srpm</code>, followed by
<code>copr build lua-language-server path/to/the/built/srpm</code>. It's not too bad, but
it's still worse than just running <code>makepkg -s</code> on Arch Linux. If you're looking
into building a package for Fedora, I'd suggest doing something similar to the
above and just avoid rpkg-util and tito entirely <em>unless</em> you are certain you
need these tools.</p><h2 id="selinux-can-be-frustrating">SELinux can be frustrating</h2><p>Before installing Silverblue I made a backup of my
<a href="https://linrunner.de/tlp/index.html">TLP</a> configuration. While Fedora ships
with
<a href="https://gitlab.freedesktop.org/hadess/power-profiles-daemon">power-profiles-daemon</a>,
I've read a little too much about it not doing much more than just throttling
your CPU, so I decided to stick with TLP. After all, TLP works fine so why
bother replacing it. I installed TLP, replaced the default
<code>/etc/tlp.conf</code> configuration file with my own, and reset its ownership to
<code>root:root</code>. When I tried to start TLP using <code>sudo systemctl start tlp</code>, it
failed. Of course when I ran it manually it worked just fine.</p><p>After a while I found out this was a SELinux problem, probably due to certain
SELinux settings/permissions getting lost when I replaced the default file. To
fix this I ran <code>sudo fixfiles restore /etc/tlp.conf</code>, after which TLP started up
without issue.</p><p>While SELinux does log when there are errors (assuming you even remember that it
does and where they're stored), the logs themselves aren't helpful. For example:</p><pre><code>type=AVC msg=audit(1677382357.686:651): avc: denied { read } for
pid=16822 comm="tlp-readconfs" name="tlp.conf" dev="dm-0" ino=533021
scontext=system_u:system_r:tlp_t:s0
tcontext=system_u:object_r:dosfs_t:s0 tclass=file permissive=0
</code></pre><p>While this log line includes a ton of information, it does nothing to help me
understand what I need to do to fix the actual problem.</p><h2 id="fonts-issues-with-firefox">Font issues with Firefox</h2><p>While using the Firefox Flatpak, I noticed the text was a little fuzzy and hard
to read. Upon closer inspection I noticed it was applying <a href="https://en.wikipedia.org/wiki/Subpixel_rendering">subpixel
rendering</a>, even though this
is turned off system-wide (as it should be). I found out this is due to Flatpak
not allowing access to <code>$XDG_CONFIG_HOME/fontconfig</code>, which seems to result in
Firefox (incorrectly) guessing what to do.</p><p>The solution is to use <a href="https://github.com/tchx84/Flatseal">Flatseal</a> to give
Firefox access to the <code>xdg-config/fontconfig:ro</code> filesystem subset, then
restart Firefox.</p><h2 id="locale-errors-when-using-distrobox">Locale errors when using Distrobox</h2><p>I'm using <a href="https://github.com/89luca89/distrobox/">Distrobox</a> instead of
<a href="https://github.com/containers/toolbox">Toolbox</a>, though this issue may also
apply to Toolbox: when running certain commands in the container, I was getting
a "Failed to set locale, defaulting to C.UTF-8" error. Per <a href="https://github.com/89luca89/distrobox/issues/258">this
issue</a> the fix is to run <code>sudo
dnf install glibc-langpack-en</code> in your container, changing the package name
according to the language you are using.</p><h2 id="what-went-well-and-some-tips">What went well, and some tips</h2><p>There may have been more issues I ran into, but these are the ones I can
remember. Most of these are specific to my setup though. For example, if you use
a QWERTY keyboard then getting started is easier. Figuring out how to build an
RPM package is a one-time cost, and wouldn't apply to most users of
Silverblue. In fact, I suspect most users would only run into the Firefox font
problem, the Distrobox locale errors (assuming they're using Distrobox in the
first place), and the slowness of rpm-ostree and DNF.</p><p>Apart from these issues, I'm enjoying Silverblue so far. I also like how the
immutable nature of Silverblue forces you to rethink certain workflows or
decisions, such as building a proper (reusable) package instead of just dumping
some files in <code>/usr</code> or <code>/etc</code>, or using containers more actively. Not having to
worry about updates breaking your system (or at least not as easily as on Arch
Linux) is of course also great.</p><p>As far as tips and tricks go, there are a few that I can recommend.</p><h3 id="put-the-container-name-in-your-prompt">Put the container name in your prompt</h3><p>Because you'll be using containers when using Silverblue (at least when using
the terminal), I recommend putting the name of the current container in your
shell prompt. I use <a href="https://fishshell.com/">Fish</a> and have my prompt configured
as follows:</p><div class="highlight"><pre class="highlight"><code><span class="k">function</span> <span class="k">fish_prompt</span>
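    # Show the basename of the working directory, prefixed with the name
    # of the container when inside one.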
    <span class="k">if</span> [ $PWD = $HOME ]
        <span class="k">set</span> directory <span class="s">'~'</span>
    <span class="k">else</span>
        <span class="k">set</span> directory (basename $PWD)
    <span class="k">end</span>
    <span class="k">if</span> <span class="k">test</span> -n <span class="s">"$CONTAINER_ID"</span>
        <span class="k">echo</span> -n <span class="s">"[$CONTAINER_ID] "</span>
    <span class="k">end</span>
    <span class="k">set_color</span> $fish_color_cwd
    <span class="k">echo</span> -n $directory
    <span class="k">set_color</span> normal
    <span class="k">echo</span> -n <span class="s">" \$ "</span>
<span class="k">end</span>
</code></pre></div><p>Outside a container this results in a prompt like this:</p><div class="highlight"><pre class="highlight"><code>Downloads $ input-here
</code></pre></div><p>And inside a container:</p><div class="highlight"><pre class="highlight"><code>[fedora] Downloads $ input-here
</code></pre></div><h3 id="use-gnome-terminal-profiles-for-your-containers">Use GNOME terminal profiles for your containers</h3><p>Distrobox can create <code>.desktop</code> files for your containers, making it easier to
start/enter them. If you open a new tab in that terminal, it will start in
the default shell, not in the container; at least when using GNOME terminal. To
work around this I adjusted the generated <code>.desktop</code> file to instead start GNOME
terminal with a dedicated profile like so:</p><pre><code>[Desktop Entry]
Name=Fedora
GenericName=Terminal entering Fedora
Comment=Terminal entering Fedora
Categories=Distrobox;System;Utility;
Exec=gnome-terminal --profile Fedora -- /usr/bin/distrobox enter --no-workdir fedora
Icon=/var/home/yorickpeterse/.local/share/icons/distrobox/fedora.svg
Keywords=distrobox;
NoDisplay=false
Terminal=false
TryExec=/usr/bin/distrobox
Type=Application
</code></pre><p>Here <code>--profile Fedora</code> specifies the GNOME terminal profile to use.
The <code>--no-workdir</code> option ensures the new terminal process always starts in the
container's home directory.</p><p>The GNOME terminal profile in turn is configured as follows:</p><ul><li>Command → "Custom command" is set to <code>distrobox enter --name fedora -- fish</code></li><li>Command → "Preserve working directory" is set to "Always"</li></ul><p>This way opening new tabs results in them entering the container, while
preserving the working directory of the previous tab.</p><h3 id="give-your-containers-a-custom-home-directory">Give your containers a custom home directory</h3><p>This isn't necessary if you only intend to use a single container, but if you
use multiple containers it's a must: when creating a container using Distrobox,
the <code>--home</code> flag is used to specify a custom home directory. This way the
container won't pollute your actual home directory, and two different containers
using the same files in your home directory won't conflict. For example:</p><div class="highlight"><pre class="highlight"><code>mkdir $HOME/homes
distrobox create --name fedora --image fedora:latest --home $HOME/homes/fedora
</code></pre></div><p>This creates a new container called "fedora" with its home directory set to
<code>~/homes/fedora</code>.</p><p>Inside the container you still have access to the real home directory. As all my
projects are in <code>~/Projects</code> (in my real home directory), I created a symbolic
link to this folder from the container's home directory (running this inside the
container):</p><div class="highlight"><pre class="highlight"><code>ln -s /var/home/yorickpeterse/Projects $HOME/Projects
</code></pre></div><p>This way inside the container's home directory I can just run <code>cd Projects</code>,
instead of <code>cd ../../Projects</code>.</p><h3 id="automatically-stage-rpm-ostree-updates">Automatically stage rpm-ostree updates</h3><p>I'm not sure how well this works if you still have GNOME Software installed (or
if it's even necessary), but I have rpm-ostree set up to automatically stage
updates. This is done in two steps:</p><ol><li>Add <code>AutomaticUpdatePolicy=stage</code> to <code>/etc/rpm-ostreed.conf</code> under the
<code>[Daemon]</code> section, as shown below.</li><li>Run <code>sudo systemctl reload rpm-ostreed</code> followed by <code>sudo systemctl enable
--now rpm-ostreed-automatic.timer</code>.</li></ol>
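<p>The relevant part of <code>/etc/rpm-ostreed.conf</code> then looks like this:</p><pre><code>[Daemon]
AutomaticUpdatePolicy=stage
</code></pre><p>You can then verify if it's enabled by running <code>rpm-ostree status</code>. If enabled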
you should see a message at the top along the lines of:</p><pre><code>AutomaticUpdates: stage; rpm-ostreed-automatic.timer: last run 24h ago
</code></pre><h3 id="layer-adw-gtk3">Layer adw-gtk3</h3><p>GTK3 applications look different from GTK4 applications, which is annoying. We
can fix this by using the <a href="https://github.com/lassekongo83/adw-gtk3">adw-gtk3</a>
project as follows:</p><ol><li>Run <code>sudo wget -P /etc/yum.repos.d/ https://copr.fedorainfracloud.org/coprs/nickavem/adw-gtk3/repo/fedora-37/nickavem-adw-gtk3-fedora-37.repo</code>.</li><li>Run <code>rpm-ostree install gnome-tweaks adw-gtk3 --apply-live</code>.</li><li>Open Tweaks and go to "Appearance", then under "Legacy Applications" choose
"Adw-gtk3". If the theme isn't listed there try rebooting first.</li></ol><h2 id="conclusion">Conclusion</h2><p>To conclude, I like Silverblue, in spite of the issues I ran into. In the coming
weeks I'll also move my desktop over to Silverblue, and at some point in the
future I'll also move my Windows gaming desktop to Silverblue. Most of my issues
are specific to my setup and probably won't apply to most users, though I
wouldn't recommend Silverblue to those not familiar with a terminal just yet; at
least not until GNOME Software is less clunky and stops hogging memory.</p>https://yorickpeterse.com/articles/im-leaving-gitlab-to-work-on-inko-full-time/I'm leaving GitLab to work on Inko full-time2021-12-14T15:54:48Z2021-12-14T15:54:48Z<p>
Back in October 2015 I joined <a href="https://about.gitlab.com/">GitLab</a>. I think I was
employee #28 at the time, with the total number of employees being somewhere
between 30 and 40 if I'm not mistaken. Fast-forward to today, and GitLab has
grown to almost 1600 employees.</p><p>While I enjoyed my time at GitLab, after a little over six years I feel it's
time for something new. In particular, I want to be able to dedicate more time
to <a href="https://inko-lang.org/">Inko</a>. With that in mind, I resigned from GitLab
with my last day being December 31st 2021. Starting January 1st 2022, I'll be
working on Inko full-time. The roadmap for 2022 is as follows:</p><ol><li>Finish the new compiler written in Rust, which also implements a new memory
management strategy for Inko.</li><li>Build a decentralised package manager</li><li>Grow the community</li></ol><p>For now I'm not adding more to the roadmap, as I'm not yet sure how productive
I'll be once I start working on Inko full-time.</p><p>The new memory management strategy is something I'm most excited about. This
strategy combines the efficient heap layout from
<a href="https://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf">Immix</a>
with a single-ownership model, but without the lifetime complexity found in
Rust. The ownership model is based on the paper <a href="https://researcher.watson.ibm.com/researcher/files/us-bacon/Dingle07Ownership.pdf">Ownership You Can Count On: A
Hybrid Approach to Safe Explicit Memory
Management</a>,
though I intend to extend it with additional compile-time analysis and support
for generic data structures that work with both owned and borrowed values. Of
course this approach comes with its own trade-offs, but I feel these trade-offs
are worth making, and will make Inko a compelling alternative to languages such
as Python, Ruby, and Erlang.</p><p>If you'd like to support the project financially, you can do so <a href="https://github.com/sponsors/YorickPeterse/">through GitHub
Sponsors</a>. And if you'd like to
follow progress made on Inko, consider joining the <a href="https://matrix.to/#/#inko-lang:matrix.org">Matrix
channel</a>, as I'll post short updates
there from time to time. I also intend to start recording videos on the
development of Inko and maybe start streaming, but I think it will take a bit of
time before I have the courage to do so.</p>https://yorickpeterse.com/articles/friendship-ended-with-the-garbage-collector/Friendship ended with the garbage collector2021-08-24T18:00:00Z2021-08-24T18:00:00Z<p>
It's been a while since the last update about my work on the <a href="https://inko-lang.org/">Inko programming
language</a>. Not because there hasn't been any progress,
but because I've been busy making changes. A <em>lot</em> of changes.</p><p>For the past two years or so I have been toying with the idea of replacing
Inko's garbage collector with something else. The rationale for this is that at
some point, all garbage collected languages run into the same issue: the
workload is too great for the garbage collector to keep up.</p><p>The solutions to such problems vary. Sometimes one has to spend hours tweaking
garbage collection settings. Such settings often lack good documentation, and
are highly dependent on the infrastructure used to run the software. Other times
one has to use hacks such as <a href="https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2/">allocating a 10 GB byte
array</a>.</p><p>This got me thinking: what if for Inko I got rid of the garbage collector
entirely, preventing users from running into these problems? After spending some
time looking into this (see <a href="https://gitlab.com/inko-lang/inko/-/issues/207">this
issue</a> for more details), I
decided to postpone the idea. I wasn't able to come up with a good solution at
the time, so I decided to take another look at it in the future.</p><p>Earlier this year I read the paper
<a href="https://researcher.watson.ibm.com/researcher/files/us-bacon/Dingle07Ownership.pdf">"Ownership You Can Count On: A Hybrid Approach to Safe Explicit Memory Management"</a>.
This paper is from 2006, and describes a single ownership model for managing
memory. The approach outlined is pretty straightforward: you have owned values,
and references. When an owned value goes out of scope, it's deallocated. When
creating a reference, you increment a counter stored in the owned value the
reference points to. When the reference goes out of scope, the count is reduced.
When an owned value goes out of scope and its reference count is not zero, the
program terminates with an error (which I'll refer to as a "panic").</p>
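<p>To make the mechanics a bit more concrete, here's a minimal sketch of the
scheme in Rust. The names are mine and the code is purely illustrative; it's
not taken from the paper:</p><pre><code>// An owned value, storing the number of references that point to it.
struct Owned {
    value: i64,
    references: usize,
}

// Creating a reference increments the owner's count.
fn new_reference(owned: &mut Owned) {
    owned.references += 1;
}

// Dropping a reference decrements the count again.
fn drop_reference(owned: &mut Owned) {
    owned.references -= 1;
}

// Dropping an owned value panics if any references remain.
fn drop_owned(owned: Owned) {
    if owned.references > 0 {
        panic!("the value still has {} references", owned.references);
    }

    // The value itself is deallocated here.
}
</code></pre><p>Of course this approach has its own downside: a program may panic when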
dropping an owned value, if it still has one or more references pointing to it.
This is something you can prevent from happening (at least as much as possible)
using compiler analysis. Since you still have a runtime mechanism to fall back
to, this analysis doesn't have to be perfect. The result is that you can decide
how you want to balance developer productivity, correctness, and the complexity
of the implementation.</p><p>In contrast, Rust has a strict and complex ownership model. This model ensures
that if your program compiles (and you don't use unsafe code), you won't run
into memory related issues such as dangling references or use-after-free errors.
The trade-off here is extra complexity, not being able to implement certain
patterns in safe code (e.g. linked lists), and possibly more.</p><p>The approach outlined here was compelling enough for me to take another look at
using a single ownership model for Inko. Along the way, I found out about a
language called <a href="https://vale.dev/">Vale</a>, which draws inspiration from the same
paper.</p><h2 id="the-current-status">The current status</h2><p>Replacing the garbage collector with a single ownership model (amongst other
changes I'm making) is what I have been working on since March 2021. The
progress is tracked in the merge request <a href="https://gitlab.com/inko-lang/inko/-/merge_requests/120">"Single ownership, move semantics, and
a new memory layout"</a>.
Besides introducing a single ownership model, the merge request introduces
changes such as (but not limited to):</p><ul><li>Throwing errors is much cheaper, with the cost being similar to a regular
function return.</li><li>Defining processes is done similarly to defining classes, and sending messages
looks like regular method calls.</li><li>A new compiler written in Rust, replacing the Ruby compiler. When our
self-hosting compiler is mature enough, the Rust compiler will be used to
bootstrap the self-hosting compiler.</li><li>A greatly improved allocator. We still use the Immix heap layout, and heaps
are now thread-local instead of process-local.</li><li>Method calls and field lookups no longer use hashing, and instead use regular
index lookups.</li><li>Dynamic dispatch is handled using a hashing approach inspired by <a href="https://thume.ca/2019/07/29/shenanigans-with-hash-tables/">Shenanigans
With Hash Tables</a>.
Using this approach we allow reopening of classes and implementing of traits
after defining a class, without the need for fat pointers. The compiler will
generate code such that collisions are rare, and that the cost of handling
collisions is as small as possible; a rough sketch of such a lookup follows
after this list.</li></ul>
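<p>Here's what such a lookup can look like (this is my own illustration using
made-up names, not Inko's actual implementation):</p><pre><code>type Method = fn();

struct Entry {
    hash: u32,
    method: Method,
}

struct Class {
    // The number of entries is a power of two, so the modulo can be done
    // using a bitwise AND. Real code needs a sentinel for empty slots.
    methods: [Entry; 8],
}

fn lookup(class: &Class, hash: u32) -> Method {
    let mask = class.methods.len() - 1;
    let mut index = hash as usize & mask;

    // The compiler picks the hashes such that the first probe succeeds
    // in the vast majority of cases.
    loop {
        let entry = &class.methods[index];

        if entry.hash == hash {
            return entry.method;
        }

        index = (index + 1) & mask;
    }
}
</code></pre><h3 id="processes-and-messages">Processes and messages</h3><p>A big change that is the direct result of the single ownership model is how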
processes send messages to each other. The released version of Inko takes an
approach similar to Erlang: each process has its own heap, and messages are deep
copied when sent. This removes the need for sharing memory, which in turn
removes the need for synchronisation. The cost is having to deep copy objects.
This can be time consuming, and handling circular objects is a challenge.
Copying of some objects can also fail at runtime (e.g. sockets), but there
wasn't a nice way of handling this.</p><p>When you use a single ownership model, you don't need copying. Instead, you just
transfer ownership to the receiving process. This also means you don't have to
maintain a heap per process. Instead, you can maintain a heap per OS thread (to
allow for fast thread-local allocations), as the ownership model guarantees no
two processes can access the same object concurrently. The result is a nicer
language, type-safe message passing, a reduction in memory usage due to
processes being smaller, and lots of other improvements.</p><p>To illustrate this, here is a simple example of implementing a distributed
counter:</p><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">class</span> Counter {
  <span class="vi">@number</span>: UnsignedInt

  <span class="k">async</span> def increment {
    <span class="vi">@number</span> += <span class="mi">1</span>
  }

  <span class="k">async</span> def get -> UnsignedInt {
    <span class="vi">@number</span>
  }
}

def main {
  <span class="k">let</span> counter = Counter { <span class="vi">@number</span> = <span class="mi">0</span> }

  counter.increment
  counter.increment
  counter.get <span class="c"># => 2</span>
}
</code></pre></div><p>Defining processes is done using <code>async class</code>. When you create an instance of
an async class, a lightweight process (not an OS process) is spawned that owns
the instance. The process that created the instance is given a value of type
<code>async T</code>, or <code>async Counter</code> in the above example. This type acts as the
client, with the process acting as a server. Clients can be copied and sent to
other processes.</p><p>Messages are essentially remote procedure calls, and look like regular method
calls. When you create a process with one or more fields, or pass arguments
along with your message, the ownership of the values is transferred to the
receiving process. A few types can't be sent to different processes, such as
references, closures, and generators.</p><p>Message processing happens in FIFO order. When all clients disconnect, and the
process has no more messages to process, the process runs its destructor and
terminates.</p><p>When you send a message, the sender waits for a result to be produced, without
blocking the OS thread the process is running on. If you instead want a future
to resolve later, you can use the <code>async</code> keyword (<code>async counter.get</code>
instead of <code>counter.get</code>).</p><h3 id="circular-types">Circular types</h3><p>In languages with single ownership, circular types such as doubly linked lists
can be tricky to implement, typically requiring unsafe code such as raw
pointers. In Inko, such types are easy to implement:</p><div class="highlight"><pre class="highlight"><code><span class="k">class</span> DoublyLinkedList[T] {
  <span class="vi">@head</span>: ?Node[T]
}

<span class="k">class</span> Node[T] {
  <span class="vi">@value</span>: T
  <span class="vi">@next</span>: ?Node[T]
  <span class="vi">@prev</span>: ?<span class="k">ref</span> Node[T]
}
</code></pre></div><p>Here <code>?T</code> is syntax sugar for <code>Option[T]</code>, meaning it's an optional value. <code>ref
T</code> is a reference to an owned value of type <code>T</code>.</p><p>We don't need destructors, as Inko drops fields in reverse lexical order. For
our linked list example with nodes A and B (with B coming after A), the drop
order is as follows:</p><pre><code>1. A @prev
2. A @next --> 3. B @prev
4. B @next
5. B @value
6. B
7. A @value
8. A
</code></pre><p>When we reach step 8, the reference from B to A is dropped, so no error is
produced.</p><p>For more complex types a custom destructor may be needed to drop fields in a
different order, though such cases should be rare. Even then, you won't need any
unsafe code.</p><h3 id="generics-support-both-owned-values-and-references">Generics support both owned values and references</h3><p>A challenge identified in the ownership paper is allowing generic types to
support both owned values and references. The paper doesn't provide a solution,
and instead mentions implementing different types (so one Array type for owned
values, and one for references).</p><p>Inko will start by using pointer tagging to differentiate between owned values
and references. We already use pointer tagging for immediate values, and had an
extra bit to spare anyway. Any generic code that isn't inlined will use a
runtime check of this bit when dropping a generically typed value.</p>
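<p>As a rough sketch (again my own illustration, not Inko's actual
implementation), such tagging boils down to a few bit operations:</p><pre><code>// The lowest bit of an aligned pointer is unused, so we can use it to
// record whether the pointer is an owned value (0) or a reference (1).
const REF_BIT: usize = 1;

fn as_reference(pointer: *mut u8) -> *mut u8 {
    (pointer as usize | REF_BIT) as *mut u8
}

fn is_reference(pointer: *mut u8) -> bool {
    (pointer as usize & REF_BIT) != 0
}

fn drop_value(pointer: *mut u8) {
    if is_reference(pointer) {
        // Dropping a reference: decrement the owner's reference count.
    } else {
        // Dropping an owned value: run its destructor and free the memory.
    }
}
</code></pre><p>I decided against the use of monomorphisation for several reasons:</p><ul><li>We don't have (and I can't think of any) optimisations that can take advantage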
of it.</li><li>It increases compile times, and I want to keep these as low as possible.</li><li>Through inlining most generic types can be removed.</li><li>It increases memory usage.</li><li>The Array type is built into the VM, and the VM uses it in several different
places. If we monomorphise generic types (including Array), the VM needs to be
refactored such that it doesn't use the Array type directly. If we don't, the
VM won't know which implementation of the Array type to use.</li></ul><p>In the future Inko may use a different approach, but for the time being pointer
tagging should be good enough.</p><h3 id="heap-layout">Heap layout</h3><p>A benefit of garbage collected languages is that they can allocate and reclaim
memory such that allocations are fast, and fragmentation is kept low. Inko
retains the
<a href="https://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf">Immix</a>
heap layout and bump allocator. To reuse memory and combat fragmentation, Inko
threads scan a chunk of their heap before running a process. When a reusable
block of memory is found, it's moved to the end of the heap after the allocation
position. Scanning is done incrementally, ensuring that each scan takes a fixed
maximum amount of time. Objects are never moved around, as doing so requires
traversing all live objects (or read barriers) to update pointers to the moved
objects.</p><p>While this approach doesn't fully mitigate fragmentation, I believe it should be
good enough for the foreseeable future.</p><h2 id="remaining-work">Remaining work</h2><p>While work on the new virtual machine is finished, I'm still working on the new
compiler. As part of this I'll also need to rewrite parts of the self-hosting
compiler code written thus far. I suspect it will take a few more months before
the work is finished. I'm <em>super</em> excited about these changes, and I hope they
will make Inko a more compelling language to use. They will also make Inko a
much faster language.</p><p>If you'd like to stay up to date on the progress made, I recommend joining
Inko's <a href="https://matrix.to/#/#inko-lang:matrix.org">Matrix channel</a>, or
subscribing to <a href="https://www.reddit.com/r/inko/">/r/inko</a> on Reddit.</p>https://yorickpeterse.com/articles/libffi-rs-100/libffi-rs 1.0.0 is released2020-10-25T00:09:55Z2020-10-25T00:09:55Z<p>
</p><p><a href="https://crates.io/crates/libffi">libffi-rs</a> (<a href="https://github.com/tov/libffi-rs/">GitHub
repository</a>) is a Rust crate that provides
bindings to <a href="https://sourceware.org/libffi/">libffi</a>. I've been using the crate
for about two years now for <a href="https://inko-lang.org/">Inko</a>, and it works great.</p><p>Development of the crate slowed down in recent years, as the author <a href="https://github.com/tov/">Jesse A.
Tov</a> has been busy. To help the author out, I joined as
a maintainer, and earlier today I released version 1.0.0 of the libffi crate.</p><h2 id="whats-new">What's new</h2><p>Version 1.0.0 does not introduce any API changes compared to previous versions.
What it does introduce is the removal of the dependency on the
<a href="https://crates.io/crates/bindgen">bindgen</a> crate.</p><p>Previous versions of the crate use bindgen to generate libffi bindings at
build-time, which requires libclang to be installed. While installing libclang
is not a problem on Linux, on macOS and Windows it's a bit more tricky. The
bindgen crate also introduces quite the list of build-time Rust dependencies: 37
direct and indirect ones, to be exact.</p><p>Starting with version 1.0.0, these dependencies are no longer necessary. The
removal of these dependencies means installing the crate is both easier and
faster, while providing the same functionality as before.</p><p>For more information about these changes, take a look at <a href="https://github.com/tov/libffi-sys-rs/pull/37">this pull
request</a>.</p><h2 id="upgrading">Upgrading</h2><p>Existing users should have no trouble updating to the latest version, as the
public API remains unchanged compared to the previous version (0.9.0). To
upgrade, change your dependency definition to the following:</p><div class="highlight"><pre class="highlight"><code>[dependencies]
libffi = <span class="s">"1.0.0"</span>
</code></pre></div><h2 id="future-plans">Future plans</h2><p>There is <a href="https://github.com/tov/libffi-rs/pull/14">an open pull request that improves ARMv7
support</a>, which I would like to
include in a future release. Apart from that there are no big plans at this
time, as the crate works well enough in its current state.</p>https://yorickpeterse.com/articles/10-years-of-software-development/10 years of software development2020-06-30T00:00:00Z2020-06-30T00:00:00Z<p>
</p><p>June 2020 marks my 10-year anniversary of becoming a software developer. When I
first started out I had just turned 18, and yet somehow thought I knew
everything better than everybody else. Of course this wasn't true, quite the
opposite in fact.</p><p>Fast forward 10 years, and I have learned quite a bit. I have learned new tools,
gained new skills, learned how to sit properly behind a desk, the list goes on.
But as I have gained more knowledge and experience, for some reason I have also
become increasingly more critical of myself.</p><p>Take for example <a href="https://inko-lang.org/">Inko</a>. Developing a programming
language is hard work, and requires a certain amount of knowledge. While I don't
consider myself an expert by any means, what I have achieved with Inko thus far
shows I have a certain amount of knowledge and experience. And yet with every
step I question myself. "Why would anybody use this?", "This isn't fast enough",
"That other language does it much better", "You are unlikely to succeed", the
list of questions and comments in my head goes on. Worse, the more progress I
make, the more critical I seem to become.</p><p>I don't know how to solve this, nor do I have any good advice for others dealing
with the same problem. All I can say is this: know that you are not alone. In
fact, it's probably safe to assume everybody you know suffers from the same
problem to a certain degree.</p><p>Maybe in another 10 years I'll have a solution. Or perhaps by then I have
learned to just live with it.</p>https://yorickpeterse.com/articles/deciding-when-to-collect-garbage/Deciding when to collect garbage2019-12-02T17:15:00Z2019-12-02T17:15:00Z<p>
</p><p>How to perform garbage collection is a widely explored topic, and there are all
sorts of different techniques. Sequential collectors, parallel collectors,
concurrent collectors, incremental collectors, real-time collectors, the list
goes on. There are also different techniques for allocators used, ranging from
free list allocators to bump allocators.</p><p>Deciding <em>when</em> to perform garbage collection appears to be written
about less frequently. I suspect the reason for this is that deciding when
to collect is specific to a programming language's behaviour. For example,
languages using immutable objects will allocate a lot and thus more frequent
collections may be desired.</p><p>Let's illustrate this using the best book one can buy to learn more about
garbage collection: <a href="http://gchandbook.org/">The Garbage Collection Handbook, 2nd
Edition</a>. This book consists of 416 pages, excluding the
preface, table of contents, glossary, etc. These 416 pages cover pretty much
everything there is to know about garbage collectors, how to implement them,
what their trade-offs are, and so on.</p><p>Of these 416 pages, I could not find any that focus specifically on when to
collect garbage. I do vaguely recall it's discussed somewhere in the book, but I
was unable to find this by looking at the table of contents and skimming through
several chapters.</p><p>In this article we'll take a look at the different techniques that can be used
to decide when to collect garbage, how to implement such a technique, and what
techniques a few programming languages out there use.</p><h2 id="table-of-contents">Table of contents</h2><ul class="toc"><li><a href="#deciding-when-to-collect">Deciding when to collect</a><ul><li><a href="#collecting-based-on-object-allocation-counts">Collecting based on object allocation counts</a></li><li><a href="#collecting-based-on-object-sizes">Collecting based on object sizes</a></li><li><a href="#collecting-based-on-object-weights">Collecting based on object weights</a></li><li><a href="#collecting-based-on-the-number-of-memory-blocks">Collecting based on the number of memory blocks</a></li><li><a href="#collecting-based-on-the-usage-percentage-of-a-fixed-size-heap">Collecting based on the usage percentage of a fixed-size heap</a></li><li><a href="#collecting-between-web-requests">Collecting between web requests</a></li><li><a href="#collecting-after-a-certain-time-has-passed">Collecting after a certain time has passed</a></li><li><a href="#collecting-when-the-system-runs-out-of-memory">Collecting when the system runs out of memory</a></li><li><a href="#collecting-based-on-past-collection-statistics">Collecting based on past collection statistics</a></li></ul></li><li><a href="#the-flaw-of-collecting-based-on-allocations">The flaw of collecting based on allocations</a></li><li><a href="#deciding-when-to-collect-using-rust">Deciding when to collect using Rust</a></li><li><a href="#languages-and-what-techniques-they-use">Languages and what techniques they use</a><ul><li><a href="#inko">Inko</a></li><li><a href="#java">Java</a></li><li><a href="#lua">Lua</a></li><li><a href="#ruby">Ruby</a></li></ul></li><li><a href="#conclusion">Conclusion</a></li></ul><h2 id="deciding-when-to-collect">Deciding when to collect</h2><p>Let's start by taking a look at the different ways a collector can determine if
garbage collection is necessary, in no particular order.</p><h3 id="collecting-based-on-object-allocation-counts">Collecting based on object allocation counts</h3><p>This approach is the simplest, and a commonly used one. When a certain
number of objects has been allocated since the last collection, we trigger a
collection. At the end of a collection we reset this counter. This is repeated
until the program terminates.</p><p>Most collectors using this approach will increase the threshold as the program
runs, if needed. For example, a collector may decide to double the threshold if
it could not release enough memory during a collection. This ensures that
garbage collections don't happen too frequently.</p>
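<p>A minimal sketch of such a trigger, including the doubling of the threshold
(the numbers here are made up):</p><pre><code>struct Trigger {
    allocated: usize,
    threshold: usize,
}

impl Trigger {
    fn new() -> Trigger {
        Trigger { allocated: 0, threshold: 1024 }
    }

    // Called for every allocation. Returns true if it's time to collect.
    fn on_allocation(&mut self) -> bool {
        self.allocated += 1;
        self.allocated >= self.threshold
    }

    // Called at the end of a collection, with the number of objects that
    // were reclaimed.
    fn on_collection(&mut self, reclaimed: usize) {
        // If we couldn't release enough memory, double the threshold so
        // collections don't happen too frequently.
        if reclaimed < self.threshold / 2 {
            self.threshold *= 2;
        }

        self.allocated = 0;
    }
}
</code></pre><h3 id="collecting-based-on-object-sizes">Collecting based on object sizes</h3><p>A refinement of collecting based on object <em>counts</em> is to trigger a collection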
after allocating a certain number of <em>bytes</em>. This is useful when you have
objects of different sizes. Imagine a system where we collect based on object
counts, and we allocate lots of large objects, but not enough to cross the
allocation count threshold. Because we collect based on counts and not sizes, we
may end up wasting more memory than necessary.</p><p>Obtaining the size of an object may not always be easy, or cheaper than just
counting the number of objects. It's also not helpful if all objects are the
same size, as counting sizes would then be the same as counting the number of
objects.</p><h3 id="collecting-based-on-object-weights">Collecting based on object weights</h3><p>Just allocating memory is not always all that needs to be done to initialise an
object. Fields need to be filled in, synchronisation may be needed based on what
kind of object is allocated, and so on. Instead of collecting based on the
number of allocated objects, a collector may decide to assign a weight to every
object, triggering a collection when the total weight exceeds a certain
threshold.</p><h3 id="collecting-based-on-the-number-of-memory-blocks">Collecting based on the number of memory blocks</h3><p>Counting individual object allocations may get expensive if allocations happen
frequently. Allocators in turn commonly divide memory into blocks, such as a
block of 8 KB. A collector can then decide to not count the number of allocated
objects, but the number of blocks in use. If a block can contain 100 objects,
this means we only need to increment and check our statistics once every 100
allocations, instead of doing so on every allocation. This may improve
performance, but can also delay garbage collection.</p><h3 id="collecting-based-on-the-usage-percentage-of-a-fixed-size-heap">Collecting based on the usage percentage of a fixed-size heap</h3><p>Instead of collecting based on a counter crossing a threshold, we assign a fixed
size to our heap. When a certain percentage of this heap is used we trigger a
collection. When the heap is full, we trigger a collection and/or error if no
additional memory is available.</p>
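<p>The check itself is trivial; a sketch collecting at 90% usage (a made-up
number) could look like this:</p><pre><code>struct Heap {
    capacity: usize, // the fixed size of the heap, in bytes
    used: usize,     // the number of bytes currently in use
}

impl Heap {
    // Returns true when 90% or more of the heap is in use, using integer
    // arithmetic to avoid floating point comparisons.
    fn should_collect(&self) -> bool {
        self.used * 10 >= self.capacity * 9
    }
}
</code></pre><p>This approach allows us to enforce an upper limit on the size of the heap, which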
can be useful in memory-constrained environments. The downside is that consuming
the entire heap may lead to the program terminating (depending on what the
collector does in this case), even when the system has memory available.</p><p>This approach also may not work well if tasks (lightweight processes, threads,
and so on) have their own heap, as preallocating memory for these heaps may be
expensive and end up consuming a lot of (virtual) memory.</p><h3 id="collecting-between-web-requests">Collecting between web requests</h3><p>A less common approach sometimes employed by web applications is to disable
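<p>A sketch of this technique; the 90% trigger point is an arbitrary example
value, not taken from a real collector:</p><div class="highlight"><pre class="highlight"><code>pub struct FixedHeap {
    /// The number of bytes of the fixed-size heap currently in use.
    used: usize,
    /// The total size of the heap in bytes.
    capacity: usize,
}

impl FixedHeap {
    /// Trigger a collection once 90% of the heap is in use.
    pub fn should_collect(&self) -> bool {
        self.used as f64 / self.capacity as f64 >= 0.9
    }

    /// When the heap is entirely full we must collect, and produce an
    /// error if the collection doesn't free up any space.
    pub fn is_full(&self) -> bool {
        self.used >= self.capacity
    }
}
</code></pre></div>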
<h3 id="collecting-between-web-requests">Collecting between web requests</h3><p>A less common approach, sometimes employed by web applications, is to disable
garbage collection by default, and manually run it after completing a web
request. The idea is to defer any garbage collection pauses
until after a request, preventing garbage collections from negatively affecting
the user experience.</p><p>In practice I think this won't work as well as one might expect. While an
accepted request won't be interrupted by a collection, future requests may take
longer to be handled due to a collection running between requests. With that
said, this can be influenced by the application's behaviour, so perhaps there
are cases where this does help.</p>
<h3 id="collecting-after-a-certain-time-has-passed">Collecting after a certain time has passed</h3><p>Instead of collecting based on some incremented counter, a collector may decide
to collect after a certain amount of time has passed. To the best of my
knowledge this approach is not commonly used on its own. Instead, it's sometimes
used as a backup of sorts to ensure collections run periodically, even when
only a small number of objects is allocated.</p><p>Using this approach on its own is unlikely to work well, as there is no
correlation between the time elapsed and the need to collect garbage. That is,
just because five minutes have passed does not mean a collection is needed.</p><p><a href="https://golang.org/">Go</a> appears to use (or at least has used) this approach to
force a garbage collection if no collection has taken place for more than two
minutes. I have not been able to confirm if Go still does this as of Go 1.13.</p>
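<p>A sketch of such a backup trigger; the two-minute interval mirrors the Go
behaviour described above, and everything else is made up for illustration:</p><div class="highlight"><pre class="highlight"><code>use std::time::{Duration, Instant};

pub struct TimedHeap {
    /// When the last collection finished.
    last_collection: Instant,
    /// How much time may pass before a collection is forced.
    interval: Duration,
}

impl TimedHeap {
    pub fn new() -> Self {
        Self {
            last_collection: Instant::now(),
            interval: Duration::from_secs(120),
        }
    }

    /// Meant to be combined with a primary trigger, e.g.
    /// `counter_exceeded || heap.should_collect()`.
    pub fn should_collect(&self) -> bool {
        self.last_collection.elapsed() >= self.interval
    }
}
</code></pre></div>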
<h3 id="collecting-when-the-system-runs-out-of-memory">Collecting when the system runs out of memory</h3><p>When the operating system runs out of memory, we may want to trigger a
collection in an attempt to release memory back to the operating system. If
used, this approach works best on top of another technique that triggers
regular collections.</p><p>The effectiveness of this is debatable. When collecting garbage we may need to
allocate some memory for temporary data structures (e.g. a queue to track
objects to scan), but this may result in the operating system terminating the
program as no memory is available. Since there is also no guarantee that a
collector is able to release memory back to the operating system, this may
result in collections wasting time.</p><h3 id="collecting-based-on-past-collection-statistics">Collecting based on past collection statistics</h3><p>This is another technique that may be applied on top of a previously mentioned
technique: adjust when a collection is triggered based on statistics gathered from a
previous collection cycle. For example, a collector may decide to delay a
collection if the previous collection spent too much time tracing objects. By
delaying the collection, the collector may need to trace fewer objects the next
time it runs.</p><h2 id="the-flaw-of-collecting-based-on-allocations">The flaw of collecting based on allocations</h2><p>Triggering collections based on allocations comes with a flaw: allocations and
the amount of garbage are not necessarily related. This means that in some
cases a collection may be triggered too soon, while other times collections may
be triggered too late.</p><p>Since tracing collectors operate on the live objects, there's not much that can
be done about this. Reference counting collectors operate on dead objects and
thus would have a better view of how much garbage there is, but efficient
reference counting collectors are complex and come with their own drawbacks.
High-performance reference counting collectors also behave similarly to tracing
collectors, but may be much more complex to implement, meaning you may be better
off just using a tracing collector.</p><p>There may be some sort of hybrid approach where a tracing collector keeps track
of (an estimate of) dead objects, without using a full-blown reference counting
system. These statistics (perhaps in combination with other statistics) could
then be used to decide when a collection is needed. I am not aware of any
collectors that use this technique, and I have my doubts about the benefits
being greater than the drawbacks this technique would introduce.</p><h2 id="deciding-when-to-collect-using-rust">Deciding when to collect using Rust</h2><p>With that all covered, let's implement a simple strategy to determine when to
collect by counting the allocated objects. For these examples I'll use Rust.
First we'll start with some boilerplate:</p><div class="highlight"><pre class="highlight"><code><span class="k">use</span> std::alloc::{alloc, handle_alloc_error, Layout};
<span class="k">pub</span> <span class="k">struct</span> Heap {
<span class="c">/// The number of objects allocated since the last collection.</span>
allocations: usize,
<span class="c">/// The number of objects to allocate to trigger a collection.</span>
threshold: usize,
<span class="c">/// The factor to grow the threshold by (2.0 means a growth of 2x).</span>
growth_factor: f64,
<span class="c">/// The percentage of the threshold (0.0 is 0% and 1.0 is 100%) that should</span>
<span class="c">/// still be in use after a collection before increasing the threshold.</span>
resize_threshold: f64,
<span class="c">/// The number of objects marked during a collection.</span>
marked: usize,
}
<span class="k">impl</span> Heap {
<span class="k">pub</span> <span class="k">fn</span> new() -> <span class="k">Self</span> {
<span class="k">Self</span> {
allocations: <span class="mi">0</span>,
threshold: <span class="mi">32</span>,
growth_factor: <span class="mf">2.0</span>,
resize_threshold: <span class="mf">0.9</span>,
marked: <span class="mi">0</span>
}
}
}
</code></pre></div><p>The <code>Heap</code> type would be used for storing heap information (e.g. a pointer to a
block of memory to allocate into), and the number of allocations. For the sake
of this article we keep this implementation as simple as possible. We use an
arbitrary growth factor of 2.0, stored as a float to allow for more precise
growth factors such as 1.5 or 2.3. Other values, such as the threshold and resize
threshold are also arbitrary.</p><p>Let's add a method to allocate objects:</p><div class="highlight"><pre class="highlight"><code><span class="k">impl</span> Heap {
<span class="k">pub</span> <span class="k">fn</span> allocate(<span class="k">&</span><span class="k">mut</span> <span class="k">self</span>, size: usize) -> *<span class="k">mut</span> u8 {
<span class="k">let</span> layout = Layout::from_size_align(size, <span class="mi">8</span>)
.expect(<span class="s">"The size and/or alignment is invalid"</span>);
<span class="k">let</span> pointer = <span class="k">unsafe</span> { alloc(layout) };
<span class="k">if</span> pointer.is_null() {
handle_alloc_error(layout);
}
<span class="k">self</span>.allocations += <span class="mi">1</span>;
pointer
}
}
</code></pre></div><p>Our <code>Heap::allocate()</code> method takes the number of bytes to allocate as an
argument, returning a raw pointer to the allocated memory. For the sake of
simplicity we align memory to 8 bytes. If an allocation fails (a null pointer is
returned), we let Rust's <code>handle_alloc_error()</code> handle this for us.</p><p>Now that we have the method to allocate memory, let's add two methods: one to
check if a collection is needed, and one to increase the threshold if needed:</p><div class="highlight"><pre class="highlight"><code><span class="k">impl</span> Heap {
<span class="k">pub</span> <span class="k">fn</span> should_collect(<span class="k">&</span><span class="k">self</span>) -> bool {
<span class="k">self</span>.allocations > <span class="k">self</span>.threshold
}
<span class="k">pub</span> <span class="k">fn</span> increase_allocation_threshold(<span class="k">&</span><span class="k">mut</span> <span class="k">self</span>) {
<span class="k">let</span> threshold = <span class="k">self</span>.threshold <span class="k">as</span> f64;
<span class="k">if</span> (<span class="k">self</span>.marked <span class="k">as</span> f64 / threshold) < <span class="k">self</span>.resize_threshold {
<span class="k">return</span>;
}
<span class="k">self</span>.threshold = (threshold * <span class="k">self</span>.growth_factor).ceil() <span class="k">as</span> usize;
}
}
</code></pre></div><p><code>Heap::should_collect()</code> is simple and should not need any explaining.
<code>Heap::increase_allocation_threshold()</code> checks if the number of marked objects
(this value would be updated by the collector while tracing objects) is too
great, increasing the threshold (using the growth factor) if needed.</p><p>That's all there is to it. Well, almost: a real collector probably needs to
store more data, update the statistics in the right places, and so on; but <em>just</em>
the code for deciding when to collect is straightforward.</p>
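<p>As a hypothetical example of how these pieces might fit together, a driver
could look like the code below. Here <code>trace_and_mark()</code> is a stand-in for the
actual tracing phase, which is outside the scope of this article, and the code
assumes it lives in the same module as <code>Heap</code>:</p><div class="highlight"><pre class="highlight"><code>/// Placeholder for the tracing phase, which would mark all live objects
/// and return the number of objects marked.
fn trace_and_mark(_heap: &mut Heap) -> usize {
    0
}

fn allocate_object(heap: &mut Heap, size: usize) -> *mut u8 {
    if heap.should_collect() {
        heap.marked = trace_and_mark(heap);
        heap.increase_allocation_threshold();
        // Reset the counter so we start counting the next cycle from zero.
        heap.allocations = 0;
    }
    heap.allocate(size)
}
</code></pre></div>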
<h2 id="languages-and-what-techniques-they-use">Languages and what techniques they use</h2><p>Now let's take a look at some programming languages out there, and what approach
they use to determine when a collection is needed.</p><h3 id="inko">Inko</h3><p><a href="https://inko-lang.org">Inko</a> uses lightweight processes, each with its
own heap, and the collector collects each process independently. A process
heap consists of one or more 8 KB blocks. After a collection, the collector
returns any free blocks to a global allocator for later use. Blocks that are
still full are put aside so they won't be used for new allocations. Any
blocks with space available can be reused once the process resumes.</p><p>Every time a block is requested from the global allocator, a block allocation
counter is incremented. This is done for both the young and mature generation.
When this counter exceeds a certain threshold, a collection is triggered. If
after a collection the collector determines not enough blocks could be returned
to the global allocator, it will increase the threshold for the next collection.
The various settings used for this (the initial thresholds, growth factors,
etc.) can all be configured using environment variables.</p><p>The current block thresholds are 8 MB for the young generation, and 16 MB for
the mature generation. These thresholds are arbitrary, and they will probably
change in the future. The mature generation threshold in particular seems rather
high, as 16 MB of blocks translates to around half a million objects; far too
many for a single lightweight process.</p><h3 id="java">Java</h3><p>The JVM enforces a maximum heap size that is configured when starting the JVM.
Due to all the different collectors the JVM supports it's hard to determine what
triggers a garbage collection. I suspect it's based on a variety of
statistics, such as how much of the (fixed-size) heap is in use, previous
collection timings, and so on.</p><h3 id="lua">Lua</h3><p>Per <a href="http://www.lua.org/manual/5.4/manual.html#2.5">this document</a>, Lua 5.4 has
two garbage collection modes: an incremental collector, and a generational
collector. Both collectors seem to use a similar approach to deciding when to
collect: when the number of bytes allocated grows beyond a certain value, a
collection is triggered.</p><h3 id="ruby">Ruby</h3><p>Ruby uses several statistics to determine when to collect, and if a minor or
full collection should be performed. The article <a href="https://www.speedshop.co/2017/03/09/a-guide-to-gc-stat.html">Understanding Ruby GC through
GC.stat</a> covers
these various statistics pretty well.</p><p>When a certain number of objects has been allocated, Ruby runs a minor collection.
Full collections can be triggered if the number of promoted objects exceeds a
threshold, or if one of several other conditions (which we won't cover here) is
met. Ruby will also increase these thresholds if needed, though I can't remember
if the collector always increases these thresholds, or only in certain cases.</p><h2 id="conclusion">Conclusion</h2><p>While this article is not the most in-depth overview of deciding when to trigger
a garbage collection, I hope it's useful enough to give a better understanding
of what may trigger a collection, and what impact the various techniques will
have.</p>https://yorickpeterse.com/articles/not-beating-c-with-96-lines-of-inko/Not beating C with 96 lines of Inko2019-11-22T12:00:00Z2019-11-22T12:00:00Z<p>
The article <a href="https://chrispenner.ca/posts/wc">"Beating C with 80 Lines of
Haskell"</a> discusses writing a simplified
version of <code>wc</code> using Haskell, and how it performs compared to the C
implementation. This resulted in various other people writing the same program
in different languages, and writing about doing so. At the time of writing,
there are implementations for:</p><ul><li><a href="http://verisimilitudes.net/2019-11-11">Ada</a></li><li><a href="https://github.com/expr-fi/fastlwc/">C</a></li><li><a href="http://verisimilitudes.net/2019-11-12">Common Lisp</a></li><li><a href="https://ummaycoc.github.io/wc.apl/">Dyalog APL</a></li><li><a href="https://futhark-lang.org/blog/2019-10-25-beating-c-with-futhark-on-gpu.html">Futhark</a></li><li><a href="https://ajeetdsouza.github.io/blog/posts/beating-c-with-70-lines-of-go/">Go</a></li><li><a href="https://chrispenner.ca/posts/wc">Haskell</a></li><li><a href="https://medium.com/@martinmroz/beating-c-with-120-lines-of-rust-wc-a0db679fe920">Rust</a></li></ul><p>Today we will be taking a look at writing a similar program in
<a href="https://inko-lang.org/">Inko</a>.</p><h2 id="benchmarking-setup">Benchmarking & setup</h2><p>Several articles mentioned above include some benchmarking data, such as how
long it takes to count the words of a file with a certain size (e.g. 1GB).
While we will also discuss some benchmarking data, it's important not to focus
on the numbers too much. Instead, they should be treated as
rough estimates at best.</p><p>For this article we will be comparing the Inko implementation to GNU <code>wc</code>
version 8.31, running on a 7th generation ThinkPad X1 Carbon. The CPU is an Intel
Core i5-8265U. The CPU governor used is the "performance" governor, and the
clock speed is 3.8 GHz. The OS is Arch Linux running Linux kernel version
5.3.11. The storage device is an NVMe SSD.</p><h2 id="implementation">Implementation</h2><p>Like the other implementations, our implementation expects ASCII input. We also
won't implement any command-line options, or other features of <code>wc</code>. Our input
set will be <a href="https://github.com/ChrisPenner/wc/blob/master/data/big.txt">this file</a>
from the Haskell implementation. The file size is 6.2 MB.</p><p>For our Inko implementation we will take an approach to counting words similar
to the Go (and other) implementations: we read our input into a byte array, in
chunks of 64 KB. When we encounter a whitespace character, we set a flag,
incrementing the line count if that character is a newline. When we reach a
non-whitespace character and the flag is set, we increment the word count and
unset the flag. We repeat this until we have consumed all input bytes.</p>
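<p>Before we dive into the Inko code, here is a compact sketch of this state
machine, written in Rust purely for illustration (it is not part of the Inko
implementation that follows):</p><div class="highlight"><pre class="highlight"><code>/// Counts lines and words in a chunk of ASCII input. `previous_is_space`
/// records whether the byte just before this chunk was whitespace.
fn count(bytes: &[u8], mut previous_is_space: bool) -> (usize, usize) {
    let (mut lines, mut words) = (0, 0);

    for &byte in bytes {
        // Bytes 9-13 and 32 are the ASCII whitespace characters.
        if matches!(byte, 9..=13 | b' ') {
            if byte == b'\n' {
                lines += 1;
            }
            previous_is_space = true;
        } else if previous_is_space {
            words += 1;
            previous_is_space = false;
        }
    }

    (lines, words)
}
</code></pre></div>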
<h2 id="importing-our-dependencies">Importing our dependencies</h2><p>Let's start by importing the types and modules we need:</p><div class="highlight"><pre class="highlight"><code><span class="k">import</span> std::byte_array::ByteArray
<span class="k">import</span> std::env
<span class="k">import</span> std::fs::file
<span class="k">import</span> std::pair::Pair
<span class="k">import</span> std::process
<span class="k">import</span> std::stdio::stdout
<span class="k">import</span> std::string_buffer::StringBuffer
</code></pre></div><p>ByteArray stores a sequence of bytes, as actual bytes and not as (signed)
integers. This means a ByteArray of 4 values needs 1 byte per value, instead of
the 8 bytes each value would need when stored as an integer. This type is not imported by default, so we have
to explicitly import it.</p><p>The module <code>std::fs::file</code> provides file IO types and methods. Inko uses
different types for files based on the open mode, such as <code>ReadOnlyFile</code> for
read-only files. We will see this in action later.</p><p>Pair is a binary tuple. We will use this so we don't have to define our own
types in several places.</p><p>Unlike in languages such as Ruby, operations using STDERR, STDOUT, and STDIN
require you to import the appropriate modules, instead of relying on global
methods or types. The module <code>std::stdio::stdout</code> is used for writing to STDOUT.</p><p>Our last import is the <code>StringBuffer</code> type. Inko does not have
string interpolation or formatting, so concatenating strings together (without
producing intermediate strings) requires the use of the <code>StringBuffer</code> type.
This is a bit clunky, but it's good enough for now.</p><h2 id="constants">Constants</h2><p>Next we will define several constants that we need to access in several methods:</p><div class="highlight"><pre class="highlight"><code><span class="k">let</span> CONCURRENCY = <span class="mi">8</span>
<span class="k">let</span> MAIN = process.current
<span class="k">let</span> NEWLINE = <span class="mi">10</span>
<span class="k">let</span> SINGLE_SPACE = <span class="mi">32</span>
<span class="k">let</span> SPACE_RANGE = <span class="mi">9</span>..<span class="mi">13</span>
<span class="k">let</span> CHUNK_SIZE = <span class="mi">64</span> * <span class="mi">1024</span>
</code></pre></div><p>The <code>CONCURRENCY</code> constant controls the number of processes we will spawn to
count words. The simplest approach would be to spawn one process for every
chunk, but since the work is purely CPU bound, spawning more processes than
there are CPU cores doesn't improve performance.</p><p>The <code>MAIN</code> constant stores an object containing information about the current
process. All processes we spawn for counting words will send their results to
this process.</p><p>The next three constants define some byte values: byte 10 is the Unix newline
separator, byte 32 is a single space, and the range <code>9..13</code> covers all ASCII
whitespace characters (newlines, tabs, etc). In Inko <code>A..B</code> creates an inclusive
range from A to B. Lastly, <code>CHUNK_SIZE</code> defines the size (64 KB) of the chunks we read.</p><h2 id="counting-words">Counting words</h2><p>It's time to define the methods and types we need to count the words in a
<code>ByteArray</code>, starting with two methods: <code>space?</code> and <code>worker_loop</code>:</p><div class="highlight"><pre class="highlight"><code>def space?(byte: Integer) -> Boolean {
SPACE_RANGE.cover?(byte).<span class="k">or</span> { byte == SINGLE_SPACE }
}
def worker_loop {
<span class="k">let</span> chunk = process.receive <span class="k">as</span> Chunk
MAIN.send(chunk.count)
worker_loop
}
</code></pre></div><p>The <code>space?</code> method returns <code>True</code> if the input byte is a whitespace character,
such as a single space or a newline. Inko has no if/else/or/and statements;
instead it uses messages, methods, and closures. Instead of writing <code>A || B</code>,
you would write <code>A.or { B }</code>, where <code>or</code> is a message sent to <code>A</code>. The curly
braces <code>{ B }</code> denote a closure, which in this case returns whatever <code>B</code> is.</p><p>The <code>worker_loop</code> method is a tail-recursive method called by the processes that
count words. On each iteration the process waits for an incoming message using
<code>process.receive</code>. Sending messages to processes uses dynamic typing, and Inko
is pretty strict about dynamic typing. For example, passing a dynamic type
(<code>Dynamic</code>) as an argument does not work if a non-dynamic type (e.g. <code>Integer</code>)
is expected. Sending messages to a dynamic type is fine, and will produce a new
dynamic type. This means we could condense this method to the following:</p><div class="highlight"><pre class="highlight"><code>def worker_loop {
MAIN.send(process.receive.count)
worker_loop
}
</code></pre></div><p>The reason we don't do this is to make it clearer what input we expect in
this method, and to prevent us from using the wrong method(s).</p><p>Inko supports tail call elimination, so our <code>worker_loop</code> method will not
overflow the stack. We could also use a closure and send the <code>loop</code> message to
it:</p><div class="highlight"><pre class="highlight"><code>def worker_loop {
{
<span class="k">let</span> chunk = process.receive <span class="k">as</span> Chunk
MAIN.send(chunk.count)
}.<span class="k">loop</span>
}
</code></pre></div><p>This achieves the same result; in fact, <code>loop</code> is implemented using tail
recursion. Since using tail recursion ourselves requires a little
less code, we just use that instead of <code>loop</code>.</p><p>Now it's time to create an object used for counting words, which we will call
<code>Chunk</code>. This type will hold some state, such as the bytes to process and the
number of lines counted so far. We use a dedicated type so it's a bit easier to
send input to the word counting processes, and so we can use tail recursion when
iterating over the bytes to process. We define objects using the <code>object</code>
keyword:</p><div class="highlight"><pre class="highlight"><code>object Chunk {
}
</code></pre></div><p>Object attributes need to be defined explicitly when we define the object, so
let's do that:</p><div class="highlight"><pre class="highlight"><code>object Chunk {
<span class="vi">@previous_is_space</span>: Boolean
<span class="vi">@bytes</span>: ByteArray
<span class="vi">@lines</span>: Integer
<span class="vi">@words</span>: Integer
<span class="vi">@index</span>: Integer
}
</code></pre></div><p>In Inko we define and refer to attributes using the syntax <code>@NAME</code>. The <code>@</code> is
part of the name, so it's valid to define both an attribute <code>@foo</code> and a method
<code>foo</code>. When defining attributes we must also specify the type, such as <code>Integer</code>
for the attribute <code>@index</code>. The attribute <code>@previous_is_space</code> is used to record
if a previously processed byte was a whitespace character.</p><p>Now we need to define our initialiser method, which is always called <code>init</code>:</p><div class="highlight"><pre class="highlight"><code>def init(previous_is_space: Boolean, bytes: ByteArray) {
<span class="vi">@previous_is_space</span> = previous_is_space
<span class="vi">@bytes</span> = bytes
<span class="vi">@lines</span> = <span class="mi">0</span>
<span class="vi">@words</span> = <span class="mi">0</span>
<span class="vi">@index</span> = <span class="mi">0</span>
}
</code></pre></div><p>This method just sets the attributes to the right value. If we forget to set an
attribute in the <code>init</code> method, the compiler will produce an error.</p><p>We can now define a method to count words and lines, which we will creatively
name "count":</p><div class="highlight"><pre class="highlight"><code>def count -> Pair!(Integer, Integer) {
<span class="k">let</span> byte = <span class="vi">@bytes</span>[<span class="vi">@index</span>]
byte.nil?.if_true {
<span class="k">return</span> Pair.new(<span class="vi">@lines</span>, <span class="vi">@words</span>)
}
space?(byte!).<span class="k">if</span>(
<span class="k">true</span>: {
(byte == NEWLINE).if_true {
<span class="vi">@lines</span> += <span class="mi">1</span>
}
<span class="vi">@previous_is_space</span> = True
},
<span class="k">false</span>: {
<span class="vi">@previous_is_space</span>.if_true {
<span class="vi">@words</span> += <span class="mi">1</span>
<span class="vi">@previous_is_space</span> = False
}
}
)
<span class="vi">@index</span> += <span class="mi">1</span>
count
}
</code></pre></div><p>That's quite a lot to take in, so let's break it down. We start by obtaining the
current byte, and checking if it's <code>Nil</code>. Accessing an out of bounds index in a
<code>ByteArray</code> is valid, and returns <code>Nil</code>. When this is the case we have consumed
all input, and we can return the number of lines and words we have counted.
Instead of creating a custom object to store the lines and words, we use the
<code>Pair</code> type.</p><p>Remember that Inko does not have <code>if</code> statements, and instead uses messages and
method calls. Here <code>if_true</code> is sent to the result of <code>byte.nil?</code>, and the
closure passed as its argument will only be run if <code>byte.nil?</code> produced boolean
true.</p><p>Next up we have the code that determines what to do with the current byte:</p><div class="highlight"><pre class="highlight"><code>space?(byte!).<span class="k">if</span>(
<span class="k">true</span>: {
(byte == NEWLINE).if_true {
<span class="vi">@lines</span> += <span class="mi">1</span>
}
<span class="vi">@previous_is_space</span> = True
},
<span class="k">false</span>: {
<span class="vi">@previous_is_space</span>.if_true {
<span class="vi">@words</span> += <span class="mi">1</span>
<span class="vi">@previous_is_space</span> = False
}
}
)
</code></pre></div><p>We use the <code>space?</code> method we defined earlier on, and pass it the current byte.
We use <code>byte!</code> instead of just <code>byte</code>, as the type of <code>byte</code> is <code>?Integer</code> (an
Integer or Nil). Since <code>space?</code> expects an <code>Integer</code>, we have to cast our <code>byte</code>
variable to the right type. Doing this by hand gets tedious, so Inko offers the
<code>!</code> postfix operator to do just that.</p><p>Once we have obtained the result of <code>space?</code>, we send the <code>if</code> message to it and
pass two arguments: a closure to run when the receiver is true, and a closure
for when the receiver is false. Here <code>true:</code> and <code>false:</code> are just keyword
arguments used to clarify the purpose of the closures.</p><p>The last two lines are pretty simple: we just increment the byte index by 1,
then tail recurse back into the <code>count</code> method.</p><h2 id="scheduling-work">Scheduling work</h2><p>Now that we have our methods and types in place, we can start scheduling the
work. We'll start by opening the file in read-only mode, making sure a file is
actually provided:</p><div class="highlight"><pre class="highlight"><code>env.arguments[<span class="mi">0</span>].nil?.if_true {
process.panic(<span class="s">'You must specify a file to process'</span>)
}
<span class="k">let</span> path = env.arguments[<span class="mi">0</span>]!
<span class="k">let</span> input = <span class="k">try</span>! file.read_only(path)
</code></pre></div><p><code>env.arguments[<span class="mi">0</span>]</code> returns the first command-line argument, or <code>Nil</code> if
there are no arguments. If this happens, we exit the program with a
<a href="https://inko-lang.org/manual/getting-started/error-handling/#header-panics">panic</a>.</p><p>Our file is opened using <code>file.read_only(path)</code>, which opens the file <code>path</code>
points to in read-only mode. We use <code>try!</code> to cause a panic if the file could
not be opened, since there isn't much we can do without being able to open the
file.</p><p>Bored yet? No? Good, we're almost there!</p><p>Now it's time to start our worker processes, and to start scheduling work:</p><div class="highlight"><pre class="highlight"><code><span class="k">let</span> workers =
CONCURRENCY.times.map do (_) { process.spawn { worker_loop } }.to_array
<span class="k">let</span> <span class="k">mut</span> bytes = <span class="mi">0</span>
<span class="k">let</span> <span class="k">mut</span> words = <span class="mi">0</span>
<span class="k">let</span> <span class="k">mut</span> lines = <span class="mi">0</span>
<span class="k">let</span> <span class="k">mut</span> previous_is_space = True
<span class="k">let</span> <span class="k">mut</span> jobs = <span class="mi">0</span>
<span class="k">let</span> buffer = ByteArray.new
</code></pre></div><p>The <code>workers</code> assignment is the most interesting. The bit
<code>CONCURRENCY.times.map</code> creates an iterator that runs 8 times (since we set
<code>CONCURRENCY</code> to 8), mapping the input value (an integer ranging from 0 to 7) to
the result of <code>process.spawn</code>. Since we don't care about the input integer, we
define the argument name as <code>_</code>. We then collect the results into an <code>Array</code>
using the <code>to_array</code> message. Each spawned process runs the <code>worker_loop</code>
method, until the program is finished. The other variables are not interesting,
so let's skip those.</p><p>We will divide work across the processes in a round-robin fashion, until we run
out of bytes to read. Every process is given a chunk of equal size:</p><div class="highlight"><pre class="highlight"><code>{
<span class="k">try</span>! input.read_bytes(bytes: buffer, size: CHUNK_SIZE).positive?
}.while_true {
workers[jobs % workers.length]
.send(Chunk.new(previous_is_space: previous_is_space, bytes: buffer))
previous_is_space = space?(buffer[-<span class="mi">1</span>]!)
bytes += buffer.length
jobs += <span class="mi">1</span>
buffer.clear
}
</code></pre></div><p>We create a closure that returns the result of
<code>input.read_bytes(...).positive?</code>, which is a boolean. The result of
<code>input.read_bytes(...)</code> is an integer signaling the number of bytes read. If the
operation fails, we panic (by using the <code>try!</code> keyword). The method <code>read_bytes</code>
reads bytes <em>into</em> a provided <code>ByteArray</code>, instead of returning a <code>ByteArray</code>.</p><p><code>while_true</code> is a message sent to this closure, and will run its argument (also
a closure) as long as the receiver returns boolean true.</p><p>Work is balanced across processes by sending the chunks to processes:</p><div class="highlight"><pre class="highlight"><code>workers[jobs % workers.length]
.send(Chunk.new(previous_is_space: previous_is_space, bytes: buffer))
</code></pre></div><p>The expression <code>jobs % workers.length</code> produces an integer/index between zero
and the last index in the <code>workers</code> array. Since the <code>workers</code> <code>Array</code> stores
<code>Process</code> objects, we can send the <code>send</code> message to them to have the message (a
<code>Chunk</code> object in this case) delivered to the process.</p><p>Since we perform work in parallel, we have to determine if a chunk follows
whitespace when scheduling them. We do this using <code>previous_is_space =
space?(buffer[-1]!)</code>. Inko allows you to access negative indexes of <code>Array</code> and
<code>ByteArray</code> types, which translate to indexes from the end of the list. In other
words, the index -1 accesses the last element in the list.</p><p>After this we just increment the number of bytes read, the number of jobs
scheduled, and we clear our buffer. We reuse the same <code>ByteArray</code> so we don't
have to create a new one for every 64 KB of bytes that we read.</p><p>Now we can wait for all the results to be sent back from our workers, then
present them:</p><div class="highlight"><pre class="highlight"><code>{ jobs.positive? }.while_true {
<span class="k">let</span> count = process.receive <span class="k">as</span> Pair!(Integer, Integer)
lines += count.first
words += count.second
jobs -= <span class="mi">1</span>
}
stdout.print(StringBuffer.new(
<span class="s">' '</span>,
lines.to_string,
<span class="s">' '</span>,
words.to_string,
<span class="s">' '</span>,
bytes.to_string,
<span class="s">' '</span>,
path
))
</code></pre></div><p>Here we wait for incoming messages, cast them to the right type (a Pair
of the number of lines and words), then add the results to the total number of
lines and words. Lastly, we present the results by writing them to STDOUT.</p><p>Our final version looks like this:</p><div class="highlight"><pre class="highlight"><code><span class="k">import</span> std::byte_array::ByteArray
<span class="k">import</span> std::env
<span class="k">import</span> std::fs::file
<span class="k">import</span> std::pair::Pair
<span class="k">import</span> std::process
<span class="k">import</span> std::stdio::stdout
<span class="k">import</span> std::string_buffer::StringBuffer
<span class="k">let</span> CONCURRENCY = <span class="mi">8</span>
<span class="k">let</span> MAIN = process.current
<span class="k">let</span> NEWLINE = <span class="mi">10</span>
<span class="k">let</span> SINGLE_SPACE = <span class="mi">32</span>
<span class="k">let</span> SPACE_RANGE = <span class="mi">9</span>..<span class="mi">13</span>
<span class="k">let</span> CHUNK_SIZE = <span class="mi">64</span> * <span class="mi">1024</span>
def space?(byte: Integer) -> Boolean {
SPACE_RANGE.cover?(byte).<span class="k">or</span> { byte == SINGLE_SPACE }
}
def worker_loop {
<span class="k">let</span> chunk = process.receive <span class="k">as</span> Chunk
MAIN.send(chunk.count)
worker_loop
}
object Chunk {
<span class="vi">@previous_is_space</span>: Boolean
<span class="vi">@bytes</span>: ByteArray
<span class="vi">@lines</span>: Integer
<span class="vi">@words</span>: Integer
<span class="vi">@index</span>: Integer
def init(previous_is_space: Boolean, bytes: ByteArray) {
<span class="vi">@previous_is_space</span> = previous_is_space
<span class="vi">@bytes</span> = bytes
<span class="vi">@lines</span> = <span class="mi">0</span>
<span class="vi">@words</span> = <span class="mi">0</span>
<span class="vi">@index</span> = <span class="mi">0</span>
}
def count -> Pair!(Integer, Integer) {
<span class="k">let</span> byte = <span class="vi">@bytes</span>[<span class="vi">@index</span>]
byte.nil?.if_true {
<span class="k">return</span> Pair.new(<span class="vi">@lines</span>, <span class="vi">@words</span>)
}
space?(byte!).<span class="k">if</span>(
<span class="k">true</span>: {
(byte == NEWLINE).if_true {
<span class="vi">@lines</span> += <span class="mi">1</span>
}
<span class="vi">@previous_is_space</span> = True
},
<span class="k">false</span>: {
<span class="vi">@previous_is_space</span>.if_true {
<span class="vi">@words</span> += <span class="mi">1</span>
<span class="vi">@previous_is_space</span> = False
}
}
)
<span class="vi">@index</span> += <span class="mi">1</span>
count
}
}
env.arguments[<span class="mi">0</span>].nil?.if_true {
process.panic(<span class="s">'You must specify a file to process'</span>)
}
<span class="k">let</span> path = env.arguments[<span class="mi">0</span>]!
<span class="k">let</span> input = <span class="k">try</span>! file.read_only(path)
<span class="k">let</span> workers =
CONCURRENCY.times.map do (_) { process.spawn { worker_loop } }.to_array
<span class="k">let</span> <span class="k">mut</span> bytes = <span class="mi">0</span>
<span class="k">let</span> <span class="k">mut</span> words = <span class="mi">0</span>
<span class="k">let</span> <span class="k">mut</span> lines = <span class="mi">0</span>
<span class="k">let</span> <span class="k">mut</span> previous_is_space = True
<span class="k">let</span> <span class="k">mut</span> jobs = <span class="mi">0</span>
<span class="k">let</span> buffer = ByteArray.new
{
<span class="k">try</span>! input.read_bytes(bytes: buffer, size: CHUNK_SIZE).positive?
}.while_true {
workers[jobs % workers.length]
.send(Chunk.new(previous_is_space: previous_is_space, bytes: buffer))
previous_is_space = space?(buffer[-<span class="mi">1</span>]!)
bytes += buffer.length
jobs += <span class="mi">1</span>
buffer.clear
}
{ jobs.positive? }.while_true {
<span class="k">let</span> count = process.receive <span class="k">as</span> Pair!(Integer, Integer)
lines += count.first
words += count.second
jobs -= <span class="mi">1</span>
}
stdout.print(StringBuffer.new(
<span class="s">' '</span>,
lines.to_string,
<span class="s">' '</span>,
words.to_string,
<span class="s">' '</span>,
bytes.to_string,
<span class="s">' '</span>,
path
))
</code></pre></div><h2 id="performance">Performance</h2><p>Let's start by running GNU <code>wc</code> to see how it performs:</p><pre><code>$ time -f "%es %MKB" wc big.txt
128457 1095695 6488666 big.txt
0.03s 2136KB
</code></pre><p>This only took 0.03 seconds (30 milliseconds), and used a peak RSS of 2.08 MB.
Not bad!</p><p>Now let's see how our Inko implementation performs:</p><pre><code>$ time -f "%es %MKB" inko wc.inko big.txt
128457 1095695 6488666 big.txt
8.34s 260272KB
</code></pre><p>Ouch! Our implementation uses a peak RSS of 254 MB, and takes 8.34 seconds to
count the words and lines. What's going on here? Is our implementation bad, or
is Inko just slow?</p><p>Mostly the latter. Our implementation isn't bad at all. Maybe it would be a bit
nicer if we didn't have to use the <code>StringBuffer</code> type, but apart from that
there is not a lot worth changing. Instead, the problem is Inko. More precisely,
the complete lack of optimisations applied by Inko's compiler.</p><h3 id="optimisations-or-lack-thereof">Optimisations, or lack thereof</h3><p>When creating a programming language you need a compiler to compile your
language. The first compiler thus needs to be written in a different language.
For Inko I opted to use Ruby since it's widely available, and a language I have
worked with for almost ten years. The goal is to rewrite Inko's compiler in Inko
itself, something that is actively worked on.</p><p>Because we want to replace the Ruby compiler with a compiler written in Inko, we
spent little time on adding optimisations to the Ruby compiler. In fact, the
only optimisations it applies are:</p><ol><li>Tail call elimination</li><li>Replacing keyword arguments passed in-order with positional arguments</li></ol><p>Other languages typically perform some form of method inlining, constant
folding, optimising certain method calls into specialised instructions (e.g.
translating <code>A + B</code> into something that doesn't require a method call), etc.
Inko's current compiler does none of that, producing code that does not perform
as well as it should.</p><h3 id="closure-allocations">Closure allocations</h3><p>This brings us to the main problem of our implementation: closure allocations.
Specifically, the use of closures instead of statements such as <code>if</code> and
<code>while</code>. Allocating a closure is not that expensive, but in our implementation
of <code>wc</code> we are allocating a lot. Our <code>count</code> method alone will create at least
five closures for every byte. For a 64 KB chunk that results in a total of 327
680 closures. More allocations also means more garbage collections. While we can
reuse memory after a collection, collections still take up time.</p><p>To combat this we plan to add an optimisation pass to the self-hosting compiler
that will eliminate closure allocations where possible. For example, cases such
as <code>if_true</code> and <code>if_false</code> can be optimised to not use closures at all. It's
hard to say how big the impact of this would be on our <code>wc</code> implementation, but
I would not be surprised if we can cut the runtime in half; or maybe reduce it
even more.</p><h3 id="garbage-collection-performance">Garbage collection performance</h3><p>Another problem we are running into is that Inko's garbage collector is spending
far more time tracing objects than should be necessary. Under normal
circumstances Inko's garbage collector is able to trace lots of objects in less
than one millisecond, but for our <code>wc</code> implementation it can take several
milliseconds to trace 20-30 objects. We can see this by running our <code>wc</code>
implementation while setting the environment variable <code>INKO_PRINT_GC_TIMINGS</code> to
<code>true</code> (some output is removed to keep things readable):</p><pre><code>$ env INKO_PRINT_GC_TIMINGS=true time -f "%es %MKB" inko wc.inko big.txt
[0x7fb240004ec0] GC in 2.528122ms, 28 marked, 0 promoted, 0 evacuated
[0x7fb240004670] GC in 15.437073ms, 28 marked, 0 promoted, 0 evacuated
[0x7fb240005630] GC in 28.714244ms, 28 marked, 0 promoted, 0 evacuated
[0x7fb240007440] GC in 30.711002ms, 28 marked, 0 promoted, 0 evacuated
</code></pre><p>This even happens when we limit the number of tracing threads to 1, instead of
the default of half the number of CPU cores:</p><pre><code>$ env INKO_TRACER_THREADS=1 \
INKO_PRINT_GC_TIMINGS=true time -f "%es %MKB" inko wc.inko big.txt
[0x7fdbfc005dd0] GC in 581.006µs, 28 marked, 0 promoted, 0 evacuated
[0x7fdbfc005630] GC in 2.047803ms, 28 marked, 0 promoted, 0 evacuated
[0x7fdbfc007bb0] GC in 918.097µs, 28 marked, 0 promoted, 0 evacuated
[0x7fdbfc004ec0] GC in 1.104836ms, 28 marked, 0 promoted, 0 evacuated
</code></pre><p>The timings may be a bit better, but they are still pretty bad given we end up
only marking a small number of objects. Take the following program as an
example:</p><div class="highlight"><pre class="highlight"><code>object Thing {}
<span class="k">let</span> things = <span class="mi">28</span>.times.map do (_) { Thing.new }.to_array
<span class="mi">1_000_000</span>.times.each do (integer) {
integer.to_float
}
</code></pre></div><p>Here we create an array containing 28 <code>Thing</code> instances, which we keep around.
We then create one million float objects, which are heap allocated. If we run
this with the <code>INKO_PRINT_GC_TIMINGS</code> variable set, the output is as follows:</p><pre><code>$ env INKO_PRINT_GC_TIMINGS=true inko foo.inko
[0x5620ad17df70] GC in 523.047µs, 44 marked, 0 promoted, 0 evacuated
[0x5620ad17df70] GC in 480.612µs, 46 marked, 0 promoted, 0 evacuated
[0x5620ad17df70] GC in 493.339µs, 63 marked, 43 promoted, 0 evacuated
[0x5620ad17df70] GC in 552.766µs, 9 marked, 0 promoted, 0 evacuated
</code></pre><p>These timings are much closer to what one would expect.</p><p>It's not quite clear yet what is causing this slowdown. Based on some profiling
using Valgrind I suspect the
<a href="https://github.com/crossbeam-rs/crossbeam">crossbeam</a> library (which we use in
the garbage collector) is to blame, as Valgrind's data suggests most time is
spent in crossbeam code; even though the code should be fast. The crossbeam
types we use rely on an epoch based garbage collection mechanism, and per <a href="https://github.com/crossbeam-rs/rfcs/blob/master/text/2017-05-23-epoch-gc.md#oversubscription">this
crossbeam
RFC</a>
it seems this may not work too well when spawning lots of short-lived threads;
as is done when tracing objects.</p><p>A possible solution would be to use a fixed-size thread pool for tracing
objects, instead of spawning tracing threads on-demand. We do not use this
approach at the moment because the current approach is easier to implement. An
approach I have been thinking of is to give each collector thread its own pool
of tracing threads, spawned when the collector thread first starts up. This
approach means a tracing pool only ever collects a single process at a time,
allowing us to pass certain data around once (when tracing starts),
instead of having to pass it around with every new job that is scheduled. This
is something I will have to take a look at in the coming weeks.</p><h2 id="wrapping-up">Wrapping up</h2><p>We did not manage to beat C with Inko, but that was never the goal of this
exercise. Instead, I merely wanted to showcase how one would approach the
problem using Inko, and get more people interested in Inko as a result.</p><p>The optimisations discussed will be applied over time, gradually improving
performance of Inko. One day we will also add a JIT, though I suspect it will
take several years before we will have a JIT. The potential crossbeam bottleneck
is also worth investigating.</p><p>I doubt a dynamic language such as Inko will be able to beat C, but if we can at
least beat other dynamic languages (e.g. Ruby) that is good enough.</p><p>For more information about Inko, take a look at the <a href="https://inko-lang.org/">Inko
website</a> or the <a href="https://gitlab.com/inko-lang/inko">Git
repository</a>. If you would like to sponsor the
development of Inko with a small monthly contribution, please take a look at the
<a href="https://inko-lang.org/sponsors/">sponsors page</a> for more information.</p>https://yorickpeterse.com/articles/writing-a-self-hosting-compiler-for-inko/Writing a self-hosting compiler for Inko2019-06-08T00:00:00Z2019-06-08T00:00:00Z<p>
About a year ago I wrote <a href="/articles/inko-a-brief-introduction/">"Inko: a brief introduction"</a>, and
later published the <a href="https://inko-lang.org">Inko website</a>. Since then, I have made a lot of
progress towards making it useful for everyday use. Some recent milestones
include:</p><ul><li>A Foreign Function Interface.</li><li>A new process scheduler that is easier to maintain, and performs better.</li><li>Non-blocking sockets, without the need for callbacks.</li><li>Reduced memory usage per process.</li></ul><p>The next milestone for Inko is having a self-hosting compiler. But why would one
want to write a self-hosting compiler? Why not use an already established
language? What are the benefits of writing a self-hosting compiler? Let's find
out!</p><h2 id="the-first-compiler">The first compiler</h2><p>When creating a language, you need a way to compile its source code. But we
can't use our own language, since we are still developing it. To deal with this,
developers use a different language for the first compiler. Two examples of this
are Rust and Go. The first compiler for Rust was written in OCaml, and the first
compiler for Go was written in C.</p><p>For Inko's current compiler we use Ruby. Before writing the compiler in Ruby I
made an attempt at writing it in Rust. Inko's Virtual Machine is also written in
Rust, so using Rust for the compiler made sense at the time. Writing the
compiler in Rust turned out to be frustrating, as I kept running into minor
issues along the way. After about a month, I decided to cut my losses and use
Ruby instead. Using Ruby allowed me to deliver a working compiler faster.</p><p>There were also two other reasons for using Ruby instead of Rust:</p><ol><li>The compiler would one day be rewritten in Inko. This meant that quality was
not the focus of the first compiler. Instead, it had to focus on getting enough
done so I could start building the standard library.</li><li>Ruby is closer to Inko than Rust is, which makes it easier to port code to
the new compiler.</li></ol><p>Rust tends to be an unforgiving language, or at least it feels that way. This makes
sense when you are writing production-ready software, but can slow you down when
trying to prototype a compiler.</p><h2 id="benefits-of-a-self-hosting-compiler">Benefits of a self-hosting compiler</h2><p>If we have to use a different language for our first compiler, why not keep
using this compiler? Why should one spend the extra time and effort on making
their compiler self-hosting?</p><p>A typical compiler consists of different components, such as:</p><ul><li>A lexer.</li><li>A parser (sometimes the parser also takes care of lexing the input).</li><li>Type checking.</li><li>Optimisation passes.</li><li>Code generation.</li></ul><p>To write our compiler in our own language, the language must provide the
necessary features. Such features might be:</p><ul><li>String slicing.</li><li>Concurrency primitives.</li><li>A unit testing framework.</li><li>APIs for working with the filesystem.</li></ul><p>Adding these features to the standard library benefits all users of the
language. We could come up with a list of features to add, without a reference
program. But it can be difficult to come up with every possible feature before
there is a use case for it. Worse, we may end up adding features that turn
out not to be useful once actually used.</p><p>Performance is also important for a programming language. Your language can have
all the features in the world, but users will not use it if the language is too
slow. To ensure our language performs well, we need a way to measure and improve
its performance. One way of doing this is by writing synthetic benchmarks.
While useful for measuring specific sections of code, they are not useful for
determining the impact of a change on a larger program.</p><p>A more realistic way of measuring performance is using a program with users.
Compilers are an excellent reference. For example, a lexer operates on
sequences of characters or bytes, executing code for every value in the
sequence. Without any optimisations, such code could be slow. By writing our
compiler in our own language, we have a program to measure the performance
impact of changes made to the language.</p><p>While not a benefit per se, making the compiler self-hosting is a way of showing
the capabilities of the language. If you can write the language's compiler in
the language itself, you can write any other program in the language.</p><h2 id="towards-a-self-hosting-inko-compiler">Towards a self-hosting Inko compiler</h2><p>The first step towards a self-hosting compiler was to simplify the syntax in
various places. For example, Inko allowed you to implement a trait in two
different ways: when defining an object, or separately. Implementing a trait
when defining an object looked like this:</p><div class="highlight"><pre class="highlight"><code>object Person <span class="k">impl</span> ToString {
<span class="c"># ...</span>
}
</code></pre></div><p>The alternative is to implement the trait separately:</p><div class="highlight"><pre class="highlight"><code><span class="k">impl</span> ToString <span class="k">for</span> Person {
<span class="c"># ...</span>
}
</code></pre></div><p>I added support for both so that object definitions and trait implementations
were closer together. This complicated various parts of the compiler. In
practice I also found it not to be as useful as anticipated.</p><p>Another syntax change is the removal of support for Unicode identifiers. Being
able to use Unicode identifiers could be useful, but it complicates the
lexer. I also doubt it will see much use in the coming years.</p><p>With the syntax simplified, I started implementing the lexer. The merge
request tracking progress is <a href="https://gitlab.com/inko-lang/inko/merge_requests/59">"Implement Inko's lexer in Inko itself"</a>.</p><h2 id="implementing-inkos-lexer-in-inko">Implementing Inko's lexer in Inko</h2><p>As I work on the compiler I will write about the progress made, starting with
the lexer. After all, talking about the compiler and not showing anything would
be boring.</p><p>The basic idea of a lexer is simple: take a sequence of bytes or characters, and
produce one or more "tokens". A token is an object containing at least
two values: a type indicator, and a value. The type indicator could
be a string, integer, enum, or something else. The value is typically a string.</p><p>Inko uses an object called <code>Token</code> for tokens, defined as follows (excluding
methods not relevant for this example):</p><div class="highlight"><pre class="highlight"><code>object Token {
def init(type: String, value: String, location: SourceLocation) {
<span class="k">let</span> <span class="vi">@type</span> = type
<span class="k">let</span> <span class="vi">@value</span> = value
<span class="k">let</span> <span class="vi">@location</span> = location
}
}
</code></pre></div><p>For those unfamiliar with Inko's syntax, this defines an object called <code>Token</code>
and its constructor method <code>init</code>. The <code>init</code> method takes three arguments:</p><ol><li><code>type</code>: the type name of the token, such as <code>'integer'</code> or <code>'comma'</code>.</li><li><code>value</code>: the value of the token, such as <code>'10'</code> for an integer.</li><li><code>location</code>: an object that contains source location information, such as the
line range and column number.</li></ol><p>The <code>init</code> method sets three instance attributes: <code>@type</code>, <code>@value</code>, and
<code>@location</code>.</p><p>For the lexer, Inko uses an object called <code>Lexer</code>. Showing all the lexer's
source code would be a bit much. Instead, we'll highlight some interesting
parts. The constructor of the lexer is as follows:</p><div class="highlight"><pre class="highlight"><code>object Lexer {
def init(input: ToByteArray, file: ToPath) {
<span class="k">let</span> <span class="vi">@input</span> = input.to_byte_array
<span class="k">let</span> <span class="vi">@file</span> = file.to_path
<span class="c"># ...</span>
}
}
</code></pre></div><p><code>ToByteArray</code> is a trait that provides the method <code>to_byte_array</code>, for
converting a type to a <code>ByteArray</code>. When reading data from a file, Inko will
read it into a <code>ByteArray</code>. Converting this to a <code>String</code> requires allocating an
extra object, and twice the memory. The type <code>ByteArray</code> also implements the
<code>ToByteArray</code> trait. This allows lexing of files, without allocating a <code>String</code>:</p><p><code>ToPath</code> is a trait that provides the method <code>to_path</code>, for converting a type to
a <code>Path</code>. <code>Path</code> is a type that represents file paths, providing a more
pleasant interface compared to using a <code>String</code>. Using this trait allows one to
supply either a <code>String</code> or a <code>Path</code> as the <code>file</code> argument:</p><div class="highlight"><pre class="highlight"><code><span class="k">import</span> std::compiler::lexer::Lexer
<span class="k">import</span> std::fs::path::Path
Lexer.new(input: <span class="s">'10'</span>, file: <span class="s">'test.inko'</span>)
Lexer.new(input: <span class="s">'10'</span>, file: Path.new(<span class="s">'test.inko'</span>))
</code></pre></div><p>The <code>Lexer</code> type is an iterator, allowing the user to retrieve tokens one by
one:</p><div class="highlight"><pre class="highlight"><code><span class="k">import</span> std::compiler::lexer::Lexer
<span class="k">let</span> lexer = Lexer.new(input: <span class="s">'10'</span>, file: <span class="s">'test.inko'</span>)
<span class="k">let</span> token = lexer.<span class="k">next</span>
token.type <span class="c"># => 'integer'</span>
token.value <span class="c"># => '10'</span>
</code></pre></div><p>To determine what token to produce, a <code>Lexer</code> will look at the current byte in
the input. Based on the current byte, <code>next</code> sends different messages to the
<code>Lexer</code>. The implementation of <code>next</code> is a bit much to cover, but more or less
looks as follows:</p><div class="highlight"><pre class="highlight"><code>def <span class="k">next</span> -> ?Token {
<span class="k">let</span> current = current_byte
current == A
.if_true {
<span class="k">return</span> foo
}
current == B
.if_true {
<span class="k">return</span> bar
}
Nil
}
</code></pre></div><p>The return type here is <code>?Token</code>, meaning it may return a <code>Token</code> or <code>Nil</code>.</p><p>Inko does not have a <code>match</code> or <code>switch</code> statement; instead we compare objects
for equality and use block returns. In the above example, if <code>current == A</code>
evaluates to true we return the result of <code>foo</code>, skipping the code that follows
it. Reading the above code, one might think that the code is incorrect. In most
languages, this code:</p><div class="highlight"><pre class="highlight"><code>A == B
.foo
</code></pre></div><p>Is parsed as this:</p><div class="highlight"><pre class="highlight"><code>A == (B.foo)
</code></pre></div><p>In Inko this is not the case. <em>If</em> the message that follows a binary operation
(<code>A == B</code>) is on a new line, it's sent to the <em>result</em>. This means it's parsed
as follows:</p><div class="highlight"><pre class="highlight"><code>(A == B).foo
</code></pre></div><p>This allows one to write this:</p><div class="highlight"><pre class="highlight"><code>A == B
.<span class="k">and</span> { C }
.if_true {
<span class="c"># ...</span>
}
</code></pre></div><p>Instead of this:</p><div class="highlight"><pre class="highlight"><code>(A == B)
.<span class="k">and</span> { C }
.if_true {
<span class="c"># ...</span>
}
</code></pre></div><p>For certain tokens we need to perform more complex checks. For example, for
integers we can not compare for equality because an integer can start with
different values (<code>0</code>, <code>1</code>, etc). Instead, we use Inko's range type like so:</p><div class="highlight"><pre class="highlight"><code>INTEGER_DIGIT_RANGE.cover?(current).if_true {
<span class="k">return</span> number
}
</code></pre></div><p>Here <code>INTEGER_DIGIT_RANGE</code> is a range (using the <code>Range</code> type) covering the
digits 0 to 9. The method <code>cover?</code> checks if its argument is contained in the
range, without evaluating all values in the range.</p>
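<p>For comparison, Ruby's <code>Range#cover?</code> behaves the same way: it only compares its
argument against the boundaries of the range, never enumerating the values in
between. A small Ruby sketch of the same check:</p><div class="highlight"><pre class="highlight"><code>INTEGER_DIGIT_RANGE = 48..57 # the bytes for '0' through '9'

INTEGER_DIGIT_RANGE.cover?('7'.ord) # => true
INTEGER_DIGIT_RANGE.cover?('a'.ord) # => false
</code></pre></div><p>The implementations of the methods that produce tokens vary. Some are simple,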
others are more complex. Strings in particular are tricky, as they can contain
escaped quotes and escape sequences (<code>\n</code>, <code>\r</code>, etc).</p><p>Numbers are also tricky, as there are different number types and formats:</p><ul><li>Regular integers: <code>123</code>.</li><li>Hexadecimal integers: <code>0x123abc</code>, <code>0X123ABC</code>.</li><li>Floats: <code>10.23</code>, <code>1e2</code>, <code>1E2</code>, <code>1e+2</code>, <code>1E+2</code>, <code>1e-2</code>, <code>1E-2</code>.</li></ul><p>The difficulty here is that the type is not known until reaching a certain
character, such as <code>.</code> or <code>x</code>.</p>
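<p>A simplified Ruby sketch of this problem (the helper below is hypothetical, not
the lexer's actual code):</p><div class="highlight"><pre class="highlight"><code># '0x123abc' only turns out to be a hexadecimal integer once we reach the
# 'x', and '10.23' only becomes a float once we reach the '.'.
def number_type(input)
  return :hexadecimal if input[1] == 'x' || input[1] == 'X'
  return :float if input.include?('.') || input.downcase.include?('e')

  :integer
end

number_type('0x123abc') # => :hexadecimal
number_type('1E+2')     # => :float
number_type('123')      # => :integer
</code></pre></div><p>Covering all this would be far too much, so I recommend taking a closer look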
at the merge request <a href="https://gitlab.com/inko-lang/inko/merge_requests/59">"Implement Inko's lexer in Inko itself"</a>.</p><h2 id="work-after-the-lexer">Work after the lexer</h2><p>After finishing work on the lexer, the parser is next. After that, I will
have to spend some time planning the steps that follow. I would like the
compiler to be parallel and incremental, but I do not yet have an idea of how to
implement this. I also need to revisit the type system, as certain parts feel a
bit hacky.</p><p>Determining how long all this takes is difficult. After implementing the parser
I will have a better estimate. I expect it will take between three and six
months. I do have a three-week vacation in a couple of weeks, and I tend to be
productive during my vacations. Perhaps a bit too productive.</p>https://yorickpeterse.com/articles/inko-a-brief-introduction/Inko: a brief introduction2018-05-02T22:00:00Z2018-05-02T22:00:00Z<p><a href="https://gitlab.com/inko-lang/inko">Inko</a> is a programming language I started working on in early 2015. The
goal of the project is to create a gradually typed, object-oriented programming
language with a focus on safety and concurrency. Inko draws inspiration from
various other languages, such as Smalltalk, Erlang, Rust, and Ruby. Like any
other language, it is not perfect, but the more time I spend working on it, the
more I believe it could turn out to be a useful programming language.</p><p>While the language is still quite far from being usable, I have been making a
lot of progress with both the compiler and the standard library. As a result, I
think it's time to start writing a bit more about the language, starting with a
brief introduction of what Inko is all about.</p><p>Keep in mind that the exact syntax is subject to change and that some
topics/features discussed in this article might not yet be available. In
particular, large parts of the compiler's type system and syntax are being
rewritten as part of the <a href="https://gitlab.com/inko-lang/inko/merge_requests/1">"Rewrite the Ruby compiler's type system"</a> merge
request.</p><h2 id="table-of-contents">Table of contents</h2><ul class="toc"><li><a href="#history">History</a></li><li><a href="#object-model">Object model</a></li><li><a href="#message-passing">Message passing</a></li><li><a href="#type-system">Type system</a></li><li><a href="#booleans-and-nil">Booleans and Nil</a></li><li><a href="#error-handling">Error handling</a><ul><li><a href="#error-handling-principles">Error handling principles</a><ul><li><a href="#method-signatures-must-include-the-error-type">Method signatures must include the error type</a></li><li><a href="#only-a-single-type-can-be-thrown">Only a single type can be thrown</a></li><li><a href="#methods-that-define-a-throw-type-must-actually-throw-it">Methods that define a throw type must actually throw it</a></li><li><a href="#sending-a-message-that-may-throw-requires-explicit-error-handling">Sending a message that may throw requires explicit error handling</a></li><li><a href="#the-try-keyword-only-supports-a-single-expression">The "try" keyword only supports a single expression</a></li></ul></li><li><a href="#bugs-are-not-recoverable">Bugs are not recoverable</a></li></ul></li><li><a href="#concurrency">Concurrency</a></li><li><a href="#memory-management">Memory management</a></li><li><a href="#portable-bytecode">Portable bytecode</a></li><li><a href="#examples">Examples</a><ul><li><a href="#checking-if-a-string-starts-with-another-string">Checking if a String starts with another String</a></li><li><a href="#loops-and-tail-call-elimination">Loops and tail call elimination</a></li><li><a href="#processes-and-communication">Processes and communication</a></li><li><a href="#file-operations">File operations</a></li></ul></li><li><a href="#trying-it-out">Trying it out</a></li></ul><h2 id="history">History</h2><p>The idea of building my own programming language dates to early 2013. Back then,
I knew little about programming languages, parsers, virtual machines, and so
on. I also wasn't quite sure what I was looking for in this language. It wasn't
until early 2015 that I started writing code for the project, starting with the
virtual machine. It was also around this time that I started to have a better
understanding of what I was looking for: a language with a strong
object-oriented model and excellent support for concurrency, borrowing various
features from languages I admire, such as Smalltalk and Erlang.</p><p>I ended up writing the virtual machine in Rust, though Rust wasn't my first
choice. At the time Rust was still new and unstable, with both the syntax
and functionality changing frequently. So I first looked into other languages
such as C, C++, and D. While I made quite a bit of progress with using D, I felt
that using a garbage collected language for a virtual machine was less than
ideal. Ultimately I decided to go with Rust since it seemed to be the most
suitable. At first this was quite frustrating, but as Rust settled down, the
frustration fortunately went away.</p><p>Today I'm quite happy with the choice of using Rust for the VM. Rust certainly
has its flaws, but I find it much easier and much more pleasant to use than
languages such as C/C++ and similar low-level programming languages.</p><h2 id="object-model">Object model</h2><p>Inko is a <a href="https://en.wikipedia.org/wiki/Prototype-based_programming">prototype-based</a> object-oriented programming language,
though the use of prototypes is mostly hidden from the user. Instead of
inheritance, Inko uses composition through <a href="https://en.wikipedia.org/wiki/Trait_(computer_programming)">traits</a>. I never really
enjoyed the use of inheritance as I feel it couples objects too tightly, and
composition through traits feels like the right answer to this problem. While
Inko supports the creation of class-like objects using an <code>object</code> keyword, we
simply call these "objects". This may seem odd but it helps clarify that these
aren't traditional classes that support inheritance. For example, if we want to
define a "Person" object of sorts, you could do so as follows in Ruby:</p><div class="highlight"><pre class="highlight"><code><span class="k">class</span> Person
<span class="k">def</span> initialize(name)
<span class="vi">@name</span> = name
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div><p>The equivalent Inko code would be:</p><div class="highlight"><pre class="highlight"><code>object Person {
def init(name: String) {
<span class="k">let</span> <span class="vi">@name</span> = name
}
}
</code></pre></div><p>Here <code>let @name = name</code> defines an instance attribute called <code>@name</code> set to the
value of the <code>name</code> argument, with the type of <code>name</code> being a <code>String</code>. If we
wanted to use dynamic typing, we would simply leave out the type signature:</p><div class="highlight"><pre class="highlight"><code>object Person {
def init(name) {
<span class="k">let</span> <span class="vi">@name</span> = name
}
}
</code></pre></div><h2 id="message-passing">Message passing</h2><p>Inko uses message passing for pretty much everything, including constructs such
as "if" and "while"., allowing objects to decide how such constructs should
behave, instead of the language dictating what evaluates to be true and false,
for example. This means that instead of using an "if statement", you would use
the "if" <em>message</em>.</p><p>Say you want to check if <code>x</code> is greater than 10. In Ruby (and many other
programming languages) you may write such code as follows:</p><div class="highlight"><pre class="highlight"><code><span class="k">if</span> x > <span class="mi">10</span>
do_something
<span class="k">else</span>
do_something_else
<span class="k">end</span>
</code></pre></div><p>In Inko we would instead write:</p><div class="highlight"><pre class="highlight"><code>x > <span class="mi">10</span>
.<span class="k">if</span> <span class="k">true</span>: {
do_something
}, <span class="k">false</span>: {
do_something_else
}
</code></pre></div><p>Here <code>if</code> is a message sent to the result of <code>x > 10</code> (this relies on some
special syntax support so you don't have to write <code>(x > 10).if</code>). <code>true:</code> and
<code>false:</code> are simply keyword arguments sent to the <code>if</code> message, and the curly
braces are closures. The object the <code>if</code> message is sent to determines which of
the two closures is executed.</p><p>Methods can be defined using the <code>def</code> keyword, take an optional argument list,
and may specify the throw and return type:</p><div class="highlight"><pre class="highlight"><code>def example(argument: Type) -> ReturnType {
<span class="c"># ...</span>
}
</code></pre></div><p>If you leave out the argument types or the return type Inko will use a dynamic
type instead:</p><div class="highlight"><pre class="highlight"><code>def example(argument: Type) {
<span class="c"># This method can return values of any type since its return type is inferred</span>
<span class="c"># as a dynamic type.</span>
}
</code></pre></div><h2 id="type-system">Type system</h2><p>Inko is a gradually typed programming language. Gradual typing gives you the
benefits of a statically typed language while still allowing you to trade
type safety for flexibility where necessary. Gradual typing is also useful when
prototyping or when building a simple program that won't really benefit from
static typing (e.g. a quick script to manage some music files).</p><p>To ensure type safety, Inko uses static typing by default, requiring you to
opt-in to dynamic typing where desired. Using dynamic typing is straightforward:
simply leave out the type signature in various places and Inko will treat the
types as dynamic types.</p><p>Like any other reasonable statically typed language, Inko supports generic
programming. For example, we can define a generic "List" type like so:</p><div class="highlight"><pre class="highlight"><code>object List!(T) {
<span class="c"># ...</span>
}
</code></pre></div><p>Here <code>!(T)</code> defines the list of type parameters of the "List" type. The type
parameter syntax is taken from <a href="https://dlang.org/">D</a>. While unusual, it removes the need for
additional syntax when explicitly passing type parameters with a message. For
example, Rust uses <code><T></code> and requires you to write <code>foo::<T>()</code> when explicitly
passing a type parameter, as <code>foo<T></code> would be parsed as <code>(foo) < (T>)</code>.</p><p>Using <code>!(T)</code> means that we can instead write <code>foo!(T)</code>, which is much easier on
the fingers. Scala uses <code>[]</code> (e.g. <code>List[T]</code>), and while easier to type (on
QWERTY it doesn't require the use of the shift key) Inko isn't able to use this
syntax because <code>[]</code> is a valid message name. For example: <code>foo[10]</code> translates
to <code>foo.[](10)</code>.</p><p>Generics can be used in objects, traits, and methods. For example:</p><div class="highlight"><pre class="highlight"><code>object Person {
def ==(other: Self) -> Boolean {
<span class="c"># ...</span>
}
}
</code></pre></div><p>Here <code>other</code> uses the "Self" type which tells the compiler that <code>other</code> is of
the same type as the enclosing object ("Person" in this case).</p><h2 id="booleans-and-nil">Booleans and Nil</h2><p>In many languages, the boolean values <code>true</code> and <code>false</code> are some kind of
primitive value instead of a structure or object. In Inko, they are just regular
objects like any other. The type <code>Boolean</code> in turn is just a trait implemented
by the Boolean objects <code>True</code> and <code>False</code>.</p><p>The absence of a value can be indicated using a <code>Nil</code>. <code>Nil</code> is just a regular
object like any other, but there's only one instance of this object. <code>Nil</code> is
set up in such a way that any message sent to it returns <code>Nil</code>, except for a few
messages that have a custom implementation. For example, <code>Nil.foo</code> would return
<code>Nil</code> but <code>Nil.to_integer</code> would return <code>0</code>. This greatly simplifies code as we
no longer need to constantly check if we're dealing with a value of type <code>T</code> or
<code>Nil</code>, though of course we still can if necessary.</p><p>Optional types can be used to indicate that something can be either of type <code>T</code>
or <code>Nil</code>. For example, to define an optional return value we would write:</p><div class="highlight"><pre class="highlight"><code>def example -> ?Integer {
Nil
}
</code></pre></div><p>It is an error to pass a <code>Nil</code> to a regular type (e.g. <code>String</code>), but it's
perfectly fine to pass a <code>Nil</code> to an optional type (e.g. <code>?String</code>).</p><p>One example of where this is useful is when retrieving an array value by its
index. Like Ruby, an array will return a <code>Nil</code> when there is no value for a
given index. In Ruby, this means you may need to check what type of value you
are dealing with, for example:</p><div class="highlight"><pre class="highlight"><code>user = list_of_users[<span class="mi">4</span>]
<span class="k">if</span> user
user.username
<span class="k">else</span>
<span class="s">''</span>
<span class="k">end</span>
</code></pre></div><p>In Inko, we can instead write the following:</p><div class="highlight"><pre class="highlight"><code>list_of_users[<span class="mi">4</span>].username.to_string
</code></pre></div><p>Should <code>list_of_users[4]</code> return a <code>Nil</code> then sending <code>username</code> will produce
another <code>Nil</code>. Sending <code>to_string</code> to <code>Nil</code> will produce an empty <code>String</code> since
<code>Nil</code> defines its own implementation of this method.</p>
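<p>A loose Ruby analogy is the safe navigation operator, though unlike Inko's
<code>Nil</code> we still have to convert a possible <code>nil</code> to a <code>String</code> ourselves:</p><div class="highlight"><pre class="highlight"><code>list_of_users = []

# nil&.username evaluates to nil, and nil.to_s produces an empty String.
list_of_users[4]&.username.to_s # => ''
</code></pre></div><p>In short, by having <code>Nil</code> return a new <code>Nil</code> for unknown messages we can greatly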
reduce the amount of code necessary to deal with values that might be absent
(but we can still check for a <code>Nil</code> where necessary).</p><h2 id="error-handling">Error handling</h2><p>Inko uses exceptions for error handling, drawing inspiration from an article
titled <a href="http://joeduffyblog.com/2016/02/07/the-error-model/">"The Error Model"</a> by Joe Duffy. The article is quite
long but definitely worth the read.</p><p>I went with exception handling, since the happy path of the code should not be
slowed down by error handling code. For example, when using a more functional
approach, such as using a <code>Result</code> type, you always need to check what you're
dealing with and "unwrap" the underlying value. When using exceptions, on the
other hand, you just use the code as if it didn't throw an error, automatically
jumping to a different region of code when it does.</p><h3 id="error-handling-principles">Error handling principles</h3><p>The basic principles of Inko's error handling system are that it should be clear
when something throws, what it throws, and most important of all that code
doesn't lie about any of this. To achieve this, Inko has a set of rules that
must be followed when working with errors.</p><h4 id="method-signatures-must-include-the-error-type">Method signatures must include the error type</h4><p>A method that throws an error must include the error type in its signature. This
can be done using the <code>!!</code> keyword in the method signature:</p><div class="highlight"><pre class="highlight"><code>def foo !! SomeError {
<span class="c"># ...</span>
}
</code></pre></div><p>This ensures that by just looking at the method (signature) we immediately know
what errors we have to deal with.</p><p>A method that does not define an error type to throw <em>can not</em> throw. This means
the following method would not compile:</p><div class="highlight"><pre class="highlight"><code>def foo {
<span class="k">throw</span> <span class="mi">10</span>
}
</code></pre></div><h4 id="only-a-single-type-can-be-thrown">Only a single type can be thrown</h4><p>A method can only throw an error of a single type, though you can specify the
type to be a trait and throw any value that implements this trait. By
restricting the number of possible types to just one, we remove the need
to catch many different error types. It also simplifies the syntax.</p><h4 id="methods-that-define-a-throw-type-must-actually-throw-it">Methods that define a throw type must actually throw it</h4><p>A method that specifies a type to throw must actually throw this type at some
point; not doing so results in a compiler error. This means that the following
method would not compile since it never throws a value:</p><div class="highlight"><pre class="highlight"><code>def foo !! Integer -> Integer {
<span class="mi">10</span>
}
</code></pre></div><h4 id="sending-a-message-that-may-throw-requires-explicit-error-handling">Sending a message that may throw requires explicit error handling</h4><p>When sending a message that may throw, we <em>must</em> wrap the send in a <code>try</code>
expression:</p><div class="highlight"><pre class="highlight"><code><span class="k">try</span> foo
</code></pre></div><p>This makes it clear to the reader that <code>foo</code> may throw, without requiring them
to first find the implementation of the method to figure this out.</p><p>By default, the <code>try</code> expression will just re-throw the error type, but you can
explicitly handle the error by using an <code>else</code> expression:</p><div class="highlight"><pre class="highlight"><code><span class="k">try</span> foo <span class="k">else</span> (error) bar(error)
</code></pre></div><p>Here we would run <code>foo</code> and if it succeeds, we'd return whatever <code>foo</code> returned.
If <code>foo</code> threw an error, we'd run <code>bar</code> instead. Here the <code>error</code> variable would
contain the object that was thrown. The type of <code>error</code> is inferred by the
compiler.</p><p>The <code>else</code> expression supports multi-line expressions as well, which can be
useful when your error handling logic is more complex:</p><div class="highlight"><pre class="highlight"><code><span class="k">try</span> foo <span class="k">else</span> (error) {
bar(error)
baz(error)
}
</code></pre></div><p>Sometimes we just want to terminate the program if an operation failed. In this
case, we can use <code>try!</code> instead of <code>try</code>:</p><div class="highlight"><pre class="highlight"><code><span class="k">try</span>! foo
</code></pre></div><h4 id="the-try-keyword-only-supports-a-single-expression">The "try" keyword only supports a single expression</h4><p>To prevent one from wrapping hundreds of lines of code in a single "try"
expression, the syntax simply doesn't support this; instead you can only use a
single expression with "try" expression. This means that the following code
would produce a syntax error:</p><div class="highlight"><pre class="highlight"><code><span class="k">try</span> {
foo
bar
}
</code></pre></div><p>This however is perfectly fine:</p><div class="highlight"><pre class="highlight"><code><span class="k">try</span> {
foo
}
</code></pre></div><p>Curly braces can still be used in case the expression doesn't fit on a single
line, or when curly braces simply make the code more readable.</p><h3 id="bugs-are-not-recoverable">Bugs are not recoverable</h3><p>Many languages that use exceptions make the mistake of using exceptions for
errors caused by bugs. In Ruby, dividing by zero will result in a
<code>ZeroDivisionError</code> error being thrown. Inko instead uses "panics". When a panic
occurs, the virtual machine will print a stacktrace of the panicking process and
<em>terminate the entire program</em>. This ensures that bugs are caught as early as
possible, and more importantly can't be hidden by simply catching and ignoring
the exception. Some examples of operations that may panic:</p><ol><li>Dividing by zero.</li><li>Formatting a time object using an incorrect string format.</li><li>Trying to allocate memory when no system memory is available.</li></ol><p>The general idea is fairly straightforward: if an error is the result of a bug
or <em>shouldn't</em> happen then it should be a panic. If an error is likely to occur
frequently (e.g. a network timeout), it should be an exception.</p>
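<p>To illustrate the problem with recoverable bug errors, this Ruby snippet happily
hides a division bug, which is exactly what a panic is designed to prevent:</p><div class="highlight"><pre class="highlight"><code>begin
  1 / 0
rescue ZeroDivisionError
  # The bug is silently swallowed and the program continues as if nothing
  # happened.
end
</code></pre></div><h2 id="concurrency">Concurrency</h2><p>Inko's concurrency model is heavily inspired by Erlang. Instead of using OS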
threads directly, Inko provides lightweight processes. These processes have their
own heap and are garbage collected independently.</p><p>Communication between these processes happens through message passing, with the
messages being deep copied. Certain permanent objects (e.g. modules) are
allocated on a separate permanent heap and processes can access these objects
without copying. While deep copying comes with a performance penalty (depending
on the size of the data being copied) it ensures that a process can never refer
to the memory of another process. This in turn ensures that the garbage
collector only has to suspend the process that it has to garbage collect,
instead of also having to suspend any processes that use this process' memory.</p>
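<p>A rough Ruby illustration of why deep copying provides this isolation: the
receiver gets its own copy, so mutating it never affects the sender's object:</p><div class="highlight"><pre class="highlight"><code>message = { 'name' => 'Alice' }
copy = Marshal.load(Marshal.dump(message)) # a deep copy

copy['name'] = 'Bob'
message['name'] # => 'Alice', the original is untouched
</code></pre></div><p>Processes use preemptive multitasking using a reduction system similar to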
Erlang. In short: every process has a number of "reductions" it can perform.
Once this value reaches 0, it is reset and the process is suspended. The
virtual machine provides two thread pools for executing processes: one for
regular processes, and one for processes that may perform blocking operations
(e.g. reading from a file).</p>
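<p>A hypothetical sketch of such a reduction counter (the names <code>step</code> and
<code>reschedule</code> are made up for illustration; this is not the VM's actual code):</p><div class="highlight"><pre class="highlight"><code>REDUCTIONS = 1000

def run(process, scheduler)
  REDUCTIONS.times do
    # Perform one unit of work, stopping if the process has finished.
    return unless process.step
  end

  # Out of reductions: suspend the process and resume it later.
  scheduler.reschedule(process)
end
</code></pre></div><p>Inko provides the means to move a process between these two pools whenever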
necessary. This means that when performing a blocking operation we don't need to
spawn a separate process in a separate thread pool; instead we just move our
process from one pool to another, moving it back once our blocking operation has
been completed.</p><p>Sending and receiving messages uses dynamic typing, as Inko's type system cannot
be used to specify the types of messages a process may support. To work around
this, Inko will eventually support a type-safe API. The exact semantics are not
yet defined, but if you're curious you can read more about this in the issue
<a href="https://gitlab.com/inko-lang/inko/issues/99">"Type safe actor API"</a>.</p><h2 id="memory-management">Memory management</h2><p>Inko is a garbage collected language. The garbage collector is a parallel,
generational garbage collector based on <a href="http://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf">Immix</a>. Fun fact: to the best of
my knowledge Inko's garbage collector is the only full implementation of Immix
apart from the one provided by <a href="https://github.com/JikesRVM/JikesRVM">JikesRVM</a>. There are a few other
implementations of Immix, but the ones that I know of typically don't implement
evacuation or other parts of Immix.</p><p>The garbage collector can collect each process independently, though a process will
be suspended during garbage collection. The collector being parallel means it
will use multiple threads to garbage collect the memory of a process.</p><p>How well the garbage collector performs is hard to say as I have only run a few
basic benchmarks. These benchmarks usually involved garbage collecting a few
million objects, and off the top of my head this would usually take only a few
milliseconds. Once Inko matures a bit more I'll most likely spend more time
writing (and publishing) benchmarks.</p><h2 id="portable-bytecode">Portable bytecode</h2><p>The bytecode of the virtual machine is portable between CPU architectures and
operating systems. This means that bytecode compiled on a 64-bit CPU can be run
on a 32-bit CPU. This may seem like a minor feature, but it makes it easier to
distribute bytecode files as you no longer need to compile your program for
every architecture.</p><p>In the future Inko may support a way of bundling such bytecode files similar to
<a href="https://en.wikipedia.org/wiki/JAR_(file_format)">JAR</a>, though this isn't supported at the moment.</p><h2 id="examples">Examples</h2><p>With all of that out of the way let's take a look at some examples of Inko
source code. The examples discussed below are all taken from the standard
library, which can be found <a href="https://gitlab.com/inko-lang/inko/tree/master/runtime/std">here</a>.</p><h3 id="checking-if-a-string-starts-with-another-string">Checking if a String starts with another String</h3><p>Checking if one <code>String</code> starts with another <code>String</code> can be done using the
method <code>String#starts_with?</code> in the <code>std::string</code> module. The implementation of
this method is pretty straightforward:</p><div class="highlight"><pre class="highlight"><code>def starts_with?(prefix: String) -> Boolean {
prefix.length > length
.if_true {
<span class="k">return</span> False
}
slice(<span class="mi">0</span>, prefix.length) == prefix
}
</code></pre></div><p>The argument <code>prefix</code> is the <code>String</code> we are looking for, and our return value
is a <code>Boolean</code>. In the method we start with the following:</p><div class="highlight"><pre class="highlight"><code>prefix.length > length
.if_true {
<span class="k">return</span> False
}
</code></pre></div><p>This is a simple optimisation: if the <code>String</code> we are looking for is longer
than the <code>String</code> we are checking, then we can just return <code>False</code> right away
("hello" can never start with "hello world" for example). In Ruby you would
write this as follows:</p><div class="highlight"><pre class="highlight"><code><span class="k">if</span> prefix.length > length
<span class="k">return</span> <span class="k">false</span>
<span class="k">end</span>
<span class="c"># Alternatively:</span>
<span class="k">return</span> <span class="k">false</span> <span class="k">if</span> prefix.length > length
</code></pre></div><p>Next up we have the actual comparison:</p><div class="highlight"><pre class="highlight"><code>slice(<span class="mi">0</span>, prefix.length) == prefix
</code></pre></div><p>This operation is pretty straightforward: first we generate a new <code>String</code>
starting at character 0 and including <code>prefix.length</code> characters. We then simply
check if this equals the given prefix <code>String</code>. Note that string slicing
operates on characters, not bytes.</p><h3 id="loops-and-tail-call-elimination">Loops and tail call elimination</h3><p>Loops are created using closures, instead of using a special <code>while</code> or <code>loop</code>
keyword. A loop using a conditional is created by sending <code>while_true</code> or
<code>while_false</code> to a closure:</p><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> number = <span class="mi">0</span>
{ number < <span class="mi">10</span> }.while_true {
number += <span class="mi">1</span>
}
</code></pre></div><p>Here we create a loop that runs as long as the result of the closure <code>{ number <
10 }</code> evaluates to true. As long as this is the case we execute the closure
passed to the <code>while_true</code> message.</p><p>An infinite loop is created by sending <code>loop</code> to a closure:</p><div class="highlight"><pre class="highlight"><code>{
<span class="c"># This will run forever</span>
}.<span class="k">loop</span>
</code></pre></div><p>The <code>while_true</code> method is implemented as follows:</p><div class="highlight"><pre class="highlight"><code>def while_true(block: do) -> Nil {
call.if_false { <span class="k">return</span> }
block.call
while_true(block)
}
</code></pre></div><p>Let's start with the signature. This method takes one argument <code>block</code>, which
has its type set to <code>do</code>. In this context <code>do</code> is used to specify that we expect
a closure with no arguments and a dynamic return type. If we required an
argument we would instead write <code>do (Integer)</code>. If we wanted to also include a
return type we could write <code>do (Integer) -> Integer</code>. We can also use the
<code>lambda</code> keyword to create a lambda. The difference between the two is simple: a
closure can capture outer local variables; a lambda cannot. When the type
signature requires a closure you can also pass a lambda, but not the other way
around. Closures and lambdas are collectively referred to as "blocks".</p><p>Now let's look at the body of this method:</p><div class="highlight"><pre class="highlight"><code>call.if_false { <span class="k">return</span> }
block.call
while_true(block)
</code></pre></div><p>First we run the receiving block, returning early if it returned something that
evaluates to false. If it evaluates to true we'll simply execute the block
passed in the <code>block</code> argument, then we will call ourselves again. Inko supports
tail call elimination so we can simply keep calling <code>while_true</code> indefinitely
without blowing up the call stack.</p>
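<p>For comparison, here is the same construct built out of blocks in Ruby. Ruby
doesn't perform tail call elimination by default, so unlike Inko's version this
sketch is only safe for short loops:</p><div class="highlight"><pre class="highlight"><code>def while_true(condition, body)
  return unless condition.call

  body.call
  while_true(condition, body)
end

number = 0
while_true(-> { number < 10 }, -> { number += 1 })
number # => 10
</code></pre></div><p>The <code>loop</code> method is a simple method that also relies on tail call elimination:</p><div class="highlight"><pre class="highlight"><code>def <span class="k">loop</span> -> Nil {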
call
<span class="k">loop</span>
}
</code></pre></div><p>Here <code>call</code> will run the receiving block; then we simply recurse into <code>loop</code> to
repeat this process.</p><p>Because Inko uses preemptive multitasking, loops such as those shown above will
never block an OS thread indefinitely. Instead, the virtual machine will suspend
the process once it has consumed all of its reductions, resuming execution of
the process some time later.</p><h3 id="processes-and-communication">Processes and communication</h3><p>To start a process, we first need to import the <code>std::process</code> module like so:</p><div class="highlight"><pre class="highlight"><code><span class="k">import</span> std::process
</code></pre></div><p>Next we can start a process like so:</p><div class="highlight"><pre class="highlight"><code><span class="k">import</span> std::process
<span class="k">let</span> pid = process.spawn {
<span class="c"># This runs in a separate process</span>
}
</code></pre></div><p>We can send messages to a process using <code>process.send</code> and receive them using
<code>process.receive</code>:</p><div class="highlight"><pre class="highlight"><code><span class="k">import</span> std::process
<span class="k">let</span> pid = process.spawn {
process.receive <span class="c"># This would produce 'hello'</span>
}
process.send(pid, <span class="s">'hello'</span>)
</code></pre></div><p>If <code>process.receive</code> is used when no messages are available, the process
will be suspended until a new message arrives.</p>
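<p>Ruby's thread-safe <code>Queue</code> behaves much the same way: <code>pop</code> blocks the calling
thread until a value arrives:</p><div class="highlight"><pre class="highlight"><code>queue = Queue.new
consumer = Thread.new { queue.pop } # suspends until a message is available

queue.push('hello')
consumer.value # => 'hello'
</code></pre></div><h3 id="file-operations">File operations</h3><p>For our last example, we'll look at a simple file operation: reading a file. In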
a typical language, you would open the file with a specific mode, then read from
it. For example, in Ruby you would do the following:</p><div class="highlight"><pre class="highlight"><code>file = File.open(<span class="s">'example.txt'</span>, <span class="s">'r'</span>)
file.read
</code></pre></div><p>Many languages will use the same data types for files opened in different file
modes. This means that the following Ruby code is accepted, but produces a
runtime error (since the file is not opened for writing):</p><div class="highlight"><pre class="highlight"><code>file = File.open(<span class="s">'example.txt'</span>, <span class="s">'r'</span>)
file.write(<span class="s">'hello'</span>)
</code></pre></div><p>Inko uses different types for files opened in different modes. For example, a
file opened in read-only mode is a <code>ReadOnlyFile</code>, while a file opened in
write-only mode is a <code>WriteOnlyFile</code>. This means our first example is written as
follows:</p><div class="highlight"><pre class="highlight"><code><span class="k">import</span> std::fs::file
<span class="k">let</span> file = file.read_only(<span class="s">'example.txt'</span>)
<span class="k">try</span>! file.read <span class="c"># This will terminate the program if we couldn't read the data</span>
</code></pre></div><p>Our second example would be as follows:</p><div class="highlight"><pre class="highlight"><code><span class="k">import</span> std::fs::file
<span class="k">let</span> file = file.read_only(<span class="s">'example.txt'</span>)
<span class="k">try</span>! file.write(<span class="s">'hello'</span>)
</code></pre></div><p>This code however will not compile since a <code>ReadOnlyFile</code> does not respond to
the <code>write</code> message. I really like this API because it's straightforward to
implement and removes the need to worry about using the wrong file
mode for your operations.</p><h2 id="trying-it-out">Trying it out</h2><p>If you're curious about Inko, you can give it a try yourself, but keep in mind
that with Inko being a young language this process is a bit painful.</p><p>To try things out you need to have three things installed:</p><ol><li>Ruby 2.4 or newer.</li><li>Bundler (<code>gem install bundler</code>).</li><li>Rust 1.10 or newer using a nightly build (stable Rust is unfortunately not
supported at the moment).</li></ol><p>Once these requirements are met you can clone the Git repository:</p><div class="highlight"><pre class="highlight"><code>git clone git@gitlab.com:inko-lang/inko.git
<span class="k">cd</span> inko
</code></pre></div><p>To build the compiler, you need to run:</p><div class="highlight"><pre class="highlight"><code><span class="k">cd</span> compiler
bundle install
</code></pre></div><p>To build the virtual machine, you need to run (from the root directory):</p><div class="highlight"><pre class="highlight"><code><span class="k">cd</span> vm
make release
</code></pre></div><p>Once done you can compile a program (from the root directory) as follows:</p><div class="highlight"><pre class="highlight"><code>./compiler/bin/inkoc /tmp/<span class="k">test</span>.inko -i ./runtime/ -t /tmp/inkoc-build
</code></pre></div><p>This will compile the program located at <code>/tmp/test.inko</code> and store all the
bytecode files in <code>/tmp/inkoc-build</code>. Once compiled the compiler will print the
file path of the bytecode file that belongs to the input file (<code>/tmp/test.inko</code>
in this case).</p><p>To run your program you start the VM as follows:</p><div class="highlight"><pre class="highlight"><code>./vm/target/release/ivm \
-I /tmp/inkoc-build \
/tmp/inkoc-build/path/to/bytecode.inkoc
</code></pre></div><p>These two commands can be merged into a single one as follows:</p><div class="highlight"><pre class="highlight"><code>./vm/target/release/ivm \
-I /tmp/inkoc-build \
$(./compiler/bin/inkoc /tmp/<span class="k">test</span>.inko -i ./runtime/)
</code></pre></div><p>Of course this is far from ideal and in the future this will be greatly
simplified, but for now running a program sadly requires some additional work.</p><p>In the future I will be writing more about Inko's internals such as the garbage
collector and the allocator. If you want to stay up to date on the latest Inko
news the easiest ways of doing so are:</p><ol><li>Star the project on <a href="https://gitlab.com/inko-lang/inko">GitLab.com</a>.</li><li>Subscribe to my website's <a href="/feed.xml">Atom feed</a>.</li><li>Follow me on <a href="https://archive.is/6LWOm">Twitter</a>.</li></ol>https://yorickpeterse.com/articles/compiling-xpath-to-ruby/Compiling XPath to Ruby2015-09-06T22:45:00Z2015-09-06T22:45:00Z<p>The process of evaluating a programming or query language is typically broken up
into 3 steps:</p><ol><li>The lexing phase, which turns raw text into a sequence of "tokens". Tokens
are usually a pair (e.g. an array or tuple) of a type and a value.</li><li>The parsing phase, which turns a sequence of tokens into an
<a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax Tree (AST)</a>.</li><li>An evaluation phase, producing a set of instructions a machine should execute
based on an AST.</li></ol><p>For the third step there are two ways of doing things:</p><ol><li>Instructions are executed on the fly.</li><li>Instructions are generated and executed separately.</li></ol><p>Both options have their benefits and drawbacks. A system that executes
instructions on the fly is typically easier to implement. However, these systems
tend to be slower, as there's little to no room for optimizations when
execution depends directly on the input AST. Directly evaluating ASTs also makes
it very hard (if not downright impossible) to perform
<a href="https://en.wikipedia.org/wiki/Just-in-time_compilation">Just In Time (JIT) compilation</a>.</p><p>A system that first generates instructions and <em>then</em> executes them can be
harder to implement, but with the benefit of allowing better optimizations.</p><p>An example of the first method would be Ruby 1.8, while an example of the second
method is your average C compiler (e.g. gcc).</p><h2 id="xpath-evaluation-in-oga">XPath Evaluation in Oga</h2><p>Up until version 1.3.0, Oga used to evaluate XPath queries on the fly. While the
code was fairly easy to work with, performance left a lot to be desired. The
setup of this evaluator was as follows:</p><p>Every type of AST node would have a corresponding handler method called <code>on_X</code>
where <code>X</code> would be the type of the AST node. For example, an <code>int</code> AST node
would be handled by <code>on_int</code>. Each of these handlers would take their input,
operate on it, and return the result. The usual return value would be an
instance of <code>Oga::XML::NodeSet</code>, an Array-like object used for storing XML
nodes.</p>
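<p>In Ruby, that kind of dispatch boils down to something like the following
(a simplified sketch, not Oga's exact code):</p><div class="highlight"><pre class="highlight"><code># The node's type determines which handler method processes it.
def process(node)
  send(:"on_#{node.type}", node)
end

# An `int` node is handled by on_int, which simply returns its value.
def on_int(node)
  node.value
end
</code></pre></div><p>The performance impact of this setup depends on two things: the size of the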
input document, and the size and complexity of the given XPath query. For small
documents the performance wasn't too bad, but for larger documents (e.g. the
<a href="https://github.com/YorickPeterse/oga/blob/ac5cb3d24f407a6ed8d8b583e59fa89084e9acb5/benchmark/fixtures/big.xml.gz">10 MB test file</a> used for benchmarks) this could result in even
simple queries taking seconds to complete.</p><p>In short, if I wanted to improve performance I would need to come up with a
radically different way of evaluating XPath queries.</p><h2 id="compiling-xpath">Compiling XPath</h2><p>The alternative I started looking into was compiling XPath to some kind of
format that could be executed in a more efficient way. One option would be to
compile to some custom <a href="https://en.wikipedia.org/wiki/Bytecode">bytecode</a> format and evaluate that. However,
ideally the target format would be something that could take advantage of
optimizations already provided by Ruby implementations. That way I wouldn't have
to write my own optimization passes or maybe even some sort of JIT compiler.</p><p>Compiling to Ruby bytecode would be an option, if it weren't for every
implementation using its own bytecode format. Also, no implementation to date
actually considers the bytecode part of their public API (as far as I'm aware),
meaning it could change at any given point.</p><p>Ruby source code on the other hand works across implementations, is stable, and
can take advantage of all performance optimizations a Ruby implementation might
have to offer.</p><p>Starting with version 1.3.0, Oga compiles XPath expressions to Ruby source code.
The result is a Proc that takes an input document (or element) and returns the
result of the XPath expression it was compiled from. The compiled Procs are
cached on a per-expression basis. This means that if you run the same query in a
loop, Oga only has to compile it once.</p>
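<p>The caching boils down to something like this (a sketch using made-up names;
<code>compile_to_ruby</code> stands in for the real XPath-to-Ruby compiler):</p><div class="highlight"><pre class="highlight"><code>COMPILED = {}

# Returns the Proc for an expression, generating and eval-ing its Ruby
# source at most once per unique expression.
def compiled_query(expression)
  COMPILED[expression] ||= eval(compile_to_ruby(expression))
end

def compile_to_ruby(expression)
  # Stand-in: the real compiler generates source based on the XPath AST.
  'lambda { |document| [] }'
end
</code></pre></div><p>Code-wise the setup is fairly similar to the old evaluator. There are still AST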
node type specific handlers (<code>on_int</code>, <code>on_axis_following_sibling</code>, etc).
However, instead of returning <code>Oga::XML::NodeSet</code> instances they return AST
nodes used to produce Ruby source code.</p><h2 id="performance-improvements">Performance Improvements</h2><p>The new compiler setup yields significant performance improvements over the old
evaluator setup. In certain cases performance is even better than Nokogiri,
which uses C for its XPath evaluation.</p><p>Of course any performance claim is meaningless without a benchmark to back it
up. Oga has several benchmarks for the new compiler; these reside in the
<a href="https://github.com/YorickPeterse/oga/tree/ac5cb3d24f407a6ed8d8b583e59fa89084e9acb5/benchmark/xpath/compiler">benchmark/xpath/compiler</a> directory of the repository.</p><p>Benchmarks were run on a Thinkpad T520 running Linux 4.1 with a bunch of
applications in the background, while listening to the
<a href="https://www.youtube.com/watch?v=83jWwQfK-f8">Metal Gear Solid 5: The Phantom Pain soundtrack</a> on YouTube.
In other words, treat these numbers with a grain of salt. For best results you
should run these benchmarks yourself. To do so, clone the Git repository of Oga,
run <code>rake generate fixtures</code> and then run one of the benchmark files like any
other Ruby script.</p><p>First, let's look at the benchmark <code>big_xml_average_bench.rb</code>. This benchmark
takes a <a href="https://github.com/YorickPeterse/oga/blob/ac5cb3d24f407a6ed8d8b583e59fa89084e9acb5/benchmark/fixtures/big.xml.gz">10 MB test file</a> and runs the query
<code>descendant-or-self::location</code> 10 times, measuring the execution time for every
iteration. Using Oga 1.2.3 we get the following output:</p><pre><code>Iteration: 1: 3.493
Iteration: 2: 2.868
Iteration: 3: 2.934
Iteration: 4: 2.965
Iteration: 5: 2.926
Iteration: 6: 2.928
Iteration: 7: 3.008
Iteration: 8: 2.977
Iteration: 9: 2.938
Iteration: 10: 2.993
Iterations: 10
Average: 3.003 sec
</code></pre><p>Using Oga 1.3.0 the output is as follows instead:</p><pre><code>Iteration: 1: 0.432
Iteration: 2: 0.448
Iteration: 3: 0.522
Iteration: 4: 0.453
Iteration: 5: 0.44
Iteration: 6: 0.494
Iteration: 7: 0.448
Iteration: 8: 0.431
Iteration: 9: 0.432
Iteration: 10: 0.437
Iterations: 10
Average: 0.454 sec
</code></pre><p>Here Oga 1.3.0 is about 6.6 times faster.</p><p>Next, let's look at the benchmark <code>concurrent_time_bench.rb</code>. This benchmark uses
the XML file <a href="https://github.com/YorickPeterse/oga/blob/master/benchmark/fixtures/kaf.xml.gz">kaf.xml</a> and runs the query <code>KAF/terms/term</code> 10 times in
parallel using 5 threads. The idea of this benchmark is to measure performance
as the number of threads increases. A higher number of threads can result in more
pressure on the garbage collector (GC), depending on the code being benchmarked.
More pressure on the GC can in turn result in poorer performance due to the GC
having to stop all threads more often.</p><p>Using Oga 1.2.3 the results of this benchmark are as follows:</p><pre><code>Preparing...
Starting threads...
Samples: 50
Average: 0.2316 seconds
</code></pre><p>Using Oga 1.3.0:</p><pre><code>Preparing...
Starting threads...
Samples: 50
Average: 0.0342 seconds
</code></pre><p>Here Oga 1.3.0 is also around 6.6 times faster.</p><p>Finally, let's look at the benchmark <code>comparing_gems_bench.rb</code>. This benchmark
uses the XML document <code><root><number>10</number></root></code> and retrieves all text
nodes of all <code><number></code> nodes. This benchmark uses
<a href="https://github.com/evanphx/benchmark-ips">benchmark-ips</a>.</p><p>The benchmark runs this query for the following libraries:</p><ul><li>Ox: 2.2.0</li><li>Nokogiri: 1.6.6.2</li><li>REXML: MRI 2.2.1 was used (as REXML is bundled in Ruby's standard library)</li><li>Oga</li></ul><p>Note that Ox doesn't actually support XPath, it instead offers its own querying
language. As a result it's not entirely fair to compare it with the other
libraries. However, for the sake of showing the performance difference of Ox's
query language versus the rest, I've included it anyway.</p><p>Using these Gems and Oga 1.2.3, the results are as follows:</p><pre><code>Calculating -------------------------------------
Ox 14.548k i/100ms
Nokogiri 3.879k i/100ms
Oga 2.681k i/100ms
REXML 1.114k i/100ms
-------------------------------------------------
Ox 197.284k (± 3.9%) i/s - 989.264k
Nokogiri 46.701k (± 9.7%) i/s - 232.740k
Oga 28.293k (± 2.0%) i/s - 142.093k
REXML 11.901k (± 2.8%) i/s - 60.156k
Comparison:
Ox: 197284.2 i/s
Nokogiri: 46701.1 i/s - 4.22x slower
Oga: 28292.6 i/s - 6.97x slower
REXML: 11900.5 i/s - 16.58x slower
</code></pre><p>And using Oga 1.3.0:</p><pre><code>Calculating -------------------------------------
Ox 15.227k i/100ms
Nokogiri 3.966k i/100ms
Oga 13.874k i/100ms
REXML 1.168k i/100ms
-------------------------------------------------
Ox 201.044k (± 1.5%) i/s - 1.005M
Nokogiri 47.338k (± 8.6%) i/s - 237.960k
Oga 166.485k (± 9.8%) i/s - 832.440k
REXML 11.693k (± 5.3%) i/s - 58.400k
Comparison:
Ox: 201044.3 i/s
Oga: 166485.5 i/s - 1.21x slower
Nokogiri: 47338.3 i/s - 4.25x slower
REXML: 11692.7 i/s - 17.19x slower
</code></pre><p>Here Oga 1.3.0 is about 5.8 times faster compared to version 1.2.3. Using 1.3.0
Oga outperforms not only REXML but also Nokogiri.</p><p>Please keep in mind that performance will vary depending on the size of the
input document and the query being used. There will be cases where Oga
outperforms others, but there will (probably) also be cases where it performs
worse.</p><h2 id="wrapping-up">Wrapping Up</h2><p>The source code for the compiler can be found in
<a href="https://github.com/YorickPeterse/oga/blob/b07c75e96495f06c0914d135d30b26e55bcbb483/lib/oga/xpath/compiler.rb">lib/oga/xpath/compiler.rb</a>. The source code used for the Ruby AST
and code generation can be found in <a href="https://github.com/YorickPeterse/oga/tree/b07c75e96495f06c0914d135d30b26e55bcbb483/lib/oga/ruby">lib/oga/ruby</a>. There are still
plenty of parts in the compiler that could be optimized further as the current
code is largely ported from the old evaluator.</p><p>Those who wish to take advantage of the new compiler can simply update to Oga
1.3.0. A full list of changes can be found in <a href="https://github.com/YorickPeterse/oga/blob/b07c75e96495f06c0914d135d30b26e55bcbb483/CHANGELOG.md#130---2015-09-06">the changelog</a>.</p>https://yorickpeterse.com/articles/hello-gitlab/Hello, GitLab!2015-08-31T20:49:00Z2015-08-31T20:49:00Z<p>I'm excited to announce that I will be joining <a href="https://about.gitlab.com/">GitLab</a> starting October
1st. I greatly enjoyed my time at <a href="http://www.olery.com">Olery</a>, but after almost 3 years I
felt it was time for a new adventure. If you're based in Amsterdam and love
working with Ruby you should definitely send your details over to
<a href="jobs@olery.com">jobs@olery.com</a>.</p><p>At GitLab my time will be broken up in to two chunks. 80% of my time (4 days)
will be spent on improving performance and stability of the platform. This will
include things such as improving the response time of web pages, cutting down
memory usage, decreasing the time it takes to process Git repository data, etc.</p><p>The other 20% of my time (1 day) will be spent on improving Rubinius. Initially
I'll start with wrapping up some existing work such as
<a href="https://github.com/rubysl/rubysl-socket/pull/9">updating rubysl-socket</a>, <a href="https://github.com/rubinius/rubinius/pull/3356">pull request #3356</a>,
<a href="https://github.com/rubinius/rubinius/pull/3372">pull request #3372</a> and finishing
<a href="https://github.com/rubinius/rubinius/issues/3264">the work needed to support Ruby 2.2</a>.</p><p>Once this has been taken care of I plan to work on two things:</p><ol><li>Improving performance of Rubinius itself.</li><li>Building tools to help improve Rubinius and applications using Rubinius.</li></ol><p>One idea I'm already toying with is adding the ability of tracing object
allocations using Ruby itself. Tracing allocations should have a very low
overhead and should not require disabling the garbage collector for accurate
statistics. This in turn would allow one to run a tracer in their production
application (e.g. using something like New Relic's Ruby agent) <em>without</em> having
to worry about slowing the application down to a crawl.</p><p>Another idea is to add a way of tracing constant/method cache invalidations. In
particular, constant cache invalidations can be tricky to debug, even when using
Rubinius' <code>-Xic.debug</code> and <code>-Xserial.debug</code> options. For more information about
this idea you can refer to <a href="https://github.com/rubinius/rubinius/issues/3490">issue #3490</a>.</p><p>Adding support for LLVM 3.6/MCJIT (<a href="https://github.com/rubinius/rubinius/pull/3367">pull request #3367</a>) is something I
will sadly not be working on any time soon. In order to do so I would first have
to learn about all the nitty-gritty details of LLVM, which in itself can easily
take months. As such I'm leaving this up to Brian Shirai, who already started
working on the various parts needed to support LLVM 3.6.</p><p>Finally, I'd like to thank GitLab for this opportunity. While 1 day a week might
not seem like much, it's <em>a lot</em> better than the 1 or 2 hours a week (if I'm
lucky) I can currently dedicate to Rubinius. Hopefully in the future I can
dedicate even more time to Rubinius, but only time will tell (no pun intended).</p>https://yorickpeterse.com/articles/oga-1-0-released/Oga 1.0 Released2015-05-20T21:00:00Z2015-05-20T21:00:00Z<p>Until now if one wanted to parse XML and/or HTML in Ruby the most common choice
would be <a href="http://www.nokogiri.org/">Nokogiri</a>. Nokogiri however is not without its problem,
<a href="/articles/oga-a-new-xml-and-html-parser-for-ruby/">as I have discussed in the past</a>. Other existing alternatives
usually only focus on XML (such as Ox and REXML), making them unsuitable for
those in need of HTML support.</p><p>Starting today, Ruby developers have a solid alternative: I'm
happy to announce that, 449 days after the very <a href="https://github.com/YorickPeterse/oga/commit/6326bdd8c943299e9adc4d2cb6de00934da3609b">first commit</a>,
Oga 1.0 has finally been released.</p><p>Version 1.0 of Oga will be the first version to be considered stable per
<a href="http://semver.org/spec/v2.0.0.html">semantic versioning 2.0</a>. This doesn't mean it will be bug free, it
just means the API is not meant to change in backwards incompatible ways between
minor releases. While Oga has already been used in production for a while, I was
reluctant to increment the version to 1.0 until at least proper HTML5 support
was introduced.</p><p>A lot has changed over the last 16 months. The old Racc parsers have been
replaced by LL(1) parsers using <a href="https://gitlab.com/yorickpeterse/ruby-ll">ruby-ll</a>, support was added for HTML5,
XML/HTML entity conversion, handling of invalid XML/HTML, better SAX parsing,
Windows support and much more.</p><p>The exact list of changes can be found in the <a href="http://code.yorickpeterse.com/oga/latest/file.CHANGELOG.html">changelog</a>. If you
want to jump straight to trying out Oga you can install it from RubyGems:</p><pre><code>gem install oga
</code></pre><p>Oga doesn't depend on libxml so the installation process should only take a few
seconds.</p><p>Oga's Git repository is located at <a href="https://gitlab.com/yorickpeterse/oga">https://gitlab.com/yorickpeterse/oga</a>, the
documentation can be found at <a href="http://code.yorickpeterse.com/oga/latest/">http://code.yorickpeterse.com/oga/latest/</a>. Those
interested in migrating from Nokogiri can refer to the guide
<a href="http://code.yorickpeterse.com/oga/latest/file.migrating_from_nokogiri.html">"Migrating From Nokogiri"</a>.</p>https://yorickpeterse.com/articles/oga-a-new-xml-and-html-parser-for-ruby/Oga: a new XML/HTML parser for Ruby2014-09-12T14:45:00Z2014-09-12T14:45:00Z<p>In the Ruby ecosystem there are plenty of HTTP libraries. Net::HTTP, HTTParty,
HTTPClient, Patron, Curb, Excon, Typhoeus, just to name a few. There are so many
of them it's almost as if it's required that one writes an HTTP client in order
to call themselves a Ruby developer.</p><p>When it comes to XML/HTML parsing on the other hand the options are quite
limited. The two most common libraries are Nokogiri and REXML. Both these
libraries however have various flaws that make working with them less than
pleasant. REXML is generally quite slow, only supports XML and can use quite a
chunk of memory when parsing data.</p><p>Nokogiri on the other hand is quite fast, but in turn is not thread-safe and in
certain places has a bit of an odd API. Nokogiri also vendors its own copy of
libxml which greatly increases install sizes and times. Most important of all,
Nokogiri simply doesn't work on Rubinius.</p><p>So what exactly is the problem with Nokogiri and Rubinius? Well, on MRI and
Rubinius Nokogiri will use a C extension. This extension in turn uses libxml.
Due to MRI having a GIL everything might appear to be working as expected,
however on Rubinius all hell breaks loose. To be exact, at certain points in
time bogus data (e.g. null pointers) is sent to the garbage collector, which in
turn crashes Rubinius. Both Brian Shirai (<a href="https://github.com/brixen">brixen</a>) and I have spent
quite some time trying to figure out what the heck is going on, without any
success so far. The exact details of all this can be found in the following
Nokogiri issue: <a href="https://github.com/sparklemotion/nokogiri/issues/1047">https://github.com/sparklemotion/nokogiri/issues/1047</a>.</p><p>This particular problem is thus severe that some of the production applications
I've tested (that use Nokogiri heavily) consistently crash around 30 seconds
into the process' lifetime. As a result it's impossible for me to run these
applications on Rubinius. If a process were to crash once every few days I might
be able to live with it while searching for a solution; every 30 seconds however
is just not an option.</p><p>All of this prompted me to start working on an alternative, an alternative that
doesn't require complicated system libraries or Ruby implementation specific
codebases. For the past 8 months I've been working on exactly that. I've called
the project Oga, and it can be found on GitLab.com:
<a href="https://gitlab.com/yorickpeterse/oga">https://gitlab.com/yorickpeterse/oga</a>. Today, 199 days after the first Git
commit, I'll be releasing the first version on RubyGems.</p><p>Oga is primarily written in Ruby (91% Ruby according to GitHub), with a small
native extension for the XML lexer. It supports parsing of XML and HTML, comes
with support for XPath expressions, XML namespaces and much more. It works on
MRI, Rubinius and JRuby and doesn't require large system libraries. This in turn
means smaller Gem sizes and <em>much</em> faster installation times. For more
information, see the <a href="https://gitlab.com/yorickpeterse/oga/blob/master/README.md">Oga README</a>.</p><p>Oga can be installed from RubyGems as follows (the installation process should
only take a few seconds):</p><pre><code>gem install oga
</code></pre><p>Once installed you can start parsing XML and HTML documents. For example, let's
parse the Reddit frontpage and get all article titles:</p><div class="highlight"><pre class="highlight"><code>require <span class="s">'oga'</span>
require <span class="s">'net/http'</span>
body = Net:<span class="ss">:HTTP</span>.get(URI.parse(<span class="s">'http://www.reddit.com/'</span>))
document = Oga.parse_html(body)
titles = document.xpath(<span class="s">'//div[contains(@class, "entry")]/p[@class="title"]/a/text()'</span>)
titles.each <span class="k">do</span> |title|
puts title.text
<span class="k">end</span>
</code></pre></div>
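<p>As the example above focuses on HTML, here is a brief, hypothetical sketch of
the XML namespace support mentioned earlier (the document and prefix below are
made up purely for illustration):</p><pre><code>require 'oga'

# A made-up XML document using a namespace prefix; the same "x:" prefix
# can then be used in XPath queries to select the namespaced element.
xml      = '<root xmlns:x="http://example.com"><x:item>Hello</x:item></root>'
document = Oga.parse_xml(xml)

document.xpath('root/x:item/text()').each do |text|
  puts text.text # => "Hello"
end
</code></pre><p>Because Oga is a very young library there is a big chance you'll bump into bugs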
or other issues (I'm going to be honest here). For example, HTML parsing is not
yet as solid as it should be (<a href="https://gitlab.com/yorickpeterse/oga/issues/20">https://gitlab.com/yorickpeterse/oga/issues/20</a>),
and Oga does not yet honor the encoding set in the document itself
(<a href="https://gitlab.com/yorickpeterse/oga/issues/29">https://gitlab.com/yorickpeterse/oga/issues/29</a>). If you happen to run into
any problems/bugs, please report these at the <a href="https://gitlab.com/yorickpeterse/oga/issues/new">issue tracker</a>.
Feedback and questions are also more than welcome.</p><p>Personally I'm really excited about what Oga currently is and what it will
become (it also seems others share that sentiment). I was not expecting it to
take nearly 8 months to write such a library, but looking back at everything it
was more than worth the effort.</p><p>And last, I'd like to thank the following people:</p><ul><li><a href="https://github.com/whitequark">Peter Zotov</a>: for helping me out with Ragel numerous times</li><li><a href="https://github.com/brixen">Brian Shirai</a> for debugging the initial problems with Nokogiri as
well as his support of the project in general</li><li><a href="https://github.com/headius">Charles Nutter</a> for helping me out with getting a new version of
Racc released, his interest in profiling/benchmarking Oga, and his support of
the project in general</li><li>Countless other people who have shown great interest ever since I started
working on Oga</li></ul>https://yorickpeterse.com/articles/hacking-extconf-rb/Hacking extconf.rb2013-06-08T23:00:00Z2013-06-08T23:00:00Z<div class="admonition info"><i class="icon"></i><div class="text"><p>As it turns out you can make the process discussed in this article easier by
using a Rakefile instead of an extconf.rb file. See the bottom of this article
for more information.</p></div></div><p>In Ruby land <a href="http://rubygems.org/">RubyGems</a> is the de facto package manager. RubyGems
allows you to easily distribute your Ruby packages (known as "Gems"). These
packages come in two flavours:</p><ul><li>Pure Ruby Gems</li><li>Gems that include C code (or any other compiled code for that matter) that
is compiled upon installation</li></ul><p>The latter is commonly used to create Ruby bindings for C libraries such as
<a href="http://www.xmlsoft.org/">libxml2</a>. The benefit of using C bindings is that they generally
perform better than their pure Ruby equivalents.</p><p>To install a C extension RubyGems executes a Ruby file called "extconf.rb"
(though you can change the name) to generate a Makefile and then runs <code>make</code>
and <code>make install</code> to build and install the extension. To get this done you'll
have to tell RubyGems where it can find the required files; this is done in
your Gem specification as follows:</p><div class="highlight"><pre class="highlight"><code>Gem:<span class="ss">:Specification</span>.new <span class="k">do</span> |gem|
<span class="c"># ...</span>
<span class="c"># These files are used to generate Makefile files which in turn are used</span>
<span class="c"># to build and install the C extension.</span>
gem.extensions = [<span class="s">'ext/my_extension/extconf.rb'</span>]
<span class="c"># ...</span>
<span class="k">end</span>
</code></pre></div><p>Here the configuration file is located in <code>ext/my_extension/extconf.rb</code>. These
files typically look something like the following:</p><div class="highlight"><pre class="highlight"><code>require <span class="s">'mkmf'</span>
have_header(<span class="s">'some_header'</span>)
find_executable(<span class="s">'some_required_executable'</span>)
$CFLAGS << <span class="s">' -Wextra -Wall -pedantic '</span>
create_makefile(<span class="s">'my_extension/my_extension'</span>)
</code></pre></div><p>Because all of this is executed upon Gem installation (and thus on the end
user's computer) this opens up interesting possibilities. For example, you
could check if specific files are available in a certain directory or, as is
more commonly done, check for headers and such. It also allows you to execute
arbitrary commands (which can potentially be dangerous).</p>
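<p>As a quick illustration, a hypothetical extconf.rb (the header, directory and
command below are made up) could do something like this:</p><pre><code>require 'mkmf'

# Check for a header the extension needs.
have_header('zlib.h')

# Check for an arbitrary file or directory on the end user's machine.
abort 'missing /opt/my_app/data' unless File.directory?('/opt/my_app/data')

# Run an arbitrary command; this is exactly why installing Gems with C
# extensions from untrusted sources can be dangerous.
system('echo "this runs on the end user machine"')

create_makefile('my_extension/my_extension')
</code></pre><p>For a project at <a href="http://olery.com/">Olery</a> we had to wrap code written in various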
languages (Java, Python and Perl to be exact) in Ruby and distribute it. This
introduces a problem though: how do you ensure that all the dependencies of
both the Ruby and underlying code (e.g. Python) are installed? How do you
ensure that the right versions are available? In other words: dependency
management.</p><p>To give an example, one of the underlying code bases was written in Perl and
vendored the dependencies in the Git repository of the project. Normally Perl
is easy to use: you just run it. However, this particular project used one Perl
package that had a C binding and thus had to be compiled upon installation.</p><p>In Perl you normally install packages using CPAN (or CPAN Minus). However, CPAN
is rolling release and thus only keeps track of the most recent version of each
package. This means that a package could break at any given time without us
knowing about it beforehand. Another problem is that CPAN might not always be
available or configured, or might require root access to install packages (this
depends on the configuration though). In other words, relying on CPAN would
probably make things too painful to deal with.</p><p>We decided to go down a different route: manually compile the package upon
installation. Since it was vendored and packaged along with the Ruby code this
in theory should not be too hard.</p><p>To achieve this we had to find a way to tap into the installation process of a
Gem. The only way to do this without requiring the user to run extra commands
after installing the Gem is to tap into the C extension build process. Since
this process is executed on the user's machine it allows you to inject
arbitrary actions. In other words, we had to hijack extconf.rb to compile the
Perl code.</p><p>To recap, building a C extension happens as follows:</p><ol><li>Download the Gem</li><li>Run the extconf.rb file(s) of the Gem to generate the Makefile(s)</li><li>Run <code>make</code> and <code>make install</code> for each Makefile to build and install the
corresponding extensions.</li><li>Move the generated extension file (e.g. <code>my_extension.so</code>) to the lib
directory of the Gem so that it becomes available in the load path.</li></ol><p>Our solution was as follows: use extconf.rb to compile the Perl code and use
a dummy Makefile to trick RubyGems into believing that the C extension was
built successfully. Without a valid Makefile RubyGems would otherwise just
abort the process.</p><p>As an example we'll build a Gem called "wat". The first step is to create a
basic Gem specification (only relevant code is shown here):</p><div class="highlight"><pre class="highlight"><code>Gem:<span class="ss">:Specification</span>.new <span class="k">do</span> |gem|
gem.name = <span class="s">'wat'</span>
gem.extensions = [<span class="s">'ext/wat/extconf.rb'</span>]
<span class="k">end</span>
</code></pre></div><p>In our case the extconf.rb file had to do two things: check for the required
dependencies (e.g. the "perl" command) and compile the extensions:</p><div class="highlight"><pre class="highlight"><code>require <span class="s">'mkmf'</span>
<span class="c"># Stops the installation process if one of these commands is not found in</span>
<span class="c"># $PATH.</span>
find_executable(<span class="s">'perl'</span>)
find_executable(<span class="s">'make'</span>)
<span class="c"># Create a dummy extension file. Without this RubyGems would abort the</span>
<span class="c"># installation process. On Linux this would result in the file "wat.so"</span>
<span class="c"># being created in the current working directory.</span>
<span class="c">#</span>
<span class="c"># Normally the generated Makefile would take care of this but since we</span>
<span class="c"># don't generate one we'll have to do this manually.</span>
<span class="c">#</span>
File.touch(File.join(Dir.pwd, <span class="s">'wat.'</span> + RbConfig:<span class="ss">:CONFIG</span>[<span class="s">'DLEXT'</span>]))
directories_with_perl_code.each <span class="k">do</span> |directory|
Dir.chdir(directory) <span class="k">do</span>
sh <span class="s">'perl Makefile.PL PREFIX=path/to/local/installation LIB=path/to/local/lib'</span>
sh <span class="s">'make && make install && make clean'</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="c"># This is normally set by calling create_makefile() but we don't need that</span>
<span class="c"># method since we'll provide a dummy Makefile. Without setting this value</span>
<span class="c"># RubyGems will abort the installation.</span>
$makefile_created = <span class="k">true</span>
</code></pre></div><p>This takes care of ensuring our dependencies are there, the Perl code is
compiled, and RubyGems doesn't abort the installation process.</p><p>Next up we'll need to create a dummy Makefile. This Makefile goes in the same
directory as the extconf.rb file and looks pretty simple:</p><pre><code>all:
	true
install:
	true
</code></pre><p>The <code>true</code> commands are used to ensure that the commands run successfully;
once again, RubyGems would abort the installation if one of them failed.</p><p>This solution, as dirty as it may sound, was actually surprisingly elegant. Of
course you should not use this as an excuse to turn RubyGems into a universal
package manager. However, if you need to take care of some basic dependency
management or need to run arbitrary commands upon installation it's not even
that bad. And no, I did not do drugs while writing that.</p><p>After discussing this with <a href="https://github.com/whitequark/">Peter Zotov</a> it turns out that the
above process can be done a bit more easily by using a Rakefile instead of an
extconf.rb file. An example of a project using this approach is
<a href="https://github.com/ruby-llvm/ruby-llvm/blob/master/ruby-llvm.gemspec">ruby-llvm</a>. I haven't investigated this option myself so I can't
tell for certain though.</p><h2 id="using-a-rakefile">Using a Rakefile</h2><p>After writing this article it was discovered that the above process can be made
significantly easier by using a Rakefile. To be more exact, any file that does
not match the following pattern can be used without having to create the above
dummy files:</p><div class="highlight"><pre class="highlight"><code>/\A(extconf|makefile).rb\z/
</code></pre></div><p>This information is based on <a href="https://github.com/ruby/ruby/blob/34f5700a0947243198dea5461b80fa8be5ba19ea/lib/mkmf.rb#L2598-L2600">this</a> code. These particular lines of
code cause the installation process to fail (since mkmf exits with an
unsuccessful exit status) if the filename of an extension matches the above
pattern and the variable <code>$extmk</code> is set to <code>false</code>.</p><p>In our particular use case this meant that I could get rid of the dummy
Makefile and C extension file since it's actually mkmf that insists on these
files being created and not RubyGems. This in turn made the code considerably
smaller and much less of a hack.</p>
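<p>To sketch what this might look like (a hypothetical setup based on my reading
of the mkmf code above, not something I've actually shipped), the gemspec points
at a Rakefile and the work happens in its <code>default</code> task:</p><pre><code># wat.gemspec: point the extension at a Rakefile instead of extconf.rb.
Gem::Specification.new do |gem|
  gem.name       = 'wat'
  gem.extensions = ['ext/wat/Rakefile']
end

# ext/wat/Rakefile: RubyGems runs this file with Rake during installation,
# so the Perl compilation from before can happen here without any dummy
# files. directories_with_perl_code is again a placeholder, and sh() is
# available because this is a Rakefile.
task :default do
  directories_with_perl_code.each do |directory|
    Dir.chdir(directory) do
      sh 'perl Makefile.PL PREFIX=path/to/local/installation LIB=path/to/local/lib'
      sh 'make && make install && make clean'
    end
  end
end
</code></pre>https://yorickpeterse.com/articles/debugging-with-pry/Debugging With Pry2011-11-27T00:00:00Z2011-11-27T00:00:00Z<p>Pry is a REPL (Read Eval Print Loop) that was written as a better alternative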
to IRB. It comes with syntax highlighting, code indentation that actually works
and several other features that make it easier to debug code. I stumbled upon
Pry when looking for an alternative to both IRB and the way I was debugging my
code (placing <code>puts</code> all over the place, I think it's called "Puts Driven
Development").</p><p>Pry tries to do a lot of things and I was actually quite surprised how well it
does that. It might not stick to the Unix idea of only doing a single thing
(and doing that very well) but it makes my life (and the lives of others) so
much easier that it's easy to forgive.</p><p>Pry is primarily meant to be used as a REPL. There are a lot of things that
make Pry so much more pleasant to use than IRB. One of the things almost any
Ruby programmer will notice when using IRB is that its indentation support is a
bit clunky. Indenting itself works fine most of the time but it fails to
un-indent code properly as illustrated in the code snippet below (pasted
directly from an IRB session):</p><pre><code>ruby-1.9.3-p0 :001 > class User
ruby-1.9.3-p0 :002?> def greet
ruby-1.9.3-p0 :003?> puts "Hello world"
ruby-1.9.3-p0 :004?> end
ruby-1.9.3-p0 :005?> end
</code></pre><p>Luckily Pry handles this just fine, whether you're trying to indent a class or
a hash containing an array containing a proc and so on. Pry does this by
resetting the terminal output every time a new line is entered. The downside of
this approach is that it only works on terminals that understand ANSI escape
codes. In Pry the above example works as it should:</p><pre><code>[1] pry(main)> class User
[1] pry(main)* def greet
[1] pry(main)* puts "Hello world"
[1] pry(main)* end
[1] pry(main)* end
</code></pre><p>Besides indentation Pry does a lot more. A feature that I think is very cool is
the ability to show documentation and source code of methods right in your REPL
(sadly this feature doesn't work with classes or modules at the time of
writing). This means that you no longer have to use the <code>ri</code> command to
search documentation for methods. You also don't need to install the RDoc
documentation as Pry pulls it directly from the source code. Showing the source
code of a method or its documentation can be done by using the <code>show-method</code>
and <code>show-doc</code> commands. For example, invoking <code>show-method pry</code> in a Pry
session would give you the following output:</p><pre><code>[1] pry(main)> show-method pry
From: /path/trimmed/for/readability/lib/pry/core_extensions.rb @ line 19:
Number of lines: 3
Owner: Object
Visibility: public
def pry(target=self)
Pry.start(target)
end
</code></pre><p>Calling <code>show-doc pry</code> would instead show the following:</p><pre><code>[2] pry(main)> show-doc pry
From: /path/trimmed/for/readability/lib/pry/core_extensions.rb @ line 19:
Number of lines: 17
Owner: Object
Visibility: public
Signature: pry(target=?)
Start a Pry REPL.
This method differs from Pry.start in that it does not
support an options hash. Also, when no parameter is provided, the Pry
session will start on the implied receiver rather than on
top-level (as in the case of Pry.start).
It has two forms of invocation. In the first form no parameter
should be provided and it will start a pry session on the
receiver. In the second form it should be invoked without an
explicit receiver and one parameter; this will start a Pry
session on the parameter.
param [Object, Binding] target The receiver of the Pry session.
example First form
"dummy".pry
example Second form
pry "dummy"
example Start a Pry session on current self (whatever that is)
pry
</code></pre><p>You can also run these commands for code that was written in C. This requires
you to install the gem <code>pry-doc</code> (<code>gem install pry-doc</code>). Do note that this
only works for core C code, currently Pry does not support this for third party
extensions.</p><p>Another very cool feature is that Pry can be used as a debugging tool for your
code without having to manually jump into a session. By loading Pry, which can
be done by writing <code>require "pry"</code> or by using the option <code>-r pry</code> when
invoking Ruby you gain access to everything Pry has to offer. The most useful
tool is <code>binding.pry</code>. This method starts a Pry session and pauses the
script.</p><p>Lets say you have the following script and want to see the values of the
variables:</p><div class="highlight"><pre class="highlight"><code>language = <span class="s">'Ruby'</span>
number = <span class="mi">10</span>
<span class="c"># Do something awesome with the above variables.</span>
</code></pre></div><p>The typical approach would be to insert a puts statement above the comment
followed by an exit statement. Pry in a way can do a similar thing, it just
makes it a lot more awesome. If you modify the script as following you can
truly debug your code like a boss:</p><div class="highlight"><pre class="highlight"><code>language = <span class="s">'Ruby'</span>
number = <span class="mi">10</span>
binding.pry
<span class="c"># Do something awesome with the above variables.</span>
</code></pre></div><p>If you now run the script by calling <code>ruby -r pry file.rb</code> you get a fancy
Pry session:</p><pre><code>[yorickpeterse@Wifi-Ninja in ~]$ ruby -r pry file.rb
From: file.rb @ line 4 in Object#N/A:
1: language = 'Ruby'
2: number = 10
3:
=> 4: binding.pry
5:
6: # Do something awesome with the above variables.
[1] pry(main)>
</code></pre><p>A nice thing about starting Pry this way is that it starts in the context of
the call to <code>binding.pry</code> meaning you get access to data such as the local
variables. These can be displayed by calling <code>ls</code> or by simply typing their
name.</p><pre><code>[yorickpeterse@Wifi-Ninja in ~]$ ruby -r pry file.rb
From: file.rb @ line 4 in Object#N/A:
1: language = 'Ruby'
2: number = 10
3:
=> 4: binding.pry
5:
6: # Do something awesome with the above variables.
[1] pry(main)> ls
self methods: include private public to_s
locals: _ _dir_ _ex_ _file_ _in_ _out_ _pry_ language number
[2] pry(main)> number
=> 10
[3] pry(main)>
</code></pre><p>Moving out of the "breakpoint" (or moving to the next one if you have multiple
ones defined) can be done by hitting <code>^D</code> (Ctrl+D usually).</p><p>Besides the features mentioned in this article Pry has several more. For
example, long output is piped to Less. This can be quite useful if you're
trying to display a big hash using <code>pp</code>. The full list of features can be
found on the <a href="http://pry.github.com/">Pry website</a> as well as by invoking the <code>help</code>
command inside a Pry session. If you're in need of help or have any suggestions
you can join the IRC channel #pry on the Freenode network (irc.freenode.net).
The source code of Pry is hosted on <a href="http://github.com/pry/pry">Github</a>.</p>https://yorickpeterse.com/articles/use-bcrypt-fool/Use BCrypt Fool!2011-04-13T09:41:00Z2011-04-13T09:41:00Z<p>Almost any application will eventually need to store a collection of passwords
or another type of data that has to be stored using a hashing algorithm. Blogs,
forums, issue trackers, they all need to store user data and these passwords.
This article covers the common mistakes made when dealing with passwords and
what you should use instead. In order to fully understand this article some
basic knowledge of programming and computers is required, you should also know
a bit about the common hashing algorithms such as MD5 and SHA1.</p><h2 id="the-problem">The Problem</h2><p>When developing applications developers make the common mistake of thinking
they have a solid understanding of how hashing works. They think that by doing
X they're done and perfectly safe. Guess what, that's not the case (not even
close). The following mistakes are the most common:</p><ul><li>Using a broken algorithm (MD5, SHA1)</li><li>Hashing a password N times in the form of hash( hash(password) ) * N</li><li>Limiting the length of passwords to N characters</li></ul><p>We'll start with the first problem. Up until a few years ago MD5 was the most
common hashing algorithm used for passwords (and other data as well). MD5 was
considered to be pretty safe until a group of people managed to prove how weak
it really was: they were able to generate a set of collisions in a relatively
short amount of time (a few hours or so). This set off a chain reaction and
many more flaws were found.</p><p>Luckily MD5 isn't the only hashing algorithm out there, there's SHA1 and the
SHA2 family as well as a few other ones. SHA1-SHA2 are much strong than MD5 and
at the time of writing (April 2011) only SHA1 has been compromised. Technically
it would take serious amount of time to crack SHA1 but the idea of using an
algorithm that *can* be cracked before humanity is wiped out should be enough
for people to not use it for privacy related data.</p><p>So why are collisions bad? Can't we just use a very very long password or use
method X (insert your favorite counter measure)? Yes, you can. The problem
however isn't fixed, you're merely making the process slower rather than fixing
the actual root of the problem. Time for an example. Assuming we have a
hashing function called "hash" and two strings, A and B (where A and B are
unique), our hashing process of these strings would look like the following:</p><pre><code>pwd1 = hash(A)
pwd2 = hash(B)
</code></pre><p>In this case both pwd1 and pwd2 are unique. At this point a lot of people think
they're good to go as they assume nobody is willing to wait for a certain
period of time before they're able to crack the password, this is a *very*
stupid mistake. While trying to crack a password (by bruteforcing it for
example) may take a long time on a single computer most hackers can easily boot
up a few servers or even worse, use a botnet. All known hashing algorithms
(except BCrypt, more on that later) are affected by a single common problem:
<a href="https://secure.wikimedia.org/wikipedia/en/wiki/Moore's_law">Moore's law</a>. Moore's law states that every two years the amount
of transistors that can be put in a computer doubles. This means that the
faster computers get the quicker they're able to crack a password. A hacker
merely has to use N computers and the time required to retrieve the original
password will be greatly reduced.</p><p>Because of this problem developers try to come up with solutions. These
solutions don't actually solve the problem, they just make it harder and
require more time. A common "fix" is to hash a password N times and then save
it in the database. Developers do this for a few reasons:</p><ul><li>It's supposed to be slower</li><li>In order to retrieve the original password a hacker has to crack multiple
hashes instead of only one.</li></ul><p>The fun thing is that this entire process doesn't actually make the password
more secure. The first reason is pretty easy to bust: simply add more hardware
(or better hardware) and you're good to go. The second reason is a bit harder
to bust as it depends on the algorithm that is used. If we look back at our
hash() function the process of hashing a hash multiple times would look like
the following:</p><pre><code>hash = hash( hash(hash(A)) )
</code></pre><p>In this example there are 3 calls to the hashing function. If A was "yorick"
this would look a bit like the following:</p><pre><code>hash(yorick) -> j238103
hash(j238103) -> a9shda9
hash(a9shda9) -> 11s08j1
</code></pre><p>In this case "11s08j1" is the final hash that will be stored in our database.
At this point developers usually lay down their work and take a coffee or a tea
thinking they've done a good job and are hacker proof. Guess what, they're not.
What just happened is that the process of hashing A multiple times actually
increased the possibility of a hash collision. While we do have to crack the
hashing process N times for each call to hash() we don't actually have to start
at the very end (with "11s08j1"). The reason for this is that "11s08j1" isn't
directly based on "yorick" but on "a9shda9". This means that we merely have to
find the hash that results in "11s08j1" when using our hash function. If we
find a collision we can simply crack it again and we'd end up with our original
password.</p><p>In order to explain this properly I simplified the process of hashing A N times:</p><pre><code>password --> hash 1 --> hash 2 --> final hash
</code></pre><p>In order to retrieve the original password ("password") we'd have to find a
collision for "hash 2". We can't use hash 1 as it's source ("password") can be
considered totally random and would take more time. However, the source of hash
2 is much easier due one big issue: the entropy (the amount of possible
combinations) of the password has been decreased. If we look back at the
previous example we know the final hash is "11s08j1" and that the original
password is "yorick". Using various techniques (rainbow tables, bruteforcing,
etc) we can quickly identify the source of "final hash". The value of "hash 2"
is "a9shda9", while in our example this looks more random (it is) than the
original password common hashing algorithms only use regular characters
(letters and numbers) for their output. A good example of this is the following
Ruby example:</p><div class="highlight"><pre class="highlight"><code>require <span class="s">'digest'</span>
password = <span class="s">'as9(A*&SD&(@))'</span>
hash = Digest:<span class="ss">:SHA1</span>.new.hexdigest(password)
p hash <span class="c"># => "d4c36f9b1f003bee2e5dcafdf6b006110709dfb5"</span>
</code></pre></div><p>The hash of the password (which is just something I randomly typed on my
keyboard) may be longer but it only uses letters and numbers opposed to all the
gibberish in the original password. The same happens with our hash() function
and this allows us to quickly retrieve the original password. If we have the
original hash of "final hash" we can then simply continue reversing the process
until we end up at "yorick".</p><p>The reason why you can't initially find the source of "hash 2" is because you
can't find out what "hash 1" is because it's not stored somewhere while "final
hash" is.</p><p>To cut a long story short, hashing a hash N times doesn't make your passwords
more secure and can actually make it less secure as a hacker can quite easily
reverse the process by generating hash collisions.</p><h2 id="the-solution">The Solution</h2><p>It has already been mentioned before but the solution is to use an algorithm
called "BCrypt". BCrypt is a hashing algorithm based on <a href="http://en.wikipedia.org/wiki/Blowfish_(cipher)">Blowfish</a>
with a small twist: it keeps up with Moore's law. The idea of BCrypt is quite
simple, don't just use regular characters (and thus increasing the entropy) and
make sure password X always takes the same amount of time regardless of how
powerful the hardware is that's used to generate X. I'm not going to cover all
the technical details but basically BCrypt requires you to specify a
cost/workfactor in order to generate a password. This workfactor not only makes
the entire process slower but is also used to generate the end hash. This means
that if somebody were to change the workfactor the hash would also be
different. In other words, hackers, you're fucked. In order for a hacker to
gain the original password he must use the same workfactor and thus has to wait
N times longer than when not using a workfactor.</p><p>Time for an example in Ruby:</p><div class="highlight"><pre class="highlight"><code>require <span class="s">'benchmark'</span>
require <span class="s">'bcrypt'</span>
password = <span class="s">'yorick'</span>
amount = <span class="mi">100</span>
Benchmark.bmbm(<span class="mi">20</span>) <span class="k">do</span> |run|
run.report(<span class="s">"Cost of 5"</span>) <span class="k">do</span>
amount.times <span class="k">do</span>
hash = BCrypt:<span class="ss">:Password</span>.create(password, <span class="ss">:cost</span> => <span class="mi">5</span>)
<span class="k">end</span>
<span class="k">end</span>
run.report(<span class="s">"Cost of 10"</span>) <span class="k">do</span>
amount.times <span class="k">do</span>
hash = BCrypt:<span class="ss">:Password</span>.create(password, <span class="ss">:cost</span> => <span class="mi">10</span>)
<span class="k">end</span>
<span class="k">end</span>
run.report(<span class="s">"Cost of 15"</span>) <span class="k">do</span>
amount.times <span class="k">do</span>
hash = BCrypt:<span class="ss">:Password</span>.create(password, <span class="ss">:cost</span> => <span class="mi">15</span>)
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div><p>For the non Ruby people, this is a simple benchmark script that shows the time
it takes to hash "yorick" with BCrypt with a cost/workfactor of 5, 10 and 15 a
total of 100 times. The results of this benchmark would look like the
following:</p><pre><code>Rehearsal -------------------------------------------------------
Cost of 5 0.250000 0.000000 0.250000 ( 0.249723)
Cost of 10 7.740000 0.010000 7.750000 ( 7.879849)
Cost of 15 247.510000 0.460000 247.970000 (255.346897)
-------------------------------------------- total: 255.970000sec
user system total real
Cost of 5 0.250000 0.000000 0.250000 ( 0.272549)
Cost of 10 7.750000 0.030000 7.780000 ( 8.442511)
Cost of 15 247.530000 0.480000 248.010000 (254.815985)
</code></pre><p>The column we're really interested in is the "real" column. As you can see a
cost of 5 only takes about 250 miliseconds while a cost of 15 takes a whopping
250 seconds (around 4 minutes).</p><p>To cut another long story short: BCrypt adopts to Moore's law and makes it
impossible for a hacker to crack a password using rainbow tables or other
techniques.</p><h2 id="implementations">Implementations</h2><p>The BCrypt hashing algorithm is implemented in quite a few languages. I've
collected a list of resources for various languages so you can start using
BCrypt right away.</p><h3 id="php">PHP</h3><p>PHP allows you to use BCrypt passwords using the <a href="http://nl3.php.net/manual/en/function.crypt.php">crypt()</a> function.
This works as following:</p><div class="highlight"><pre class="highlight"><code><?php
$hash = crypt(<span class="s">'rasmuslerdorf'</span>, <span class="s">'$2a$07$usesomesillystringforsalt$'</span>);
</code></pre></div><h3 id="ruby">Ruby</h3><p>For Ruby there's a gem called "bcrypt-ruby" which can be installed using
Rubygems:</p><pre><code>$ gem install bcrypt-ruby
</code></pre><p>Once installed you can use it as following:</p><pre><code>require 'bcrypt'
hash = BCrypt::Password.create('yorick', :cost => 10)
</code></pre><h3 id="perl">Perl</h3><p>For Perl there's <a href="http://search.cpan.org/dist/Crypt-Eksblowfish/">Crypt::Eksblowfish</a> which works as following:</p><pre><code data-language="perl">use Crypt::Eksblowfish::Bcrypt qw(bcrypt_hash);
$salt = '1p23j1-9381-23';
$password = 'yorick';
$hash = bcrypt_hash({
key_nul => 1,
cost => 10,
salt => $salt,
}, $password);
</code></pre><h3 id="others">Others</h3><ul><li>Python has <a href="https://github.com/dlitz/pycrypto">The Python Cryptography Toolkit</a></li><li>Lua seems to have <a href="https://github.com/silentbicycle/lua-bcrypt">this</a> implementation</li><li>There's an <a href="https://github.com/skarab/erlang-bcrypt">Erlang implementation</a> as well</li></ul><h2 id="special-thanks">Special Thanks</h2><p>I'd like to thank the following IRC folks for helping me out (all of them can
be found on Freenode):</p><ul><li>squeeks from #forrst-chat</li><li>amr from #forrst-chat</li><li>dominikh from #ramaze</li></ul>