The fun part of the Internet is that anyone can write some stupid stuff for anyone to read and laugh to their hearts’ content. I’ll try to write something useful, but even if it is a mess, I’m not the first, and definitely won’t be the last.

What I want to talk about is the role of types in our programming lives and how to utilize them for a better and easier life.

What we usually know about types is that they’re annoying. So much so that we are willing to code in languages that don’t have types (I’m looking at you, dynamic languages). Of course, dynamic languages do have types; it’s just that the compiler won’t check type accuracy for you. Now it is YOUR responsibility. But at the same time, the types are checked at runtime, over and over and over again, making you lose those precious CPU cycles. And those nominal types are not really compatible with other nominal types, so any misuse of the types results in annoying type errors at runtime. Instead of being compile-time errors…

Yet people still continue to argue about the usefulness or uselessness of types. The funny ones are those who write in Python and then use mypy to ensure type safety. I mean, didn’t you start coding in Python because, well, you didn’t have to do type checking?

But I’m being condescending.

I mean, what are the alternatives? Java? C++? Golang?

As a person who recently tried to use Go and make good use of its capabilities, I kinda feel the annoyance of people who have to use the type system.

But then again, the problem is that people just don’t know about better alternatives (because they don’t want to learn, but that is a topic for another day).

As I’m writing this, I’m realizing that the gripe I have is not with types, but with the fact that programmers don’t really know how to talk to their compilers.

Most people consider a compiler to be a tool that converts your mess into a machine-code mess, but what if our computers could do more? What if our compilers could check not just for stupid or mechanical stuff, but for some more complicated invariants?

The thing is, that is possible in functional programming languages of the ML family (no, not the Machine Learning kind). But the question that bugged me was “Why those languages and not others?”.

The thing is, it is possible in other languages too, but with much more effort, unfortunately. It just so happens that ML languages are better suited for it than other languages.

So, how do we do that? How do we make the compiler check that:

  • functions might not return a value?
  • or return different types of values?
  • or do something completely unrelated?

And most importantly, what is the role of types in all of this?

Types for a better future

What are types used for? You know, there’s something called “Type Theory” in mathematics, but I’m sure it is pretty unrelated (spoiler: it is related).

On the other hand, how do compilers ensure that the programs that we write are “correct”? And what does “correct” even mean?

Well, those things are already solved by mathematicians. “Correct” means that the program won’t go haywire and won’t cause runtime errors during its execution.

And yet, some programs do cause runtime errors, and a whole lot of them. Why?

Well, another question is: how do compilers ensure that programs are “correct”? (I know that I asked it already.)

Of course, they use types for that. Typecheckers ensure that the types for specific operations match and that somehow is enough.

This “somehow” made me think: what is exactly going on when typecheckers check for types?

The magic lies in the Curry-Howard correspondence. You should look it up, because I won’t be able to explain it in all its nuanced details, but the gist is that some smart people (independently) showed that propositional logic (if A, then B) corresponds to types (a function from A to B), and that if the proof is correct, then the types should match.

Take a moment to think about it. What the Curry-Howard correspondence tells us is that types are essentially a mathematical proof of some sort. So what typecheckers (and by proxy we) do is prove that programs are correct, using types. All the annoying business of annotating everything is essentially you writing small proofs. That’s why computer scientists are essentially a dumbed-down version of mathematicians. Every self-respecting software engineer should learn some math. A bare minimum.
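
To make this concrete, here is a tiny sketch of the correspondence in Go (assuming Go 1.18+ generics; modusPonens is my own illustrative name, not a standard function): a value of type A is a proof of A, a function from A to B is a proof of “A implies B”, and applying the function is the inference step.

```go
package main

import "fmt"

// modusPonens: given a proof of A and a proof of "A implies B",
// produce a proof of B. The body is forced by the types: the only
// way to conjure a B is to apply f to a.
func modusPonens[A, B any](a A, f func(A) B) B {
	return f(a)
}

func main() {
	double := func(n int) int { return n * 2 } // a "proof" of int ⇒ int
	fmt.Println(modusPonens(21, double))       // 42
}
```

If the typechecker accepts this program, the little logical step it encodes is valid; if you tried to return anything other than `f(a)`, the “proof” would not check.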

If this sounds crazy, well, it is not. At least the idea stands, because there are tools called proof assistants that use some of those ideas to prove mathematical statements. One that I know for sure is Coq (don’t laugh!). And it can also generate a 100% working program from a mathematical proof.

This profound idea opens up the possibility of writing, or better yet, structuring programs as mathematical proofs. That is a really difficult thing to do unless you’re using languages made specifically for it (Coq, or those that have dependent types). A more practical and useful takeaway is to use types to expose the assumptions made in the code and ask the compiler to verify those assumptions for us throughout the codebase.

Some hypothetical examples

Let’s take Golang, because that is what I’ve been using. And let’s assume I’m writing a compiler, because that’s what I’ve been doing.

If you don’t know how compilers work, don’t worry, it is not so important. The basic idea is that compilers (using clever mathematical tricks) convert text into mathematical structures which are easier to work with. It so happens that the structure is in fact a tree. You know, like a binary tree, only an Abstract Syntax Tree. Parsers that build said AST ensure that everything is in the specified order, but don’t check whether it makes sense (e.g. “I ate love by looking at a cow” is a grammatically valid sentence while being complete nonsense). To ensure that the AST represents a valid program that makes sense, we need typecheckers.
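
As a minimal sketch (the node names IntLit and BinaryExpr are hypothetical, not taken from any real compiler), the text 1 + 2 might become a tree like this:

```go
package main

import "fmt"

// A toy AST: "1 + 2" becomes a tree with "+" at the root
// and the two literals as leaves.

type Expr interface{ String() string }

type IntLit struct{ Value int }

func (i IntLit) String() string { return fmt.Sprintf("%d", i.Value) }

type BinaryExpr struct {
	Left  Expr
	Op    string
	Right Expr
}

func (b BinaryExpr) String() string {
	return "(" + b.Left.String() + " " + b.Op + " " + b.Right.String() + ")"
}

func main() {
	// What a parser would build from the text "1 + 2".
	tree := BinaryExpr{Left: IntLit{Value: 1}, Op: "+", Right: IntLit{Value: 2}}
	fmt.Println(tree) // (1 + 2)
}
```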

There are different ways to represent programming languages, and how you do that affects how they feel and what they can do.

For example the program may be a list of statements (or instructions):

type Program struct {
    Statements []Statement
}

Examples of such languages are Python and Bash, where the instructions come one after another to perform specific actions.

type Statement interface {
    statementNode()
    String() string
}

This is how the Statement may be defined. I was told that statementNode() is a part of duck typing: providing such a method on a struct tells the Go compiler that the struct implements a specific interface.

Statements can be thought of as instructions. There is also something called expressions.

The main difference between expressions and statements is that expressions can be evaluated to a value, while statements don’t really yield a value.

2
1 + 2
foo(bar)

Now, imagine this is the C language. Is this valid C code?

2;

Well, technically yes. This is a useless expression, but it is still valid code.

Ideally we should be able to put the expression inside Program.Statements. Except we can’t: the types don’t match.

What can we do now then?

type ExpressionStatement struct {
    Expression Expression
}

func (es *ExpressionStatement) statementNode() {}

func (es *ExpressionStatement) String() string { return es.Expression.String() }

We created a wrapper type that wraps an expression and allows us to put 2 into Program.Statements.
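
Putting the pieces together, a self-contained sketch might look like this (IntLiteral and the Expression interface are my own illustrative additions; only Statement, ExpressionStatement and Program come from the text above):

```go
package main

import "fmt"

type Statement interface {
	statementNode()
	String() string
}

// Expression is the analogous interface for expression nodes
// (an assumption; the text only shows Statement).
type Expression interface {
	expressionNode()
	String() string
}

// IntLiteral is a hypothetical expression node for "2".
type IntLiteral struct{ Value int }

func (il *IntLiteral) expressionNode() {}
func (il *IntLiteral) String() string  { return fmt.Sprintf("%d", il.Value) }

// ExpressionStatement wraps an Expression so it can live
// where a Statement is expected.
type ExpressionStatement struct {
	Expression Expression
}

func (es *ExpressionStatement) statementNode() {}
func (es *ExpressionStatement) String() string { return es.Expression.String() }

type Program struct {
	Statements []Statement
}

func main() {
	p := Program{}
	// The bare expression 2, admitted as a statement via the wrapper.
	p.Statements = append(p.Statements, &ExpressionStatement{
		Expression: &IntLiteral{Value: 2},
	})
	fmt.Println(p.Statements[0]) // 2
}
```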

So what about it?

You may think that you’ve made a wrapper type, but no. What you did was something more profound: you proved that an expression is a statement. How did you do that? By showing that such a type can exist and typecheck alongside other types without breaking the consistency of the rest of the code.

Now, take a minute to think about it. What you did wasn’t just providing a wrapper. You mathematically proved things to your compiler. And the compiler accepted your explanation.

Types are a language in which you can talk to your compiler’s proving system. But types are not just for the compiler’s benefit; they exist for yours too.

Imagine this:

foo.bar

This is an example of struct member access, the kind you typically see in C-like languages.

The initial idea was to model it like this:

type MemberAccess struct {
    Name Ident
    Member string
}

func (ma *MemberAccess) expressionNode() {} // MemberAccess is an expression

type Ident struct { // Identifier
    Name string
}

func (id *Ident) expressionNode() {} // Ident is an expression

Of course, this seems valid. But when I was writing a top-down parser and trying to parse things, I encountered a small problem:

// parser code above
e := p.parseLiteralOrFunctionCallOrSubExpression() // parses identifiers
if p.curTokenIs(token.DOT) { // this is a part of member access
    p.nextToken()
    name := p.parseIdentifier()
    return &ast.MemberAccess {
        Name:   e,
        Member: name,
    }
}

This is example code that you don’t have to look at too deeply. The main takeaway is that it doesn’t compile. There is a type error: Name is of type Ident, but e is of type Expression.

e is of type Expression because I used parseLiteralOrFunctionCallOrSubExpression() method instead of parseIdentifier().

I could have done that instead, but I knew that it was wrong: what if the function returns a struct?

foo(bar).kek

Isn’t that valid code?

It was clear to me that my understanding of the AST was wrong. So now I need to prove things:

type MemberAccess struct {
    Struct Expression
    Member string
}

This is better. The second type error is that Member is a string, but name is an Ident (or Expression). To fix that, we can either change the definition:

type MemberAccess struct {
    Struct Expression
    Member Ident
}
// or
type MemberAccess struct {
    Struct Expression
    Member Expression
}

Or when creating struct, prove that we have all the necessary information to derive our node:

t := p.parseIdentifier()
name, ok := t.(*ast.Ident)
if !ok {
    panic("Something went horribly wrong")
}
return &ast.MemberAccess {
    Struct: e,
    Member: name,
}

// or...

name = p.parseIdentifier()
return &ast.MemberAccess {
    Struct: e,
    Member: name.String(),
}

This would suffice to satisfy the typechecker, but it is clear that something is (potentially) going wrong. If we make Member either an Ident or an Expression, we introduce an unnecessary burden of proof on the person using our code, since they have to ensure that things are exactly what they claim to be.

As for the type-casting or taking String(), we are:

  • punching holes in our type system (essentially saying “trust me, bro”)
  • making some bold assumptions about the code of parseIdentifier() that exist nowhere except in our heads and that the compiler is blissfully unaware of
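
One way to close that hole, assuming we control the parser, is to move the assumption into the signature: if parseIdentifier() returned *ast.Ident instead of a generic Expression, the compiler would know exactly what we know. A sketch (the stub parser and all names here are purely illustrative):

```go
package main

import "fmt"

type Expression interface{ expressionNode() }

type Ident struct{ Name string }

func (id *Ident) expressionNode() {}

// MemberAccess with Member typed as *Ident: the "it really is an
// identifier" claim is now part of the data structure itself.
type MemberAccess struct {
	Struct Expression
	Member *Ident
}

func (ma *MemberAccess) expressionNode() {}

// parseIdentifier returns *Ident, not Expression. The assumption
// "this function yields an identifier" is checked by the compiler
// at every call site, no cast or panic required. (Stub body.)
func parseIdentifier(name string) *Ident {
	return &Ident{Name: name}
}

func main() {
	ma := &MemberAccess{
		Struct: &Ident{Name: "foo"},
		Member: parseIdentifier("bar"),
	}
	fmt.Println(ma.Member.Name) // bar
}
```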

That’s why people see compilers as nothing more than machine-code-translating tools. They are not powerful enough to prove some of the stuff we really wish we could prove about our code. And trying to reach that level would require some awkward magic.

But then again, there are things that we can do. You probably saw this line:

// code above
return nil

What are you doing with this? You’re essentially saying that something doesn’t exist.

Fair enough. But then again, do you always check for nil? Probably not, that’s why you have invariants, right?

But then again, when you forget about checking for nil, things go horribly wrong.

The question is, how to deal with that?

Well, what if you don’t return nil in the first place?

If you don’t return nil, then what do you return?

Depends on the context. For example let’s get back to our language. Is this a valid expression?

()

You might say no, and you would probably be right. How would that work out?

2 + ()

Clearly, this is a problem. When parsing the code, we expect some expression inside the parenthesis.

if !p.curTokenIs(token.LPAR) {
    // ensuring preconditions
    panic("Impossible case, not parsing a subexpression")
}

p.nextToken() // skipping (
if p.curTokenIs(token.RPAR) {
    return ???
}

You can’t return nil, because the code that called the function to parse subexpression assumes that the function always succeeds.

What about this instead?

type EmptyExpression struct {
}

func (ee *EmptyExpression) expressionNode() {}

And now:

p.nextToken() // skipping (
if p.curTokenIs(token.RPAR) {
    return &ast.EmptyExpression{}
}

Now you’re not returning nil, and your function ended successfully. While it still doesn’t resolve the problem, at least you live another day and hope that this problem gets resolved by somebody else (your language’s typechecker).

At some point, when I made the first prototype of the parser in 1100+ lines of (arguably) bad code, I failed the simplest test:

1

While not all of the code I wrote was bad, I clearly made wrong assumptions and communicated those wrong assumptions to the compiler. The problem was that I didn’t handle newlines and end-of-file nicely. To do that, I would need to introduce state.

If, for you, state doesn’t equal bugs, then you must be an extremely smart person. For me, bookkeeping is extremely difficult, so I try to simplify the process as much as possible. Instead of keeping track of which state I’m in right now, what if the functions returned all the necessary information?

What I needed was to say that the functions may fail. The current types say that the parse() methods will always return a node of the AST. But that is not true.

Since I come from a functional programming background, I decided to bring in friends of my own. Enter Options:

type optionExpr struct {
    has_value bool
    value     *ast.Expression
}

I would wrap the value inside this struct and make functions return optionExpr instead of ast.Expression.

To create a Some(e):

func some_of_expr(e *ast.Expression) optionExpr {
    return optionExpr{
        has_value: true,
        value:     e,
    }
}

To create None:

func none_of_expr() optionExpr {
    return optionExpr{
        has_value: false,
        value:     nil,
    }
}

Nice. Now, how does one work with it? Checking if opt.has_value { // ... } every time would quickly become tedious and doesn’t solve the underlying problem. So we need one more helper function:

func bind_of_opt_expr(e optionExpr, f func(*ast.Expression) optionExpr) optionExpr {
    if e.has_value {
        return f(e.value)
    } else {
        return e // e is None
    }
}

This comes from monads, but the idea is that there are some rules we need to abide by to ensure type consistency (returning optionExpr is one of them).

So now, instead of working with values directly, you create “callbacks” that represent your “happy path”, where everything works correctly. bind_of_opt_expr() ensures that if there is a None somewhere along the line, the overall result will become None.
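
Here is a toy, self-contained sketch of that happy path (a plain string stands in for *ast.Expression, and the helper names are shortened for brevity):

```go
package main

import "fmt"

// A toy option type: string stands in for *ast.Expression.
type optionExpr struct {
	has_value bool
	value     string
}

func some(v string) optionExpr { return optionExpr{has_value: true, value: v} }
func none() optionExpr         { return optionExpr{} }

// bind runs the happy-path callback only if there is a value;
// a None anywhere short-circuits the rest of the chain.
func bind(e optionExpr, f func(string) optionExpr) optionExpr {
	if e.has_value {
		return f(e.value)
	}
	return e // e is None: skip the callback
}

func main() {
	// Happy path: both "parses" succeed, so the callbacks run.
	ok := bind(some("1"), func(lhs string) optionExpr {
		return bind(some("2"), func(rhs string) optionExpr {
			return some("(" + lhs + " + " + rhs + ")")
		})
	})
	fmt.Println(ok.has_value, ok.value) // true (1 + 2)

	// One failing step poisons the whole chain.
	bad := bind(none(), func(lhs string) optionExpr {
		return some(lhs + "!")
	})
	fmt.Println(bad.has_value) // false
}
```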

So I started replacing the return type ast.Expression with optionExpr, and once again I started getting type errors. But those are good type errors. Because now the compiler was telling me where I made “leaps of logic” and where I need to put in more legwork to prove the correctness of the code.

Of course, writing such code in Golang is a bit tedious, because Golang’s type system sucks. I realize why people don’t code in functional style: they don’t know the languages that are better suited for it.

At some point I had to have an option of []ast.Expression:

type optionExprList struct {
    has_value bool
    value     []*ast.Expression
}

func some_of_expr_list(e []*ast.Expression) optionExprList {
    return optionExprList{
        has_value: true,
        value:     e,
    }
}

func none_of_expr_list() optionExprList {
    return optionExprList{
        has_value: false,
        value:     nil,
    }
}

func bind_of_opt_exprlist_ret_exprlist(e optionExprList, f func([]*ast.Expression) optionExprList) optionExprList {
    if e.has_value {
        return f(e.value)
    } else {
        return e // e is None
    }
}

At some point I needed a bind that accepts an optionExprList but returns an optionExpr:

func bind_of_opt_exprlist_ret_expr(e optionExprList, f func([]*ast.Expression) optionExpr) optionExpr {
    if e.has_value {
        return f(e.value)
    } else {
        return none_of_expr()
    }
}

And then I needed a bind that accepts an optionExpr but returns an optionExprList:

func bind_of_opt_expr_ret_exprlist(e optionExpr, f func(*ast.Expression) optionExprList) optionExprList {
    if e.has_value {
        return f(e.value)
    } else {
        return none_of_expr_list()
    }
}

Of course, it is tedious, not to mention that you always have to do the return some_of_expr(e) inside the binds. But this is an important measure to be honest about your code, to the compiler and to yourself. Instead of waiting for those problems to explode in your face at runtime, you’re dealing with them during compilation, on a playground where you have arguably more control.

Hopefully you learned today how you can use compilers to aid you in your development, instead of just building stuff.