Higher order functions

Functional programmers are all about making things better.

The problem is that we don’t know how to make things better.

So we tend to explore different ideas. And most of them don’t work. But some turn out to be surprisingly effective. So much so that other languages tend to adopt those ideas, and then, when you try to argue why one should learn a proper functional programming language, people look at you like you’re crazy, saying that they already have all the fancy functional programming stuff in their language and that they already know what it is, while their code is still 90% imperative, state-mutating shit. Sorry, nasty flashbacks.

Anyhow, one of the sources of bugs is constant repetition. There is a lot of boilerplate code that you have to write for some reason, and people hope that AI will manage all of it.

Boilerplate code is such a problem that, seemingly, most of the research in computer science is about making code reuse as easy as possible. Remember inheritance? Its main selling point was easy code reuse: just inherit from the class whose code you want to use. It took people a while to realize it was a bad idea, hence “composition over inheritance”.

I know it is hard to tell, but functional programmers are also human. And it turns out we also don’t like boilerplate code.

And so, we found our ways around it. The reason people don’t know about them, or have difficulty understanding the concept, is the heavy influence of C. People know functions from C (which are actually procedures, i.e. variables for storing code), so anything that C functions don’t do must be impossible.

However, that is not the case. You know, we have dependency injection of our own.

Let’s take a look back at one of our examples:

def sum_left_to_right(l):
  sum = 0
  for i in range(len(l)):
    print("Currently on: {}".format(i))
    sum += l[i]
  return sum

def sum_right_to_left(l):
  sum = 0
  for i in range(len(l) - 1, -1, -1):
    print("Currently on: {}".format(i))
    sum += l[i]
  return sum

As you can see, those two functions are basically the same. The only difference is how we go through the range. So they could be abstracted like this:

def sum_over_range(l, _range):
  sum = 0
  for i in _range:
    print("Currently on: {}".format(i))
    sum += l[i]
  return sum

So now if you want to do summation from left to right or right to left, you just pass the proper range, like this:

sum_over_range(l, range(len(l)))
# or
sum_over_range(l, range(len(l) - 1, -1, -1))

But that doesn’t look so nice, especially since you may easily forget which argument is which. There are ways around that, but we don’t have the necessary tools just yet. So let’s come back to that problem later.

Believe it or not, but you’ve already been exposed to functional programming in your school. You probably know this symbol: Σ. Unfortunately, it doesn’t look so nice in this blog post and I haven’t figured out how to draw LaTeX expressions using JavaScript (damn you, webpack!). So let’s write it like this:

  N
 sum { i } = N * (N + 1) / 2
i = 1

The way we would write it in Python:

def sum_n(l):
  s = 0
  for x in l:
    s += x
  return s

This is good. But now imagine that we have to sum squares:

  N              N(N + 1)(2N + 1)
 sum { i * i } = ----------------
i = 1                   6

No problem, let’s write it in Python:

def sum_sq(l):
  s = 0
  for x in l:
    s += x * x
  return s

Nice, easy-peasy. But now we are asked to sum the square roots:

  N
 sum { sqrt(i) } = ???
i = 1

While I don’t know the equation for that, we can still write the code that does this for us:

from math import sqrt

def sum_sqrt(l):
  s = 0
  for x in l:
    s += sqrt(x)
  return s

It was easy, but hopefully you’re starting to see the problem now. Let’s look at the code that we’ve written:

def sum_n(l):
  s = 0
  for x in l:
    s += x
  return s

def sum_sq(l):
  s = 0
  for x in l:
    s += x * x
  return s

def sum_sqrt(l):
  s = 0
  for x in l:
    s += sqrt(x)
  return s

We basically wrote the same code several times. The only thing that is changing is the term that we sum over. The terms are:

x
x * x
sqrt(x)

Let’s look at the mathematical expressions:

  N
 sum { i }
i = 1

  N
 sum { i * i }
i = 1

  N
 sum { sqrt(i) }
i = 1

Those are awfully similar as well, and the only parts that are changing are:

i
i * i
sqrt(i)

They are basically the same! That is not a coincidence.

Let’s look at the part of the mathematical expression that stays the same:

  N
 sum { ... }
i = 1

What this expression says is: sum the terms described by the {...}. In fact, we can write it like this:

  N
 sum  f(i)
i = 1

Turns out, we can do the same in the Python code:

def sum_generic(l, f):
  s = 0
  for x in l:
    s += f(x)
  return s

It is basically the same code, with one key difference: we are passing f() to sum_generic().

“Passing functions as arguments?” you might be asking. If this is your first time seeing it, know that it is quite an important and powerful idea, and you’ll be seeing it often.

Now we can write our sum_n(), sum_sq() and sum_sqrt() in terms of sum_generic():

def sum_n(l):
  return sum_generic(l, lambda x: x)

def sum_sq(l):
  return sum_generic(l, lambda x: x * x)

def sum_sqrt(l):
  return sum_generic(l, lambda x: sqrt(x))

Here lambda is just a function that accepts an argument x. Essentially lambda is an anonymous function.

In functional programming we use functions for almost everything, even minute things, and it is difficult to keep coming up with meaningful names, so lambda is a way to create functions on the fly (without naming them).
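For example, these two definitions behave the same way (the name square here is just for illustration):

square = lambda x: x * x

def square_def(x):
  return x * x

assert square(5) == square_def(5) == 25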

Now the obligatory explanation of higher-order functions.

In the C language, functions are just a way for you to structure a sequence of instructions. You give them a name and then invoke those instructions by that name. In a sense, you could write those instructions as macros instead, and they would still act much the same.

However, higher-order functions allow you several things (a small sketch follows the list):

  • They can accept other functions as their input arguments
  • They can return functions as a result
  • Those functions can be stored in a variable
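Here is a tiny sketch covering all three points (the names are made up for illustration):

# 1. accept a function as an argument
def apply_twice(f, x):
  return f(f(x))

# 2. return a function as a result
def make_adder(n):
  return lambda x: x + n

# 3. store a function in a variable
increment = make_adder(1)

assert apply_twice(increment, 3) == 5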

Hopefully you can see that the idea of a higher-order function is not as scary as it seems, although the words may sound scary or obscure (math people are meanies, I’m sorry).

Higher-order functions are such a powerful and useful idea that people would hack Python to provide those capabilities, by generating lambda-like code and evaluating it, or by rewriting the AST, despite Guido’s reservations about lambdas. In the end, Guido allowed them, but Python’s lambdas are half-assed, unfortunately. This is something we have to live with.

Additionally, higher-order functions exist even in C++ (since C++11!). While their type is difficult to nail down properly, you can use templates to help you out:

#include <iostream>

template <typename Fn>
auto some_function(Fn f, int x){
  return f(x);
}

int main(){
  std::cout << some_function([](int x){ return x * 2; }, 3) << '\n';
  return 0;
}

While the idea of higher-order functions may seem simple, there’s one important reason many math people get overly excited about them: given a function f(), you can create a new function g() that does more than f() alone could! We’ll see examples of this in the future, if you stick around, but for now just keep the idea in the back of your mind.
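As a small taste (my own toy example, not necessarily something we will build later): given a function f() that works on a single value, we can build a new function that works on a whole list:

def lift_to_list(f):
  return lambda xs: [f(x) for x in xs]

double = lambda x: x * 2
double_all = lift_to_list(double)

assert double_all([1, 2, 3]) == [2, 4, 6]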

Now, the question is: how exactly do you use higher-order functions?

Well, you may have had this thought a lot: “Okay, let’s write a for loop again…”, and then you’d write code similar to this:

for i in range(len(a)):
  # busy doing computations with a[i]
  b[i] = ... # do some computation

From what I can see, this can be abstracted away like this:

for i in range(len(a)):
  b[i] = f(a[i])

where f() is some function that does all the computations that we did.

Let’s write a function that abstracts away this boilerplate code:

def map(a, f):
  b = [None] * len(a)
  for i, x in enumerate(a):
    b[i] = f(x)
  return b

The reason we picked the name map is that map is the name commonly used for this kind of function: one that takes some function f(), applies it to each element of some list a, and produces a new list b.

In reality, map doesn’t need to work only on lists, but that, along with the real reason this function is called map, will be covered later.

On a side note, the more declarative way to write this is:

def map(a, f):
  return [f(x) for x in a]

If it trips you up, just stop and think for a while. If it doesn’t make sense, think of set theory: the comprehension reads just like set-builder notation. What is set theory? Well, set theory is what happens when you try to give names to things you took for granted.

Anyway, there’s a built-in function called map that does a similar thing to the map we defined. The only difference is that while our version returns a list, the Python version returns an iterator. The reason will become obvious later, but if you want to use the built-in version, don’t forget to convert the result to a list:

list(map(...))
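For example (note that the built-in takes the function first, the opposite of our map(a, f)):

list(map(lambda x: x * x, [1, 2, 3]))  # [1, 4, 9]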

So, map is one of the functions we use often, because it captures a very common imperative for loop. Another thing we often do is something like this:

s = 0
for x in a:
  s += f(x)

# do something with s

This is something that functional programmers also use often. The names vary: reduce(), fold(), fold_left(), fold_right(), etc. Let’s write our own:

def reduce(a, f, initial = 0):
  acc = initial
  for x in a:
    acc = f(acc, x)
  return acc

Here f(acc, x) is a function that takes two parameters:

  • The accumulator (acc)
  • The value (x)

Essentially, the function f() tells reduce how to combine, or reduce, the values in the list into one value. So if we want to write a sum function, we need to describe how to combine the values into a sum:

def addition(acc, x):
  return acc + x
  
def sum(l):
  return reduce(l, addition, 0)
  
sum([1,2,3,4,5])

We can describe many different functions using reduce(). For example, we can calculate the length of a list:

def length(acc, _):
  return acc + 1
  
def len(l):
  return reduce(l, length, 0)

len([1,2,3,4,5])

Essentially, for every element in the list we add 1 to the accumulator. That is, in essence, the definition of the length of a list.

Or we can find the maximum in a list:

def _max(acc, v):
  return acc if acc > v else v

def my_max(l):
  return reduce(l, _max, MIN_LIMIT) # MIN_LIMIT is an impossibly small value, e.g. float('-inf')

my_max([1,2,3,4,5])

reduce() is a ubiquitous tool in the toolbox of functional programmers, so get used to it. fold is just another way to say reduce: the list is being folded into a single value instead of being reduced.

While this makes sense, what about fold_left and fold_right?

Since the functions fold_left and fold_right are pure, they can be rewritten like this:

fold_left (l, f, init) = f(f(f(f(init, l[0]), l[1]), l[2]), ...)
fold_right(l, f, init) = f(..., f(l[-3], f(l[-2], f(l[-1], init))))

If it doesn’t make sense, then assuming that ⊕ is some arbitrary binary operator and that all types are consistent, we can rewrite it like this:

fold_left (l, ⊕, init) = ((((init ⊕ l[0]) ⊕ l[1]) ⊕ l[2]) ⊕ ...)
fold_right(l, ⊕, init) = (... ⊕ (l[-3] ⊕ (l[-2] ⊕ (l[-1] ⊕ init))))

This is called equational reasoning (remember the first part and the physics example?). By rewriting one side into the other, step by step, we can show that the two expressions are equal. This is a powerful tool in the hands of a functional programmer. Pay close attention to it.

What it tells us in this case is that fold_left and fold_right behave as if we inserted some binary operator between each of the elements in the list. The difference is that in the case of fold_left we move from left to right: the grouping associates to the left.

In the case of fold_right we move from right to left: the grouping associates to the right.
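As a quick sanity check, here is a sketch of both folds in Python, using subtraction (which is not associative, so the two folds disagree):

def fold_left(l, f, init):
  acc = init
  for x in l:
    acc = f(acc, x)
  return acc

def fold_right(l, f, init):
  acc = init
  for x in reversed(l):
    acc = f(x, acc)
  return acc

sub = lambda a, b: a - b

assert fold_left([1, 2, 3], sub, 0) == ((0 - 1) - 2) - 3   # -6
assert fold_right([1, 2, 3], sub, 0) == 1 - (2 - (3 - 0))  # 2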

If it feels arbitrary, that’s because it is. I may be wrong, so when you’re using your language’s fold_left and fold_right, please consult the documentation.

And another thing, I may have lied about reduce and fold. They are not exactly the same thing.

In the case of fold, you have to provide an initial element. But in the case of reduce, the initial element is the first (or last) element of the list.
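In Python terms, a reduce without an initial value might look like this (just a sketch; check what your own language’s library actually does):

def reduce_no_init(l, f):
  # The first element plays the role of the initial value.
  acc = l[0]
  for x in l[1:]:
    acc = f(acc, x)
  return acc

assert reduce_no_init([1, 2, 3, 4, 5], lambda a, b: a + b) == 15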

Or I’m lying again.

Anyway, nice to know, right?

Well, there’s another function that we often use, and it in fact gets implemented in almost every other mainstream language. The use case for it, I’ll be honest, is not so clear until you try it. But here it is:

def magical_function(l, p):
  res = []
  for x in l:
    if p(x):
      res = res + [x]
  return res

Now, what is the function p()? In mathematical language it is called a predicate. That is a fancy name for a function whose return type is bool:

from typing import Any

def p(x: Any) -> bool:
  ...  # predicate function

You may have seen one before, as part of an SQL query:

SELECT name FROM accounts WHERE name LIKE 'D%' -- select all names starting with "D"

In Python, equivalent imperative code:

names_starting_with_d = []
for acc in accounts:
  name = name_of(acc)
  if name.startswith("D"):
    names_starting_with_d.append(name)

# use names_starting_with_d

Kinda boilerplatey, eh? Compared to the nice SQL version.
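With our magical_function (and the same hypothetical name_of helper), the Python version collapses to something much closer to the SQL:

names_starting_with_d = magical_function(
  map(accounts, name_of),               # our map(a, f) from earlier, not the built-in
  lambda name: name.startswith("D")
)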

In my case, I often use this function when working with data received from some REST API, making sure it matches specific criteria, because the data is really volatile and comes in different shapes. filter is what brings sense and structure back into my data flow, even if I have to throw out a few bad apples.

l = list(range(1, 11))
is_even = lambda x: x % 2 == 0
print(magical_function(l, is_even)) # prints [2, 4, 6, 8, 10]

There’s another way to achieve this result, using list comprehensions:

print([x for x in range(1, 11) if x % 2 == 0])
# prints [2, 4, 6, 8, 10]

Overall, the list comprehension syntax in Python is quite nice because it is functional and declarative by design. However, composing comprehensions is a pain, especially when you don’t know exactly how the variables get captured, so you end up doing alpha-conversion by hand (another fancy name for renaming variables to avoid confusion and conflicts).

This is all nice and dandy, but what’s so special about them? Well, nothing really, the ideas are simple. We will get more into the implications once we have a stronger base, but for now, just use these functions however you like. After all, the joy of learning is in experimenting and making mistakes. The most important things to keep in mind are that the functions are pure, that we are just transforming the data, and that we do it without changing the data we were not supposed to change (immutability, remember?).

So, when you try to do things more and more declaratively, you’ll realize that you need more and more functions.

For example, you have a list ['a', 'b', 'c', 'd'], and you want it to have dashes interspersed:

assert intersperse(['a', 'b', 'c', 'd'], '-') == ['a', '-', 'b', '-', 'c', '-', 'd']

Or for example you have two lists and you want to zip them:

assert list(zip(['a', 'b', 'c'], ['d', 'e', 'f'])) == [('a', 'd'), ('b', 'e'), ('c', 'f')]

Or you want to shift the elements in the list by some n:

assert shift([1, 2, 3, 4, 5], 1) == [2, 3, 4, 5, 1]
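None of these exist as Python built-ins (zip does, but it returns an iterator of tuples), so here is a rough sketch of how intersperse and shift might be written:

def intersperse(l, sep):
  res = []
  for i, x in enumerate(l):
    if i > 0:
      res.append(sep)
    res.append(x)
  return res

def shift(l, n):
  # Rotate the list to the left by n positions.
  n = n % len(l) if l else 0
  return l[n:] + l[:n]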

There are many functions that are simple, but can be quite powerful.

Again, to make the most use of them, the idea is to make them composable.

While the functions shown in these examples are not higher-order functions like map, reduce and filter, there are many functions that you need to achieve your goals, and simply using “the trifecta” won’t be enough, so keep your eyes and mind open.

Also, there are some performance implications to using those functions: they work on immutable lists, so in theory every call means a new copy.

Remember when I said that Python’s map returns an iterator? Well, the thing is that you may find yourself composing map() with map(), or map() with filter(), and so on. So instead of creating a new copy of a list on each and every call, the built-ins return iterators, and you generate the list only when you actually need it. This is a sort of optimization, but it is still “accidental complexity”.

For that reason, there are several things that you can do:

  • Minimize the amount of data you work on: instead of using map and then filter, try filtering first and then mapping (see the sketch after this list).
  • If you have several map functions, try to turn it into one map (and suffer style deductions):
map(f, map(g, map(h, l))) ~> map((lambda x: f(g(h(x)))), l)
  • If you really need performance, then use side-effects like mutation, but make sure that the function is pure when observed from outside! Make sure the side effects don’t leak outside of your function.
  • Perform actions in parallel (when they can be, as with map, filter, and reduce)
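For example, with Python’s lazy built-in map and filter (note the argument order differs from our own versions), the two pipelines below compute the same result, but the second one never squares the odd numbers it is going to throw away anyway:

l = range(1, 1_000_000)

# map first, then filter: squares everything, then keeps the even squares
a = filter(lambda x: x % 2 == 0, map(lambda x: x * x, l))

# filter first, then map: squares only the even numbers
b = map(lambda x: x * x, filter(lambda x: x % 2 == 0, l))

assert list(a) == list(b)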

If none of these solutions is fast enough, ask yourself whether you really need that speed. And if you do, well, you are free to abandon purity and composability altogether. The most important part is that you know what you’re doing.

You’re a professional after all.

Well, we’re almost done with higher-order functions. Let’s just cover a couple of concepts and ideas that rear their heads all over the place in functional programming languages: currying and piping.

They are simple, trust me.

One of the reasons you weren’t able to see all the interesting functional programming ideas is that the languages themselves don’t actively push you towards discovering those ideas, or, worse yet, actively prevent you from coding in a pure and manageable style.

So the reason you haven’t seen these things is, well, that people usually don’t code like that.

You remember the talk about functions being composable? Well, the composability is really the important aspect that we are trying to achieve (sometimes to our own detriment). The more we code with simple functions, the more we feel the need to compose functions. But if we can’t compose functions, we curse, give up hope and get angry at the functional programming style for being useless and wasting our precious time that we could have spent on TikTok, Instagram and Twitter instead.

Imagine some magical function like this:

res = pipe(accounts, [
  get_boys,
  get_adults,
  extract_names,
  names_starting_with_a,
  capitalize
])
print(res)

While we don’t know how those get_boys, get_adults, extract_names etc. are implemented, we kind of get the idea of what is going on. This code is easy to understand. One would say that it is encapsulation (which I think it is not, it is just good code).

This thing is called pipe, and the process of using pipe is called piping (I told you it was easy). The idea is that we take the output of the previous function and make it the input of the next function. It is a pipe as in bash, shell, etc. (please tell me you know what bash is…). We provide an initial argument (in this case accounts), pass it to the first function, and go on from there.

This is simple to understand, but we have a problem: the application of the function must be delayed somehow.

Well, it is simple, if we know what currying is (no, it has nothing to do with curry, and no, I don’t know why it is called currying, please don’t ask me that).

If you start to get into Haskell, you’ll start to see something like this:

add :: Int -> Int -> Int

This is how type signatures are written in Haskell. When I first looked at one, I couldn’t understand why it is done like this.

If you don’t want to understand why it is done like that, then the simple explanation is that the last item is the return type, and everything before it is the types of the arguments it accepts:

add :: Int -> Int -> Int
        ^      ^      ^
        |      |      |
        |      |  return type
        |    arg2
      arg1

Quite simple.

The code for that function looks like this:

add :: Int -> Int -> Int
add x y = x + y

In Python it would be something like this:

def add(x, y): return x + y

Except it is not.

In Python all functions are uncurried by default.

In Haskell all functions are curried by default.

So the functions are not equivalent. If we want to write a similar function, we need to write something like this:

def add(x):
  def f(y):
    return x + y
  return f

Now it is equivalent.

Are you asking “why?”?

Well, in Haskell we can write like this:

add3       = add 3
seven      = add3 4
also_seven = add 3 4

Now, since our Python function is curried, we can do this:

add3       = add(3)
seven      = add3(4)
also_seven = add(3)(4)

What I did just now is called partial application. It is when I apply a function partially, i.e. I don’t give it all the arguments immediately, but do it in steps. Some of the reasons for doing so:

  • Sometimes arguments are not available immediately
  • You have a list of arguments that you always pass. You can partially apply those values and create functions that accept only arguments that really matter
  • We can now do piping

Before we get to piping, though, let’s go back to our type signatures in Haskell:

f :: a -> b

This is a generic type signature in Haskell which says that f is a function from type a to type b, i.e. it accepts an argument of type a and the return type is b.

Now, let’s look at this:

g :: a -> b -> c

This is another generic function, one that accepts two arguments of types a and b and produces a value of type c.

Now, -> is actually an operator: the function type operator. There are implicit parentheses in the last type signature; let’s make them explicit:

g :: (a -> (b -> c))

In fact, -> is right associative. Another fact, b -> c is just another type, so this is equivalent to:

type d = b -> c
g :: a -> d

Now, the magic. When we have a function of type a -> b and we provide it an a, what we get back is a b. So by this logic, if we take a -> d and provide it an a, we get a d. But d is actually b -> c, so we actually get back another function.

So now hopefully you see why we wrote the Python code that way:

def add(x):
  def f(y):
    return x + y
  return f

When we first supply x, we get back a new function which accepts y. Once we have both x and y, we can evaluate the result.

With this we can do piping. Let’s write out those functions first:

def get_boys(accounts):
  return accounts['boys']
  
def get_adults(boys):
  return [x for x in boys if x['age'] >= 18]
  
def extract_names(boys):
  return [x['name'] for x in boys]

def names_starting_with_a(names):
  return [x for x in names if x.startswith('a') or x.startswith('A')]

def capitalize(names):
  return [x.upper() for x in names]

This is good.

Actually, any function with one argument can be used in a pipe. So let’s write a pipe function:

def pipe(v, l):
  res = v
  for f in l:
    res = f(res)
  return res

What we essentially did is create a function that takes the output of one function and makes it the input of the next. Which is exactly what a pipe function is supposed to do.

Now, if the functions had to take more arguments, we would be in a bit of trouble. Again, because Python functions are uncurried by default, Python would complain that we passed fewer arguments than we should have.

Let’s make a function that curries other functions:

def curry_that(f):
  def foo(l):
    def bar(x):
      args = l + [x]
      return f(*args)
    return bar
  return foo

This is essentially a poor man’s curry.

What we are doing here is taking a function f and staging it to accept its arguments in two steps:

  • The initial list of arguments, except for the last one
  • The last argument

Once all the arguments have been passed, it collects them into a single list and then expands them using the * operator.

This is how we would use it:

def triple_sum(a, b, c):
  return a + b + c
  
t_sum_curry = curry_that(triple_sum)

print(pipe(3, [t_sum_curry([1, 2])]))

It is a mess, I know. But then again, you should use better languages. Hopefully, you are starting to see why I don’t like Python.

There are better alternatives in the PyMonad library, like the @curry decorator. Look up its documentation if you’re interested.

Now, hopefully you understand these tools and they will be useful to you, or you’ve finally decided to code in a better programming language.

We are almost there, I can see the light.