I’ve written several posts lately about various algorithms that can be expressed as functional folds:
These have all been numerical algorithms. Insertion sort is an example of a non-numerical algorithm that could be implemented as a fold.
Insertion sort is not the fastest sorting algorithm. It takes O(n2) operations to sort a list of size n while the fastest algorithms take O(n logĀ n) operations. But insertion sort is simple, and as it works its way down a list, the portion it has processed is sorted. If you interrupt it, you have correct results given the input so far. If you interrupt something like a quicksort, you don’t have a sorted list. If you’re receiving a stream of data points and want to sort them as they come, you have to use insertion sort rather than something like quicksort.
The function to fold over a list looks like this:
f as a = [x | x <- as, x < a] ++ [a] ++ [x | x <- as, x >= a]
Given a sorted list as
and a value a
, return a new sorted list that has inserted the element a
in the proper position. Our function f
does this by joining together the list of the elements less than a
, the list containing only a
, and the list of elements at least as big as a
.
Here’s how we could use this to alphabetize the letters in the word “example.”
foldl f [] "example"
This returns "aeelmpx"
.
Haskell takes our function f
and an empty list of characters []
and returns a new list of characters by folding f
over the list of characters making up the string "example"
.
You can always watch how foldl
works by replacing it with scanl
to see intermediate results.
scanl f [] "example"
returns
["", "e", "ex", "aex", "aemx", "aempx", "aelmpx", "aeelmpx"]
A nice feature of insertion sort is that it runs very quickly on inputs with few inversions (linear on already sorted inputs). If I understand your code (and Haskell) well enough, it will always be quadratic. Do you know if it’s possible to get the performance improvement without losing too much elegance?
Ben: I don’t know. I was strictly going for elegance, not efficiency.
It seems that a purely functional implementation is going to be less efficient since it takes so little work to do an insertion sort in place on a nearly sorted file. A pure functional implementation is going to make a lot of copies unless you do some magic under the hood to give the appearance of pure functionality while actually sorting in place.
Hi,
If you define
f as a = before ++ [a] ++ after where (before, after) = span (<a) as
then this is linear on inputs that are in reverse order. It avoids too much copying by leaving the tail of the list alone (not even inspecting it).
Cheers,
David
If Haskell, folds, and functional programming would not have been mentioned, then I would have thought that this is BrainFun programming language. Anyway, nice obfuscation!
DTA: Haskell can certainly be obfuscated. Sometimes it goes through painful contortions to do what would be trivial in another language. But I think in this case it’s pretty clear, once you know the notation.
The list comprehension
[x | x <- as, x < a] says "The set of x's such that x comes from the list as and x is less than a." One thing that would make it more readable would be to use something like "mylist" instead of "as." It's a convention in Haskell to use very short variable names, with "s" on the end for lists. Maybe you could use "datum" for the value "a" being inserted and "data" for the list "as." Then ++ [a] says "concatenate the list containing just a." Then we concatenate another list comprehension, the elements bigger than a. ++ [x | x <- as, x >= a]
It’s very declarative compared to a couple nested for loops.
Below is a direct translation of the code, with no attempt made to make it more efficient. It is just for fun.
import Control.Monad (ap)
import Data.List (partition)
insertionSort = foldl f ([])
where
f = ap (uncurry . flip ((.) . (++)) . (:)) . flip (partition . flip (<))