I recently came across this fairly basic, run-of-the-mill interview question that involves writing algorithms.
Given an array of elements that are positve numbers, write an algorithm to find the second largest element
I personally loathe interview questions involving algorithms because they never reflect what am I expected to actually do on the job, but I reckon it would be fun to use this question to compare the difference between C♯, a popular OOP language, and Haskell, an FP language.
If I were to write it in C♯, I would have probably written it like this:
This seems reasonably well-implemented:
- It is generic: it works on all number types such as
int
,double
,decimal
and all types of collections, asIEnumerable<T>
is the base type of all collections - It is reasonably performant: it does not do too many unnecessary allocations without sacrificing readability
- It is reasonably readable: I took advantage of the latest and greatest language features when I think they improve comprehension, e.g. the use of
ValueTuple
on line 6
Now contrast the way if I were to write it in Haskell:
Even if we discount the syntactic noise like braces and access modifiers in the C♯ version, the Haskell version still looks way more terse and concise than the C♯ version. Now granted Haskell was designed to be highly adapted to writing number algorithms like this, I cannot help but feel every keystroke I made on the Haskell version was more purposeful than the tedious ceremony I was required to comply with in order to make the C♯ version compile.
The Haskell version is also reasonably well-implemented (pardon my basic knowledge of Haskell that may limit my ability to judge what is well-implemented):
- It is generic: it works for all number types as well due to the
Num
typeclass constraint. It is even more generic than the C♯ version because this function would work on anyFoldable
structure, lists and arrays are just some examples ofFoldable
- It is reasonably performant: its usage of
foldl'
means it is strict and tail-recursive, thus avoiding stack overflows and stack frame allocation - It is reasonably readable: to untrained eyes, this might look like alien language but I can guarantee this is idiomatic Haskell and it is written professionally
But what makes the Haskell version so elegant and yet so scary to non-FP programmers? I would like to try and break it down in this article.
Deep Dive into Haskell
The first line is an import statement that imports the Data.Foldable
module. It is synonymous with how C♯’s using
statements bring in namespaces. Nothing too difficult here.
The third line declares the type signature of the secondLargest
function. It is not strictly necessary because Haskell’s compiler would have been able to figure that out itself but it is nevertheless good practice for human readers to look at the signature and have a vague idea of what the function is trying to achieve.
secondLargest :: (Num a, Ord a, Foldable t) => t a -> a
This means we are declaring a function that has two generic parameters, t
and a
, the input to the function is of the type t a
and the output to the function is of the type a
. The two generic parameters have different kinds.
t
has a kind of* -> *
, which means it is a type constructor that takes another type to form a proper type. You can think of it as aList<>
(open generic type) waiting for a generic argument to form a fully constructed typea
has a kind of*
, which means it is already a proper type
There are also a total of three generic constraints applied to t
and a
, specified before the fat arrow, =>
.
t
must have aFoldable
instance, which resembles C♯ generic constraints to require this generic argument to implement interfaces/abstract classes. This requirest
to implement eitherfoldMap
orfoldr
a
must have both aNum
instance and aOrd
instance, meaning the generic argument used must be a ‘number’ and must be able to be ‘ordered’
Let us move on to the actual implementation starting from line 4.
Point-free Style
secondLargest = snd . foldl' ...
We are immediately greeted with a peculiarity. I mentioned in the previous section that this function has one input of type t a
, and one output of type a
, but line 4 clearly contradicts my statement! Nothing was between secondLargest
and the =
sign which is where our parameters should have been placed!
This, in fact, is a very common way to write Haskell and is called the point-free style. We achieved this by using the .
operator between snd
and foldl'
.
The .
operator, a.k.a the function composition operator, is the bread-and-butter of functional programming. It is defined as:
It composes two functions f
and g
, returning a third function that takes x
as the input that first runs through g
, then f
, returning the result.
Exactly the same is happening with secondLargest
: we compose foldl' ...
with snd
, returning a function that takes a new parameter. That is how we are able to say secondLargest
is a function that takes one parameter without explicitly putting that parameter in its definition.
On to the implementation
The Haskell version of the algorithm actually looks very similar to its C♯ counterpart. They both keep track of the largest and second-largest elements found so far while traversing the list.
In Haskell’s case, the traversing is done via foldl'
using a two-element tuple (0, 0)
, with the left element representing the largest element so far and the right element as the second-largest so far, as the starting state. The result is then fed to snd
, which is a built-in function that returns the right element in the tuple, in our case, the second largest element found.
The Left Fold
First, let us examine the type signature of foldl'
:
foldl' :: (Foldable t) => (b -> a -> b) -> b -> t a -> b
It requires three arguments:
- A function of type
b -> a -> b
, which is called every time when the structure is traversed, taking the current state, of typeb
, and the currently traversed element, of typea
, returning the next state, of typeb
. We passed a helper function namedsecondLargest'
here to make it clearer how the next state is computed - A value of type
b
, which is the starting state. We passed(0, 0)
as discussed above - A
Foldable
value of typet a
. This is not explicitly passed in our implementation because we are leveraging currying and function composition to avoid explicitly mentioning this argument.
The fully eta-expanded version can be written as but it is usually good practice to eta-reduce when possible.
The Folder
The folding function passed into foldl'
as the first argument is secondLargest'
. It is very common for Haskllers to suffix an inner helper function with an apostrophe so that is what I have done.
secondLargest'
is also used before it is defined. This is allowed in Haskell with the use of the where
keyword, where the definition of secondLargest'
is defined after the keyword.
This helper function is defined using pattern matching and guard clauses. On line 5, we pattern match on the two-element tuple that is passed in (as the state). The elements are aptly named largest
(for the left element), and second
(for the right element) to allow easy differentiation.
I have then employed guard clauses on lines 6–8 to implement the actual algorithm of determining the next state:
- On line 6, if the current element
x
is larger than the largest element found so far, obviouslyx
is now the largest element and the originallargest
becomessecond
, so return(x, largest)
as the next state - On line 7, else if the current element
x
is larger than the second-largest element found so far, then we keeplargest
but takex
as the second-largest found so far, thus we return(largest, x)
as the next state - On line 8, otherwise,
x
is smaller or equal tolargest
andsecond
, no update of state is necessary, we return the original(largest, second)
tuple
After foldl'
is complete, we now have a tuple containing the actual largest and second-largest elements found in the Foldable
. We can finally move on to the last bit.
The Final Stretch
This part is easy. We have our tuple and we feed it to a function named snd
:
It simply returns the second element of a two-element tuple, i.e. the second-largest element from the Foldable
, completing the algorithm.
Phew, that was a lot of words to explain 7 lines of Haskell code! I hope you like my explanation and you have learnt yourself some Haskell today!