I was working with a toy example today and thought it would be fun to write a comparison on how to validate user input in object-oriented programming (using C♯) and functional programming (using F♯).
Let’s dive right into it.
The problem domain
Given a simple JSON configuration file with three fields:
{
"Url": "",
"Name": "",
"Age": 42
}
How can we validate the config so that consumers can use it without validating themselves?
To scope down the problem to fit a blog post, we will make a few assumptions:
- The JSON configuration file always exists
- The JSON configuration file always contains these three fields only
- The source code of constructing a Config model is not given to you, i.e. you must rely on the compiler and the API exposed to you
The only validation we want to perform is on the values of those three fields. Specifically, we want to enforce the following:
- Url and Name must not be null or empty
- Url must contain a valid HTTP/HTTPS URL
- Age must be greater than 18 (imagine we are building an application for adults :))
First up, let’s see how we can do so in OOP.
OOP: Good old exceptions
If you are familiar with any kind of mainstream OOP language, one obvious choice might be using exceptions to halt the application due to the invalid JSON configuration.
This should be the bread and butter of every OO programmer and would not need much introduction. There are a few problems with this approach though:
- No errors caught at compile time. Callers of the constructor invoking new Config(null, null, 0) will have no way of knowing this might result in an exception before runtime. (This of course does not hold if you have access to the source code of Config.)
- Errors do not aggregate. If the caller calls the constructor using new Config("http://www.google.com", null, 10) and gets an ArgumentNullException (because name equals null), fixing the name and calling the constructor a second time will cause another exception (due to age < 18). You will have to fix the age parameter and run the code a third time to get a valid Config instance constructed. Why can’t I get all the errors during my first call so that I can fix them in one go?
OOP: Error aggregation in constructor
A second OOP approach would be to have some sort of error reporting when constructing the Config model:
This might not look like the typical OOP code you see, but it does give you the benefit that calling new Config(null, null, 0) will always succeed. Examining the Errors property also gives you all errors aggregated so that you can fix the problematic parameters in one go. However, this also poses some problems:
- Relying on the caller to examine the IsValid or Errors property after the constructor finishes. Since the constructor now always returns a Config instance without exceptions, callers must examine the IsValid or Errors property to check for validity. This is extremely error-prone.
- Temporal coupling between the Errors property and other properties of the Config model itself. Consumers of the Config model must always examine IsValid before calling config.Name or config.Url because one or both of them may return null, leading to unclear trust boundaries and error checking spread all over the system.
OOP: Return null config model
The last OOP validation I can think of is to return null when trying to construct an invalid Config instance. Since constructors cannot have return statements, we must use a factory method:
The constructor is marked as private to deliberately force callers to use the factory method instead. I personally think this is the worst OOP approach:
- Error information lost due to using one single value (null) to represent failure. Callers will not know which parameter causes the failure and will have to guess.
- Still no help from the compiler if null is returned from the factory method. A NullReferenceException will still occur at runtime if you forget to do a null check after calling the factory method.
FP: A config model, maybe?
One advantage of using a functional language like F♯ is that you get very expressive generic types built into the core library itself. One of these types is called Option, which is how F♯ represents the ‘absence’ of data.
This is especially important because unlike C♯, F♯ does not allow null by default (except when interoperating with other .NET languages). The only way to represent missing data is to use this wrapper type Option.
Since I assume my readers are unfamiliar with FP, I have redefined the Option type here for your reference. But the focus here is the makeConfig function: it returns an Option<Config> instead of a plain Config.
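The original listing is not reproduced here, so what follows is only a minimal sketch of what such a makeConfig could look like. The Config record, the isHttpUrl helper, and the exact rule order are assumptions made for illustration; the age rule follows the "greater than 18" requirement stated earlier.

```fsharp
open System

// Hypothetical Config record; field names follow the JSON file above.
type Config = { Url: string; Name: string; Age: int }

// For reference, the built-in Option type is roughly:
//   type Option<'T> = Some of 'T | None

// Assumed helper: true when the text parses as an absolute http/https URL.
let isHttpUrl (url: string) =
    match Uri.TryCreate(url, UriKind.Absolute) with
    | true, uri -> uri.Scheme = Uri.UriSchemeHttp || uri.Scheme = Uri.UriSchemeHttps
    | _ -> false

// Returns Some config only when every rule holds, otherwise None.
let makeConfig (url: string) (name: string) (age: int) : Config option =
    if String.IsNullOrEmpty url || String.IsNullOrEmpty name then None
    elif not (isHttpUrl url) then None
    elif age <= 18 then None
    else Some { Url = url; Name = name; Age = age }
```

The return type Config option is another spelling of Option<Config>, so the possible failure is visible in the function signature itself.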
The difference now is that you can no longer access the members of Config directly after calling makeConfig:
Contrast the above with the following where the Config model is constructed directly without calling the makeConfig function:
You can see that IntelliSense gives different results because the first and second pieces of code return different types: Option<Config> from the first and Config from the second.
Why is it a good thing? Because now the compiler is able to catch the error for us! In order to consume the Option<Config> returned, we now have to perform an additional step to extract the underlying config out of the Option type. This forces the caller to be aware of the possible failure!
The caller has to perform ‘pattern matching’ in order to extract the underlying config from the return value. Trying to read Name, Age, or Url straight off the Option<Config> value will not compile.
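As a rough illustration of that extra step, here is a small sketch. The makeConfig below is a deliberately simplified stand-in that only checks the age rule, and describe is a hypothetical caller; neither comes from the original listing.

```fsharp
// Hypothetical Config record; field names follow the JSON file above.
type Config = { Url: string; Name: string; Age: int }

// Simplified stand-in for makeConfig: only the age rule is checked here.
let makeConfig (url: string) (name: string) (age: int) : Config option =
    if age > 18 then Some { Url = url; Name = name; Age = age } else None

// Hypothetical caller: the Config fields are only reachable inside the Some branch.
let describe (result: Config option) : string =
    match result with
    | Some config -> sprintf "Hello, %s" config.Name
    | None -> "Invalid configuration"
```

Writing result.Name instead of the match is rejected by the compiler, because Config option has no Name member.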
However, the Option type has its own limits when it comes to validation:
- We have lost all error information from the caller’s perspective. We do not know what causes the makeConfig function to return None.
- Errors do not aggregate either. We do not know whether one, two, or more errors occurred.
FP seems to be worse than OOP shown above? What gives?
Do not worry, we will explore stronger types in the FP techniques that follow: the Result type and ‘a better Result type’.
FP: A better option: the Result type
Apart from Option, F♯ offers another, stronger type called Result.
Don’t worry if the type definition looks confusing. Just focus on the makeConfig function; the point here is to demonstrate how FP handles validation, not to dive deep into the intricate details of monads.
Now things get a bit interesting. We get a major improvement over using Option: the errors are back! And they specifically pinpoint what went wrong with the config, as shown in a simple interactive session playing with the code:
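The original listing and session are not reproduced here, so the following is a minimal sketch of a Result-based makeConfig. The Config record, the isHttpUrl helper, the rule order, and the wording of the error messages are all assumptions made for illustration.

```fsharp
open System

// Hypothetical Config record; field names follow the JSON file above.
type Config = { Url: string; Name: string; Age: int }

// For reference, the built-in Result type is roughly:
//   type Result<'T, 'TError> = Ok of 'T | Error of 'TError

// Assumed helper: true when the text parses as an absolute http/https URL.
let isHttpUrl (url: string) =
    match Uri.TryCreate(url, UriKind.Absolute) with
    | true, uri -> uri.Scheme = Uri.UriSchemeHttp || uri.Scheme = Uri.UriSchemeHttps
    | _ -> false

// Stops at the first failed rule and reports it as an Error message.
let makeConfig (url: string) (name: string) (age: int) : Result<Config, string> =
    if String.IsNullOrEmpty url then Error "Url must not be null or empty"
    elif not (isHttpUrl url) then Error "Url must be a valid HTTP/HTTPS URL"
    elif String.IsNullOrEmpty name then Error "Name must not be null or empty"
    elif age <= 18 then Error "Age must be greater than 18"
    else Ok { Url = url; Name = name; Age = age }
```

With this sketch, makeConfig "http://www.google.com" null 10 evaluates to Error "Name must not be null or empty". The age problem stays hidden until the name is fixed, because the checks stop at the first failure.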
We get all the benefits of using a wrapper type like Option, and changing Option to Result lets us clearly show what went wrong with the input parameters of makeConfig.
However, we still cannot aggregate errors into a single place. Can we do better? Yes!
FP: Monoidal validation
Let’s first take a step back and split the validation into different helper, sub-validation functions:
We now have three separate functions, one for each field in the Config model: validateName, validateAge, and validateUrl. Nothing too special, but it will be easier to understand after we make the next change.
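A minimal sketch of that split, under stated assumptions: the error messages are invented, and chaining the validators with Result.bind is one possible way to wire them together, not necessarily how the original makeConfig did it.

```fsharp
open System

// Hypothetical Config record; field names follow the JSON file above.
type Config = { Url: string; Name: string; Age: int }

let validateUrl (url: string) : Result<string, string> =
    if String.IsNullOrEmpty url then
        Error "Url must not be null or empty"
    else
        match Uri.TryCreate(url, UriKind.Absolute) with
        | true, uri when uri.Scheme = Uri.UriSchemeHttp || uri.Scheme = Uri.UriSchemeHttps -> Ok url
        | _ -> Error "Url must be a valid HTTP/HTTPS URL"

let validateName (name: string) : Result<string, string> =
    if String.IsNullOrEmpty name then Error "Name must not be null or empty" else Ok name

let validateAge (age: int) : Result<int, string> =
    if age > 18 then Ok age else Error "Age must be greater than 18"

// Chains the three checks; the first Error short-circuits the rest.
let makeConfig url name age : Result<Config, string> =
    validateUrl url
    |> Result.bind (fun u ->
        validateName name
        |> Result.bind (fun n ->
            validateAge age
            |> Result.map (fun a -> { Url = u; Name = n; Age = a })))
```

Each validator can now be tested on its own, but the combined function still reports only one error at a time.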
We will now try to solve the problem where errors are not aggregated properly when there is more than one. How? By storing the errors in a list, of course!
Notice the sub-validation functions now all return a Result<T, string list> instead of a Result<T, string>.
This allows us to concatenate, and thus aggregate, all the errors that occur in the helper functions. We can then modify the makeConfig function to pattern-match on all possible cases and return the aggregated errors to the caller.
The pattern-matching looks so intimidating because we have to cater for every possible combination of where the errors occur, i.e. whether the errors arise from url and/or name and/or age.
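The list-based validators and the exhaustive match described above might look roughly like this. It is a sketch only: the error messages and the order in which the eight cases are written are assumptions.

```fsharp
open System

// Hypothetical Config record; field names follow the JSON file above.
type Config = { Url: string; Name: string; Age: int }

// Each validator wraps its message in a list so failures can be concatenated.
let validateUrl (url: string) : Result<string, string list> =
    if String.IsNullOrEmpty url then
        Error [ "Url must not be null or empty" ]
    else
        match Uri.TryCreate(url, UriKind.Absolute) with
        | true, uri when uri.Scheme = Uri.UriSchemeHttp || uri.Scheme = Uri.UriSchemeHttps -> Ok url
        | _ -> Error [ "Url must be a valid HTTP/HTTPS URL" ]

let validateName (name: string) : Result<string, string list> =
    if String.IsNullOrEmpty name then Error [ "Name must not be null or empty" ] else Ok name

let validateAge (age: int) : Result<int, string list> =
    if age > 18 then Ok age else Error [ "Age must be greater than 18" ]

// All eight Ok/Error combinations are spelled out; error lists are joined with @.
let makeConfig url name age : Result<Config, string list> =
    match validateUrl url, validateName name, validateAge age with
    | Ok u, Ok n, Ok a -> Ok { Url = u; Name = n; Age = a }
    | Error e, Ok _, Ok _ -> Error e
    | Ok _, Error e, Ok _ -> Error e
    | Ok _, Ok _, Error e -> Error e
    | Error e1, Error e2, Ok _ -> Error (e1 @ e2)
    | Error e1, Ok _, Error e2 -> Error (e1 @ e2)
    | Ok _, Error e1, Error e2 -> Error (e1 @ e2)
    | Error e1, Error e2, Error e3 -> Error (e1 @ e2 @ e3)
```

In this sketch, makeConfig null null 0 returns an Error carrying three messages, one per field, in the order url, name, age.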
And finally, we got what we wanted: a single call to makeConfig now reports every validation error at once.
To recap what is achieved:
- Type-safe error checking. Callers of makeConfig cannot ignore the possible validation failure because the compiler prevents the caller from directly accessing the Name, Age, and Url fields.
- Meaningful and domain-specific error messages. Since validation functions are small and target specific fields, we can build very specific error messages that guide the caller to what is considered valid input.
- Aggregated error reporting. Callers of makeConfig get all the aggregated validation errors in one go. Subsequent errors are just ‘appended’ to the end of the original error list.
Some final words
All the FP techniques I demonstrated here seem to jump through multiple hoops to get some very simple validation done. Is it worth it?
In my humble opinion, yes, it is, because we do not need to write most of this boilerplate pattern-matching code by hand. The pattern has already been captured by a concept called applicative functors. If that sounds like a scary term, it is. Fortunately, for most software engineers this is already a solved problem, and we can stand on the shoulders of giants and reuse well-known solutions:
Most of the heavy pattern-matching noise is gone by using two additional custom operators, <!> and <*>. These two operators are well-known in the FP world as fmap and apply respectively. They come from the concept of applicative functors mentioned above.
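One possible way to define the two operators for Result with a list of errors is sketched below. The operator bodies, the createConfig helper, and the error messages are assumptions for illustration; libraries that ship these operators may define them differently.

```fsharp
open System

// Hypothetical Config record; field names follow the JSON file above.
type Config = { Url: string; Name: string; Age: int }

let validateUrl (url: string) : Result<string, string list> =
    if String.IsNullOrEmpty url then
        Error [ "Url must not be null or empty" ]
    else
        match Uri.TryCreate(url, UriKind.Absolute) with
        | true, uri when uri.Scheme = Uri.UriSchemeHttp || uri.Scheme = Uri.UriSchemeHttps -> Ok url
        | _ -> Error [ "Url must be a valid HTTP/HTTPS URL" ]

let validateName (name: string) : Result<string, string list> =
    if String.IsNullOrEmpty name then Error [ "Name must not be null or empty" ] else Ok name

let validateAge (age: int) : Result<int, string list> =
    if age > 18 then Ok age else Error [ "Age must be greater than 18" ]

// fmap: apply a plain function to the value inside a successful Result.
let (<!>) f result = Result.map f result

// apply: combine a wrapped function with a wrapped argument,
// concatenating the error lists when both sides have failed.
let (<*>) fResult xResult =
    match fResult, xResult with
    | Ok f, Ok x -> Ok (f x)
    | Error e, Ok _ -> Error e
    | Ok _, Error e -> Error e
    | Error e1, Error e2 -> Error (e1 @ e2)

// Hypothetical curried constructor that takes already-validated values.
let createConfig url name age = { Url = url; Name = name; Age = age }

// A plain call would read: createConfig url name age
// The wrapped call keeps the same shape, with the operators in between.
let makeConfig url name age : Result<Config, string list> =
    createConfig <!> validateUrl url <*> validateName name <*> validateAge age
```

Both operators are left-associative with the same precedence, so the expression is evaluated from left to right and errors accumulate in url, name, age order. The four-case match inside <*> replaces the eight-case match that makeConfig needed before.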
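Compare how the same function is called in F♯ with ‘normal’ parameters and with parameters wrapped in Result: a plain call f x y z becomes f <!> rx <*> ry <*> rz, where f, x, y, z and the wrapped rx, ry, rz are placeholder names.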
You can see that having these operators makes the transition extremely clean. In fact, this pattern is so common in FP that Haskell’s Applicative typeclass provides the same pair of operations (written <$> and <*>) for every instance.
I hope this article is useful to those who want to explore the FP paradigm through a meaningful, worked example.