Supercharge Your Testing With Property-Based Tests

Rasmus Feldthaus
Published in Level Up Coding
Jan 5, 2021

Photo by Charlotte Coneybeer on Unsplash

Unit testing is great for the cases you anticipated. Unfortunately, bugs tend to show up in the ones you did not. Property-based testing helps to bridge this gap. Rather than having you handcraft each test input, it bombards your code with lots of carefully crafted randomized inputs.

As a professional software developer, it is almost impossible not to encounter testing at some point in your career. Most of us have learned the value of having tests early on in the development cycle. Traditional unit tests have gained popularity because they are usually quick to write, fast to execute, and provide a fast feedback loop. Many of us spend quite a bit of time discussing various aspects of unit tests, such as when to write them, how many of them to write, the level of code coverage, and so on.

In this article, however, we will look at a different way of writing tests, where one test covers many different inputs rather than a single one. To do this we rely heavily on randomization to generate the inputs we want. For the code examples, we will use the F# language for brevity and leverage the FsCheck library and Xunit for testing. That should not discourage you from reading along, as FsCheck is also very much usable from C#.

Unit Tests

A classical unit test consists of the three A’s: Arrange, Act, Assert. We arrange the components and prepare the inputs. We act by invoking the logic we wish to test with the input, and finally, we check that our assertions about the output hold. To do this, we normally prepare various inputs with their matching outputs and invoke the logic under test for each of them. A collection of such tests is called a test suite. Sometimes, if the setup is identical, we can parameterize the test by attaching multiple combinations of inputs and matching outputs to the same unit test.

Let us examine a simple example where we want to test our own implementation of a function that takes the absolute value of its argument. This function returns any non-negative number unchanged and the positive counterpart of any negative number.
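A minimal sketch of such an implementation together with three Xunit unit tests could look as follows. The function name myAbs, its body, and the chosen test values are assumptions made for illustration, not the original listing.

```fsharp
module AbsUnitTests

open Xunit

// Hypothetical implementation under test (name and body are assumptions):
// negate negative inputs, return everything else unchanged.
let myAbs (x: int) : int = if x < 0 then -x else x

[<Fact>]
let ``abs of a positive number is the number itself`` () =
    Assert.Equal(42, myAbs 42)

[<Fact>]
let ``abs of a negative number is its positive counterpart`` () =
    Assert.Equal(42, myAbs (-42))

[<Fact>]
let ``abs of zero is zero`` () =
    Assert.Equal(0, myAbs 0)
```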

So we have written three test cases: one for a positive number, one for a negative number, and one for the number zero. Code coverage looks great, and we have tests for both negative and positive inputs, so what more is there to be done? The astute reader or experienced developer may think: what about boundary cases? In this case, that is the number -2,147,483,648, the smallest 32-bit signed integer. How would our implementation deal with that, and is this behavior intentional?

The point here is that it can be quite difficult to come up in advance with the inputs that will cause your code to misbehave. On the other hand, manually writing unit tests that cover the entire input range is cumbersome, and such tests take a very long time to execute. If only there were a middle ground that combined the best of extensive testing with the fast execution and simplicity of unit tests. This is the middle ground that property-based tests attempt to cover.

Property-Based Testing

In property-based testing, rather than specifying each input and output, we specify a relationship that should hold for the function being tested. This could be a relationship between the inputs given to the function and its output. It could also be that calling the function twice gives the same result as calling it once (idempotence), or that a second call undoes the result of the first. There are quite a few kinds of properties to choose from. To keep it simple, we will choose the following properties to test our implementation from above.

  • The result of abs x must always be greater than or equal to 0, no matter the value of x. Written mathematically: abs x ≥ 0
  • For any negative number x, adding abs x to it should give 0. Written mathematically: for x < 0: x + abs x = 0

Writing randomized property-based tests that cover these properties is easily done using FsCheck and its Xunit integration.
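The two properties can be sketched as follows, assuming FsCheck 2.x with the FsCheck.Xunit package and a hand-written myAbs function (its name and body are assumptions). The wrapper types match the DoNotShrink (IntWithMinMax ...) value that shows up in the failure report further down.

```fsharp
module AbsPropertyTests

open FsCheck
open FsCheck.Xunit

// Hypothetical implementation under test (an assumption for illustration).
let myAbs (x: int) : int = if x < 0 then -x else x

// IntWithMinMax lets the generator include Int32.MinValue and Int32.MaxValue;
// DoNotShrink switches shrinking off for this input.
[<Property>]
let ``abs x >= 0`` (DoNotShrink (IntWithMinMax x)) =
    myAbs x >= 0

// NegativeInt only produces integers below zero.
[<Property>]
let ``x + abs x = 0 for negative x`` (NegativeInt x) =
    x + myAbs x = 0
```

With this implementation the first property is expected to be falsified once the generator hits the smallest 32-bit integer, which is the point of the exercise.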

The first thing to notice is that rather than writing Fact attributes, we now use Property attributes instead. The second thing to notice is that our test functions now take arguments and no longer have unit as their return type. Instead, they accept the input we expect to pass on to the function being tested, and they return a boolean that says whether the property we want to test holds for that particular input. When FsCheck sees code like this, it will look in its library of generators and start spewing out random input of this type. For each input, it will check if the property holds.

FsCheck uses the concept of an Arbitrary to generate random values. An Arbitrary consists of both a Generator, responsible for generating random input, and a Shrinker, responsible for reducing a failing input to the simplest one that still breaks the property. FsCheck comes with a number of built-in helper types. IntWithMinMax ensures that int.MinValue and int.MaxValue are included, and NegativeInt will return only negative integers. The DoNotShrink type ensures the shrinker does not run.

After running the checks there is a very good probability that you will see the following error or something that resembles it.

Test Name: abs x >= 0
Test Outcome: Failed
Result Message:
FsCheck.Xunit.PropertyFailedException :
Falsifiable, after 67 tests (0 shrinks) (StdGen (167901972, 296833629)):
Original:
DoNotShrink (IntWithMinMax -2147483648)

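What this error tells you is that after generating 67 random inputs (FsCheck generates 100 per property by default), it stumbled upon the input -2,147,483,648, for which the computed absolute value is negative. This is the classic two's complement boundary case: a 32-bit signed integer ranges from -2,147,483,648 to 2,147,483,647, so the positive counterpart of the smallest value cannot be represented, and negating it with unchecked arithmetic wraps around to the same negative number.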

This is exactly the kind of case where spending a bit of extra time on property-based tests gives you insight into inputs you may have overlooked. You will either have to disallow such inputs by changing the preconditions, or accept that your function returns something outside of what you would normally expect. But at least it is no longer hidden behind a veil of ignorance.

Locking Input: Fishing In Troubled Waters

When you are dealing with randomized testing and have a failing property-based test, there is no guarantee that you will see the bad input again the next time you run your test. Luckily, FsCheck will tell you the seed it used for generating the bad input. This is what the StdGen (167901972, 296833629) line in the output means. You can use this information to reproduce the issue. For example, you can set up a classic Xunit test where you force FsCheck to use the original seed for the randomizer. This allows you to replay the bad input over and over until you have found and fixed the error.
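A replay test along these lines could be sketched as follows. It assumes FsCheck 2.x, where the Config record has a Replay field that takes a Random.StdGen value, and it reuses a hypothetical myAbs implementation. The seed values are copied from the failure report above.

```fsharp
module AbsReplay

open Xunit
open FsCheck

// Hypothetical implementation under test (an assumption for illustration).
let myAbs (x: int) : int = if x < 0 then -x else x

[<Fact>]
let ``replay abs x >= 0 with the reported seed`` () =
    // Start the random generator from the seed in the StdGen (...) line of the report.
    let config =
        { Config.QuickThrowOnFailure with
            Replay = Some (FsCheck.Random.StdGen (167901972, 296833629)) }
    // Same property as before, now checked with the fixed seed.
    Check.One (config, fun (DoNotShrink (IntWithMinMax x)) -> myAbs x >= 0)
```

A replayed run only makes sense as long as the FsCheck version and the generators involved are unchanged, so a test like this is best treated as a temporary debugging aid rather than a permanent part of the suite.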

This may initially seem like overkill, since FsCheck already tells you the bad number. However, for much more complicated inputs, manually recreating the value may be a lot more difficult, and that is where this replay mechanism really shines.

Designing Property-Based Tests

When it comes to deriving good property-based tests, it often pays off to think in terms of domain logic. In many domains, there are operations that cancel each other out. Let us assume we run a small exchange with a single order book that has bids and offers in it. To keep matters simple, the matching engine of this exchange is simple to the point of being ridiculous. Bids and offers only match if an incoming order exactly matches the best order present on the other side. Also, order priority is by price only, with no secondary ordering. The code has not been written with efficiency in mind, and there are various issues with correctness, such as the handling of two orders placed with the same id.

We may derive several properties we want to hold for this order book:

  • Canceling an order when the order book is empty results in an empty order book.
  • Placing an order and then canceling it by the same id results in an empty order book.
  • Placing an order in the book and then an equal and opposite order results in an empty order book.
  • Adding an order to an empty order book places it as either the best bid or the best ask in the order book.

Let's first take a look at the sample code. An order is defined as a type that has a side, a quantity, a price, and a unique id. An action represents something that we wish to do to the order book. In this case, we can only place and cancel orders. The order book has various methods to construct an empty order book, obtain the best bid or ask, and to check if it is empty. Finally, it has the perform method that will perform the action given to an order book.
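The following is a minimal sketch of such a domain model and the four properties, not the original listing. The module layout, the field names, the helper Order.opposite, and the matching rule as coded here are all assumptions that follow the description above. It assumes FsCheck 2.x with FsCheck.Xunit.

```fsharp
module OrderBookTests

open FsCheck.Xunit

module Order =
    type Side = Bid | Ask
    type t = { Side: Side; Quantity: int; Price: int; Id: int }

    // Hypothetical helper: same price and quantity, other side, different id.
    let opposite (order: t) : t =
        let otherSide = match order.Side with Bid -> Ask | Ask -> Bid
        { order with Side = otherSide; Id = order.Id + 1 }

module Action =
    type t =
        | Place of Order.t
        | Cancel of orderId: int

module OrderBook =
    // Bids are kept sorted by descending price, asks by ascending price.
    type t = { Bids: Order.t list; Asks: Order.t list }

    let empty : t = { Bids = []; Asks = [] }
    let isEmpty (book: t) = List.isEmpty book.Bids && List.isEmpty book.Asks
    let bestBid (book: t) = List.tryHead book.Bids
    let bestAsk (book: t) = List.tryHead book.Asks

    // An incoming order only trades if it exactly mirrors the best resting order.
    let private mirrors (incoming: Order.t) (resting: Order.t) =
        incoming.Price = resting.Price && incoming.Quantity = resting.Quantity

    let perform (action: Action.t) (book: t) : t =
        match action with
        | Action.Cancel orderId ->
            let keep (o: Order.t) = o.Id <> orderId
            { Bids = List.filter keep book.Bids; Asks = List.filter keep book.Asks }
        | Action.Place order when order.Side = Order.Bid ->
            (match bestAsk book with
             | Some ask when mirrors order ask -> { book with Asks = List.tail book.Asks }
             | _ -> { book with Bids = (order :: book.Bids) |> List.sortByDescending (fun o -> o.Price) })
        | Action.Place order ->
            (match bestBid book with
             | Some bid when mirrors order bid -> { book with Bids = List.tail book.Bids }
             | _ -> { book with Asks = (order :: book.Asks) |> List.sortBy (fun o -> o.Price) })

[<Property>]
let ``canceling in an empty book leaves it empty`` (order: Order.t) =
    OrderBook.empty
    |> OrderBook.perform (Action.Cancel (order.Id))
    |> OrderBook.isEmpty

[<Property>]
let ``placing and canceling the same order leaves the book empty`` (order: Order.t) =
    OrderBook.empty
    |> OrderBook.perform (Action.Place order)
    |> OrderBook.perform (Action.Cancel (order.Id))
    |> OrderBook.isEmpty

[<Property>]
let ``an equal and opposite order empties the book`` (order: Order.t) =
    OrderBook.empty
    |> OrderBook.perform (Action.Place order)
    |> OrderBook.perform (Action.Place (Order.opposite order))
    |> OrderBook.isEmpty

[<Property>]
let ``an order placed in an empty book becomes best bid or ask`` (order: Order.t) =
    let book = OrderBook.empty |> OrderBook.perform (Action.Place order)
    match order.Side with
    | Order.Bid -> OrderBook.bestBid book = Some order
    | Order.Ask -> OrderBook.bestAsk book = Some order
```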

Each of the property-based checks takes an argument of type Order.t, and FsCheck knows how to generate this because it is an F# record type. If you want to generate general objects, such as instances of classes with mutable fields, you will need a little more work. We will get around to this a little later in the article. In each of our tests, we have also attempted to specify the properties in terms of operations on the domain objects for increased readability.

Taming Randomness by Building Arbitraries

In some cases, you may have to specify how to build objects of the type you want. This is especially true if you happen to use value objects that have been defined as classes or structs in C#. This is where the concept of Arbitraries comes into play. Let's assume you have a very simple class representing a Currency object. FsCheck may generate random strings and pass them as arguments to the Currency constructor, but these strings are truly random, such as “‘\\X{}|X46s”, which is hardly a value you would expect to go into a Currency. However, you can make your own Arbitrary and have FsCheck use that.
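A custom Arbitrary for such a class might be sketched as follows, assuming FsCheck 2.x. The Currency class, the list of codes, and the names CurrencyArb and Currencies are assumptions made for illustration.

```fsharp
module CurrencyTests

open FsCheck
open FsCheck.Xunit

// Hypothetical value object; the codes below are illustrative.
type Currency(code: string) =
    member this.Code = code

let validCodes = [| "USD"; "EUR"; "GBP"; "DKK"; "JPY" |]

type CurrencyArb =
    static member Currencies () : Arbitrary<Currency> =
        // Take the generator out of the built-in Arbitrary for positive integers ...
        Arb.Default.PositiveInt().Generator
        // ... fold each integer into a valid index and build a Currency from that entry.
        |> Gen.map (fun (PositiveInt i) -> Currency(validCodes.[i % validCodes.Length]))
        // Wrap the generator in an Arbitrary that has no shrinker.
        |> Arb.fromGen

// The Arbitrary property tells FsCheck where to look for custom Arbitraries.
[<Property(Arbitrary = [| typeof<CurrencyArb> |])>]
let ``only valid currencies are generated`` (currency: Currency) =
    Array.contains currency.Code validCodes
```

For picking from a fixed set of values, Gen.elements would be a shorter route. The index-based version is shown here because it mirrors the steps described in the text.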

In order to use your own Arbitraries, you have to set the Arbitrary property of the Property attribute to an array of types that provide them. These are classes that have static methods with the signature unit -> Arbitrary<'a>. The property-based test will then know where to look when it encounters one of those types. The second step is defining the actual Arbitrary.

As mentioned earlier, Arbitrary objects consist of two different parts: a generator and a shrinker. Here we want to use the generator from an existing Arbitrary, so we extract that and work our magic on it. To do that we leverage two facts:

  1. We know a finite number of valid values to sample from.
  2. FsCheck already knows how to generate random positive integers.

We first define an array of valid currencies. Then we constrain the random integer so that it always falls on a valid index of that array. We pick the value at that index and pass it to our constructor, which yields a generator that gives us random Currency objects drawn from our range of valid currencies. Finally, we turn this generator into an Arbitrary with no shrinker attached.

The Gen module also has lots of other features, such as filtering values, selecting many elements from the generator provided, and so on. But one of the most important things to realize about generators is that they compose nicely. This is a very powerful concept: if you or somebody else has already written Arbitraries for the components of your new class, you can build an Arbitrary for the class itself with ease. For example, you may want to build random Amount objects that have only positive amounts and valid currencies.
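Composing generators in this way could look like the sketch below, assuming FsCheck 2.x. The Currency and Amount classes and the member names are assumptions, and the currency Arbitrary is a compact stand-in built with Gen.elements so that the example is self-contained.

```fsharp
module AmountTests

open FsCheck
open FsCheck.Xunit

// Hypothetical value objects.
type Currency(code: string) =
    member this.Code = code

type Amount(value: decimal, currency: Currency) =
    member this.Value = value
    member this.Currency = currency

// Compact stand-in for a currency Arbitrary: picks one of a few valid codes.
type CurrencyArb =
    static member Currencies () : Arbitrary<Currency> =
        Gen.elements [ "USD"; "EUR"; "GBP" ]
        |> Gen.map (fun code -> Currency code)
        |> Arb.fromGen

type AmountArb =
    static member Amounts () : Arbitrary<Amount> =
        // Keep only strictly positive decimals.
        let positiveValues = Arb.generate<decimal> |> Gen.filter (fun value -> value > 0m)
        // Reuse the generator of the existing currency Arbitrary.
        let currencies = CurrencyArb.Currencies().Generator
        // Combine both generators by feeding their outputs to the Amount constructor.
        Gen.map2 (fun value currency -> Amount(value, currency)) positiveValues currencies
        |> Arb.fromGen

[<Property(Arbitrary = [| typeof<AmountArb> |])>]
let ``generated amounts are positive`` (amount: Amount) =
    amount.Value > 0m
```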

Notice how the property-based test only relies on the knowledge of the existence of an AmountArb type. Also, notice that AmountArb and CurrencyArb are two different types. You do not need to have all your Arbitraries as methods in the same class. When building random amounts, we use the filtering operator on the generator for the decimal to ensure we only get positive random values, and we get an instance of the currency Arbitrary and extract the generator from it. Finally, we combine the two generators, by using the Gen.map2 method. This takes a function that knows how to combine the output from the two generators, and all we have to do is to pass the arguments along to our Amount constructor. This gives us an Arbitrary that knows how to generate Amount objects.

Property-Based Acceptance Tests

Another place where property-based tests may come in handy is when you need a sort of acceptance test for some new code you are writing. Assume you have written a very simple piece of code that you know is correct, but that unfortunately is not quite fast enough for production. So you decide to try your luck writing a faster version of your algorithm, which may be slightly more involved or complicated. In this case, you can write a property-based test where the property that must hold is that, for any input, the result of the slow and simple algorithm matches the result of the faster and more convoluted one. Then you crank up the number of randomized inputs until it is high enough for you to proceed with confidence. If everything passes, you have good reason to believe that your faster algorithm behaves the same as its slower counterpart.
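As a sketch of this idea, the test below compares a deliberately simple insertion sort with the built-in List.sort, which stands in for the faster algorithm. Both function names are assumptions for illustration. MaxTest is the Property attribute setting in FsCheck.Xunit 2.x that raises the number of generated inputs.

```fsharp
module SortAcceptanceTests

open FsCheck.Xunit

// Slow but obviously correct reference: insert each element into an already sorted list.
let rec insertSorted (x: int) (sorted: int list) : int list =
    match sorted with
    | [] -> [ x ]
    | y :: _ when x <= y -> x :: sorted
    | y :: rest -> y :: insertSorted x rest

let slowSort (xs: int list) : int list =
    List.fold (fun acc x -> insertSorted x acc) [] xs

// Stand-in for the faster, more involved implementation.
let fastSort (xs: int list) : int list = List.sort xs

// Run 1000 random lists instead of the default 100.
[<Property(MaxTest = 1000)>]
let ``fast sort agrees with the slow reference`` (xs: int list) =
    fastSort xs = slowSort xs
```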

Now assume you have a black box in your code somewhere that you need to write a replacement for. This could either be an endpoint to a service somewhere, or a DLL that is linked into your program to which the source code has been lost, etc. Here the property you want to test is that for random inputs your implementation yields the same as the unknown reference implementation. If the service is external you may very well want to ensure that you are not DDoS’ing it by remote calls every time your tests are run.

Other Considerations When Testing With Properties

Property-based tests are useful but no silver bullet. They may be slightly harder to read for other developers maintaining the code base who are not used to property-based testing. When you implement your own Arbitraries and limit the values the randomizer may produce, you risk leaving out important cases by being too strict, just as when writing unit tests.

Property-based tests also run slower than unit tests, since by default each of them generates and checks 100 random inputs. If you have lots of property-based tests, or you crank up the number of random inputs for some of them, your experience with concurrent test tools like NCrunch may degrade.

Since you are dealing with randomness, it is also very unlikely that two test runs are completely identical. You may have one run passing and then one run failing. If the second run identifies an actual issue, then all is good. You should write a specific unit test for that case, and then fix the issue. However, if the failure is caused by one of your Arbitraries generating values that fall far away from the expected input, it may be more difficult to identify and fix the issue. This is because you first have to identify the Arbitrary yielding the unexpected values, and then wrap it in a filtering mechanism. While this is indeed possible, it is still more work than simply changing a constant value defined in a unit test.

Conclusion

Property-based testing is a very useful yet often overlooked tool. In my professional experience, I have seen quite a few times that introducing property-based tests on problematic parts of the code base very quickly revealed issues, some of them subtle, and got that part of the code base promoted from one of the usual suspects to one of the last places to look. However, it does take some getting used to, and I hope this article has helped spark your interest and overcome the initial hurdle of getting started with property-based testing. For more information, the FsCheck documentation is very much worth a look. This article has primarily used code examples in F#, but FsCheck is very much usable from C# as well. Feel free to let me know if you would like to see an article with a focus on FsCheck usage in C#.

Software developer, with a background working in the financial industry. Writes about software development, and other stuff I find interesting.