To take a simple example, look at the following code snippet from an implementation of Pac-Man. Intended to run on a single machine, it doesn’t send any messages over any network.
Engineers working on hard real-time distributed systems must test for all aspects of network failure because the servers and the network do not share fate.
Imagine trying to write tests for all the failure modes a client/server system such as the Pac-Man example could run into!
Let’s say an engineer came up with 10 scenarios to test in the single-machine version of Pac-Man. But, in the distributed systems version, they have to test each of those scenarios 20 times.
For each of those tests, you need to simulate what happens if the client received any of the four failure types (POST_FAILED, RETRYABLE, FATAL, and UNKNOWN) and then calls the server again with an invalid request.