Challenges with distributed systems

  • To take a simple example, consider code from an implementation of Pac-Man that is intended to run on a single machine. Because it runs on one machine, it doesn't send any messages over any network.
  • Engineers working on hard real-time distributed systems must test for all aspects of network failure, because the servers and the network do not share fate: either one can fail while the other keeps working.
  • Imagine trying to write tests for every failure mode that a client/server system, such as a networked version of the Pac-Man example, could run into!
  • Let’s say an engineer came up with 10 scenarios to test in the single-machine version of Pac-Man. In the distributed version, they have to test each of those scenarios 20 times, so the test matrix grows from 10 tests to 200.
  • For each of those tests, you need to simulate what happens if the client receives any of the four failure types (POST_FAILED, RETRYABLE, FATAL, and UNKNOWN) and then calls the server again with an invalid request.
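A minimal sketch of why the networked Pac-Man needs more tests than the single-machine one. The `Board`, `RemoteBoard`, and `RemoteError` names, the one-dimensional positions, and the way failures are injected are all hypothetical; only the four failure-type names come from the text. Locally, a step can only end with Pac-Man caught or alive. Once the board lives on a server, each call can also end in any of the four failures, and every one of those outcomes is something a test has to cover.

```python
from __future__ import annotations

from enum import Enum


class Failure(Enum):
    """The four failure types named in the text."""
    POST_FAILED = "POST_FAILED"
    RETRYABLE = "RETRYABLE"
    FATAL = "FATAL"
    UNKNOWN = "UNKNOWN"


class RemoteError(Exception):
    """Hypothetical error raised when a remote board call fails."""
    def __init__(self, failure: Failure) -> None:
        super().__init__(failure.value)
        self.failure = failure


class Board:
    """Hypothetical single-machine board: every call runs in-process."""
    def __init__(self) -> None:
        self.pos = {"pacman": 0, "ghost": 1}

    def move(self, piece: str, step: int) -> None:
        self.pos[piece] += step

    def overlaps(self, a: str, b: str) -> bool:
        return self.pos[a] == self.pos[b]


class RemoteBoard:
    """Hypothetical networked board: the state lives on a 'server' Board.
    A failure is injected here instead of coming from a real network."""
    def __init__(self, server: Board, fail_with: Failure | None = None) -> None:
        self.server = server
        self.fail_with = fail_with

    def _call(self, method: str, *args):
        if self.fail_with is not None:
            raise RemoteError(self.fail_with)
        return getattr(self.server, method)(*args)

    def move(self, piece: str, step: int) -> None:
        return self._call("move", piece, step)

    def overlaps(self, a: str, b: str) -> bool:
        return self._call("overlaps", a, b)


def local_step(board: Board) -> str:
    # Single machine: the only outcomes are "caught" or "alive".
    board.move("pacman", 1)
    return "caught" if board.overlaps("pacman", "ghost") else "alive"


def remote_step(board: RemoteBoard) -> str:
    # Networked: every call can also fail, adding four more outcomes to test.
    try:
        board.move("pacman", 1)
        return "caught" if board.overlaps("pacman", "ghost") else "alive"
    except RemoteError as err:
        return f"failed:{err.failure.value}"


print(local_step(Board()))                                        # caught
print(remote_step(RemoteBoard(Board(), fail_with=Failure.FATAL)))  # failed:FATAL
```

Injecting the failure in a client-side wrapper, rather than waiting for a real network to misbehave, is one way to make each outcome reproducible in a test.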
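The multiplication in the last two bullets can be sketched by enumerating the test matrix. This is a hypothetical model: the scenario names, the `distributed_tests` and `failure_matrix` helpers, and the treatment of the 20 repetitions as interchangeable numbered runs are assumptions. Only the counts (10 scenarios, 20 repetitions, four failure types) come from the text, and "each of those tests" is read here as each of the 200 distributed tests.

```python
from enum import Enum
from itertools import product


class Failure(Enum):
    """The four failure types named in the text."""
    POST_FAILED = "POST_FAILED"
    RETRYABLE = "RETRYABLE"
    FATAL = "FATAL"
    UNKNOWN = "UNKNOWN"


def distributed_tests(scenarios, repeats):
    """Each single-machine scenario is tested `repeats` times in the distributed version."""
    return list(product(scenarios, range(1, repeats + 1)))


def failure_matrix(tests, failures=tuple(Failure)):
    """Cross every test with every failure type; each case ends with the
    client calling the server again with an invalid request."""
    return [
        (scenario, run, failure, "call_again_with_invalid_request")
        for (scenario, run), failure in product(tests, failures)
    ]


# Hypothetical scenario names; the text gives only the count.
scenarios = [f"scenario_{i}" for i in range(1, 11)]
tests = distributed_tests(scenarios, repeats=20)
matrix = failure_matrix(tests)

print(len(scenarios))  # 10
print(len(tests))      # 200
print(len(matrix))     # 800
```

Each independent dimension multiplies the matrix rather than adding to it, which is why the count climbs from 10 to 800 under this reading.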

