It ain’t just reds and greens: Automated Acceptance Testing and quaternary test outcomes
Although they seem simple enough on the surface, test outcomes are actually quite complicated beasts. Traditional unit tests, and basic TDD tests, have just two states, failing or passing, represented by red and green in the famous “RED-GREEN-REFACTOR” dictum. In Behaviour Driven Development (BDD), on the other hand, we have the additional concept of ‘pending’ tests: tests that have been specified (for example, in a Cucumber or JBehave story) but not yet implemented. When we report on test results, we need to be able to distinguish these three states, as a pending test has very different semantics to a failing test. Pending means it’s not done yet, but this may well be as expected, especially towards the start of a sprint. A failing test, on the other hand, needs fixing. Now.
Most BDD tools, such as Cucumber, JBehave, Concordion, easyb and so forth, report test results in terms of these three states. However, the complexity doesn’t stop here. Maintaining web tests, for example, requires ongoing effort, and can distort the test reporting if not handled with care. For example, if a web page changes during normal development or refactoring work, the tests that use this page may break. Although good software engineering practices such as the use of Page Objects can reduce the risk of this quite a bit, and reduce the work involved in maintaining the tests when it does happen, it is still something that will happen regularly. And again, the semantics of a broken test are quite different to those of a failing test. A broken test needs maintenance work on the test suite. It may also mask an application error, but you will need to investigate to find out. A failing test means that the application is broken, and therefore needs urgent fixing.
In an attempt to address this limitation in conventional BDD reporting, Thucydides now distinguishes test failures (triggered by an assertion error) from test errors (triggered by any other exception). When you run your automated acceptance tests using Thucydides, any exception that is an AssertionError (or a subclass of AssertionError) will be considered a test failure. Anything else (such as a NotFoundException, when an element is not found on the page) is considered to be an error, and therefore indicative of a broken test.
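The rule described above can be sketched in a few lines of Java. This is only an illustration of the idea, not the actual Thucydides implementation: the names TestOutcome and OutcomeClassifier are hypothetical, and the sketch assumes that a step which throws nothing has passed. Pending is included for completeness, but it comes from a story step having no implementation rather than from an exception, so the classifier never returns it.

```java
// Hypothetical names for illustration only; these are not Thucydides classes.
enum TestOutcome { SUCCESS, PENDING, FAILURE, ERROR }

final class OutcomeClassifier {

    private OutcomeClassifier() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Maps whatever a test step threw (or null if it threw nothing)
     * to one of the reported outcomes.
     */
    static TestOutcome classify(Throwable thrown) {
        if (thrown == null) {
            // Assumption: no exception means the step passed.
            return TestOutcome.SUCCESS;
        }
        if (thrown instanceof AssertionError) {
            // AssertionError or any subclass: the application did not
            // behave as specified, so this is a genuine test failure.
            return TestOutcome.FAILURE;
        }
        // Any other exception (for example, an element missing from the
        // page) suggests the test itself is broken and needs maintenance.
        return TestOutcome.ERROR;
    }
}
```

With this sketch, OutcomeClassifier.classify(new AssertionError("expected 3 items")) yields FAILURE, while OutcomeClassifier.classify(new IllegalStateException("element not found")) yields ERROR. The instanceof check is what makes subclasses of AssertionError count as failures too.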
In the future we may extend Thucydides further to make this concept more configurable: for example, so that users can specify which exceptions should be considered errors and which test failures, or even add further outcome states (e.g. infrastructure failure, database not set up, etc.).