I run into the same problem quite often: people have a hard time distinguishing between good and bad tests.
Bad tests are the ones that don’t teach the tester as much as possible with each test step. Said another way: bad tests are repetitive.
Repetitive tests are time wasters. They make testing a mundane task to be completed. Repetitive tests don’t emphasize the complexity inherent in most systems today. And they let things (read: bugs, defects, faults) slip through into production.
Question: How bad are bad tests?
Research: (not much out there)
Hypothesis: Bad tests are deceptively bad.
Experiment:
I took a set of tests that a group of testers had all agreed covered the functionality that needed to be covered for the story they were testing.
I sat down and analyzed them. They were as repetitive as any I’d seen. I sifted through them to pull out the main testing ideas, which I would use as parameters and values, and entered those into Hexawise.
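To make "parameters and values" concrete, here is a minimal sketch in Python (with made-up parameter names, not the actual parameters from their story) of how a handful of testing ideas becomes a model, and how many pairwise interactions that model contains.

```python
from itertools import combinations

# Hypothetical parameters and values; the real model came from the testers' story.
parameters = {
    "Browser": ["Chrome", "Firefox", "Safari"],
    "User Type": ["Guest", "Registered", "Admin"],
    "Payment": ["Credit Card", "PayPal"],
    "Shipping": ["Standard", "Express"],
}

# Every pairwise interaction is one value from one parameter combined with
# one value from a different parameter.
total_pairs = sum(
    len(values_a) * len(values_b)
    for (_, values_a), (_, values_b) in combinations(parameters.items(), 2)
)
print(f"Total possible pairwise interactions: {total_pairs}")
```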
I used our Requirements feature to lock in their repetitive tests, so Hexawise would be forced to use their tests first (this would let me measure how many pairwise interactions each of their tests covered).
This is the chart Hexawise produced. In 30 tests, they covered 47% of the total possible pairwise interactions.
Before we go on: Do you notice that there are little plateaus? From Test 5 to 6, 7 to 8, 14 to 15, 20 to 21, 22 to 23, 26 to 27, and 29 to 30? That means those were tests that did not include ANY new pairs. No new information could have been learned from those tests. Learning literally plateaued. 7 times. Nearly 1 out of 4 tests didn’t teach the testers anything.
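You can measure the same thing yourself. Here is a rough sketch (again with hypothetical parameters and tests, not their actual ones) that walks a test suite in order, counts the new pairs each test contributes, and flags the plateaus where a test adds nothing.

```python
from itertools import combinations

# Hypothetical model and tests; swap in your own parameters and test cases.
parameters = {
    "Browser": ["Chrome", "Firefox", "Safari"],
    "User Type": ["Guest", "Registered"],
    "Payment": ["Credit Card", "PayPal"],
}

tests = [
    {"Browser": "Chrome", "User Type": "Guest", "Payment": "Credit Card"},
    {"Browser": "Chrome", "User Type": "Guest", "Payment": "PayPal"},
    {"Browser": "Firefox", "User Type": "Registered", "Payment": "Credit Card"},
    {"Browser": "Chrome", "User Type": "Guest", "Payment": "Credit Card"},  # repeat: a plateau
]

def pairs_in(test):
    """All pairwise interactions covered by a single test."""
    return {
        ((p1, test[p1]), (p2, test[p2]))
        for p1, p2 in combinations(sorted(test), 2)
    }

# Total possible pairwise interactions for this model.
total = sum(
    len(parameters[a]) * len(parameters[b])
    for a, b in combinations(parameters, 2)
)

covered = set()
for i, test in enumerate(tests, start=1):
    new = pairs_in(test) - covered
    covered |= new
    flag = "  <-- plateau (no new pairs)" if not new else ""
    print(f"Test {i}: +{len(new)} new pairs, "
          f"{len(covered) / total:.0%} cumulative coverage{flag}")
```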
Then I unleashed the Kraken: Hexawise.
I removed those Requirements to see how many tests Hexawise needed to cover all of the pairwise interactions in this specific functionality.
Okay, to be honest, I wanted Hexawise to do it in something like 20 tests. (More Coverage. Fewer Tests.) But it used 30 (More Coverage). BUT (and this is a big BUT, snickers) Hexawise covered 100% of the pairwise interactions in those 30 tests.
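For the curious: covering every pair with few tests is a classic combinatorial problem. Hexawise's actual algorithm isn't shown here, but a simple greedy strategy (a sketch, not what Hexawise does, and only practical for small models since it enumerates every candidate combination) gives the flavor: keep picking the candidate test that covers the most still-uncovered pairs until nothing is left.

```python
from itertools import combinations, product

# Hypothetical model; a real suite would use the parameters from your story.
parameters = {
    "Browser": ["Chrome", "Firefox", "Safari"],
    "User Type": ["Guest", "Registered"],
    "Payment": ["Credit Card", "PayPal"],
}

names = sorted(parameters)

def pairs_in(test):
    """All pairwise interactions covered by a single test."""
    return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}

# Every pair we need to cover, and every full combination we could pick from.
uncovered = {
    ((a, va), (b, vb))
    for a, b in combinations(names, 2)
    for va in parameters[a]
    for vb in parameters[b]
}
candidates = [
    dict(zip(names, combo))
    for combo in product(*(parameters[n] for n in names))
]

# Greedy: repeatedly take the candidate that knocks out the most uncovered pairs.
suite = []
while uncovered:
    best = max(candidates, key=lambda t: len(pairs_in(t) & uncovered))
    suite.append(best)
    uncovered -= pairs_in(best)

print(f"{len(suite)} tests cover all pairwise interactions:")
for test in suite:
    print(test)
```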
No one would have guessed their tests were that bad just by reading through them. They looked perfectly fine. They read like good test cases. But once we started to visualize their coverage, we saw that perhaps they weren’t achieving all they could. And when we compared the bad tests to the Hexawise tests, more coverage (in the same number of tests) is the clear winner.
In short: