Kathleen Poulsen of Fidelity Investments gave a presentation at STAREAST 2017 sharing her experience using Hexawise to improve software testing performance at Fidelity. Watch a 10-minute video with highlights of that talk:

We didn't really have what I'd call a scientific methodology to approaching the tests...

Our regression test suites were continuously expanding... We found there was a repetition of tests.

We had 3 different projects that I will talk about that I feel like combinatorial or pairwise testing was the key to answering all of those problems.

Hexawise allows you to harness the power of combinatorial software testing with test plans designed to provide thorough testing of interaction impacts on the software being tested. Hexawise provides more coverage with fewer tests.

All the teams that are using Hexawise can use that same file, they can talk to each other. [Another] thing I liked about Hexawise was the coverage chart... I go back to my business partners and say I am not running these tests. If they are important to you I add them back in with the click of a button. I love that... it was a game changer for me.

Using the Hexawise exporting options:

the tests that we produced were converted into the given/when/then type scenarios automatically, and when they are exported into Excel you can use them to drive the Selenium test automation framework. No additional work from us involved.
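The talk doesn't show Fidelity's exact pipeline, but the general pattern is easy to picture: each exported row of test values becomes one Given/When/Then scenario that a data-driven automation framework can execute. Here is a minimal Python sketch of that conversion; the parameter names and scenario wording are made-up examples, not Fidelity's actual data:

```python
# Minimal sketch: turn exported pairwise test rows into Given/When/Then
# scenario text a data-driven framework could consume.
# Parameter names below are hypothetical, not Fidelity's actual data.

rows = [
    {"account_type": "IRA", "channel": "mobile", "order_type": "market"},
    {"account_type": "401k", "channel": "web", "order_type": "limit"},
]

def to_gherkin(row, n):
    return (
        f"Scenario: generated test {n}\n"
        f"  Given a {row['account_type']} account\n"
        f"  When a {row['order_type']} order is placed via {row['channel']}\n"
        f"  Then the order is accepted"
    )

for i, row in enumerate(rows, start=1):
    print(to_gherkin(row, i))
```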

Using Hexawise's ability to create highly optimized test plans, Fidelity was able to greatly reduce the number of tests while also improving coverage.

We were able to reduce from 12,000 tests down to 600.

This type of result sounds amazing, and it is. But it is also what we find consistently from clients, over and over. There are certain things people just cannot do well, and designing test plans to cover incredibly large numbers of interactions between test values and conditions is one of those things. Using highly optimized algorithms to create test plans that cover these interactions is key to reliably creating software customers will love. It also frees people to do what they do best.
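To see why optimized test plans can shrink a suite so dramatically, consider a rough, hypothetical example (the parameter counts below are invented for illustration, not Fidelity's system): with 10 parameters of 4 values each, exhaustive testing requires over a million tests, but pairwise coverage only has to hit 720 distinct value pairs, and every single test covers 45 of them at once.

```python
from itertools import combinations
from math import prod

# Hypothetical system: 10 parameters, 4 values each (invented numbers).
values_per_param = [4] * 10

# Exhaustive testing: every combination of every value.
exhaustive = prod(values_per_param)  # 4**10 = 1,048,576 tests

# Pairwise testing only needs every PAIR of values to appear together.
pairs_to_cover = sum(a * b for a, b in combinations(values_per_param, 2))
pairs_per_test = len(list(combinations(values_per_param, 2)))  # 45

print(f"exhaustive tests:     {exhaustive:,}")    # 1,048,576
print(f"value pairs to cover: {pairs_to_cover}")  # 45 param pairs * 16 = 720
print(f"pairs per test:       {pairs_per_test}")  # so >= 16 tests in theory
```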

Kathleen also discussed the significant improvement in communication within Fidelity that was brought about by using Hexawise.

The common language has become the test plan that comes out of Hexawise today.

Improving communication is an area many organizations see as important, but finding concrete ways to achieve better communication is often difficult. We have designed Hexawise to aid communication between stakeholders, including software developers, software testers, product owners, help desk support staff, and senior management.

The simplicity of this tool along with the way you can enter your parameters using the mind map tool, getting that coverage chart automatically out of it, having it export your data into a pretty commonly usable format - those are things that were terribly important to me. They gave me real value... I love that.

I can accommodate many different types of testing. We are testing at the class method level, at the services interface level, at the UI level...

Related: 84% of Software Defects Found in Production Could Have Been Found Using Pairwise Testing - Create a Risk-based Testing Plan With Extra Coverage on Higher Priority Areas - 2 Minute Introduction to Hexawise Software Testing Solution

By: John Hunter on Jun 9, 2017

Categories: Combinatorial Software Testing, Combinatorial Testing, Efficiency, Hexawise, Hexawise test case generating tool, Multi-variate Testing, Pairwise Software Testing, Pairwise Testing, Recommended Tool, Software Testing, Software Testing Presentations, Software Testing Efficiency, Testing Case Studies, User Experience

We are excited to announce an ongoing partnership with Datalex to improve software test efficiency and accuracy. Datalex has achieved extreme benefits in software quality assurance and speed-to-market through their use of Hexawise. Some of these benefits include:

  • Greater than 65 percent reduction in the Datalex test suite
  • Clearer understanding of test coverage
  • Higher confidence in the thoroughness of software tests
  • Complete and consistently formatted tests that are simple to automate

An airline company’s regression suite typically contains thousands of test cases. Hexawise is used by Datalex to optimize these test cases, leading to fewer tests as well as greater overall testing coverage. Hexawise also provides Datalex with a complete understanding of exactly what has been tested after each and every test, allowing them to make fact-based decisions about how much testing is enough on each project.
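The coverage understanding described above can be sketched in a few lines: after each executed test, count what fraction of all parameter-value pairs has appeared so far. This toy version (invented test data, measuring only against values that appear in this small suite, unlike a real tool) shows the shape of the calculation:

```python
from itertools import combinations

# Toy cumulative pairwise-coverage calculation (invented data; a real
# tool measures against the full declared domain, not just these rows).
tests = [
    ("Visa", "Chrome", "EUR"),
    ("Amex", "Firefox", "USD"),
    ("Visa", "Firefox", "GBP"),
]

columns = [set(col) for col in zip(*tests)]
all_pairs = {
    ((i, a), (j, b))
    for i, j in combinations(range(len(columns)), 2)
    for a in columns[i]
    for b in columns[j]
}

covered = set()
for n, test in enumerate(tests, start=1):
    for i, j in combinations(range(len(test)), 2):
        covered.add(((i, test[i]), (j, test[j])))
    pct = 100 * len(covered) / len(all_pairs)
    print(f"after test {n}: {pct:.0f}% of value pairs covered")
```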

Hexawise has been fundamental in improving the way we approach our Test Design, Test Coverage and Test Execution at Datalex... My team love using Hexawise given its intuitive interface and its ability to provide a risk-based approach to coverage, which gives them more confidence during release sign-off.


Áine Sherry

Global Test Manager at Datalex

“As a senior Engineer in a highly innovative company, I find Hexawise crucial in regards to achieving excellent coverage with a fraction of the time and effort. Hexawise will also facilitate us to scale onwards and upwards as we continue to innovate and grow,” – Dean Richardson, Software Test Engineer at Datalex.

By eliminating duplicative tests and optimizing the test coverage of each test case, Hexawise provides great time savings in the test execution phase. Hexawise generates fewer test scenarios than testers would create on their own, and those scenarios provide more test coverage. Time savings in test execution come about simply because it takes less time to execute fewer tests.

Related: How to Pack More Coverage Into Fewer Software Tests - Large Benefits = Happy Hexawise Clients and Happy Colleagues

By: John Hunter on Nov 10, 2016

Categories: Business Case, Customer Success, Testing Case Studies

Since creating Hexawise, I've worked with executives at companies around the world who have found themselves convinced of the value of pairwise testing. And then they need to convince their organization of that value.

They often follow a similar path: from "pairwise testing is a nice method in theory, but not applicable in our case," to "pairwise is nice in theory and might be applicable in our case," to "pairwise is applicable in our case," and finally to "how do I convince my organization?"

In this post I review my history of helping convince organizations to try, and then adopt, pairwise and combinatorial software testing methods.

About 8 years ago, I was working at a large systems integration firm and was asked to help the firm differentiate its testing offerings from the testing services provided by other firms.

I admittedly did not know much about software testing then, but by happy coincidence, my father was a leading expert in the field of Design of Experiments. Design of Experiments is a field that has applicability in many areas (including agriculture, advertising, manufacturing, etc.). Its purpose is to provide tools and approaches that help people learn as much actionable information as possible from as few tests as possible.

I Googled "Design of Experiments Software Testing." That search led me to Dr. Madhav Phadke (who, by coincidence, had been a former student of my father). More than 20 years ago now, Dr. Phadke and his colleagues at AT&T Bell Labs had asked the question you're asking now. They did an experiment using pairwise test design / orthogonal array test design to identify a subset of tests for AT&T's StarMail system. The results of that testing effort were extraordinarily successful and well-documented.

Shortly after that, while still at the systems integration firm, I began to advocate to anyone and everyone who would listen that this approach to designing tests promised (a) to be more thorough and (b) to require, in most but not all cases, significantly fewer tests. Results from 17 straight projects confirmed that both of these statements were true. Consistently.

 

Repeatable Steps to Confirm Whether This Approach Delivers Efficiency and Thoroughness Improvement (and/or document a business case/ROI calculation)

How did we demonstrate that this test design approach led to both more thorough testing and much more efficient testing? We followed these steps:

  1. Take an existing set of 30-100 tests that have already been created, reviewed, and approved for testing (but which have not yet been executed).

  2. Using the test ideas included in those existing tests, design a set of pairwise tests (often approximately half as many tests as in the original set). When putting your tests together, if there are particular, known, high-priority scenarios that stakeholders believe are important to test, make sure that you "force" your pairwise test generator to include them (see the sketch below).

  3. Have two different testers execute both sets of tests at the same time (e.g., before developers start fixing any defects that are uncovered by testers executing either set of tests)

Document the following:

  • How long did it take to execute each set of tests?

  • How many unique defects were identified by each set of tests?

  • How long did it take to create and document each set of tests?*

 

*This third measurement was usually an estimate because a significant number of teams had not tracked the amount of time it took to create the original set of tests.
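For step 2, the "forcing" of high-priority scenarios is worth a concrete sketch. The toy greedy generator below seeds the suite with a stakeholder-mandated scenario before filling in the remaining pairs; real tools such as Hexawise use far better-optimized algorithms, and the parameter names here are invented for illustration:

```python
from itertools import combinations, product

# Toy greedy pairwise generator with "forced" high-priority seed tests.
# Invented parameters; real tools use much better-optimized algorithms.
params = {
    "browser": ["Chrome", "Firefox", "Safari"],
    "account": ["new", "existing"],
    "payment": ["card", "wire", "check"],
}
names = list(params)

def pairs_of(test):
    return {((i, test[i]), (j, test[j]))
            for i, j in combinations(range(len(test)), 2)}

# Every parameter-value pair that 2-way coverage must hit.
uncovered = set()
for i, j in combinations(range(len(names)), 2):
    for a, b in product(params[names[i]], params[names[j]]):
        uncovered.add(((i, a), (j, b)))

# Step 2's "force": seed the suite with scenarios stakeholders require.
suite = [("Chrome", "existing", "card")]
for test in suite:
    uncovered -= pairs_of(test)

# Greedily add whichever candidate covers the most uncovered pairs.
while uncovered:
    best = max(product(*params.values()),
               key=lambda t: len(pairs_of(t) & uncovered))
    suite.append(best)
    uncovered -= pairs_of(best)

for test in suite:
    print(dict(zip(names, test)))
```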

The results in 17 different pairwise testing "bake-off" projects conducted at my old firm included:

  • Defects found per tester hour during test execution: when pairwise tests were used, more than twice as many defects were found per tester hour

  • Total defects found: pairwise tests found as many or more defects in every single project (despite the fact that in almost every case the original set contained significantly more tests)

  • Defects found by pairwise tests but missed by traditional tests: a lot (I forget the exact number)

  • Defects found by traditional tests but missed by pairwise tests: zero

  • Amount of time to select and document tests: much less time required when a pairwise test generator was used (As mentioned above, precise measurements were difficult to gather here)

 

More recent project benefits have included these:

[two charts: recent project benefit data (bcbs1, bcbs2)]

Those experiences - combined with the realization that many Fortune 500 firms were starting to try to implement smarter test design methods to achieve these kinds of benefits but were struggling to find a test design tool that was user-friendly and would integrate into their other tools - led me to the decision to create Hexawise.

 

Additional Advice and Lessons Learned Based on My Experiences

Once the value of pairwise software testing has been demonstrated at a specific organization, it is very common for the person championing it to find themselves saying:

I have already elaborated some test plans that would save us up to 50% effort with that method. But now my boss and other colleagues are asking me for a proof that these pairwise test cases suffice to make sure our software is running well.

In that case, my advice is three-fold:

First, appreciate how your own thinking has evolved and understand that other people will need to follow a similar journey (and that others won't have as much time as you have had to learn these things first-hand).

When I was creating Hexawise, George Box, a Design of Experiments expert with decades of experience explaining to skeptical executives how Design of Experiments could deliver transformational improvements to their organizations' efficiency and effectiveness, told me: "Justin, you'll experience three phases of acceptance, and acceptance will happen more gradually than you would expect. First, people will tell you 'It won't work.' Next, they'll say 'It won't work here.'" Eventually, he added with a smile, they'll tell you, "Of course this works. I thought of it first!"

When people hear that you can double their test execution productivity with a new approach, they won't initially believe you. They'll be skeptical. Most people you're explaining this approach to will start with the thought that "it is nice in theory but not applicable to what I'm doing." It will take some time and experience for people to understand and appreciate how dramatic the benefits are.

Second, people will instinctively be dismissive of pairwise testing case study after case study after case study that show how effective this approach has been for testers in virtually all types of industries and all types and phases of testing. George Box was right when he predicted that people will often respond with 'It won't work here.' Sometimes it is hard not to smile when people take that position.

Case in point: I will be talking to a senior executive at a large capital markets firm soon about how our tool can help them transform the efficiency and effectiveness of their testing group. And I can introduce them to a client of ours that is using our test design tool extensively in every single one of their most important IT projects. Will that executive take me up on my offer? I hope so, but based on past experience, I suspect odds are good that he'll instead react with 'Yes, yes, sure, if companies were people, that company would be our company's identical twin, but still... It won't work here.'

Third, at the end of the day, the most effective approach I have found to address that understandable skepticism and to secure organizational-level buy-in and commitment is to gather hard, indisputable evidence, on multiple projects, that the approach works at the company itself through a bake-off (e.g., following the steps outlined above). A few words of advice, though.

My proposed approach isn't for the faint of heart. If you're working at a large company with established approaches, you'll need patience and persistence.

Even after you gather evidence that this approach works in Business Units A, B, and C, someone from Business Unit D will remain unconvinced by the compelling and irrefutable evidence you have gathered and tell you 'It won't work here. Business Unit D is unique.' The same objections will likely arise with results from "Type of Testing" A, B, and C. As powerful and widely applicable as this test design approach is, always remember (and be clear with stakeholders) that it is not a magical silver bullet.

James Bach raises several valid limitations with using this approach. In particular, this approach won't work unless you have testers who have relatively strong analytical skills driving the test design process. Since pairwise test case generating tools are dependent upon thoughtful test designers to identify appropriate test inputs to vary, this approach (like all test design approaches) is subject to a "garbage in / garbage out" risk.

Project leads will resist "duplicating effort." But unless you do an actual bake-off stakeholders won't appreciate how broken their existing process is. There's inevitably far more wasteful repetition hidden away in standard tests than people realize. When you start reporting a doubling of tester productivity on several projects, smart managers will take notice and want to get involved. At that point - hopefully - your perseverance should be rewarded.

 

Some benefits data and case studies that you might find useful:

 

If you can't change your company, consider changing companies

Lastly, remember that your new-found skills are in high demand whether or not they're valued at your current company. And know that, despite your best efforts and intentions, you might not convince the skeptics. Some people inevitably won't be willing to take the time to understand. If you find yourself in a situation where you want to use this test design approach (because you know these approaches are powerful, practical, and widely applicable) but you don't have management buy-in, consider whether it would be worth leaving your current employer to join a company that will let you use your new-found skills.

Most of our clients, for example, are actively looking for software test designers with well-developed pairwise and combinatorial test design skills. And they're even willing to pay a salary premium for highly analytical test designers who are able to design sets of powerful tests. (We publicize such job openings in the LinkedIn Hexawise Guru group for testers who have achieved "Guru" level status in the self-paced computer-based-training modules in the tool.)

 

Related: Looking at the Empirical Evidence for Using Pairwise and Combinatorial Software Testing - Systematic Approaches to Selection of Test Data - Getting Known Good Ideas Adopted

By: Justin Hunter on Nov 21, 2013

Categories: Pairwise Software Testing, Software Testing, Software Testing Efficiency, Testing Case Studies, Testing Strategies

We have created a new site to highlight Hexawise videos on combinatorial, pairwise, and orthogonal array software testing. We have posted videos on a variety of software testing topics, including: selecting appropriate test inputs for pairwise and combinatorial software test design, selecting the proper inputs to create a pairwise test plan, and using value expansions for values in the same equivalence class.
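For readers new to the value expansion idea: several concrete inputs that belong to the same equivalence class count as a single value for combination purposes, and the concrete members rotate through the generated tests. A rough illustration of the concept (invented data, not the tool's internals):

```python
import itertools

# Rough illustration of value expansions: each equivalence class is one
# value for pairing purposes; its concrete members rotate in the output.
expansions = {
    "valid ZIP": ["27513", "10001", "94105"],
    "invalid ZIP": ["00000", "ABCDE"],
}
rotors = {cls: itertools.cycle(vals) for cls, vals in expansions.items()}

# Abstract column from a generated plan, expanded to concrete inputs:
plan_column = ["valid ZIP", "invalid ZIP", "valid ZIP", "valid ZIP"]
concrete = [next(rotors[cls]) for cls in plan_column]
print(concrete)  # ['27513', '00000', '10001', '94105']
```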

Here is a video with an introduction to Hexawise:

 

 

Subscribe to the Hexawise TV blog. And if you haven't subscribed to the RSS feed for the main Hexawise blog, do so now.

By: John Hunter on Nov 20, 2013

Categories: Combinatorial Testing, Hexawise test case generating tool, Multi-variate Testing, Pairwise Testing, Software Testing Presentations, Testing Case Studies, Testing Strategies, Training, Hexawise tips

Attempting to assess the relative benefits of more than 200 software development practices is not for the faint of heart. Context-specific considerations run the risk of confounding the conclusions at every turn. Even so, Capers Jones, a software development expert with dozens of years of experience and nearly twenty books related to software development to his credit, recently attempted the task. He's literally devoted decades of his career to assessing such things for clients. We're quite pleased with how using Hexawise fared in the analysis.

Scoring and Evaluating Software Methods, Practices and Results by Capers Jones (Vice President and CTO, Namcook Analytics) provides some great ideas on software project management. The article is based on Software Engineering Best Practices, with some new data taken from The Economics of Software Quality (two of the books Capers Jones has authored).

Software development, maintenance, and software management have dozens of methodologies and hundreds of tools available that are beneficial. In addition, there are quite a few methods and practices that have been shown to be harmful, based on depositions and court documents in litigation for software project failures.

In order to evaluate the effectiveness or harm of these numerous and disparate factors, a simple scoring method has been developed. The scoring method runs from +10 for maximum benefits to -10 for maximum harm.

The scoring method is based on quality and productivity improvements or losses compared to a midpoint. The midpoint is traditional waterfall development carried out by projects at about level 1 on the Software Engineering Institute capability maturity model (CMMI) using low-level programming languages. Methods and practices that improve on this midpoint are assigned positive scores, while methods and practices that show declines are assigned negative scores.

The data for the scoring comes from observations among about 150 Fortune 500 companies, some 50 smaller companies, and 30 government organizations. Negative scores also include data from 15 lawsuits.

 

The article provides guidance, based on the results achieved by many, and varied, organizations with respect to software projects.

finding and fixing bugs is overall the most expensive activity in software development. Quality leads and productivity follows. Attempts to improve productivity without improving quality first are not effective.

 

This is an extremely important point for business managers to understand. Those involved in software development professionally don't find this surprising. But business people often greatly underestimate the costs of maintaining and updating software. The costs of bugs introduced by fairly minor feature requests to a system that doesn't have good software test coverage or test plans often create far more trouble than business managers expect.

This is especially true because there is a high correlation between software applications with poor software testing processes (including poor test coverage and poor or completely missing test plans) and applications designed without long-term maintenance in mind. Both deficiencies result from decisions made to minimize initial development costs and time, and both show a lack of appreciation for wise software engineering practices and software application project management.

The article discusses a complicating factor in assessing the most effective software development practices: the extremely wide differences in software engineering scope. Projects range from simple applications one software developer can create in a short period of time to massive applications requiring thousands of developer-years of effort.

In order to be considered a “best practice” a method or tool has to have some quantitative proof that it actually provides value in terms of quality improvement, productivity improvement, maintainability improvement, or some other tangible factors.

Looking at the situation from the other end, there are also methods, practices, and social issues that have demonstrated that they are harmful and should always be avoided. ... Although the author’s book Software Engineering Best Practices dealt with methods and practices by size and by type, it might be of interest to show the complete range of factors ranked in descending order, with the ones having the widest and most convincing proof of usefulness at the top of the list. Table 2 lists a total of 220 methodologies, practices, and social issues that have an impact on software applications and projects.

The average scores shown in table 2 are actually based on the composite average of six separate evaluations:

  1. Small applications < 1000 function points

  2. Medium applications between 1000 and 10,000 function points

  3. Large applications > 10,000 function points

  4. Information technology and web applications

  5. Commercial, systems, and embedded applications

  6. Government and military applications

As noted above, the data for the scoring comes from observations of about 150 Fortune 500 companies, some 50 smaller companies, and 30 government organizations, covering around 13,000 total projects. Negative scores also include data from 15 lawsuits.

The scoring method does not have high precision and the placement is somewhat subjective.

Top 10 tools and practices listed in the article:

Practice (score)
1. Reusability (> 85% zero-defect materials): 9.65
2. Requirements patterns - InteGreat: 9.50
3. Defect potentials < 3.00 per function point: 9.35
4. Requirements modeling (T-VEC): 9.33
5. Defect removal efficiency > 95%: 9.32
6. Personal Software Process (PSP): 9.25
7. Team Software Process (TSP): 9.18
8. Automated static analysis - code: 9.17
8. Mathematical test case design (Hexawise): 9.17
10. Inspections (code): 9.15

 

We are obviously thrilled that Hexawise is listed. We have seen the value our customers have achieved using mathematics-based combinatorial software test plans (see several Hexawise case studies). It is great to see that value recognized in comparison to other software development practices and judged to be of such high value to software development projects.

The article makes clear that the importance of the results lies not in "the precision of the rankings, which are somewhat subjective, but in the ability of the simple scoring method to show the overall sweep of many disparate topics using a single scale."

The methodology behind the results shown in the article can be used to evaluate your organization's software development practices and find opportunities for improvement. But, as stated above, software projects cover a huge range of scopes, and the specific project's needs will drive which practices are most critical to its success. The article's list of which practices have provided huge value and which have resulted in great harm is a very helpful resource, but project managers, software developers, and testers need to apply their own judgment to the information it provides in order to achieve success.

A leading company will deploy methods that, when summed, total to more than 250 and average more than 5.5. Lagging organizations and lagging projects will sum to less than 100 and average below 4.0.

 

The use of Hexawise has been growing, which has helped increase the number of software projects using best practices (those scoring 9 or higher); however, as the article states, there is still quite a need for improvement.

From data and observations on the usage patterns of software methods and practices, it is distressing to note that practices in the harmful or worst set are actually found on about 65% of U.S. Software projects as noted when doing assessments. Conversely, best practices that score 9 or higher have only been noted on about 14% of U.S. Software projects. It is no wonder that failures far outnumber successes for large software applications!

 

A score of 9 to 10 for a practice means that the practice results in a 20-30% improvement in quality and productivity of software projects.

Conclusion: while your individual mileage may vary, this report provides further evidence that using Hexawise really does lead to large, measurable improvements in efficiency and effectiveness.

We are very proud of the success of Hexawise thus far; as a new year starts we see huge potential to help many organizations improve their software development efforts.

The article includes a valuable list of references and suggested readings. Included in that list are:

DeMarco, Tom; Controlling Software Projects, 1986, 296 pages.

Gilb, Tom and Graham, Dorothy; Software Inspections, 1994, 496 pages.

Jones, Capers; Applied Software Measurement, 3rd edition, 2008, 662 pages.

McConnell, Steve; Code Complete, 2nd edition, 2004, 960 pages. (I'm linking to the 2nd edition; the article references the 1st.)

 

Related: Maximizing Software Tester Value by Letting Them Spend More Time Thinking - A Powerful Software Test Design Approach - 3 Strategies to Maximize Effectiveness of Your Tests

By: John Hunter on Mar 18, 2013

Categories: Software Testing, Software Testing Efficiency, Testing Case Studies, Testing Strategies

Based on my experience over dozens of pilot projects where we've gathered hard data, many software testers would literally more than double their productivity overnight on many projects if they used combinatorial test design methods intelligently (in comparison to selecting test case conditions by hand).

In this 10-project study, Combinatorial Software Testing Case Studies, we found that testers executing test cases created by a combinatorial testing algorithm (designed to achieve as much coverage as possible in as few tests as possible) found, on average, 2.4 times more defects per tester hour than testers executing manually selected test cases.

How many participating testers thought they would see dramatic increases before they gathered the data? Almost none (even testers who had been told about the prior experiences of colleagues on similar projects). How many participating testers are glad that they took the time to use the scientific method?

  • hypothesis

  • experiment

  • evidence

  • revise world-view

Every one of them.

What stops more people from using the scientific method on their projects and gathering data to prove or disprove hypotheses like the one addressed in the study above? A pilot could take less than two days of one person's time. If past experience is any indication of future results (and granted, it isn't always), the odds appear pretty good that the results would show productivity doubling (as measured in defects found per tester hour).
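The pilot metric itself is deliberately simple arithmetic; here is what the comparison looks like with made-up numbers (not data from any actual study):

```python
# Made-up pilot numbers, just to show the defects-per-tester-hour metric.
manual   = {"defects": 18, "hours": 40}  # manually selected test cases
pairwise = {"defects": 21, "hours": 19}  # combinatorial test cases

for name, suite in (("manual", manual), ("pairwise", pairwise)):
    print(f"{name}: {suite['defects'] / suite['hours']:.2f} defects/hour")

ratio = ((pairwise["defects"] / pairwise["hours"])
         / (manual["defects"] / manual["hours"]))
print(f"productivity ratio: {ratio:.1f}x")  # ~2.5x with these invented numbers
```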

What's stopping the testing community from doing more such analysis? Perhaps more importantly, what is stopping you from gathering this kind of data on your project?

Additional empirical studies on the effectiveness of software testing strategies would greatly benefit the software testing community.

 

Related: Hexawise case studies on software testing improvement (health insurance, IT consulting and mortgage processing studies) - How Not to Design Pairwise Software Tests - 3 Strategies to Maximize Effectiveness of Your Tests

By: Justin Hunter on Mar 5, 2013

Categories: Combinatorial Testing, Efficiency, Pairwise Software Testing, Testing Case Studies, Testing Strategies, Experimenting