“Did you test your code?” It’s a simple question, and one that may seem to have a simple answer.
But in practice, the answer is not simple. Software testing takes many forms: it should occur at multiple stages of a development project, and its activities vary from stage to stage. In fact, it has evolved into modern practices of test-driven development (where you define how to test before you build) and continuous testing (where you never stop testing, autonomously if possible). Automating these tests is key, as it frees you up to continue innovating instead of testing manually.
Let’s explore the various software testing strategies and methods available today, and explain why you might use one approach over another in a given context.
FIRST, UNIT TESTING
Unit testing is usually the most granular form of testing in software development in terms of both focus and complexity, and occurs at the earliest stages of development. You focus on relatively small portions of an overall system, its code, and its functionality. As part of this, the goal is to test software components (a class method, an interface definition, function, stored procedure, network call, and so on) in isolation. A good approach is to keep it simple, limit or eliminate dependencies, and test one thing at a time. This often involves writing driver code to emulate other components, mock objects or interfaces, and any other stub code needed to test with.
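The idea of testing one component in isolation behind stub code can be sketched as follows. This is a minimal example in Python's built-in unittest framework; the `ReportGenerator` component and its `StubSource` stand-in are hypothetical, but the pattern (inject the dependency, stub it out, test one behavior) is the one described above.

```python
import unittest

# Hypothetical component under test: formats a summary line
# from whatever data source it is handed.
class ReportGenerator:
    def __init__(self, source):
        self.source = source  # dependency is injected, so it can be stubbed

    def summary(self):
        rows = self.source.fetch_rows()
        return f"{len(rows)} rows, total={sum(r['amount'] for r in rows)}"

# Stub standing in for a real database- or network-backed source,
# eliminating external dependencies from the test.
class StubSource:
    def fetch_rows(self):
        return [{"amount": 10}, {"amount": 32}]

class ReportGeneratorTest(unittest.TestCase):
    def test_summary_with_stubbed_source(self):
        gen = ReportGenerator(StubSource())
        # One thing tested, in isolation, with a known expected result.
        self.assertEqual(gen.summary(), "2 rows, total=42")

if __name__ == "__main__":
    unittest.main()
```

Because the stub returns fixed data, the test is deterministic and fast enough to run on every build.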
When I began as a professional software developer around 1993, I read the book Writing Solid Code by Steve Maguire. In it, he describes the need to step through every line of code, including error/exception handling code, even if you have to force error conditions to ensure that it’s never executed in production for the first time. I agree with this approach, and step through as much code as I possibly can, given all of the constraints involved.
UNIT TEST FRAMEWORKS
We’ve come a long way since 1993. Today there are test frameworks such as JUnit and TestNG, error-monitoring services such as Raygun, and cloud-based testing platforms such as Sauce Labs to help formalize and automate your unit tests. Whereas unit tests were once little more than a check mark in a to-do list, today they execute continuously as code is written, using sophisticated tools. This can go a long way towards building confidence in software quality, starting from an early stage of development.
SANITY CHECKS
Sometimes called smoke tests, sanity checks test for obvious results, and are often used to ensure software works at the most basic level. Examples include compiling a “Hello World!” source file to prove a compiler toolchain works, or the use of ping to ensure basic networking is in place. If a sanity check fails, you need to fix the source of the issue before continuing.
Other times, sanity checks are used to uncover anomalies in an attempt to isolate the true cause of a failed test. This can mean rolling back to a last known good version of a software component, or undoing a change that otherwise seems innocuous. Sanity checks can reveal that something else—usually an unintended side-effect of a development process—has changed when it either should not have or was thought not to have been included. On a more positive note, sanity checks are best used as a baseline for comparison, ensuring that obvious results are returned in simple test cases before layering more complex unit tests or integration tests into the mix.
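A sanity check suite can be as small as a handful of known-good round trips through the environment the real tests depend on. The sketch below is illustrative: it verifies that a core library and the filesystem behave as expected before anything deeper is attempted.

```python
import json
import os
import tempfile

def sanity_checks():
    """Return a list of failed checks; an empty list means the
    environment is sane enough for real testing to continue."""
    failures = []

    # Round trip through a core library we depend on.
    if json.loads(json.dumps({"ok": True})) != {"ok": True}:
        failures.append("json round trip")

    # The filesystem accepts and returns data intact.
    with tempfile.NamedTemporaryFile("w+", delete=False) as f:
        f.write("hello")
        path = f.name
    with open(path) as f:
        if f.read() != "hello":
            failures.append("file round trip")
    os.remove(path)

    return failures
```

If any check fails, the cause must be fixed before more complex tests are worth running, since their results could not be trusted.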
INTEGRATION TESTING
Moving beyond the individual component tests in the unit test phase, integration tests involve combining (hence, integrating) multiple components to ensure they work together as intended. Although more than one component or even system may be involved, isolated testing is still a good approach to integration testing.
For example, say you have a component that’s meant to upload data via HTTP, FTP or other transfer method to another server. A good strategy is to test the integration process in phases, first making sure that basic connectivity can be achieved; next, that a secure transfer process can be initiated, and finally, that the resulting file is in place and validated on the remote server.
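The phased approach above can be sketched in code. Here the `FakeRemote` class stands in for the real server (its method names are illustrative, not a real API), so each phase of the upload integration can be rehearsed and verified in order:

```python
import hashlib

class FakeRemote:
    """Stand-in for the remote server during integration rehearsal."""
    def __init__(self):
        self.files = {}
        self.reachable = True

    def connect(self):
        return self.reachable                   # phase 1: basic connectivity

    def secure_upload(self, name, data):
        if not self.reachable:
            raise ConnectionError("remote unreachable")
        self.files[name] = data                 # phase 2: transfer completes
        return True

    def checksum(self, name):
        return hashlib.sha256(self.files[name]).hexdigest()

def integrate_in_phases(remote, name, data):
    # Phase 1: can we reach the server at all?
    assert remote.connect(), "connectivity failed"
    # Phase 2: can a transfer be initiated and completed?
    assert remote.secure_upload(name, data), "transfer failed"
    # Phase 3: is the file in place and intact on the remote side?
    expected = hashlib.sha256(data).hexdigest()
    assert remote.checksum(name) == expected, "remote file corrupt"
    return True
```

Each phase fails independently and points directly at the layer that broke, which is exactly what a big-bang test cannot do.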
This bottom-up approach is in contrast with other approaches such as big-bang, otherwise known as “The Hail Mary Pass,” where you test integration by assuming the best. I once worked for a manager who, whenever I began a methodical check of whether two components would integrate well for the first time, would tell me to “just let it rip” to see if it worked. Unfortunately, it rarely did.
Although integration testing strategies may vary, they are most successful when results are carefully tracked and recorded to ensure that nothing is missed, and that one of the components involved isn’t wrongly accused of failing. Wrong assumptions at this stage of software development can begin to get costly in terms of time and effort expended.
FUNCTIONAL TESTING
As confidence in the quality of individual components and even groups of components grows, it’s time to begin testing the functionality of the system being built or enhanced. This means ensuring that requirements are met by using a form of black-box testing, where a specific set of functionality (and hence a subset of the system) is tested in terms of its behavior.
REGRESSION TESTING
When existing systems are enhanced or modified in some way, for any reason, tests are performed to ensure that existing functionality works as expected. A form of functional testing, regression testing verifies that select areas of the system continue to work as expected. Because it may not be feasible, cost-effective, or efficient to run every test on a complete system when just one component is modified, regression tests are subsets of tests that are carefully selected to represent confidence in an area of the system if passed. The name is something of a misnomer, in that regression testing is meant to prove that the system still works, and has not regressed in any way.
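Selecting a regression subset requires knowing which tests cover which areas of the system. One lightweight way to do this, sketched below with illustrative test names and areas, is to tag each test with the areas it covers and re-run only the tests that intersect with what was modified:

```python
# Registry of (covered areas, test function) pairs.
REGISTRY = []

def covers(*areas):
    """Decorator: register a test function against system areas."""
    def wrap(fn):
        REGISTRY.append((set(areas), fn))
        return fn
    return wrap

@covers("billing")
def test_invoice_total():
    assert 100 + 20 == 120

@covers("billing", "reports")
def test_invoice_appears_in_report():
    assert "invoice" in "monthly invoice report"

@covers("auth")
def test_login_rejects_empty_password():
    assert not bool("")

def run_regression(modified_areas):
    """Run only the tests whose coverage intersects the modified areas."""
    selected = [fn for areas, fn in REGISTRY if areas & set(modified_areas)]
    for fn in selected:
        fn()  # an AssertionError here signals a regression
    return [fn.__name__ for fn in selected]
```

A change to the billing component re-runs the two billing-tagged tests and skips the authentication test, trading completeness for speed in a controlled, recorded way.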
PERFORMANCE TESTING
When it comes to technology and software, performance is a relative term. What was once considered “fast” may eventually be thought of as slow relative to newer technology. Therefore, software performance should be expressed quantitatively and meaningfully, and its related testing should aim to validate this through precise measurements.
Performance not only applies to back-end servers and network infrastructure, but also to front-end behavior. In fact, the best forms of performance tests are those that test systems end-to-end, from the UI to the server, and beyond. In addition to overall system responsiveness, performance testing can also measure system stability and reliability. Improving a system’s responsiveness at the expense of overall stability, for example, may not pass performance testing criteria. Additionally, a system that performs well for a single transaction may not fare as well when used at scale, so performance tests also need to include these cases.
More importantly, it’s often not enough to say a system should “scale well,” or be “responsive to the user.” Instead, these tests should use real quantities that can be measured. (For example, testing that a server can handle 10,000 simultaneous requests and respond to each within 100ms, or that UI latency to user touch should not exceed 205ms, will yield more reliable results.)
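Turning "responsive" into a number can be as simple as timing the code under test and asserting against an explicit budget. The sketch below is illustrative: `handle_request` stands in for the real handler, and the 100 ms budget is an assumed requirement, not a universal threshold.

```python
import time

def handle_request(payload):
    # Stand-in for the real request handler.
    return sorted(payload)

def measure_latency_ms(fn, arg, runs=50):
    """Median wall-clock latency in milliseconds over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(arg)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]

BUDGET_MS = 100.0  # assumed requirement: respond within 100 ms
latency = measure_latency_ms(handle_request, list(range(1000)))
assert latency < BUDGET_MS, f"median latency {latency:.1f} ms exceeds budget"
```

The median (rather than a single run) smooths out scheduler noise; a percentile such as p95 is often used in the same role for stricter guarantees.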
Finally, UI performance testing can be automated and non-intrusive to the system being tested, ensuring that your application remains responsive and pleasant to use under the widest variety of conditions, for a number of key performance characteristics. These include scroll performance, visual latency and lag, tracking of user interactions (especially where drawing, dragging, or multi-touch gestures are involved), and communications with back-end systems.
There are many offshoots of performance testing that need to be considered, such as:
- Load Testing: measuring system behavior under expected usage scenarios.
- Stress testing: measuring the effects of extreme system load (beyond expected usage).
- Network testing: factoring network infrastructure and communications (including real or simulated global WAN connections) into performance tests and results.
- Breakpoint testing: determining the precise threshold of system load across components where the system either fails to meet user responsiveness expectations or falls over entirely. The goal is to make the system fail in some way.
- Soak testing: performing load or stress testing over a significant amount of time to uncover areas of instability, such as memory leaks. This may be considered a form of breakpoint testing, although the goal is usually to ensure the system survives the soak test.
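A minimal load test in the spirit of the list above can be sketched with a thread pool: fire many concurrent requests at a stand-in handler and check both correctness and aggregate latency. The concurrency figures and the p95 metric are illustrative choices, not fixed rules.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(n):
    # Stand-in for a real request handler; does a little work.
    return sum(range(n))

def load_test(concurrency=50, requests=500):
    latencies = []

    def one_call(i):
        start = time.perf_counter()
        result = handle_request(100)
        latencies.append((time.perf_counter() - start) * 1000.0)
        return result

    # Expected-usage load: many requests in flight at once.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_call, range(requests)))

    # Correctness under load matters as much as speed.
    errors = sum(1 for r in results if r != sum(range(100)))
    latencies.sort()
    p95 = latencies[int(len(latencies) * 0.95)]
    return {"errors": errors, "p95_ms": p95}

stats = load_test()
assert stats["errors"] == 0, "handler produced wrong results under load"
```

Raising the concurrency beyond expected usage turns the same harness into a stress test; running it for hours turns it into a soak test.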
USABILITY TESTING
The types of software testing discussed to this point are usually performed by the company developing the software. Usability testing, however, involves the intended users of the system. The goal is to uncover a wide range of possible defects, including:
- Incomplete feature implementation
- Missing features
- Features or UI elements that are non-intuitive, difficult to use, clunky, or otherwise confusing
- Poor front-end performance (e.g. mobile application loading time, or UI responsiveness)
- Poor end-to-end performance (e.g. slow system response)
- Overall attractiveness (i.e. user interfaces should be pleasing to look at and use)
As patterns emerge in people’s responses to using new software, they’re distilled into improvements that are made iteratively and fed back into the usability testing process.
ACCEPTANCE TESTING
Of all the types of tests described, acceptance testing is, in my opinion, often overlooked, poorly defined, or too lightly covered academically. Good engineers with an excellent testing process may be unprepared when it comes to proving a given system or piece of software meets the terms of a contract or specification that was previously agreed upon. This type of testing aims to demonstrate and prove that a system meets the needs of the intended users within defined constraints.
This activity is tedious: it requires preparation, behind-the-scenes data gathering and number crunching (to prove, for example, that load tests and other tests pass and meet expectations), and a degree of showmanship as system functionality is demonstrated.
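That number crunching can be reduced to plain arithmetic against the agreed criteria. The sketch below is illustrative: the response-time figures and the acceptance thresholds are made-up stand-ins for measurements gathered during real load-test runs and criteria agreed with the customer up front.

```python
def percentile(samples, pct):
    """Value at the given percentile of a list of samples."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100.0))
    return ordered[index]

# Raw response times (ms) gathered during the load-test runs.
response_times_ms = [42, 55, 61, 48, 90, 73, 66, 51, 84, 95]
observed_errors = 0

# Acceptance criteria agreed upon before development began.
criteria = {"p95_under_ms": 100.0, "max_error_rate": 0.01}

report = {
    "p95_ms": percentile(response_times_ms, 95),
    "error_rate": observed_errors / len(response_times_ms),
}

# The system is accepted only if every agreed criterion is met.
accepted = (report["p95_ms"] <= criteria["p95_under_ms"]
            and report["error_rate"] <= criteria["max_error_rate"])
```

Keeping the raw samples alongside the verdict means the demonstration can answer "prove it" questions on the spot rather than after the meeting.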
CONCLUSION: CONTINUOUS TESTING
Have you noticed that I haven’t mentioned anything about a quality assurance (QA) team in this article so far? That was on purpose. Testing isn’t something that’s performed after software is thrown over the wall to a QA team separate from development (although QA personnel still play a very important and valuable role in software delivery). Software testing should be a far more fundamental activity to system development than most consider it to be. In fact, even at an executive level, testing should be front and center as software flaws equal lost users, and lost revenue.
I used to go by the rule that you need to define tests along with requirements. After all, if you cannot specify how to test something, how can you be sure you’re accurately describing what needs to be built? I’ve since modified this to define my main role as not a software developer, but a software deliverer. Since delivery implies that it works, this also means your primary role is to continuously test as you’re developing.
Is it possible to deliver bug-free code as part of systems that don’t fail? Both Maguire and I agree that it is. I believe that the key to this is through continuous (automated) testing, the philosophy that you always need to be ready to deliver software, and that testing should drive your process. A little help from some tools never hurt, either.