Software testing isn't quality assurance
Why hiring software developers to write great software tests (unit tests) is not the same as performing quality assurance.
Samuel Chan, Software Engineer
Good Software Testing
Edsger Wybe Dijkstra was an influential computer scientist. He was perhaps most popularly known for his conception of Dijkstra’s algorithm (pronounced “dike-struh”), which finds the shortest paths between nodes in a graph. It has wide-ranging applications, from logistics and road networks to the less physical: digital forensics, social studies (links between terrorists; government structures) and counter-fraud intelligence. In 1972, he became the first person who was neither American nor British to win the Turing Award.
Program testing shows the presence, but never the absence of bugs
— Edsger Wybe Dijkstra
On the topic of software testing, Dijkstra once quipped that program testing can be used to show the presence of bugs, but never to show their absence.
And such is the strength of this axiom that it presents something of a circular paradox in and of itself. Software engineering teams write good, exhaustive tests, but the most valuable tests are the ones that reveal flaws in the software design. If the software engineers were able to write tests that reveal these bugs, wouldn’t they have had the same mental clarity to prevent the bugs from creeping in in the first place?
What the axiom posits is that software teams (data scientists, data engineers, software developers, etc.) cannot prove the absence of bugs, much like you cannot prove the absence of black swans; you can merely falsify the claim (as happened when the first black swan was spotted in Western Australia in 1697, falsifying the presumption that all swans are white).
How to write good tests
A common pattern of software testing is to first check for failed tests, perform code changes that address these failures, and re-run the test suite. This routine implicitly assumes that the software developer is fully aware of all conditions that will trigger an unexpected behavior. It assumes that all such conditions are being tested.
Consider the following, adapted from TaskQuant, a command-line tool that I wrote:
from itertools import groupby

from tasklib import TaskWarrior  # TaskWarrior interface provided by the tasklib package


def score_accum(task_path, verbosity=False):
    """
    Create a scoreboard using the 'score' attribute of tasks
    """
    tw = TaskWarrior(data_location=task_path)
    completed = tw.tasks.completed()
    total_completed = len(completed)

    cl = list()
    for task in completed:
        cl.append(
            (
                task["project"],
                task["end"].date(),
                task["effort"] or "",
                task["score"] or 0,
                task["tags"],
            )
        )

    # sort cl by the "end" date (index 1 of each tuple)
    cl_sorted = sorted(cl, key=lambda x: x[1])

    # sum the "score" (index 3) of tasks completed on the same date
    agg_date = [
        [k, sum(v[3] for v in g)] for k, g in groupby(cl_sorted, key=lambda x: x[1])
    ]
    agg_date_dict = dict(agg_date)
    startdate = agg_date[0][0]
    enddate = agg_date[-1][0]
The function above will, rightfully and expectedly, fail if any of the following conditions are met:
- any task in the completed list has fewer than 4 attributes; the sum(v[3]) will fail with an IndexError: list index out of range
- any task in the completed list has a missing project, end or tags; cl.append((task['tags'])) will raise a KeyError: 'tags'
- if the end attribute isn’t a datetime type, it will fail with an AttributeError: 'str' object has no attribute 'date'
- if the fourth attribute isn’t numeric, sum(v[3] for v in g) will fail with a TypeError: sum() can't sum strings [use ''.join(seq) instead]

verbosity is not used in the function above, so it does not matter. But task_path must point to a valid path, and if it doesn’t we should expect a NotADirectoryError to be raised. Each of these conditions can be expressed as an expected-exception test, as sketched below.
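A minimal sketch using pytest.raises follows. Note that the import path and the fixture directories under tests/fixtures/ are hypothetical stand-ins for task archives prepared with deliberately broken data; they are not part of TaskQuant itself:

import pytest

from taskquant.score import score_accum  # hypothetical import path for this sketch


def test_missing_tags_raises_keyerror():
    # an archive where at least one completed task has no 'tags' attribute
    with pytest.raises(KeyError):
        score_accum("tests/fixtures/missing_tags")


def test_invalid_path_raises():
    # a path that does not exist should surface NotADirectoryError
    with pytest.raises(NotADirectoryError):
        score_accum("tests/fixtures/does_not_exist")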
All of the above conditions will throw an exception and fail immediately. What’s less obvious are bugs that do not throw an exception (they fail quietly), or logical errors that we’ve made that go uncaught. For example, consider:
cl_sorted = sorted(cl, key=lambda x: x[1])
We expect the above to sort the list based on the second element (index 1), which refers to the date. If we had incorrectly passed index 2, it would have used effort as the key, the program would execute to the end and return an incorrect aggregation, and both the end user and we would be none the wiser.
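One way to catch this class of silent bug is to assert a property of the result rather than merely the absence of exceptions. A sketch, assuming the sorting step were factored out into its own small helper (the function name sort_by_end_date is hypothetical, not how TaskQuant is actually structured):

from datetime import date


def sort_by_end_date(cl):
    # the same sorting step as in score_accum, isolated so it can be tested directly
    return sorted(cl, key=lambda x: x[1])


def test_sorted_by_end_date():
    cl = [
        ("projA", date(2022, 3, 2), "", 5, ["work"]),
        ("projB", date(2022, 1, 15), "", 3, ["home"]),
    ]
    result = sort_by_end_date(cl)
    # the property we actually care about: end dates are in non-decreasing order
    assert all(a[1] <= b[1] for a, b in zip(result, result[1:]))

Had we sorted on the wrong index, this assertion would fail even though no exception is ever raised.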
Suppose you write tests to account for the cases we identified above, each with its own corresponding assertions. We supply a path to our sample task archive through task_path, run the test suite, and wait for the much-coveted green dots in pytest (a green dot represents a passing test):
===================================== test session starts =====================================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /home/samuel/writing-good-tests
plugins: xdist-1.34.0, hypothesis-6.27.3
collected 5 items
tests/application.py .....
====================================== 5 passed in 0.85s ======================================
Hurray! 🎉
We’ve tested all the scenarios where our tool can possibly break. We have, right?
Not so fast. We’ve merely verified that our tests behave as expected under those precisely defined circumstances. Just as you can’t present evidence of “no black swans”, you can only disprove the claim by waiting for somebody to spot the world’s first black swan. As the saying goes, absence of evidence should not be taken as evidence of absence, and that is how I interpret the essence of Dijkstra’s position on software testing.
At a minimum, consider the following. Are there any other conditions where our tool may fail with an exception that our test conditions don’t yet expect?
There is, of course, the fact that any code change we make to address a bug will introduce new scenarios, and we have to write tests for those scenarios as well. Then there is a second, more undesirable situation: we may have overlooked conditions that lead to our program throwing an error that we didn’t catch. One such scenario is this:
- if total_completed is 0 (the user has no completed tasks), startdate = agg_date[0][0] will fail with an IndexError: list index out of range
This isn’t a scenario that we accounted for in the original tests. We have 5 passing tests out of 5, but none of them addresses this scenario. This isn’t the case where a user has passed an invalid path (NotADirectoryError); it is the case where a user has passed a valid location with a legitimate source of data, but simply has not completed any tasks yet.
Having identified this, we should add a 6th unit test and make the following code change:
import warnings


def score_accum(task_path, verbosity=False):
    ...
    if total_completed < 1:
        # warn and return early instead of hitting the IndexError downstream
        return warnings.warn(
            "A curious case of 0 completed tasks. Check to make sure the path to Taskwarrior's .task is set correctly or try to complete some tasks in Taskwarrior!",
        )
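The corresponding sixth test can then assert that the warning is emitted. A sketch, where tests/fixtures/empty_tasks is a hypothetical fixture directory: a valid Taskwarrior data location that simply contains no completed tasks:

import pytest


def test_zero_completed_tasks_warns():
    # valid location, legitimate data source, but no completed tasks yet
    with pytest.warns(UserWarning, match="0 completed tasks"):
        score_accum("tests/fixtures/empty_tasks")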
Quality assurance is not provable
What all of this goes to show is that software developers can write tests to:
- Check for the existence of bugs
- Check for the absence of known bugs
- Check that a code change has adequately addressed the bug based on known conditions
And no amount of software testing can:
- Prove the absence of known bugs
- Prove the absence of unknown bugs
- Prove that a code change has entirely eliminated the bug
So the goals of writing great software tests are distinctly different from the objectives of quality assurance.
Good software developers know what to test, and what not to. A developer doesn’t have to test that cl_sorted is a list before feeding it through to the groupby; we know it is a list from the upstream operation. The following line in your test is hence unnecessary:
assertIsInstance(cl_sorted, list)
If you’re interested in this topic, I have a video where I walk you through how I write unit tests for a smart contract. I walk through the Solidity code line by line (familiarity with Solidity or Ethereum smart contracts is not required), and we write Python tests (using pytest) for 100% coverage, handling expected exceptions as we go. Here is the video:
If you want to see more on building and publishing TaskQuant:
- Build w/ Python 1: Command line productivity scoreboard for TaskWarrior
If you need help with software engineering, or any data science work for your project, reach out and we’d love to explore ways of working together.