Code coverage is a tool used to make sure that the test suite actually tests the entire application. It does this by executing the test suite and checking which parts of the code get used during the tests. Code that isn’t executed during the tests is marked as uncovered. High test coverage is typically considered a metric of code quality; many projects advertise their level of code coverage even before explaining the purpose of the project.
High code coverage can be misleading, however. The coverage methodology doesn’t check whether the code is actually being tested; it only checks whether it is being executed. When developers write tests, they are expected to test all of the edge cases and likely failure modes of the code under test. When a piece of code has unit tests dedicated to it, and all of that code is exercised by those tests, we can reasonably assume that the code is being tested properly.
What does it mean if code has ‘unit tests dedicated to it’? Typically, we express the relation between code and its tests through file and symbol names. A function named process_foo(bar) may have a test named test_process_foo, and perhaps also test_process_foo_no_bar. These relationships are usually informal: to keep things readable, names may be abbreviated. Scenarios involving multiple items may also get a group name that’s only tangentially similar to the units being tested. As a result, there is no way to check which test belongs to which piece of code.
Consider this example:
    def squared(value):
        return value + value  # incorrect for demonstration purposes

    class Square:
        def __init__(self, side):
            if side <= 0:
                raise ValueError('invalid dimension')
            self.side = side

        @property
        def area(self):
            return squared(self.side)
With these tests:
    import unittest

    class TestSquare(unittest.TestCase):
        def test_square_constructor(self):
            # A valid side is stored as given
            self.assertEqual(2, Square(side=2).side)
            # Zero and negative sides are rejected
            with self.assertRaises(ValueError):
                Square(side=0)
            with self.assertRaises(ValueError):
                Square(side=-1)

        def test_square_area(self):
            self.assertEqual(4, Square(side=2).area)
This code contains a very big bug: the squared function is entirely incorrect. Still, this code has 100% coverage, and the area test passes only because 2 + 2 happens to equal 2 * 2. The developer has foreseen all of the edge cases of the Square class. The real problem is, of course, that the developer failed to write tests for the squared function, and code coverage failed to recognize that the function was not being tested, even though it was being used.
Utility and library functions are especially susceptible to this kind of inflated coverage number.
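To illustrate what a dedicated test for squared could look like, here is a minimal sketch. The names make_squared_test and run_against and the probe values are assumptions made for this example and are not part of the original code. The factory exists only so that the same test can be run against both the buggy function and a corrected one.

```python
import unittest


def squared(value):
    return value + value  # the buggy implementation from the example


def make_squared_test(func):
    """Build a TestCase that checks `func` against known squares (hypothetical helper)."""

    class TestSquared(unittest.TestCase):
        def test_squared(self):
            # Probe several values; 2 alone would hide the bug (2 + 2 == 2 * 2).
            for value, expected in [(0, 0), (2, 4), (3, 9), (-4, 16)]:
                with self.subTest(value=value):
                    self.assertEqual(expected, func(value))

    return TestSquared


def run_against(func):
    """Run the test quietly and report whether every check passed."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(make_squared_test(func))
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


if __name__ == '__main__':
    print('buggy squared passes:', run_against(squared))
    print('fixed squared passes:', run_against(lambda value: value * value))
```

A test like this fails for the buggy function as soon as it probes any value other than 0 or 2, which is exactly the feedback the coverage number did not give.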
When you cluster your code and tests, you can run coverage on smaller clusters of code. One could, for example, test on a per-package basis and invoke coverage with the --source argument, like this: coverage run --source myapp.foo nosetests tests.myapp.foo. This way, tests from one package will not be considered valid tests for other packages. This works especially well if you have the habit of centralizing utility functions. The disadvantage is that this complicates the build process.
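To give an idea of the extra build work involved, the following sketch generates one coverage invocation per package. It only builds the command lines and does not run them. The helper name coverage_commands, the package list, and the assumption that the tests for package X live in tests.X are all illustrative, modelled on the command shown above.

```python
def coverage_commands(packages, test_root='tests'):
    """Build one coverage invocation per package (hypothetical helper)."""
    commands = []
    for package in packages:
        commands.append([
            'coverage', 'run',
            '--source', package,        # only measure this package
            'nosetests',
            test_root + '.' + package,  # only run this package's own tests
        ])
    return commands


if __name__ == '__main__':
    # Print the commands so a build script or a human can run them in turn.
    for argv in coverage_commands(['myapp.foo', 'myapp.bar']):
        print(' '.join(argv))
```

Each run writes its own coverage data, so a build script along these lines would probably also need to produce a report after every run or keep the data files apart.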
Package-based exclusion is nice and all, but it is rather crude. Ideally, we’d want fine-grained control over which tests apply to which code. To do this, we’ll need to tap into the internals of coverage. We can hijack the trace function and only call it when our filter tells us we want this bit of code covered. This is somewhat complicated, so I factored it out into a Python package named coverage_filter. It can be found on GitHub and PyPI.
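The general idea of filtering a trace function can be illustrated with the standard library alone. The sketch below is not how coverage_filter or coverage is implemented; it is a standalone toy built on sys.settrace, and the names trace_filtered and allowed_names are made up for this example. It records executed lines only for functions whose name passes the filter.

```python
import sys


def squared(value):
    return value + value  # the buggy helper from the example


def area(side):
    return squared(side)


def trace_filtered(func, allowed_names, *args):
    """Run func(*args) and record executed lines only for allowed function names."""
    executed = set()

    def local_tracer(frame, event, arg):
        # Called for events inside a frame we decided to trace.
        if event == 'line':
            executed.add((frame.f_code.co_name, frame.f_lineno))
        return local_tracer

    def global_tracer(frame, event, arg):
        # Called whenever a new frame is entered. Returning None
        # switches off line tracing for that frame: this is the filter.
        if frame.f_code.co_name in allowed_names:
            return local_tracer
        return None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier tracer
    return executed


if __name__ == '__main__':
    covered = trace_filtered(area, {'area'}, 2)
    print(sorted({name for name, _ in covered}))
```

Here squared runs as part of area, yet none of its lines are recorded, because the filter only admits area. A coverage tool wired up this way would report squared as uncovered until a test explicitly targets it.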
The above example would look like this (assuming the Square class is stored in shape.py):
    import unittest

    from coverage_filter import CoverageFilter
    from shape import Square

    class TestSquare(unittest.TestCase):
        @CoverageFilter('shape.py:__init__')
        def test_square_constructor(self):
            self.assertEqual(2, Square(side=2).side)
            with self.assertRaises(ValueError):
                Square(side=0)
            with self.assertRaises(ValueError):
                Square(side=-1)

        @CoverageFilter('shape.py:area')
        def test_square_area(self):
            self.assertEqual(4, Square(side=2).area)
Running nosetests --with-cover --cover-package shape will result in:

    Name       Stmts   Miss  Cover
    ------------------------------
    shape.py       9      1    89%
    ----------------------------------------------------------------------
    Ran 2 tests in 0.007s
This clearly shows that one line is not covered (that’s the incorrect squared function), and implicitly invites the user to write specialized tests.