False test coverage

Code coverage is a tool for checking that a test suite exercises the entire application. It works by running the test suite and recording which parts of the code are executed along the way. Code that isn’t executed during the tests is marked as uncovered. High test coverage is typically treated as a measure of code quality; many projects advertise their coverage level even before explaining what the project is for.

High code coverage can be misleading, however. Coverage doesn’t check whether code is actually being tested; it only checks whether it is being executed. When developers write tests, they are expected to test all of the edge cases and likely failure modes of the code under test. When a piece of code has unit tests dedicated to it, and those tests use all of the code, we can reasonably assume that the code is being tested properly.
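The difference between executing and testing can be illustrated with a minimal sketch of a line tracer built on the standard sys.settrace hook. This is not how any particular coverage tool is implemented; the names absolute and record_lines are made up for the illustration.

```python
import sys


def absolute(value):
    if value < 0:
        return -value
    return value


def record_lines(func, *args):
    """Call func(*args) and return the line offsets executed inside func.

    Offsets are relative to the line holding the def statement.
    """
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # not the function we are interested in
        if event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)  # the return value is deliberately ignored
    finally:
        sys.settrace(previous)
    return executed


# Two calls, no assertions: both branches of absolute() are executed.
covered = record_lines(absolute, -3) | record_lines(absolute, 3)
```

Both calls discard the return value, so nothing about the behaviour of absolute is verified, yet every line of its body ends up in covered.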

What does it mean for code to have ‘unit tests dedicated to it’? Typically we express the relation between code and its tests through file and symbol names. A function named process_foo(bar) may have a test named test_process_foo, and perhaps also test_process_foo_no_bar. These relationships are usually informal: to keep things readable, names may be abbreviated. Scenarios involving multiple items may also get a group name that is only tangentially similar to the units being tested. As a result, there is no reliable way to check which test belongs to which piece of code.

False coverage

Consider this example:

def squared(value):
    return value + value  # incorrect for demonstration purposes

class Square:
    def __init__(self, side):
        if side <= 0:
            raise ValueError('invalid dimension')
        self.side = side

    @property
    def area(self):
        return squared(self.side)

With these tests:

import unittest

from shape import Square  # assumes the code above is saved as shape.py


class TestSquare(unittest.TestCase):
    def test_square_constructor(self):
        self.assertEqual(2, Square(side=2).side)
        with self.assertRaises(ValueError):
            Square(side=0)
        with self.assertRaises(ValueError):
            Square(side=-1)

    def test_square_area(self):
        self.assertEqual(4, Square(side=2).area)

This code contains a very big bug: the squared function is entirely incorrect. Still, the code has 100% coverage, and the tests pass, because 2 + 2 happens to equal 2 × 2. The developer has foreseen all of the edge cases of the Square class. The real problem, of course, is that the developer failed to write tests for the squared function, and code coverage failed to notice that the function was never tested, even though it was used.

Utility and library functions are especially susceptible to this kind of inflated coverage number.

Solution 1: cluster your code

When you cluster your code and tests, you can run coverage on smaller clusters of code. One could, for example, test on a per-package basis and invoke coverage with the --source argument, like this: coverage run --source myapp.foo nosetests tests.myapp.foo. This way, tests for one package will not count as valid tests for other packages. This works especially well if you are in the habit of centralizing utility functions. The disadvantage is that it complicates the build process.

Solution 2: detailed white list

Package-based exclusion is nice and all, but it is rather crude. Ideally, we’d want fine-grained control over which tests apply to what code. To do this, we need to tap into the internals of coverage. We can hijack the trace function, and only call it when our filter tells us we want this bit of code covered. This is somewhat complicated, so I factored it out into a Python package named coverage_filter. It can be found on GitHub and PyPI.
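The idea of filtering the trace function can be sketched with nothing but the standard sys.settrace hook. This is not the coverage_filter implementation, and it does not touch coverage itself; only_cover and EXECUTED are hypothetical names used to show the principle on a stripped-down version of the area example.

```python
import functools
import sys

EXECUTED = set()  # (function name, line offset) pairs that passed the filter


def only_cover(*names):
    """Hypothetical decorator: record lines only for the functions in names."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            def tracer(frame, event, arg):
                code = frame.f_code
                if code.co_name not in names:
                    return None  # filtered out: this frame is not traced
                if event == "line":
                    offset = frame.f_lineno - code.co_firstlineno
                    EXECUTED.add((code.co_name, offset))
                return tracer

            previous = sys.gettrace()
            sys.settrace(tracer)
            try:
                return test_func(*args, **kwargs)
            finally:
                sys.settrace(previous)
        return wrapper
    return decorator


def squared(value):
    return value + value  # deliberate bug from the example


def area(side):
    return squared(side)


@only_cover("area")
def test_area():
    assert area(2) == 4


test_area()
covered_functions = {name for name, _ in EXECUTED}
```

Although test_area executes squared, only area is recorded, so squared would still show up as uncovered until it gets a test of its own.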

With coverage_filter, the tests for the Square example would look like this (assuming the Square class is stored in shape.py):

import unittest

from coverage_filter import CoverageFilter

from shape import Square


class TestSquare(unittest.TestCase):
    @CoverageFilter('shape.py:__init__')
    def test_square_constructor(self):
        self.assertEqual(2, Square(side=2).side)
        with self.assertRaises(ValueError):
            Square(side=0)
        with self.assertRaises(ValueError):
            Square(side=-1)

    @CoverageFilter('shape.py:area')
    def test_square_area(self):
        self.assertEqual(4, Square(side=2).area)

Running nosetests --with-cover --cover-package shape will result in:

Name       Stmts   Miss  Cover
------------------------------
shape.py       9      1    89%
----------------------------------------------------------------------
Ran 2 tests in 0.007s

This clearly shows that one line is not covered (the body of the incorrect squared function), and implicitly invites the user to write specialized tests for it.