What is the minimum acceptable code coverage?

How do you check code coverage automatically?


I'm in the process of setting up a Bamboo build server for some new projects, to push for TDD in a CI/CD workflow. Sure, unit tests are great, but only if they are actually run.

This may fit better in a Git pre-receive hook for specific branches (e.g. development and major release branches), but how should code coverage be enforced, if at all? I'd love to be able to trust the committers to make sure the code is covered, but how can that be maintained without care and consistency slipping?

In short: I'd like to hear how others enforce test coverage as an automated process during the commit or build phase.

Replies:


You shouldn't automatically enforce code coverage.

This is like enforcing a maximum number of lines of code per method: yes, most methods should be shorter than, say, 20 LOC, but there are valid cases where a method will be longer.

In the same way, requiring a certain percentage of code coverage per class can have undesirable consequences. For example:

  • Boilerplate classes or classes created by code generators may not need to be tested at all. Forcing developers to test them brings no benefit and carries a significant cost in time spent.

  • Simple code handling unimportant parts of the application does not necessarily have to be tested.

  • In some languages, parts of the code cannot be tested. I ran into this in C# with anonymous methods, in a library where I really wanted 100% code coverage. Cases like these can be demoralizing for developers.

More importantly, code coverage should be proportional to two aspects of the code: how critical it is, and how complicated it is:

  • Code with complicated logic that is part of an application's main features should be tested carefully, because errors or regressions there can have serious consequences.

  • Simple code handling a feature nobody uses might have only basic tests covering the basic cases.

Of course you can continue to use code coverage as a metric, especially to compare how different teams are doing: there may be teams that are less disciplined and more reluctant to test. In such cases, you may want to combine this metric with others, such as the number of bugs, the time spent fixing them, or the number of comments raised during code review.

You may also want, for individual projects where it makes sense (be sure to exclude prototypes, generated code, CRUD, etc.), to enforce at least a certain level of code coverage (e.g. 60%¹), while letting developers mark specific classes as excluded from code coverage². In that case, the check can be part of a verification step that fails the build if coverage is below the required minimum. I would do it in the build phase, not the commit phase, as you are not expected to run unit tests during a commit.
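Such a build-phase gate can be a short script that parses the coverage report and exits non-zero below the threshold. A minimal sketch in Python, assuming a Cobertura-style coverage.xml (report formats and attribute names vary between tools):

    import sys
    import xml.etree.ElementTree as ET

    MINIMUM = 0.60  # the 60% floor from footnote 1

    def main(report_path):
        # Cobertura-style reports carry the overall ratio in a "line-rate"
        # attribute on the root element; other tools name this differently.
        root = ET.parse(report_path).getroot()
        line_rate = float(root.get("line-rate", "0"))
        print("Line coverage: {:.1%} (minimum {:.0%})".format(line_rate, MINIMUM))
        # A non-zero exit code makes Bamboo (or any CI server) fail the build.
        return 0 if line_rate >= MINIMUM else 1

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "coverage.xml"))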


¹ I would consider 60% a reasonable minimum based on my code bases: nearly every project or class with less than 60% code coverage is effectively untested. This can vary widely from language to language and from company to company (in some companies, 0% is the norm). Discuss with your team what is normal for them and what is too high: maybe they hit 95% all the time and could easily reach 99%, or maybe they struggle to raise their coverage from 70% to 75%.

² Given that code reviews will reveal potential abuse, don't be afraid to give developers this option. It is similar to being able to exclude parts of the code from being checked by linters or style checkers. JSLint, StyleCop, and Code Analysis are three examples where exclusion is supported and genuinely useful without encouraging abuse.
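In Python, for instance, coverage.py supports exactly this kind of opt-out via a pragma comment (the class below is a made-up boilerplate example):

    class GeneratedDto:
        """Hypothetical boilerplate class excluded from coverage."""

        def to_dict(self):  # pragma: no cover
            # coverage.py skips blocks marked with this pragma, so this
            # class no longer drags the project's percentage down.
            return vars(self)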






Consider code along the following lines (sketched here in C; the names are placeholders for the kind of guard being described):
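    #include <math.h>

    extern void fail_over_to_backup(void);   /* hypothetical recovery hook */

    double magnitude(double a, double b)
    {
        double sum_sq = a * a + b * b;        /* mathematically never negative */

        /* ... in real code, several lines may separate this computation
           from the square root below ... */

        if (sum_sq < 0.0) {
            /* Unreachable by any test case, yet it guards sqrt()
               against a hardware-flipped sign bit. */
            fail_over_to_backup();
        }
        return sqrt(sum_sq);
    }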

There is no way to create a test case that reaches the other branch. Yet if this were safety-critical flight software, people would be all over the author's case if this protection against passing a negative value to the square root were not in place. Typically, the computation of the value and the taking of the square root are separated by several lines of code, and in the meantime a cosmic ray can flip the sign bit.

In fact, flight software has invoked the equivalent of that branch multiple times. The typical response is to move control to a redundant computer, gracefully shut down the suspect computer, restart it, and finally bring it back as a backup. That is far better than the primary flight computer going insane.

Even in flight software, it is impossible to achieve 100% code coverage. The people who claim to have achieved it either have trivial code or lack checks against these should-be-impossible events.





Test coverage is a useful measure of the overall health of your project. With high test coverage, you can make an informed decision about whether the software will behave as expected when deployed; with low test coverage, you are only guessing. There are tools that measure coverage automatically; they usually work by running the program in a debugging context or by inserting bookkeeping operations into the executed code.

There are different types of tests and different types of coverage metrics. Common coverage metrics include function coverage, statement coverage, branch coverage, and condition coverage, although there are more.
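The difference between the metrics is easy to see in code. In the hypothetical sketch below, a single test executes every statement yet takes only one of the two branches:

    def clamp(value, limit):
        if value > limit:    # the first test below enters this branch...
            value = limit
        return value

    # ...giving 100% statement coverage with a single case:
    assert clamp(5, 3) == 3

    # But the implicit "else" path is never taken, so branch coverage
    # is only 50%. A second case closes the gap:
    assert clamp(2, 3) == 2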

  • Unit tests check whether the implementation of a conceptual unit (module, class, method, ...) conforms to its specification (in TDD, the test is the specification). Units without their own unit tests are a red flag, although they may be covered by integration-style tests.

    Unit tests should imply almost complete function coverage. Since the unit tests exercise the entire public interface of the unit, there should be no functions left untouched by them. When introducing unit tests into an existing code base, function coverage is a rough indicator of progress.

    Unit tests should aim for good statement coverage (75%–100%); statement coverage here serves as a quality metric for the tests. Full coverage is not always possible, and you can probably spend your time better than on pushing coverage past 95%.

    Branch and condition coverage are more demanding. The more complicated or important the code, the higher these metrics should be. For unspectacular code, however, high statement coverage is usually sufficient (and already implies branch coverage of at least 50%). Looking at a unit's coverage report can help in creating better test cases.

  • Integration tests check whether several units work together correctly. Integration tests can be very valuable without scoring high on any coverage metric: while they usually exercise a large part of their units' interfaces (i.e., they have high function coverage), the internals of those units have already been covered by the unit tests.

It is a good idea to run tests before merging code into a main branch. However, computing test coverage metrics for the entire program is a lengthy process and is better left to a nightly build. A good compromise, if you can work out how to do it, is to run only the modified tests, or the unit tests of modified units, in a Git hook. Test failures are unacceptable for anything other than work-in-progress commits. When selected coverage metrics fall below a threshold (e.g., coverage below 80%, or a new method introduced without any tests), treat it as a warning that gives the developer a chance to address the potential problem. Sometimes, however, there are good reasons to ignore such warnings, and developers should be able to do so.
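One way to realize that compromise is a pre-push hook that maps changed files to their tests and runs only those. A sketch in Python; the branch name, directory layout, and the src/-to-tests/ naming convention are all assumptions:

    #!/usr/bin/env python3
    """Sketch of a .git/hooks/pre-push script: run only the tests affected
    by the outgoing changes."""
    import os
    import subprocess
    import sys

    def changed_files():
        # Files changed relative to the (assumed) upstream branch.
        out = subprocess.run(
            ["git", "diff", "--name-only", "origin/master...HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        return [f for f in out if f.endswith(".py")]

    def main():
        changed = changed_files()
        tests = {f for f in changed if f.startswith("tests/")}
        # Map changed source modules to test files by naming convention.
        tests |= {"tests/test_" + f.rsplit("/", 1)[-1]
                  for f in changed if f.startswith("src/")}
        tests = {t for t in tests if os.path.exists(t)}
        if not tests:
            return 0  # nothing test-relevant changed
        # A non-zero exit code aborts the push.
        return subprocess.run(
            [sys.executable, "-m", "pytest", *sorted(tests)]
        ).returncode

    if __name__ == "__main__":
        sys.exit(main())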

Testing is good, but too much of it can become a nuisance. Quick, relevant feedback helps direct attention to quality, but you don't want it to get in the way of adding value. Personally, I prefer to run tests manually, as that gives me faster feedback on the part I'm working on. Before a release, I focus on quality and use static analysis, profilers, and code coverage tools to find problem areas (some of these steps are part of a pre-release test suite).


Nobody has mentioned mutation testing yet. The idea behind it is very practical and intuitive.

It works by automatically making small changes to the source code (e.g., flipping ">" to "<"), hence "mutation", and checking whether these random changes break any test.

If not, then either (a) the code in question may not be needed, or (b), more likely, that code is not covered by a test, because breaking it went undetected.
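A hand-rolled illustration (real tools such as PIT for Java or mutmut for Python automate the mutation and the bookkeeping):

    def is_adult(age):            # original code
        return age >= 18

    def is_adult_mutant(age):     # mutant: ">=" flipped to ">"
        return age > 18

    def run_suite(fn):
        # A weak test suite that never probes the boundary:
        assert fn(30) is True
        assert fn(5) is False

    run_suite(is_adult)           # passes, as expected
    run_suite(is_adult_mutant)    # also passes: the mutant "survives",
                                  # revealing that age == 18 is untested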


Code coverage data can of course be gathered automatically, but for the reasons others have discussed, automated decisions should not be based on it. (Too fuzzy, too much room for error.)

The next best thing, however, is an established process in which the project's current code coverage status is regularly reviewed by people, for example with daily reports arriving in the project manager's inbox.

In corporate environments, this is achieved with continuous integration tools such as Hudson, Jenkins, etc. These tools are configured to periodically check out the entire project from the source code repository, build it, run the tests, and generate reports. Of course, they can also be configured to run the tests in code coverage mode and include the results in those reports.

JetBrains also makes TeamCity, which seems a bit easier to me and is well suited to smaller software shops.

That way, the project manager receives regular code coverage reports, applies his own judgment, and acts as the enforcer when necessary.



Contrary to the general opinion here, code coverage can be checked automatically. Rational's Purify tool suite included a code coverage feature. It relied on instrumenting all of the functions (it worked on the binaries, patching each function and call with some extra code) so that it could write out data, which was then displayed to the user. Pretty cool technology, especially at the time.

But even though we tried really hard to reach 100% coverage, we only ever got to about 70%! Chasing full coverage is a pointless exercise.

When it comes to unit tests, though, I consider 100% coverage even more pointless. Unit-test the methods that need unit tests, not every getter and setter! Unit testing should be about exercising the tricky methods (or, frankly, classes), not about ticking boxes in a process or tool so that it shows nice green checkmarks.


I built a tool for this:

https://github.com/exussum12/coverageChecker

Usage:
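    # invocation as per the project's README; the last argument is the
    # minimum percentage of the changed lines that must be covered
    vendor/bin/diffFilter --phpunit diff.txt clover.xml 70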

This fails if less than 70% of the diff is covered by unit tests.

Get the diff with:
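    git diff origin/master...HEAD > diff.txt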

(This assumes you branched from master and are merging back into master.)

Ignore the phpunit flag; it is really just a Clover check, so anything that can output the Clover format can use it.

As other answers have suggested, setting this threshold to 100% is not a good idea.
