Engineering quality checks

April 27, 2026 · View on GitHub

This is part of a broader quality framework and forms a key part of the metrics section.

Summary

Quality checks are at the heart of good engineering, and are essential for rapid and safe delivery of software changes. This page provides an index of the various quality checks described within our principles, patterns and practices.

The checks are classified here based on the concerns they help to address:

  • Functionality
  • Security
  • Resilience
  • Maintainability

Usage

All applicable quality checks should be applied. Not all checks are universal, i.e. applicable in all contexts. For example, accessibility testing is only applicable to applications with a user interface.

The majority of these checks should be automated via continuous integration / continuous deployment: the optimal sequencing of these checks within CI/CD pipelines will depend on the project's branching strategy, deployment strategy, etc.

All of these checks are important, even where their purpose overlaps with other checks in the list. For example, comprehensive functional testing could be achieved without unit testing, relying only on the other functional test types on this list, but this would result in a very long-running and inefficient test suite, precluding fast feedback and impeding the rapid and safe delivery of software changes. For further details please see test practices, and especially the test pyramid.

Although we separate resilience checks into their notional components, when it comes to implementation these may be joined up where useful. For example, an automated test suite may test the application for peak load and then drop down to sustained load in order to conduct a soak test.

RAG scale

We rate our applications against each of these checks as follows:

  • Green = the quality check is applied frequently and consistently (in practice this typically means automated via continuous integration / continuous deployment), the output of the check is a quality gate (as opposed to just a warning / for information), and the tolerances for that quality gate (e.g. code coverage %) are agreed and understood.
  • Amber = the quality check is applied, but not all conditions for green are met - for example the check generates warnings that may or may not be acted on, or is executed on an ad-hoc basis and cannot offer a consistent quality gate guarantee.
  • Red = the quality check is not applied.
  • N/A = the quality check is not applicable.
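The scale above can be sketched as a small function. The `CheckStatus` fields below are a hypothetical summary of the conditions for green, not part of any endorsed tooling:

```python
from dataclasses import dataclass

@dataclass
class CheckStatus:
    applicable: bool         # does the check apply in this context?
    applied: bool            # is the check run at all?
    automated: bool          # run frequently and consistently, e.g. via CI/CD
    gating: bool             # does failure block the build (vs. just warn)?
    tolerances_agreed: bool  # e.g. an agreed coverage % for the gate

def rag_rating(check: CheckStatus) -> str:
    """Map a quality check's status onto the RAG scale described above."""
    if not check.applicable:
        return "N/A"
    if not check.applied:
        return "Red"
    if check.automated and check.gating and check.tolerances_agreed:
        return "Green"
    return "Amber"
```

For example, a check that runs in CI but only emits warnings (`gating=False`) rates as Amber.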

Tracking progress

We recommend tracking progress on an Engineering Quality dashboard, for example:

Example Dashboard

Details

Each quality check below is listed with its classification, applicability, what it means, why we care, tolerances for green, endorsed tools / configuration, and further details (where available).
Accessibility test
  • Classification: Other
  • Applicability: Universal
  • What it means: The practice of making applications usable by as many people as possible.
  • Why we care: It is a regulatory requirement that our applications are accessible to as many people as possible. Catching accessibility failures up front is essential to maximise the accessibility of our applications.
API / contract test
  • Classification: Functionality
  • Applicability: Contextual
  • What it means: Check whether the API interface adheres to the agreed contract.
  • Why we care: Any API interface is an integration point with another component or software system. Extra care has to be taken to ensure the compatibility and stability of that integration are maintained, so that we don't break applications that depend on our APIs.
  • Tolerances for green: Builds fail if any tests fail.
  • Endorsed tools / configuration: Postman
  • Further details: Automate Your API Tests with Postman
Capacity test
  • Classification: Resilience
  • Applicability: Contextual
  • What it means: Identify the application's breaking point in terms of an increasingly heavy load. Degradation may manifest itself as throughput bottlenecks, increasing response times, or rising error rates.
  • Why we care: Without this test, we don't know how much load the application can handle before the application breaks or degrades.
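As an illustrative sketch of the ramp-until-failure idea: step through increasing load levels and report the first level at which the error rate breaches a tolerance. The `fake_service` below simulates a system that degrades beyond 500 requests; a real capacity test would drive a deployed environment with a load-generation tool.

```python
def find_breaking_point(handle_request, loads, max_error_rate=0.01):
    """Ramp through increasing load levels and return the first load at
    which the error rate exceeds the tolerance, or None if none does."""
    for load in loads:
        errors = sum(1 for i in range(load) if not handle_request(i))
        if errors / load > max_error_rate:
            return load
    return None

# Simulated service that starts failing above 500 requests (illustrative only).
def fake_service(i):
    return i < 500

breaking_point = find_breaking_point(fake_service, loads=[100, 250, 500, 1000])
```

Here `breaking_point` comes back as `1000`, the first ramp step at which the error rate exceeded 1%.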
Chaos test
  • Classification: Resilience
  • Applicability: Contextual
  • What it means: Cause failures in a system to test the resiliency of that system and its environment, and our ability to respond to failures.
  • Why we care: Gives the team confidence that failures in a given environment will not lead to unplanned downtime or a negative user experience. Ensures that the team has the visibility (e.g. dashboards and alerts) to identify issues. Surfaces performance bottlenecks, weaknesses in system design, and tipping points that aren't visible through other types of testing. Helps the team understand their mean time to recovery (MTTR) and build muscle memory and confidence for recovery activities.
  • Tolerances for green: Regular (at least every couple of months) game days, and: builds fail if any test fails (note: these tests are slow, and are likely to be part of an infrequently-triggered, e.g. overnight, build); the tests cover whether the system self-heals, auto-scales, and alerts as expected.
  • Endorsed tools / configuration: aws-fis
Code review
  • Classification: Other
  • Applicability: Universal
  • What it means: A second person manually checking a code change.
  • Why we care: Quality check by a human, as opposed to via a tool.
  • Tolerances for green: Enforced and audited step within the workflow.
  • Endorsed tools / configuration: TBC
  • Further details: Code review guidance
Code smells
  • Classification: Maintainability
  • Applicability: Universal
  • What it means: Check whether the software code adheres to the principles, patterns and practices of writing clean code.
  • Why we care: The code is written once but read and executed many times. If the code is not clean, the cost and risk of making software changes both increase exponentially over time.
  • Tolerances for green: Must use SonarQube's default rules, profiles and gateways. The build pipeline must fail if the gateway is not met.
  • Endorsed tools / configuration: SonarQube
  • Further details: Clean Code: Smells and Heuristics
Code coverage
  • Classification: Maintainability
  • Applicability: Universal
  • What it means: The proportion of the application code which is executed (in this context: during testing).
  • Why we care: The higher the code coverage, the more thorough the testing, and therefore the higher the likelihood of detecting functional issues early.
  • Tolerances for green: Must use SonarQube's default rules, profiles and gateways. The build pipeline must fail if the gateway's test coverage is not met. For new code, must meet the coverage specified in the default SonarQube gateway. For legacy code, the coverage threshold can be lowered initially to deal with historic tech debt, but there must be a plan for increasing coverage over time.
  • Endorsed tools / configuration: SonarQube (in conjunction with testing frameworks)
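A sketch of such a coverage gate, assuming hypothetical thresholds: the 80% new-code figure mirrors SonarQube's default quality gate, while the legacy figure is an example of a temporarily lowered target that should rise over time.

```python
def coverage_gate(new_code_coverage, legacy_coverage,
                  new_code_threshold=0.80, legacy_threshold=0.50):
    """Return True if the build may proceed: new code must meet the default
    gate, legacy code a (temporarily) lower agreed figure."""
    return (new_code_coverage >= new_code_threshold
            and legacy_coverage >= legacy_threshold)
```

For example, 85% coverage on new code with 60% on legacy code passes, but 70% on new code fails the gate regardless of the legacy figure.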
Dead code
  • Classification: Maintainability
  • Applicability: Universal
  • What it means: Detecting unused code and files that are not needed.
  • Why we care: Code is written once but read and executed many times. The more code you have, the greater the risk of something going wrong.
Dependency scan
  • Classification: Security
  • Applicability: Universal
  • What it means: Check for security issues and vulnerabilities in dependent areas of code that are outside of our direct control.
  • Why we care: Without this we have no way of knowing about issues or security vulnerabilities in third-party components that we are not responsible for.
  • Tolerances for green: Must check against the CVE database. Must check dependencies of dependencies. Must fail the build if any High severity vulnerabilities are found. It should be easy to determine why the build failed: which vulnerability it was, and in which top-level dependency. Tools must include the ability to exclude accepted vulnerabilities; each exclusion should include a date at which it expires and the build fails again, and a description of why it is excluded.
  • Endorsed tools / configuration: One option is (other options are being added): dependency-check-maven
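The gating and exclusion rules above can be sketched as follows; the data shapes and CVE identifiers are hypothetical, not the output format of any particular scanner:

```python
from datetime import date

def dependency_gate(vulnerabilities, exclusions, today):
    """Return the findings that should fail the build: High severity
    vulnerabilities not covered by an unexpired exclusion.
    vulnerabilities: list of (cve_id, severity, top_level_dependency)
    exclusions: dict of cve_id -> (expiry_date, reason)"""
    failures = []
    for cve_id, severity, dependency in vulnerabilities:
        if severity != "High":
            continue
        exclusion = exclusions.get(cve_id)
        if exclusion and today <= exclusion[0]:
            continue  # accepted vulnerability, exclusion not yet expired
        failures.append((cve_id, dependency))  # report what failed, and where
    return failures
```

An empty result means the gate passes; once an exclusion's expiry date passes, the same finding fails the build again.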
Duplicate code scan
  • Classification: Maintainability
  • Applicability: Universal
  • What it means: Check whether the same code is used in multiple places.
  • Why we care: Duplicate code increases the cost and risk of making software changes, and that cost and risk will increase exponentially over time.
  • Tolerances for green: Must use SonarQube's default rules, profiles and gateways. The build pipeline must fail if the gateway is not met.
  • Endorsed tools / configuration: SonarQube
Integration test
  • Classification: Functionality
  • Applicability: Universal
  • What it means: Check interactions with other components and dependent systems, e.g. across microservices, authentication layers, databases and third-party systems. Ideally includes full end-to-end testing across all components.
  • Why we care: When components are developed in isolation, it's vital that we regularly test them working together. Changes in one component can break the whole system.
  • Tolerances for green: Builds fail if any tests fail.
Performance test
  • Classification: Resilience
  • Applicability: Contextual
  • What it means: Check whether application performance is acceptable at different levels of load. This may include: a baseline test (one-off) to establish how the system behaves; a smoke test to establish that the key functionality is working before performing longer tests; a regression test running a suite of repeatable test cases to validate existing functionality; and a load test to understand the system behaviour under an expected load.
  • Why we care: Without these tests, we don't know how load will affect the performance of the application, or whether existing functionality has been broken.
  • Tolerances for green: The performance of the system must be scored at build time so that it can be tracked. The build pipeline must fail if performance does not meet the acceptable level.
  • Endorsed tools / configuration: One option is to use Apdex to quantify performance as a numeric value, and to use this value to pass/fail the build pipeline.
  • Further details: Performance test practices
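Apdex combines satisfied, tolerating and frustrated response-time samples into a single score between 0 and 1, which a pipeline can compare against an agreed level. A minimal sketch, in which the 0.5 s target time and 0.85 threshold are illustrative assumptions rather than recommended values:

```python
def apdex(response_times, t=0.5):
    """Apdex score: (satisfied + tolerating/2) / total, where a sample is
    'satisfied' if <= t seconds and 'tolerating' if <= 4t seconds."""
    total = len(response_times)
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / total

# A pipeline could then fail the build if the score drops below an agreed level:
def performance_gate(response_times, threshold=0.85, t=0.5):
    return apdex(response_times, t) >= threshold
```

For example, with t = 0.5 s the samples [0.1, 0.3, 1.0, 3.0] give two satisfied, one tolerating and one frustrated, so the score is (2 + 0.5) / 4 = 0.625, failing the gate.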
Secret scan
  • Classification: Security
  • Applicability: Universal
  • What it means: Check for secrets (e.g. passwords, API keys, certificates or tokens) accidentally included in software code.
  • Why we care: This protects us against accidentally leaking secrets in source code, commit history, logs and configuration, which could compromise the security of the application.
  • Tolerances for green: Review the detection rules and exclusions regularly; then: a full repository (including history) scan, with all secrets removed; and local scanning to block commits containing the patterns; and server-side scanning within the code repository for new commits containing the patterns.
  • Further details: GitLeaks guidance
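A minimal sketch of pattern-based scanning. The two patterns below are illustrative only; real tools such as GitLeaks ship with much larger, maintained rule sets and also scan commit history:

```python
import re

# Illustrative example patterns, not a complete rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key ID shape
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),  # PEM private key header
]

def scan_lines(lines):
    """Return (line_number, line) pairs that look like they contain a secret."""
    findings = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            findings.append((number, line))
    return findings
```

In local (pre-commit) scanning a non-empty result would block the commit; server-side, it would reject the push or raise an alert.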
Security scan
  • Classification: Security
  • Applicability: Universal
  • What it means: Check for indications of possible security issues (for example injection weaknesses).
  • Why we care: This gives fast feedback about security issues. Code analysis is not as thorough as security testing in terms of finding complex weaknesses or issues that only manifest themselves at runtime, but it has much greater coverage. It's a better option for finding simple weaknesses, and it's much quicker to execute. Security code analysis and security testing are both important to achieve rapid and thorough security testing.
  • Tolerances for green: If using SonarQube, must use SonarQube's default rules, profiles and gateways. The build pipeline must fail if the gateway is not met.
  • Endorsed tools / configuration: One option is SonarQube. For the purpose of security code analysis, Developer Edition or higher is required, as it includes advanced OWASP scanning.
Security test
  • Classification: Security
  • Applicability: Contextual
  • What it means: Check for security issues (for example injection weaknesses).
  • Why we care: More thorough than security code scanning, but much slower to execute, so both are important to achieve both rapid and thorough security testing.
Soak test
  • Classification: Resilience
  • Applicability: Contextual
  • What it means: Check whether sustained heavy load over a significantly extended period causes problems such as memory leaks, loss of instances, database failovers, etc.
  • Why we care: Without this test, we don't know whether application performance will suffer under prolonged heavy load, how stable the system is, or how it performs without interventions.
Stress test
  • Classification: Resilience
  • Applicability: Contextual
  • What it means: Check how the system performs under stress, including a level load near the maximum capacity for a prolonged period, and sudden spikes in load on top of a lower baseline load.
  • Why we care: Without this test, we don't know whether the application will begin to fail as a result of memory leaks, connection pool blocking, etc., or will fail under a sharp increase in load triggered by adverts, news coverage or TV tea breaks.
Tech radar check
  • Classification: Other
  • Applicability: Universal
  • What it means: Checking that the tools to be used are in line with organisational / team standards.
  • Why we care: To prevent the unnecessary proliferation of a wide variety of tools and technologies, which would have a negative impact on overall effectiveness.
UI test
  • Classification: Functionality
  • Applicability: Contextual
  • What it means: Check that the user interface components behave as expected, particularly checking the visual elements to verify that they function according to requirements.
  • Why we care: As the only aspects of the software that end users come into contact with, it is essential that these elements behave as expected and allow users to get what they need from our software applications.
  • Tolerances for green: Builds fail if any tests fail.
Unit test
  • Classification: Functionality
  • Applicability: Universal
  • What it means: Logic tests for individual blocks of code, e.g. individual methods.
  • Why we care: This is the quickest (to execute) type of functional test, so these tests are essential to achieve both rapid and thorough functional testing.
  • Tolerances for green: Builds fail if any tests fail.
  • Further details: Test practices
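As a minimal illustration (the `add_vat` function is invented for the example), a unit test exercises one block of logic, and a CI runner fails the build if any assertion does not hold:

```python
def add_vat(net_amount, rate=0.2):
    """Example function under test: apply a VAT rate to a net amount."""
    return round(net_amount * (1 + rate), 2)

# pytest-style unit tests: a test runner in the build pipeline fails the
# build if any assertion fails.
def test_standard_rate():
    assert add_vat(100.00) == 120.00

def test_zero_rate():
    assert add_vat(100.00, rate=0.0) == 100.00
```

Because such tests run in milliseconds, they can run on every commit, which is what makes them the fastest-feedback layer of the test pyramid.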

Publishing code

All code should be treated the same and treated well (please see everything as code), but code that is being published (i.e. made available outside of NHS Digital, for example in a public repository on GitHub) incurs additional considerations.

For example, it's never good to include credentials and other secrets in source code, but the impact of this is obviously greater if the code is made available to the public.

Therefore, for published code, the following minimums are required:

  • Unit tests: GREEN
  • Integration tests: AMBER (where applicable)
  • API / contract tests: AMBER (where applicable)
  • UI tests: AMBER (where applicable)
  • Secret scanning: AMBER (including removal of any secrets)
  • Security code analysis and Security testing: AMBER for at least one of these
  • Dependency scanning: AMBER
  • Code coverage: AMBER
  • Duplicate code scan: AMBER
  • Code smells scan: AMBER
  • Dead code scan: AMBER
  • Code review: GREEN
  • Accessibility tests: AMBER (where applicable)
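The minimums above can be checked mechanically against a repository's RAG ratings. The sketch below covers only a subset of the list for brevity, and the check names and ratings are illustrative:

```python
# N/A passes by definition (the check does not apply), so it ranks with Green.
RAG_ORDER = {"Red": 0, "Amber": 1, "Green": 2, "N/A": 2}

# A subset of the publishing minimums listed above, for brevity.
PUBLISHING_MINIMUMS = {
    "Unit tests": "Green",
    "Code review": "Green",
    "Secret scanning": "Amber",
    "Dependency scanning": "Amber",
    "Code coverage": "Amber",
}

def ready_to_publish(ratings):
    """Return the checks that fall short of the publishing minimums;
    an empty list means the code may be published."""
    return [check for check, minimum in PUBLISHING_MINIMUMS.items()
            if RAG_ORDER[ratings.get(check, "Red")] < RAG_ORDER[minimum]]
```

A repository with no ratings recorded fails every minimum, since an unrated check is treated as Red.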