How to use the framework?

July 17, 2020

The first step will be to do a self-assessment of the current status of your Product Team for each one of the identified capabilities.
Define the desired end-point for the end of the next improvement cycle. A cycle can be a month, a quarter, a semester, and so on. Every team can define its own improvement cycles, although quarterly targets are a good starting point because they leave enough room to define meaningful actions.
Identify the actions you will need to achieve the desired end-point.
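
The three steps above can be recorded in a small script. The following is a minimal sketch, not part of the framework itself: the `Assessment` class, the `gaps` function and the sample levels are hypothetical and only illustrate how a team might compare its current and target maturity for one improvement cycle.

```python
from dataclasses import dataclass

# Maturity levels used in the capability tables, ordered from lowest to highest.
LEVELS = ["CRAWL", "WALK", "RUN"]


@dataclass
class Assessment:
    """Self-assessment of one capability (hypothetical helper)."""
    capability: str
    current: str  # level the team is at today
    target: str   # desired end-point for the next improvement cycle

    def gap(self) -> int:
        """Number of maturity levels between current and target."""
        return LEVELS.index(self.target) - LEVELS.index(self.current)


def gaps(assessments):
    """Return the capabilities that still need actions, largest gap first."""
    pending = [a for a in assessments if a.gap() > 0]
    return sorted(pending, key=lambda a: a.gap(), reverse=True)


# Sample values are illustrative only.
quarter = [
    Assessment("Implement test automation", "WALK", "RUN"),
    Assessment("Automate deployment processes", "CRAWL", "RUN"),
    Assessment("Limit Work in Progress", "RUN", "RUN"),
]

for a in gaps(quarter):
    print(f"{a.capability}: {a.current} -> {a.target} (gap {a.gap()})")
```

Capabilities that appear in the output are the ones for which actions still need to be identified. Those already at their target level are filtered out.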

DEVELOPMENT

CAPABILITY	CRAWL	WALK	RUN
Use version control for all production artifacts	No version control	Source code or other assets under version control	Source code or other assets under version control and all production artifacts versioned and stored in the corresponding artifact repository
Automate deployment processes	Manual deployment process	Partially automated deployment process	Fully automated deployment process
Implement test automation	Manual test script execution	Partially automated testing (unit or regression or performance tests)	Fully automated testing (unit and reliability (regression and performance) tests)
Implement infrastructure automation	Manual deployment process	Partially automated deployment process. Provisioning is done by the teams	Fully automated deployment (infrastructure-as-code). Platform Engineering provides base images
Support test data management	No test data management	Partially automated test data management (e.g. manually triggered import and export of test data)	Fully automated test data management incl. strategy (e.g. consumer data only in PROD)
Implement continuous delivery	No continuous delivery	Partially automated delivery pipeline (e.g. automated build and test process with manual deployment)	Fully automated pipeline (automated build, test, deployment across environments)
Include NFRs in Definition of Done	No NFRs used	Ad-hoc NFR checks	Standardised NFR checklist as acceptance criteria for successful releases
Shift left on security	No security aspects considered during development cycle	Security aspects considered during development cycle but shifted towards release (not a priority)	Security aspects included during development cycle from the very start
Build for resilience	No resilience built into the system	Design infrastructure and code for failure	Design infrastructure and code for failure with fully automated error recovery (self-healing)
Enable team for troubleshooting	No control over development lifecycle (e.g. access to PROD)	Team has full control over development lifecycle (e.g. access to PROD), but no access to logs and tools relevant for troubleshooting	Team has full control over development lifecycle (e.g. access to PROD) and full access to logs and tools for troubleshooting
Feature handling	No feature branches for controlled releases	Feature branches are implemented for controlled releases of distinct features	Feature branching and toggles are implemented to facilitate development, roll-out and roll-back (if needed) of usable features to production
Releases	Releases to all users and all sites / geographies in one go	Releases to a subset of users or sites or geographies	Gradual releases to a subset of users in specific sites / geographies, thereby limiting the blast radius of potential issues
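
The RUN level of "Feature handling" and "Releases" combines feature toggles with gradual roll-outs per geography. The sketch below shows one common way to do this, a percentage roll-out based on a stable hash of the user ID. The function names, the feature name `new-checkout` and the region codes are assumptions made for illustration; the framework does not prescribe a particular mechanism.

```python
import hashlib


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature: str, user_id: str, region: str, rollout: dict) -> bool:
    """Decide whether a feature toggle is on for this user.

    `rollout` maps a region to the percentage of users (0..100) that should
    see the feature there. Regions not listed get 0 percent.
    """
    percent = rollout.get(region, 0)
    return bucket(user_id, feature) < percent


# Hypothetical roll-out plan: 10% of users in one region, nobody elsewhere.
plan = {"EU": 10}

print(is_enabled("new-checkout", "user-42", "EU", plan))
print(is_enabled("new-checkout", "user-42", "US", plan))
```

Because the bucket depends only on the feature name and the user ID, a user keeps the same experience as the percentage grows. Setting a region back to 0 acts as a roll-back without a new deployment.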

PRODUCT & PROCESSES

CAPABILITY	CRAWL	WALK	RUN
Gather and implement customer feedback	No customer (internal or external) feedback gathered in development cycles	Customer feedback (internal or external) gathered on an ad-hoc basis	Customer feedback (internal or external) gathered after all releases
Work in small batches and deploy more frequently	Big work batch size and releases on a monthly basis or longer	Work batch size optimized for weekly releases, but deployment frequency not in sync with business requirements (e.g. lead time)	Work batch size optimized for frequent releases and deployment frequency in sync with business requirements (e.g. lead time)
Have a lightweight change approval process	Change approval needed from multiple parties outside the team	Change approval needed within the team	No change approval needed or change approval process totally automated
Integrate application data into Big Data Platform	No application data transferred at all	Partial business-relevant application data transferred to Big Data Platform or provided via API	All business-relevant application data transferred to Big Data Platform
SRE role and activities	No clear SRE role and responsibility from Product team perspective	SRE tasks are defined and agreed from Execution (Operations, Automation, Hotfix) perspective	SRE tasks are defined for Execution and Governance areas and agreed with all stakeholders (Business, Development)
Postmortems	No causal analysis done for all outages	RCA conducted for all outages and tied to change / release	Blameless postmortems are conducted for all outages
Resiliency / Chaos Engineering	No resiliency tests are conducted	Define environment dependencies (failure points) and execute resiliency tests to ensure no customer impact	Regular chaos (resiliency) exercises scheduled on the basis of steady state / functionality changes

MANAGEMENT & MONITORING

CAPABILITY	CRAWL	WALK	RUN
Monitor application and infrastructure performance	No monitoring in place	Application or infrastructure performance monitored but no alerting in place	Application and infrastructure performance is monitored; alerting in place for relevant KPIs
Monitor software delivery performance	No metrics monitored	Selected metrics monitored	All key metrics monitored
Limit Work in Progress	More than 10 features in progress	Less than 10 features in progress	Not more than 5 features in progress
Release governance	Product changes rolled out to production are not regulated for stability and reliability	Production changes are regulated on the basis of stability and reliability benchmarks in test environments	Error Budget consumption regulates future releases to a product and acts as a gate to production changes
Resilience Monitoring	No KPIs defined for MTTx as per ITIL guidelines	Infra and Monitoring KPIs are defined as per ITIL guidelines for MTTx, availability and throughput, reported, and deviations tracked to closure	Key monitoring signals from SLIs and SLOs (latency, throughput, error rate, saturation) are captured, reported and tied to product flow from a business perspective
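
The RUN level of "Release governance" uses Error Budget consumption as a gate to production changes. As a rough sketch of the arithmetic, assuming a request-based availability SLO: the budget is the number of failures the SLO tolerates, and releases are held once it is spent. The function names, the `min_remaining` threshold and the sample numbers are hypothetical.

```python
def error_budget_remaining(slo: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent).

    slo    -- availability target as a fraction, e.g. 0.999 for 99.9%
    total  -- requests served in the SLO window
    failed -- requests that violated the SLO in the same window
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    if total <= 0:
        raise ValueError("total must be positive")
    allowed_failures = (1.0 - slo) * total
    return 1.0 - failed / allowed_failures


def release_allowed(slo: float, total: int, failed: int,
                    min_remaining: float = 0.0) -> bool:
    """Gate a release: allow it only while some budget is left."""
    return error_budget_remaining(slo, total, failed) > min_remaining


# Illustrative numbers: a 99.9% SLO over one million requests tolerates
# about 1000 failures, so 250 failures leave roughly three quarters unspent.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"remaining budget: {remaining:.0%}")
print("release allowed:", release_allowed(0.999, 1_000_000, 250))
```

A team could raise `min_remaining` above zero to keep a safety margin, so that feature releases slow down before the budget is fully exhausted.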

CULTURE

CAPABILITY	CRAWL	WALK	RUN
Build it and run it	Product teams build the system, operations run (and fix) it. No end-to-end ownership of the product lifecycle. Dev and Ops staffed in separate teams	Full ownership for product teams to build and run the system supported by SRE. No L2 support needed	Full ownership for product teams to build and run the system. T-shaped engineering profiles within the product teams to operate in full DevOps mode with SRE enabled in the product teams
Foster and enable team experimentation linked to business value	No time or resources dedicated to team experimentation	Irregular time slots or events blocked for team experimentation (e.g. team hackathon)	Regular time slots or events blocked for team experimentation (e.g. team hackathon every month or quarter)
Support and facilitate collaboration among teams	No collaboration with other teams although necessary for the product	Irregular exchange among team members and/or other teams (e.g. CoP, meetings, lunch, coffee, sports)	Regular exchange among team members and other teams (e.g. CoP, meetings, lunch, coffee, sports)
Collaboration	No collaboration with Operations around product design from a stability and reliability perspective	Product teams take design inputs (feedback) around stability and reliability from SRE experts. SRE experts are involved during the testing phase (in the development cycle) or after issues in production	Product architects collaborate regularly (from planning onwards) with SRE experts to evolve the design of the product from a performance, stability and reliability perspective

ARCHITECTURE

CAPABILITY	CRAWL	WALK	RUN
Use a loosely coupled architecture	Monolithic application with a high level of interdependencies	Re-architecture in progress moving from a monolithic solution to a microservice-based architecture	System has no or very few direct dependencies on other systems, and those dependencies are tied to open standards rather than to specific technologies and frameworks (e.g. Java RPC)
Focus on independent deployability and testability	Dependent deployability and testability across teams	Some components can be deployed and tested independently but parts of the components still have dependencies across teams	Teams can deploy and test their systems independently
Use established Platform Engineering solutions as a default	Custom solutions used even though equivalents are provided by Platform Engineering	All solutions aligned with Platform Engineering, Solution and Domain Architecture, but exceptions were granted	All solutions aligned with Platform Engineering, Solution and Domain Architecture and no custom solutions used where Platform Engineering provides an equivalent