How to use the framework?

July 17, 2020

  1. Do a self-assessment of the current status of your Product Team for each of the identified capabilities.

  2. Define the desired end-point for the end of the next improvement cycle. A cycle can be a month, a quarter, a semester, and so on. Every team can define its own improvement cycles, although quarterly targets are a good start because they leave enough time to define meaningful actions.

  3. Identify the actions you will need to take to reach the desired end-point.
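
The three steps above can be captured in a small data structure, which makes it easier to track gaps from one improvement cycle to the next. The following is a minimal sketch, not part of the framework itself: the names `Level`, `CapabilityPlan` and `open_gaps`, as well as the example actions, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Level(IntEnum):
    """Maturity levels used in the capability tables."""
    CRAWL = 1
    WALK = 2
    RUN = 3


@dataclass
class CapabilityPlan:
    """One capability with its assessment, target and planned actions."""
    capability: str
    current: Level                                     # step 1: self-assessed status
    target: Level                                      # step 2: desired end-point for the cycle
    actions: List[str] = field(default_factory=list)   # step 3: actions to get there

    def gap(self) -> int:
        """Number of maturity levels between the current status and the target."""
        return max(0, int(self.target) - int(self.current))


def open_gaps(plans):
    """Return the plans that still have a gap, largest gap first."""
    with_gap = [p for p in plans if p.gap() > 0]
    return sorted(with_gap, key=lambda p: p.gap(), reverse=True)


# Hypothetical assessment of three capabilities for one quarterly cycle.
plans = [
    CapabilityPlan("Automate deployment processes", Level.CRAWL, Level.WALK,
                   ["Script the deployment to the test environment"]),
    CapabilityPlan("Implement test automation", Level.WALK, Level.WALK),
    CapabilityPlan("Monitor software delivery performance", Level.CRAWL, Level.RUN,
                   ["Agree on key metrics", "Build a delivery dashboard"]),
]

for plan in open_gaps(plans):
    print(f"{plan.capability}: {plan.current.name} -> {plan.target.name} "
          f"({len(plan.actions)} actions)")
```

Sorting by gap size is only one possible way to prioritize; a team may just as well weigh capabilities by business impact.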

DEVELOPMENT


| CAPABILITY | CRAWL | WALK | RUN |
| --- | --- | --- | --- |
| Use version control for all production artifacts | No version control | Source code or other assets under version control | Source code or other assets under version control and all production artifacts versioned and stored in the corresponding artifact repository |
| Automate deployment processes | Manual deployment process | Partially automated deployment process | Fully automated deployment process |
| Implement test automation | Manual test script execution | Partially automated testing (unit or regression or performance tests) | Fully automated testing (unit and reliability (regression and performance) tests) |
| Implement infrastructure automation | Manual deployment process | Partially automated deployment process. Provisioning is done by the teams | Fully automated deployment (infrastructure-as-code). Platform Engineering provides base images |
| Support test data management | No test data management | Partially automated test data management (e.g. manually triggered import and export of test data) | Fully automated test data management incl. strategy (e.g. consumer data only in PROD) |
| Implement continuous delivery | No continuous delivery | Partially automated delivery pipeline (e.g. automated build and test process with manual deployment) | Fully automated pipeline (automated build, test, deployment across environments) |
| Include NFRs in Definition of Done | No NFRs used | Ad-hoc NFR checks | Standardised NFR checklist as acceptance criteria for successful releases |
| Shift left on security | No security aspects considered during development cycle | Security aspects considered during development cycle but shifted towards release (not a priority) | Security aspects included during development cycle from the very start |
| Build for resilience | No resilience built into system | Design infrastructure and code for failure | Design infrastructure and code for failure with fully automated error recovery (self-healing) |
| Enable team for troubleshooting | No control over development lifecycle (e.g. access to PROD) | Team has full control over development lifecycle (e.g. access to PROD), but no access to logs and tools relevant for troubleshooting | Team has full control over development lifecycle (e.g. access to PROD) and full access to logs and tools for troubleshooting |
| Feature handling | No feature branches for controlled releases | Feature branches are implemented for controlled releases of distinct features | Feature branching and toggles are implemented to facilitate development, roll-out and roll-back (if needed) of usable features to production |
| Releases | Releases to all users and all sites / geographies in one go | Releases to subset of users or sites or geographies | Gradual releases to subset of users in specific sites / geographies, thereby limiting the blast radius of potential issues |
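
To make the RUN level of "Feature handling" and "Releases" more concrete, the sketch below combines a feature toggle with a gradual, site-restricted rollout. It is an illustration under assumptions, not a prescribed implementation: the toggle layout (`enabled`, `sites`, `percentage`), the feature name `new-checkout`, and the functions `bucket` and `is_enabled` are all hypothetical. In practice the configuration would usually come from a toggle service or a config store rather than a dictionary in code.

```python
import hashlib


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for the given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature: str, user_id: str, site: str, toggles: dict) -> bool:
    """Return True if the feature is switched on for this user at this site."""
    config = toggles.get(feature)
    if config is None or not config["enabled"]:
        return False  # unknown or switched-off features stay dark (roll-back)
    if site not in config["sites"]:
        return False  # release only to specific sites / geographies
    # Release only to a percentage of users to limit the blast radius.
    return bucket(user_id, feature) < config["percentage"]


# Hypothetical toggle configuration: 10% of users at one site.
TOGGLES = {
    "new-checkout": {"enabled": True, "sites": {"eu-west"}, "percentage": 10},
}

if is_enabled("new-checkout", "user-42", "eu-west", TOGGLES):
    print("serve new checkout")
else:
    print("serve old checkout")
```

Hashing the user ID keeps each user in the same bucket across requests, so raising `percentage` step by step only adds users and never flips existing ones back. Setting `enabled` to `False` acts as the roll-back switch.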

PRODUCT & PROCESSES


| CAPABILITY | CRAWL | WALK | RUN |
| --- | --- | --- | --- |
| Gather and implement customer feedback | No customer (internal or external) feedback gathered in development cycles | Customer feedback (internal or external) gathered on an ad-hoc basis | Customer feedback (internal or external) gathered after all releases |
| Work in small batches and deploy more frequently | Big work batch size and releases on a monthly basis or longer | Work batch size optimized for weekly releases, but deployment frequency not in sync with business requirements (e.g. lead time) | Work batch size optimized for frequent releases and deployment frequency in sync with business requirements (e.g. lead time) |
| Have a lightweight change approval process | Change approval needed from multiple parties outside the team | Change approval needed within the team | No change approval needed or change approval process totally automated |
| Integrate application data into Big Data Platform | No application data transferred at all | Partial business-relevant application data transferred to Big Data Platform or provided via API | All business-relevant application data transferred to Big Data Platform |
| SRE role and activities | No clear SRE role and responsibility from Product team perspective | SRE tasks are defined and agreed from Execution (Operations, Automation, Hotfix) perspective | SRE tasks are defined for Execution and Governance areas and agreed with all stakeholders (Business, Development) |
| Postmortems | No causal analysis done for all outages | RCA conducted for all outages and tied to change / release | Blameless Postmortems are conducted for all outages |
| Resiliency / Chaos Engineering | No resiliency tests are conducted | Define environment dependencies (failure points) and execute resiliency tests to ensure no customer impact | Regular chaos (resiliency) exercises scheduled based on steady state / functionality changes |

MANAGEMENT & MONITORING


| CAPABILITY | CRAWL | WALK | RUN |
| --- | --- | --- | --- |
| Monitor application and infrastructure performance | No monitoring in place | Application or infrastructure performance monitored but no alerting in place | Application and infrastructure performance is monitored; alerting in place for relevant KPIs |
| Monitor software delivery performance | No metrics monitored | Selected metrics monitored | All key metrics monitored |
| Limit Work in Progress | More than 10 features in progress | Fewer than 10 features in progress | Not more than 5 features in progress |
| Release governance | Product changes rolled out to production are not regulated for stability and reliability | Production changes are regulated based on stability and reliability benchmarks in test environments | Error Budget consumption regulates future releases to a product and acts as a gate to production changes |
| Resilience Monitoring | No KPIs defined for MTTx as per ITIL guidelines | Infra and Monitoring KPIs are defined as per ITIL guidelines for MTTx, availability and throughput; they are reported and deviations are tracked to closure | Key monitoring signals from SLI, SLO (latency, throughput, error rate, saturation) are captured, reported and tied to product flow from business perspective |
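
The RUN level of "Release governance" uses error budget consumption as a gate for production changes. The sketch below shows one common way to compute this for a request-based availability SLO. It is an assumption-laden illustration: the function names, the `min_remaining` threshold and the example numbers are hypothetical, and a real setup would read request counts from the monitoring system over the agreed SLO window.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is the availability objective as a fraction, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The error budget is the number of failures the SLO tolerates.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def release_allowed(slo_target: float, total_requests: int,
                    failed_requests: int, min_remaining: float = 0.0) -> bool:
    """Gate a release: allow it only while more budget than min_remaining is left."""
    remaining = error_budget_remaining(slo_target, total_requests, failed_requests)
    return remaining > min_remaining


# Hypothetical window: 99.9% SLO over 1,000,000 requests tolerates about 1,000 failures.
print(release_allowed(0.999, 1_000_000, 400))    # budget left, release may proceed
print(release_allowed(0.999, 1_000_000, 1_200))  # budget overspent, release is held
```

A team might set `min_remaining` above zero to keep a safety margin, so that releases slow down before the budget is fully exhausted.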

CULTURE


| CAPABILITY | CRAWL | WALK | RUN |
| --- | --- | --- | --- |
| Build it and run it | Product teams build the system, operations run (and fix) it. No end-to-end ownership for product lifecycle. Dev and Ops staffed in separate teams | Full ownership for product teams to build and run the system supported by SRE. No L2 support needed | Full ownership for product teams to build and run the system. T-shape engineering profiles within the product teams to operate in full DevOps mode with enabled SRE in the product teams |
| Foster and enable team experimentation linked to business value | No time or resources dedicated to team experimentation | Irregular time slots or events blocked for team experimentation (e.g. team hackathon) | Regular time slots or events blocked for team experimentation (e.g. team hackathon every month or quarter) |
| Support and facilitate collaboration among teams | No collaboration with other teams although necessary for the product | Irregular exchange between team members and/or other teams (e.g. CoP, meetings, lunch, coffee, sports) | Regular exchange among team members and other teams (e.g. CoP, meetings, lunch, coffee, sports) |
| Collaboration | No collaboration with Operations around product design from stability, reliability perspective | Product teams take design inputs (feedback) around stability, reliability from SRE experts. SRE experts are involved during testing phase (in development cycle) or post issues in production | Product architects collaborate regularly (from planning) with SRE experts to evolve the design of the product from performance, stability, reliability perspective |

ARCHITECTURE


| CAPABILITY | CRAWL | WALK | RUN |
| --- | --- | --- | --- |
| Use a loosely coupled architecture | Monolithic application with a high level of interdependencies | Re-architecture in progress moving from a monolithic solution to a microservice-based architecture | System has no or very few direct dependencies on other systems, and those dependencies are tied to open standards and not tied to technologies and frameworks (e.g. Java RPC) |
| Focus on independent deployability and testability | Dependent deployability and testability across teams | Some components can be deployed and tested independently but parts of the components still have dependencies across teams | Teams can deploy and test their systems independently |
| Use established Platform Engineering solutions as a default | Custom solutions used even though provided by Platform Engineering | All solutions aligned with Platform Engineering, Solution and Domain Architecture, but exceptions were granted | All solutions aligned with Platform Engineering, Solution and Domain Architecture and no custom solutions used that are provided by Platform Engineering |