Date Documented: 2023-02-09 Date Accepted: 2023-02-14
Superseded by ADR 36, "Integration service promotes components to GCL immediately after builds complete".
The Integration Service is in charge of running integration test pipelines by executing the Tekton pipeline for each user-defined IntegrationTestScenario. The payloads under test are Snapshots, which contain references to all Components that belong to the Application, along with their images.
One problem the Integration Service faces is that testing an application with Tekton pipelines takes time. While an application is being tested, multiple component builds can happen in quick succession, leading to a potential race condition between the Snapshots created for each of those builds. This would primarily manifest as two Snapshots being promoted in quick succession, with neither of them containing the latest images for every Component.
To protect against these race conditions, the Integration Service uses a two-phase approach to testing: a Component phase that is always executed, and an optional Composite phase that comes into play only when race conditions between component builds are detected.
When a single component image is built, the Integration Service tests the application by creating a Snapshot. All Components with their images from the Global Candidate List (GCL) are copied into the Snapshot, and then the entry for the newly built Component is overwritten with the new image to complete the Snapshot.
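The Snapshot creation step described above can be sketched as follows. This is a minimal illustration, not the Integration Service's actual code: it assumes the Global Candidate List and a Snapshot can both be modelled as a plain mapping from component name to image reference, and the function name and image references are hypothetical.

```python
from typing import Dict

# Hypothetical model: component name -> image reference.
ImageMap = Dict[str, str]


def build_component_snapshot(gcl: ImageMap, component: str, new_image: str) -> ImageMap:
    """Return a Snapshot seeded from the GCL with one component overwritten."""
    snapshot = dict(gcl)             # copy every component/image pair from the GCL
    snapshot[component] = new_image  # overwrite (or add) the newly built component
    return snapshot


# Hypothetical image references, for illustration only.
gcl = {
    "frontend": "registry.example/frontend:a1",
    "backend": "registry.example/backend:b1",
}
snapshot = build_component_snapshot(gcl, "backend", "registry.example/backend:b2")
```

Copying the mapping rather than mutating it keeps the Global Candidate List unchanged while the Snapshot is under test, which matches the text: the list is only updated after the test pipelines finish.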
After all test pipelines for the Snapshot finish successfully, the Integration Service updates the Global Candidate List with the newly built Component image and checks whether it can promote the Component Snapshot. If the Global Candidate List entries for the other Components don't match the rest of the Component Snapshot contents, the Component Snapshot's status is marked as invalid and testing goes into the Composite phase. Otherwise, the Component Snapshot is promoted according to user preferences.
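As a rough sketch of this decision, assuming the same simplified model in which the Global Candidate List and a Snapshot are mappings from component name to image reference: the function name and the "promote"/"composite" return labels are hypothetical and chosen only for illustration.

```python
from typing import Dict

ImageMap = Dict[str, str]  # hypothetical: component name -> image reference


def finish_component_phase(gcl: ImageMap, snapshot: ImageMap, component: str) -> str:
    """Update the GCL after tests pass and decide what happens to the Snapshot.

    Returns "promote" or "composite" (hypothetical labels).
    """
    gcl[component] = snapshot[component]  # record the tested image in the GCL

    # Compare every *other* component in the GCL with the Snapshot contents.
    others_in_gcl = {n: img for n, img in gcl.items() if n != component}
    others_in_snapshot = {n: img for n, img in snapshot.items() if n != component}
    if others_in_gcl == others_in_snapshot:
        return "promote"
    return "composite"  # Snapshot is invalid; the Composite phase takes over


# The frontend entry moved from a1 to a2 while the backend Snapshot was in testing.
gcl = {"frontend": "frontend:a2", "backend": "backend:b1"}
stale_snapshot = {"frontend": "frontend:a1", "backend": "backend:b2"}
outcome = finish_component_phase(gcl, stale_snapshot, "backend")
```

Here `outcome` is `"composite"` because the Snapshot was tested against `frontend:a1`, while the Global Candidate List has since moved to `frontend:a2`.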
The Composite phase is used when the Global Candidate List changes while testing a Snapshot in the Component phase.
The Composite phase exists to resolve a race condition when teams merge multiple PRs to multiple components of the same application at nearly the same time. When multiple components are built at the same time, the Integration Service tests the application by creating a composite Snapshot using multiple Components updated to use the newly built images.
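One way to picture the composite Snapshot is shown below. The text does not spell out exactly how the composite is assembled, so this sketch assumes it starts from the Global Candidate List and overwrites every component that was built concurrently; the function name, the `new_builds` parameter, and the image references are hypothetical.

```python
from typing import Dict

ImageMap = Dict[str, str]  # hypothetical: component name -> image reference


def build_composite_snapshot(gcl: ImageMap, new_builds: ImageMap) -> ImageMap:
    """Return a Snapshot in which several components use their newly built images."""
    composite = dict(gcl)         # start from the current candidate images
    composite.update(new_builds)  # overwrite each concurrently built component
    return composite


gcl = {"frontend": "frontend:a1", "backend": "backend:b1", "worker": "worker:w1"}
composite = build_composite_snapshot(
    gcl, {"frontend": "frontend:a2", "backend": "backend:b2"}
)
```

Unlike a Component Snapshot, which changes exactly one entry, the composite carries all of the new images together, so the test pipelines exercise the combination that would actually be promoted.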
If all testing pipelines pass successfully, the Composite Snapshot is promoted according to user preferences.
To illustrate the consequences of implementing the above approach, we can outline two scenarios, one with a happy path where only a single component is built at a time, and one with a race condition where two components are built in quick succession.
In the happy path without race conditions, one PR merges to one component.
In the path with race conditions, two PRs merge to two components at the same time.
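The race-condition path can be traced with a small simulation. This is a sketch under stated assumptions, not the service's implementation: the Global Candidate List and Snapshots are modelled as mappings from component name to image, components `A` and `B` and their image tags are made up, the tests for `A` are assumed to finish first, and the composite Snapshot is assumed to be a copy of the Global Candidate List at the moment the mismatch is detected.

```python
# Hypothetical starting Global Candidate List.
gcl = {"A": "A:1", "B": "B:1"}


def complete_component_phase(gcl, snapshot, component):
    """Tests passed: update the GCL and report whether the Snapshot is still valid."""
    gcl[component] = snapshot[component]
    return all(gcl[name] == image for name, image in snapshot.items())


# Both builds start from the same GCL, so each Snapshot misses the other's new image.
snapshot_a = {**gcl, "A": "A:2"}
snapshot_b = {**gcl, "B": "B:2"}

# Assumed ordering: the tests for A finish first, then the tests for B.
a_valid = complete_component_phase(gcl, snapshot_a, "A")  # GCL still has B:1
b_valid = complete_component_phase(gcl, snapshot_b, "B")  # GCL now has A:2

# The invalid Snapshot triggers the Composite phase with both new images.
composite = dict(gcl) if not b_valid else None
```

In this trace the Snapshot for `A` still matches the Global Candidate List and can be promoted, while the Snapshot for `B` was tested against the outdated `A:1` and is marked invalid. The resulting composite contains `A:2` and `B:2`, which is the combination neither Component Snapshot covered on its own.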
For future consideration: a feature to batch components together is being investigated for cases where two components depend on each other so closely that breaking changes in one cause failures in the other during testing. This would allow the Integration Service to hold off on testing a Component build until the dependent build has also completed.
We originally made this decision verbally and with diagrams back in May of 2022, and it has been the operating design since then. However, we realized (through a Slack conversation) that it is not obvious without documentation. We are documenting it here as an ADR for posterity and visibility.