18. Continuous Performance Testing (CPT) of Apps in Konflux

Date: 2023-03-10

Status

Under consideration

Context

In general, performance testing is just another form of testing that helps application teams ensure there are no regressions in their code and that their application behaves as expected.

The IntegrationTestScenario pipelines in Konflux are not suitable for full-blown performance and scale testing (which usually takes a long time and involves human analysis, so it does not fit the quick automated checks we need for release-gating runs in Konflux), but they are a good place for a quick, small-scale performance regression test.

What makes performance testing different from functional testing is that it is harder to decide whether a test passed or failed: every performance test exercises different aspects of the application, and so expects different performance, different metrics, and different thresholds. Furthermore, even if the measured performance of the application under test did not change, a significant change in its resource usage should still cause the test to be marked as failed.

The approach proposed here to make this pass/fail detection possible is to use historical results as a benchmark – not only the actual results of the performance test, but also monitoring data about resource usage and so on.

Example: Imagine you develop a web e-shop application that uses a PostgreSQL backend. Your performance test browses through a goods catalog and measures page latency. To decide on the pass or fail result of this test, you need to check that metrics like the ones below align with previous results of the same test with the same configuration:

  • Average page load latency.
  • Backend service CPU and memory usage…
  • PostgreSQL CPU and memory usage…
  • Internal PostgreSQL metrics like number of commits during the test or average number of database sessions during the test…

If any of these metrics does not align with historical results, we mark this test result as a failure.
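For illustration, a result document for such a test might look like the following. This is a minimal sketch; all field names are hypothetical and do not represent a schema defined by Konflux or Horreum:

```json
{
  "test": "eshop-catalog-browsing",
  "parameters": { "concurrency": 10, "duration_seconds": 300 },
  "results": { "page_load_latency_avg_ms": 142.7 },
  "monitoring": {
    "backend_cpu_avg_cores": 0.85,
    "backend_memory_avg_mib": 512,
    "postgresql_cpu_avg_cores": 0.40,
    "postgresql_memory_avg_mib": 768,
    "postgresql_commits_total": 18234,
    "postgresql_sessions_avg": 12
  }
}
```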

Even if a test fails, the application team should be able to review the data and change the test definition so that, the next time a new result is evaluated against the historical results, this new result is counted among them.

The algorithm that decides whether the current test passes or fails when compared to historical results can vary.
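For illustration, one simple option (a minimal sketch, not the algorithm Horreum actually uses) is to flag a metric whose new value falls outside the mean of the historical values plus or minus some multiple of their standard deviation:

```python
from statistics import mean, stdev

def metric_changed(history: list[float], new_value: float, k: float = 2.0) -> bool:
    """Flag a change when new_value falls outside mean +/- k * stddev
    of the historical values. Purely illustrative; real change-detection
    algorithms (such as those built into Horreum) are more sophisticated."""
    if len(history) < 2:
        return False  # not enough history to judge
    m, s = mean(history), stdev(history)
    return abs(new_value - m) > k * s

# Example: average page load latency in milliseconds
history = [140.1, 138.9, 143.2, 141.0, 139.5]
print(metric_changed(history, 142.0))  # False - within the usual range
print(metric_changed(history, 190.0))  # True - likely a regression
```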

The goal of this ADR is to propose a way for this kind of testing to be implemented in Konflux (feature STONE-679). Even though it would not replace full-blown performance and scale testing, having a release gate with a short, small-scale performance test is desirable for many application teams.

Glossary

Decision

Let’s use this architecture with a single Horreum instance per control plane cluster (similar to what we do for Tekton Results). Horreum instances would be managed by Red Hat and used by the tenants on the given cluster.

Architecture diagram with Horreum

  1. The performance test runs in the Tekton pipeline and generates a JSON document with test parameters and results.
  2. The pipeline gathers the configured monitoring metrics and adds them to the JSON document.
  3. The pipeline uploads the JSON document with all the data to Horreum (see the sketch after this list).
  4. Horreum performs result analysis, looking for changes in the configured metrics.
  5. The pipeline retrieves the PASS/FAIL decision from Horreum so it can report the proper result.
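To make step 3 concrete, a pipeline step could upload the JSON document over plain HTTP, roughly as below. This is a minimal sketch: the instance URL, owner and access values are placeholders, and the endpoint and query parameters should be verified against the Horreum documentation for the deployed version. Retrieving the PASS/FAIL decision (step 5) depends on how change detection is configured in Horreum and is not sketched here.

```python
import requests

HORREUM_URL = "https://horreum.example.com"  # hypothetical instance URL
API_TOKEN = "..."                            # credentials provided to the pipeline

def upload_run(test_name: str, start_ms: int, stop_ms: int, payload: dict) -> str:
    """Upload one test run to Horreum. The endpoint and query parameters
    follow Horreum's run-upload API as documented at the time of writing;
    treat them as assumptions and check the Horreum docs."""
    resp = requests.post(
        f"{HORREUM_URL}/api/run/data",
        params={"test": test_name, "start": start_ms, "stop": stop_ms,
                "owner": "my-team", "access": "PUBLIC"},
        headers={"Authorization": f"Bearer {API_TOKEN}",
                 "Content-Type": "application/json"},
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.text  # Horreum responds with the ID of the stored run
```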

Although Horreum provides a rich web UI for configuring JSON parsing, change detection and data visualization, it will stay hidden from Konflux users. Konflux will expose a subset of that functionality in its own web UI and will talk to Horreum via its API.

We still need to decide between one Horreum instance per cluster and one instance per tenant.

Consequences

Pros:

Cons: