Collectors

The release process with Konflux is well-structured, and the documentation provides clear examples of how to supply data to the Release, ReleasePlan, or ReleasePlanAdmission resources for use within the release workflow.

Despite this, one limitation prevents full workflow automation. When a data field in one of the release resources must be populated with dynamic information retrieved from an external service before the release starts, relying on manual steps or custom scripts is inefficient and error-prone.

To address this limitation, Konflux includes a feature called collectors.

A collector is essentially a Python script executed as part of the tenant and managed collectors pipelines. It generates information that is embedded into the Release status. These pipelines are integrated into the release workflow and run at the very beginning, immediately after the validation step. As a result, the collected data becomes available to both the tenant and managed pipelines.
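
As a rough illustration of the idea, a collector could look like the sketch below. It is not taken from the official repository: the `--url` and `--query` arguments, the `collect` function, the placeholder issue ID, and the choice of printing JSON to standard output are all assumptions made for this example. The real collectors define their own interface.

```python
"""Hypothetical collector sketch; names and interface are assumptions."""
import argparse
import json
import sys
from urllib.parse import urlparse


def collect(url, query):
    """Return collector output as a plain dictionary.

    A real collector would send `query` to the service at `url`.
    This sketch returns fixed placeholder data so it runs offline,
    which means `query` is accepted but not used.
    """
    return {
        "releaseNotes": {
            "issues": {
                "fixed": [
                    # Placeholder entry; "EXAMPLE-1" is not a real issue.
                    {"id": "EXAMPLE-1", "source": urlparse(url).netloc},
                ]
            }
        }
    }


def main(argv=None):
    """Parse the (assumed) arguments and print the result as JSON."""
    parser = argparse.ArgumentParser(description="Illustrative collector")
    parser.add_argument("--url", required=True)
    parser.add_argument("--query", required=True)
    args = parser.parse_args(argv)
    json.dump(collect(args.url, args.query), sys.stdout, indent=2)
    sys.stdout.write("\n")
    return 0


# Demo call with explicit arguments instead of reading sys.argv.
main(["--url", "https://issues.redhat.com", "--query", 'project = "My Project"'])
```

The point of the sketch is only the shape of the contract: parameters go in, structured data comes out, and that data is what ends up in the Release status.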

Using a collector in a Konflux release

To use a collector, the first step is to select one from the available options in the official repository. The structure of this repository may evolve over time, but the README.md file provides useful details about the available collectors and the data they produce. The key piece of information needed is the collector’s name, which will be referenced in one of the release resources.

Collectors can be defined in the following resources:

ReleasePlan: Collectors defined here are executed by the tenant collectors pipeline, which runs in the tenant namespace.
ReleasePlanAdmission: Collectors defined here are executed by the managed collectors pipeline, which runs in the managed namespace.

For example, to run the jira collector—which retrieves a list of Jira issues when provided with a server and a query—the following configuration should be added to the ReleasePlan:

apiVersion: appstudio.redhat.com/v1alpha1
kind: ReleasePlan
metadata:
  labels:
    release.appstudio.openshift.io/auto-release: 'true' (1)
    release.appstudio.openshift.io/standing-attribution: 'true'
  name: collectors-rp
  namespace: dev-tenant-namespace (2)
spec:
  application: <application-name> (3)
  collectors:
    serviceAccountName: <service-account> (4)
    items: (5)
      - name: project-issues
        params:
          - name: url
            value: https://issues.redhat.com
          - name: query
            value: 'project = "My Project" AND summary ~ "test issue"'
          - name: secretName
            value: "jira-collectors-secret"
        timeout: 60
        type: jira (6)
    secrets: (7)
      - jira-collectors-secret
  data: <key> (8)
  target: managed-tenant-namespace

1	Optional: Controls whether Releases are created automatically for this ReleasePlan when tests pass. Defaults to true.
2	The development team’s tenant namespace. The collector pipeline will be executed in this namespace.
3	The name of the application that you want to release via a pipeline in the development tenant namespace.
4	The ServiceAccount that the pipeline will use.
5	List of collectors to run, each with the parameters to be passed to it.
6	The collector type as seen in the official collectors repository.
7	Secrets to be provided to the collectors.
8	Optional: An unstructured key used for providing data for the managed Pipeline.

Retrieving collectors data

After the collectors pipelines complete execution, the output from each collector is added to the Release resource under the status.collectors field. Below is an example showing the result of a collector defined in the previously mentioned ReleasePlan:

apiVersion: appstudio.redhat.com/v1alpha1
kind: Release
...
status:
  collectors:
    tenant:
      - project-issues:
          releaseNotes:
            issues:
              fixed:
                - id: "CVE-3444"
                  source: "issues.redhat.com"

In this case, the project-issues collector generated a list of issues, which is included under status.collectors.tenant. Since this collector was defined in the ReleasePlan, its output is categorized under the tenant section. Collectors defined in a ReleasePlanAdmission will have their results stored under the managed key instead.

The following example shows a Release status containing results from multiple collectors, both tenant and managed:

apiVersion: appstudio.redhat.com/v1alpha1
kind: Release
...
status:
  collectors:
    managed:
      - foo:
          releaseNotes:
            cves:
              - key: "CVE-3444"
                component: "my-component"
    tenant:
      - bar:
          baz: qux
      - project-issues:
          releaseNotes:
            issues:
              fixed:
                - id: "CPAAS-1234"
                  source: "issues.redhat.com"
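
To show how this structure can be consumed, the sketch below looks up one collector's output by name. It assumes the Release has already been loaded into a Python dictionary by some other means. The helper name `collector_output` and its `scope` parameter are hypothetical, and the sample data simply mirrors the example above.

```python
from typing import Any, Optional


def collector_output(release: dict, name: str, scope: str = "tenant") -> Optional[Any]:
    """Return the output of collector `name` under status.collectors.<scope>.

    Each scope holds a list of single-key mappings, keyed by collector name.
    Returns None when the collector is not present in that scope.
    """
    entries = release.get("status", {}).get("collectors", {}).get(scope, [])
    for entry in entries:
        if name in entry:
            return entry[name]
    return None


# Sample Release content mirroring the example above.
release = {
    "status": {
        "collectors": {
            "managed": [
                {"foo": {"releaseNotes": {"cves": [
                    {"key": "CVE-3444", "component": "my-component"},
                ]}}},
            ],
            "tenant": [
                {"bar": {"baz": "qux"}},
                {"project-issues": {"releaseNotes": {"issues": {"fixed": [
                    {"id": "CPAAS-1234", "source": "issues.redhat.com"},
                ]}}}},
            ],
        }
    }
}

issues = collector_output(release, "project-issues")
print(issues["releaseNotes"]["issues"]["fixed"][0]["id"])  # CPAAS-1234
print(collector_output(release, "foo", scope="managed"))
print(collector_output(release, "foo"))  # None: "foo" is a managed collector
```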

Collectors in the managed pipeline

Releases can reference managed pipelines, which—as described in other sections—rely on the data field to retrieve user-provided information. To ensure that data generated by collectors is also considered, the contents of status.collectors are merged with the data fields from the Release, ReleasePlan, and ReleasePlanAdmission resources.

The order of precedence follows the same hierarchy previously described, with status.collectors having the lowest priority. This means that if both the collector output and any data field define the same key, the value from the data field will take precedence.

For example, if a collector like jira produces the following output:

status:
  collectors:
    tenant:
      - project-issues:
          releaseNotes:
            issues:
              fixed:
                - id: "CPAAS-1234"
                  source: "issues.redhat.com"
            cves:
              - key: "CVE-3444"
                component: "my-component"

And the ReleasePlanAdmission defines this:

data:
  releaseNotes:
    issues:
      fixed: []

Then the empty issues.fixed array from the data field overrides the issues.fixed list produced by the collector, because data fields take precedence over status.collectors.

In contrast, if the data field contains unrelated content:

data:
  foo: bar

Then both sources will be merged, and the final data used by the managed pipeline will be:

data:
  foo: bar
  releaseNotes:
    issues:
      fixed:
        - id: "CPAAS-1234"
          source: "issues.redhat.com"
    cves:
      - key: "CVE-3444"
        component: "my-component"

This merging strategy ensures flexibility while allowing user-defined data to take precedence when needed.
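
The precedence rule can be sketched in a few lines of Python. This is not the actual Konflux implementation, only a minimal model of the behavior described above, under several assumptions: nested mappings are merged key by key, any other value (lists included) is replaced wholesale, collector names are dropped when outputs are combined, and tenant output is combined before managed output. The function names are hypothetical, and the sample uses only the issues part of the collector output.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`.

    Nested dicts are merged key by key; any other value, including a
    list, is replaced entirely by the overriding value.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def flatten_collectors(collectors: dict) -> dict:
    """Combine all collector outputs into one dict, dropping collector names."""
    combined: dict = {}
    for scope in ("tenant", "managed"):  # ordering is an assumption
        for entry in collectors.get(scope, []):
            for output in entry.values():
                combined = deep_merge(combined, output)
    return combined


def effective_data(collectors: dict, *data_layers: dict) -> dict:
    """Merge collector output (lowest priority) with the given data layers.

    `data_layers` must be passed in ascending order of priority.
    """
    result = flatten_collectors(collectors)
    for layer in data_layers:
        result = deep_merge(result, layer)
    return result


collectors = {"tenant": [{"project-issues": {"releaseNotes": {"issues": {"fixed": [
    {"id": "CPAAS-1234", "source": "issues.redhat.com"},
]}}}}]}

# Unrelated data is merged alongside the collector output.
print(effective_data(collectors, {"foo": "bar"}))

# A matching key in the data field overrides the collector output.
print(effective_data(collectors, {"releaseNotes": {"issues": {"fixed": []}}}))
```

Replacing lists wholesale rather than concatenating them is what lets an empty `fixed: []` in a data field suppress the collected issues, as in the override example above.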