6. Log Conventions

Status

Accepted

Context

We need log conventions to make our offering easier to operate and maintain. These will also make it possible for us to create queries that cross components in our logs, which supports STONE-201.

We initially decided to require log messages formatted by the console encoder, thinking that it would be advantageous to be able to consume logs in a human readable format directly from our controller pods (2023-03-08), but on subsequent review we realized that this bears an undesirable cost in configuration of our logging: the default encoder for the “production” profile in zap is the json encoder (2023-03-29).

Decision: Log conventions

In our controller logs, we will use structured log messages, formatted in JSON with key-value pairs as described below.

For example:

{"ts": "2023-03-07T11:32:29.167Z", "level": "info", "logger": "controllers.Component", "msg": "updating devfile component name kubernetes-deploy ...", "namespace": "samburrai-tenant", "name": "go-net-http-hello-prqx", "controllerKind": "Component"}
{"ts": "2023-03-07T11:32:29.020Z", "level": "info", "logger": "controllers.Component", "msg": "API Resource updated", "namespace": "samburrai-tenant", "audit": "true", "name": "go-net-http-hello-prqx", "controllerKind": "Component", "action": "UPDATE"}

The ts field stands for “timestamp” (encoded as UTC/ISO-8601) and the meaning of log level should be self evident. The logger name helps your team understand which part of your service is emitting the log (usually Log.WithName from zap). The msg is a human readable string describing the event being logged. Further json key value pairs contain additional fields useful for searching. See full list of fields in the Appendix.

The opinionated logging mechanism from the controller-runtime framework already conforms to this scheme. Care should be taken to make sure specific key/value pairs are logged.

1. When did it happen?

Included in: ts

Encode timestamps in UTC/ISO-8601 format.

2. What happened?

Included in: msg and action

Use the key action with possible values VIEW, ADD, UPDATE, and DELETE, if applicable.

Optionally, also describe the action in the human-readable msg.

3. Where did it happen?

Included in: none

There is no need to identify which Konflux subsystem the log is coming from (i.e., HAS, SPI, or GitOps).

Consider: if an engineer is looking at logs directly in the namespace where a controller is deployed, then you know which service you are looking at. If an engineer is looking at logs centralized in cloudwatch or splunk, then the namespace from which the log came will be included automatically as namespace_name, which is sufficient to determine what Konflux subsystem produced the log. See section on automatically added logs below.

4. Who was involved?

Included in: msg and namespace

Use the key namespace when logging the namespace of the targeted resource that is being modified or interacted with. (Note that the namespace_name key is automatically added by the fluent collectors and reflects the namespace in which the controller is running. See section on automatically added logs below.)

Optionally, also refer to the namespace of the targeted resource in the human-readable msg.

5. Where it came from?

Included in: msg, controllerKind, name, logger, and caller

Use the key controllerKind with possible resource kind values determined by the component team. For example, for HAS this can be Application, Component, etc. For SPI this can be SPIAccessToken. The controllerKind key is automatically added to the log context by operator-sdk.

Use the key name with the name of the resource being acted upon. The name of the resource is automatically added to the log context by operator-sdk.

Optionally, use the key logger to describe the part of your controller that is emitting the log.

Optionally, use the key caller to direct developers to the line of source code where the event was logged (example in SPI).

Optionally, refer to these in the human-readable msg.

Controller audit logs

Included in: audit

For the specific types of audit logs required by SSML.PW.5.1.4 Perform Event Logging, one of the key-value pairs in the log entry should be audit: true. This makes it easier to query aggregated logs to find these special log entries.

More details can be found in SSML-8093, and a good example of a working implementation can be found in the GitOps code implemented for this story.

Automatically added fields

Note that fluentd and Fluent Bit will annotate your log messages with the following information automatically:

The cluster, node, pod and container names are also part of the log stream name. For example, under /aws/containerinsights/<Cluster_Name>/application:

For more details on Fluentd vs. Fluent Bit logs, see Set up Fluent Bit as a DaemonSet to send logs to CloudWatch Logs.

It is best to avoid using the same key as the log collectors in our controller logs, to avoid confusion.

Out of Scope

TaskRun logs are specifically out of scope of this ADR. Those logs are typically meant to be viewed and interpreted by users. Use human-readable strings for most logs in that context.

Upstream controllers are also out of scope. Where possible, influence upstream controllers to align to this ADR. It would just be nice. But it would be unreasonable to try to bring all of our dependencies under the same log scheme.

User application logs are out of scope. Obviously, we cannot control their log format, nor do we want to for the goals of this ADR.

Consequences

logger "sigs.k8s.io/controller-runtime/pkg/log"

func f() {
    log = logger.FromContext(ctx)

    log.Info("log message", "jsonfield1", "jsonfield1value", etc... )
}

Appendix

JSONSchema

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "ts": {
      "description": "Timestamp of the event",
      "type": "string"
    },
    "level": {
      "description": "Level of the log event (debug, info, etc..)",
      "type": "string"
    },
    "logger": {
      "description": "Name of the logger used to produce the event",
      "type": "string"
    },
    "msg": {
      "description": "Human-readable description of the logged event",
      "type": "string"
    },
    "action": {
      "description": "Indicates what action the log event is responding to",
      "type": "string",
      "enum": ["VIEW", "ADD", "UPDATE", "DELETE"]
    },
    "namespace": {
      "description": "Indicates the target namespace of the resource being reconciled",
      "type": "string"
    },
    "controllerKind": {
      "description": "The kind of the resource being reconciled",
      "type": "string"
    },
    "name": {
      "description": "The name of the resource being reconciled",
      "type": "string"
    },
    "caller": {
      "description": "A reference to the line of source code that is emitting this log",
      "type": "string"
    },
    "audit": {
      "description": "Set to true for specific types of audit logs required by SSML.PW.5.1.4",
      "type": "boolean"
    }
  },
  "additionalProperties": true,
  "required": ["ts", "level", "msg"],
  "not": {
    "required": [
      "namespace_name",
      "container_name",
      "pod_name",
      "container_image",
      "pod_ip",
      "host",
      "hostname",
      "namespace_labels",
      "message",
      "level",
      "time"
    ]
  }
}