Monitoring

This page describes how to monitor a Squash Orchestrator installation.

Logging

As is customary with Docker and Kubernetes deployments, all logs are written to the console.

By default, the log level is INFO. You can change it for all services by setting the DEBUG_LEVEL environment variable. Possible values, from most verbose to least verbose, are NOTSET, DEBUG, INFO (the default), WARNING, ERROR, and FATAL.
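As a minimal sketch, a deployment script could check a DEBUG_LEVEL value against the list above before starting the container. The helper name resolve_debug_level is hypothetical, and the strict, case-sensitive comparison is an assumption of this sketch, not documented orchestrator behavior.

```python
import os

# Levels as listed above, from most verbose to least verbose.
LEVELS = ["NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "FATAL"]


def resolve_debug_level(environ=os.environ):
    """Return the requested level name, falling back to INFO when unset.

    Hypothetical pre-deployment check: it only validates the value,
    it does not configure the orchestrator itself.
    """
    value = environ.get("DEBUG_LEVEL", "INFO")
    if value not in LEVELS:
        raise ValueError(f"Unsupported DEBUG_LEVEL: {value!r}")
    return value
```

Calling resolve_debug_level() with no argument reads the current process environment; passing a dictionary makes the check easy to exercise in isolation.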

Info

You can also set the log level per service. For more information, refer to Environment variables.

Watchdog

The Squash Orchestrator is a collection of services running together. It contains a built-in watchdog: if a service dies, the instance will die too.

When the watchdog is triggered, the failed services are listed in the console log.

Temporary Storage

A Squash Orchestrator deployment is stateless: it may include configuration files, but no data is generated on disk that would have to be preserved upon restart.

It does use temporary storage (typically on /tmp) to hold information on running and recently completed workflows.

The retention policy can be configured as described in the Observer Service documentation.

By default, information related to a given workflow is kept for one hour after the workflow's completion or cancellation.

Liveness endpoint

The Squash Orchestrator receptionist service provides a POST /workflows?ping endpoint. If the orchestrator service is operational, it returns a 200 status code and a JSON body whose message is Pong!.

This endpoint requires a valid token that is allowed to create workflows.

Linux or macOS (Bash):

curl -X POST \
     -H "Authorization: Bearer ${TOKEN}" \
     "http://orchestrator.example.com/workflows?ping"

Windows (CMD):

curl -X POST ^
     -H "Authorization: Bearer %TOKEN%" ^
     "http://orchestrator.example.com/workflows?ping"

Windows (PowerShell):

curl.exe -X POST `
     -H "Authorization: Bearer $Env:TOKEN" `
     "http://orchestrator.example.com/workflows?ping"

A successful response looks like:

{
  "apiVersion":"v1",
  "kind":"Status",
  "metadata":{},
  "message":"Pong!",
  "details":null,
  "status":"Success",
  "reason":"OK",
  "code":200
}
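The same check can be scripted. The following is a minimal sketch using only the Python standard library; the function names is_pong and ping are hypothetical, and it assumes the success response has the shape shown above (a JSON object whose message is Pong!).

```python
import json
import urllib.request


def is_pong(status_code, body):
    """Return True when a response matches the success shape shown above."""
    if status_code != 200:
        return False
    try:
        document = json.loads(body)
    except ValueError:
        # Not valid JSON (or not decodable text).
        return False
    return isinstance(document, dict) and document.get("message") == "Pong!"


def ping(base_url, token, timeout=5.0):
    """POST /workflows?ping and report whether the orchestrator answered."""
    request = urllib.request.Request(
        f"{base_url}/workflows?ping",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_pong(response.status, response.read())
    except OSError:
        # URLError, HTTPError, and timeouts all derive from OSError.
        return False
```

For example, ping("http://orchestrator.example.com", token) returns False on any network error, rejected token, or unexpected body, so a caller can treat it as a simple up-or-down signal.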

Activity endpoint

GET /subscriptions

The Squash Orchestrator event bus service exposes a GET /subscriptions endpoint that returns the list of active subscriptions and their statuses as a JSON document.

This endpoint requires a valid token that is allowed to list subscriptions.

Each subscription manifest contains a status part that shows the last publication timestamp, the publication count, the publication status summary, and the quarantine status:

{
  "apiVersion": "opentestfactory.org/v1alpha1",
  "kind": "Subscription",
  "metadata": {
    "name": "allinone",
    "subscription_id": "fa50d95d-98b0-42d5-be56-10f24e1f2736"
  },
  "spec": {
    ...
  },
  "status": {
    "lastPublicationTimestamp": "2024-05-30T02:54:22.387308",
    "publicationCount": 9,
    "publicationStatusSummary": {
      "200": 9
    },
    "quarantine": 0
  }
}
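A monitoring script might scan the status part of each manifest for signs of trouble. The sketch below works on a single manifest shaped like the example above; the function name subscription_issues is hypothetical, and treating non-2xx publication codes and a non-zero quarantine value as noteworthy is an assumption of this sketch, not a rule stated by the orchestrator.

```python
def subscription_issues(manifest):
    """List human-readable concerns found in one subscription manifest."""
    status = manifest.get("status", {})
    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    issues = []
    # Keys of publicationStatusSummary are HTTP status codes as strings.
    for code, count in status.get("publicationStatusSummary", {}).items():
        if not code.startswith("2"):
            issues.append(f"{name}: {count} publication(s) answered {code}")
    # Assumption: any non-zero quarantine value deserves a look.
    if status.get("quarantine", 0):
        issues.append(f"{name}: quarantine is {status['quarantine']}")
    return issues
```

Applied to the manifest shown above, the function returns an empty list, since all nine publications answered 200 and quarantine is 0.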

GET /workflows/status

The Squash Orchestrator observer service exposes a GET /workflows/status endpoint that returns the overall status of the orchestrator as a JSON document.

This endpoint requires a valid token that is allowed to get statuses.

An orchestrator that is currently processing workflows would return something like:

{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "message": "1 workflows in progress",
  "details": {
    "items": ["50c7e5d1-7bdc-46f0-9422-3cc6660d00c0"],
    "status": "BUSY"
  },
  "status": "Success",
  "reason": "OK",
  "code": 200
}

An idle orchestrator would return something like:

{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "message": "No workflow in progress",
  "details": {
    "items": [],
    "status": "IDLE"
  },
  "status": "Success",
  "reason": "OK",
  "code": 200
}
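The two responses above differ only in their details part, so a script that needs to know whether the orchestrator is busy (for example, before a planned restart) can read that part directly. This is a minimal sketch; the function names orchestrator_activity and is_idle are hypothetical, and it assumes the response has already been parsed from JSON into a dictionary.

```python
def orchestrator_activity(document):
    """Return (state, workflow_ids) from a parsed /workflows/status response."""
    details = document.get("details") or {}
    return details.get("status"), list(details.get("items", []))


def is_idle(document):
    """True when the orchestrator reports IDLE and lists no workflow."""
    state, workflow_ids = orchestrator_activity(document)
    return state == "IDLE" and not workflow_ids
```

With the busy example above, orchestrator_activity returns "BUSY" and a one-element list of workflow identifiers; with the idle example, is_idle returns True.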

Probes

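The Squash Orchestrator image includes the opentf-ctl command, which can be used in startup, readiness, and liveness probes to help Kubernetes monitor the health of the container. The following example defines a startup probe and a readiness probe that both run opentf-ctl get subscriptions: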

startupProbe:
  exec:
    command:
    - opentf-ctl
    - get
    - subscriptions
  failureThreshold: 20
  initialDelaySeconds: 1
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 3
readinessProbe:
  exec:
    command:
    - opentf-ctl
    - get
    - subscriptions
  failureThreshold: 3
  initialDelaySeconds: 1
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5

Channels

The Squash Orchestrator observer service exposes a GET /channels endpoint that returns the known channels (execution environments), the capabilities (tags) they provide, and their statuses (IDLE, BUSY, PENDING, or UNREACHABLE) as a JSON document.

Miscellaneous

The Squash Orchestrator observer service exposes a GET /version endpoint that returns the version of the running components as a JSON document.