Monitoring
This page describes how to monitor a Squash Orchestrator installation.
Logging
As is customary with Docker and Kubernetes deployments, all logs are written to the console.
By default, the log level is INFO. You can customize that log level for all services by defining the DEBUG_LEVEL environment variable. Possible values are NOTSET, DEBUG, INFO (the default), WARNING, ERROR, and FATAL (from the most verbose to the least verbose mode).
Info
You can fine-tune the desired log level per service. For more information, refer to Environment variables.
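The accepted values are the level names of Python's standard logging module, which makes their ordering easy to check. The sketch below assumes, without this page confirming it, that the services interpret DEBUG_LEVEL with Python logging semantics. It resolves the variable to a numeric level, falls back to INFO, and rejects unknown values. resolve_log_level and LEVELS are hypothetical names, not part of the orchestrator.

```python
import logging
import os
from typing import Mapping, Optional

# DEBUG_LEVEL values, from the most verbose to the least verbose, mapped to
# Python logging levels (assumption: the services follow these semantics).
LEVELS = {
    "NOTSET": logging.NOTSET,    # 0
    "DEBUG": logging.DEBUG,      # 10
    "INFO": logging.INFO,        # 20
    "WARNING": logging.WARNING,  # 30
    "ERROR": logging.ERROR,      # 40
    "FATAL": logging.FATAL,      # 50 (same value as CRITICAL)
}


def resolve_log_level(env: Optional[Mapping[str, str]] = None) -> int:
    """Numeric level selected by DEBUG_LEVEL; INFO when unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    name = env.get("DEBUG_LEVEL") or "INFO"
    if name not in LEVELS:
        raise ValueError(f"unsupported DEBUG_LEVEL value: {name!r}")
    return LEVELS[name]
```

Under that assumption, DEBUG_LEVEL=WARNING resolves to 30. A logger configured at that level would filter out INFO (20) and DEBUG (10) records.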
Watchdog
The Squash Orchestrator is a collection of services running together. It contains a built-in watchdog: if one of its services dies, the whole instance dies too.
When the watchdog is triggered, the failed services are listed in the console log.
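You can picture the watchdog behavior as a small supervisor. It starts every service, and as soon as one of them exits, it stops the others and reports which ones failed. This is a conceptual sketch only and not the orchestrator's actual implementation. run_with_watchdog and the demo commands are hypothetical.

```python
import subprocess
import sys
import time
from typing import Dict, List, Sequence


def run_with_watchdog(commands: Sequence[Sequence[str]],
                      poll_interval: float = 0.2) -> Dict[int, int]:
    """Start every command; as soon as one exits, stop all the others.

    Returns {index: exit code} for the commands that exited on their own,
    i.e. what a watchdog would report as failed services.
    """
    procs: List[subprocess.Popen] = []
    try:
        for cmd in commands:
            procs.append(subprocess.Popen(list(cmd)))
        while True:
            exited = {i: p.returncode for i, p in enumerate(procs) if p.poll() is not None}
            if exited:
                return exited
            time.sleep(poll_interval)
    finally:
        # One dead service brings the whole instance down: stop the survivors.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()


if __name__ == "__main__":
    # Demo: the second "service" fails immediately, so the first one is stopped.
    demo = [
        [sys.executable, "-c", "import time; time.sleep(30)"],
        [sys.executable, "-c", "raise SystemExit(3)"],
    ]
    print(run_with_watchdog(demo))  # {1: 3}
```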
Temporary Storage
A Squash Orchestrator deployment is stateless: it may include configuration files, but no data is generated on disk that would have to be preserved upon restart.
It does use temporary storage (typically on /tmp) to hold information on running and recently completed workflows.
The retention policy can be configured as described in the Observer Service documentation.
By default, information related to a given workflow is kept for one hour after the workflow completes or is cancelled.
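You can express the default retention rule as a time window. Information about a workflow stays available while the workflow runs and until one hour after it completed or was cancelled. The sketch below only illustrates that arithmetic. The helper names are hypothetical, and the handling of the exact boundary is an assumption. The real retention period comes from the observer service configuration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default retention described above; the actual value is configurable.
DEFAULT_RETENTION = timedelta(hours=1)


def purge_after(finished_at: datetime,
                retention: timedelta = DEFAULT_RETENTION) -> datetime:
    """Moment from which the workflow information may be discarded."""
    return finished_at + retention


def within_retention(finished_at: Optional[datetime], now: datetime,
                     retention: timedelta = DEFAULT_RETENTION) -> bool:
    """True while the workflow runs (finished_at is None) or finished recently."""
    if finished_at is None:
        return True
    # Assumption: information is dropped once the retention window has fully elapsed.
    return now < purge_after(finished_at, retention)
```

With the default setting, information on a workflow cancelled at 14:20 remains available until 15:20.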
Liveness endpoint
The Squash Orchestrator receptionist service provides a POST /workflows?ping endpoint that returns a 200 status code and a Status document whose message is Pong! if the orchestrator service is operational.
This endpoint requires a valid token that is allowed to create workflows.
Bash:
curl -X POST \
  -H "Authorization: Bearer ${TOKEN}" \
  "http://orchestrator.example.com/workflows?ping"
Windows Command Prompt:
curl -X POST ^
  -H "Authorization: Bearer %TOKEN%" ^
  http://orchestrator.example.com/workflows?ping
PowerShell:
curl.exe -X POST `
  -H "Authorization: Bearer $Env:TOKEN" `
  http://orchestrator.example.com/workflows?ping
A successful call returns:
{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "message": "Pong!",
  "details": null,
  "status": "Success",
  "reason": "OK",
  "code": 200
}
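You can issue the same request from Python with the standard library only, for example from an external monitoring tool. This sketch assumes the orchestrator is reachable at the example URL used above and that the token is available in a TOKEN environment variable. ping_orchestrator and is_pong are hypothetical helper names.

```python
import json
import os
import urllib.request
from typing import Any


def is_pong(payload: Any) -> bool:
    """True if the Status document is the successful answer to a ping."""
    return (isinstance(payload, dict)
            and payload.get("code") == 200
            and payload.get("message") == "Pong!")


def ping_orchestrator(base_url: str, token: str, timeout: float = 5.0) -> bool:
    """POST /workflows?ping and report whether the orchestrator answered Pong!."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/workflows?ping",
        data=b"",  # empty request body
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200 and is_pong(json.load(response))
    except (OSError, ValueError):
        # OSError covers connection failures, timeouts and HTTP errors
        # (urllib.error.HTTPError is a subclass); ValueError covers invalid JSON.
        return False


if __name__ == "__main__":
    # Assumption: the token is exported in the TOKEN environment variable.
    healthy = ping_orchestrator("http://orchestrator.example.com", os.environ["TOKEN"])
    print("operational" if healthy else "not operational")
```

Returning a boolean rather than raising keeps the helper easy to plug into schedulers or alerting scripts that only need an up/down answer.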
Activity endpoint
GET /subscriptions
The Squash Orchestrator event bus service exposes a GET /subscriptions endpoint that returns the list of active subscriptions and their statuses as a JSON document.
This endpoint requires a valid token that is allowed to list subscriptions.
Each subscription manifest contains a status part that shows the last publication timestamp, the publication count, the publication status summary, and the quarantine status:
{
  "apiVersion": "opentestfactory.org/v1alpha1",
  "kind": "Subscription",
  "metadata": {
    "name": "allinone",
    "subscription_id": "fa50d95d-98b0-42d5-be56-10f24e1f2736"
  },
  "spec": {
    ...
  },
  "status": {
    "lastPublicationTimestamp": "2024-05-30T02:54:22.387308",
    "publicationCount": 9,
    "publicationStatusSummary": {
      "200": 9
    },
    "quarantine": 0
  }
}
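You can use the status part to spot subscribers that are in trouble. The sketch below inspects one subscription manifest shaped like the example above. It flags a non-zero quarantine value and any publication status code outside the 2xx range. Treating a non-zero quarantine as a problem and reading the summary keys as HTTP status codes are interpretations of the example, not documented semantics. subscription_issues is a hypothetical helper. How the manifests are wrapped in the GET /subscriptions response is not shown on this page, so extracting them is left to the caller.

```python
from typing import Any, List, Mapping


def subscription_issues(manifest: Mapping[str, Any]) -> List[str]:
    """Return human-readable warnings for one subscription manifest."""
    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    status = manifest.get("status", {})
    issues: List[str] = []
    # Assumption: a non-zero quarantine value means the subscriber is quarantined.
    if status.get("quarantine", 0):
        issues.append(f"{name}: quarantine={status['quarantine']}")
    # Assumption: summary keys are the HTTP codes returned by the subscriber.
    for code, count in status.get("publicationStatusSummary", {}).items():
        if not str(code).startswith("2"):
            issues.append(f"{name}: {count} publication(s) answered with {code}")
    return issues
```

For the allinone subscription shown above, the helper returns an empty list.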
GET /workflows/status
The Squash Orchestrator observer service exposes a GET /workflows/status endpoint that returns the overall status of the orchestrator as a JSON document.
This endpoint requires a valid token that is allowed to get statuses.
An orchestrator that is currently processing workflows would return something like:
{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "message": "1 workflows in progress",
  "details": {
    "items": ["50c7e5d1-7bdc-46f0-9422-3cc6660d00c0"],
    "status": "BUSY"
  },
  "status": "Success",
  "reason": "OK",
  "code": 200
}
An idle orchestrator would return something like:
{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "message": "No workflow in progress",
  "details": {
    "items": [],
    "status": "IDLE"
  },
  "status": "Success",
  "reason": "OK",
  "code": 200
}
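The details.status field helps with maintenance operations. For example, you can wait for running workflows to finish before restarting an instance. The sketch below polls the endpoint until the orchestrator reports IDLE or a deadline expires. The fetch function is injected, so you can back it with any HTTP client that performs an authenticated GET /workflows/status and decodes the JSON. wait_until_idle, is_idle, and the default timeout and interval values are hypothetical.

```python
import time
from typing import Any, Callable, Mapping


def is_idle(status_document: Mapping[str, Any]) -> bool:
    """True when GET /workflows/status reports no workflow in progress."""
    return status_document.get("details", {}).get("status") == "IDLE"


def wait_until_idle(fetch_status: Callable[[], Mapping[str, Any]],
                    timeout: float = 600.0, interval: float = 15.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the orchestrator is IDLE; return False if the timeout expires first.

    Errors raised by fetch_status propagate to the caller.
    """
    deadline = clock() + timeout
    while True:
        if is_idle(fetch_status()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Injecting the clock and sleep functions keeps the polling logic independent of wall time, which makes it easy to exercise without waiting.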
Probes
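The Squash Orchestrator image includes an opentf-ctl command that can be used in Kubernetes probes (startup, readiness, or liveness) to help Kubernetes monitor the health of the container. For example, the following configuration uses it in a startup probe and a readiness probe: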
startupProbe:
  exec:
    command:
      - opentf-ctl
      - get
      - subscriptions
  failureThreshold: 20
  initialDelaySeconds: 1
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 3
readinessProbe:
  exec:
    command:
      - opentf-ctl
      - get
      - subscriptions
  failureThreshold: 3
  initialDelaySeconds: 1
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
Channels
The Squash Orchestrator observer service exposes a GET /channels endpoint that returns the known channels (execution environments), the capabilities (tags) they provide, and their statuses (IDLE, BUSY, PENDING, or UNREACHABLE) as a JSON document.
Miscellaneous
The Squash Orchestrator observer service exposes a GET /version endpoint that returns the version of the running components as a JSON document.
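You can query GET /version and GET /channels like the other endpoints on this page. The sketch below is a small authenticated GET helper. It assumes that these endpoints, like the ones above, expect a bearer token and are reachable under the same base URL as in the curl examples. This page does not describe the response layouts, so the helper simply returns the decoded JSON. get_json is a hypothetical name.

```python
import json
import urllib.request
from typing import Any


def get_json(base_url: str, path: str, token: str, timeout: float = 10.0) -> Any:
    """Send an authenticated GET request and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}{path}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    # urlopen raises urllib.error.HTTPError for non-2xx answers (e.g. an invalid token).
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    import os

    # Assumptions: example base URL and a token exported as TOKEN.
    base, token = "http://orchestrator.example.com", os.environ["TOKEN"]
    print(json.dumps(get_json(base, "/version", token), indent=2))
    print(json.dumps(get_json(base, "/channels", token), indent=2))
```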