Health Checks
Health checks emit two events:
check.passed
check.failed
notification.yaml
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: api-http-fail-alert
  namespace: default
spec:
  events:
    - check.failed
  filter: check.type == 'http'
  title: API HTTP Check {{.check.name}} failing
  body: |
    ## Check Failed
    Error: {{.status.error}}
    Failed at {{.status.created_at}}
  to:
    email: alerts@acme.com
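The same shape should work for the other event. The following is a minimal sketch of a notification for `check.passed`, assuming it accepts the same `spec` fields as the failure example; the resource name `api-http-pass-alert` is hypothetical, and `.status.message` is taken from the CheckStatus fields documented further down.

```yaml
# Sketch only: mirrors the failure example, subscribed to check.passed instead.
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: api-http-pass-alert   # hypothetical name
  namespace: default
spec:
  events:
    - check.passed
  filter: check.type == 'http'
  title: API HTTP Check {{.check.name}} passing
  body: |
    ## Check Passed
    {{if .status.message}}Message: {{.status.message}}{{end}}
    Passed at {{.status.created_at}}
  to:
    email: alerts@acme.com
```

The `if` guard around the message keeps the body clean when a passing check reports no message.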
Default Templates
check.passed
Title
{{ if ne .channel "slack"}}Check {{.check.name}} has passed{{end}}
Template
{{ if eq .channel "slack"}}
{
  "blocks": [
    {{slackSectionTextMD (printf `:large_green_circle: *%s* is _healthy_` .canary.name)}},
    {"type": "divider"},
    {{ if .status.message}}{{slackSectionTextMD .status.message}},{{end}}
    {
      "type": "section",
      "fields": [
        {{slackSectionTextFieldMD (printf `*Canary*: %s` .canary.name) }},
        {{slackSectionTextFieldMD (printf `*Namespace*: %s` .canary.namespace) }}
        {{if ne .agent.name "local"}}
        ,{{slackSectionTextFieldMD (printf `*Agent*: %s` .agent.name) }}
        {{end}}
      ]
    },
    {{ if .check.labels}}{{slackSectionLabels .check}},{{end}}
    {{ slackURLAction "View Health Check" .permalink "🔕 Silence" .silenceURL}}
  ]
}
{{ else }}
Canary: {{.canary.name}}
{{if .agent}}Agent: {{.agent.name}}{{end}}
{{if .status.message}}Message: {{.status.message}} {{end}}
{{labelsFormat .check.labels}}
[Reference]({{.permalink}})
{{end}}
check.failed
Title
{{ if ne .channel "slack"}}Check {{.check.name}} has failed{{end}}
Template
{{ if eq .channel "slack"}}
{
  "blocks": [
    {{slackSectionTextMD (printf `:red_circle: *%s* is _unhealthy_` .check.name)}},
    {"type": "divider"},
    {{ if .status.error}}{{slackSectionTextMD .status.error}},{{end}}
    {
      "type": "section",
      "fields": [
        {{slackSectionTextFieldMD (printf `*Canary*: %s` .canary.name) }},
        {{slackSectionTextFieldMD (printf `*Namespace*: %s` .canary.namespace) }}
        {{if ne .agent.name "local"}}
        ,{{slackSectionTextFieldMD (printf `*Agent*: %s` .agent.name) }}
        {{end}}
      ]
    },
    {{ if .check.labels}}{{slackSectionLabels .check}},{{end}}
    {{ slackURLAction "View Health Check" .permalink "🔕 Silence" .silenceURL}}
  ]
}
{{ else }}
Canary: {{.canary.name}}
{{if .agent}}Agent: {{.agent.name}}{{end}}
Error: {{.status.error}}
{{labelsFormat .check.labels}}
[Reference]({{.permalink}})
{{end}}
Template Variables
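Field | Description | Scheme |
---|---|---|
agent | Details of the agent that the check belongs to | Agent |
canary | The canary that the check belongs to | Canary |
check | The health check that emitted the event | Check |
permalink | Link to the health check in Mission Control | string |
status | Status of the check | CheckStatus |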
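The variables above can be combined in a single notification. The sketch below is illustrative only: it assumes the `filter` expression can read `canary` in the same way the earlier example reads `check` (the source only demonstrates `check.type`), and the resource name is hypothetical.

```yaml
# Sketch only: combines several top-level template variables.
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: default-ns-http-fail-alert   # hypothetical name
  namespace: default
spec:
  events:
    - check.failed
  # Assumption: canary.* is visible to the filter, not only check.*
  filter: "check.type == 'http' && canary.namespace == 'default'"
  title: HTTP check {{.check.name}} failing in {{.canary.namespace}}
  body: |
    Canary: {{.canary.name}}
    {{if .agent}}Agent: {{.agent.name}}{{end}}
    Error: {{.status.error}}
    [Reference]({{.permalink}})
  to:
    email: alerts@acme.com
```

The body reuses the plain-text layout of the default `check.failed` template, so the custom notification stays close to what recipients already see.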
Agent
Field | Description | Scheme |
---|---|---|
description | Short description of the agent | |
id | The id of the agent | |
name | The name of the agent | |
Canary
Field | Description | Scheme |
---|---|---|
created_at | When the canary was created | |
deleted_at | When the canary was deleted | |
id | The id of the canary | |
labels | The labels of the canary | |
name | The name of the canary | |
namespace | The namespace of the canary | |
source | The source of the canary | |
updated_at | When the canary was last updated | |
Check
Field | Description | Scheme |
---|---|---|
created_at | When the check was created | |
deleted_at | When the check was deleted | |
description | The description of the check | |
id | The id of the check | |
labels | The labels of the check | |
last_runtime | When the check last ran | |
last_transition_time | When the check's status last changed | |
latency | Latency summary for the past hour | Latency |
name | The name of the check | |
next_runtime | When the check is next scheduled to run | |
severity | The severity of the check | |
status | Check status details | |
transformed | Whether the check has been transformed | |
type | The type of the check | |
updated_at | When the check was last updated | |
uptime | Uptime summary for the past hour | Uptime |
CheckStatus
Field | Description | Scheme |
---|---|---|
check_id | The id of the check associated with this status | |
created_at | When the check status was created | |
duration | The duration of the check run | |
error | The error of the check in case of failure | |
invalid | Whether the check errored out | |
message | The success message of the check | |
status | The status of the check | |
time | When the check ran | |
Uptime
Field | Description | Scheme |
---|---|---|
failed | The number of checks that failed | |
last_fail | The last time a check failed | |
last_pass | The last time a check passed | |
p100 | The percentage of checks that passed | |
passed | The number of checks that passed | |
Latency
Field | Description | Scheme |
---|---|---|
p95 | 95th percentile latency of the check | |
p97 | 97th percentile latency of the check | |
p99 | 99th percentile latency of the check | |
rolling1h | Rolling latency of the check over the past hour | |
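The nested `uptime` and `latency` summaries can be referenced from a custom body. This is a hedged sketch: the field paths are inferred from the tables above, the resource name is hypothetical, and the source does not state the units or formatting in which these values render.

```yaml
# Sketch only: field paths are inferred from the Check, Uptime and Latency tables.
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: check-failed-with-stats   # hypothetical name
  namespace: default
spec:
  events:
    - check.failed
  title: Check {{.check.name}} failing
  body: |
    ## Check Failed
    Error: {{.status.error}}
    Passed / failed in the past hour: {{.check.uptime.passed}} / {{.check.uptime.failed}}
    Last pass: {{.check.uptime.last_pass}}
    p99 latency: {{.check.latency.p99}}
  to:
    email: alerts@acme.com
```

Adding the last pass time and recent pass and fail counts gives the recipient a quick sense of whether the failure is new or ongoing.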