Healthcheck can monitor your cron jobs and notify you when they don't run at expected times. Assuming curl or wget is available, you will not need to install any new software on your servers.
The principle of operation is simple: your cron job sends an HTTP request ("ping") to Healthcheck every time it completes. When Healthcheck does not receive the HTTP request at the expected time, it notifies you. This monitoring technique, sometimes called "heartbeat monitoring", is a type of dead man's switch. It can detect various failure modes:
Let's take a look at an example cron job:
# run backup.sh at 06:08 every day
8 6 * * * /home/me/backup.sh
To monitor it, first create a new Check in your Healthcheck account.
After creating the check, copy the generated ping URL, and update the job's definition:
# run backup.sh, then send a success signal to Healthcheck
8 6 * * * /home/me/backup.sh && curl -fsS -m 10 --retry 5 -o /dev/null https://healthcheck.rhmn.de/ping/your-uuid-here
The extra curl call lets Healthcheck know the cron job has run successfully. Healthcheck keeps track of the received pings and notifies you as soon as a ping does not arrive on time.
Note: you can alternatively add the extra curl call as a final line inside the /home/me/backup.sh script to keep the cron job's definition clean and short.
You can use an HTTP client other than curl to send the HTTP request.
The extra options in the above example tell curl to retry failed HTTP requests, limit the maximum execution time, and silence output unless there is an error. Feel free to adjust the curl options to suit your needs.
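As a hedged sketch of the two points above, the helper below sends the ping with curl when it is installed and falls back to wget otherwise. The function name ping_healthcheck and the PING_URL variable are illustrative assumptions, not part of Healthcheck, and the wget flags are only an approximate counterpart of the curl options.

```shell
#!/bin/sh
# Placeholder ping URL (assumption): replace with the URL of your own check.
PING_URL="https://healthcheck.rhmn.de/ping/your-uuid-here"

# Hypothetical helper: ping with curl if available, otherwise with wget.
ping_healthcheck() {
    if command -v curl >/dev/null 2>&1; then
        # -f: treat HTTP errors as failures; -sS: silent, but print errors;
        # -m 10: give up after 10 seconds; --retry 5: retry transient failures
        curl -fsS -m 10 --retry 5 -o /dev/null "$PING_URL"
    else
        # -q: quiet; -T 10: 10-second network timeout; -t 5: up to 5 tries
        wget -q -T 10 -t 5 -O /dev/null "$PING_URL"
    fi
}
```

Calling ping_healthcheck as the last step of /home/me/backup.sh, after the backup commands have succeeded, would have the same effect as appending the curl call to the crontab entry.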
Because the two commands are joined with &&, curl runs only if /home/me/backup.sh exits with an exit code 0.
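The effect of && in the cron line can be seen in a minimal sketch. The three functions are hypothetical stand-ins: job_ok and job_fail play the role of backup.sh succeeding or failing, and send_ping plays the role of the curl call.

```shell
#!/bin/sh
# Stand-ins for the real commands (hypothetical names, for illustration only).
job_ok()    { return 0; }            # a job that succeeds (exit code 0)
job_fail()  { return 1; }            # a job that fails (non-zero exit code)
send_ping() { echo "ping sent"; }    # stands in for the curl call

job_ok && send_ping      # job_ok exits 0, so send_ping runs
job_fail && send_ping    # job_fail exits 1, so send_ping is skipped
echo "exit code of the failed line: $?"   # prints 1
```

When the job fails, no ping is sent, so Healthcheck sees a missing ping and reports the check as down once the grace time has passed.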
Grace Time is the amount of extra time to wait when a cron job is running late before declaring it as down. Set Grace Time to be above the expected duration of your cron job.
For example, let's say the cron job starts at 14:00 every day and takes between 15 and 25 minutes to complete. The grace time is set to 30 minutes. In this scenario, Healthcheck will expect a ping to arrive at 14:00 but will not send any alerts yet. If there is no ping by 14:30, it will declare the job failed and send alerts.
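The arithmetic in this example can be sketched as a small shell function. It only illustrates the calculation described above and is not a description of how Healthcheck computes deadlines internally; the name alert_deadline is hypothetical, and the input is assumed to be a zero-padded HH:MM time.

```shell
#!/bin/sh
# Hypothetical helper: add a grace time (in minutes) to an HH:MM start time.
alert_deadline() {
    start="$1"
    grace_min="$2"
    hours=${start%%:*}
    mins=${start##*:}
    # Strip one leading zero so values like "08" are not parsed as octal.
    hours=${hours#0}
    mins=${mins#0}
    # Wrap around midnight (1440 minutes per day).
    total=$(( (hours * 60 + mins + grace_min) % 1440 ))
    printf '%02d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

alert_deadline 14:00 30    # prints 14:30
```

With a job that finishes between 14:15 and 14:25, the ping arrives before the 14:30 deadline, so no alert is sent on a normal day.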
Healthcheck has integrations to deliver notifications over different channels: email, webhooks, SMS, chat messages, incident management systems, and more. You can and should set up multiple ways to get notified about job failures.
Additionally, to make sure no issues "slip through the cracks", in the Account Settings › Email Reports page you can configure Healthcheck to send repeated email notifications every hour or every day as long as any of the jobs is down.
Classic cron implementations have a built-in method of notifying about cron job failures, the MAILTO variable (the address below is a placeholder):
MAILTO=you@example.org
8 6 * * * /home/me/backup.sh
So why not just use that? There are several drawbacks:
If your cron job consistently pings Healthcheck an hour early or an hour late, the likely cause is a timezone mismatch: your machine may be using a timezone different from what you have configured on Healthcheck.
On modern GNU/Linux systems, you can look up the time zone by running the timedatectl status command and looking for "Time zone" in its output:
$ timedatectl status
Local time: C 2020-01-23 12:35:50 EET
Universal time: C 2020-01-23 10:35:50 UTC
RTC time: C 2020-01-23 10:35:50
Time zone: Europe/Riga (EET, +0200)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
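To compare the machine's time zone with the one configured on Healthcheck, it may help to pull out just the zone name. The following is a minimal sketch; extract_timezone is a hypothetical helper that assumes the "Time zone:" line has the format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `timedatectl status` output on stdin and print
# only the zone name from the "Time zone:" line.
extract_timezone() {
    sed -n 's/^ *Time zone: *\([^ ]*\).*/\1/p'
}

# Sample line taken from the output above.
printf 'Time zone: Europe/Riga (EET, +0200)\n' | extract_timezone   # prints Europe/Riga
```

On a live system the same helper would be used as `timedatectl status | extract_timezone`.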
On a systemd-based system, you can use the journalctl utility to see system logs, including logs from the cron daemon.
To see live logs:
journalctl -f
To see the logs from e.g. the last hour, and only from the cron daemon:
journalctl --since "1 hour ago" -t CRON
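To narrow the cron daemon's log down to a single job, the output can be filtered by the script name. This is a small sketch under the assumption that the cron daemon logs the command it runs, which can vary between cron implementations; cron_lines_for is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only the log lines on stdin that mention a job.
# -F treats the pattern as a fixed string, so "." in "backup.sh" is literal.
cron_lines_for() {
    grep -F -- "$1"
}
```

For example, `journalctl --since "1 hour ago" -t CRON | cron_lines_for backup.sh` would show only the entries that mention backup.sh. If nothing is printed, that may suggest cron did not start the job in that period.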