Monitoring and alerting checklist
Comprehensive list of monitoring best practices.
Monitor the uptime of websites and APIs
Monitoring your customer-facing websites and critical API endpoints makes you aware of serious issues that need to be resolved urgently. Add uptime monitoring that checks these endpoints at regular intervals and raises alerts when an endpoint stops responding or returns an incorrect response.
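A single uptime check of this kind can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `check_endpoint`, the default timeout, and the idea of treating any status other than the expected one as "wrong" are assumptions, not part of any specific monitoring product.

```python
import urllib.error
import urllib.request


def check_endpoint(url, expected_status=200, timeout=10.0,
                   opener=urllib.request.urlopen):
    """Return (is_up, detail) for one HTTP GET against `url`.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        return False, f"unexpected status {exc.code}"
    except OSError as exc:
        # Covers URLError, DNS failures, refused connections and timeouts.
        return False, f"no response: {exc}"
    if status != expected_status:
        return False, f"unexpected status {status}"
    return True, "ok"
```

A scheduler would call `check_endpoint` for each monitored URL at a fixed interval and raise an alert whenever the first element of the result is `False`.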
Monitor the uptime of your databases
Your database is a critical part of your applications, so monitoring its uptime is essential.
Relevant Spike.sh integrations:
Webhooks
Monitor the disk space utilization
Disks often fill up because of growing log files, which is especially dangerous because the growth can be hard to keep track of.
You can monitor disk space usage using the guides for AWS, Google Cloud Platform or Microsoft Azure below.
Alternatively, you can create a script that checks disk space and raises an alert via webhook when disk space utilization goes above a certain threshold (usually 80-90%). A cron job should execute this script at regular intervals (at least once a day).
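The script described above could look roughly like the following sketch. It uses only the Python standard library. The webhook URL, the JSON payload fields, and the 85% threshold are placeholders chosen for illustration, not a format required by any particular service.

```python
import json
import shutil
import urllib.request

WEBHOOK_URL = "https://example.com/your-webhook-url"  # placeholder, replace with your own
THRESHOLD_PERCENT = 85.0  # assumed value inside the usual 80-90% range


def disk_usage_percent(path="/"):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def build_alert(path, percent, threshold=THRESHOLD_PERCENT):
    """Return an alert payload if usage is above the threshold, else None."""
    if percent <= threshold:
        return None
    return {
        "title": f"Disk usage high on {path}",  # hypothetical payload shape
        "used_percent": round(percent, 1),
        "threshold_percent": threshold,
    }


def send_alert(payload, url=WEBHOOK_URL):
    """POST the payload as JSON to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def main(path="/"):
    """Check one path and send an alert only when the threshold is exceeded."""
    alert = build_alert(path, disk_usage_percent(path))
    if alert is not None:
        send_alert(alert)
```

Saved to a file that ends with a call to `main()`, this could be scheduled from cron, for example hourly with `0 * * * * python3 /path/to/check_disk.py` (the path is hypothetical).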
Monitor your infrastructure
Keep track of utilization and load for your infrastructure by monitoring CPU and memory usage.
High CPU utilization can cause programs to slow down or freeze altogether.
High memory utilization can lead to performance bottlenecks and leave your website and apps unable to handle more users.
You should also raise alerts when your network I/O usage spikes, whether due to user load or suspicious network activity.
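As a rough illustration of the CPU and memory checks, the sketch below reads the load average through `os.getloadavg()` (Unix only) and computes memory usage from the text of Linux's `/proc/meminfo`. The function names and the default limits (a load of 1.0 per core, 90% memory) are assumptions for the example, and network I/O is left out for brevity.

```python
import os


def load_per_cpu():
    """1-minute load average divided by the number of CPUs (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def memory_used_percent(meminfo_text):
    """Used-memory percentage computed from the contents of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("MemTotal", "MemAvailable"):
            fields[name] = int(rest.split()[0])  # values are reported in kB
    total = fields["MemTotal"]
    return (total - fields["MemAvailable"]) / total * 100


def breached(load, mem_percent, max_load=1.0, max_mem_percent=90.0):
    """Return a list of human-readable messages for every limit exceeded."""
    alerts = []
    if load > max_load:
        alerts.append(f"CPU load per core is {load:.2f} (limit {max_load:.2f})")
    if mem_percent > max_mem_percent:
        alerts.append(
            f"memory usage is {mem_percent:.1f}% (limit {max_mem_percent:.1f}%)"
        )
    return alerts
```

On a Linux host, `breached(load_per_cpu(), memory_used_percent(open("/proc/meminfo").read()))` returns an empty list while both values stay within their limits.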
Keep track of application errors
Keep track of important errors and exceptions in your web and mobile apps that affect your customers.
Configure the error monitoring to raise alerts based on the importance and frequency of the errors.
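One way to combine importance and frequency is a sliding-window counter: critical errors alert immediately, while lesser errors alert only once they occur too often. The sketch below is an assumption about how such a rule might be written, not the configuration model of any particular error-monitoring tool; the class name, the `"critical"` severity label, and the limits are hypothetical.

```python
from collections import deque


class ErrorRateAlert:
    """Fires when more than `max_errors` occur within `window_seconds`."""

    def __init__(self, max_errors, window_seconds):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Record an error at `timestamp` (in seconds); return True to alert."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop errors that have fallen out of the sliding window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_errors


def should_alert(severity, timestamp, tracker):
    """Alert at once for critical errors, otherwise only on high frequency."""
    over_limit = tracker.record(timestamp)
    return severity == "critical" or over_limit
```

For example, `ErrorRateAlert(max_errors=2, window_seconds=60)` stays quiet for the first two errors in a minute and fires on the third.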
Monitor your cron jobs
Cron jobs form the backbone of your system and run important tasks like DB backups and user data management.
Cron job failures can often go unnoticed and cause havoc. Keep track of them and raise alerts when necessary.
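A common way to catch silent cron failures is a heartbeat: the job pings a monitoring URL only after it succeeds, and the monitoring service alerts when an expected ping does not arrive. The sketch below illustrates the idea under stated assumptions; `run_with_heartbeat` and `send_ping` are hypothetical names, and the exact ping URL format depends on the service you use.

```python
import logging
import urllib.request


def send_ping(url, timeout=10.0):
    """Send a plain HTTP GET to the heartbeat URL."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def run_with_heartbeat(job, ping_url, ping=send_ping):
    """Run `job`; ping `ping_url` only if it finishes without raising.

    Returns True on success and False on failure. On failure no ping is
    sent, so the monitoring service sees a missed check-in.
    """
    try:
        job()
    except Exception:
        logging.exception("cron job failed; heartbeat not sent")
        return False
    ping(ping_url)
    return True
```

A backup script would wrap its entry point, for instance `run_with_heartbeat(backup_database, "https://example.com/ping/your-check-id")`, where both the function and the URL are placeholders.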
Relevant Spike.sh integrations:
Healthchecks
Cronitor
Monitor your application performance
Poor application performance leads to a bad user experience and drives users away from your website and apps.
Monitor the performance of your apps and alert your engineering and ops teams when performance thresholds are crossed.
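A performance threshold is often expressed as a percentile of response time rather than an average, so that a few slow requests are not hidden. The following sketch checks a batch of latency samples against a 95th-percentile limit using the nearest-rank method; the function names, the choice of p95, and the millisecond unit are assumptions for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of `samples` for an integer `pct` in 1..100."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_alert(samples_ms, threshold_ms, pct=95):
    """Return an alert message if the percentile exceeds the threshold."""
    observed = percentile(samples_ms, pct)
    if observed > threshold_ms:
        return f"p{pct} latency {observed} ms exceeds {threshold_ms} ms"
    return None
```

Run against the response times collected over the last few minutes, a non-`None` result is the message to forward to your engineering and ops teams.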
Monitor your security alerts
Keep track of security events in your apps before they become security incidents or breaches.
Configure your security products to raise alerts when serious security events take place.
Relevant Spike.sh integrations:
Webhooks
Monitor your important business functionality
Raise alerts when business-critical features run into issues.
For example, on an e-commerce website, raise alerts when errors occur in cart or payment functionality.
Relevant Spike.sh integrations:
Webhooks