{"id":329,"date":"2023-05-31T13:19:36","date_gmt":"2023-05-31T13:19:36","guid":{"rendered":"https:\/\/blog.spike.sh\/2023\/05\/31\/postmortem-of-our-dashboards-outage\/"},"modified":"2025-06-06T12:21:28","modified_gmt":"2025-06-06T06:51:28","slug":"postmortem-of-our-dashboards-outage","status":"publish","type":"post","link":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/","title":{"rendered":"Postmortem of Our Dashboard&#8217;s Outage"},"content":{"rendered":"\n<nav aria-label=\"Table of Contents\" class=\"wp-block-table-of-contents\"><ol><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#what-went-wrong\">What went wrong?<\/a><\/li><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#how-did-we-fix-the-issue\">How did we fix the issue?<\/a><\/li><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#did-it-impact-our-services\">Did it impact our services?<\/a><\/li><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#what-are-we-doing-to-prevent-this-from-happening-again\">What are we doing to prevent this from happening again?<\/a><\/li><li><a class=\"wp-block-table-of-contents__entry\" href=\"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#closing-notes\">Closing notes<\/a><\/li><\/ol><\/nav>\n\n\n\n<p class=\"wp-block-paragraph\">Our dashboard went down at 6:55 PM UTC on 30th May 2023 for a total time of 195 minutes. This critical incident impacted only our dashboard. All the other services like Hooks, Alerts, API, Escalations, and Status Page were not impacted. 
During the dashboard&#8217;s downtime, since all other services were operational, we did see incidents and alerts getting triggered.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>tl;dr<\/strong><br>Our process manager (PM2) kept automatically restarting processes because temporary upload files were written to a folder that the PM2 watch was monitoring. Nothing a quick patch couldn&#8217;t fix, but it took some time to understand why it was happening.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-went-wrong\">What went wrong?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/app.spike.sh\/\" rel=\"noopener noreferrer nofollow\">Our dashboard<\/a> is built primarily on Node.js and we use PM2 as a process manager. We believe incident management is as much about humans as about incidents themselves, which is why we started, well, slightly personalising your dashboard. The first thing we did was add profile pictures (<em>it&#8217;s not enough but definitely a start<\/em>).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When you upload a picture, we first write it to a temp folder before we begin uploading to S3. Unfortunately, we accidentally changed the upload path from the temp folder to the root folder, so each write triggered the process manager&#8217;s watcher, which restarted the process.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Uploading pictures is not an everyday business, mainly because we have made it difficult to upload a picture the moment you see your initials (<em>we are working on improving this<\/em>). To upload a picture, <a href=\"https:\/\/app.spike.sh\/settings\/personal-profile\" rel=\"noopener noreferrer nofollow\">visit profile settings<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Last night \/ early morning, our dashboard service kept restarting, causing 504 timeout errors. Our uptime monitoring solution is hooked to Twilio, painfully, via Zapier. Also, some of you emailed and created tickets. Thanks for reaching out. 
We did connect over a quick call with some of you after the incident was resolved. However, we couldn&#8217;t reach everyone. Please accept our apologies.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-did-we-fix-the-issue\">How did we fix the issue?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Most of our time went into understanding the issue. The fix itself was easy &#8211; a small patch and a deploy did the trick.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"did-it-impact-our-services\">Did it impact our services?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">None of our other services &#8211; Alerts, Escalations, Incidents, API, Status pages, and Hooks &#8211; were impacted.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-are-we-doing-to-prevent-this-from-happening-again\">What are we doing to prevent this from happening again?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Test: <\/strong>Better testing is on its way, with more integration and E2E tests being put in place. A test suite would have caught this early and prevented it from happening.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Process:<\/strong> We believe in an honest and transparent policy here. We are setting up a process, a checklist of sorts, of things to do to bring about this transparency with all of you.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"closing-notes\">Closing notes<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Managing incidents and triggering alerts is a major responsibility and one we take very seriously. The reality is that, during this outage, we did not update our status page immediately to reflect the critical outage. We should have. We are better than this. Our sincere apologies. 
Many of you are our seniors in this very industry, and we would like to keep this transparency and bring in better processes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Going forward, we will keep our <a href=\"https:\/\/status.spike.sh\/\" rel=\"noopener noreferrer nofollow\">status page<\/a> updated (<em>and automate it<\/em>). You can also learn about these incidents as they happen on <a href=\"https:\/\/twitter.com\/spikedhq\" rel=\"noopener noreferrer nofollow\">Twitter<\/a>, <a href=\"https:\/\/www.linkedin.com\/company\/spike-hq\" rel=\"noopener noreferrer nofollow\">LinkedIn<\/a>, and <a href=\"https:\/\/www.reddit.com\/r\/spikesh\/\" rel=\"noopener noreferrer nofollow\">Reddit<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We are sorry for the disruption this caused. We are actively making these improvements to improve stability so that this problem does not happen again.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Postmortem of Dashboard&#8217;s outage on 30th May 2023. 
Incidents, Alerts, Escalations, API, and Status Page were NOT impacted.<\/p>\n","protected":false},"author":191914268,"featured_media":739,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","_import_markdown_pro_load_document_selector":0,"_import_markdown_pro_submit_text_textarea":"","_lmt_disableupdate":"","_lmt_disable":"","_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_wpas_customize_per_network":false,"jetpack_post_was_ever_published":false},"categories":[1444],"tags":[1391],"class_list":["post-329","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-postmortem","tag-postmortem"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Postmortem of Our Dashboard&#039;s Outage<\/title>\n<meta name=\"description\" content=\"Dashboard outage postmortem: Learn how we identified, resolved, and prevented a 195-minute dashboard outage at Spike for better reliability.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Postmortem of Our Dashboard&#039;s Outage\" \/>\n<meta property=\"og:description\" content=\"Dashboard outage postmortem: Learn 
how we identified, resolved, and prevented a 195-minute dashboard outage at Spike for better reliability.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/\" \/>\n<meta property=\"og:site_name\" content=\"Spike&#039;s blog\" \/>\n<meta property=\"article:published_time\" content=\"2023-05-31T13:19:36+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-06T06:51:28+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.spike.sh\/wp-content\/uploads\/2023\/05\/postmorem-31st-may.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2076\" \/>\n\t<meta property=\"og:image:height\" content=\"1201\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Kaushik\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kaushik\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/\"},\"author\":{\"name\":\"Kaushik\",\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/#\\\/schema\\\/person\\\/b137e57ace218547f02b86fdcb2d0e64\"},\"headline\":\"Postmortem of Our Dashboard&#8217;s 
Outage\",\"datePublished\":\"2023-05-31T13:19:36+00:00\",\"dateModified\":\"2025-06-06T06:51:28+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/\"},\"wordCount\":576,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/blog.spike.sh\\\/wp-content\\\/uploads\\\/2023\\\/05\\\/postmorem-31st-may.png\",\"keywords\":[\"postmortem\"],\"articleSection\":[\"Postmortem\"],\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/\",\"url\":\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/\",\"name\":\"Postmortem of Our Dashboard's Outage\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/blog.spike.sh\\\/wp-content\\\/uploads\\\/2023\\\/05\\\/postmorem-31st-may.png\",\"datePublished\":\"2023-05-31T13:19:36+00:00\",\"dateModified\":\"2025-06-06T06:51:28+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/#\\\/schema\\\/person\\\/b137e57ace218547f02b86fdcb2d0e64\"},\"description\":\"Dashboard outage postmortem: Learn how we identified, resolved, and prevented a 195-minute dashboard outage at Spike for better 
reliability.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/#primaryimage\",\"url\":\"https:\\\/\\\/blog.spike.sh\\\/wp-content\\\/uploads\\\/2023\\\/05\\\/postmorem-31st-may.png\",\"contentUrl\":\"https:\\\/\\\/blog.spike.sh\\\/wp-content\\\/uploads\\\/2023\\\/05\\\/postmorem-31st-may.png\",\"width\":2076,\"height\":1201},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/postmortem-of-our-dashboards-outage\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/blog.spike.sh\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Postmortem of Our Dashboard&#8217;s Outage\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/#website\",\"url\":\"https:\\\/\\\/blog.spike.sh\\\/\",\"name\":\"Spike&#039;s blog\",\"description\":\"Learnings and opinions in a changing 
world\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/blog.spike.sh\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/blog.spike.sh\\\/#\\\/schema\\\/person\\\/b137e57ace218547f02b86fdcb2d0e64\",\"name\":\"Kaushik\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/c7ec6b633161978fc09ed325cefde9061797a65a730e4b98c0eb26bc6925bc81?s=96&d=robohash&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/c7ec6b633161978fc09ed325cefde9061797a65a730e4b98c0eb26bc6925bc81?s=96&d=robohash&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/c7ec6b633161978fc09ed325cefde9061797a65a730e4b98c0eb26bc6925bc81?s=96&d=robohash&r=g\",\"caption\":\"Kaushik\"},\"description\":\"Founder of Spike. I like sharing how we are building Spike and the intricacies of building a startup by waking people up for critical incidents.\",\"url\":\"https:\\\/\\\/blog.spike.sh\\\/author\\\/spikehq\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Postmortem of Our Dashboard's Outage","description":"Dashboard outage postmortem: Learn how we identified, resolved, and prevented a 195-minute dashboard outage at Spike for better reliability.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/","og_locale":"en_GB","og_type":"article","og_title":"Postmortem of Our Dashboard's Outage","og_description":"Dashboard outage postmortem: Learn how we identified, resolved, and prevented a 195-minute dashboard outage at Spike for better reliability.","og_url":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/","og_site_name":"Spike&#039;s blog","article_published_time":"2023-05-31T13:19:36+00:00","article_modified_time":"2025-06-06T06:51:28+00:00","og_image":[{"width":2076,"height":1201,"url":"https:\/\/blog.spike.sh\/wp-content\/uploads\/2023\/05\/postmorem-31st-may.png","type":"image\/png"}],"author":"Kaushik","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Kaushik","Estimated reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#article","isPartOf":{"@id":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/"},"author":{"name":"Kaushik","@id":"https:\/\/blog.spike.sh\/#\/schema\/person\/b137e57ace218547f02b86fdcb2d0e64"},"headline":"Postmortem of Our Dashboard&#8217;s 
Outage","datePublished":"2023-05-31T13:19:36+00:00","dateModified":"2025-06-06T06:51:28+00:00","mainEntityOfPage":{"@id":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/"},"wordCount":576,"commentCount":0,"image":{"@id":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.spike.sh\/wp-content\/uploads\/2023\/05\/postmorem-31st-may.png","keywords":["postmortem"],"articleSection":["Postmortem"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/","url":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/","name":"Postmortem of Our Dashboard's Outage","isPartOf":{"@id":"https:\/\/blog.spike.sh\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#primaryimage"},"image":{"@id":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.spike.sh\/wp-content\/uploads\/2023\/05\/postmorem-31st-may.png","datePublished":"2023-05-31T13:19:36+00:00","dateModified":"2025-06-06T06:51:28+00:00","author":{"@id":"https:\/\/blog.spike.sh\/#\/schema\/person\/b137e57ace218547f02b86fdcb2d0e64"},"description":"Dashboard outage postmortem: Learn how we identified, resolved, and prevented a 195-minute dashboard outage at Spike for better 
reliability.","breadcrumb":{"@id":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#primaryimage","url":"https:\/\/blog.spike.sh\/wp-content\/uploads\/2023\/05\/postmorem-31st-may.png","contentUrl":"https:\/\/blog.spike.sh\/wp-content\/uploads\/2023\/05\/postmorem-31st-may.png","width":2076,"height":1201},{"@type":"BreadcrumbList","@id":"https:\/\/blog.spike.sh\/postmortem-of-our-dashboards-outage\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.spike.sh\/"},{"@type":"ListItem","position":2,"name":"Postmortem of Our Dashboard&#8217;s Outage"}]},{"@type":"WebSite","@id":"https:\/\/blog.spike.sh\/#website","url":"https:\/\/blog.spike.sh\/","name":"Spike&#039;s blog","description":"Learnings and opinions in a changing world","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.spike.sh\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/blog.spike.sh\/#\/schema\/person\/b137e57ace218547f02b86fdcb2d0e64","name":"Kaushik","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/secure.gravatar.com\/avatar\/c7ec6b633161978fc09ed325cefde9061797a65a730e4b98c0eb26bc6925bc81?s=96&d=robohash&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/c7ec6b633161978fc09ed325cefde9061797a65a730e4b98c0eb26bc6925bc81?s=96&d=robohash&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c7ec6b633161978fc09ed325cefde9061797a65a730e4b98c0eb26bc6925bc81?s=96&d=robohash&r=g","caption":"Kaushik"},"description":"Founder of Spike. 
I like sharing how we are building Spike and the intricacies of building a startup by waking people up for critical incidents.","url":"https:\/\/blog.spike.sh\/author\/spikehq\/"}]}},"modified_by":"Sreekar","jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/blog.spike.sh\/wp-content\/uploads\/2023\/05\/postmorem-31st-may.png","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pfMe4Q-5j","jetpack-related-posts":[{"id":4457,"url":"https:\/\/blog.spike.sh\/incident-postmortem\/","url_meta":{"origin":329,"position":0},"title":"Incident Postmortem: How to Learn From Failures and Build Reliable Systems","author":"Samyati Mohanty","date":"27th November, 2025","format":false,"excerpt":"Incident postmortems help teams learn from outages without blame. This guide explains what they are, how to run them well, and how they strengthen reliability and continuous improvement.","rel":"","context":"In &quot;Uncategorized&quot;","block_context":{"text":"Uncategorized","link":"https:\/\/blog.spike.sh\/category\/uncategorised\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/11\/Getting-started-with-Incident-Management-1.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/11\/Getting-started-with-Incident-Management-1.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/11\/Getting-started-with-Incident-Management-1.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/11\/Getting-started-with-Incident-Management-1.png?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/11\/Getting-started-with-Incident-Management-1.png?resize=1050%2C600&ssl=1 3x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/11\/Getting-started-with-Incident-Management-1.png?resize=1400%2C800&ssl=1 
4x"},"classes":[]},{"id":4354,"url":"https:\/\/blog.spike.sh\/how-to-run-blameless-postmortem\/","url_meta":{"origin":329,"position":1},"title":"How to Conduct a Blameless Postmortem","author":"Randhir Kumar","date":"20th November, 2025","format":false,"excerpt":"Incidents happen. A blameless postmortem is how your team learns from them without finger-pointing. This blog explains how to run an effective postmortem and build a resilient engineering culture.","rel":"","context":"In &quot;Post Incident&quot;","block_context":{"text":"Post Incident","link":"https:\/\/blog.spike.sh\/category\/incident-management\/post-incident\/"},"img":{"alt_text":"Blog cover titled \"How to Conduct a Blameless Postmortem\"","src":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/11\/The-Top-10-On-Call-Management-Tools-for-DevOps.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/11\/The-Top-10-On-Call-Management-Tools-for-DevOps.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/11\/The-Top-10-On-Call-Management-Tools-for-DevOps.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/11\/The-Top-10-On-Call-Management-Tools-for-DevOps.png?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":4515,"url":"https:\/\/blog.spike.sh\/postmortem-on-call-system-discrepancy\/","url_meta":{"origin":329,"position":2},"title":"Postmortem of On-Call System Discrepancy","author":"Damanpreet","date":"4th December, 2025","format":false,"excerpt":"On December 4th, 2025, we identified a critical discrepancy between displayed on-call schedules and actual alert routing for weekly rotations. 
This postmortem details how a recent bug fix exposed months of underlying system misalignment, our investigation process, and key lessons.","rel":"","context":"In &quot;Postmortem&quot;","block_context":{"text":"Postmortem","link":"https:\/\/blog.spike.sh\/category\/postmortem\/"},"img":{"alt_text":"blog cover postmortem of on-call system discrepancy","src":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/12\/Postmoretem.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/12\/Postmoretem.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/12\/Postmoretem.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/12\/Postmoretem.png?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":351,"url":"https:\/\/blog.spike.sh\/postmortem-incident-grouping\/","url_meta":{"origin":329,"position":3},"title":"Postmortem on Incorrect Incident Grouping","author":"Kaushik","date":"21st March, 2024","format":false,"excerpt":"SummaryLeadupImpactDetectionResponsePutting Spike to good useRecoveryLessonsClosing note On March 14th, we encountered an incident involving incorrect grouping of different incidents. Our postmortem has some extensive details for all our users. 
At Spike, we are committed to alerting you when things go awry so it\u2019s only fair we keep it absolutely transparent\u2026","rel":"","context":"In &quot;Postmortem&quot;","block_context":{"text":"Postmortem","link":"https:\/\/blog.spike.sh\/category\/postmortem\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2024\/03\/Postmortem-incident-groupring-v2-4-.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2024\/03\/Postmortem-incident-groupring-v2-4-.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2024\/03\/Postmortem-incident-groupring-v2-4-.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2024\/03\/Postmortem-incident-groupring-v2-4-.png?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2024\/03\/Postmortem-incident-groupring-v2-4-.png?resize=1050%2C600&ssl=1 3x"},"classes":[]},{"id":2074,"url":"https:\/\/blog.spike.sh\/postmortem-of-escalations-triggering-out-of-order\/","url_meta":{"origin":329,"position":4},"title":"Postmortem of escalations triggering out of order","author":"Kaushik","date":"30th June, 2025","format":false,"excerpt":"On 27th June, 2025, we identified an incident in our escalation engine where steps fired out of sequence. Escalation policies containing more than four escalation steps\u2014and at least 2 alerts configured in Step 3 onwards \u2014occasionally triggered step 4 (and subsequent steps) earlier than their defined intervals. 
Although every notification\u2026","rel":"","context":"In &quot;Building Spike&quot;","block_context":{"text":"Building Spike","link":"https:\/\/blog.spike.sh\/category\/building-spike\/"},"img":{"alt_text":"postmorte-escalations triggering out of order","src":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/06\/postmorte-escalations-triggering-out-of-order.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/06\/postmorte-escalations-triggering-out-of-order.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/06\/postmorte-escalations-triggering-out-of-order.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/06\/postmorte-escalations-triggering-out-of-order.png?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":4561,"url":"https:\/\/blog.spike.sh\/postmortem-on-datadog-incidents-not-autoresolving\/","url_meta":{"origin":329,"position":5},"title":"Postmortem on Datadog incidents not auto-resolving","author":"Damanpreet","date":"18th December, 2025","format":false,"excerpt":"On December 17th, 2025, we found that Datadog incidents weren't auto-resolving due to an issue in our Incident Grouping logic. We resolved the issue, and now Datadog incidents are auto-resolved as expected. 
This postmortem details the incident timeline, root cause analysis, and lessons learned.","rel":"","context":"In &quot;Postmortem&quot;","block_context":{"text":"Postmortem","link":"https:\/\/blog.spike.sh\/category\/postmortem\/"},"img":{"alt_text":"Blog cover titled \"Postmortem on Datadog incidents not auto-resolving\"","src":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/12\/background-47.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/12\/background-47.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/12\/background-47.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.spike.sh\/wp-content\/uploads\/2025\/12\/background-47.png?resize=700%2C400&ssl=1 2x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/blog.spike.sh\/wp-json\/wp\/v2\/posts\/329","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.spike.sh\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.spike.sh\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.spike.sh\/wp-json\/wp\/v2\/users\/191914268"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.spike.sh\/wp-json\/wp\/v2\/comments?post=329"}],"version-history":[{"count":3,"href":"https:\/\/blog.spike.sh\/wp-json\/wp\/v2\/posts\/329\/revisions"}],"predecessor-version":[{"id":1769,"href":"https:\/\/blog.spike.sh\/wp-json\/wp\/v2\/posts\/329\/revisions\/1769"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.spike.sh\/wp-json\/wp\/v2\/media\/739"}],"wp:attachment":[{"href":"https:\/\/blog.spike.sh\/wp-json\/wp\/v2\/media?parent=329"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.spike.sh\/wp-json\/wp\/v2\/categories?post=329"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.spike.sh\/wp-json\/wp\/v2\/tags?post=329"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","tem
plated":true}]}}