Disruption in Webhook FIFO Event Delivery

Executive Summary

Between 13:45 UTC and 17:00 UTC on July 13, 2024, our webhook delivery service experienced a significant disruption that specifically affected FIFO (First In, First Out) event processing. The disruption was traced to the database that stores webhook events, and the root cause was identified as a database configuration issue related to automated backups.

Events Timeline

After the incident was closed, the Engineering and Customer Success teams continued to review all sessions that were active during this period for any adverse effects.

Closed July 13 17:00 UTC
After further monitoring showed no additional errors in webhook FIFO event delivery, the incident was officially closed.

Update and Monitoring July 13 16:20 UTC
Our Engineering team updated the database configuration to resolve the issue and began monitoring webhook delivery.

Identified July 13 15:05 UTC
An incident was declared after the team confirmed that the automated backup configuration had affected the database, causing webhook delivery failures.

Investigating July 13 15:00 UTC
A network monitoring alert showed a drop in webhook delivery rates, specifically for FIFO events, prompting an investigation by our Engineering team.

Started July 13 13:45 UTC
The webhook delivery disruption began. An issue with the real-time monitoring and alerting system delayed detection by more than an hour.

Mitigation Actions

To prevent similar issues from recurring, we have taken or are taking the following actions:

  • Escalating the issue to our database service provider to address and resolve the root causes in the automated backup configuration.
  • Enhancing alerting and monitoring for our webhook processing system.