Splash Is Experiencing Intermittent Site-Wide Downtime
Incident Report for Splash
Postmortem
Issue summary:

The platform experienced intermittent errors across multiple features, caused by an elevated volume of incoming API calls from a third-party automated process.

Issue timeframe:

February 6, 2024, 03:18 AM EST to 04:58 AM EST (1 hour 40 mins)

Sequence of events:
  • February 6, 03:18 AM EST First internal alert received; investigation started.
  • February 6, 03:40 AM EST System impact identified, and root cause traced to an abnormally elevated number of incoming API calls.
  • February 6, 04:10 AM EST Source IP addresses of the requests identified.
  • February 6, 04:22 AM EST IP addresses blocked, and system began to return to stability.
  • February 6, 04:58 AM EST Functionality fully restored.

Root cause:

An automated process in an external application sent an excessive number of API calls to Splash. The volume exceeded our native limits and caused intermittent issues in several parts of the platform.

After we identified the IP addresses making these calls, we blocked them, and all platform functionality was restored.
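
For illustration only, the step of identifying the source IP addresses can be sketched as a per-IP request count. This is a minimal sketch, not a description of Splash's tooling: the function name top_callers, the input shape (a plain list of client IPs taken from request logs), and the threshold are all assumptions made for the example.

```python
from collections import Counter


def top_callers(request_ips, threshold):
    """Return (ip, count) pairs whose request count exceeds threshold.

    request_ips is a hypothetical list of client IPs, one entry per request.
    Results are ordered from the busiest caller down.
    """
    counts = Counter(request_ips)
    return [(ip, n) for ip, n in counts.most_common() if n > threshold]


# Example with documentation-range addresses (RFC 5737), not real traffic.
sample = ["203.0.113.7"] * 5 + ["198.51.100.2"] * 2 + ["203.0.113.9"]
suspects = top_callers(sample, threshold=3)
```

In this example only 203.0.113.7 exceeds the threshold, so it is the only address a reviewer would consider blocking.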

Steps to prevent recurrence:
  • Splash has contacted the third party generating the API calls to ensure the process is corrected going forward.
  • Splash will enhance our IP address-level API rate limiting.
  • Splash will implement additional quality assurance and testing controls around rate limiting.
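
As a rough illustration of what IP address-level rate limiting can look like, the sketch below uses a token bucket per client IP. It is an assumption-laden example, not Splash's implementation: the class name IpRateLimiter, the rate and burst parameters, and the in-memory storage are all hypothetical. The injectable clock is there so the limiter can be exercised in tests, in the spirit of the testing controls listed above.

```python
import time


class IpRateLimiter:
    """Token-bucket limiter keyed by client IP (illustrative sketch only)."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate      # tokens refilled per second (assumed value)
        self.burst = burst    # maximum tokens an IP can accumulate
        self.clock = clock    # injectable so tests can control time
        self._buckets = {}    # ip -> (tokens_remaining, last_update_time)

    def allow(self, ip):
        """Return True if this request from ip is within its limit."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[ip] = (tokens - 1, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False


# Usage with a fake clock: a burst of 2 requests, refilling 1 per second.
fake_now = [0.0]
limiter = IpRateLimiter(rate=1, burst=2, clock=lambda: fake_now[0])
results = [limiter.allow("203.0.113.7") for _ in range(3)]
```

Here the third request from the same IP within the same instant is rejected, while a different IP would still be served. A production limiter would also need shared storage across servers and eviction of idle entries, which this sketch leaves out.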
Posted Feb 07, 2024 - 19:14 UTC

Resolved
This incident has been resolved.
Posted Feb 06, 2024 - 09:54 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Feb 06, 2024 - 09:42 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Feb 06, 2024 - 09:09 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Feb 06, 2024 - 08:48 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Feb 06, 2024 - 08:47 UTC
Investigating
Splash is presently experiencing a site-wide issue and is intermittently unavailable. We are investigating the underlying cause of this issue.
Posted Feb 06, 2024 - 08:41 UTC
This incident affected: Logged In Experience (Event Page Design (CMS), Guest List Management (RSVP & Ticketed), Event Creation, Team Management, Analytics & Reporting, Email Design, User Login, Event Settings) and Splash API.