Cloud Incident: Runs on Old Data (March 14th, 6:50 PM PT - 7:10 PM PT)

Hi Everyone,

Summary (TL;DR)

On March 14th, 2024, between 6:50 PM and 7:10 PM PT (Pacific Time), our polling triggers fetched the wrong previous state, which caused them to detect old items as new ones. You may have noticed that you received runs on old data. We apologize for all the spam that occurred.

All tasks associated with these runs have been removed from the task usage.

UPDATE on Impact:

  1. Most of the runs were marked as internal errors. If you see these errors during this timeframe, it means the runs were not executed. You can safely ignore them, as we halted the runs before execution.

  2. A small number of synchronous webhooks were unable to execute on time because the queue was too full.

Issue timeline

All timestamps referenced are in PT (Pacific Time):

  • A new release was deployed at 6:50 PM.
  • Our team was alerted to a sudden increase in runs at 7:00 PM.
  • We rolled back to the previous release.
  • We confirmed it was not a malicious attack.
  • Everything was back to normal at 7:10 PM.
  • The servers caught up with processing at 7:30 PM.
  • We found that a recent feature adding run-scope support to the storage service changed the default storage scope, which caused polling triggers to read the wrong previous state.
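
For readers curious how a change in the default storage scope can make old items look new, here is a minimal sketch. It is not our actual implementation: the `ScopedStore` class, the `poll` function, and the scope names `"flow"` and `"run"` are all hypothetical and exist only to illustrate the idea.

```python
from typing import Dict, List, Tuple


class ScopedStore:
    """Hypothetical key-value store where every key lives under a scope."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], List[str]] = {}

    def put(self, scope: str, key: str, value: List[str]) -> None:
        self._data[(scope, key)] = list(value)

    def get(self, scope: str, key: str) -> List[str]:
        # A missing key looks like "nothing seen yet".
        return list(self._data.get((scope, key), []))


def poll(store: ScopedStore, scope: str, fetched_ids: List[str]) -> List[str]:
    """Return the ids not present in the previous state, then save the new state."""
    seen = set(store.get(scope, "seen_ids"))
    new_items = [item for item in fetched_ids if item not in seen]
    store.put(scope, "seen_ids", fetched_ids)
    return new_items


store = ScopedStore()
items = ["a", "b", "c"]

# Previous state was saved under the (assumed) original default scope.
store.put("flow", "seen_ids", items)

# Same scope: the previous state is found, so nothing is reported as new.
same_scope = poll(store, "flow", items)

# Different default scope: the previous state is not found,
# so every old item is reported as new.
changed_scope = poll(store, "run", items)

print(same_scope)     # []
print(changed_scope)  # ['a', 'b', 'c']
```

In this sketch the data itself is never lost. The trigger simply looks for its previous state in a place where nothing was ever saved, so everything appears new.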

Lessons

  • The end-to-end tests and the alerts detected the spike, but they were not enough to prevent it.
  • We now better understand the weakness of the storage service and how a single mistake in it can cause a large incident. The only way to detect these issues is to have an environment similar to production and run a couple of end-to-end tests on it for a while to ensure everything is correct before moving to production.
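
As an illustration of the kind of end-to-end check meant here, the sketch below polls unchanged data several times and fails if any poll after the first reports items. The names `check_no_replay` and `make_trigger`, and the scope names, are hypothetical and only model the failure described above. This is one possible shape for such a test, not a description of our test suite.

```python
from typing import Callable, Dict, List


def check_no_replay(poll_once: Callable[[], List[str]], rounds: int = 3) -> bool:
    """Return True if repeated polls of unchanged data emit nothing after a warm-up poll."""
    poll_once()  # warm-up: the first poll is allowed to see everything
    for _ in range(rounds):
        if poll_once():
            return False  # old items were reported as new again
    return True


def make_trigger(read_scope: str, write_scope: str, source: List[str]) -> Callable[[], List[str]]:
    """Build a toy polling trigger that reads and writes its state under the given scopes."""
    state: Dict[str, List[str]] = {}

    def poll_once() -> List[str]:
        seen = set(state.get(read_scope, []))
        new_items = [item for item in source if item not in seen]
        state[write_scope] = list(source)
        return new_items

    return poll_once


# Healthy trigger: reads and writes under the same scope.
healthy_ok = check_no_replay(make_trigger("flow", "flow", ["a", "b"]))

# Mismatched trigger: reads from a scope that is never written to.
broken_ok = check_no_replay(make_trigger("run", "flow", ["a", "b"]))

print(healthy_ok)  # True
print(broken_ok)   # False
```

Running a check like this against a production-like environment for a while before release is the kind of safeguard the point above describes.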

Conclusion

We are aware of multiple incidents that happened this month in Activepieces. We implemented safeguards in each area that experienced incidents and learned more about the weak spots in our software.

We can see that these new safeguards have already prevented the impact of many issues, but there is still new ground we need to cover.

Usually, we say “Until next time” in our announcements, but this time I will say, hopefully, there won’t be a next time.
