Due to extreme load on our platform, several backend services slowed down, resulting in cascading failures.
The cause was an extremely high number of unique visitors using a project/platform that incorrectly opened multiple connections to our backend per client. This number of clients is not the kind of traffic that would ever hit our backends organically.
Normally our rate limiting would have kicked in, but the project owners created a large number of API credentials, used them to create many payloads, and distributed those payloads to many users, who then each connected multiple times. Public payload websocket connections weren’t rate limited.
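For illustration only, a limit on websocket connections per payload, of the kind that was missing here, could look roughly like the sketch below. This is not the production implementation: the class name, the parameters, and the choice of a token bucket keyed by a payload ID are all assumptions made for the example.

```python
import time


class ConnectionRateLimiter:
    """Hypothetical token-bucket limiter, one bucket per key (e.g. a payload ID)."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec   # tokens added per second
        self.burst = burst         # maximum tokens a bucket can hold
        self.clock = clock         # injectable clock, useful for testing
        self.buckets = {}          # key -> (tokens, last_refill_time)

    def allow(self, key):
        """Return True if a new connection for `key` may be accepted."""
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[key] = (tokens - 1, now)
            return True
        self.buckets[key] = (tokens, now)
        return False


# Example: at most 2 connections in a burst per payload, refilling 1 per second.
limiter = ConnectionRateLimiter(rate_per_sec=1.0, burst=2)
accepted = limiter.allow("payload-123")  # "payload-123" is a made-up key
```

Keying the bucket by payload rather than by API credential matters in a case like this one, because the load was spread across many credentials and many payloads. A real deployment would also need to expire idle buckets and share state across backend instances, which this sketch leaves out.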
The following measures have been taken (and would have prevented this outage):
After implementing these limits, we restarted the backend infrastructure to clear pending connections and queries.
Project owners who create a relatively high number of legitimate payloads may be rate limited on payload creation. If this is the case, please reach out to us so we can apply suitable payload creation limits.