Luminesce instability

Major incident · Luminesce
2026-02-27 13:00 GMT · 4 hours

Updates

Retroactive

On Friday 27 February 2026, between approximately 13:00 and 17:00 GMT, clients experienced instability on the Luminesce platform.

The disruption was the result of two factors occurring at the same time.

First, the number of active services running on the Luminesce platform grew significantly over this period, primarily driven by expanded use of a particular feature by clients. As more services came online, the Luminesce platform’s internal coordination layer - which is responsible for registering and verifying that each service is available and ready - had to handle a much higher volume of background activity than it was previously sized for.

Second, routine background maintenance on the platform’s infrastructure caused certain components to restart more frequently than normal. Each restart required those components to re-register with the coordination layer, adding further to the load.

Under these combined pressures, the coordination layer periodically became overloaded and stopped processing background requests. When this happened, services appeared active but were unable to respond to client requests - resulting in the job failures and error messages that were observed.
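The dynamic described above can be sketched as a simple queueing model. This is illustrative only - the function, tick lengths, and numbers below are hypothetical and do not reflect Luminesce internals - but it shows how arrivals that persistently exceed processing capacity cause a backlog to build until the excess load is removed or drained.

```python
# Toy model (hypothetical, not the actual Luminesce coordination layer):
# the coordination layer can process a fixed number of registration
# requests per tick. When arrivals (new services plus restart-driven
# re-registrations) exceed that capacity, a backlog accumulates and
# requests begin to stall.

def simulate_backlog(ticks, capacity, new_services_per_tick, restarts_per_tick):
    """Return the registration backlog observed after each tick."""
    backlog = 0
    history = []
    for _ in range(ticks):
        backlog += new_services_per_tick + restarts_per_tick  # arrivals this tick
        backlog = max(0, backlog - capacity)                  # work drained this tick
        history.append(backlog)
    return history

# Normal load: capacity comfortably absorbs arrivals, so no backlog forms.
normal = simulate_backlog(ticks=10, capacity=100,
                          new_services_per_tick=60, restarts_per_tick=20)

# Incident load: service growth plus maintenance restarts exceed capacity,
# so the backlog grows every tick until the combined load subsides.
incident = simulate_backlog(ticks=10, capacity=100,
                            new_services_per_tick=90, restarts_per_tick=40)
```

Under the incident parameters, arrivals exceed capacity by 30 requests per tick, so the backlog grows linearly - mirroring how the coordination layer fell behind until the backlog was cleared.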

Once the backlog was cleared - either automatically or through engineering intervention - the coordination layer resumed normal operation and services recovered.

The longer-term resolution is an upgrade to a newer version of RabbitMQ, which handles higher load more efficiently and will significantly reduce the risk of recurrence. The upgrade is currently being tested in lower environments, and further details will be communicated in due course.

March 12, 2026 · 13:03 GMT
