Fix sharded federation sender sometimes using 100% CPU.

We pull all destinations requiring catch-up from the DB in batches.
However, if every destination in a batch gets filtered out (because the
federation sender is sharded and this instance handles none of them),
then the `last_processed` destination doesn't get updated, and we keep
requesting the same batch repeatedly.
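The failure mode can be reproduced with a small in-memory model. This is a hypothetical sketch, not Synapse code: `fetch_batch` stands in for the store's paginated catch-up query, `should_handle` stands in for the shard filter, and `DESTINATIONS`/`BATCH_SIZE` are made-up data. In the old loop the `for` variable doubles as the pagination cursor, so a batch that the shard filter empties never moves it.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for the DB table: destinations needing catch-up, sorted.
DESTINATIONS = ["a.example", "b.example", "c.example", "d.example"]
BATCH_SIZE = 2


def fetch_batch(after: Optional[str]) -> List[str]:
    """Return up to BATCH_SIZE destinations sorting strictly after `after`."""
    remaining = [d for d in DESTINATIONS if after is None or d > after]
    return remaining[:BATCH_SIZE]


def old_loop(should_handle: Callable[[str], bool], max_queries: int) -> List[Optional[str]]:
    """Replay the pre-fix loop and return the cursor passed to each query.

    `max_queries` only exists so the demonstration terminates; the real loop
    had no such cap, which is how it ended up spinning.
    """
    last_processed: Optional[str] = None
    cursors: List[Optional[str]] = []
    for _ in range(max_queries):
        cursors.append(last_processed)
        batch = fetch_batch(last_processed)
        if not batch:
            break  # caught up with every destination
        batch = [d for d in batch if should_handle(d)]
        # Bug: the cursor only moves if at least one destination survives the
        # shard filter, because the loop variable *is* the cursor.
        for last_processed in batch:
            pass  # the real code woke the destination here
    return cursors
```

With a shard that handles only `c.example` and `d.example`, `old_loop(lambda d: d >= "c", max_queries=5)` returns `[None, None, None, None, None]`: the first batch is filtered out entirely, so every query restarts from the beginning and the shard never reaches its own destinations.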
Erik Johnston 2021-04-08 17:30:01 +01:00
parent 48d44ab142
commit 3a569fb200
2 changed files with 5 additions and 2 deletions

changelog.d/9770.bugfix

@@ -0,0 +1 @@
+Fix bug where sharded federation senders could get stuck repeatedly querying the DB in a loop, using lots of CPU.

@@ -734,16 +734,18 @@ class FederationSender(AbstractFederationSender):
                 self._catchup_after_startup_timer = None
                 break
 
+            last_processed = destinations_to_wake[-1]
+
             destinations_to_wake = [
                 d
                 for d in destinations_to_wake
                 if self._federation_shard_config.should_handle(self._instance_name, d)
             ]
 
-            for last_processed in destinations_to_wake:
+            for destination in destinations_to_wake:
                 logger.info(
                     "Destination %s has outstanding catch-up, waking up.",
                     last_processed,
                 )
-                self.wake_destination(last_processed)
+                self.wake_destination(destination)
                 await self.clock.sleep(CATCH_UP_STARTUP_INTERVAL_SEC)
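A minimal, self-contained sketch of the fixed loop follows. It assumes a hypothetical in-memory `in_memory_fetch_batch` in place of the store's catch-up query, a `should_handle` predicate in place of `self._federation_shard_config.should_handle`, and a `wake` callback in place of `wake_destination`; none of these names are Synapse APIs. The essential change is that the cursor is taken from the unfiltered batch before the shard filter runs. Unlike the hunk, whose log line still passes `last_processed`, the sketch logs the destination actually being woken.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Optional

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the DB table: destinations needing catch-up, sorted.
DESTINATIONS = ["a.example", "b.example", "c.example", "d.example"]


async def in_memory_fetch_batch(after: Optional[str], batch_size: int = 2) -> List[str]:
    """Return up to `batch_size` destinations sorting strictly after `after`."""
    remaining = [d for d in DESTINATIONS if after is None or d > after]
    return remaining[:batch_size]


async def wake_destinations_needing_catchup(
    fetch_batch: Callable[[Optional[str]], Awaitable[List[str]]],
    should_handle: Callable[[str], bool],
    wake: Callable[[str], None],
    interval_sec: float = 0.0,
) -> None:
    last_processed: Optional[str] = None
    while True:
        destinations_to_wake = await fetch_batch(last_processed)
        if not destinations_to_wake:
            break  # no destinations after the cursor: the scan is finished

        # Advance the cursor from the *unfiltered* batch, so a batch this
        # shard ignores entirely still moves pagination forward.
        last_processed = destinations_to_wake[-1]

        destinations_to_wake = [d for d in destinations_to_wake if should_handle(d)]

        for destination in destinations_to_wake:
            logger.info("Destination %s has outstanding catch-up, waking up.", destination)
            wake(destination)
            # Spread out the load between wake-ups (the real code sleeps for
            # CATCH_UP_STARTUP_INTERVAL_SEC here).
            await asyncio.sleep(interval_sec)


if __name__ == "__main__":
    woken: List[str] = []
    asyncio.run(
        wake_destinations_needing_catchup(in_memory_fetch_batch, lambda d: d >= "c", woken.append)
    )
    print(woken)  # ['c.example', 'd.example']
```

With the same shard filter that stalls the old loop, this version skips past the first batch, wakes `c.example` and `d.example`, and exits once the query returns an empty batch. A shard that handles nothing also terminates instead of spinning.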