Commit graph

216 commits

Author SHA1 Message Date
Eric Eastwood
944d1e0cc8 Merge branch 'develop' into madlittlemods/17368-bust-_membership_stream_cache 2024-11-12 14:30:55 -06:00
Erik Johnston
c7a1d0aa1a
Fix Twisted tests with latest release (#17911)
c.f. #17906 and #17907
2024-11-07 16:22:09 +00:00
Devon Hudson
eda735e4bb
Remove support for python 3.8 (#17908)
### Pull Request Checklist

<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->

* [X] Pull request is based on the develop branch
* [X] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [X] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))

---------

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2024-11-06 19:36:01 +00:00
Alexander Udovichenko
211c31dbd7
Fix WheelTimer implementation that can expired timeout early (#17850)
When entries insert in the end of timer queue, then unnecessary entry
inserted (with duplicated key).
This can lead to some timeouts expired early and consume memory.
2024-11-05 12:08:17 -06:00
Erik Johnston
83513b75f7
Speed up sliding sync by computing extensions in parallel (#17884)
The main change here is to add a helper function
`gather_optional_coroutines`, which works in a similar way as
`yieldable_gather_results` but takes a set of coroutines rather than a
function
2024-10-30 10:51:04 +00:00
Eric Eastwood
64eb71f574 Merge branch 'develop' into madlittlemods/17368-bust-_membership_stream_cache 2024-10-09 17:26:38 -05:00
Erik Johnston
81e0f57800
Fix perf when streams don't change often (#17767)
There is a bug with the `StreamChangeCache` where it would incorrectly
return that all entities had changed if asked for entities changed
*since* the earliest stream position.

Note that for streams we use the inequalities: `$min_stream_id <
stream_id <= $max_stream_id`, i.e. when we ask the stream change cache
for all things that have changed since `$stream_id` we don't care for
events that happened *at* `$stream_id`.

Specifically: `_earliest_known_stream_pos` is the position at which we
know that we'll have entries for all changes since that point, we can
use the cache for any stream IDs that equal
`_earliest_known_stream_pos`.

`_earliest_known_stream_pos` is set in three places:
- On startup we set it either to:
  - the current maximum stream ID, with not prefilled values; or
  - the minimum of the latest N values we pulled from the DB
- When we evict items from the bottom, we set it to the stream ID of the
evicted items.

This was changed in https://github.com/matrix-org/synapse/pull/14435,
but I think we were overly conservative there.

---------

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2024-09-30 13:52:33 +01:00
Eric Eastwood
2bd0e634ac Bust _membership_stream_cache cache when current state changes
Fix https://github.com/element-hq/synapse/issues/17368
2024-09-18 17:26:52 -05:00
Erik Johnston
d225b6b3eb
Speed up SS room sorting (#17468)
We do this by bulk fetching the latest stream ordering.

---------

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2024-07-23 14:03:14 +01:00
dependabot[bot]
79924aebef
Bump mypy from 1.9.0 to 1.10.1 (#17445) 2024-07-16 12:08:06 +01:00
Erik Johnston
3e6ee8ff88
Add optimisation to StreamChangeCache (#17130)
When there have been lots of changes compared with the number of
entities, we can do a fast(er) path.

Locally I ran some benchmarking, and the comparison seems to give the
best determination of which method we use.
2024-05-06 12:56:52 +01:00
dependabot[bot]
1e68b56a62
Bump black from 23.10.1 to 24.2.0 (#16936) 2024-03-13 16:46:44 +00:00
Erik Johnston
7b4d7429f8
Don't invalidate the entire event cache when we purge history (#16905)
We do this by adding support to the LRU cache for "extra indices" based
on the cached value. This allows us to efficiently map from room ID to
the cached events and only invalidate those.
2024-02-13 13:24:11 +00:00
Erik Johnston
23740eaa3d
Correctly mention previous copyright (#16820)
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.
2024-01-23 11:26:48 +00:00
Patrick Cloke
8e1e62c9e0 Update license headers 2023-11-21 15:29:58 -05:00
Erik Johnston
1b238e8837
Speed up persisting large number of outliers (#16649)
Recalculating the roots tuple every iteration could be very expensive, so instead let's do a topological sort.
2023-11-16 14:25:35 +00:00
Patrick Cloke
f2f2c7c1f0
Use full GitHub links instead of bare issue numbers. (#16637) 2023-11-15 08:02:11 -05:00
Patrick Cloke
77dfc1f939
Fix a bug where servers could be marked as up when they were failing (#16506)
After this change a server will only be reported as back online
if they were previously having requests fail.
2023-10-17 07:32:40 -04:00
Patrick Cloke
7ec0a141b4
Convert more cached return values to immutable types (#16356) 2023-09-20 07:48:55 -04:00
Patrick Cloke
aa483cb4c9
Update ruff config (#16283)
Enable additional checks & clean-up unneeded configuration.
2023-09-08 11:24:36 -04:00
Mathieu Velten
501da8ecd8
Task scheduler: add replication notify for new task to launch ASAP (#16184) 2023-08-28 14:03:51 +00:00
David Robertson
e691243e19
Fix typechecking with twisted trunk (#16121) 2023-08-24 14:53:07 +00:00
DeepBlueV7.X
19a1cda084
Properly update retry_last_ts when hitting the maximum retry interval (#16156)
* Properly update retry_last_ts when hitting the maximum retry interval

This was broken in 1.87 when the maximum retry interval got changed from
almost infinite to a week (and made configurable).

fixes #16101

Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>

* Add changelog

* Change fix + add test

* Add comment

---------

Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
Co-authored-by: Mathieu Velten <mathieuv@matrix.org>
2023-08-23 09:35:23 +01:00
Mathieu Velten
358896e1b8
Implements a task scheduler for resumable potentially long running tasks (#15891) 2023-08-21 14:17:13 +02:00
Mathieu Velten
f0a860908b
Allow config of the backoff algorithm for the federation client. (#15754)
Adds three new configuration variables:

* destination_min_retry_interval is identical to before (10mn).
* destination_retry_multiplier is now 2 instead of 5, the maximum value will
  be reached slower.
* destination_max_retry_interval is one day instead of (essentially) infinity.

Capping this will cause destinations to continue to be retried sometimes instead
of being lost forever. The previous value was 2 ^ 62 milliseconds.
2023-08-03 14:36:55 -04:00
Patrick Cloke
ca5c4be921
Add type hints to test_descriptors. (#15659)
Require type hints in test_descriptors and add missing ones.
2023-05-24 14:18:52 +00:00
Patrick Cloke
42aea0d8af
Add final type hint to tests.unittest. (#15072)
Adds a return type to HomeServerTestCase.make_homeserver and deal
with any variables which are no longer Any.
2023-02-14 14:03:35 -05:00
Sean Quah
a302d3ecf7
Remove unnecessary reactor reference from _PerHostRatelimiter (#14842)
Fix up #14812 to avoid introducing a reference to the reactor.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-16 13:16:19 +00:00
Sean Quah
772e8c2385
Fix stack overflow in _PerHostRatelimiter due to synchronous requests (#14812)
When there are many synchronous requests waiting on a
`_PerHostRatelimiter`, each request will be started recursively just
after the previous request has completed. Under the right conditions,
this leads to stack exhaustion.

A common way for requests to become synchronous is when the remote
client disconnects early, because the homeserver is overloaded and slow
to respond.

Avoid stack exhaustion under these conditions by deferring subsequent
requests until the next reactor tick.

Fixes #14480.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-01-13 00:16:21 +00:00
Patrick Cloke
630d0aeaf6
Support RFC7636 PKCE in the OAuth 2.0 flow. (#14750)
PKCE can protect against certain attacks and is enabled by default. Support
can be controlled manually by setting the pkce_method of each oidc_providers
entry to 'auto' (default), 'always', or 'never'.

This is required by Twitter OAuth 2.0 support.
2023-01-04 14:58:08 -05:00
Patrick Cloke
da77720752
Check the stream position before checking if the cache is empty. (#14639)
An empty cache does not mean the entity has no changed, if
it is earlier than the earliest known stream position return that
the entity *has* changed since the cache cannot accurately
answer that query.
2022-12-08 11:35:49 -05:00
Erik Johnston
cee9445884
Better return type for get_all_entities_changed (#14604)
Help callers from using the return value incorrectly by ensuring
that callers explicitly check if there was a cache hit or not.
2022-12-05 15:19:14 -05:00
Patrick Cloke
6a8310f3df
Compare to the earliest known stream pos in the stream change cache. (#14435)
The internal methods of the StreamChangeCache were inconsistently
treating the earliest known stream position as valid. It is now treated as
invalid, meaning the cache cannot determine if an entity at the earliest
known stream position has changed or not.
2022-12-05 09:00:59 -05:00
Patrick Cloke
acea4d7a2f
Add missing types to tests.util. (#14597)
Removes files under tests.util from the ignored by list, then
fully types all tests/util/*.py files.
2022-12-02 17:58:56 +00:00
Patrick Cloke
4ae967cf63
Add missing type hints to test.util.caches (#14529) 2022-11-22 17:35:54 -05:00
Quentin Gliech
8756d5c87e
Save login tokens in database (#13844)
* Save login tokens in database

Signed-off-by: Quentin Gliech <quenting@element.io>

* Add upgrade notes

* Track login token reuse in a Prometheus metric

Signed-off-by: Quentin Gliech <quenting@element.io>
2022-10-26 11:45:41 +01:00
Nick Mills-Barrett
c9dffd5b33
Remove unused @lru_cache decorator (#13595)
* Remove unused `@lru_cache` decorator

Spotted this working on something else.

Co-authored-by: David Robertson <davidr@element.io>
2022-10-25 11:39:25 +01:00
dependabot[bot]
0b7830e457
Bump flake8-bugbear from 21.3.2 to 22.9.23 (#14042)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
2022-10-19 19:38:24 +00:00
David Robertson
6f0c3e669d
Don't require setuptools_rust at runtime (#13952) 2022-09-29 20:16:08 +00:00
Eric Eastwood
29269d9d3f
Fix have_seen_event cache not being invalidated (#13863)
Fix https://github.com/matrix-org/synapse/issues/13856
Fix https://github.com/matrix-org/synapse/issues/13865

> Discovered while trying to make Synapse fast enough for [this MSC2716 test for importing many batches](https://github.com/matrix-org/complement/pull/214#discussion_r741678240). As an example, disabling the `have_seen_event` cache saves 10 seconds for each `/messages` request in that MSC2716 Complement test because we're not making as many federation requests for `/state` (speeding up `have_seen_event` itself is related to https://github.com/matrix-org/synapse/issues/13625) 
> 
> But this will also make `/messages` faster in general so we can include it in the [faster `/messages` milestone](https://github.com/matrix-org/synapse/milestone/11).
> 
> *-- https://github.com/matrix-org/synapse/issues/13856*


### The problem

`_invalidate_caches_for_event` doesn't run in monolith mode which means we never even tried to clear the `have_seen_event` and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT).

Additionally there was bug with the key being wrong so `_invalidate_caches_for_event` never invalidates the `have_seen_event` cache even when it does run.

Because we were using the `@cachedList` wrong, it was putting items in the cache under keys like `((room_id, event_id),)` with a `set` in a `set` (ex. `(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)`) and we we're trying to invalidate with just `(room_id, event_id)` which did nothing.
2022-09-27 15:55:43 -05:00
Erik Johnston
0b87eb8e0c
Make DictionaryCache have better expiry properties (#13292) 2022-07-21 17:13:44 +01:00
Quentin Gliech
fe1daad672
Move the "email unsubscribe" resource, refactor the macaroon generator & simplify the access token verification logic. (#12986)
This simplifies the access token verification logic by removing the `rights`
parameter which was only ever used for the unsubscribe link in email
notifications. The latter has been moved under the `/_synapse` namespace,
since it is not a standard API.

This also makes the email verification link more secure, by embedding the
app_id and pushkey in the macaroon and verifying it. This prevents the user
from tampering the query parameters of that unsubscribe link.

Macaroon generation is refactored:

- Centralised all macaroon generation and verification logic to the
  `MacaroonGenerator`
- Moved to `synapse.utils`
- Changed the constructor to require only a `Clock`, hostname, and a secret key
  (instead of a full `Homeserver`).
- Added tests for all methods.
2022-06-14 09:12:08 -04:00
Shay
cde8af9a49
Add config flags to allow for cache auto-tuning (#12701) 2022-05-13 12:32:39 -07:00
Erik Johnston
8dd3e0e084
Immediately retry any requests that have backed off when a server comes back online. (#12500)
Otherwise it can take up to a minute for any in-flight `/send` requests to be retried.
2022-05-10 10:39:54 +01:00
Sean Quah
a50fb411b3
Update delay_cancellation to accept any awaitable (#12468)
This will mainly be useful when dealing with module callbacks, which are
all typed as returning `Awaitable`s instead of coroutines or
`Deferred`s.

Signed-off-by: Sean Quah <seanq@element.io>
2022-04-22 18:20:06 +01:00
Sean Quah
79e7c2c426
Fix edge case where a Linearizer could get stuck (#12358)
Just after a task acquires a contended `Linearizer` lock, it sleeps.
If the task is cancelled during this sleep, we need to release the lock.

Signed-off-by: Sean Quah <seanq@element.io>
2022-04-05 17:19:16 +01:00
Sean Quah
9c4c49991d
Update docstrings for ReadWriteLock tests (#12354)
Signed-off-by: Sean Quah <seanq@element.io>
2022-04-05 16:54:40 +01:00
Sean Quah
800ba87cc8
Refactor and convert Linearizer to async (#12357)
Refactor and convert `Linearizer` to async. This makes a `Linearizer`
cancellation bug easier to fix.

Also refactor to use an async context manager, which eliminates an
unlikely footgun where code that doesn't immediately use the context
manager could forget to release the lock.

Signed-off-by: Sean Quah <seanq@element.io>
2022-04-05 15:43:52 +01:00
Sean Quah
41b5f72677
Convert Linearizer tests from inlineCallbacks to async (#12353)
Signed-off-by: Sean Quah <seanq@element.io>
2022-04-05 14:56:09 +01:00
David Robertson
bf9d549e3a
Try to detect borked package installations. (#12244)
* Try to detect borked package installations.

Fixes #12223.

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
2022-03-18 19:03:46 +00:00