There was a bug that meant we would return the full state of the room on
incremental syncs when using lazy loaded members and there were no
entries in the timeline.
This was due to trying to use `state_filter or state_filter.all()` as a
short hand for handling `None` case, however `state_filter` implements
`__bool__` so if the state filter was empty it would be set to full.
c.f. MSC4222 and #17888
Currently, one-time-keys are issued in a somewhat random order. (In
practice, they are issued according to the lexicographical order of
their key IDs.) That can lead to a situation where a client gives up
hope of a given OTK ever being used, whilst it is still on the server.
Related: https://github.com/element-hq/element-meta/issues/2356
When entries insert in the end of timer queue, then unnecessary entry
inserted (with duplicated key).
This can lead to some timeouts expired early and consume memory.
Basically, if the client sets a special query param on `/sync` v2
instead of responding with `state` at the *start* of the timeline, we
instead respond with `state_after` at the *end* of the timeline.
We do this by using the `current_state_delta_stream` table, which is
actually reliable, rather than messing around with "state at" points on
the timeline.
c.f. MSC4222
The main change here is to add a helper function
`gather_optional_coroutines`, which works in a similar way as
`yieldable_gather_results` but takes a set of coroutines rather than a
function
Fixes#17823
While we're at it, makes a change where the redactions are sent as the
admin if the user is not a member of the server (otherwise these fail
with a "User must be our own" message).
Reset `sliding_sync_membership_snapshots` -> `forgotten` status when
membership changes (like rejoining a room).
Fix https://github.com/element-hq/synapse/issues/17781
### What was the problem before?
Previously, if someone used `/forget` on one of their rooms, it would
update `sliding_sync_membership_snapshots` as expected but when someone
rejoined the room (or had any membership change), the upsert didn't
overwrite and reset the `forgotten` status so it remained `forgotten`
and invisible down the Sliding Sync endpoint.
Spawning from @kegsay [pointing
out](https://matrix.to/#/!cnVVNLKqgUzNTOFQkz:matrix.org/$ExOO7J8uPUQSyH-9Uxc_QCa8jlXX9uK4VRtkSC0EI3o?via=element.io&via=matrix.org&via=jki.re)
that the Sliding Sync endpoint doesn't handle a large room with a lot of
state well on initial sync (requesting all state via `required_state: [
["*","*"] ]`) (it just takes forever).
After investigating further, the slow part is just
`get_events_as_list(...)` fetching all of the current state ID's out for
the room (which can be 100k+ events for rooms with a lot of membership).
This is just a slow thing in Synapse in general and the same thing
happens in Sync v2 or the `/state` endpoint.
---
The only idea I had to improve things was to use `batch_iter` to only
try fetching a fixed amount at a time instead of working with large
maps, lists, and sets. This doesn't seem to have much effect though.
There is already a `batch_iter(event_ids, 200)` in
`_fetch_event_rows(...)` for when we actually have to touch the database
and that's inside a queue to deduplicate work.
I did notice one slight optimization to use `get_events_as_list(...)`
directly instead of `get_events(...)`. `get_events(...)` just turns the
result from `get_events_as_list(...)` into a dict and since we're just
iterating over the events, we don't need the dict/map.
Fixes https://github.com/element-hq/synapse/issues/17698
This handles `required_state` changes by checking if new state has been
added to the config, and if so fetching and returning that from the
current state.
This also takes care to ensure that given a state entry S that is added,
removed and then re-added that we do *not* send S down a second time if
there have been no changes to S in the current state. This is fine for
Rust SDK (as it just remembers all state), but we might decide not to do
this behaviour in the MSC. If we decide to always send down S then its
easy enough to rip out all the code.
---------
Co-authored-by: Eric Eastwood <eric.eastwood@beta.gouv.fr>
- better validation on user input
- fix an early task completion
- when checking membership in rooms, check for rooms user has been
banned from as well
Two changes: a) use a batch lookup function instead of a loop, b) check
existing data to see if we already have what we need and only fetch what
we don't.
Adds the option to load the Redis password from a file, instead of
giving it in the config directly. The code is similar to how it’s done
for `registration_shared_secret_path`. I changed the example in the
documentation to represent the best practice regarding the handling of
secrets.
Reading secrets from files has the security advantage of separating the
secrets from the config. It also simplifies secrets management in
Kubernetes.
There is a bug with the `StreamChangeCache` where it would incorrectly
return that all entities had changed if asked for entities changed
*since* the earliest stream position.
Note that for streams we use the inequalities: `$min_stream_id <
stream_id <= $max_stream_id`, i.e. when we ask the stream change cache
for all things that have changed since `$stream_id` we don't care for
events that happened *at* `$stream_id`.
Specifically: `_earliest_known_stream_pos` is the position at which we
know that we'll have entries for all changes since that point, we can
use the cache for any stream IDs that equal
`_earliest_known_stream_pos`.
`_earliest_known_stream_pos` is set in three places:
- On startup we set it either to:
- the current maximum stream ID, with not prefilled values; or
- the minimum of the latest N values we pulled from the DB
- When we evict items from the bottom, we set it to the stream ID of the
evicted items.
This was changed in https://github.com/matrix-org/synapse/pull/14435,
but I think we were overly conservative there.
---------
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Based on #17765.
Basically the idea is to reduce the overhead of calling
`ObservableDeferred` in a loop. The two gains are: a) just using a list
of deferreds rather than the machinery of `ObservableDeferred`, and b)
only calling `PreseverLoggingContext` once.
`PreseverLoggingContext` in particular is expensive to call a lot as
each time it needs to call `get_thread_resource_usage` twice, so that it
an update the CPU metrics of the log context.
The notifier is quite inefficient when it has to wake up many user
streams all at once
From a silly benchmark this takes the time to notify 1M user streams
from ~30s to ~5s
This works as instead of passing *all* rooms to `record_sent_rooms` we
only need to pass rooms that were previously not in the LIVE state.
This came from a py-spy where we were spending ~10% CPU calling these
functions. Note that `record_sent_rooms` is a no-op for rooms that are
already in the `LIVE` state, so we only need to call them for
`PREVIOUSLY` or `INITIAL` rooms.