As part of the database migration to support threaded receipts, there is
a possible window in between
`73/08thread_receipts_non_null.sql.postgres` removing the original
unique constraints on `receipts_linearized` and `receipts_graph` and the
`reeipts_linearized_unique_index` and `receipts_graph_unique_index`
background updates from `72/08thread_receipts.sql` completing where
the unique constraints on `receipts_linearized` and `receipts_graph` are
missing. Any emulated upserts on these tables must therefore be
performed with a lock held, otherwise duplicate rows can end up in the
tables when there are concurrent emulated upserts. Fix the missing lock.
Note that emulated upserts no longer happen by default on sqlite, since
the minimum supported version of sqlite supports native upserts by
default now.
Finally, clean up any duplicate receipts that may have crept in before
trying to create the `receipts_graph_unique_index` and
`receipts_linearized_unique_index` unique indexes.
Signed-off-by: Sean Quah <seanq@matrix.org>
PostgreSQL 14 changed the behavior of `websearch_to_tsquery` to
improve some behaviour.
The tests were hitting those edge-cases about handling of hanging double
quotes. This fixes the tests to take into account the PostgreSQL version.
Support a unified search query syntax which leverages more of the full-text
search of each database supported by Synapse.
Supports, with the same syntax across Postgresql 11+ and Sqlite:
- quoted "search terms"
- `AND`, `OR`, `-` (negation) operators
- Matching words based on their stem, e.g. searches for "dog" matches
documents containing "dogs".
This is achieved by
- If on postgresql 11+, pass the user input to `websearch_to_tsquery`
- If on sqlite, manually parse the query and transform it into the sqlite-specific
query syntax.
Note that postgresql 10, which is close to end-of-life, falls back to using
`phraseto_tsquery`, which only supports a subset of the features.
Multiple terms separated by a space are implicitly ANDed.
Note that:
1. There is no escaping of full-text syntax that might be supported by the database;
e.g. `NOT`, `NEAR`, `*` in sqlite. This runs the risk that people might discover this
as accidental functionality and depend on something we don't guarantee.
2. English text is assumed for stemming. To support other languages, either the target
language needs to be known at the time of indexing the message (via room metadata,
or otherwise), or a separate index for each language supported could be created.
Sqlite docs: https://www.sqlite.org/fts3.html#full_text_index_queries
Postgres docs: https://www.postgresql.org/docs/11/textsearch-controls.html
While https://github.com/matrix-org/synapse/pull/13635 stops us from doing the slow thing after we've already done it once, this PR stops us from doing one of the slow things in the first place.
Related to
- https://github.com/matrix-org/synapse/issues/13622
- https://github.com/matrix-org/synapse/pull/13635
- https://github.com/matrix-org/synapse/issues/13676
Part of https://github.com/matrix-org/synapse/issues/13356
Follow-up to https://github.com/matrix-org/synapse/pull/13815 which tracks event signature failures.
With this PR, we avoid the call to the costly `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before and we filter events based on the backoff before calling `_get_state_ids_after_missing_prev_event` now.
For example, this will save us 156s out of the 185s total that this `matrix.org` `/messages` request. If you want to see the full Jaeger trace of this, you can drag and drop this `trace.json` into your own Jaeger, https://gist.github.com/MadLittleMods/4b12d0d0afe88c2f65ffcc907306b761
To explain this exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid.
- `backfill`
- `outgoing-federation-request` `/backfill`
- `_check_sigs_and_hash_and_fetch`
- `_check_sigs_and_hash_and_fetch_one` for each event received over backfill
- ❗ `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
- ❗ `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
- `_process_pulled_events`
- `_process_pulled_event` for each validated event
- ❗ Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it
- `_get_state_ids_after_missing_prev_event`
- `outgoing-federation-request` `/state_ids`
- ❗ `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again
- ❗ `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check
The root node of a thread (and events related to it) are considered
"part of a thread" when validating receipts. This allows clients which
show the root node in both the main timeline and the threaded timeline
to easily send receipts in either.
Note that threaded notifications are not created for these events, these
events created notifications on the main timeline.
Applies the proper logic for unthreaded and threaded receipts to either
apply to all events in the room or only events in the same thread, respectively.
When retrieving counts of notifications segment the results based on the
thread ID, but choose whether to return them as individual threads or as
a single summed field by letting the client opt-in via a sync flag.
The summarization code is also updated to be per thread, instead of per
room.
* Update mypy and mypy-zope
* Unignore assigning to LogRecord attributes
Presumably https://github.com/python/typeshed/pull/8064 makes this ok
Cherry-picked from #13521
* Remove unused ignores due to mypy ParamSpec fixes
https://github.com/python/mypy/pull/12668
Cherry-picked from #13521
* Remove additional unused ignores
* Fix new mypy complaints related to `assertGreater`
Presumably due to https://github.com/python/typeshed/pull/8077
* Changelog
* Reword changelog
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
There is no need to grab thousands of backfill points when we only need 5 to make the `/backfill` request with. We need to grab a few extra in case the first few aren't visible in the history.
Previously, we grabbed thousands of backfill points from the database, then sorted and filtered them in the app. Fetching the 4.6k backfill points for `#matrix:matrix.org` from the database takes ~50ms - ~570ms so it's not like this saves a lot of time 🤷. But it might save us more time now that `get_backfill_points_in_room`/`get_insertion_event_backward_extremities_in_room` are more complicated after https://github.com/matrix-org/synapse/pull/13635
This PR moves the filtering and limiting to the SQL query so we just have less data to work with in the first place.
Part of https://github.com/matrix-org/synapse/issues/13356
Fix https://github.com/matrix-org/synapse/issues/13856
Fix https://github.com/matrix-org/synapse/issues/13865
> Discovered while trying to make Synapse fast enough for [this MSC2716 test for importing many batches](https://github.com/matrix-org/complement/pull/214#discussion_r741678240). As an example, disabling the `have_seen_event` cache saves 10 seconds for each `/messages` request in that MSC2716 Complement test because we're not making as many federation requests for `/state` (speeding up `have_seen_event` itself is related to https://github.com/matrix-org/synapse/issues/13625)
>
> But this will also make `/messages` faster in general so we can include it in the [faster `/messages` milestone](https://github.com/matrix-org/synapse/milestone/11).
>
> *-- https://github.com/matrix-org/synapse/issues/13856*
### The problem
`_invalidate_caches_for_event` doesn't run in monolith mode which means we never even tried to clear the `have_seen_event` and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT).
Additionally there was bug with the key being wrong so `_invalidate_caches_for_event` never invalidates the `have_seen_event` cache even when it does run.
Because we were using the `@cachedList` wrong, it was putting items in the cache under keys like `((room_id, event_id),)` with a `set` in a `set` (ex. `(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)`) and we we're trying to invalidate with just `(room_id, event_id)` which did nothing.
This avoids doing work that will never be used (since the
resulting unread counts will never be sent in a /sync
response).
The negative of doing this is that unread counts will be
incorrect when the feature is initially enabled.
This adds support for the stable identifiers of MSC2285 while
continuing to support the unstable identifiers behind the configuration
flag. These will be removed in a future version.
Avoid blocking on full state in `_resolve_state_at_missing_prevs` and
return a new flag indicating whether the resolved state is partial.
Thread that flag around so that it makes it into the event context.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Some experimental prep work to enable external event caching based on #9379 & #12955. Doesn't actually move the cache at all, just lays the groundwork for async implemented caches.
Signed off by Nick @ Beeper (@Fizzadar)
All tests are prefixed with `STALE_` and therefore they are silently
skipped. They were moved to `STALE_` in version `v0.5.0` in commit
2fcce3b3c5 - `Remove stale tests`.
Tests from `RoomEventsStoreTestCase` class are not used for last 8
years, I believe the best would be to remove them entirely.
Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
==============================
Bugfixes
--------
- Update the version of the [ldap3 plugin](https://github.com/matrix-org/matrix-synapse-ldap3/) included in the `matrixdotorg/synapse` DockerHub images and the Debian packages hosted on `packages.matrix.org` to 0.2.1. This fixes [a bug](https://github.com/matrix-org/matrix-synapse-ldap3/pull/163) with usernames containing uppercase characters. ([\#13156](https://github.com/matrix-org/synapse/issues/13156))
- Fix a bug introduced in Synapse 1.62.0rc1 affecting unread counts for users on small servers. ([\#13168](https://github.com/matrix-org/synapse/issues/13168))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEgQG31Z317NrSMt0QiISIDS7+X/QFAmLDDVgACgkQiISIDS7+
X/Q+KQ//WuWB9hfAW8XEyYHWox95zaITsAzY/TTG1IXygAMjgEk2+9utdRaX3wbk
YDaZCeEw+vbK/3w/lt1RzI30K3uVCZVcW2DTQr1Qi4B+UWLOlCsVfOT9LcMvNoJe
ww/cOK6RpPgTlqk5ij0MtdjfWkAJeToi7ESMooORxhWFm3Zd8e5BpbNv89WUBZhk
zCqCjIdjSF+Mwk8NwmU1iJi5JQY/+Xl51uk2+wGIAe4vtgPTz7PJmoPF1E6nGGVF
9OYdlWU4H7u6js8n05QL2jKtX34uszCo2hwoW2aFPPmF0B2CFEV6WFBiDOppLZ1g
ZMJv1s/34RXoBu8pAuJnq2BZkWxu99LRmPV+f/R+S0jDT1MH9tdSdhfcGu7iH/Y9
uguGX3OOlxnkUb5o825Xt3mvBcVaTGY+sspFtB12RtXmWRdll/Hq6w11ZN5f6qDy
Nr/DuoPjMAH7kzelFn/GpP6K8zX8iYjf0lLCyrbYV7OYAI6/I+Vao+sT2ctHD1T8
s4aTTx1bEl23mo/RiqH2fRHaPhBjZKW0uv6iRNqDE2ThYPAXinVtt7MiUU0QGco5
vMca/RZBkEj0Lov0AleBx4XRXlBTyq5BX2V1frYLenKp42bDzN9sgsPAOPeKieHW
qjr+Ti9i47wGADXs2GI/mke/C8jlONEKJm/v8mwXItn8Za7wBJc=
=SpI6
-----END PGP SIGNATURE-----
Merge tag 'v1.62.0rc3' into develop
Synapse 1.62.0rc3 (2022-07-04)
==============================
Bugfixes
--------
- Update the version of the [ldap3 plugin](https://github.com/matrix-org/matrix-synapse-ldap3/) included in the `matrixdotorg/synapse` DockerHub images and the Debian packages hosted on `packages.matrix.org` to 0.2.1. This fixes [a bug](https://github.com/matrix-org/matrix-synapse-ldap3/pull/163) with usernames containing uppercase characters. ([\#13156](https://github.com/matrix-org/synapse/issues/13156))
- Fix a bug introduced in Synapse 1.62.0rc1 affecting unread counts for users on small servers. ([\#13168](https://github.com/matrix-org/synapse/issues/13168))
Fixes#11887 hopefully.
The core change here is that `event_push_summary` now holds a summary of counts up until a much more recent point, meaning that the range of rows we need to count in `event_push_actions` is much smaller.
This needs two major changes:
1. When we get a receipt we need to recalculate `event_push_summary` rather than just delete it
2. The logic for deleting `event_push_actions` is now divorced from calculating `event_push_summary`.
In future it would be good to calculate `event_push_summary` while we persist a new event (it should just be a case of adding one to the relevant rows in `event_push_summary`), as that will further simplify the get counts logic and remove the need for us to periodically update `event_push_summary` in a background job.
* Update worker docs to remove group endpoints.
* Removes an unused parameter to `ApplicationService`.
* Break dependency between media repo and groups.
* Avoid copying `m.room.related_groups` state events during room upgrades.
Refactor how the `EventContext` class works, with the intention of reducing the amount of state we fetch from the DB during event processing.
The idea here is to get rid of the cached `current_state_ids` and `prev_state_ids` that live in the `EventContext`, and instead defer straight to the database (and its caching).
One change that may have a noticeable effect is that we now no longer prefill the `get_current_state_ids` cache on a state change. However, that query is relatively light, since its just a case of reading a table from the DB (unlike fetching state at an event which is more heavyweight). For deployments with workers this cache isn't even used.
Part of #12684
I've seen a few errors which can only plausibly be explained by the calculated
event id for an event being different from the ID of the event in the
database. It should be cheap to check this, so let's do so and raise an
exception.
When configuring the return values of mocks, prefer awaitables from
`make_awaitable` over `defer.succeed`. `Deferred`s are only awaitable
once, so it is inappropriate for a mock to return the same `Deferred`
multiple times.
Also update `run_in_background` to support functions that return
arbitrary awaitables.
Signed-off-by: Sean Quah <seanq@element.io>
Multiple calls to `EventsWorkerStore._get_events_from_cache_or_db` can
reuse the same database fetch, which is initiated by the first call.
Ensure that cancelling the first call doesn't cancel the other calls
sharing the same database fetch.
Signed-off-by: Sean Quah <seanq@element.io>
When we join a room via the faster-joins mechanism, we end up with "partial
state" at some points on the event DAG. Many parts of the codebase need to
wait for the full state to load. So, we implement a mechanism to keep track of
which events have partial state, and wait for them to be fully-populated.
This is a first step in dealing with #7721.
The idea is basically that rather than calculating the full set of users a device list update needs to be sent to up front, we instead simply record the rooms the user was in at the time of the change. This will allow a few things:
1. we can defer calculating the set of remote servers that need to be poked about the change; and
2. during `/sync` and `/keys/changes` we can avoid also avoid calculating users who share rooms with other users, and instead just look at the rooms that have changed.
However, care needs to be taken to correctly handle server downgrades. As such this PR writes to both `device_lists_changes_in_room` and the `device_lists_outbound_pokes` table synchronously. In a future release we can then bump the database schema compat version to `69` and then we can assume that the new `device_lists_changes_in_room` exists and is handled.
There is a temporary option to disable writing to `device_lists_outbound_pokes` synchronously, allowing us to test the new code path does work (and by implication upgrading to a future release and downgrading to this one will work correctly).
Note: Ideally we'd do the calculation of room to servers on a worker (e.g. the background worker), but currently only master can write to the `device_list_outbound_pokes` table.
There are a bunch of places we call get_success on an immediate value, which is unnecessary. Let's rip them out, and remove the redundant functionality in get_success and friends.
Switching to a sequence means there's no need to track `last_txn` on the
AS state table to generate new TXN IDs. This also means that there is
no longer contention between the AS scheduler and AS handler on updates
to the `application_services_state` table, which will prevent serialization
errors during the complete AS txn transaction.
To handle cancellation, we ensure that `after_callback`s and
`exception_callback`s are always run, since the transaction will
complete on another thread regardless of cancellation.
We also wait until everything is done before releasing the
`CancelledError`, so that logging contexts won't get used after they
have been finished.
Signed-off-by: Sean Quah <seanq@element.io>
The unstable identifiers are still supported if the experimental configuration
flag is enabled. The unstable identifiers will be removed in a future release.
Don't attempt to add non-string `value`s to `event_search` and add a
background update to clear out bad rows from `event_search` when
using sqlite.
Signed-off-by: Sean Quah <seanq@element.io>