* Check required power levels earlier in createRoom handler.
- If a server was configured to reject the creation of rooms with E2EE
enabled (by specifying an unattainably high power level for
"m.room.encryption" in default_power_level_content_override), the 403
error was not being triggered until after the room was created and
before the "m.room.power_levels" was sent. This allowed a user to
access the partially-configured room and complete the setup of E2EE
and power levels manually.
- This change causes the power level overrides to be checked earlier and
the request to be rejected before the user gains access to the room.
- A new `_validate_room_config` method is added to contain checks that
should be run before a room is created.
- The new test case confirms that a user request is rejected by the new
validation method.
Signed-off-by: Grant McLean <grant@catalyst.net.nz>
* Add a changelog file.
* Formatting fix for black.
* Remove unneeded line from test.
---------
Signed-off-by: Grant McLean <grant@catalyst.net.nz>
There appears to be a race where you can end up with entries in
`event_push_summary` with both a `NULL` and `main` thread ID.
Fixes#15736
Introduced in #15597
See https://github.com/matrix-org/synapse/pull/14095#discussion_r990335492
This is useful because when see that a relevant event is an `outlier` or `soft-failed`, then that's a good unexpected indicator explaining why it's not showing up. `filter_events_for_client` is used in `/sync`, `/messages`, `/context` which are all common end-to-end assertion touch points (also notifications, relations).
Implements stable support for MSC3882; this involves updating Synapse's support to
match the MSC / the spec says.
Continue to support the unstable version to allow clients to transition.
The stubs have some issues so this has some generous cast
and ignores in it, but it is better than not having stubs.
Note that confusing that Element is a function which creates
_Element instances (and similarly for Comment).
Updates the database schema to require a thread_id (by adding a
constraint that the column is non-null) for event_push_actions,
event_push_actions_staging, and event_push_actions_summary.
For PostgreSQL we add the constraint as NOT VALID, then
VALIDATE the constraint a background job to avoid locking
the table during an upgrade.
Each table is updated as a separate schema delta to avoid
deadlocks between them.
For SQLite we simply rebuild the table & copy the data.
* Fix#15669: always populate instance map even if it was empty
* Fix some tests
* Fix more tests
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* CI fix: don't forget to update apt repository sources before installing olddeps deps
* Add test testing the backwards compatibility
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
The cached decorators always return a Deferred, which was not
properly propagated. It was close enough when wrapping coroutines,
but failed if a bare function was wrapped.
```
2023-05-21 09:30:09,288 - synapse.logging.opentracing - 940 - ERROR - POST-1 - @trace may not have wrapped StateStorageController.get_state_for_groups correctly! The function is not async but returned a coroutine
```
Tracing instrumentation for these functions originally introduced in https://github.com/matrix-org/synapse/pull/15610
This moves the deactivated user check to the method which
all login types call.
Additionally updates the application service tests to be more
realistic by removing invalid tests and fixing server names.
All the information needed is already in the `instance_map`, so
use that instead of passing the hostname / IP & port manually
for each replication request.
This consolidates logic for future improvements of using e.g.
UNIX sockets for workers.
The event_fields property in filters should use the proper
escape rules, namely backslashes can be escaped with
an additional backslash.
This adds tests (adapted from matrix-js-sdk) and implements
the logic to properly split the event_fields strings.
Instrument `state` and `state_group` storage related things (tracing) so it's a little more clear where these database transactions are coming from as there is a lot of wires crossing in these functions.
Part of `/messages` performance investigation: https://github.com/matrix-org/synapse/issues/13356
R30v2 has been out since 2021-07-19 (https://github.com/matrix-org/synapse/pull/10332)
and we started collecting stats on 2021-08-16. Since it's been over a year now
(almost 2 years), this is enough grace period for us to now rip it out.
Synapse will no longer send (or respond to) the unstable flags
for faster joins. These were only available behind a configuration
flag and handled in parallel with the stable flags.
This change fixes two memory leaks during `trial` test runs.
Garbage collection is disabled during each test case and a gen-0 GC is
run at the end of each test. However, when the gen-0 GC is run, the
`TestCase` object usually still holds references to the `HomeServer`
used during the test. As a result, the `HomeServer` gets promoted to
gen-1 and then never garbage collected.
Fix this by periodically running full GCs.
Additionally, fix `HomeServer`s leaking after tests that touch inbound
federation due to `FederationRateLimiter`s adding themselves to a global
set, by turning the set into a `WeakSet`.
Resolves#15622.
Signed-off-by: Sean Quah <seanq@matrix.org>
If the previous read marker is pointing to an event that no longer exists
(e.g. due to retention) then assume that the newly given read marker
is newer.
To track changes in MSC2666:
- The change from `/mutual_rooms/{user_id}` to `/mutual_rooms?user_id={user_id}`.
- The addition of `next_batch_token` (and logic).
- Unstable flag now being `uk.half-shot.msc2666.query_mutual_rooms`.
- The error code when your own user is requested.
The second argument of `ConfigError` is a path, passed as an optional
`Iterable[str]` and not a `str`. If a string is passed directly,
Synapse unhelpfully emits "Error in configuration at
a.p.p._.s.e.r.v.i.c.e._.c.o.n.f.i.g._.f.i.l.e.s'" when the config
option has the wrong data type.
Signed-off-by: Sean Quah <seanq@matrix.org>
There are two situations which were previously not properly checked:
1. If the requested URL was replaced with an oEmbed URL, then the
oEmbed URL was not checked against url_preview_url_blacklist.
2. Follow-up URLs (either via autodiscovery of oEmbed or to pre-cache
images) were not checked against url_preview_url_blacklist.
* Usage that is compatible with Python 3.8 and 3.11
> Since Python 3.10, instead of passing value and tb, an exception object can
be passed as the first argument. If value and tb are provided, the first
argument is ignored in order to provide backwards compatibility.
>
> -- https://docs.python.org/3/library/traceback.html
* Add changelog
Fix the following `mypy` errors when running `mypy` with Python 3.7:
```
synapse/storage/controllers/stats.py:58: error: "Counter" is not subscriptable, use "typing.Counter" instead [misc]
tests/test_state.py:267: error: "dict" is not subscriptable, use "typing.Dict" instead [misc]
```
Part of https://github.com/matrix-org/synapse/issues/15603
In Python 3.9, `typing` is deprecated and the types are subscriptable (generics) by default, https://peps.python.org/pep-0585/#implementation
* Usage that is compatible with Python 3.8 and 3.11
> Since Python 3.10, instead of passing value and tb, an exception object can
be passed as the first argument. If value and tb are provided, the first
argument is ignored in order to provide backwards compatibility.
>
> -- https://docs.python.org/3/library/traceback.html
* Add changelog
MSC3389 proposes protecting the relation type & parent event ID
from redaction. This keeps the relation information intact after
redaction which helps with some UX flaws (e.g. deleting an
event causes it to no longer be in a thread, which is confusing).
Adds logging for key server requests which include a key ID.
This is technically in violation of the 1.6 spec, but is the only
way to remain backwards compatibly with earlier versions of
Synapse (and possibly other homeservers) which *did* include
the key ID.
I found the error in the **Before** really vague and obtuse and didn't realize port `5432` corresponded to the Postgres port until searching the codebase. It says to check the logs but that wasn't my first instinct. It's just more obvious if we just print the full thing which gives context of the error type and the traceback to the relevant area of code.
#### Before
```
$ poetry run python -m synapse.app.homeserver -c homeserver.yaml
**********************************************************************************
Error during initialisation:
connection to server at "localhost" (::1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
There may be more information in the logs.
**********************************************************************************
```
#### After
```sh
$ poetry run python -m synapse.app.homeserver -c homeserver.yaml
**********************************************************************************
Error during initialisation:
Traceback (most recent call last):
File "/home/eric/Documents/github/element/synapse/synapse/app/homeserver.py", line 352, in setup
hs.setup()
File "/home/eric/Documents/github/element/synapse/synapse/server.py", line 337, in setup
self.datastores = Databases(self.DATASTORE_CLASS, self)
File "/home/eric/Documents/github/element/synapse/synapse/storage/databases/__init__.py", line 65, in __init__
with make_conn(database_config, engine, "startup") as db_conn:
File "/home/eric/Documents/github/element/synapse/synapse/storage/database.py", line 161, in make_conn
native_db_conn = engine.module.connect(**db_params)
File "/home/eric/.cache/pypoetry/virtualenvs/matrix-synapse-xCtC9ulO-py3.10/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "localhost" (::1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
There may be more information in the logs.
**********************************************************************************
```
* Add SSL options to redis config
* fix lint issues
* Add documentation and changelog file
* add missing . at the end of the changelog
* Move client context factory to new file
* Rename ssl to tls and fix typo
* fix lint issues
* Added when redis attributes were added
* Add master to the instance_map as part of Complement, have ReplicationEndpoint look at instance_map for master.
* Fix typo in drive by.
* Remove unnecessary worker_replication_* bits from unit tests and add master to instance_map(hopefully in the right place)
* Several updates:
1. Switch from master to main for naming the main process in the instance_map. Add useful constants for easier adjustment of names in the future.
2. Add backwards compatibility for worker_replication_* to allow time to transition to new style. Make sure to prioritize declaring main directly on the instance_map.
3. Clean up old comments/commented out code.
4. Adjust unit tests to match with new code.
5. Adjust Complement setup infrastructure to only add main to the instance_map if workers are used and remove now unused options from the worker.yaml template.
* Initial Docs upload
* Changelog
* Missed some commented out code that can go now
* Remove TODO comment that no longer holds true.
* Fix links in docs
* More docs
* Remove debug logging
* Apply suggestions from code review
Co-authored-by: reivilibre <olivier@librepush.net>
* Apply suggestions from code review
Co-authored-by: reivilibre <olivier@librepush.net>
* Update version to latest, include completeish before/after examples in upgrade notes.
* Fix up and docs too
---------
Co-authored-by: reivilibre <olivier@librepush.net>
Separate out a HTTP client for replication in preparation for
also supporting using UNIX sockets. The major difference from
the base class is that this does not use treq to handle HTTP
requests.
This stops media (and thumbnails) from being accessed from the
listed domains. It does not delete any already locally cached media,
but will prevent accessing it.
Note that admin APIs are unaffected by this change.
m.push_rules, like m.fully_read, is a special account data type that cannot
be set using the normal /account_data endpoint. Return an error instead
of allowing data that will not be used to be stored.
MSC3984 proxies /keys/query requests to appservices, but servers will
can also requests devices / keys from the /user/devices endpoint.
The formats are close enough that we can "proxy" that /user/devices to
appservices (by calling /keys/query) and then change the format of the
returned data before returning it over federation.
Behind a configuration flag this adds + to the list of allowed
characters in Matrix IDs. The main feature this enables is
using full E.164 phone numbers as Matrix IDs.
Add an `is_mine_server_name` method, similar to `is_mine_id`.
Ideally we would use this consistently, instead of sometimes comparing
against `hs.hostname` and other times reaching into
`hs.config.server.server_name`.
Also fix a bug in the tests where `hs.hostname` would sometimes differ
from `hs.config.server.server_name`.
Signed-off-by: Sean Quah <seanq@matrix.org>
A dont_notify action is a no-op (and coalesce is undefined). These are
both considered no-ops by the spec, per MSC3987 and the predefined
push rules were updated to remove dont_notify from the list of actions.
It seems that YouTube Short previews do not work in some
regions, but the oEmbed information for those areas is still
valid.
This causes YouTube Shorts to always use (only) the oEmbed
endpoint which is a minor regression for regions where the URL
preview was already working -- some of the additional video
metadata is lost. It is not likely that clients are using this today
and it is more beneficial to have a limited preview working everywhere
than unused metadata in the Open Graph response.
Enforce that we use index scans (rather than seq scans), which we also do for state queries. The reason to enforce this is that we can't correctly get PostgreSQL to understand the distribution of `stream_ordering` depends on `highlight`, and so it always defaults (on matrix.org) to sequential scans.
#15514 introduced a regression where Synapse would encounter
`PartialDownloadError`s when fetching OpenID metadata for certain
providers on startup. Due to #8088, this prevents Synapse from starting
entirely.
Revert the change while we decide what to do about the regression.
Updates the database schema to require a thread_id (by adding a
constraint that the column is non-null) for event_push_actions,
event_push_actions_staging, and event_push_actions_summary.
For PostgreSQL we add the constraint as NOT VALID, then
VALIDATE the constraint a background job to avoid locking
the table during an upgrade.
For SQLite we simply rebuild the table & copy the data.
Pushers tend to make many connections to the same HTTP host
(e.g. a new event comes in, causes events to be pushed, and then
the homeserver connects to the same host many times). Due to this
the per-host HTTP connection pool size was increased, but this does
not make sense for other SimpleHttpClients.
Add a parameter for the connection pool and override it for pushers
(making a separate SimpleHttpClient for pushers with the increased
configuration).
This returns the HTTP connection pool settings to the default Twisted
ones for non-pusher HTTP clients.
Adds an optional keyword argument to the /relations API which
will recurse a limited number of event relationships.
This will cause the API to return not just the events related to the
parent event, but also events related to those related to the parent
event, etc.
This is disabled by default behind an experimental configuration
flag and is currently implemented using prefixed parameters.
MSC3983 provides a way to request multiple OTKs at once from appservices,
this extends this concept to the Client-Server API.
Note that this will likely be spit out into a separate MSC, but is currently part of
MSC3983.
Cleans-up the schema delta files:
* Removes no-op functions.
* Adds missing type hints to function parameters.
* Fixes any issues with type hints.
This also renames one (very old) schema delta to avoid a conflict
that mypy complains about.
It can be useful to always return the fallback key when attempting to
claim keys. This adds an unstable endpoint for `/keys/claim` which
always returns fallback keys in addition to one-time-keys.
The fallback key(s) are not marked as "used" unless there are no
corresponding OTKs.
This is currently defined in MSC3983 (although likely to be split out
to a separate MSC). The endpoint shape may change or be requested
differently (i.e. a keyword parameter on the current endpoint), but the
core logic should be reasonable.
Before this change:
* `PerspectivesKeyFetcher` and `ServerKeyFetcher` write to `server_keys_json`.
* `PerspectivesKeyFetcher` also writes to `server_signature_keys`.
* `StoreKeyFetcher` reads from `server_signature_keys`.
After this change:
* `PerspectivesKeyFetcher` and `ServerKeyFetcher` write to `server_keys_json`.
* `PerspectivesKeyFetcher` also writes to `server_signature_keys`.
* `StoreKeyFetcher` reads from `server_keys_json`.
This results in `StoreKeyFetcher` now using the results from `ServerKeyFetcher`
in addition to those from `PerspectivesKeyFetcher`, i.e. keys which are directly
fetched from a server will now be pulled from the database instead of refetched.
An additional minor change is included to avoid creating a `PerspectivesKeyFetcher`
(and checking it) if no `trusted_key_servers` are configured.
The overall impact of this should be better usage of cached results:
* If a server has no trusted key servers configured then it should reduce how often keys
are fetched.
* if a server's trusted key server does not have a requested server's keys cached then it
should reduce how often keys are directly fetched.
* Switch InstanceLocationConfig to a pydantic BaseModel, apply Strict* types and add a few helper methods(that will make more sense in follow up work).
Co-authored-by: David Robertson <davidr@element.io>
* More precise type for LoggingTransaction.execute
* Add an annotation for stream_ordering_month_ago
This would have spotted the error that was fixed in "Add comma missing from #15382. (#15429)"
c.f. #15264
The two changes are:
1. Add indexes so that the select / deletes don't do sequential scans
2. Don't repeatedly call `SELECT count(*)` each iteration, as that's slow
The registration fallback is broken and unspecced. This removes it
since there is no plan to spec it.
Note that this does not modify the login fallback code.
* Change `store_server_verify_keys` to take a `Mapping[(str, str), FKR]`
This is because we already can't handle duplicate keys — leads to cardinality violation
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
This moves `redacts` from being a top-level property to
a `content` property in a new room version.
MSC2176 (which was previously implemented) states to not
`redact` this property.
* raise a ConfigError on an invalid app_service_config_files
* changelog
* Move config check to read_config
* Add test
* Ensure list also contains strings
This change fixes a rare bug where initial /syncs would fail with a
`KeyError` under the following circumstances:
1. A user fast joins a remote room.
2. The user is kicked from the room before the room's full state has
been synced.
3. A second local user fast joins the room.
4. Events are backfilled into the room with a higher topological
ordering than the original user's leave. They are assigned a
negative stream ordering. It's not clear how backfill happened here,
since it is expected to be equivalent to syncing the full state.
5. The second local user leaves the room before the room's full state
has been synced. The homeserver does not complete the sync.
6. The original user performs an initial /sync with lazy_load_members
enabled.
* Because they were kicked from the room, the room is included in
the /sync response even though the include_leave option is not
specified.
* To populate the room's timeline, `_load_filtered_recents` /
`get_recent_events_for_room` fetches events with a lower stream
ordering than the leave event and picks the ones with the highest
topological orderings (which are most recent). This captures the
backfilled events after the leave, since they have a negative
stream ordering. These events are filtered out of the timeline,
since the user was not in the room at the time and cannot view
them. The sync code ends up with an empty timeline for the room
that notably does not include the user's leave event.
This seems buggy, but at least we don't disclose events the user
isn't allowed to see.
* Normally, `compute_state_delta` would fetch the state at the
start and end of the room's timeline to generate the sync
response. Since the timeline is empty, it fetches the state at
`min(now, last event in the room)`, which corresponds with the
second user's leave. The state during the entirety of the second
user's membership does not include the membership for the first
user because of partial state.
This part is also questionable, since we are fetching state from
outside the bounds of the user's membership.
* `compute_state_delta` then tries and fails to find the user's
membership in the auth events of timeline events. Because there
is no timeline event whose auth events are expected to contain
the user's membership, a `KeyError` is raised.
Also contains a drive-by fix for a separate unlikely race condition.
Signed-off-by: Sean Quah <seanq@matrix.org>
This uses the specced /_matrix/app/v1/... paths instead of the
"legacy" paths. If the homeserver receives an error it will retry
using the legacy path.
* Add IReactorUNIX to ISynapseReactor type hint.
* Create listen_unix().
Two options, 'path' to the file and 'mode' of permissions(not umask, recommend 666 as default as
nginx/other reverse proxies write to it and it's setup as user www-data)
For the moment, leave the option to always create a PID lockfile turned on by default
* Create UnixListenerConfig and wire it up.
Rename ListenerConfig to TCPListenerConfig, then Union them together into ListenerConfig.
This spidered around a bit, but I think I got it all. Metrics and manhole have been placed
behind a conditional in case of accidental putting them onto a unix socket.
Use new helpers to get if a listener is configured for TLS, and to help create a site tag
for logging.
There are 2 TODO things in parse_listener_def() to finish up at a later point.
* Refactor SynapseRequest to handle logging correctly when using a unix socket.
This prevents an exception when an IP address can not be retrieved for a request.
* Make the 'Synapse now listening on Unix socket' log line a little prettier.
* No silent failures on generic workers when trying to use a unix socket with metrics or manhole.
* Inline variables in app/_base.py
* Update docstring for listen_unix() to remove reference to a hardcoded permission of 0o666 and add a few comments saying where the default IS declared.
* Disallow both a unix socket and a ip/port combo on the same listener resource
* Linting
* Changelog
* review: simplify how listen_unix returns(and get rid of a type: ignore)
* review: fix typo from ConfigError in app/homeserver.py
* review: roll conditional for http_options.tag into get_site_tag() helper(and add docstring)
* review: enhance the conditionals for checking if a port or path is valid, remove a TODO line
* review: Try updating comment in get_client_ip_if_available to clarify what is being retrieved and why
* Pretty up how 'Synapse now listening on Unix Socket' looks by decoding the byte string.
* review: In parse_listener_def(), raise ConfigError if neither socket_path nor port is declared(and fix a typo)
* Revert "Fix registering a device on an account with lots of devices (#15348)"
This reverts commit f0d8f66eaa.
* Revert "Delete stale non-e2e devices for users, take 3 (#15183)"
This reverts commit 78cdb72cd6.
Clean-up from adding the thread_id column, which was initially
null but backfilled with values. It is desirable to require it to now
be non-null.
In addition to altering this column to be non-null, we clean up
obsolete background jobs, indexes, and just-in-time updating
code.
If enabled, for users which are exclusively owned by an application
service then the appservice will be queried for devices in addition
to any information stored in the Synapse database.
Previously, we would spin in a tight loop until
`update_state_for_partial_state_event` stopped raising
`FederationPullAttemptBackoffError`s. Replace the spinloop with a wait
until the backoff period has expired.
Signed-off-by: Sean Quah <seanq@matrix.org>
This should help reduce the number of devices e.g. simple bots the repeatedly login rack up.
We only delete non-e2e devices as they should be safe to delete, whereas if we delete e2e devices for a user we may accidentally break their ability to receive e2e keys for a message.
* Fix joining rooms you have been unbanned from
Since forever synapse did not allow you to join a room after you have
been unbanned from it over federation. This was not actually because of
the unban event not federating. Synapse simply used outdated state to
validate the join transition. This skips the validation if we are not in
the room and for that reason won't have the current room state.
Fixes#1563
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
* Add changelog
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
* Update changelog.d/15323.bugfix
---------
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
Experimental support for MSC3983 is behind a configuration flag.
If enabled, for users which are exclusively owned by an application
service then the appservice will be queried for one-time keys *if*
there are none uploaded to Synapse.
This makes it so that we rely on the `device_id` to delete pushers on logout,
instead of relying on the `access_token_id`. This ensures we're not removing
pushers on token refresh, and prepares for a world without access token IDs
(also known as the OIDC).
This actually runs the `set_device_id_for_pushers` background update, which
was forgotten in #13831.
Note that for backwards compatibility it still deletes pushers based on the
`access_token` until the background update finishes.
Invalid mentions data received over the Client-Server API should
be rejected with a 400 error. This will hopefully stop clients from
sending invalid data, although does not help with data received
over federation.
* Add `event_stream_ordering` column to membership state tables
Specifically this adds the column to `current_state_events`,
`local_current_membership` and `room_memberships`. Each of these tables
is regularly joined with the `events` table to get the stream ordering
and denormalising this into each table will yield significant query
performance improvements once used.
* Make denormalised `event_stream_ordering` columns foreign keys
* Add comment in schema file explaining new denormalised columns
* Add triggers to enforce consistency of `event_stream_ordering` columns
* Re-order purge room tables to account for foreign keys
* Bump schema version to 75
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Additionally:
* Consistently use `freeze()` in test
---------
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: 6543 <6543@obermui.de>
* Have replication clients remove _INT_STREAM_POS
Suppose worker A makes an internal http request from worker B. B may
make changes that A later learns about over replication. We want A's
request to block until it has seen those changes—mainly to ensure A's
caches are invalidated promptly. This helps provide read-after-write
consistency, eliminating entire categories of races and test flakes.
To implement this, B includes a top-level field `_INT_STREAM_POS` in its
response JSON. Roughly speaking, the field's value tells A what to wait
for. But we weren't removing that internal field before A's request
completed!
Introduced in https://github.com/matrix-org/synapse/pull/14820.
Fixes#15308.
* Changelog
When a room is deleted in Synapse we remove the event forward
extremities in the room, so if (say a bot) tries to send a message into
the room we error out due to not being able to calculate prev events for
the new event *before* we check if the sender is in the room.
Fixes#8094
With Redis commands do not need to be re-issued by the main
process (they fan-out to all processes at once) and thus it is no
longer necessary to worry about them reflecting recursively forever.
* Scaffolding for background process to refresh profiles
* Add scaffolding for background process to refresh profiles for a given server
* Implement the code to select servers to refresh from
* Ensure we don't build up multiple looping calls
* Make `get_profile` able to respect backoffs
* Add logic for refreshing users
* When backing off, schedule a refresh when the backoff is over
* Wake up the background processes when we receive an interesting state event
* Add tests
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Add comment about 1<<62
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Remove special-case method for new memberships only, use more generic method
* Only collect profiles from state events in public rooms
* Add a table to track stale remote user profiles
* Add store methods to set and delete rows in this new table
* Mark remote profiles as stale when a member state event comes in to a private room
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Simplify by removing Optionality of `event_id`
* Replace names and avatars with None if they're set to dodgy things
I think this makes more sense anyway.
* Move schema delta to 74 (I missed the boat?)
* Turns out these can be None after all
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
It is not necessary to reach out to the database to check some
parameters if the auto-join rooms are not configured, or (in some cases)
if auto-create rooms is not configured.
* Tweak docstring and type hint
* Flip logic and provide better name
* Separate decision from action
* Track a set of strings, not EventBases
* Require explicit boolean options from callers
* Add explicit option for partial state rooms
* Changelog
* Rename param
When pushing events in partial state rooms down incremental /sync, we
try to find the `m.room.member` state event for their senders by digging
through their auth events, so that we can present the membership to the
client. Events usually have a membership event in their auth events,
with the exception of the `m.room.create` event and a user's first join
into the room.
When implementing #13477, we took the case of a user's first join into
account, but forgot to handle the `m.room.create` case. This change
fixes that.
Signed-off-by: Sean Quah <seanq@matrix.org>
This removes the experimental configuration option and
always escapes the push rule condition keys.
Also escapes any (experimental) push rule condition keys
in the base rules which contain dot in a field name.
Enables MSC3925 support by default, which:
* Includes the full edit event in the bundled aggregations of an
edited event.
* Stops modifying the original event's content to return the new
content from the edit event.
This is a backwards-incompatible change that is considered to be
"correct" by the spec.
AbstractStreamIdTracker (now) has only a single sub-class: AbstractStreamIdGenerator,
combine them to simplify some code and remove any direct references to
AbstractStreamIdTracker.
This replaces the specific `is_user_mention` push rule condition
used in MSC3952 with the generic `exact_event_property_contains`
push rule condition from MSC3966.
It turns out that no clients rely on server-side aggregation of `m.annotation`
relationships: it's just not very useful as currently implemented.
It's also non-trivial to calculate.
I want to remove it from MSC2677, so to keep the implementation in line, let's
remove it here.
* Admin api to delete event report
* lint + tests
* newsfile
* Apply suggestions from code review
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* revert changes - move to WorkerStore
* update unit test
* Note that timestamp is in millseconds
---------
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Removes the `v1` directory from `test.rest.media.v1`.
* Moves the non-REST code from `synapse.rest.media.v1` to `synapse.media`.
* Flatten the `v1` directory from `synapse.rest.media`, but leave compatiblity
with 3rd party media repositories and spam checkers.
* Fix a long-standing bug where non-ASCII characters in search terms,
including accented letters, would not match characters in a different
case.
* Fix a long-standing bug where search terms using combining accents
would not match display names using precomposed accents and vice
versa.
To fully take effect, the user directory must be rebuilt after this
change.
Fixes#14630.
Signed-off-by: Sean Quah <seanq@matrix.org>
Previously if an autodiscovered oEmbed request failed (e.g. the
oEmbed endpoint is down or does not exist) then the entire URL
preview would fail. Instead we now return everything we can, even
if this additional request fails.
Ideally we would replace this with parsing of the Accept header
or something else, but for now just make Synapse spec compliant
by ignoring the unspecced parameter.
It does not seem that this is ever sent by a client, and even if it is
there's a reasonable fallback.
* Change `create_room` return type
* Don't return room alias from /createRoom
* Update other callsites
* Fix up mypy complaints
It looks like new_room_user_id is None iff new_room_id is None. It's a
shame we haven't expressed this in a way that mypy can understand.
* Changelog
* Sort BOOLEAN_COLUMNS and APPEND_ONLY_TABLES
So I can see if a given table is present in logarithmic time, rather
than linear.
* Teach portdb about `un_partial_stated_event_streams`
* Comments comments comments
* Changelog
Previously, when creating a join event in /make_join, we would decide
whether to include additional fields to satisfy restricted room checks
based on the current state of the room. Then, when building the event,
we would capture the forward extremities of the room to use as prev
events.
This is subject to race conditions. For example, when leaving and
rejoining a room, the following sequence of events leads to a misleading
403 response:
1. /make_join reads the current state of the room and sees that the user
is still in the room. It decides to omit the field required for
restricted room joins.
2. The leave event is persisted and the room's forward extremities are
updated.
3. /make_join builds the event, using the post-leave forward extremities.
The event then fails the restricted room checks.
To mitigate the race, we move the read of the forward extremities closer
to the read of the current state. Ideally, we would compute the state
based off the chosen prev events, but that can involve state resolution,
which is expensive.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Update mypy and mypy-zope
* Remove unused ignores
These used to suppress
```
synapse/storage/engines/__init__.py:28: error: "__new__" must return a
class instance (got "NoReturn") [misc]
```
and
```
synapse/http/matrixfederationclient.py:1270: error: "BaseException" has no attribute "reasons" [attr-defined]
```
(note that we check `hasattr(e, "reasons")` above)
* Avoid empty body warnings, sometimes by marking methods as abstract
E.g.
```
tests/handlers/test_register.py:58: error: Missing return statement [empty-body]
tests/handlers/test_register.py:108: error: Missing return statement [empty-body]
```
* Suppress false positive about `JaegerConfig`
Complaint was
```
synapse/logging/opentracing.py:450: error: Function "Type[Config]" could always be true in boolean context [truthy-function]
```
* Fix not calling `is_state()`
Oops!
```
tests/rest/client/test_third_party_rules.py:428: error: Function "Callable[[], bool]" could always be true in boolean context [truthy-function]
```
* Suppress false positives from ParamSpecs
````
synapse/logging/opentracing.py:971: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]" [arg-type]
synapse/logging/opentracing.py:1017: error: Argument 2 to "_custom_sync_async_decorator" has incompatible type "Callable[[Arg(Callable[P, R], 'func'), **P], _GeneratorContextManager[None]]"; expected "Callable[[Callable[P, R], **P], _GeneratorContextManager[None]]" [arg-type]
````
* Drive-by improvement to `wrapping_logic` annotation
* Workaround false "unreachable" positives
See https://github.com/Shoobx/mypy-zope/issues/91
```
tests/http/test_proxyagent.py:626: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:762: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:826: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:838: error: Statement is unreachable [unreachable]
tests/http/test_proxyagent.py:845: error: Statement is unreachable [unreachable]
tests/http/federation/test_matrix_federation_agent.py:151: error: Statement is unreachable [unreachable]
tests/http/federation/test_matrix_federation_agent.py:452: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:60: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:93: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:127: error: Statement is unreachable [unreachable]
tests/logging/test_remote_handler.py:152: error: Statement is unreachable [unreachable]
```
* Changelog
* Tweak DBAPI2 Protocol to be accepted by mypy 1.0
Some extra context in:
- https://github.com/matrix-org/python-canonicaljson/pull/57
- https://github.com/python/mypy/issues/6002
- https://mypy.readthedocs.io/en/latest/common_issues.html#covariant-subtyping-of-mutable-protocol-members-is-rejected
* Pull in updated canonicaljson lib
so the protocol check just works
* Improve comments in opentracing
I tried to workaround the ignores but found it too much trouble.
I think the corresponding issue is
https://github.com/python/mypy/issues/12909. The mypy repo has a PR
claiming to fix this (https://github.com/python/mypy/pull/14677) which
might mean this gets resolved soon?
* Better annotation for INTERACTIVE_AUTH_CHECKERS
* Drive-by AUTH_TYPE annotation, to remove an ignore
This replaces the specific `is_room_mention` push rule condition
used in MSC3952 with the generic `exact_event_match` push rule
condition from MSC3758.
No functionality changes due to this.
Previously we would give up upon receiving a 404 from the first server,
instead of trying the rest of the servers in the list.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Fix order of partial state tables when purging
`partial_state_rooms` has an FK on `events` pointing to the join event we
get from `/send_join`, so we must delete from that table before deleting
from `events`.
**NB:** It would be nice to cancel any resync processes for the room
being purged. We do not do this at present. To do so reliably we'd need
an internal HTTP "replication" endpoint, because the worker doing the
resync process may be different to that handling the purge request.
The first time the resync process tries to write data after the deletion
it will fail because we have deleted necessary data e.g. auth
events. AFAICS it will not retry the resync, so the only downside to
not cancelling the resync is a scary-looking traceback.
(This is presumably extremely race-sensitive.)
* Changelog
* admist(?) -> between
* Warn about a race
* Fix typo, thanks Sean
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
---------
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
...when lazy loading of members is not enabled. It's weird to notify
a client that another user's device list has changed when the client
doesn't think that they share a room.
Note that when a room is un-partial stated, device list updates are
emitted for every member in that room over /sync.
Signed-off-by: Sean Quah <seanq@matrix.org>
Fixes#12801.
Complement tests are at
https://github.com/matrix-org/complement/pull/567.
Avoid blocking on full state when handling a subsequent join into a
partial state room.
Also always perform a remote join into partial state rooms, since we do
not know whether the joining user has been banned and want to avoid
leaking history to banned users.
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <seanq@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
It's important that collections returned from `@cached` methods are not
modified, otherwise future retrievals from the cache will return the
modified collection.
This applies to the return values from `@cached` methods and the values
inside the dictionaries returned by `@cachedList` methods. It's not
necessary for the dictionaries returned by `@cachedList` methods
themselves to be read-only.
Signed-off-by: Sean Quah <seanq@matrix.org>
Co-authored-by: David Robertson <davidr@element.io>
This specifies to search for an exact value match, instead of
string globbing. It only works across non-compound JSON values
(null, boolean, integer, and strings).
The per-room account data is no longer unconditionally
fetched, even if all rooms will be filtered out.
Global account data will not be fetched if it will all be
filtered out.
The previous version of the code could mutate a cached value,
but only if the input requested all devices of a user *and* a specific
device.
To avoid this nonsensical situation we no longer fetch a specific
device ID if all of a user's devices are returned.
This disambiguates keys which attempt to match fields
with a dot in them (e.g. m.relates_to).
Disabled by default behind an experimental configuration flag.
* Fix MediaStorage type hint
* Typecheck tests.rest.media.v1.test_media_storage
* Changelog
* Remove assert and make the comment succinct
* Fix syntax for olddeps
* Tweak http types in Synapse
AFACIS these are correct, and they make mypy happier on tests.http.
* Type hints for test_proxyagent
* type hints for test_srv_resolver
* test_matrix_federation_agent
* tests.http.server._base
* tests.http.__init__
* tests.http.test_additional_resource
* tests.http.test_client
* tests.http.test_endpoint
* tests.http.test_matrixfederationclient
* tests.http.test_servlet
* tests.http.test_simple_client
* tests.http.test_site
* One fixup in tests.server
* Untyped defs
* Changelog
* Fixup syntax for Python 3.7
* Fix olddeps syntax
* Use a twisted IPv4 addr for dummy_address
* Fix typo, thanks Sean
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Remove redundant `Optional`
---------
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
This adds an `event_stream_ordering` column to `current_state_events`,
`local_current_membership` and `room_memberships`. Each of these tables
is regularly joined with the `events` table to get the stream ordering
and denormalising this into each table will yield significant query
performance improvements once used. Includes a background job to
populate these values from the `events` table.
Same idea as https://github.com/matrix-org/synapse/pull/13703.
Signed off by Nick @ Beeper (@fizzadar).
* Accept a Sequence of events in synapse.appservice
This avoids some casts/ignores in the tests I'm about to fixup. It seems
that `List[Mock]` is not a subtype of `List[EventBase]`, but
`Sequence[Mock]` is a subtype of `Sequence[EventBase]`. So presumably
`Mock` is considered a subtype of anything, much like `Any`.
* make tests.appservice.test_scheduler pass mypy
* Extra hints in tests.appservice.test_scheduler
* Extra hints in tests.appservice.test_api
* Extra hints in tests.appservice.test_appservice
* Disallow untyped defs
* Changelog
Co-authored-by: Brad Murray <brad@beeper.com>
Co-authored-by: Nick Barrett <nick@beeper.com>
Copy the suppress_edits push rule from Beeper to implement MSC3958.
9415a1284b/rust/src/push/base_rules.rs (L98-L114)
Ensure that the list of servers in a partial state room always contains
the server we joined off.
Also refactor `get_partial_state_servers_at_join` to return `None` when
the given room is no longer partial stated, to explicitly indicate when
the room has partial state. Otherwise it's not clear whether an empty
list means that the room has full state, or the room is partial stated,
but the server we joined off told us that there are no servers in the
room.
Signed-off-by: Sean Quah <seanq@matrix.org>
Since pyo3-log is initialized very early in the Python start-up
it caches the state of the loggers before they're fully initialized
(and thus are essentially disabled). Whenever we reload the
logging configuration we now also tell pyo3-log to discard
any cached logging configuration it has; it will refetch the
current logging configuration from Python at the next point
it logs.
This fixes Rust log lines not appearing in the homeserver logs.
If a sync request does not need to calculate per-room entries &
is not generating presence & is not generating device list data
(e.g. during initial sync) avoid the expensive calculation of room
specific data.
This is a micro-optimisation for clients syncing simply to receive
to-device information.
This expands the previous optimisation from being only for initial
sync to being for all sync requests.
It also inverts some of the logic to be inclusive instead of exclusive.
The `parse_enum` helper pulls an enum value from the query string
(by delegating down to the parse_string helper with values generated
from the enum).
This is used to pull out "f" and "b" in most places and then we thread
the resulting Direction enum throughout more code.
The previous assumption was that the stream_id column was unique
(for a room ID, receipt type, user ID tuple), but this turned out to be
incorrect.
Now find the max stream ID, then map this back to a database-specific
row identifier and delete other rows which match the (room ID, receipt type,
user ID) tuple, but *not* the row ID.
`run_in_background` calls re-use the current logging context. When they
are not awaited, they can complete after the current logging context has
been marked as finished, which leads to log spam. Use
`run_as_background_process` instead.
Fixes one of the instances of #13090.
Signed-off-by: Sean Quah <seanq@matrix.org>
#14910 fixed the regression introduced by #13873 where sqlite database
migrations would no longer run inside a transaction. However, it
committed the transaction before Synapse updated its bookkeeping of
which migrations have been run, which means that migrations may be run
again after they have completed successfully.
Leave the transaction open at the end of `executescript`, to restore the
old, correct behaviour. Also make the PostgreSQL behaviour consistent
with SQLite.
Fixes#14909.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Better test for bad values in power levels events
The previous test only checked that Synapse didn't raise an exception,
but didn't check that we had correctly interpreted the value of the
dodgy power level.
It also conflated two things: bad room notification levels, and bad user
levels. There _is_ logic for converting the latter to integers, but we
should test it separately.
* Check we ignore types that don't convert to int
* Handle `None` values in `notifications.room`
* Changelog
* Also test that bad values are rejected by event auth
* Docstring
* linter scripttttttttt
* Test boolean values in PL content
* Reject boolean power levels
* Changelog
* Perfer `type(x) is int` to `isinstance(x, int)`
This covered all additional instances I could see where `x` was
user-controlled.
The remaining cases are
```
$ rg -s 'isinstance.*[^_]int'
tests/replication/_base.py
576: if isinstance(obj, int):
synapse/util/caches/stream_change_cache.py
136: assert isinstance(stream_pos, int)
214: assert isinstance(stream_pos, int)
246: assert isinstance(stream_pos, int)
267: assert isinstance(stream_pos, int)
synapse/replication/tcp/external_cache.py
133: if isinstance(result, int):
synapse/metrics/__init__.py
100: if isinstance(calls, (int, float)):
synapse/handlers/appservice.py
262: assert isinstance(new_token, int)
synapse/config/_util.py
62: if isinstance(p, int):
```
which cover metrics, logic related to `jsonschema`, and replication and
data streams. AFAICS these are all internal to Synapse
* Changelog
* Better test for bad values in power levels events
The previous test only checked that Synapse didn't raise an exception,
but didn't check that we had correctly interpreted the value of the
dodgy power level.
It also conflated two things: bad room notification levels, and bad user
levels. There _is_ logic for converting the latter to integers, but we
should test it separately.
* Check we ignore types that don't convert to int
* Handle `None` values in `notifications.room`
* Changelog
* Also test that bad values are rejected by event auth
* Docstring
* linter scripttttttttt
MSC3952 defines push rules which searches for mentions in a list of
Matrix IDs in the event body, instead of searching the entire event
body for display name / local part.
This is implemented behind an experimental configuration flag and
does not yet implement the backwards compatibility pieces of the MSC.
The `/relations` endpoint was not properly handle "live tokens"
(i.e sync tokens), to do this properly we abstract the code that
`/messages` has and re-use it.
* Batch look-ups to see if rooms are partial stated.
* Fix issues found in linting.
* Fix typo.
* Apply suggestions from code review
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Clarify comments.
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Also improve the cache size while we're at it
* is_partial_state_rooms -> is_partial_state_room_batched
* Run `black`
* Improve annotation for `simple_select_many_batch`
* Fix is_partial_state_room_batched impl
* Okay, _actually_ fix impl
* Update description.
* Update synapse/storage/databases/main/room.py
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Run black.
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: David Robertson <davidr@element.io>
On startup, the `_device_list_id_gen` stream id generator is initialized
using the maximum stream id seen in a list of tables. When we started
populating the `device_list_remote_pending` table in #13913, we forgot
to add it to the aforementioned list of tables, so the stream id
generator can hand out old stream ids after a restart. The end result is
that Synapse can fail to handle device list update EDUs after a restart
when a partial state join is in progress.
Add the `device_list_remote_pending` table to the list of tables to
consider when initializing the `_device_list_id_gen` stream id generator.
Signed-off-by: Sean Quah <seanq@matrix.org>
Destination was being used incorrectly (a single destination instead
of a list of destinations was being passed).
This also updates some of the types in the area to not use Collection[str],
which is a footgun.
* Bump the client-side timeout for /state
to allow faster joins resyncs the chance to complete for large rooms.
We have seen this fair poorly (~90s for Matrix HQ's /state) in testing,
causing the resync to advance to another HS who hasn't seen our join yet.
* Changelog
* Milliseconds!!!!
#13873 introduced a regression which causes sqlite database migrations
to no longer run inside a transaction. Wrap them in a transaction again,
to avoid database corruption when migrations are interrupted.
Fixes#14909.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Request partial joins by default
This is a little sloppy, but we are trying to gain confidence in faster
joins in the upcoming RC.
Admins can still opt out by adding the following to their Synapse
config:
```yaml
experimental:
faster_joins: false
```
We may revert this change before the release proper, depending on how
testing in the wild goes.
* Changelog
* Try to fix the backfill test failures
* Upgrade notes
* Postgres compat?
* Allow `AbstractSet` in `StrCollection`
Or else frozensets are excluded. This will be useful in an upcoming
commit where I plan to change a function that accepts `List[str]` to
accept `StrCollection` instead.
* `rooms_to_exclude` -> `rooms_to_exclude_globally`
I am about to make use of this exclusion mechanism to exclude rooms for
a specific user and a specific sync. This rename helps to clarify the
distinction between the global config and the rooms to exclude for a
specific sync.
* Better function names for internal sync methods
* Track a list of excluded rooms on SyncResultBuilder
I plan to feed a list of partially stated rooms for this sync to ignore
* Exclude partial state rooms during eager sync
using the mechanism established in the previous commit
* Track un-partial-state stream in sync tokens
So that we can work out which rooms have become fully-stated during a
given sync period.
* Fix mutation of `@cached` return value
This was fouling up a complement test added alongside this PR.
Excluding a room would mean the set of forgotten rooms in the cache
would be extended. This means that room could be erroneously considered
forgotten in the future.
Introduced in #12310, Synapse 1.57.0. I don't think this had any
user-visible side effects (until now).
* SyncResultBuilder: track rooms to force as newly joined
Similar plan as before. We've omitted rooms from certain sync responses;
now we establish the mechanism to reintroduce them into future syncs.
* Read new field, to present rooms as newly joined
* Force un-partial-stated rooms to be newly-joined
for eager incremental syncs only, provided they're still fully stated
* Notify user stream listeners to wake up long polling syncs
* Changelog
* Typo fix
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Unnecessary list cast
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Rephrase comment
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Another comment
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Fixup merge(?)
* Poke notifier when receiving un-partial-stated msg over replication
* Fixup merge whoops
Thanks MV :)
Co-authored-by: Mathieu Velen <mathieuv@matrix.org>
Co-authored-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Faster joins: Update room stats and user directory on workers when done
When finishing a partial state join to a room, we update the current
state of the room without persisting additional events. Workers receive
notice of the current state update over replication, but neglect to wake
the room stats and user directory updaters, which then get incidentally
triggered the next time an event is persisted or an unrelated event
persister sends out a stream position update.
We wake the room stats and user directory updaters at the appropriate
time in this commit.
Part of #12814 and #12815.
Signed-off-by: Sean Quah <seanq@matrix.org>
* fixup comment
Signed-off-by: Sean Quah <seanq@matrix.org>
* Enable Complement tests for Faster Remote Room Joins on worker-mode
* (dangerous) Add an override to allow Complement to use FRRJ under workers
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Fix race where we didn't send out replication notification
* MORE HACKS
* Fix get_un_partial_stated_rooms_token to take instance_name
* Fix bad merge
* Remove warning
* Correctly advance un_partial_stated_room_stream
* Fix merge
* Add another notify_replication
* Fixups
* Create a separate ReplicationNotifier
* Fix test
* Fix portdb
* Create a separate ReplicationNotifier
* Fix test
* Fix portdb
* Fix presence test
* Newsfile
* Apply suggestions from code review
* Update changelog.d/14752.misc
Co-authored-by: Erik Johnston <erik@matrix.org>
* lint
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Erik Johnston <erik@matrix.org>
* Avoid clearing out forward extremities when doing a second remote join
When joining a restricted room where the local homeserver does not have
a user able to issue invites, we perform a second remote join. We want
to avoid clearing out forward extremities in this case because the
forward extremities we have are up to date and clearing out forward
extremities creates a window in which the room can get bricked if
Synapse crashes.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Do a full join when doing a second remote join into a full state room
We cannot persist a partial state join event into a joined full state
room, so we perform a full state join for such rooms instead. As a
future optimization, we could always perform a partial state join and
compute or retrieve the full state ourselves if necessary.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Add lock around partial state flag for rooms
Signed-off-by: Sean Quah <seanq@matrix.org>
* Preserve partial state info when doing a second partial state join
Signed-off-by: Sean Quah <seanq@matrix.org>
* Add newsfile
* Add a TODO(faster_joins) marker
Signed-off-by: Sean Quah <seanq@matrix.org>
Now that we wait for stream positions whenever we do a HTTP replication
hit, we need to be less brutal in the case where we do timeout (as we
have bugs around this).
Currently, we will try to start a new partial state sync every time we
perform a remote join, which is undesirable if there is already one
running for a given room.
We intend to perform remote joins whenever additional local users wish
to join a partial state room, so let's ensure that we do not start more
than one concurrent partial state sync for any given room.
------------------------------------------------------------------------
There is a race condition where the homeserver leaves a room and later
rejoins while the partial state sync from the previous membership is
still running. There is no guarantee that the previous partial state
sync will process the latest join, so we restart it if needed.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Change Documentation to have v10 as default room version
* Change Default Room version to 10
* Add changelog entry for default room version swap
* Add changelog entry for v10 default room version in docs
* Clarify doc changelog entry
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Improve Documentation changes.
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Update Changelog entry to have correct format
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Update Spec Version to 1.5
* Only need 1 changelog.
* Fix test.
* Update "Changed in" line
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <patrickc@matrix.org>
Serving partial join responses is no longer experimental. They will only be served under the stable identifier if the the undocumented config flag experimental.msc3706_enabled is set to true.
Synapse continues to request a partial join only if the undocumented config flag experimental.faster_joins is set to true; this setting remains present and unaffected.
We were incorrectly checking if the *local* token had been advanced, rather than the token for the remote instance.
In practice, I don't think this has caused any bugs due to where we use `wait_for_stream_position`, as critically we don't use it on instances that also write to the given streams (and so the local token will lag behind all remote tokens).
When the local homeserver is already joined to a room and wants to
perform another remote join, we may find it useful to do a non-partial
state join if we already have the full state for the room.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Also use stable name in SendJoinResponse struct
follow-up to #14832
* Changelog
* Fix a rename I missed
* Run black
* Update synapse/federation/federation_client.py
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Use new query param when requesting a partial join
* Read new query param when serving partial join
* Provide new field names when serving partial joins
* Read new field names from partial join response
* Changelog
When there are many synchronous requests waiting on a
`_PerHostRatelimiter`, each request will be started recursively just
after the previous request has completed. Under the right conditions,
this leads to stack exhaustion.
A common way for requests to become synchronous is when the remote
client disconnects early, because the homeserver is overloaded and slow
to respond.
Avoid stack exhaustion under these conditions by deferring subsequent
requests until the next reactor tick.
Fixes#14480.
Signed-off-by: Sean Quah <seanq@matrix.org>
Two parts to this:
* Bundle the whole of the replacement with any edited events. This is backwards-compatible so I haven't put it behind a flag.
* Optionally, inhibit server-side replacement of edited events. This has scope to break things, so it is currently disabled by default.
It doesn't seem valid that HTML entities should appear in
the title field of oEmbed responses, but a popular WordPress
plug-in seems to do it.
There should not be harm in unescaping these.
This has two related changes:
* It enables fast-path processing for an empty filter (`[]`) which was
previously only used for wildcard not-filters (`["*"]`).
* It special cases a `/sync` filter with no-rooms to skip all room
processing, previously we would partially skip processing, but would
generally still calculate intermediate values for each room which were
then unused.
Future changes might consider further optimizations:
* Skip calculating per-room account data when all rooms are filtered (currently
this is thrown away).
* Make similar improvements to other endpoints which support filters.
* Fixes#12277 :Disable sending confirmation email when 3pid is disabled
* Fix test_add_email_if_disabled test case to reflect changes to enable_3pid_changes flag
* Add changelog file
* Rename newsfragment.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
PKCE can protect against certain attacks and is enabled by default. Support
can be controlled manually by setting the pkce_method of each oidc_providers
entry to 'auto' (default), 'always', or 'never'.
This is required by Twitter OAuth 2.0 support.
OpenID specifies the format of the user info endpoint and some
OAuth 2.0 IdPs do not follow it, e.g. NextCloud and Twitter.
This adds subject_template and picture_template options to the
default mapping provider for more flexibility in matching those user
info responses.
This creates a new store method, `process_replication_position` that
is called after `process_replication_rows`. By moving stream ID advances
here this guarantees any relevant cache invalidations will have been
applied before the stream is advanced.
This avoids race conditions where Python switches between threads mid
way through processing the `process_replication_rows` method where stream
IDs may be advanced before caches are invalidated due to class resolution
ordering.
See this comment/issue for further discussion:
https://github.com/matrix-org/synapse/issues/14158#issuecomment-1344048703
Then adapts calling code to retry when needed so it doesn't 500
to clients.
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
if a Synapse deployment upgraded (from < 1.62.0 to >= 1.70.0) then it
is possible for schema deltas to run before background updates causing
drift in the database schema due to:
1. A delta registered a background update to create an index.
2. A delta dropped the above index if it exists (but it yet exist won't since
the background job hasn't run).
3. The code assumed the index was dropped.
To fix this we:
1. Cancel the background update which could create the index.
2. Drop the index again.
3. Drop a related index which is dropped by the background update.
This avoids pulling additional state information (and events) from
the database for each item returned in the hierarchy response.
The room type might be out of date until a background update finishes
running, the worst impact of this would be spaces being treated as rooms
in the hierarchy response. This should self-heal once the background
update finishes.
* Declare new config
* Parse new config
* Read new config
* Don't use trial/our TestCase where it's not needed
Before:
```
$ time trial tests/events/test_utils.py > /dev/null
real 0m2.277s
user 0m2.186s
sys 0m0.083s
```
After:
```
$ time trial tests/events/test_utils.py > /dev/null
real 0m0.566s
user 0m0.508s
sys 0m0.056s
```
* Helper to upsert to event fields
without exceeding size limits.
* Use helper when adding invite/knock state
Now that we allow admins to include events in prejoin room state with
arbitrary state keys, be a good Matrix citizen and ensure they don't
accidentally create an oversized event.
* Changelog
* Move StateFilter tests
should have done this in #14668
* Add extra methods to StateFilter
* Use StateFilter
* Ensure test file enforces typed defs; alphabetise
* Workaround surprising get_current_state_ids
* Whoops, fix mypy
* Enable `--warn-redundant-casts` option in mypy
Doesn't do much but helps me sleep better at night.
* Changelog
* Fix name of the ignore
* Fix one more missed cast
Not sure why I didn't see this one locally, maybe I needed a poetry update
* Remove old comment
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
#11915 introduced the `@cached` `is_interested_in_room` method in
Synapse 1.55.0, which depends upon `get_aliases_for_room`. Add a missing
cache invalidation callback so that the `is_interested_in_room` cache is
invalidated when `get_aliases_for_room` is invalidated.
#13787 made `get_rooms_for_user` `@cached`. Add a missing cache
invalidation callback so that the `is_interested_in_presence` cache is
invalidated when `get_rooms_for_user` is invalidated.
Signed-off-by: Sean Quah <seanq@matrix.org>
Fixes#13655
This change uses ICU (International Components for Unicode) to improve boundary detection in user search.
This change also adds a new dependency on libicu-dev and pkg-config for the Debian packages, which are available in all supported distros.
When Synapse is terminated while running the background update to create
the `receipts_graph` or `receipts_linearized` indexes, the indexes may
be successfully created (or marked as invalid on postgres) while the
background update remains unfinished. When Synapse next starts up, the
background update will fail because the index already exists, or exists
but is invalid on postgres.
Use the existing code to create indices in background updates, since it
handles these edge cases.
Signed-off-by: Sean Quah <seanq@matrix.org>
This should help reduce the number of devices e.g. simple bots the repeatedly login rack up.
We only delete non-e2e devices as they should be safe to delete, whereas if we delete e2e devices for a user we may accidentally break their ability to receive e2e keys for a message.
This PR changes http-based image URLs to be https in html templates.
This impacts the Synapse SSO error page, where browsers report mixed
media content warnings.
Also, https://matrix.org/img/vector-logo-email.png is currently broken
but the URL has been updated to be https anyway.
Signed-off-by: Ashish Kumar <ashfame@users.noreply.github.com>
Due to the various fixes to the StreamChangeCache it is not
safe to trust the information in the user directory or room/user
stats tables. Rebuild them as background jobs.
In particular see da77720752 (#14639),
and 6a8310f3df (#14435).
Maybe also be related to fac8a38525
(#14592).
An empty cache does not mean the entity has no changed, if
it is earlier than the earliest known stream position return that
the entity *has* changed since the cache cannot accurately
answer that query.
A batch of changes intended to make it easier to trace to-device messages through the system.
The intention here is that a client can set a property org.matrix.msgid in any to-device message it sends. That ID is then included in any tracing or logging related to the message. (Suggestions as to where this field should be documented welcome. I'm not enthusiastic about speccing it - it's very much an optional extra to help with debugging.)
I've also generally improved the data we send to opentracing for these messages.
The internal methods of the StreamChangeCache were inconsistently
treating the earliest known stream position as valid. It is now treated as
invalid, meaning the cache cannot determine if an entity at the earliest
known stream position has changed or not.
Add logic to ClientRestResource to decide whether to mount servlets
or not based on whether the current process is a worker.
This is clearer to see what a worker runs than the completely separate /
copy & pasted list of servlets being mounted for workers.
StreamChangeCache.get_all_changed_entities can return None to signify
it does not have information at the given stream position. Two callers (related
to device lists and presence) were treating this response the same as an empty
list (i.e. there being no updates).
This should help reduce the number of devices e.g. simple bots the repeatedly login rack up.
We only delete non-e2e devices as they should be safe to delete, whereas if we delete e2e devices for a user we may accidentally break their ability to receive e2e keys for a message.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Support MSC1767's `content.body` behaviour in push rules
* Add the base rules from MSC3933
* Changelog entry
* Flip condition around for finding `m.markup`
* Remove forgotten import
* Add support for MSC3931: Room Version Supports push rule condition
* Create experimental flag for future work, and use it to gate MSC3931
* Changelog entry
* Use `device_one_time_keys_count` to match MSC3202
Rename the `device_one_time_key_counts` key in responses to
`device_one_time_keys_count` to match the name specified by MSC3202.
Also change related variable/class names for consistency.
Signed-off-by: Andrew Ferrazzutti <andrewf@element.io>
* Update changelog.d/14565.misc
* Revert name change for `one_time_key_counts` key
as this is a different key altogether from `device_one_time_keys_count`,
which is used for `/sync` instead of appservice transactions.
Signed-off-by: Andrew Ferrazzutti <andrewf@element.io>
`setup()` is run under the sentinel context manager, so we wrap the
initial update in a background process. Before this change, Synapse
would log two warnings on startup:
Starting db txn 'count_daily_users' from sentinel context
Starting db connection from sentinel context: metrics will be lost
Signed-off-by: Sean Quah <seanq@matrix.org>
Include the thread_id field when sending read receipts over
federation. This might result in the same user having multiple
read receipts per-room, meaning multiple EDUs must be sent
to encapsulate those receipts.
This restructures the PerDestinationQueue APIs to support
multiple receipt EDUs, queue_read_receipt now becomes linear
time in the number of queued threaded receipts in the room for
the given user, it is expected this is a small number since receipt
EDUs are sent as filler in transactions.
To perform an emulated upsert into a table safely, we must either:
* lock the table,
* be the only writer upserting into the table
* or rely on another unique index being present.
When the 2nd or 3rd cases were applicable, we previously avoided locking
the table as an optimization. However, as seen in #14406, it is easy to
slip up when adding new schema deltas and corrupt the database.
The only time we lock when performing emulated upserts is while waiting
for background updates on postgres. On sqlite, we do no locking at all.
Let's remove the option to skip locking tables, so that we don't shoot
ourselves in the foot again.
Signed-off-by: Sean Quah <seanq@matrix.org>
This commit adds support for handling a provided avatar picture URL
when logging in via SSO.
Signed-off-by: Ashish Kumar <ashfame@users.noreply.github.com>
Fixes#9357.
This was the last untyped handler from the HomeServer object. Since
it was being treated as Any (and thus unchecked) it was being used
incorrectly in a few places.
When a local device list change is added to
`device_lists_changes_in_room`, the `converted_to_destinations` flag is
set to `FALSE` and the `_handle_new_device_update_async` background
process is started. This background process looks for unconverted rows
in `device_lists_changes_in_room`, copies them to
`device_lists_outbound_pokes` and updates the flag.
To update the `converted_to_destinations` flag, the database performs a
`DELETE` and `INSERT` internally, which fragments the table. To avoid
this, track unconverted rows using a `(stream ID, room ID)` position
instead of the flag.
From now on, the `converted_to_destinations` column indicates rows that
need converting to outbound pokes, but does not indicate whether the
conversion has already taken place.
Closes#14037.
Signed-off-by: Sean Quah <seanq@matrix.org>
Avoid an n+1 query problem and fetch the bundled aggregations for
m.reference relations in a single query instead of a query per event.
This applies similar logic for as was previously done for edits in
8b309adb43 (#11660; threads
in b65acead42 (#11752); and
annotations in 1799a54a54 (#14491).
Avoid an n+1 query problem and fetch the bundled aggregations for
m.annotation relations in a single query instead of a query per event.
This applies similar logic for as was previously done for edits in
8b309adb43 (#11660) and threads
in b65acead42 (#11752).
* Add tests for StreamIdGenerator
* Drive-by: annotate all defs
* Revert "Revert "Remove slaved id tracker (#14376)" (#14463)"
This reverts commit d63814fd73, which in
turn reverted 36097e88c4. This restores
the latter.
* Fix StreamIdGenerator not handling unpersisted IDs
Spotted by @erikjohnston.
Closes#14456.
* Changelog
Co-authored-by: Nick Mills-Barrett <nick@fizzadar.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.
Also adds some missing type hints which were simple to fill in.
As part of the database migration to support threaded receipts, there is
a possible window in between
`73/08thread_receipts_non_null.sql.postgres` removing the original
unique constraints on `receipts_linearized` and `receipts_graph` and the
`reeipts_linearized_unique_index` and `receipts_graph_unique_index`
background updates from `72/08thread_receipts.sql` completing where
the unique constraints on `receipts_linearized` and `receipts_graph` are
missing. Any emulated upserts on these tables must therefore be
performed with a lock held, otherwise duplicate rows can end up in the
tables when there are concurrent emulated upserts. Fix the missing lock.
Note that emulated upserts no longer happen by default on sqlite, since
the minimum supported version of sqlite supports native upserts by
default now.
Finally, clean up any duplicate receipts that may have crept in before
trying to create the `receipts_graph_unique_index` and
`receipts_linearized_unique_index` unique indexes.
Signed-off-by: Sean Quah <seanq@matrix.org>
We don't filter state usually, so doing so here is a waste of time. This is not much of an issue for clients that enable lazy loading of members, since there will be fewer state events.
This matches the multi instance writer ID generator class which can
both handle advancing the current token over replication and by calling
the database.
This code was factored out to a method, but also left in-place.
Calling this twice in a row makes no sense: the first call will reduce
the size appropriately, but the loop will immediately exit since the
cache size was already reduced.
PostgreSQL may underestimate the number of distinct `room_id`s in
`event_search`, which can cause it to use table scans for queries for
multiple rooms.
Fix this by setting `n_distinct` on the column.
Resolves#14402.
Signed-off-by: Sean Quah <seanq@matrix.org>
When this background update did its last batch, it would try to update all the
events that had been inserted since the bgupdate started, which could cause a
table-scan. Make sure we limit the update correctly.
For forward compatibility, Synapse needs to ignore fields it does not
recognise instead of raising an error.
Fixes#14365.
Signed-off-by: Sean Quah <seanq@matrix.org>
==============================
Please note that, as announced in the release notes for Synapse 1.69.0, legacy Prometheus metric names are now disabled by default.
They will be removed altogether in Synapse 1.73.0.
If not already done, server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names.
See the [upgrade notes](https://matrix-org.github.io/synapse/v1.71/upgrade.html#upgrading-to-v1710) for more details.
Improved Documentation
----------------------
- Document the changes to monthly active user metrics due to deprecation of legacy Prometheus metric names. ([\#14358](https://github.com/matrix-org/synapse/issues/14358), [\#14360](https://github.com/matrix-org/synapse/issues/14360))
Deprecations and Removals
-------------------------
- Disable legacy Prometheus metric names by default. They can still be re-enabled for now, but they will be removed altogether in Synapse 1.73.0. ([\#14353](https://github.com/matrix-org/synapse/issues/14353))
Internal Changes
----------------
- Run unit tests against Python 3.11. ([\#13812](https://github.com/matrix-org/synapse/issues/13812))
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEWMTnW8Z8khaaf90R+84KzgcyGG8FAmNlD5YACgkQ+84Kzgcy
GG8T0Qv+PkTt9+lUbDAsqWnqWtSYCit4DEZU/g0G+/xUVdZypo+pIHYzzEHEEPoB
Siinh5sLqUagoOn0i+YU+D6WZw+20F5+qdQi6QSkpPDorReYBl+4MvyRNqjn3lAE
a/bfyHY3+88SfVMza+GRQS7pU06CCMD5aqUZABnYwV5fpKWdF2hOESSuzqEElQDR
7zaZuc7I/8d/VNz79FmyDfGzYnvWS+tr0BcNIyye6OIzZ2AS+q8loBzQiJ5rd1kY
ZiGeAHfeu/r05X25kzD1E1qr04ycM7mFjDgBa/aSdf5kkimNNUISo424sawoCVOU
oDm+QVefH4zKK9b+DVAJ7eaHIaQsRjJir+h/umXrX09I/x/eMB7Zll1wy1T+KzUS
Zf4GCVCI1kKY3FzcPWh2Jf1j899JtbkrKasA+UYf3+PEeFxQ4LyjYlVHC+J1sCXL
NGn5YDZkvhfpTOR300eDisUsxKIBFPxL6FjzW5whOnBddKC+VVnlSQkAbn6yHQGE
LU1xKhta
=m6ze
-----END PGP SIGNATURE-----
Merge tag 'v1.71.0rc2' into develop
Synapse 1.71.0rc2 (2022-11-04)
==============================
Please note that, as announced in the release notes for Synapse 1.69.0, legacy Prometheus metric names are now disabled by default.
They will be removed altogether in Synapse 1.73.0.
If not already done, server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names.
See the [upgrade notes](https://matrix-org.github.io/synapse/v1.71/upgrade.html#upgrading-to-v1710) for more details.
Improved Documentation
----------------------
- Document the changes to monthly active user metrics due to deprecation of legacy Prometheus metric names. ([\#14358](https://github.com/matrix-org/synapse/issues/14358), [\#14360](https://github.com/matrix-org/synapse/issues/14360))
Deprecations and Removals
-------------------------
- Disable legacy Prometheus metric names by default. They can still be re-enabled for now, but they will be removed altogether in Synapse 1.73.0. ([\#14353](https://github.com/matrix-org/synapse/issues/14353))
Internal Changes
----------------
- Run unit tests against Python 3.11. ([\#13812](https://github.com/matrix-org/synapse/issues/13812))
If configured an OIDC IdP can log a user's session out of
Synapse when they log out of the identity provider.
The IdP sends a request directly to Synapse (and must be
configured with an endpoint) when a user logs out.
* Introduce a test for the old behaviour which we want to restore
* Reintroduce the old behaviour in a simpler way
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Use 1 credit instead of 2 for creating a room: be more lenient than before
Notably, the UI in Element Web was still broken after restoring to prior behaviour.
After discussion, we agreed that it would be sensible to increase the limit.
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
PostgreSQL 14 changed the behavior of `websearch_to_tsquery` to
improve some behaviour.
The tests were hitting those edge-cases about handling of hanging double
quotes. This fixes the tests to take into account the PostgreSQL version.
* Add workers settings to configuration manual
* Update `pusher_instances`
* update url to python logger
* update headlines
* update links after headline change
* remove link from `daemon process`
There is no docs in Synapse for this
* extend example for `federation_sender_instances` and `pusher_instances`
* more infos about stream writers
* add link to DAG
* update `pusher_instances`
* update `worker_listeners`
* update `stream_writers`
* Update `worker_name`
Co-authored-by: David Robertson <davidr@element.io>
1. `federation_client.timestamp_to_event(...)` now handles all `destination` looping and uses our generic `_try_destination_list(...)` helper.
2. Consistently handling `NotRetryingDestination` and `FederationDeniedError` across `get_pdu` , backfill, and the generic `_try_destination_list` which is used for many places we use this pattern.
3. `get_pdu(...)` now returns `PulledPduInfo` so we know which `destination` we ended up pulling the PDU from
Fixes check_avatar_size_and_mime_type() to successfully update avatars on homeservers running on non-default ports which it would mistakenly treat as remote homeserver while validating the avatar's size and mime type.
Signed-off-by: Ashish Kumar ashfame@users.noreply.github.com
Support a unified search query syntax which leverages more of the full-text
search of each database supported by Synapse.
Supports, with the same syntax across Postgresql 11+ and Sqlite:
- quoted "search terms"
- `AND`, `OR`, `-` (negation) operators
- Matching words based on their stem, e.g. searches for "dog" matches
documents containing "dogs".
This is achieved by
- If on postgresql 11+, pass the user input to `websearch_to_tsquery`
- If on sqlite, manually parse the query and transform it into the sqlite-specific
query syntax.
Note that postgresql 10, which is close to end-of-life, falls back to using
`phraseto_tsquery`, which only supports a subset of the features.
Multiple terms separated by a space are implicitly ANDed.
Note that:
1. There is no escaping of full-text syntax that might be supported by the database;
e.g. `NOT`, `NEAR`, `*` in sqlite. This runs the risk that people might discover this
as accidental functionality and depend on something we don't guarantee.
2. English text is assumed for stemming. To support other languages, either the target
language needs to be known at the time of indexing the message (via room metadata,
or otherwise), or a separate index for each language supported could be created.
Sqlite docs: https://www.sqlite.org/fts3.html#full_text_index_queries
Postgres docs: https://www.postgresql.org/docs/11/textsearch-controls.html
This implements a fake OIDC server, which intercepts calls to the HTTP client.
Improves accuracy of tests by covering more internal methods.
One particular example was the ID token validation, which previously mocked.
This uncovered an incorrect dependency: Synapse actually requires at least
authlib 0.15.1, not 0.14.0.
* Return NOT_JSON if decode fails and defer set_timeline_upper_limit call until after check_valid_filter. Fixes#13661. Signed-off-by: Ryan Miguel <miguel.ryanj@gmail.com>.
* Reword changelog
Use a base template to create a cohesive feel across the HTML
templates provided by Synapse.
Adds basic styling to the base template for a more user-friendly
look and feel.
When the last event in a thread is redacted we need to update
the threads table:
* Find the new latest event in the thread and store it into the table; or
* Remove the thread from the table if it is no longer a thread (i.e. all
events in the thread were redacted).
* Show erasure status when listing users in the Admin API
* Use USING when joining erased_users
* Add changelog entry
* Revert "Use USING when joining erased_users"
This reverts commit 30bd2bf106415caadcfdbdd1b234ef2b106cc394.
* Make the erased check work on postgres
* Add a testcase for showing erased user status
* Appease the style linter
* Explicitly convert `erased` to bool to make SQLite consistent with Postgres
This also adds us an easy way in to fix the other accidentally integered columns.
* Move erasure status test to UsersListTestCase
* Include user erased status when fetching user info via the admin API
* Document the erase status in user_admin_api
* Appease the linter and mypy
* Signpost comments in tests
Co-authored-by: Tadeusz Sośnierz <tadeusz@sosnierz.com>
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
Fix MSC3030 `/timestamp_to_event` endpoint returning `outliers` that it has no idea whether are near a gap or not (and therefore unable to determine whether it's actually the closest event). The reason Synapse doesn't know whether an `outlier` is next to a gap is because our gap checks rely on entries in the `event_edges`, `event_forward_extremeties`, and `event_backward_extremities` tables which is [not the case for `outliers`](2c63cdcc3f/docs/development/room-dag-concepts.md (outliers)).
Also fixes MSC3030 Complement `can_paginate_after_getting_remote_event_from_timestamp_to_event_endpoint` test flake. Although this acted flakey in Complement, if `sync_partial_state` raced and beat us before `/timestamp_to_event`, then even if we retried the failing `/context` request it wouldn't work until we made this Synapse change. With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.
Fix https://github.com/matrix-org/synapse/issues/13944
### Why did this fail before? Why was it flakey?
Sleuthing the server logs on the [CI failure](https://github.com/matrix-org/synapse/actions/runs/3149623842/jobs/5121449357#step:5:5805), it looks like `hs2:/timestamp_to_event` found `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` event locally. Then when we went and asked for it via `/context`, since it's an `outlier`, it was filtered out of the results -> `You don't have permission to access that event.`
This is reproducible when `sync_partial_state` races and persists `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` before we evaluate `get_event_for_timestamp(...)`. To consistently reproduce locally, just add a delay at the [start of `get_event_for_timestamp(...)`](cb20b885cb/synapse/handlers/room.py (L1470-L1496)) so it always runs after `sync_partial_state` completes.
```py
from twisted.internet import task as twisted_task
d = twisted_task.deferLater(self.hs.get_reactor(), 3.5)
await d
```
In a run where it passes, on `hs2`, `get_event_for_timestamp(...)` finds a different event locally which is next to a gap and we request from a closer one from `hs1` which gets backfilled. And since the backfilled event is not an `outlier`, it's returned as expected during `/context`.
With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.
* Fix `track_memory_usage` on poetry-core 1.3.x installations
The same kind of problem as discussed in #14085:
1. we defined an extra with an underscore
2. we look it up at runtime with an underscore
3. but poetry-core 1.3.x. installs it with a dash, causing (2) to fail.
Fix by using a dash everywhere.
* Changelog
Spawned while investigating https://github.com/matrix-org/synapse/issues/13944
This way we might get some more context whenever an `403 Forbidden - body: {"errcode":"M_FORBIDDEN","error":"You don't have permission to access that event."}` error is produced.
`log_config.yaml`
```yaml
loggers:
synapse:
level: INFO
synapse.visibility:
level: DEBUG
```
This should fix a race where the event notification comes in over
replication before the state replication, leaving a window during
which a sync may get an incorrect list of rooms for the user.
While https://github.com/matrix-org/synapse/pull/13635 stops us from doing the slow thing after we've already done it once, this PR stops us from doing one of the slow things in the first place.
Related to
- https://github.com/matrix-org/synapse/issues/13622
- https://github.com/matrix-org/synapse/pull/13635
- https://github.com/matrix-org/synapse/issues/13676
Part of https://github.com/matrix-org/synapse/issues/13356
Follow-up to https://github.com/matrix-org/synapse/pull/13815 which tracks event signature failures.
With this PR, we avoid the call to the costly `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before and we filter events based on the backoff before calling `_get_state_ids_after_missing_prev_event` now.
For example, this will save us 156s out of the 185s total that this `matrix.org` `/messages` request. If you want to see the full Jaeger trace of this, you can drag and drop this `trace.json` into your own Jaeger, https://gist.github.com/MadLittleMods/4b12d0d0afe88c2f65ffcc907306b761
To explain this exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid.
- `backfill`
- `outgoing-federation-request` `/backfill`
- `_check_sigs_and_hash_and_fetch`
- `_check_sigs_and_hash_and_fetch_one` for each event received over backfill
- ❗ `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
- ❗ `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
- `_process_pulled_events`
- `_process_pulled_event` for each validated event
- ❗ Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it
- `_get_state_ids_after_missing_prev_event`
- `outgoing-federation-request` `/state_ids`
- ❗ `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again
- ❗ `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check
The root node of a thread (and events related to it) are considered
"part of a thread" when validating receipts. This allows clients which
show the root node in both the main timeline and the threaded timeline
to easily send receipts in either.
Note that threaded notifications are not created for these events, these
events created notifications on the main timeline.
The callers either set a default limit or manually handle a None-limit
later on (by setting a default value).
Update the callers to always instantiate PaginationConfig with a default
limit and then assume the limit is non-None.
Stabilize the threads API (MSC3856) by supporting (only) the v1
path for the endpoint.
This also marks the API as safe for workers since it is a read-only
API.