Tracks presence on an individual per-device basis and combine
the per-device state into a per-user state. This should help in
situations where a user has multiple devices with conflicting status
(e.g. one is syncing with unavailable and one is syncing with online).
The tie-breaking is done by priority:
BUSY > ONLINE > UNAVAILABLE > OFFLINE
* Fix rare bug that broke looping calls
We can't interact with the reactor from the main thread via looping
call.
Introduced in v1.90.0 / #15791.
* Newsfile
Refactoring to pass the device ID (in addition to the user ID) through
the presence handler (specifically the `user_syncing`, `set_state`,
and `bump_presence_active_time` methods and their replication
versions).
Simplify some of the presence code by reducing duplicated code between
worker & non-worker modes.
The main change is to push some of the logic from `user_syncing` into
`set_state`. This is done by passing whether the user is setting the presence
via a `/sync` with a new `is_sync` flag to `set_state`. If this is `true` some
additional logic is performed:
* Don't override `busy` presence.
* Update the `last_user_sync_ts`.
* Never update the status message.
* Properly update retry_last_ts when hitting the maximum retry interval
This was broken in 1.87 when the maximum retry interval got changed from
almost infinite to a week (and made configurable).
fixes#16101
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
* Add changelog
* Change fix + add test
* Add comment
---------
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
Co-authored-by: Mathieu Velten <mathieuv@matrix.org>
This should only be called on HomeServer objects which are configured
to run background tasks, which is automatically (and properly) done via
the call to setup().
If we don't have all the auth events in a room then not all state events will have a chain cover index. Even so, we can still use the chain cover index on the events that do have it, rather than bailing and using the slower functions.
This situation should not arise for newly persisted rooms, as we check we have the full auth chain for each event, but can happen for existing rooms.
c.f. #15245
We were seeing serialization errors when taking out multiple read locks.
The transactions were retried, so isn't causing any failures.
Introduced in #15782.
Adds three new configuration variables:
* destination_min_retry_interval is identical to before (10mn).
* destination_retry_multiplier is now 2 instead of 5, the maximum value will
be reached slower.
* destination_max_retry_interval is one day instead of (essentially) infinity.
Capping this will cause destinations to continue to be retried sometimes instead
of being lost forever. The previous value was 2 ^ 62 milliseconds.
The location of the redacts field changes in room version 11. Ensure
it is copied to the *new* location for *old* room versions for
forwards-compatibility with clients.
Note that copying it to the *old* location for the *new* room version
was previously handled.
* Updates the rule ID.
* Use `event_property_is` instead of `event_match`.
This updates the implementation of MSC3958 to match the latest
text from the MSC.
Signed-off-by: Nicolas Werner <n.werner@famedly.com>
Co-authored-by: Nicolas Werner <n.werner@famedly.com>
Co-authored-by: Nicolas Werner <89468146+nico-famedly@users.noreply.github.com>
Co-authored-by: Hubert Chathi <hubert@uhoreg.ca>
And fix a bug in the implementation of the updated redaction
format (MSC2174) where the top-level redacts field was not
properly added for backwards-compatibility.
Allow configuring the set of workers to proxy outbound federation traffic through (`outbound_federation_restricted_to`).
This is useful when you have a worker setup with `federation_sender` instances responsible for sending outbound federation requests and want to make sure *all* outbound federation traffic goes through those instances. Before this change, the generic workers would still contact federation themselves for things like profile lookups, backfill, etc. This PR allows you to set more strict access controls/firewall for all workers and only allow the `federation_sender`'s to contact the outside world.
Unix socket support for `federation` and `client` Listeners has existed now for a little while(since [1.81.0](https://github.com/matrix-org/synapse/pull/15353)), but there was one last hold out before it could be complete: HTTP Replication communication. This should finish it up. The Listeners would have always worked, but would have had no way to be talked to/at.
---------
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
Co-authored-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
Allow configuring the set of workers to proxy outbound federation traffic through (`outbound_federation_restricted_to`).
This is useful when you have a worker setup with `federation_sender` instances responsible for sending outbound federation requests and want to make sure *all* outbound federation traffic goes through those instances. Before this change, the generic workers would still contact federation themselves for things like profile lookups, backfill, etc. This PR allows you to set more strict access controls/firewall for all workers and only allow the `federation_sender`'s to contact the outside world.
The original code is from @erikjohnston's branches which I've gotten in-shape to merge.
If you leave a room and forget it, then rejoin it, the room would be
missing from the next initial sync.
fixes#13262
Signed-off-by: Nicolas Werner <n.werner@famedly.com>
* Check required power levels earlier in createRoom handler.
- If a server was configured to reject the creation of rooms with E2EE
enabled (by specifying an unattainably high power level for
"m.room.encryption" in default_power_level_content_override), the 403
error was not being triggered until after the room was created and
before the "m.room.power_levels" was sent. This allowed a user to
access the partially-configured room and complete the setup of E2EE
and power levels manually.
- This change causes the power level overrides to be checked earlier and
the request to be rejected before the user gains access to the room.
- A new `_validate_room_config` method is added to contain checks that
should be run before a room is created.
- The new test case confirms that a user request is rejected by the new
validation method.
Signed-off-by: Grant McLean <grant@catalyst.net.nz>
* Add a changelog file.
* Formatting fix for black.
* Remove unneeded line from test.
---------
Signed-off-by: Grant McLean <grant@catalyst.net.nz>
See https://github.com/matrix-org/synapse/pull/14095#discussion_r990335492
This is useful because when see that a relevant event is an `outlier` or `soft-failed`, then that's a good unexpected indicator explaining why it's not showing up. `filter_events_for_client` is used in `/sync`, `/messages`, `/context` which are all common end-to-end assertion touch points (also notifications, relations).
Implements stable support for MSC3882; this involves updating Synapse's support to
match the MSC / the spec says.
Continue to support the unstable version to allow clients to transition.
The stubs have some issues so this has some generous cast
and ignores in it, but it is better than not having stubs.
Note that confusing that Element is a function which creates
_Element instances (and similarly for Comment).
* Fix#15669: always populate instance map even if it was empty
* Fix some tests
* Fix more tests
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* CI fix: don't forget to update apt repository sources before installing olddeps deps
* Add test testing the backwards compatibility
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
The cached decorators always return a Deferred, which was not
properly propagated. It was close enough when wrapping coroutines,
but failed if a bare function was wrapped.
This moves the deactivated user check to the method which
all login types call.
Additionally updates the application service tests to be more
realistic by removing invalid tests and fixing server names.
Fix https://github.com/matrix-org/synapse/issues/15618
### Before
```
2023-05-17 22:51:36-0500 [-] 2023-05-17 22:51:36,889 - synapse.server - 338 - INFO - sentinel - Finished setting up.
```
### After
```
2023-05-19 18:16:20-0500 [-] synapse.server - 338 - INFO - sentinel - Finished setting up.
```
### Dev notes
The `Twisted.Logger` controls the `2023-05-19 18:16:20-0500 [-]` prefix, see : [`twisted/twisted` -> `src/twisted/logger/_format.py#L362-L374`](34b161e66b/src/twisted/logger/_format.py (L362-L374))
And we delegate our logs to the Twisted Logger for the tests which puts it in `_trial_temp/test.log`
The event_fields property in filters should use the proper
escape rules, namely backslashes can be escaped with
an additional backslash.
This adds tests (adapted from matrix-js-sdk) and implements
the logic to properly split the event_fields strings.
...to try to control memory usage. `HomeServerConfig`s hold on to
many Jinja2 objects, which come out to over 0.5 MiB per config.
Over the course of a full test run, the cache grows to ~360 entries.
Limit it to 8 entries.
Part of #15622.
Signed-off-by: Sean Quah <seanq@matrix.org>
R30v2 has been out since 2021-07-19 (https://github.com/matrix-org/synapse/pull/10332)
and we started collecting stats on 2021-08-16. Since it's been over a year now
(almost 2 years), this is enough grace period for us to now rip it out.
Synapse will no longer send (or respond to) the unstable flags
for faster joins. These were only available behind a configuration
flag and handled in parallel with the stable flags.
This change fixes two memory leaks during `trial` test runs.
Garbage collection is disabled during each test case and a gen-0 GC is
run at the end of each test. However, when the gen-0 GC is run, the
`TestCase` object usually still holds references to the `HomeServer`
used during the test. As a result, the `HomeServer` gets promoted to
gen-1 and then never garbage collected.
Fix this by periodically running full GCs.
Additionally, fix `HomeServer`s leaking after tests that touch inbound
federation due to `FederationRateLimiter`s adding themselves to a global
set, by turning the set into a `WeakSet`.
Resolves#15622.
Signed-off-by: Sean Quah <seanq@matrix.org>
If the previous read marker is pointing to an event that no longer exists
(e.g. due to retention) then assume that the newly given read marker
is newer.
To track changes in MSC2666:
- The change from `/mutual_rooms/{user_id}` to `/mutual_rooms?user_id={user_id}`.
- The addition of `next_batch_token` (and logic).
- Unstable flag now being `uk.half-shot.msc2666.query_mutual_rooms`.
- The error code when your own user is requested.
There are two situations which were previously not properly checked:
1. If the requested URL was replaced with an oEmbed URL, then the
oEmbed URL was not checked against url_preview_url_blacklist.
2. Follow-up URLs (either via autodiscovery of oEmbed or to pre-cache
images) were not checked against url_preview_url_blacklist.
Fix the following `mypy` errors when running `mypy` with Python 3.7:
```
synapse/storage/controllers/stats.py:58: error: "Counter" is not subscriptable, use "typing.Counter" instead [misc]
tests/test_state.py:267: error: "dict" is not subscriptable, use "typing.Dict" instead [misc]
```
Part of https://github.com/matrix-org/synapse/issues/15603
In Python 3.9, `typing` is deprecated and the types are subscriptable (generics) by default, https://peps.python.org/pep-0585/#implementation
Fix:
```
tests/test_state.py:267: error: "dict" is not subscriptable, use "typing.Dict" instead [misc]
```
In Python 3.9, `typing` is deprecated and the types are subscriptable (generics) by default,
https://peps.python.org/pep-0585/#implementation
MSC3389 proposes protecting the relation type & parent event ID
from redaction. This keeps the relation information intact after
redaction which helps with some UX flaws (e.g. deleting an
event causes it to no longer be in a thread, which is confusing).
* Add master to the instance_map as part of Complement, have ReplicationEndpoint look at instance_map for master.
* Fix typo in drive by.
* Remove unnecessary worker_replication_* bits from unit tests and add master to instance_map(hopefully in the right place)
* Several updates:
1. Switch from master to main for naming the main process in the instance_map. Add useful constants for easier adjustment of names in the future.
2. Add backwards compatibility for worker_replication_* to allow time to transition to new style. Make sure to prioritize declaring main directly on the instance_map.
3. Clean up old comments/commented out code.
4. Adjust unit tests to match with new code.
5. Adjust Complement setup infrastructure to only add main to the instance_map if workers are used and remove now unused options from the worker.yaml template.
* Initial Docs upload
* Changelog
* Missed some commented out code that can go now
* Remove TODO comment that no longer holds true.
* Fix links in docs
* More docs
* Remove debug logging
* Apply suggestions from code review
Co-authored-by: reivilibre <olivier@librepush.net>
* Apply suggestions from code review
Co-authored-by: reivilibre <olivier@librepush.net>
* Update version to latest, include completeish before/after examples in upgrade notes.
* Fix up and docs too
---------
Co-authored-by: reivilibre <olivier@librepush.net>
Separate out a HTTP client for replication in preparation for
also supporting using UNIX sockets. The major difference from
the base class is that this does not use treq to handle HTTP
requests.
This stops media (and thumbnails) from being accessed from the
listed domains. It does not delete any already locally cached media,
but will prevent accessing it.
Note that admin APIs are unaffected by this change.
MSC3984 proxies /keys/query requests to appservices, but servers will
can also requests devices / keys from the /user/devices endpoint.
The formats are close enough that we can "proxy" that /user/devices to
appservices (by calling /keys/query) and then change the format of the
returned data before returning it over federation.
Behind a configuration flag this adds + to the list of allowed
characters in Matrix IDs. The main feature this enables is
using full E.164 phone numbers as Matrix IDs.
Add an `is_mine_server_name` method, similar to `is_mine_id`.
Ideally we would use this consistently, instead of sometimes comparing
against `hs.hostname` and other times reaching into
`hs.config.server.server_name`.
Also fix a bug in the tests where `hs.hostname` would sometimes differ
from `hs.config.server.server_name`.
Signed-off-by: Sean Quah <seanq@matrix.org>
#15514 introduced a regression where Synapse would encounter
`PartialDownloadError`s when fetching OpenID metadata for certain
providers on startup. Due to #8088, this prevents Synapse from starting
entirely.
Revert the change while we decide what to do about the regression.
Pushers tend to make many connections to the same HTTP host
(e.g. a new event comes in, causes events to be pushed, and then
the homeserver connects to the same host many times). Due to this
the per-host HTTP connection pool size was increased, but this does
not make sense for other SimpleHttpClients.
Add a parameter for the connection pool and override it for pushers
(making a separate SimpleHttpClient for pushers with the increased
configuration).
This returns the HTTP connection pool settings to the default Twisted
ones for non-pusher HTTP clients.
Adds an optional keyword argument to the /relations API which
will recurse a limited number of event relationships.
This will cause the API to return not just the events related to the
parent event, but also events related to those related to the parent
event, etc.
This is disabled by default behind an experimental configuration
flag and is currently implemented using prefixed parameters.
MSC3983 provides a way to request multiple OTKs at once from appservices,
this extends this concept to the Client-Server API.
Note that this will likely be spit out into a separate MSC, but is currently part of
MSC3983.
It can be useful to always return the fallback key when attempting to
claim keys. This adds an unstable endpoint for `/keys/claim` which
always returns fallback keys in addition to one-time-keys.
The fallback key(s) are not marked as "used" unless there are no
corresponding OTKs.
This is currently defined in MSC3983 (although likely to be split out
to a separate MSC). The endpoint shape may change or be requested
differently (i.e. a keyword parameter on the current endpoint), but the
core logic should be reasonable.
Before this change:
* `PerspectivesKeyFetcher` and `ServerKeyFetcher` write to `server_keys_json`.
* `PerspectivesKeyFetcher` also writes to `server_signature_keys`.
* `StoreKeyFetcher` reads from `server_signature_keys`.
After this change:
* `PerspectivesKeyFetcher` and `ServerKeyFetcher` write to `server_keys_json`.
* `PerspectivesKeyFetcher` also writes to `server_signature_keys`.
* `StoreKeyFetcher` reads from `server_keys_json`.
This results in `StoreKeyFetcher` now using the results from `ServerKeyFetcher`
in addition to those from `PerspectivesKeyFetcher`, i.e. keys which are directly
fetched from a server will now be pulled from the database instead of refetched.
An additional minor change is included to avoid creating a `PerspectivesKeyFetcher`
(and checking it) if no `trusted_key_servers` are configured.
The overall impact of this should be better usage of cached results:
* If a server has no trusted key servers configured then it should reduce how often keys
are fetched.
* if a server's trusted key server does not have a requested server's keys cached then it
should reduce how often keys are directly fetched.
These two lines:
```
config_obj = HomeServerConfig()
config_obj.parse_config_dict(config, "", "")
```
are called many times with the exact same value for `config`.
As the test suite is CPU-bound and non-negligeably time is spent in
`parse_config_dict`, this saves ~5% on the overall runtime of the Trial
test suite (tested with both `-j2` and `-j12` on a 12t CPU).
This is sadly rather limited, as the cache cannot be shared between
processes (it contains at least jinja2.Template and RLock objects which
aren't pickleable), and Trial tends to run close tests in different
processes.
* Change `store_server_verify_keys` to take a `Mapping[(str, str), FKR]`
This is because we already can't handle duplicate keys — leads to cardinality violation
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
This moves `redacts` from being a top-level property to
a `content` property in a new room version.
MSC2176 (which was previously implemented) states to not
`redact` this property.
* raise a ConfigError on an invalid app_service_config_files
* changelog
* Move config check to read_config
* Add test
* Ensure list also contains strings
This uses the specced /_matrix/app/v1/... paths instead of the
"legacy" paths. If the homeserver receives an error it will retry
using the legacy path.
* Revert "Fix registering a device on an account with lots of devices (#15348)"
This reverts commit f0d8f66eaa.
* Revert "Delete stale non-e2e devices for users, take 3 (#15183)"
This reverts commit 78cdb72cd6.
If enabled, for users which are exclusively owned by an application
service then the appservice will be queried for devices in addition
to any information stored in the Synapse database.
Previously, we would spin in a tight loop until
`update_state_for_partial_state_event` stopped raising
`FederationPullAttemptBackoffError`s. Replace the spinloop with a wait
until the backoff period has expired.
Signed-off-by: Sean Quah <seanq@matrix.org>
This should help reduce the number of devices e.g. simple bots the repeatedly login rack up.
We only delete non-e2e devices as they should be safe to delete, whereas if we delete e2e devices for a user we may accidentally break their ability to receive e2e keys for a message.
Experimental support for MSC3983 is behind a configuration flag.
If enabled, for users which are exclusively owned by an application
service then the appservice will be queried for one-time keys *if*
there are none uploaded to Synapse.
This makes it so that we rely on the `device_id` to delete pushers on logout,
instead of relying on the `access_token_id`. This ensures we're not removing
pushers on token refresh, and prepares for a world without access token IDs
(also known as the OIDC).
This actually runs the `set_device_id_for_pushers` background update, which
was forgotten in #13831.
Note that for backwards compatibility it still deletes pushers based on the
`access_token` until the background update finishes.
Invalid mentions data received over the Client-Server API should
be rejected with a 400 error. This will hopefully stop clients from
sending invalid data, although does not help with data received
over federation.
Additionally:
* Consistently use `freeze()` in test
---------
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: 6543 <6543@obermui.de>
When a room is deleted in Synapse we remove the event forward
extremities in the room, so if (say a bot) tries to send a message into
the room we error out due to not being able to calculate prev events for
the new event *before* we check if the sender is in the room.
Fixes#8094
With Redis commands do not need to be re-issued by the main
process (they fan-out to all processes at once) and thus it is no
longer necessary to worry about them reflecting recursively forever.
* Scaffolding for background process to refresh profiles
* Add scaffolding for background process to refresh profiles for a given server
* Implement the code to select servers to refresh from
* Ensure we don't build up multiple looping calls
* Make `get_profile` able to respect backoffs
* Add logic for refreshing users
* When backing off, schedule a refresh when the backoff is over
* Wake up the background processes when we receive an interesting state event
* Add tests
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Add comment about 1<<62
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Tweak docstring and type hint
* Flip logic and provide better name
* Separate decision from action
* Track a set of strings, not EventBases
* Require explicit boolean options from callers
* Add explicit option for partial state rooms
* Changelog
* Rename param
This removes the experimental configuration option and
always escapes the push rule condition keys.
Also escapes any (experimental) push rule condition keys
in the base rules which contain dot in a field name.
Enables MSC3925 support by default, which:
* Includes the full edit event in the bundled aggregations of an
edited event.
* Stops modifying the original event's content to return the new
content from the edit event.
This is a backwards-incompatible change that is considered to be
"correct" by the spec.
This replaces the specific `is_user_mention` push rule condition
used in MSC3952 with the generic `exact_event_property_contains`
push rule condition from MSC3966.
It turns out that no clients rely on server-side aggregation of `m.annotation`
relationships: it's just not very useful as currently implemented.
It's also non-trivial to calculate.
I want to remove it from MSC2677, so to keep the implementation in line, let's
remove it here.
Internally the push rules module uses a `pattern_type` property for `event_match`
conditions (and `related_event_match`) to mark the condition as matching the
current user's Matrix ID or localpart.
This is leaky to the Client-Server API where a user can successfully set a condition
which provides `pattern_type` instead of `pattern` (note that there's no benefit to
doing this -- the user can just use their own Matrix ID or localpart instead). When
serializing back to the client the `pattern_type` property is converted into a proper
`pattern`.
The following changes are made to avoid this:
* Separate the `KnownCondition::EventMatch` enum value into `EventMatch`
and `EventMatchType`, each with their own expected properties. (Note that a
similar change is made for `RelatedEventMatch`.)
* Make it such that the `pattern_type` variants serialize to the same condition kind,
but cannot be deserialized (since they're only provided by base rules).
* As a final tweak, convert `user_id` vs. `user_localpart` values into an enum.
* Admin api to delete event report
* lint + tests
* newsfile
* Apply suggestions from code review
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* revert changes - move to WorkerStore
* update unit test
* Note that timestamp is in millseconds
---------
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>