Fix bug where we don't get new to-device from remote if they resent a
message we've already persisted and have recorded in the DB twice.
`device_federation_inbox` table doesn't have a unique index, and so we
can race and store an entry in there twice. If we do so then
`simple_select_one_txn` will throw an error due to the query returning
more than one row. We should add an unique index, but it doesn't really
matter so lets just handle the case of multiple rows correctly for now.
Fixes up #17333, where we failed to actually send less data (the
`DISTINCT` didn't work due to `stream_id` being different).
We fix this by making it so that every device list outbound poke for a
given user ID has the same stream ID. We can't change the query to only
return e.g. max stream ID as the receivers look up the destinations to
send to by doing `SELECT WHERE stream_id = ?`
A simple change to update the docs where default values were missing.
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [X] Pull request is based on the develop branch
* [X] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [X] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Kim Brose <2803622+HarHarLinks@users.noreply.github.com>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
This is #17291 (which got reverted), with some added fixups, and change
so that tests actually pick up the error.
The problem was that we were not calculating any new chain IDs due to a
missing `not` in a condition.
Reduce the replication traffic of device lists, by not sending every
destination that needs to be sent the device list update over
replication. Instead a "hosts to send to have been calculated"
notification over replication, and then federation senders read the
destinations from the DB.
For non federation senders this should heavily reduce the impact of a
user in many large rooms changing a device.
The parse_integer function was previously made to reject negative values by
default in https://github.com/element-hq/synapse/pull/16920, but the
documentation stated otherwise. This fixes the documentation and also:
- Removes explicit negative=False parameters from call sites.
- Brings the negative default of parse_integer_from_args in alignment with
parse_integer.
Tests / lint-clippy-nightly (push) Has been cancelled
Tests / lint-rustfmt (push) Has been cancelled
Deploy the documentation / GitHub Pages (push) Has been cancelled
Build release artifacts / Build .deb packages (push) Has been cancelled
Tests / lint-pydantic (push) Has been cancelled
Build release artifacts / Attach assets to release (push) Has been cancelled
Tests / check-sampleconfig (push) Has been cancelled
Tests / check-schema-delta (push) Has been cancelled
Tests / lint (push) Has been cancelled
Tests / Typechecking (push) Has been cancelled
Tests / linting-done (push) Has been cancelled
Tests / calculate-test-jobs (push) Has been cancelled
Tests / trial (push) Has been cancelled
Tests / trial-olddeps (push) Has been cancelled
Tests / trial-pypy (all, pypy-3.8) (push) Has been cancelled
Tests / sytest (push) Has been cancelled
Tests / export-data (push) Has been cancelled
Tests / portdb (11, 3.8) (push) Has been cancelled
Tests / portdb (15, 3.11) (push) Has been cancelled
Tests / complement (monolith, Postgres) (push) Has been cancelled
Tests / complement (monolith, SQLite) (push) Has been cancelled
Tests / complement (workers, Postgres) (push) Has been cancelled
Tests / cargo-test (push) Has been cancelled
Tests / cargo-bench (push) Has been cancelled
This reverts commit bdf82efea5 (#17291)
This seems to have stopped persisting auth chains for new events, and so
is causing state res to fall back to the slow methods
We calculate the auth chain links outside of the main persist event
transaction to ensure that we do not block other event sending during
the calculation.
Sort is no longer configurable and we always sort rooms by the `stream_ordering` of the last event in the room or the point where the user can see up to in cases of leave/ban/invite/knock.
Add `event.internal_metadata.instance_name` (the worker instance that persisted the event) to go alongside the existing `event.internal_metadata.stream_ordering`.
`instance_name` is useful to properly compare and query for events with a token since you need to compare both the `stream_ordering` and `instance_name` against the vector clock/`instance_map` in the `RoomStreamToken`.
This is pre-requisite work and may be used in https://github.com/element-hq/synapse/pull/17293
Adding `event.internal_metadata.instance_name` was first mentioned in the initial Sliding Sync PR while pairing with @erikjohnston, see 09609cb0db (diff-5cd773fb307aa754bd3948871ba118b1ef0303f4d72d42a2d21e38242bf4e096R405-R410)
Build release artifacts / Build sdist (push) Waiting to run
Build release artifacts / Attach assets to release (push) Blocked by required conditions
Tests / lint (push) Blocked by required conditions
Tests / Typechecking (push) Blocked by required conditions
Tests / lint-crlf (push) Waiting to run
Tests / lint-newsfile (push) Waiting to run
Tests / lint-pydantic (push) Blocked by required conditions
Tests / lint-clippy (push) Blocked by required conditions
Tests / lint-clippy-nightly (push) Blocked by required conditions
Tests / lint-rustfmt (push) Blocked by required conditions
Tests / linting-done (push) Blocked by required conditions
Tests / calculate-test-jobs (push) Blocked by required conditions
Tests / trial (push) Blocked by required conditions
Tests / portdb (15, 3.11) (push) Blocked by required conditions
Tests / changes (push) Waiting to run
Tests / check-sampleconfig (push) Blocked by required conditions
Tests / check-schema-delta (push) Blocked by required conditions
Tests / check-lockfile (push) Waiting to run
Tests / trial-olddeps (push) Blocked by required conditions
Tests / trial-pypy (all, pypy-3.8) (push) Blocked by required conditions
Tests / sytest (push) Blocked by required conditions
Tests / export-data (push) Blocked by required conditions
Tests / portdb (11, 3.8) (push) Blocked by required conditions
Tests / complement (monolith, Postgres) (push) Blocked by required conditions
Tests / complement (monolith, SQLite) (push) Blocked by required conditions
Tests / complement (workers, Postgres) (push) Blocked by required conditions
Tests / cargo-test (push) Blocked by required conditions
Tests / cargo-bench (push) Blocked by required conditions
Tests / tests-done (push) Blocked by required conditions
/ Check locked dependencies have sdists (push) Has been cancelled
PR where this was introduced: https://github.com/matrix-org/synapse/pull/14817
### What does this affect?
`get_last_event_in_room_before_stream_ordering(...)` is used in Sync v2 in a lot of different state calculations.
`get_last_event_in_room_before_stream_ordering(...)` is also used in `/rooms/{roomId}/members`
Build release artifacts / Build wheels on ${{ matrix.os }} for ${{ matrix.arch }} (x86_64, ${{ startsWith(github.ref, 'refs/pull/') }}, ubuntu-20.04) (push) Waiting to run
Build release artifacts / Build sdist (push) Waiting to run
Build release artifacts / Attach assets to release (push) Blocked by required conditions
Tests / changes (push) Waiting to run
Tests / check-sampleconfig (push) Blocked by required conditions
Tests / check-schema-delta (push) Blocked by required conditions
Tests / check-lockfile (push) Waiting to run
Tests / lint (push) Blocked by required conditions
Tests / Typechecking (push) Blocked by required conditions
Tests / lint-crlf (push) Waiting to run
Tests / lint-newsfile (push) Waiting to run
Tests / lint-pydantic (push) Blocked by required conditions
Tests / lint-clippy (push) Blocked by required conditions
Tests / lint-clippy-nightly (push) Blocked by required conditions
Tests / lint-rustfmt (push) Blocked by required conditions
Tests / linting-done (push) Blocked by required conditions
Tests / calculate-test-jobs (push) Blocked by required conditions
Tests / trial (push) Blocked by required conditions
Tests / trial-olddeps (push) Blocked by required conditions
Tests / trial-pypy (all, pypy-3.8) (push) Blocked by required conditions
Tests / sytest (push) Blocked by required conditions
Tests / export-data (push) Blocked by required conditions
Tests / portdb (11, 3.8) (push) Blocked by required conditions
Tests / portdb (15, 3.11) (push) Blocked by required conditions
Tests / complement (monolith, Postgres) (push) Blocked by required conditions
Tests / complement (monolith, SQLite) (push) Blocked by required conditions
Tests / complement (workers, Postgres) (push) Blocked by required conditions
Tests / cargo-test (push) Blocked by required conditions
Tests / cargo-bench (push) Blocked by required conditions
Tests / tests-done (push) Blocked by required conditions
https://github.com/matrix-org/matrix-spec-proposals/pull/4151
This is intended to be enabled by default for immediate use. When FCP is
complete, the unstable endpoint will be dropped and stable endpoint
supported instead - no backwards compatibility is expected for the
unstable endpoint.
Build release artifacts / Build wheels on ${{ matrix.os }} for ${{ matrix.arch }} (x86_64, ${{ startsWith(github.ref, 'refs/pull/') }}, ubuntu-20.04) (push) Waiting to run
Build release artifacts / Build sdist (push) Waiting to run
Build release artifacts / Attach assets to release (push) Blocked by required conditions
Tests / check-lockfile (push) Waiting to run
Tests / changes (push) Waiting to run
Tests / check-sampleconfig (push) Blocked by required conditions
Tests / check-schema-delta (push) Blocked by required conditions
Tests / lint (push) Blocked by required conditions
Tests / Typechecking (push) Blocked by required conditions
Tests / lint-crlf (push) Waiting to run
Tests / lint-newsfile (push) Waiting to run
Tests / lint-pydantic (push) Blocked by required conditions
Tests / lint-clippy (push) Blocked by required conditions
Tests / lint-clippy-nightly (push) Blocked by required conditions
Tests / lint-rustfmt (push) Blocked by required conditions
Tests / linting-done (push) Blocked by required conditions
Tests / calculate-test-jobs (push) Blocked by required conditions
Tests / trial (push) Blocked by required conditions
Tests / trial-olddeps (push) Blocked by required conditions
Tests / trial-pypy (all, pypy-3.8) (push) Blocked by required conditions
Tests / sytest (push) Blocked by required conditions
Tests / export-data (push) Blocked by required conditions
Tests / portdb (11, 3.8) (push) Blocked by required conditions
Tests / portdb (15, 3.11) (push) Blocked by required conditions
Tests / complement (monolith, Postgres) (push) Blocked by required conditions
Tests / complement (monolith, SQLite) (push) Blocked by required conditions
Tests / complement (workers, Postgres) (push) Blocked by required conditions
Tests / cargo-test (push) Blocked by required conditions
Tests / cargo-bench (push) Blocked by required conditions
Tests / tests-done (push) Blocked by required conditions
Spawning from https://github.com/element-hq/synapse/pull/17187#discussion_r1619492779 around wanting to put `SlidingSyncBody` (parse the request in the rest layer), `SlidingSyncConfig` (from the rest layer, pass to the handler), `SlidingSyncResponse` (pass the response from the handler back to the rest layer to respond) somewhere that doesn't contaminate the imports and cause circular import issues.
- Moved Pydantic parsing models to `synapse/types/rest`
- Moved handler types to `synapse/types/handlers`
Fixes: #17013
Add logging for whether room keys are replaced
This is motivated by the Crypto team who need to diagnose crypto issues.
The existing opentracing logging is not enough because it is not enabled
for all users.
Build release artifacts / Build wheels on ${{ matrix.os }} for ${{ matrix.arch }} (x86_64, ${{ startsWith(github.ref, 'refs/pull/') }}, ubuntu-20.04) (push) Waiting to run
Build release artifacts / Build sdist (push) Waiting to run
Build release artifacts / Attach assets to release (push) Blocked by required conditions
Tests / Typechecking (push) Blocked by required conditions
Tests / lint-crlf (push) Waiting to run
Tests / lint-newsfile (push) Waiting to run
Tests / lint-pydantic (push) Blocked by required conditions
Tests / lint-clippy (push) Blocked by required conditions
Tests / lint-clippy-nightly (push) Blocked by required conditions
Tests / lint-rustfmt (push) Blocked by required conditions
Tests / linting-done (push) Blocked by required conditions
Tests / calculate-test-jobs (push) Blocked by required conditions
Tests / trial (push) Blocked by required conditions
Tests / cargo-bench (push) Blocked by required conditions
Tests / tests-done (push) Blocked by required conditions
Tests / check-lockfile (push) Waiting to run
Tests / lint (push) Blocked by required conditions
Tests / changes (push) Waiting to run
Tests / check-sampleconfig (push) Blocked by required conditions
Tests / check-schema-delta (push) Blocked by required conditions
Tests / trial-olddeps (push) Blocked by required conditions
Tests / trial-pypy (all, pypy-3.8) (push) Blocked by required conditions
Tests / sytest (push) Blocked by required conditions
Tests / export-data (push) Blocked by required conditions
Tests / portdb (11, 3.8) (push) Blocked by required conditions
Tests / portdb (15, 3.11) (push) Blocked by required conditions
Tests / complement (monolith, Postgres) (push) Blocked by required conditions
Tests / complement (monolith, SQLite) (push) Blocked by required conditions
Tests / complement (workers, Postgres) (push) Blocked by required conditions
Tests / cargo-test (push) Blocked by required conditions
Based on [MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575): Sliding Sync
This iteration only focuses on returning the list of room IDs in the sliding window API (without sorting/filtering).
Rooms appear in the Sliding sync response based on:
- `invite`, `join`, `knock`, `ban` membership events
- Kicks (`leave` membership events where `sender` is different from the `user_id`/`state_key`)
- `newly_left` (rooms that were left during the given token range, > `from_token` and <= `to_token`)
- In order for bans/kicks to not show up, you need to `/forget` those rooms. This doesn't modify the event itself though and only adds the `forgotten` flag to `room_memberships` in Synapse. There isn't a way to tell when a room was forgotten at the moment so we can't factor it into the from/to range.
### Example request
`POST http://localhost:8008/_matrix/client/unstable/org.matrix.msc3575/sync`
```json
{
"lists": {
"foo-list": {
"ranges": [ [0, 99] ],
"sort": [ "by_notification_level", "by_recency", "by_name" ],
"required_state": [
["m.room.join_rules", ""],
["m.room.history_visibility", ""],
["m.space.child", "*"]
],
"timeline_limit": 100
}
}
}
```
Response:
```json
{
"next_pos": "s58_224_0_13_10_1_1_16_0_1",
"lists": {
"foo-list": {
"count": 1,
"ops": [
{
"op": "SYNC",
"range": [0, 99],
"room_ids": [
"!MmgikIyFzsuvtnbvVG:my.synapse.linux.server"
]
}
]
}
},
"rooms": {},
"extensions": {}
}
```
Build release artifacts / Build wheels on ${{ matrix.os }} for ${{ matrix.arch }} (x86_64, ${{ startsWith(github.ref, 'refs/pull/') }}, macos-11) (push) Waiting to run
Build release artifacts / Build wheels on ${{ matrix.os }} for ${{ matrix.arch }} (x86_64, ${{ startsWith(github.ref, 'refs/pull/') }}, ubuntu-20.04) (push) Waiting to run
Build release artifacts / Attach assets to release (push) Blocked by required conditions
Tests / check-lockfile (push) Waiting to run
Tests / lint (push) Blocked by required conditions
Tests / calculate-test-jobs (push) Blocked by required conditions
Tests / trial (push) Blocked by required conditions
Tests / cargo-bench (push) Blocked by required conditions
Tests / tests-done (push) Blocked by required conditions
Tests / check-schema-delta (push) Blocked by required conditions
Tests / changes (push) Waiting to run
Tests / check-sampleconfig (push) Blocked by required conditions
Tests / Typechecking (push) Blocked by required conditions
Tests / lint-crlf (push) Waiting to run
Tests / lint-newsfile (push) Waiting to run
Tests / lint-pydantic (push) Blocked by required conditions
Tests / lint-clippy (push) Blocked by required conditions
Tests / lint-clippy-nightly (push) Blocked by required conditions
Tests / lint-rustfmt (push) Blocked by required conditions
Tests / linting-done (push) Blocked by required conditions
Tests / trial-olddeps (push) Blocked by required conditions
Tests / trial-pypy (all, pypy-3.8) (push) Blocked by required conditions
Tests / sytest (push) Blocked by required conditions
Tests / export-data (push) Blocked by required conditions
Tests / portdb (11, 3.8) (push) Blocked by required conditions
Tests / portdb (15, 3.11) (push) Blocked by required conditions
Tests / complement (monolith, Postgres) (push) Blocked by required conditions
Tests / complement (monolith, SQLite) (push) Blocked by required conditions
Tests / complement (workers, Postgres) (push) Blocked by required conditions
Tests / cargo-test (push) Blocked by required conditions
Use fully-qualified `PersistedEventPosition` (`instance_name` and `stream_ordering`) when returning `RoomsForUser` to facilitate proper comparisons and `RoomStreamToken` generation.
Spawning from https://github.com/element-hq/synapse/pull/17187 where we want to utilize this change
Tests / calculate-test-jobs (push) Has been cancelled
Tests / trial (push) Has been cancelled
Tests / trial-olddeps (push) Has been cancelled
Tests / trial-pypy (all, pypy-3.8) (push) Has been cancelled
Tests / sytest (push) Has been cancelled
Deploy the documentation / GitHub Pages (push) Has been cancelled
Build release artifacts / Build .deb packages (push) Has been cancelled
Build release artifacts / Attach assets to release (push) Has been cancelled
Tests / check-sampleconfig (push) Has been cancelled
Tests / check-schema-delta (push) Has been cancelled
Tests / lint (push) Has been cancelled
Tests / Typechecking (push) Has been cancelled
Tests / lint-pydantic (push) Has been cancelled
Tests / lint-clippy (push) Has been cancelled
Tests / lint-clippy-nightly (push) Has been cancelled
Tests / lint-rustfmt (push) Has been cancelled
Tests / linting-done (push) Has been cancelled
Tests / export-data (push) Has been cancelled
Tests / portdb (11, 3.8) (push) Has been cancelled
Tests / portdb (15, 3.11) (push) Has been cancelled
Tests / complement (monolith, Postgres) (push) Has been cancelled
Tests / complement (monolith, SQLite) (push) Has been cancelled
Tests / complement (workers, Postgres) (push) Has been cancelled
Tests / cargo-test (push) Has been cancelled
Tests / cargo-bench (push) Has been cancelled
Tests / tests-done (push) Has been cancelled
Otherwise things will get confused.
An alternative would be to make sure that for lagging stream we don't
return anything (and make sure the returned next_batch token doesn't go
backwards). But that is a faff.
We try and deduplicate in two places: 1) really early on, and 2) just
before we persist the event. The first case was broken due to it
occuring before the profile information was added, and so it thought the
event contents were different.
The second case did catch it and handle it correctly, however doing so
creates a redundant state group leading to bloat.
Fixes#3791
Fixes up #17239
We need to keep the spam check within the `try/except` block. Also makes
it so that we don't enter the top span twice.
Also also ensures that we get the right thumbnail length.
Build release artifacts / Build wheels on ${{ matrix.os }} for ${{ matrix.arch }} (x86_64, ${{ startsWith(github.ref, 'refs/pull/') }}, ubuntu-20.04) (push) Waiting to run
Build release artifacts / Build sdist (push) Waiting to run
Build release artifacts / Attach assets to release (push) Blocked by required conditions
Tests / check-lockfile (push) Waiting to run
Tests / lint (push) Blocked by required conditions
Tests / calculate-test-jobs (push) Blocked by required conditions
Tests / changes (push) Waiting to run
Tests / check-sampleconfig (push) Blocked by required conditions
Tests / check-schema-delta (push) Blocked by required conditions
Tests / Typechecking (push) Blocked by required conditions
Tests / lint-crlf (push) Waiting to run
Tests / lint-newsfile (push) Waiting to run
Tests / lint-pydantic (push) Blocked by required conditions
Tests / lint-clippy (push) Blocked by required conditions
Tests / lint-clippy-nightly (push) Blocked by required conditions
Tests / lint-rustfmt (push) Blocked by required conditions
Tests / linting-done (push) Blocked by required conditions
Tests / trial (push) Blocked by required conditions
Tests / trial-olddeps (push) Blocked by required conditions
Tests / trial-pypy (all, pypy-3.8) (push) Blocked by required conditions
Tests / sytest (push) Blocked by required conditions
Tests / export-data (push) Blocked by required conditions
Tests / portdb (11, 3.8) (push) Blocked by required conditions
Tests / portdb (15, 3.11) (push) Blocked by required conditions
Tests / complement (monolith, Postgres) (push) Blocked by required conditions
Tests / complement (monolith, SQLite) (push) Blocked by required conditions
Tests / complement (workers, Postgres) (push) Blocked by required conditions
Tests / cargo-test (push) Blocked by required conditions
Tests / cargo-bench (push) Blocked by required conditions
Tests / tests-done (push) Blocked by required conditions
There is a problem with `StreamIdGenerator` where it can go backwards
over restarts when a stream ID is requested but then not inserted into
the DB. This is problematic if we want to land #17215, and is generally
a potential cause for all sorts of nastiness.
Instead of trying to fix `StreamIdGenerator`, we may as well move to
`MultiWriterIdGenerator` that does not suffer from this problem (the
latest positions are stored in `stream_positions` table). This involves
adding SQLite support to the class.
This only changes id generators that were already using
`MultiWriterIdGenerator` under postgres, a separate PR will move the
rest of the uses of `StreamIdGenerator` over.
Currently sending a to-device message to a user ID with a dodgy
destination is accepted, but then ends up spamming the logs when we try
and send to the destination.
An alternative would be to reject the request, but I'm slightly nervous
that could break things.
When a module rejects a piece of media we end up trying to close the
same logging context twice.
Instead of fixing the existing code we refactor to use an async context
manager, which is easier to write correctly.
The log format is the same as the request log format, except:
- fields that are specific to HTTP requests have been removed
- the task's params are included at the end of the log line.
These log lines are emitted:
- when the task function finishes — both completion and failure (and I
suppose it is possible for a task to become schedulable again?)
- every 5 minutes whilst it is running
Closes#17217.
---------
Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
This PR ports the logic from the
[synapse_auto_accept_invite](https://github.com/matrix-org/synapse-auto-accept-invite)
module into synapse.
I went with the naive approach of injecting the "module" next to where
third party modules are currently loaded. If there is a better/preferred
way to handle this, I'm all ears. It wasn't obvious to me if there was a
better location to add this logic that would cleanly apply to all
incoming invite events.
Relies on https://github.com/element-hq/synapse/pull/17166 to fix linter
errors.
Re-introduces #17191, and includes #17197 and #17214
The basic idea is to stop calling `get_rooms_for_user` everywhere, and
instead use the table `device_lists_changes_in_room`.
Commits reviewable one-by-one.
Removed `request_key` from the `SyncConfig` (moved outside as its own function parameter) so it doesn't have to flow into `_generate_sync_entry_for_xxx` methods. This way we can separate the concerns of caching from generating the response and reuse the `_generate_sync_entry_for_xxx` functions as we see fit. Plus caching doesn't really have anything to do with the config of sync.
Split from https://github.com/element-hq/synapse/pull/17167
Spawning from https://github.com/element-hq/synapse/pull/17167#discussion_r1601497279
It's almost always more efficient to query the rooms that have device
list changes, rather than looking at the list of all users whose devices
have changed and then look for shared rooms.
This is to allow clients to query the configured federation whitelist.
Disabled by default.
---------
Co-authored-by: Devon Hudson <devonhudson@librem.one>
Co-authored-by: devonh <devon.dmytro@gmail.com>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Linter errors are showing up in #17147 that are unrelated to that PR.
The errors do not currently show up on develop.
This PR aims to resolve the linter errors separately from #17147.
This version change requires a migration to a new API. See
https://pyo3.rs/v0.21.2/migration#from-020-to-021
This will fix the annoying warnings added when using the recent rust
nightly:
> warning: non-local `impl` definition, they should be avoided as they
go against expectation
When there have been lots of changes compared with the number of
entities, we can do a fast(er) path.
Locally I ran some benchmarking, and the comparison seems to give the
best determination of which method we use.
This change will apply the `email` & `picture` provided by OIDC to the
new user account when registering a new user via OIDC. If the user is
directed to the account details form, this change makes sure they have
been selected before applying them, otherwise they are omitted. In
particular, this change ensures the values are carried through when
Synapse has consent configured, and the redirect to the consent form/s
are followed.
I have tested everything manually. Including:
- with/without consent configured
- allowing/not allowing the use of email/avatar (via
`sso_auth_account_details.html`)
- with/without automatic account detail population (by un/commenting the
`localpart_template` option in synapse config).
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [X] Pull request is based on the develop branch
* [X] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [X] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
... when workers are unreachable, etc.
Fixes https://github.com/element-hq/synapse/issues/17117.
The general principle is just to make sure that we propagate any
exceptions to the JsonResource, so that we return an error code to the
sending server. That means that the sending server no longer considers
the message safely sent, so it will retry later.
In the issue, Erik mentions that an alternative solution would be to
persist the to-device messages into a table so that they can be retried.
This might be an improvement for performance, but even if we did that,
we still need this mechanism, since we might be unable to reach the
database. So, if we want to do that, it can be a later follow-up.
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
This makes it easy to store UNIX sockets with correct permissions. Those
would be located in /run/synapse which is the directory used in many
examples in Synapse configuration manual. Additionally, the directory
and sockets are deleted when Synapse is shut down.
This adds functions to transform a Twisted request to the
`http::Request`, and then to send back an `http::Response` through it.
It also imports the SynapseError exception so that we can throw that
from Rust code directly
Example usage of this would be:
```rust
use crate::http::{http_request_from_twisted, http_response_to_twisted, HeaderMapPyExt};
fn handler(twisted_request: &PyAny) -> PyResult<()> {
let request = http_request_from_twisted(twisted_request)?;
let ua: headers::UserAgent = request.headers().typed_get_required()?;
if whatever {
return Err((crate::errors::SynapseError::new(
StatusCode::UNAUTHORIZED,
"Whatever".to_owned
"M_UNAUTHORIZED",
None,
None,
)));
}
let response = Response::new("hello".as_bytes());
http_response_to_twisted(twisted_request, response)?;
Ok(())
}
```
Resurrecting https://github.com/matrix-org/synapse/pull/13918.
This should reduce IOPs incurred by joining to the events table to
lookup stream ordering, which happens in many receipt handling code
paths. Like the previous PR I believe sufficient time has passed between
the original migration in DB schema 72 and now to merge this as-is. It's
highly unlikely that both the migration is still ongoing AND (active)
users still have any receipts prior to that date.
In the unlikely event there is a receipt without a populated
`event_stream_ordering` synapse will behave just as it does now when
receipts exist for events that don't (yet): for push action calculation
the receipts are just ignored.
I've removed the validation on event IDs as this is already covered
here:
59ceabcb97/synapse/handlers/receipts.py (L189-L192)
PR #16942 removed an invalid optimisation that avoided pulling out state
for non-gappy syncs. This causes a large increase in DB usage. c.f.
#16941 for why that optimisation was wrong.
However, we can still optimise in the simple case where the events in
the timeline are a linear chain without any branching/merging of the
DAG.
cc. @richvdh
Before we were pulling out *all* read receipts for a user for every
event we pushed. Instead let's only pull out the relevant receipts.
This also pulled out the event rows for each receipt, causing load on
the events table.
This PR fixes a very, very niche edge-case, but I've got some more work
coming which will otherwise make the problem worse.
The bug happens when the syncing user leaves a room, and has a sync
filter which includes "left" rooms, but sets the timeline limit to 0. In
that case, the state returned in the `state` section is calculated
incorrectly.
The fix is to pass a token corresponding to the point that the user
leaves the room through to `compute_state_delta`.
Requests may require a User-Agent header, and the change in #16972
accidentally removed it, resulting in requests getting rejected causing
login to fail.
Fixes https://github.com/element-hq/synapse/issues/16680, as well as a
related bug, where servers which we had *never* successfully sent an
event to would not be retried.
In order to fix the case of pending to-device messages, we hook into the
existing `wake_destinations_needing_catchup` process, by extending it to
look for destinations that have pending to-device messages. The
federation transmission loop then attempts to send the pending to-device
messages as normal.
When running unit tests, we patch the database connection pool so that
it runs queries "synchronously". This is ok, except that if any queries
are launched before we do the patching, those queries get left in limbo
and never complete.
To fix this, let's change the way we do the switcheroo, by patching out
the method which creates the connection pool in the first place.
I have a use case where I'd like the Synapse image to start up a
postgres instance that I can use, but don't want to force Synapse to use
postgres as well.
This commit prevents postgres from being started when it has already
been explicitly enabled elsewhere.
When a lot of locks are waiting for a single lock, notifying all locks
independently with `call_later` on each release is really costly and
incurs some kind of async contention, where the CPU is spinning a lot
for not much.
The included test is taking around 30s before the change, and 0.5s
after.
It was found following failing tests with
https://github.com/element-hq/synapse/pull/16827.
Background: we have a `matrixdotorg/synapse-workers` docker image, which
is intended for running multiple workers within the same container. That
image includes a `prefix-log` script which, for each line printed to
stdout or stderr by one of the processes, prepends the name of the
process.
This commit disables buffering in that script, so that lines are logged
quickly after they are printed. This makes it much easier to understand
the output, since they then come out in a natural order.
This PR aims to fix#16895, caused by a regression in #7 and not fixed
by #16903. The PR #16903 only fixes a starvation issue, where the CPU
isn't released. There is a second issue, where the execution is blocked.
This theory is supported by the flame graphs provided in #16895 and the
fact that I see the CPU usage reducing and far below the limit.
Since the changes in #7, the method `check_state_independent_auth_rules`
is called with the additional parameter `batched_auth_events`:
6fa13b4f92/synapse/handlers/federation_event.py (L1741-L1743)
It makes the execution enter this if clause, introduced with #151956fa13b4f92/synapse/event_auth.py (L178-L189)
There are two issues in the above code snippet.
First, there is the blocking issue. I'm not entirely sure if this is a
deadlock, starvation, or something different. In the beginning, I
thought the copy operation was responsible. It wasn't. Then I
investigated the nested `store.get_events` inside the function `update`.
This was also not causing the blocking issue. Only when I replaced the
set difference operation (`-` ) with a list comprehension, the blocking
was resolved. Creating and comparing sets with a very large amount of
events seems to be problematic.
This is how the flamegraph looks now while persisting outliers. As you
can see, the execution no longer locks up in the above function.
![output_2024-02-28_13-59-40](https://github.com/element-hq/synapse/assets/13143850/6db9c9ac-484f-47d0-bdde-70abfbd773ec)
Second, the copying here doesn't serve any purpose, because only a
shallow copy is created. This means the same objects from the original
dict are referenced. This fails the intention of protecting these
objects from mutation. The review of the original PR
https://github.com/matrix-org/synapse/pull/15195 had an extensive
discussion about this matter.
Various approaches to copying the auth_events were attempted:
1) Implementing a deepcopy caused issues due to
builtins.EventInternalMetadata not being pickleable.
2) Creating a dict with new objects akin to a deepcopy.
3) Creating a dict with new objects containing only necessary
attributes.
Concluding, there is no easy way to create an actual copy of the
objects. Opting for a deepcopy can significantly strain memory and CPU
resources, making it an inefficient choice. I don't see why the copy is
necessary in the first place. Therefore I'm proposing to remove it
altogether.
After these changes, I was able to successfully join these rooms,
without the main worker locking up:
- #synapse:matrix.org
- #element-android:matrix.org
- #element-web:matrix.org
- #ecips:matrix.org
- #ipfs-chatter:ipfs.io
- #python:matrix.org
- #matrix:matrix.org
Since Synapse 1.76.0, any module which registers a `on_new_event`
callback would brick the ability to join remote rooms.
This is because this callback tried to get the full state of the room,
which would end up in a deadlock.
Related:
https://github.com/matrix-org/synapse-auto-accept-invite/issues/18
The following module would brick the ability to join remote rooms:
```python
from typing import Any, Dict, Literal, Union
import logging
from synapse.module_api import ModuleApi, EventBase
logger = logging.getLogger(__name__)
class MyModule:
def __init__(self, config: None, api: ModuleApi):
self._api = api
self._config = config
self._api.register_third_party_rules_callbacks(
on_new_event=self.on_new_event,
)
async def on_new_event(self, event: EventBase, _state_map: Any) -> None:
logger.info(f"Received new event: {event}")
@staticmethod
def parse_config(_config: Dict[str, Any]) -> None:
return None
```
This is technically a breaking change, as we are now passing partial
state on the `on_new_event` callback.
However, this callback was broken for federated rooms since 1.76.0, and
local rooms have full state anyway, so it's unlikely that it would
change anything.
This adds a counter `synapse_emails_sent_total` for emails sent. They
are broken down by `type`, which are `password_reset`, `registration`,
`add_threepid`, `notification` (matching the methods of `Mailer`).
We do this by adding support to the LRU cache for "extra indices" based
on the cached value. This allows us to efficiently map from room ID to
the cached events and only invalidate those.
List of users not to send out device list updates for when they register
new devices. This is useful to handle bot accounts.
This is undocumented as its mostly a hack to test on matrix.org.
Note: This will still send out device list updates if the device is
later updated, e.g. end to end keys are added.
This basically reverts a change that was in
https://github.com/element-hq/synapse/pull/16833, where we reduced the
batching.
The smaller batching can cause performance issues on busy servers and
databases.
Partially reverts #16796
This is causing errors of the form:
```
Error: Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run
```
for the debs and wheels stages.
There were breaking changes that weren't included in the dependabot
changelog (:/):
https://github.com/actions/upload-artifact#breaking-changes
<!--
Fixes: # <!-- -->
<!--
Supersedes: # <!-- -->
<!--
Follows: # <!-- -->
<!--
Part of: # <!-- -->
Base: `release-v1.100` <!-- git-stack-base-branch:release-v1.100 -->
<!--
This pull request is commit-by-commit review friendly. <!-- -->
<!--
This pull request is intended for commit-by-commit review. <!-- -->
Original commit schedule, with full messages:
<ol>
<li>
Downgrade the `upload-artifact` and `download-artifact` actions to v3
</li>
</ol>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.
Pulled out of #16803 since the drive-by cleanup was maybe not as
drive-by as I had hoped.
<!--
Fixes: # <!-- -->
<!--
Supersedes: # <!-- -->
<!--
Follows: # <!-- -->
<!--
Part of: # <!-- -->
Base: `develop` <!-- git-stack-base-branch:develop -->
<!--
This pull request is commit-by-commit review friendly. <!-- -->
<!--
This pull request is intended for commit-by-commit review. <!-- -->
Original commit schedule, with full messages:
<ol>
<li>
Add a --generate-only option
</li>
</ol>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Prior to this PR, if a request to create a public (public as in
published to the rooms directory) room violated the room list
publication rules set in the
[config](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#room_list_publication_rules),
the request to create the room was denied and the room was not created.
This PR changes the behavior such that when a request to create a room
published to the directory violates room list publication rules, the
room is still created but the room is not published to the directory.
The current query supports passing in a list of users, which generates a
query using `user_id = ANY(..)`. This is generates a less efficient
query plan that is notably slower than a simple `user_id = ?` condition.
Note: The new function is mostly a copy and paste and then a
simplification of the existing function.
The crux of the change is to try and make the queries simpler and pull
out fewer rows. Before, there were quite a few joins against subqueries,
which caused postgres to pull out more rows than necessary.
Instead, let's simplify the query and do some of the filtering out in
Python instead, letting Postgres do better optimizations now that it
doesn't have to deal with joins against subqueries.
Review note: this is a complete rewrite of the function, so not sure how
useful the diff is.
---------
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
In previous versions of authlib using `client_secret_basic` without a
`client_secret` would result in an invalid auth header. Since authlib
1.3 it throws an exception.
The configuration may be accepted in by very lax servers, so we don't
want to deny it outright. Instead, let's default the
`client_auth_method` to `none`, which does the right thing. If the
config specifies `client_auth_method` and no `client_secret` then that
is going to be bogus and we should reject it
Sometimes we fail to fetch events during backfill due to missing state,
and we often end up querying the same bad events periodically (as people
backpaginate). In such cases its likely we will continue to fail to get
the state, and therefore we should try *before* loading the state that
we have from the DB (as otherwise it's wasted DB and memory).
---------
Co-authored-by: reivilibre <oliverw@matrix.org>
There are two changes here:
1. Only pull out the required state when handling the request.
2. Change the get filtered state return type to check that we're only
querying state that was requested
---------
Co-authored-by: reivilibre <oliverw@matrix.org>
Instead of persisting outliers in a bunch of batches, let's just do them
all at once.
This is fine because all `_auth_and_persist_outliers_inner` is doing is
checking the auth rules for each event, which requires the events to be
topologically sorted by the auth graph.
The idea here being that the directory server shouldn't advertise rooms
to a requesting server is the requesting server would not be allowed to
join or participate in the room.
<!--
Fixes: # <!-- -->
<!--
Supersedes: # <!-- -->
<!--
Follows: # <!-- -->
<!--
Part of: # <!-- -->
Base: `develop` <!-- git-stack-base-branch:develop -->
<!--
This pull request is commit-by-commit review friendly. <!-- -->
<!--
This pull request is intended for commit-by-commit review. <!-- -->
Original commit schedule, with full messages:
<ol>
<li>
Pass `from_federation_origin` down into room list retrieval code
</li>
<li>
Don't cache /publicRooms response for inbound federated requests
</li>
<li>
fixup! Don't cache /publicRooms response for inbound federated requests
</li>
<li>
Cap the number of /publicRooms entries to 100
</li>
<li>
Simplify code now that you can't request unlimited rooms
</li>
<li>
Filter out rooms from federated requests that don't have the correct ACL
</li>
<li>
Request a handful more when filtering ACLs so that we can try to avoid
shortchanging the requester
</li>
</ol>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
There are a couple of things we need to be careful of here:
1. The current python code does no validation when loading from the DB,
so we need to be careful to ignore such errors (at least on jki.re there
are some old events with internal metadata fields of the wrong type).
2. We want to be memory efficient, as we often have many hundreds of
thousands of events in the cache at a time.
---------
Co-authored-by: Quentin Gliech <quenting@element.io>
We remove these fields as they're just duplicating data the event
already stores, and (for reasons 🤫) I'd like to simplify
the class to only store simple types.
I'm not entirely convinced that we shouldn't instead add helper methods
to the event class to generate stream tokens, but I don't really think
that's where they belong either
This is an extra response parameter just added to MSC3981. In the
current impl, the recursion depth is always 3, so this just returns a
static 3 if the recurse parameter is supplied.
Previously, the response status of `HTMLResource` was hardcoded as
`200`. However, for proper redirection after the user verifies their
email, we require the status to be `302`. This PR addresses that issue
by using `code` as response status.
Added in https://github.com/matrix-org/synapse/pull/16533, this workflow
was intended to be run once to add the version picker to all historical
versions of the https://matrix-org.github.io/synapse documentation
website.
Note that the latest version of the docs built from this repo now exist
at https://element-hq.github.io/synapse/.
The workflow has been run successfully and the version picker was added
to the documentation. Thus we can now delete this workflow.
---
Note: Do not confuse this PR with
https://github.com/matrix-org/synapse/issues/9453. This PR was made
while we were populating this repo with "Dummy issues" after the
changeover from matrix-org/synapse to element-hq/synapse - therefore
referencing this PR may cause some confusion.
Closes:
- https://github.com/matrix-org/synapse/issues/10397
- #10397
An administrator should know whether he wants to set a password or not.
There are many uses cases where a blank password is required.
- Use of only some users with SSO.
- Use of bots with password, users with SSO
* Add `ALTER TABLE ... REPLICA IDENTITY ...` for individual tables
We can't combine them into one file as it makes it likely to hit a deadlock
if Synapse is running, as it only takes one other transaction to access two
tables in a different order to the schema delta.
* Add notes
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Re-introduce REPLICA IDENTITY test
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
poetry-core 1.8.x includes a fix which properly moves the generate
synapse_rust.abi3.so file to the synapse directory when using an
editable install.
Without this change developers are left with a confusing experience
of the synapse.synapse_rust module not being found after installation.
Implement MSC3860 to follow redirects for federated media downloads.
Note that the Client-Server API doesn't support this (yet) since the media
repository in Synapse doesn't have a way of supporting redirects.
* Describe `insert_client_ip`
* Pull out client_ips and MAU tracking to BaseAuth
* Define HAS_AUTHLIB once in tests
sick of copypasting
* Track ips and token usage when delegating auth
* Test that we track MAU and user_ips
* Don't track `__oidc_admin`
Keeping track of a lower bound of stream ID where we've deleted everything below makes the queries much faster. Otherwise, every time we scan for rows to delete we'd re-scan across all the rows that have previously deleted (until the next table VACUUM).
If a worker reconnects to Redis we send out the current positions of all our streams. However, if we're also trying to send out a backlog of RDATA at the same time then we can end up sending a `POSITION` with the current token *before* we've sent all the RDATA before the current token.
This doesn't cause actual bugs as the receiving servers see the POSITION, fetch the relevant rows from the DB, and then ignore the old RDATA as they come in. However, this is inefficient so it'd be better if we didn't send out-of-order positions
* Fix the CI query that did not detect all cases of missing primary keys
* Add more missing REPLICA IDENTITY entries
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Add Postgres replica identities to tables that don't have an implicit one
Fixes#16224
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Move the delta to version 83 as we missed the boat for 82
* Add a test that all tables have a REPLICA IDENTITY
* Extend the test to include when indices are deleted
* isort
* black
* Fully qualify `oid` as it is a 'hidden attribute' in Postgres 11
* Update tests/storage/test_database.py
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Add missed tables
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
If simple_{insert,upsert,update}_many_txn is called without any data
to modify then return instead of executing the query.
This matches the behavior of simple_{select,delete}_many_txn.
Fetch information needed for push rule evaluation in parallel.
Ideally this would use query pipelining, but this is not
available in psycopg2.
Due to the database thread pool this may result in little
to no parallelization.
Previously only Twisted's EPollReactor was compatible with the
reactor timing metric, notably not working when asyncio was used.
After this change, the following configurations support the reactor
timing metric:
* poll, epoll, or select reactors
* asyncio reactor with a poll, epoll, select, /dev/poll, or kqueue event loop.
The event persistence code used to handle multiple rooms
at a time, but was simplified to only ever be called with a
single room at a time (different rooms are now handled in
parallel). The code is still generic to multiple rooms causing
a lot of work that is unnecessary (e.g. unnecessary loops, and
partitioning data by room).
This strips out the ability to handle multiple rooms at once, greatly
simplifying the code.
Just to standardize on the normal helpers, it might also have
a slight perf improvement on PostgreSQL which will now use
`ANY (?)` instead of `IN (?, ?, ...)`.
* complement: enable dirty runs
* Add changelog
* Set a low connpool limit when running in Complement
Dirty runs can cause many containers to be running concurrently,
which seems to easily exhaust resources on the host. The increased
speedup from dirty runs also seems to use more db connections on
workers, which are misconfigured currently to have
`SUM(workers * cp_max) > max_connections`, causing
```
FATAL: sorry, too many clients already
```
which results in tests failing.
* Try p=2 concurrency to restrict slowness of servers which causes partial state join tests to flake
* Debug logging
* Only run flakey tests
* Only adjust connection pool limits in worker mode
* Move cp vars to somewhere where they get executed in CI
* Move cp values back to where they actually work
* Debug logging
* Try p=1 to see if this makes worker mode happier
* Remove debug logging
This is mostly useful for federated rooms where some users
would get stuck in the invite or knock state when the room
was purged from their homeserver.