* Remove `on_timeout_cancel` from `timeout_deferred`
The `on_timeout_cancel` param to `timeout_deferred` wasn't always called on a
timeout (in particular if the canceller raised an exception), so it was
unreliable. It was also only used in one place, and to be honest it's easier to
do what it does a different way.
* Fix handling of connection timeouts in outgoing http requests
Turns out that if we get a timeout during connection, then a different
exception is raised, which wasn't always handled correctly.
To fix it, catch the exception in SimpleHttpClient and turn it into a
RequestTimedOutError (which is already a documented exception).
Also add a description to RequestTimedOutError so that we can see which stage
it failed at.
* Fix incorrect handling of timeouts reading federation responses
This was trapping the wrong sort of TimeoutError, so was never being hit.
The effect was relatively minor, but we should fix this so that it does the
expected thing.
* Fix inconsistent handling of `timeout` param between methods
`get_json`, `put_json` and `delete_json` were applying a different timeout to
the response body to `post_json`; bring them in line and test.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
This table was created in #8034 (1.20.0). It references
`ui_auth_sessions`, which is ignored, so this one should be too.
Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
* Fix test_verify_json_objects_for_server_awaits_previous_requests
It turns out that this wasn't really testing what it thought it was testing
(in particular, `check_context` was turning failures into success, which was
making the tests pass even though it wasn't clear they should have been.
It was also somewhat overcomplex - we can test what it was trying to test
without mocking out perspectives servers.
* Fix warnings about finished logcontexts in the keyring
We need to make sure that we finish the key fetching magic before we run the
verifying code, to ensure that we don't mess up our logcontexts.
Co-authored-by: Benjamin Koch <bbbsnowball@gmail.com>
This adds configuration flags that will match a user to pre-existing users
when logging in via OpenID Connect. This is useful when switching to
an existing SSO system.
On startup `MultiWriteIdGenerator` fetches the maximum stream ID for
each instance from the table and uses that as its initial "current
position" for each writer. This is problematic as a) it involves either
a scan of events table or an index (neither of which is ideal), and b)
if rows are being persisted out of order elsewhere while the process
restarts then using the maximum stream ID is not correct. This could
theoretically lead to race conditions where e.g. events that are
persisted out of order are not sent down sync streams.
We fix this by creating a new table that tracks the current positions of
each writer to the stream, and update it each time we finish persisting
a new entry. This is a relatively small overhead when persisting events.
However for the cache invalidation stream this is a much bigger relative
overhead, so instead we note that for invalidation we don't actually
care about reliability over restarts (as there's no caches to
invalidate) and simply don't bother reading and writing to the new table
in that particular case.
The idea is to remove some of the places we pass around `int`, where it can represent one of two things:
1. the position of an event in the stream; or
2. a token that partitions the stream, used as part of the stream tokens.
The valid operations are then:
1. did a position happen before or after a token;
2. get all events that happened before or after a token; and
3. get all events between two tokens.
(Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)
I'd like to get a better insight into what we are doing with respect to state
res. The list of state groups we are resolving across should be short (if it
isn't, that's a massive problem in itself), so it should be fine to log it in
ite entiretly.
I've done some grepping and found approximately zero cases in which the
"shortcut" code delivered the result, so I've ripped that out too.
When updating the `room_stats_state` table, we try to check for null bytes slipping in to the content for state events. It turns out we had added `guest_access` as a field to room_stats_state without including it in the null byte check.
Lo and behold, a null byte in a `m.room.guest_access` event then breaks `room_stats_state` updates.
This PR adds the check for `guest_access`.
This change adds a note and a few lines of configuration settings for Apache users to disable ModSecurity for Synapse's virtual hosts. With ModSecurity enabled and running with its default settings, Matrix clients are unable to send chat messages through the Synapse installation. With this change, ModSecurity can be disabled only for the Synapse virtual hosts.
Fixes: #8359
Trying to reactivate a user with the admin API (`PUT /_synapse/admin/v2/users/<user_name>`) causes an internal server error.
Seems to be a regression in #8033.
* Create a new function to verify that the length of a device name is
under a certain threshold.
* Refactor old code and tests to use said function.
* Verify device name length during registration of device
* Add a test for the above
Signed-off-by: Dionysis Grigoropoulos <dgrig@erethon.com>
==============================
In addition to the below, Synapse 1.20.0rc5 also includes the bug fix that was included in 1.19.3.
Features
--------
- Add flags to the `/versions` endpoint for whether new rooms default to using E2EE. ([\#8343](https://github.com/matrix-org/synapse/issues/8343))
Bugfixes
--------
- Fix rate limiting of federation `/send` requests. ([\#8342](https://github.com/matrix-org/synapse/issues/8342))
- Fix a longstanding bug where back pagination over federation could get stuck if it failed to handle a received event. ([\#8349](https://github.com/matrix-org/synapse/issues/8349))
Internal Changes
----------------
- Blacklist [MSC2753](https://github.com/matrix-org/matrix-doc/pull/2753) SyTests until it is implemented. ([\#8285](https://github.com/matrix-org/synapse/issues/8285))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl9kzk8ACgkQM/xY9qcR
MEim6A//aERkhyLGRlGpLd37lCyFQCeffTMH1rTvu04iIBQBaUZ6g7CYWOpK43zT
U8kt379+5OShjdAXs/X4XP+ucdHVbrwsRSP3hBS/fFLiDT0fJgP8uiSf5QqO6NnT
OqDyXYjcXvj/c6tMKglVtsdh8u4hFwNZjGPMGG68IzJu14uEhnD100cL9jSB9bLB
ongWpsQzzdGBpJPSFRjv9dCUSeRbzyUdl1t0uqzrNqyN9s/JnzFTn7ZYo6y3lnSS
dHGVMMo/12M2PkbBHnbJVvDY5Q/R7ZxyXlpz0gvSNOQIw8FqYFnuB0Niy5dQhXSR
Sy5h4qbczLxqbql1x+lmzeQm4ZMORsW/Tl4C3z6yK6OYaOCJHIf9en4DplTSTqp1
t+85JxWR2wH10d99YHBpaYKmkVovpwgchrO4YWrtXljUFAhhavzf+YiAdOHYT52s
RDsDLsvjMbxEHsz4cHfycmshYhjzjb340wkoDXuQpj0zrO99d+Zd83xdK8pS0UQn
OaljLRAd/5iBjTSyZPSrB1U5141OzlM3QZVJzaYAnP12yhR9eaX2twSCk+lPYOWd
nhLJjNnj1B1XSGArthuE5NLyEiCPz6KyN2RhO0EOx5YjZN9TwH7LS9upyNFe1nN1
GIhO5gz+jWLuBZE3xzRNjJyCx/I/LolpCwGMvKDu6638rpsbrPs=
=tT5/
-----END PGP SIGNATURE-----
Merge tag 'v1.20.0rc5' into develop
Synapse 1.20.0rc5 (2020-09-18)
==============================
In addition to the below, Synapse 1.20.0rc5 also includes the bug fix that was included in 1.19.3.
Features
--------
- Add flags to the `/versions` endpoint for whether new rooms default to using E2EE. ([\#8343](https://github.com/matrix-org/synapse/issues/8343))
Bugfixes
--------
- Fix rate limiting of federation `/send` requests. ([\#8342](https://github.com/matrix-org/synapse/issues/8342))
- Fix a longstanding bug where back pagination over federation could get stuck if it failed to handle a received event. ([\#8349](https://github.com/matrix-org/synapse/issues/8349))
Internal Changes
----------------
- Blacklist [MSC2753](https://github.com/matrix-org/matrix-doc/pull/2753) SyTests until it is implemented. ([\#8285](https://github.com/matrix-org/synapse/issues/8285))
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Fix _set_destination_retry_timings
This came about because the code assumed that retry_interval
could not be NULL — which has been challenged by catch-up.
Add ability for ASes to /login using the `uk.half-shot.msc2778.login.application_service` login `type`.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Instead of just using the most recent extremities let's pick the
ones that will give us results that the pagination request cares about,
i.e. pick extremities only if they have a smaller depth than the
pagination token.
This is useful when we fail to backfill an extremity, as we no longer
get stuck requesting that same extremity repeatedly.
slots use less memory (and attribute access is faster) while slightly
limiting the flexibility of the class attributes. This focuses on objects
which are instantiated "often" and for short periods of time.
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
Some Linux distros have begun disabling TLSv1.0 and TLSv1.1 by default
for security reasons, for example in Fedora 33 onwards:
https://fedoraproject.org/wiki/Changes/StrongCryptoSettings2
Use TLSv1.2 for the fake TLS servers created in the test suite, to avoid
failures due to OpenSSL disallowing TLSv1.0:
<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines',
'ssl_choose_client_version', 'unsupported protocol')]>
Signed-off-by: Dan Callaghan <djc@djc.id.au>
This PR adds a information about forwarding `/_synapse/client` endpoints through your reverse proxy. The first of these endpoints are introduced in https://github.com/matrix-org/synapse/pull/8004.
The idea here is that we pass the `max_stream_id` to everything, and only use the stream ID of the particular event to figure out *when* the max stream position has caught up to the event and we can notify people about it.
This is to maintain the distinction between the position of an item in the stream (i.e. event A has stream ID 513) and a token that can be used to partition the stream (i.e. give me all events after stream ID 352). This distinction becomes important when the tokens are more complicated than a single number, which they will be once we start tracking the position of multiple writers in the tokens.
The valid operations here are:
1. Is a position before or after a token
2. Fetching all events between two tokens
3. Merging multiple tokens to get the "max", i.e. `C = max(A, B)` means that for all positions P where P is before A *or* before B, then P is before C.
Future PR will change the token type to a dedicated type.
This PR adds a confirmation step to resetting your user password between clicking the link in your email and your password actually being reset.
This is to better align our password reset flow with the industry standard of requiring a confirmation from the user after email validation.
If a file cannot be thumbnailed for some reason (e.g. the file is empty), then
catch the exception and convert it to a reasonable error message for the client.
`pusher_pool.on_new_notifications` expected a min and max stream ID, however that was not what we were passing in. Instead, let's just pass it the current max stream ID and have it track the last stream ID it got passed.
I believe that it mostly worked as we called the function for every event. However, it would break for events that got persisted out of order, i.e, that were persisted but the max stream ID wasn't incremented as not all preceding events had finished persisting, and push for that event would be delayed until another event got pushed to the effected users.
This fixes an issue where different methods (crop/scale) overwrite each other.
This first tries the new path. If that fails and we are looking for a
remote thumbnail, it tries the old path. If that still isn't found, it
continues as normal.
This should probably be removed in the future, after some of the newer
thumbnails were generated with the new path on most deployments. Then
the overhead should be minimal if the other thumbnails need to be
regenerated.
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
The intention here is to change `StreamToken.room_key` to be a `RoomStreamToken` in a future PR, but that is a big enough change without this refactoring too.
This is a config option ported over from DINUM's Sydent: https://github.com/matrix-org/sydent/pull/285
They've switched to validating 3PIDs via Synapse rather than Sydent, and would like to retain this functionality.
This original purpose for this change is phishing prevention. This solution could also potentially be replaced by a similar one to https://github.com/matrix-org/synapse/pull/8004, but across all `*/submit_token` endpoint.
This option may still be useful to enterprise even with that safeguard in place though, if they want to be absolutely sure that their employees don't follow links to other domains.
This removes `SourcePaginationConfig` and `get_pagination_rows`. The reasoning behind this is that these generic classes/functions erased the types of the IDs it used (i.e. instead of passing around `StreamToken` it'd pass in e.g. `token.room_key`, which don't have uniform types).
By importing from canonicaljson the simplejson module was still being used
in some situations. After this change the std lib json is consistenty used
throughout Synapse.
Fixes https://github.com/matrix-org/synapse/issues/8238
Alongside the delta file, some changes were also necessary to the codebase to remove references to the now defunct `populate_stats_process_rooms_2` background job. Thankfully the latter doesn't seem to have made it into any documentation yet :)
The version 1.3.0 has a bug with unicode charecters:
```
>>> from canonicaljson import encode_pretty_printed_json
>>> encode_pretty_printed_json({'a': 'à'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/erdnaxeli/.pyenv/versions/3.6.7/lib/python3.6/site-packages/canonicaljson.py", line 96, in encode_pretty_printed_json
return _pretty_encoder.encode(json_object).encode("ascii")
UnicodeEncodeError: 'ascii' codec can't encode character '\xe0' in position 12: ordinal not in range(128)
```
Signed-off-by: Alexandre Morignot <erdnaxeli@cervoi.se>
Co-authored-by: Alexandre Morignot <erdnaxeli@cervoi.se>
* Fixup `ALTER TABLE` database queries
Make the new columns nullable, because doing otherwise can wedge a
server with a big database, as setting a default value rewrites the
table.
* Switch back to using the notifications count in the push badge
Clients are likely to be confused if we send a push but the badge count
is the unread messages one, and not the notifications one.
* Changelog
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
* Add shared_rooms api
* Add changelog
* Add .
* Wrap response in {"rooms": }
* linting
* Add unstable_features key
* Remove options from isort that aren't part of 5.x
`-y` and `-rc` are now default behaviour and no longer exist.
`dont-skip` is no longer required
https://timothycrosley.github.io/isort/CHANGELOG/#500-penny-july-4-2020
* Update imports to make isort happy
* Add changelog
* Update tox.ini file with correct invocation
* fix linting again for isort
* Vendor prefix unstable API
* Fix to match spec
* import Codes
* import Codes
* Use FORBIDDEN
* Update changelog.d/7785.feature
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Implement get_shared_rooms_for_users
* a comma
* trailing whitespace
* Handle the easy feedback
* Switch to using runInteraction
* Add tests
* Feedback
* Seperate unstable endpoint from v2
* Add upgrade node
* a line
* Fix style by adding a blank line at EOF.
* Update synapse/storage/databases/main/user_directory.py
Co-authored-by: Tulir Asokan <tulir@maunium.net>
* Update synapse/storage/databases/main/user_directory.py
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Update UPGRADE.rst
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Fix UPGRADE/CHANGELOG unstable paths
unstable unstable unstable
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Tulir Asokan <tulir@maunium.net>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Tulir Asokan <tulir@maunium.net>
* Move `get_devices_with_keys_by_user` to `EndToEndKeyWorkerStore`
this seems a better fit for it.
This commit simply moves the existing code: no other changes at all.
* Rename `get_devices_with_keys_by_user`
to better reflect what it does.
* get_device_stream_token abstract method
To avoid referencing fields which are declared in the derived classes, make
`get_device_stream_token` abstract, and define that in the classes which define
`_device_list_id_gen`.
... and to show that it does something slightly different to
`_get_e2e_device_keys_txn`.
`include_all_devices` and `include_deleted_devices` were never used (and
`include_deleted_devices` was broken, since that would cause `None`s in the
result which were not handled in the loop below.
Add some typing too.
This fixes a bug where having multiple callers waiting on the same
stream and position will cause it to try and compare two deferreds,
which fails (due to the sorted list having an entry of `Tuple[int,
Deferred]`).
This is split out from https://github.com/matrix-org/synapse/pull/7438, which had gotten rather large.
`LoginRestServlet` has a couple helper methods, `login_submission_legacy_convert` and `login_id_thirdparty_from_phone`. They're primarily used for converting legacy user login submissions to "identifier" dicts ([see spec](https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-login)). Identifying information such as usernames or 3PID information used to be top-level in the login body. They're now supposed to be put inside an [identifier](https://matrix.org/docs/spec/client_server/r0.6.1#identifier-types) parameter instead.
#7438's purpose is to allow using the new identifier parameter during User-Interactive Authentication, which is currently handled in AuthHandler. That's why I've moved these helper methods there. I also moved the refactoring of these method from #7438 as they're relevant.
#8174 removed the `is_guest` parameter from `get_room_data`, at the same time that #8157 was merged using it, colliding together to break unit tests on develop.
This PR removes the `is_guest` parameter from the call in the broken test.
Uses the same changelog as #8174.
Small cleanup PR.
* Removed the unused `is_guest` argument
* Added a safeguard to a (currently) impossible code path, fixing static checking at the same time.
Add new method ratelimiter.can_requester_do_action and ensure that appservices are exempt from being ratelimited.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
* Don't raise session_id errors on submit_token if request_token_inhibit_3pid_errors is set
* Changelog
* Also wait some time before responding to /requestToken
* Incorporate review
* Update synapse/storage/databases/main/registration.py
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Incorporate review
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Add new method ratelimiter.can_requester_do_action and ensure that appservices are exempt from being ratelimited.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
It's just a thin wrapper around two ID gens to make `get_current_token`
and `get_next` return tuples. This can easily be replaced by calling the
appropriate methods on the underlying ID gens directly.
The function is used for two purposes: 1) for subscribers of streams to
get a token they can use to get further updates with, and 2) for
replication to track position of the writers of the stream.
For streams with a single writer the two scenarios produce the same
result, however the situation becomes complicated for streams with
multiple writers. The current `MultiWriterIdGenerator` does not
correctly handle the first case (which is not an issue as its only used
for the `caches` stream which nothing subscribes to outside of
replication).
If we got an error persisting an event, we would try to remove the push actions
asynchronously, which would lead to a 'Re-starting finished log context'
warning.
I don't think there's any need for this to be asynchronous.
We do this to prevent foot guns. The default config uses a MemoryFilter,
but users are free to change to logging to files directly. If they do
then they have to ensure to set the `filters: [context]` on the right
handler, otherwise records get written with the wrong context.
Instead we move the logic to happen when we generate a record, which is
when we *log* rather than *handle*.
(It's possible to add filters to loggers in the config, however they
don't apply to descendant loggers and so they have to be manually set on
*every* logger used in the code base)
c.f. #8021
A lot of the code here is to change the `Completed 200 OK` logging to include the request URI so that we can drop the `Sending request...` log line.
Some notes:
1. We won't log retries, which may be confusing considering the time taken log line includes retries and sleeps.
2. The `_send_request_with_optional_trailing_slash` will always be logged *without* the forward slash, even if it succeeded only with the forward slash.
* Change default log config to buffer by default.
This batches up writes to the filesystem, which is more efficient for
disk I/O. This means that it can take some time for logs to get written
to disk. Note that ERROR logs (and above) immediately flush the buffer.
This only effects new installs, as we only write the log config if
started with `--generate-config` (in the same way we do for generating
signing keys).
* Default to keeping last 4 days of logs.
This hopefully reduces the amount of logs kept for new servers. Keeping
the last 1GB of logs is likely overkill for new servers, but equally may
not be enough for busy ones.
Instead, we keep the last four days worth of logs, enough so that admins
can investigate any problems that happened over e.g. a long weekend.
This PR:
* Reduces the amount of noise in the `check-newsfragment` CI output by hiding the dependency installation output by default.
* Prints a link to the changelog/debian changelog section of the contributing guide if an error is found.
This has long been something I've wanted to do. Basically the `Daemonize` code
is both too flexible and not flexible enough, in that it offers a bunch of
features that we don't use (changing UID, closing FDs in the child, logging to
syslog) and doesn't offer a bunch that we could do with (redirecting stdout/err
to a file instead of /dev/null; having the parent not exit until the child is
running).
As a first step, I've lifted the Daemonize code and removed the bits we don't
use. This should be a non-functional change. Fixing everything else will come
later.
We've [decided](https://github.com/matrix-org/synapse/issues/5253#issuecomment-665976308) to remove the signature check for v1 lookups.
The signature check has been removed in v2 lookups. v1 lookups are currently deprecated. As mentioned in the above linked issue, this verification was causing deployments for the vector.im and matrix.org IS deployments, and this change is the simplest solution, without being unjustified.
Implementations are encouraged to use the v2 lookup API as it has [increased privacy benefits](https://github.com/matrix-org/matrix-doc/pull/2134).
`StatsHandler` handles updates to the `current_state_delta_stream`, and updates room stats such as the amount of state events, joined users, etc.
However, it counts every new join membership as a new user entering a room (and that user being in another room), whereas it's possible for a user's membership status to go from join -> join, for instance when they change their per-room profile information.
This PR adds a check for join->join membership transitions, and bails out early, as none of the further checks are necessary at that point.
Due to this bug, membership stats in many rooms have ended up being wildly larger than their true values. I am not sure if we also want to include a migration step which recalculates these statistics (possibly using the `_populate_stats_process_rooms` bg update).
Bug introduced in the initial implementation https://github.com/matrix-org/synapse/pull/4338.
Thanks to some slightly overzealous cleanup in the
`delete_old_current_state_events`, it's possible to end up with no
`event_forward_extremities` in a room where we have outstanding local
invites. The user would then get a "no create event in auth events" when trying
to reject the invite.
We can hack around it by using the dangling invite as the prev event.
If there are *no* files with CRLF line endings, then the xargs exits with a
non-zero exit code (as expected), but then, since that is the last thing to
happen in the script, the script as a whole exits non-zero, making the whole
thing fail.
using `if/then/fi` instead of `&& (...)` means that the script exits with a
zero exit code.
IIRC this doesn't break tests because its only hit on reconnection, or something.
Basically, when a process needs to fetch missing updates for the `typing` stream it needs to query the writer instance via HTTP (as we don't write typing notifications to the DB), the problem was that the endpoint (`streams`) was only registered on master and specifically not on the typing writer worker.
Most of the stuff we do for replication commands can be done synchronously. There's no point spinning up background processes if we're not going to need them.
Handling of incoming typing stream updates from replication was not
hooked up on master, effecting set ups where typing was handled on a
different worker.
This is really only a problem if the master process is also handling
sync requests, which is unlikely for those that are at the stage of
moving typing off.
The other observable effect is that if a worker restarts or a
replication connect drops then the typing worker will issue a
`POSITION typing`, triggering master process to try and stream *all*
typing updates from position 0.
Fixes#7907
Converts tests/rest/admin/test_room.py to have unix file endings after they were accidentally changed in #7613.
Keeping the same changelog as #7613 as it hasn't gone out in a release yet.
If we send out an event which refers to `prev_events` which other servers in
the federation are missing, then (after a round or two of backfill attempts),
they will end up asking us for `/state_ids` at a particular point in the DAG.
As per https://github.com/matrix-org/synapse/issues/7893, this is quite
expensive, and we tend to see lots of very similar requests around the same
time.
We can therefore handle this much more efficiently by using a cache, which (a)
ensures that if we see the same request from multiple servers (or even the same
server, multiple times), then they share the result, and (b) any other servers
that miss the initial excitement can also benefit from the work.
[It's interesting to note that `/state` has a cache for exactly this
reason. `/state` is now essentially unused and replaced with `/state_ids`, but
evidently when we replaced it we forgot to add a cache to the new endpoint.]
For inbound federation requests, if a given remote server makes too many
requests at once, we start stacking them up rather than processing them
immediatedly.
However, that means that there is a fair chance that the requesting server will
disconnect before we start processing the request. In that case, if it was a
read-only request (ie, a GET request), there is absolutely no point in
building a response (and some requests are quite expensive to handle).
Even in the case of a POST request, one of two things will happen:
* Most likely, the requesting server will retry the request and we'll get the
information anyway.
* Even if it doesn't, the requesting server has to assume that we didn't get
the memo, and act accordingly.
In short, we're better off aborting the request at this point rather than
ploughing on with what might be a quite expensive request.
Run `isort`, `flake8` and `black` over the `contrib/` directory and `synctl` script. The latter was already being done in CI, but now the linting script does it too.
Fixes https://github.com/matrix-org/synapse/issues/7910
The [postgres setup docs](https://github.com/matrix-org/synapse/blob/develop/docs/postgres.md#set-up-database) recommend setting up your database with user `synapse_user`.
However, uncommenting the postgres defaults in the sample config leave you with user `synapse`.
This PR switches the sample config to recommend `synapse_user`. Took a me a second to figure this out, so assume this will beneficial to others.
As mentioned in #7397, switching to a debian base should help with multi-arch work to save time on compiling. This is unashamedly based on #6373, but without the extra functionality. Switch python version back to generic 3.7 to always pull the latest. Essentially, keeping this as small as possible. The image is bigger though unfortunately.
It serves no purpose and updating everytime we write to the device inbox
stream means all such transactions will conflict, causing lots of
transaction failures and retries.
When we get behind on replication, we tend to stack up background processes
behind a linearizer. Bg processes are heavy (particularly with respect to
prometheus metrics) and linearizers aren't terribly efficient once the queue
gets long either.
A better approach is to maintain a queue of requests to be processed, and
nominate a single process to work its way through the queue.
Fixes: #7444
When considering rooms to clean up in `delete_old_current_state_events`, skip
rooms which we are creating, which otherwise look a bit like rooms we have
left.
Fixes#7834.
As far as I can tell from the sentry logs, the only time this has actually done
anything in the last two years is when we had two master workers running at
once, and even then, it made a bit of a mess of it (see
https://github.com/matrix-org/synapse/issues/7845#issuecomment-658238739).
Generally I feel like this code is doing more harm than good.
The replication client requires that arguments are given as keyword
arguments, which was not done in this case. We also pull out the logic
so that we can catch and handle any exceptions raised, rather than
leaving them unhandled.
When fetching the state of a room over federation we receive the event
IDs of the state and auth chain. We then fetch those events that we
don't already have.
However, we used a function that recursively fetched any missing auth
events for the fetched events, which can lead to a lot of recursion if
the server is missing most of the auth chain. This work is entirely
pointless because would have queued up the missing events in the auth
chain to be fetched already.
Let's just diable the recursion, since it only gets called from one
place anyway.
Fixes#2181.
The basic premise is that, when we
fail to reject an invite via the remote server, we can generate our own
out-of-band leave event and persist it as an outlier, so that we have something
to send to the client.
* Starting with apt 1.6, https support has moved into the main package and apt-transport-https has become a transitional dummy package.
Signed-off-by: Dirk Heinrichs <dirk.heinrichs@altum.de>
This table is no longer used, so we may as well stop populating it. Removing it
would prevent people rolling back to older releases of Synapse, so that can
happen in a future release.
* Fix spec compliance; tweaks without values are valid
(default to True, which is only concretely specified for
`highlight`, but it seems only reasonable to generalise)
* Changelog for 7766.
* Add documentation to `tweaks_for_actions`
May as well tidy up when I'm here.
* Add a test for `tweaks_for_actions`
Fixes https://github.com/matrix-org/synapse/issues/7641
The package was pinned to <0.8.0 without an obvious reasoning with
7ad1d7635
in https://github.com/matrix-org/synapse/pull/5636
while the version selection looks to just try to exclude an arbitrary
next minor version number that might introduce API breaking changes.
Selecting the next minor number might be a good conservative selection.
Downstream distributions already reported success patching out the version
requirements.
This also fixes the integration of upgraded packages into openSUSE packages,
e.g. for openSUSE Tumbleweed which already ships prometheus_client >= 0.8 .
Signed-off-by: Oliver Kurz <okurz@suse.de>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.
==============================
Synapse 1.16.0rc2 includes the security fixes released with Synapse 1.15.2.
Please see [below](https://github.com/matrix-org/synapse/blob/master/CHANGES.md#synapse-1152-2020-07-02) for more details.
Improved Documentation
----------------------
- Update postgres image in example `docker-compose.yaml` to tag `12-alpine`. ([\#7696](https://github.com/matrix-org/synapse/issues/7696))
Internal Changes
----------------
- Add some metrics for inbound and outbound federation latencies: `synapse_federation_server_pdu_process_time` and `synapse_event_processing_lag_by_event`. ([\#7771](https://github.com/matrix-org/synapse/issues/7771))
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEF3tZXk38tRDFVnUIM/xY9qcRMEgFAl79+qgACgkQM/xY9qcR
MEhcaRAAjWLW3ojN1F0DUfE85jziZK2VdnMQC3g+uEOLX6QRbfqFNaNNMjLdK+vl
K/+2ZoHkRsg6g8noSPhPmI1z1+hb5xDJaxjltzHxonIipW8XSU8o2PQMkf8O/BAy
VS58y3GyLkhEgzWC+/hcII+LBgcqXpLuNM0xrKTHmxclIjdewlwe1v+hxkP+6wsX
9Whhn1f4sNHrCtyFVK9uzMFcVyzcQaiWZRjEDMj2uR7rWT6UbCUifN/G4fWmtGbY
xWoNoC4Qv8xiqXOG4U7juPp9T3bRyWMKyjBFM5PWO6Ec2zfafDyFzhBxJhlQhODG
g21tS4PowX/dM/pBpJFEOPh1BVrPZzzTD+YMmTcd3NO79HeaQGqEX/+tzFCFUyPp
0daJK3Y85+l5w/M09WU8DDN8CiR3PFJyGDIZp+nweMsiJZkbEbLOkh1tx6TL+5/6
zwewU6cq8nTVGrn53Tn58l8C7Sj4w+Qk+1XDzymAoidyoWqAKW9Y/fw53PaViUSx
voDu0rpsEUXR1OzCBG8SAPQCFy9gdEWV04OvIpzHuq2uojkz66f7NAXy+Wz+Occ9
AYb/s6Ei80bGCLgRd5jg+myqavwRbzCyv+LIC6dxpopxZJ3AzrFuD11eXKtrIxOC
FZYf3U4KeBk4Q9TV5IFV1xcGFrq5aK36LdmP6WOsEl3PXVT9p/Q=
=YaJn
-----END PGP SIGNATURE-----
Merge tag 'v1.16.0rc2' into develop
Synapse 1.16.0rc2 (2020-07-02)
==============================
Synapse 1.16.0rc2 includes the security fixes released with Synapse 1.15.2.
Please see [below](https://github.com/matrix-org/synapse/blob/master/CHANGES.md#synapse-1152-2020-07-02) for more details.
Improved Documentation
----------------------
- Update postgres image in example `docker-compose.yaml` to tag `12-alpine`. ([\#7696](https://github.com/matrix-org/synapse/issues/7696))
Internal Changes
----------------
- Add some metrics for inbound and outbound federation latencies: `synapse_federation_server_pdu_process_time` and `synapse_event_processing_lag_by_event`. ([\#7771](https://github.com/matrix-org/synapse/issues/7771))
- Remove the requirement for a specific version of Python
- Move dep comment to a separate line, Tox 3.7.0 like trailing ones
Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
State res v2 across large data sets can be very CPU intensive, and if
all the relevant events are in the cache the algorithm will run from
start to finish within a single reactor tick. This can result in
blocking the reactor tick for several seconds, which can have major
repercussions on other requests.
To fix this we simply add the occaisonal `sleep(0)` during iterations to
yield execution until the next reactor tick. The aim is to only do this
for large data sets so that we don't impact otherwise quick resolutions.=
HTTP requires the response to contain a Content-Length header unless chunked encoding is being used.
Prometheus metrics endpoint did not set this, causing software such as prometheus-proxy to not be able to scrape synapse for metrics.
Signed-off-by: Christian Svensson <blue@cmd.nu>
Older versions of `parameterized` package have no `parameterized_class` decorator. This decorator is used in tests.
Signed-off-by: Oleg Girko <ol@infoserver.lv>
* Always return an unread_count in get_unread_event_push_actions_by_room_for_user
* Don't always expect unread_count to be there so we don't take out sync entirely if something goes wrong
This requires a new config option to specify which media repo should be
responsible for running background jobs to e.g. clear out expired URL
preview caches.
The aim here is to make it easier to reason about when streams are limited and when they're not, by moving the logic into the database functions themselves. This should mean we can kill of `db_query_to_update_function` function.
This ended up being a bit more invasive than I'd hoped for (not helped by
generic_worker duplicating some of the code from homeserver), but hopefully
it's an improvement.
The idea is that, rather than storing unstructured `dict`s in the config for
the listener configurations, we instead parse it into a structured
`ListenerConfig` object.
Fixes https://github.com/matrix-org/synapse/issues/7683
Broke in: #7649
We had a `yield` acting on a coroutine. To be fair this one is a bit difficult to notice as there's a function in the middle that just passes the coroutine along.
The spec [states](https://matrix.org/docs/spec/client_server/r0.6.1#phone-number) that `m.id.phone` requires the field `country` and `phone`.
In Synapse, we've been enforcing `country` and `number`.
I am not currently sure whether this affects any client implementations.
This issue was introduced in #1994.
Fixes https://github.com/matrix-org/synapse/issues/2431
Adds config option `encryption_enabled_by_default_for_room_type`, which determines whether encryption should be enabled with the default encryption algorithm in private or public rooms upon creation. Whether the room is private or public is decided based upon the room creation preset that is used.
Part of this PR is also pulling out all of the individual instances of `m.megolm.v1.aes-sha2` into a constant variable to eliminate typos ala https://github.com/matrix-org/synapse/pull/7637
Based on #7637
* Ensure account data stream IDs are unique.
The account data stream is shared between three tables, and the maximum
allocated ID was tracked in a dedicated table. Updating the max ID
happened outside the transaction that allocated the ID, leading to a
race where if the server was restarted then the same ID could be
allocated but the max ID failed to be updated, leading it to be reused.
The ID generators have support for tracking across multiple tables, so
we may as well use that instead of a dedicated table.
* Fix bug in account data replication stream.
If the same stream ID was used in both global and room account data then
the getting updates for the replication stream would fail due to
`heapq.merge(..)` trying to compare a `str` with a `None`. (This is
because you'd have two rows like `(534, '!room')` and `(534, None)` from
the room and global account data tables).
Fix is just to order by stream ID, since we don't rely on the ordering
beyond that. The bug where stream IDs can be reused should be fixed now,
so this case shouldn't happen going forward.
Fixes#7617
While working on https://github.com/matrix-org/synapse/issues/5665 I found myself digging into the `Ratelimiter` class and seeing that it was both:
* Rather undocumented, and
* causing a *lot* of config checks
This PR attempts to refactor and comment the `Ratelimiter` class, as well as encourage config file accesses to only be done at instantiation.
Best to be reviewed commit-by-commit.
Calls `self.get_success` on all deferred methods instead of abusing `self.pump()`. This has the benefit of working with coroutines, as well as checking that method execution completed successfully.
There are also a few small cleanups that I made in the process.
* Expose `return_html_error`, and allow it to take a Jinja2 template instead of a raw string
* Clean up exception handling in SAML2ResponseResource
* use the existing code in `return_html_error` instead of re-implementing it
(giving it a jinja2 template rather than inventing a new form of template)
* do the exception-catching in the REST layer rather than in the handler
layer, to make sure we catch all exceptions.
It looks like `user_device_resync` was ignoring cross-signing keys from the results received from the remote server. This patch fixes this, by processing these keys using the same process `_handle_signing_key_updates` does (and effectively factor that part out of that function).
The query keeps showing up in my slow query log.
This changes the plan under the top-level Sort node from
```
WindowAgg (cost=280335.88..292963.15 rows=561212 width=80) (actual time=138.651..160.562 rows=27112 loops=1)
-> Sort (cost=280335.88..281738.91 rows=561212 width=84) (actual time=138.597..140.622 rows=27112 loops=1)
Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group
Sort Method: quicksort Memory: 4581kB
-> Nested Loop (cost=2.83..226745.22 rows=561212 width=84) (actual time=21.548..47.657 rows=27112 loops=1)
-> HashAggregate (cost=2.27..3.28 rows=101 width=8) (actual time=21.526..21.535 rows=20 loops=1)
Group Key: state.state_group
-> CTE Scan on state (cost=0.00..2.02 rows=101 width=8) (actual time=21.280..21.493 rows=20 loops=1)
-> Index Scan using state_groups_state_type_idx on state_groups_state (cost=0.56..2189.40 rows=5557 width=84) (actual time=0.005..0.991 rows=1356 loops=20)
Index Cond: (state_group = state.state_group)
```
to
```
Nested Loop (cost=2.83..226745.22 rows=561212 width=84) (actual time=24.194..52.834 rows=27112 loops=1)
-> HashAggregate (cost=2.27..3.28 rows=101 width=8) (actual time=24.130..24.138 rows=20 loops=1)
Group Key: state.state_group
-> CTE Scan on state (cost=0.00..2.02 rows=101 width=8) (actual time=23.887..24.113 rows=20 loops=1)
-> Index Scan using state_groups_state_type_idx on state_groups_state (cost=0.56..2189.40 rows=5557 width=84) (actual time=0.016..1.159 rows=1356 loops=20)
Index Cond: (state_group = state.state_group)
```
This cuts the execution time from ~190ms to ~130ms, i.e. a reduction
of ~30%.
The full plans are visualised at https://explain.depesz.com/s/WpbT and
https://explain.depesz.com/s/KlEk
Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Without this patch, if an error happens which isn't caught by `user_device_resync`, then `_maybe_retry_device_resync` would fail, without retrying the next users in the iteration. This patch fixes this so that it now only logs an error in this case.
Synapse was added to the ports tree in Nov, 2019 by Renaud Allard (https://marc.info/?l=openbsd-ports&m=157417848805329).
With the release of OpenBSD 6.7 on May 22, 2020 a pre-compiled binary is available as well.
'client_auth_method' commented out value was erronously 'client_auth_basic',
when code and docstring says it should be 'client_secret_basic'.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
The bg update never managed to complete, because it kept being interrupted by
transactions which want to take a lock.
Just doing it in the foreground isn't that bad, and is a good deal simpler.
A couple of changes of significance:
* remove the `_last_ack < federation_position` condition, so that
updates will still be correctly processed after restart
* Correctly wire up send_federation_ack to the right class.
we can use `make_in_list_sql_clause` rather than doing our own half-baked
equivalent, which has the benefit of working just fine with empty lists.
(This has quite a lot of tests, so I think it's pretty safe)
The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room).
Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on.
People probably want to look at this commit by commit.
Instead of doing a complicated dance of deleting and moving aliases one
by one, which sends a canonical alias update into the old room for each
one, lets do it all in one go.
This also changes the function to move *all* local alias events to the new
room, however that happens later on anyway.
PyPy's gc.get_stats() returns an object containing detailed allocator statistics
which could be beneficial to collect as metrics.
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
When a call to `user_device_resync` fails, we don't currently mark the remote user's device list as out of sync, nor do we retry to sync it.
https://github.com/matrix-org/synapse/pull/6776 introduced some code infrastructure to mark device lists as stale/out of sync.
This commit uses that code infrastructure to mark device lists as out of sync if processing an incoming device list update makes the device handler realise that the device list is out of sync, but we can't resync right now.
It also adds a looping call to retry all failed resync every 30s. This shouldn't cause too much spam in the logs as this commit also removes the "Failed to handle device list update for..." warning logs when catching `NotRetryingDestination`.
Fixes#7418
We don't really make any promises about returning accurate presence data when
presence is disabled, so we may as well just return a static response, rather
than making the master handle a request.
`_is_server_still_joined` will throw if it is given state updates with non-user ID state keys with local user leaves. This is actually rarely a problem since local leaves almost always get persisted by themselves.
(I discovered this on a branch that was otherwise broken, so I haven't seen this in the wild)
* If an error occurs when stopping a process synctl now logs a warning.
* During a restart, synctl will avoid attempting to start Synapse if an error
occurs during stopping Synapse.
Make sure that the AccountDataStream presents complete updates, in the right
order.
This is much the same fix as #7337 and #7358, but applied to a different stream.
This is required as both event persistence and the background update needs access to this function. It should be perfectly safe for two workers to write to that table at the same time.
This allows us to have the logic on both master and workers, which is necessary to move event persistence off master.
We also combine the instantiation of ID generators from DataStore and slave stores to the base worker stores. This allows us to select which process writes events independently of the master/worker splits.
The specific headers that are passed using this new configuration format
are Host and X-Forwarded-For, which should be all that's required.
Note that for production another matcher should be added in the first
section to properly handle the base_url lookup:
reverse_proxy /.well-known/matrix/* http://localhost:8008
Signed-off-by: Jeff Peeler <jpeeler@gmail.com>
The aim here is to get to a stage where we have a `PersistEventStore` that holds all the write methods used during event persistence, so that we can take that class out of the `DataStore` mixin and instansiate it separately. This will allow us to instansiate it on processes other than master, while also ensuring it is only available on processes that are configured to write to events stream.
This is a bit of an architectural change, where we end up with multiple classes per data store (rather than one per data store we have now). We end up having:
1. Storage classes that provide high level APIs that can talk to multiple data stores.
2. Data store modules that consist of classes that must point at the same database instance.
3. Classes in a data store that can be instantiated on processes depending on config.
Before all streams were only written to from master, so only master needed to respond to `REPLICATE` commands.
Before all instances wrote to the cache invalidation stream, but didn't respond to `REPLICATE`. This was a bug, which could lead to missed rows from cache invalidation stream if an instance is restarted, however all the caches would be empty in that case so it wasn't a problem.
Proactively send out `POSITION` commands (as if we had just received a `REPLICATE`) when we connect to Redis. This is important as other instances won't notice we've connected to issue a `REPLICATE` command (unlike for direct TCP connections). This is only currently an issue if master process reconnects without restarting (if it restarts then it won't have written anything and so other instances probably won't have missed anything).
* release-v1.13.0:
Don't UPGRADE database rows
RST indenting
Put rollback instructions in upgrade notes
Fix changelog typo
Oh yeah, RST
Absolute URL it is then
Fix upgrade notes link
Provide summary of upgrade issues in changelog. Fix )
Move next version notes from changelog to upgrade notes
Changelog fixes
1.13.0rc1
Documentation on setting up redis (#7446)
Rework UI Auth session validation for registration (#7455)
Fix errors from malformed log line (#7454)
Drop support for redis.dbid (#7450)
For the record, the reason we need this is as follows:
each RDATA command comes down the redis pipe as a subscription message. txredisapi as written needs at least three reactor ticks to read each subscription message from the tcp buffer. Hence, once the process gets loaded, it starts getting behind, and eventually redis knifes the connection. it then takes ages for the master to work its way through the backlog, before it reconnects again, during which any commands from any workers are dropped.
An update of check-manifest shone some light on some issues with MANIFEST.in, specifically that we didn't ignore/prune the contrib directory, and that we were using prune instead of exclude for files. This fixes both issues.
Fixes#7403
We forgot to set the password on the subscriber connection, as well as
not calling super methods for overridden connectionMade/connectionLost
functions.
For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.
We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.
By persisting the user interactive authentication sessions to the database, this fixes
situations where a user hits different works throughout their auth session and also
allows sessions to persist through restarts of Synapse.
This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.
For direct TCP connections we need the master to relay REMOTE_SERVER_UP
commands to the other connections so that all instances get notified
about it. The old implementation just relayed to all connections,
assuming that sending back to the original sender of the command was
safe. This is not true for redis, where commands sent get echoed back to
the sender, which was causing master to effectively infinite loop
sending and then re-receiving REMOTE_SERVER_UP commands that it sent.
The fix is to ensure that we only relay to *other* connections and not
to the connection we received the notification from.
Fixes#7334.
* Factor out functions for injecting events into database
I want to add some more flexibility to the tools for injecting events into the
database, and I don't want to clutter up HomeserverTestCase with them, so let's
factor them out to a new file.
* Rework TestReplicationDataHandler
This wasn't very easy to work with: the mock wrapping was largely superfluous,
and it's useful to be able to inspect the received rows, and clear out the
received list.
* Fix AssertionErrors being thrown by EventsStream
Part of the problem was that there was an off-by-one error in the assertion,
but also the limit logic was too simple. Fix it all up and add some tests.
Specifically some tests for the typing stream, which means we test streams that fetch missing updates via HTTP (rather than via the DB).
We also shuffle things around a bit so that we create two separate `HomeServer` objects, rather than trying to insert a slaved store into places.
Note: `test_typing.py` is heavily inspired by `test_receipts.py`
It doesn't seem to be documented anywhere and means that you suddenly start losing metrics without any obvious reason when you go from monolith to workers (e.g. #7312).
If the admin adds a `.yaml` file that's either empty or doesn't parse into a dict to a config directory (e.g. `conf.d` for debs installs), stuff like https://github.com/matrix-org/synapse/issues/7322 would happen. This PR checks that the file is correctly parsed into a dict, or ignores it with a warning if it parses into any other type (including `None` for empty files).
Fixes https://github.com/matrix-org/synapse/issues/7322
Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.
In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.
Fixes (I hope) #7257.
==============================
Features
--------
- Always send users their own device updates. ([\#7160](https://github.com/matrix-org/synapse/issues/7160))
- Add support for handling GET requests for `account_data` on a worker. ([\#7311](https://github.com/matrix-org/synapse/issues/7311))
Bugfixes
--------
- Fix a bug that prevented cross-signing with users on worker-mode synapses. ([\#7255](https://github.com/matrix-org/synapse/issues/7255))
- Do not treat display names as globs in push rules. ([\#7271](https://github.com/matrix-org/synapse/issues/7271))
- Fix a bug with cross-signing devices belonging to remote users who did not share a room with any user on the local homeserver. ([\#7289](https://github.com/matrix-org/synapse/issues/7289))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl6gS+IACgkQOSor00I9
eP8CTwf+KLehHQMcrz0yoHN+/O86mpNv3Uz83BJ0qDIgezVbtqT7+ZwnK2Jb8dwM
zTcGRq5a6V+JLdfSTnXfrPzvAPt0WLaEENSAEGwZ6unqqxcG2PgATkRb0XtY8Rzh
Feu8WgJTnORbiiM3sBLI8toZMxal3X3ZcftPEAan1SaoyqLPQ2s14edDwgx64S5e
rBGzPDNSxZU1McNBq7Q11QTg3zgBUF+TYwyhnLbC5X1RdDwQpKhNZhwhYBS1yOD6
TsPjtlixC+HXb21Rmob7XY8VFxTBSBdUdtVQKAT4h3+l9bNFQLpBfDSVc7xkPiLt
/Gj+PL+YgY9xQDvHO2fVw8m4ololZw==
=p53e
-----END PGP SIGNATURE-----
Merge tag 'v1.12.4rc1' into develop
Synapse 1.12.4rc1 (2020-04-22)
==============================
Features
--------
- Always send users their own device updates. ([\#7160](https://github.com/matrix-org/synapse/issues/7160))
- Add support for handling GET requests for `account_data` on a worker. ([\#7311](https://github.com/matrix-org/synapse/issues/7311))
Bugfixes
--------
- Fix a bug that prevented cross-signing with users on worker-mode synapses. ([\#7255](https://github.com/matrix-org/synapse/issues/7255))
- Do not treat display names as globs in push rules. ([\#7271](https://github.com/matrix-org/synapse/issues/7271))
- Fix a bug with cross-signing devices belonging to remote users who did not share a room with any user on the local homeserver. ([\#7289](https://github.com/matrix-org/synapse/issues/7289))
First some background: StreamChangeCache is used to keep track of what "entities" have
changed since a given stream ID. So for example, we might use it to keep track of when the last
to-device message for a given user was received [1], and hence whether we need to pull any to-device messages from the database on a sync [2].
Now, it turns out that StreamChangeCache didn't support more than one thing being changed at
a given stream_id (this was part of the problem with #7206). However, it's entirely valid to send
to-device messages to more than one user at a time.
As it turns out, this did in fact work, because *some* methods of StreamChangeCache coped
ok with having multiple things changing on the same stream ID, and it seems we never actually
use the methods which don't work on the stream change caches where we allow multiple
changes at the same stream ID. But that feels horribly fragile, hence: let's update
StreamChangeCache to properly support this, and add some typing and some more tests while
we're at it.
[1]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L301
[2]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L47-L51
Splitting based on the response code means we can avoid double logging here and identical information from line 164 while still logging at info if we don't get a good response and need to retry.
Other parts of the code (such as the StreamChangeCache) assume that there will
not be multiple changes with the same stream id.
This code was introduced in #7024, and I hope this fixes#7206.
Add changelog
Save retrieved keys to the db
lint
Fix and de-brittle remote result dict processing
Use query_user_devices instead, assume only master, self_signing key types
Make changelog more useful
Remove very specific exception handling
Wrap get_verify_key_from_cross_signing_key in a try/except
Note that _get_e2e_cross_signing_verify_key can raise a SynapseError
lint
Add comment explaining why this is useful
Only fetch master and self_signing key types
Fix log statements, docstrings
Remove extraneous items from remote query try/except
lint
Factor key retrieval out into a separate function
Send device updates, modeled after SigningKeyEduUpdater._handle_signing_key_updates
Update method docstring
The general idea here is to get rid of the type: ignore annotations on all of the current_token and update_function assignments, which would have caught #7290.
After a bit of experimentation, it seems like the least-awful way to do this is to pass the offending functions in as parameters to the Stream constructor. Unfortunately that means that the concrete implementations no longer have the same constructor signature as Stream itself, which means that it gets hard to correctly annotate STREAMS_MAP.
I've also introduced a couple of new types, to take out some duplication.
Some of the query functions return generators rather than lists, so we can't
index into the result. Happily we already have a copy of the results.
(think this was introduced in #7024)
I don't really remember why this was so complicated; I think it dates
back to the time when we had to instantiate the Config classes before
we could call `add_arguments` - ie before #5597. In any case, I don't
think there's a good reason for it any more, and the impact of it
being complicated is that `--help` doesn't work correctly.
We pass --daemonize on the commandline, which (since at least #4853) overrides
whatever the config file, so there is no need for it to be set in the config
file.
The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.
Fixes#6815
Before figuring out whether we should alert a user on MAU, we call get_notice_room_for_user to get some info on the existing server notices room for this user. This function, if the room doesn't exist, creates it and invites the user in it. This means that, if we decide later that no server notice is needed, the user gets invited in a room with no message in it. This happens at every restart of the server, since the room ID returned by get_notice_room_for_user is cached.
This PR fixes that by moving the inviting bit to a dedicated function, that's only called when the server actually needs to send a notice to the user. A potential issue with this approach is that the room that's created by get_notice_room_for_user doesn't match how that same function looks for an existing room (i.e. it creates a room that doesn't have an invite or a join for the current user in it, so it could lead to a new room being created each time a user syncs), but I'm not sure this is a problem given it's cached until the server restarts, so that function won't run very often.
It also renames get_notice_room_for_user into get_or_create_notice_room_for_user to make what it does clearer.
Occasionally we could get a federation device list update transaction which
looked like:
```
[
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D2', 'prev_id': [], 'stream_id': 12, 'deleted': True}},
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D1', 'prev_id': [12], 'stream_id': 11, 'deleted': True}},
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D3', 'prev_id': [11], 'stream_id': 13, 'deleted': True}}
]
```
Having `stream_ids` which are lower than `prev_ids` looks odd. It might work
(I'm not actually sure), but in any case it doesn't seem like a reasonable
thing to expect other implementations to support.
This broke in a recent PR (#7024) and is no longer useful due to all
replication clients implicitly subscribing to all streams, so let's
just remove it.
If there was an exception setting up one of the attributes of the Homeserver
god object, then future attempts to fetch that attribute would raise a
confusing "Cyclic dependency" error. Let's make sure that we clear the
`building` flag so that we just get the original exception.
Ref: #7169
* Remove `conn_id` usage for UserSyncCommand.
Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one to one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replicaiton commands.
This really only effects UserSyncCommand.
* Add CLEAR_USER_SYNCS command that is sent on shutdown.
This should help with the case where a synchrotron gets restarted
gracefully, rather than rely on 5 minute timeout.
That fallback sets the redirect URL to itself (so it can process the login
token then return gracefully to the client). This would make it pointless to
ask the user for confirmation, since the URL the confirmation page would be
showing wouldn't be the client's.
* Don't show the login forms if we're currently logging in with a
password or a token.
* Submit directly the SSO login form, showing only a spinner to the
user, in order to eliminate from the clunkiness of SSO through this
fallback.
* change debian package from python3-virtualenv to virtualenv
The virtualenv package is needed for the virtualenv command. The
virtualenv package depends on python3-virtualenv (at least since
debian jessie) so there is no need to specify python3-virtualenv
additionally.
Signed-off-by: Vieno Hakkerinen <vieno@hakkerinen.eu>
* Add changelog
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
* Pull Sentinel out of LoggingContext
... and drop a few unnecessary references to it
* Factor out LoggingContext.current_context
move `current_context` and `set_context` out to top-level functions.
Mostly this means that I can more easily trace what's actually referring to
LoggingContext, but I think it's generally neater.
* move copy-to-parent into `stop`
this really just makes `start` and `stop` more symetric. It also means that it
behaves correctly if you manually `set_log_context` rather than using the
context manager.
* Replace `LoggingContext.alive` with `finished`
Turn `alive` into `finished` and make it a bit better defined.
* Add 'device_lists_outbound_pokes' as extra table.
This makes sure we check all the relevant tables to get the current max
stream ID.
Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
* Change device lists stream to have one row per id.
This will make it possible to process the streams more incrementally,
avoiding having to process large chunks at once.
* Change device list replication to match new semantics.
Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).
* Newsfile
* Remove handling of multiple rows per ID
* Fix worker handling
* Comments from review
This should be safe to do on all workers/masters because it is guarded by
a config option which will ensure it is only actually done on the worker
assigned as a pusher.
It was originally implemented by pulling the full auth chain of all
state sets out of the database and doing set comparison. However, that
can take a lot work if the state and auth chains are large.
Instead, lets try and fetch the auth chains at the same time and
calculate the difference on the fly, allowing us to bail early if all
the auth chains converge. Assuming that the auth chains do converge more
often than not, this should improve performance. Hopefully.
Fixes#7065
This is basically the same as https://github.com/matrix-org/synapse/pull/6847 except it tries to populate events from `state_events` rather than `current_state_events`, since the latter might have been cleared from the state of some rooms too early, leaving them with a `NULL` room version.
This is a bit fiddly because it all has to be done on one fell swoop:
* Wherever we create a new event, pass in the room version (and check it matches the format version)
* When we prune an event, use the room version of the unpruned event to create the pruned version.
* When we pass an event over the replication protocol, pass the room version over alongside it, and use it when deserialising the event again.
This currently causes presence notify code to log exceptions when there
is no state changes to process. This doesn't actually cause any problems
as we'd simply do nothing anyway.
Support for getting TLS certificates through ACMEv1 ended on November 2019.
Signed-off-by: Uday Bansal <43824981+udaybansal19@users.noreply.github.com>
Instead lets just warn if the worker has a media listener configured but
has the media repository disabled.
Previously non media repository workers would just ignore the media
listener.
When we get an invite over federation, store the room version in the rooms table.
The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
This is intended as a precursor to storing room versions when we receive an
invite over federation, but has the happy side-effect of fixing #3374 at last.
In short: change the store_room with try/except to a proper upsert which
updates the right columns.
* Give `notif_template_html`, `notif_template_text` default values (fixes#6960)
* Don't complain if `smtp_host` and `smtp_port` are unset, since they have sensible defaults (fixes#6961)
* Set the example for `enable_notifs` to `True`, for consistency and because it's more useful
* Raise errors as ConfigError rather than RuntimeError for nicer formatting
The state res v2 algorithm only cares about the difference between auth
chains, so we can pass in the known common state to the `get_auth_chain`
storage function so that it can ignore those events.
* Increase DB/CPU perf of `_is_server_still_joined` check.
For rooms with large amount of state a single user leaving could cause
us to go and load a lot of membership events and then pull out
membership state in a large number of batches.
* Newsfile
* Update synapse/storage/persist_events.py
Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Fix adding if too soon
* Update docstring
* Review comments
* Woops typo
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Reject device display names that are too long.
Too long is currently defined as 100 characters in length.
* Add a regression test for rejecting a too long device display name.
==============================
Features
--------
- Filter out m.room.aliases from the CS API to mitigate abuse while a better solution is specced. ([\#6878](https://github.com/matrix-org/synapse/issues/6878))
Internal Changes
----------------
- Fix continuous integration failures with old versions of `pip`, which were introduced by a release of the `zipp` library. ([\#6880](https://github.com/matrix-org/synapse/issues/6880))
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEv27Axt/F4vrTL/8QOSor00I9eP8FAl5BLBwACgkQOSor00I9
eP/9Ogf9HZ+XD5tt1uYkGqDSvUBS4sHdNcvxpNf1UMo6VaY4K5b9kF28MjY07ur2
nGq/nnLyLZZzwVXtV3q8upIdIK1hcMp2HCU0qDhEljepOkGmyQTqZooxrdrLTjPS
232bbifB6VFdK8P4gPIGopCRQI9a+TII4jODBobiJR43e0gEPfSREJKAnJs7DYaD
Quf3Svzu+jB+X8ymWLVbCFXuk4A4UqPzrRPb8TQBDN3w35G4JBDAgIpdxhQdJpdR
RIS1HgSm95P03doS23RLUjXuIpBfYveyStuGfPx4P9Lf30M2o6bxDeTPo2+IGbST
yhl1jB0l8Ve5/6veuTieA6vuXWU55g==
=2rgX
-----END PGP SIGNATURE-----
Merge tag 'v1.10.0rc3' into develop
Synapse 1.10.0rc3 (2020-02-10)
==============================
Features
--------
- Filter out m.room.aliases from the CS API to mitigate abuse while a better solution is specced. ([\#6878](https://github.com/matrix-org/synapse/issues/6878))
Internal Changes
----------------
- Fix continuous integration failures with old versions of `pip`, which were introduced by a release of the `zipp` library. ([\#6880](https://github.com/matrix-org/synapse/issues/6880))
We're in the middle of properly mitigating spam caused by malicious aliases being added to a room. However, until this work fully lands, we temporarily filter out all m.room.aliases events from /sync and /messages on the CS API, to remove abusive aliases. This is considered acceptable as m.room.aliases events were never a reliable record of the given alias->id mapping and were purely informational, and in their current state do more harm than good.
A lot of the things we log at INFO are now a bit superfluous, so lets
make them DEBUG logs to reduce the amount we log by default.
Co-Authored-By: Brendan Abolivier <babolivier@matrix.org>
Co-authored-by: Brendan Abolivier <github@brendanabolivier.com>
We were looking at the wrong event type (`m.room.encryption` vs
`m.room.encrypted`).
Also fixup the duplicate `EvenTypes` entries.
Introduced in #6776.
I messed this up a bit in #6805, but fortunately we weren't actually doing
anything with the room_version so it didn't matter that it was a str not a RoomVersion.
When a server leaves a room it may stop sharing a room with remote
users, and thus not get any updates to their device lists. So we need to
check for this case and delete those device lists from the cache.
We don't need to do this if we stop sharing a room because the remote
user leaves the room, because we track that case via looking at
membership changes.
If we detect that the remote users' keys may have changed then we should
attempt to resync against the remote server rather than using the
(potentially) stale local cache.
We were sending device updates down both the federation stream and
device streams. This mean there was a race if the federation sender
worker processed the federation stream first, as when the sender checked
if there were new device updates the slaved ID generator hadn't been
updated with the new stream IDs and so returned nothing.
This situation is correctly handled by events/receipts/etc by not
sending updates down the federation stream and instead having the
federation sender worker listen on the other streams and poke the
transaction queues as appropriate.
Otherwise its just stale data, which may get deleted later anyway so
can't be relied on. It's also a bit of a shotgun if we're trying to get
the current state of a room we're not in.
These are easier to work with than the strings and we normally have one around.
This fixes `FederationHander._persist_auth_tree` which was passing a
RoomVersion object into event_auth.check instead of a string.
* Add note that user_dir requires disabling user dir
updates from the main synapse process.
* Add note that federation_reader should have
the federation listener resource.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
There are quite a few places that we assume that a redaction event has a
corresponding `redacts` key, which is not always the case. So lets
cheekily make it so that event.redacts just returns None instead.
The old statement returned `None` for such a `password_config` (like the one
created on first run), thus retrieval of the `pepper` key failed with
`AttributeError`.
Fixes#5315
Signed-off-by: Ivan Vilata i Balaguer <ivan@selidor.net>
* Raise an exception if there are pending background updates
So we return with a non-0 code
* Changelog
* Port synapse_port_db to async/await
* Port update_database to async/await
* Add version string to mocked homeservers
* Remove unused imports
* Convert overseen bits to async/await
* Fixup logging contexts
* Fix imports
* Add a way to print an error without raising an exception
* Incorporate review
Turns out that figuring out a remote user id for the SAML user isn't quite as obvious as it seems. Factor it out to the SamlMappingProvider so that it's easy to control.
Generally try to make this more comprehensible, and make it match the
conventions.
I've removed the documentation for all the settings which allow you to change
the names of the template files, because I can't really see why they are
useful.
* Port synapse.replication.tcp to async/await
* Newsfile
* Correctly document type of on_<FOO> functions as async
* Don't be overenthusiastic with the asyncing....
Currently we rely on `current_state_events` to figure out what rooms a
user was in and their last membership event in there. However, if the
server leaves the room then the table may be cleaned up and that
information is lost. So lets add a table that separately holds that
information.
AdditionalResource really doesn't add any value, and it gets in the way for
resources which want to support child resources or the like. So, if the
resource object already implements the IResource interface, don't bother
wrapping it.
This was ill-advised. We can't modify verify_keys here, because the response
object has already been signed by the requested key.
Furthermore, it's somewhat unnecessary because existing versions of Synapse
(which get upset that the notary key isn't present in verify_keys) will fall
back to a direct fetch via `/key/v2/server`.
Also: more tests for fetching keys via perspectives: it would be nice if we actually tested when our fetcher can't talk to our notary impl.