Commit graph

46 commits

Author SHA1 Message Date
Erik Johnston
930a64b6c1
Reintroduce #17291. (#17338)
This is #17291 (which got reverted), with some added fixups, and change
so that tests actually pick up the error.

The problem was that we were not calculating any new chain IDs due to a
missing `not` in a condition.
2024-06-24 14:40:28 +00:00
Erik Johnston
4243c1f074
Revert "Handle large chain calc better (#17291)" (#17334)
Some checks failed
Tests / tests-done (push) Has been cancelled
Tests / lint-newsfile (push) Has been cancelled
Tests / changes (push) Has been cancelled
Tests / check-lockfile (push) Has been cancelled
Tests / lint-crlf (push) Has been cancelled
Tests / lint-clippy (push) Has been cancelled
Tests / lint-clippy-nightly (push) Has been cancelled
Tests / lint-rustfmt (push) Has been cancelled
Deploy the documentation / GitHub Pages (push) Has been cancelled
Build release artifacts / Build .deb packages (push) Has been cancelled
Tests / lint-pydantic (push) Has been cancelled
Build release artifacts / Attach assets to release (push) Has been cancelled
Tests / check-sampleconfig (push) Has been cancelled
Tests / check-schema-delta (push) Has been cancelled
Tests / lint (push) Has been cancelled
Tests / Typechecking (push) Has been cancelled
Tests / linting-done (push) Has been cancelled
Tests / calculate-test-jobs (push) Has been cancelled
Tests / trial (push) Has been cancelled
Tests / trial-olddeps (push) Has been cancelled
Tests / trial-pypy (all, pypy-3.8) (push) Has been cancelled
Tests / sytest (push) Has been cancelled
Tests / export-data (push) Has been cancelled
Tests / portdb (11, 3.8) (push) Has been cancelled
Tests / portdb (15, 3.11) (push) Has been cancelled
Tests / complement (monolith, Postgres) (push) Has been cancelled
Tests / complement (monolith, SQLite) (push) Has been cancelled
Tests / complement (workers, Postgres) (push) Has been cancelled
Tests / cargo-test (push) Has been cancelled
Tests / cargo-bench (push) Has been cancelled
This reverts commit bdf82efea5  (#17291)

This seems to have stopped persisting auth chains for new events, and so
is causing state res to fall back to the slow methods
2024-06-19 17:39:33 +01:00
Erik Johnston
bdf82efea5
Handle large chain calc better (#17291)
We calculate the auth chain links outside of the main persist event
transaction to ensure that we do not block other event sending during
the calculation.
2024-06-19 10:33:53 +01:00
Erik Johnston
5d3850b038
Port EventInternalMetadata class to Rust (#16782)
There are a couple of things we need to be careful of here:

1. The current python code does no validation when loading from the DB,
so we need to be careful to ignore such errors (at least on jki.re there
are some old events with internal metadata fields of the wrong type).
2. We want to be memory efficient, as we often have many hundreds of
thousands of events in the cache at a time.

---------

Co-authored-by: Quentin Gliech <quenting@element.io>
2024-01-08 14:06:48 +00:00
Patrick Cloke
8e1e62c9e0 Update license headers 2023-11-21 15:29:58 -05:00
Patrick Cloke
f2f2c7c1f0
Use full GitHub links instead of bare issue numbers. (#16637) 2023-11-15 08:02:11 -05:00
Patrick Cloke
aa483cb4c9
Update ruff config (#16283)
Enable additional checks & clean-up unneeded configuration.
2023-09-08 11:24:36 -04:00
Erik Johnston
bd558a6dc3
Speed up state res in rare case we don't have all events (#16116)
If we don't have all the auth events in a room then not all state events will have a chain cover index. Even so, we can still use the chain cover index on the events that do have it, rather than bailing and using the slower functions.

This situation should not arise for newly persisted rooms, as we check we have the full auth chain for each event, but can happen for existing rooms.

c.f. #15245
2023-08-18 15:32:06 +01:00
Erik Johnston
95a96b21eb
Add foreign key constraint to event_forward_extremities. (#15751) 2023-07-05 09:43:19 +00:00
Eric Eastwood
0f02f0b4da
Remove experimental MSC2716 implementation to incrementally import history into existing rooms (#15748)
Context for why we're removing the implementation:

 - https://github.com/matrix-org/matrix-spec-proposals/pull/2716#issuecomment-1487441010
 - https://github.com/matrix-org/matrix-spec-proposals/pull/2716#issuecomment-1504262734

Anyone wanting to continue MSC2716, should also address these leftover tasks: https://github.com/matrix-org/synapse/issues/10737

Closes https://github.com/matrix-org/synapse/issues/10737 in the fact that it is not longer necessary to track those things.
2023-06-16 14:12:24 -05:00
Eric Eastwood
77156a4bc1
Process previously failed backfill events in the background (#15585)
Process previously failed backfill events in the background because they are bound to fail again and we don't need to waste time holding up the request for something that is bound to fail again.

Fix https://github.com/matrix-org/synapse/issues/13623

Follow-up to https://github.com/matrix-org/synapse/issues/13621 and https://github.com/matrix-org/synapse/issues/13622

Part of making `/messages` faster: https://github.com/matrix-org/synapse/issues/13356
2023-05-24 23:22:24 -05:00
Sean Quah
d9f694932c
Fix spinloop during partial state sync when a prev event is in backoff (#15351)
Previously, we would spin in a tight loop until
`update_state_for_partial_state_event` stopped raising
`FederationPullAttemptBackoffError`s. Replace the spinloop with a wait
until the backoff period has expired.

Signed-off-by: Sean Quah <seanq@matrix.org>
2023-03-30 13:36:41 +01:00
dependabot[bot]
9bb2eac719
Bump black from 22.12.0 to 23.1.0 (#15103) 2023-02-22 15:29:09 -05:00
Patrick Cloke
42aea0d8af
Add final type hint to tests.unittest. (#15072)
Adds a return type to HomeServerTestCase.make_homeserver and deal
with any variables which are no longer Any.
2023-02-14 14:03:35 -05:00
Patrick Cloke
3ac412b4e2
Require types in tests.storage. (#14646)
Adds missing type hints to `tests.storage` package
and does not allow untyped definitions.
2022-12-09 12:36:32 -05:00
Eric Eastwood
40bb37eb27
Stop getting missing prev_events after we already know their signature is invalid (#13816)
While https://github.com/matrix-org/synapse/pull/13635 stops us from doing the slow thing after we've already done it once, this PR stops us from doing one of the slow things in the first place.

Related to
 - https://github.com/matrix-org/synapse/issues/13622
    - https://github.com/matrix-org/synapse/pull/13635
 - https://github.com/matrix-org/synapse/issues/13676

Part of https://github.com/matrix-org/synapse/issues/13356

Follow-up to https://github.com/matrix-org/synapse/pull/13815 which tracks event signature failures.

With this PR, we avoid the call to the costly `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before and we filter events based on the backoff before calling `_get_state_ids_after_missing_prev_event` now.

For example, this will save us 156s out of the 185s total that this `matrix.org` `/messages` request. If you want to see the full Jaeger trace of this, you can drag and drop this `trace.json` into your own Jaeger, https://gist.github.com/MadLittleMods/4b12d0d0afe88c2f65ffcc907306b761

To explain this exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid.

 - `backfill`
    - `outgoing-federation-request` `/backfill`
    - `_check_sigs_and_hash_and_fetch`
       - `_check_sigs_and_hash_and_fetch_one` for each event received over backfill
          -  `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
          -  `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
   - `_process_pulled_events`
      - `_process_pulled_event` for each validated event
         -  Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it
            - `_get_state_ids_after_missing_prev_event`
               - `outgoing-federation-request` `/state_ids`
               -  `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again
               -  `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check
2022-10-15 00:36:49 -05:00
David Robertson
e8f30a76ca
Fix overflows in /messages backfill calculation (#13936)
* Reproduce bug
* Compute `least_function` first
* Substitute `least_function` with an f-string
* Bugfix: avoid overflow

Co-authored-by: Eric Eastwood <erice@element.io>
2022-09-30 11:54:53 +01:00
Eric Eastwood
df8b91ed2b
Limit and filter the number of backfill points to get from the database (#13879)
There is no need to grab thousands of backfill points when we only need 5 to make the `/backfill` request with. We need to grab a few extra in case the first few aren't visible in the history.

Previously, we grabbed thousands of backfill points from the database, then sorted and filtered them in the app. Fetching the 4.6k backfill points for `#matrix:matrix.org` from the database takes ~50ms - ~570ms so it's not like this saves a lot of time 🤷. But it might save us more time now that `get_backfill_points_in_room`/`get_insertion_event_backward_extremities_in_room` are more complicated after https://github.com/matrix-org/synapse/pull/13635 

This PR moves the filtering and limiting to the SQL query so we just have less data to work with in the first place.

Part of https://github.com/matrix-org/synapse/issues/13356
2022-09-28 15:26:16 -05:00
Eric Eastwood
ac1a31740b
Only try to backfill event if we haven't tried before recently (#13635)
Only try to backfill event if we haven't tried before recently (exponential backoff). No need to keep trying the same backfill point that fails over and over.

Fix https://github.com/matrix-org/synapse/issues/13622
Fix https://github.com/matrix-org/synapse/issues/8451

Follow-up to https://github.com/matrix-org/synapse/pull/13589

Part of https://github.com/matrix-org/synapse/issues/13356
2022-09-23 14:01:29 -05:00
reivilibre
c2fe48a6ff
Rename the EventFormatVersions enum values so that they line up with room version numbers. (#13706) 2022-09-07 11:08:20 +01:00
Richard van der Hoff
147f098fb4
Stop writing to event_reference_hashes (#12679)
This table is never read, since #11794. We stop writing to it; in future we can
drop it altogether.
2022-05-10 15:35:08 +01:00
Richard van der Hoff
e24ff8ebe3
Remove HomeServer.get_datastore() (#12031)
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
2022-02-23 11:04:02 +00:00
Richard van der Hoff
63c46349c4
Implement MSC3706: partial state in /send_join response (#11967)
* Make `get_auth_chain_ids` return a Set

It has a set internally, and a set is often useful where it gets used, so let's
avoid converting to an intermediate list.

* Minor refactors in `on_send_join_request`

A little bit of non-functional groundwork

* Implement MSC3706: partial state in /send_join response
2022-02-12 10:44:16 +00:00
Andrew Morgan
dc671d3ea7
Fix logic for dropping old events in fed queue (#11806)
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
Co-authored-by: Richard van der Hoff <richard@matrix.org>
2022-01-24 12:20:01 +00:00
Patrick Cloke
3e0536cd2a
Replace uses of simple_insert_many with simple_insert_many_values. (#11742)
This should be (slightly) more efficient and it is simpler
to have a single method for inserting multiple values.
2022-01-13 19:44:18 -05:00
Sean Quah
a4dce5b53d
Remove redundant COALESCE()s around COUNT()s in database queries (#11570)
`COUNT()` never returns `NULL`. A `COUNT(*)` over 0 rows is 0 and a
`COUNT(NULL)` is also 0.
2021-12-14 12:34:30 +00:00
Erik Johnston
01d45fe964
Prune inbound federation queues if they get too long (#10390) 2021-08-02 13:37:25 +00:00
Jonathan de Jong
4b965c862d
Remove redundant "coding: utf-8" lines (#9786)
Part of #9744

Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.

`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
2021-04-14 15:34:27 +01:00
Patrick Cloke
2a99cc6524
Use the chain cover index in get_auth_chain_ids. (#9576)
This uses a simplified version of get_chain_cover_difference to calculate
auth chain of events.
2021-03-10 09:57:59 -05:00
Eric Eastwood
0a00b7ff14
Update black, and run auto formatting over the codebase (#9381)
- Update black version to the latest
 - Run black auto formatting over the codebase
    - Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
 - Update `code_style.md` docs around installing black to use the correct version
2021-02-16 22:32:34 +00:00
Erik Johnston
1315a2e8be
Use a chain cover index to efficiently calculate auth chain difference (#8868) 2021-01-11 16:09:22 +00:00
Erik Johnston
df4b1e9c74
Pass room_id to get_auth_chain_difference (#8879)
This is so that we can choose which algorithm to use based on the room ID.
2020-12-04 15:52:49 +00:00
Erik Johnston
c5b6abd53d
Correctly handle unpersisted events when calculating auth chain difference. (#8827)
We do state res with unpersisted events when calculating the new current state of the room, so that should be the only thing impacted. I don't think this is tooooo big of a deal as:

1. the next time a state event happens in the room the current state should correct itself;
2. in the common case all the unpersisted events' auth events will be pulled in by other state, so will still return the correct result (or one which is sufficiently close to not affect the result); and
3. we mostly use the state at an event to do important operations, which isn't affected by this.
2020-12-02 15:22:37 +00:00
Erik Johnston
a7bdf98d01
Rename database classes to make some sense (#8033) 2020-08-05 21:38:57 +01:00
Erik Johnston
4a17a647a9
Improve get auth chain difference algorithm. (#7095)
It was originally implemented by pulling the full auth chain of all
state sets out of the database and doing set comparison. However, that
can take a lot work if the state and auth chains are large.

Instead, lets try and fetch the auth chains at the same time and
calculate the difference on the fly, allowing us to bail early if all
the auth chains converge. Assuming that the auth chains do converge more
often than not, this should improve performance. Hopefully.
2020-03-18 16:46:41 +00:00
Richard van der Hoff
dc41fbf0dd Remove unused get_prev_events_and_hashes_for_room 2020-01-06 13:45:33 +00:00
Richard van der Hoff
5a04781643 rename get_prev_events_for_room to get_prev_events_and_hashes_for_room
... to make way for a new method which just returns the event ids
2020-01-06 13:45:33 +00:00
Erik Johnston
756d4942f5 Move DB pool and helper functions into dedicated Database class 2019-12-05 10:46:37 +00:00
Erik Johnston
bc244627ac Fix postgres unit tests 2019-10-10 15:37:53 +01:00
Neil Johnson
034db2ba21 Fix dummy event insertion consent bug (#6053)
Fixes #5905
2019-09-26 11:47:53 +01:00
Amber Brown
32e7c9e7f2
Run Black. (#5482) 2019-06-20 19:32:02 +10:00
Amber Brown
77055dba92
Fix tests on postgresql (#3740) 2018-09-04 02:21:48 +10:00
Amber Brown
99dd975dae
Run tests under PostgreSQL (#3423) 2018-08-13 16:47:46 +10:00
black
8b3d9b6b19 Run black. 2018-08-10 23:54:09 +10:00
Amber Brown
2511f3f8a0
Test fixes for Python 3 (#3647) 2018-08-09 12:22:01 +10:00
Richard van der Hoff
639480e14a Avoid creating events with huge numbers of prev_events
In most cases, we limit the number of prev_events for a given event to 10
events. This fixes a particular code path which created events with huge
numbers of prev_events.
2018-04-16 18:41:37 +01:00