mirror of
https://github.com/element-hq/synapse.git
synced 2024-12-20 19:10:45 +03:00
Initial room and user statistics documentation
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
This commit is contained in:
parent
20ec9698ac
commit
16e2ffd166
1 changed files with 146 additions and 0 deletions
146
docs/room_and_user_statistics.md
Normal file
146
docs/room_and_user_statistics.md
Normal file
|
@ -0,0 +1,146 @@
|
||||||
|
Room and User Statistics
|
||||||
|
========================
|
||||||
|
|
||||||
|
Synapse maintains room and user statistics (as well as a cache of room state),
|
||||||
|
in various tables.
|
||||||
|
|
||||||
|
These can be used for administrative purposes but are also used when generating
|
||||||
|
the public room directory. If these tables get stale or out of sync (possibly
|
||||||
|
after database corruption), you may wish to regenerate them.
|
||||||
|
|
||||||
|
|
||||||
|
# Synapse Administrator Documentation
|
||||||
|
|
||||||
|
## Various SQL scripts that you may find useful
|
||||||
|
|
||||||
|
### Delete stats, including historical stats
|
||||||
|
|
||||||
|
```sql
|
||||||
|
DELETE FROM room_stats_current;
|
||||||
|
DELETE FROM room_stats_historical;
|
||||||
|
DELETE FROM user_stats_current;
|
||||||
|
DELETE FROM user_stats_historical;
|
||||||
|
```
|
||||||
|
|
||||||
|
### Regenerate stats (all subjects)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
BEGIN;
|
||||||
|
DELETE FROM stats_incremental_position;
|
||||||
|
INSERT INTO stats_incremental_position (
|
||||||
|
state_delta_stream_id,
|
||||||
|
total_events_min_stream_ordering,
|
||||||
|
total_events_max_stream_ordering,
|
||||||
|
is_background_contract
|
||||||
|
) VALUES (NULL, NULL, NULL, FALSE), (NULL, NULL, NULL, TRUE);
|
||||||
|
COMMIT;
|
||||||
|
|
||||||
|
DELETE FROM room_stats_current;
|
||||||
|
DELETE FROM user_stats_current;
|
||||||
|
```
|
||||||
|
|
||||||
|
then follow the steps below for **'Regenerate stats (missing subjects only)'**
|
||||||
|
|
||||||
|
### Regenerate stats (missing subjects only)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Set up staging tables
|
||||||
|
-- we depend on current_state_events_membership because this is used
|
||||||
|
-- in our counting.
|
||||||
|
INSERT INTO background_updates (update_name, progress_json) VALUES
|
||||||
|
('populate_stats_prepare', '{}', 'current_state_events_membership');
|
||||||
|
|
||||||
|
-- Run through each room and update stats
|
||||||
|
INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES
|
||||||
|
('populate_stats_process_rooms', '{}', 'populate_stats_prepare');
|
||||||
|
|
||||||
|
-- Run through each user and update stats.
|
||||||
|
INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES
|
||||||
|
('populate_stats_process_users', '{}', 'populate_stats_process_rooms');
|
||||||
|
|
||||||
|
-- Clean up staging tables
|
||||||
|
INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES
|
||||||
|
('populate_stats_cleanup', '{}', 'populate_stats_process_users');
|
||||||
|
```
|
||||||
|
|
||||||
|
then **restart Synapse**.
|
||||||
|
|
||||||
|
|
||||||
|
# Synapse Developer Documentation
|
||||||
|
|
||||||
|
## High-Level Concepts
|
||||||
|
|
||||||
|
### Definitions
|
||||||
|
|
||||||
|
* **subject**: Something we are tracking stats about – currently a room or user.
|
||||||
|
* **current row**: An entry for a subject in the appropriate current statistics
|
||||||
|
table. Each subject can have only one.
|
||||||
|
* **historical row**: An entry for a subject in the appropriate historical
|
||||||
|
statistics table. Each subject can have any number of these.
|
||||||
|
|
||||||
|
### Overview
|
||||||
|
|
||||||
|
Stats are maintained as time series. There are two kinds of column:
|
||||||
|
|
||||||
|
* absolute columns – where the value is correct for the time given by `end_ts`
|
||||||
|
in the stats row. (Imagine a line graph for these values)
|
||||||
|
* per-slice columns – where the value corresponds to how many of the occurrences
|
||||||
|
occurred within the time slice given by `(end_ts − bucket_size)…end_ts`
|
||||||
|
or `start_ts…end_ts`. (Imagine a histogram for these values)
|
||||||
|
|
||||||
|
Currently, only absolute columns are in use.
|
||||||
|
|
||||||
|
Stats are maintained in two tables (for each type): current and historical.
|
||||||
|
|
||||||
|
Current stats correspond to the present values. Each subject can only have one
|
||||||
|
entry.
|
||||||
|
|
||||||
|
Historical stats correspond to values in the past. Subjects may have multiple
|
||||||
|
entries.
|
||||||
|
|
||||||
|
## Concepts around the management of stats
|
||||||
|
|
||||||
|
### current rows
|
||||||
|
|
||||||
|
#### dirty current rows
|
||||||
|
|
||||||
|
Current rows can be **dirty**, which means that they have changed since the
|
||||||
|
latest historical row for the same subject.
|
||||||
|
**Dirty** current rows possess an end timestamp, `end_ts`.
|
||||||
|
|
||||||
|
#### old current rows and old collection
|
||||||
|
|
||||||
|
When a (necessarily dirty) current row has an `end_ts` in the past, it is said
|
||||||
|
to be **old**.
|
||||||
|
Old current rows must be copied into a historical row, and cleared of their dirty
|
||||||
|
status, before further statistics can be tracked for that subject.
|
||||||
|
The process which does this is referred to as **old collection**.
|
||||||
|
|
||||||
|
#### incomplete current rows
|
||||||
|
|
||||||
|
There are also **incomplete** current rows, which are current rows that do not
|
||||||
|
contain a full count yet – this is because they are waiting for the regeneration
|
||||||
|
process to give them an initial count. Incomplete current rows DO NOT contain
|
||||||
|
correct and up-to-date values. As such, *incomplete rows are not old-collected*.
|
||||||
|
Instead, old incomplete rows will be extended so they are no longer old.
|
||||||
|
|
||||||
|
### historical rows
|
||||||
|
|
||||||
|
Historical rows can always be considered to be valid for the time slice and
|
||||||
|
end time specified. (This, of course, assumes a lack of defects in the code
|
||||||
|
to track the statistics, and assumes integrity of the database).
|
||||||
|
|
||||||
|
Even still, there are two considerations that we may need to bear in mind:
|
||||||
|
|
||||||
|
* historical rows will not exist for every time slice – they will be omitted
|
||||||
|
if there were no changes. In this case, the following assumptions can be
|
||||||
|
made to interpolate/recreate missing rows:
|
||||||
|
- absolute fields have the same values as in the preceding row
|
||||||
|
- per-slice fields are zero (`0`)
|
||||||
|
* historical rows will not be retained forever – rows older than a configurable
|
||||||
|
time will be purged.
|
||||||
|
|
||||||
|
#### purge
|
||||||
|
|
||||||
|
The purging of historical rows is not yet implemented.
|
||||||
|
|
Loading…
Reference in a new issue