CSYNC User Guide
================
Andreas Schneider <mail@cynapses.org>
:Author Initials: ADS

csync is a bidirectional file synchronizer for Linux that keeps two copies of
files and directories in sync. It uses widely adopted protocols such as smb or
sftp, so csync needs no server component of its own. It is a user-level
program, which means you don't need to be a superuser.

Introduction
------------

It is often the case that we have multiple copies (called replicas) of a
filesystem or part of a filesystem (for example on a notebook and on a desktop
computer). Changes to each replica are often made independently, and as a
result the replicas do not contain the same information. In that case a file
synchronizer is used to make them consistent again without losing any
information.

The goal is to detect conflicting <<X13, updates>> (files which have been
modified) and propagate non-conflicting updates to each replica. If there
are no conflicts left, we are done and the replicas are identical.

Basics
------

This section describes some basics you might need to understand how file
synchronization works.

Paths
~~~~~
A path normally refers to a point with a set of files which should be
synchronized. It is specified relative to the root of the replica. The path is
just a sequence of names separated by '/'.

NOTE: The path separator is always a forward slash '/', even on Windows.

csync always uses the absolute path. This could be '/home/gladiac' or,
for sftp, 'sftp://gladiac:secret@myserver/home/gladiac'.

[[X13]]
What is an update?
~~~~~~~~~~~~~~~~~~
The contents of a path can be a file, a directory or a symbolic link
(symbolic links are not supported yet). To be more precise, if the path refers
to:

- a regular file, then the contents are the byte stream and the
  metadata of the file.
- a directory, then the content is the metadata of the directory.
- a symbolic link, then the content is the string the link points to.

csync keeps a record of each path which has been successfully synchronized. The
path is compared with the record, and if it has changed since the last
synchronization, we have an update. The change is detected by comparing the
modification time or the change time (the modification time of the metadata).

What is a conflict?
~~~~~~~~~~~~~~~~~~~
A path is conflicting if it fulfills the following conditions:

1. it has been updated in one replica,
2. it or any of its descendants has been updated in the other replica too, and
3. its contents in the two replicas are not identical.

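The three conditions can be read as a single predicate. The following is a
minimal sketch in Python, not csync's actual code. The names `is_conflict`,
`is_descendant_or_self`, `updated_a` and `updated_b` are hypothetical; the two
sets are assumed to hold the paths that were reported as updated in each
replica.

```python
def is_descendant_or_self(path, candidate):
    """True if candidate equals path or lies below it ('/'-separated)."""
    return candidate == path or candidate.startswith(path.rstrip("/") + "/")


def is_conflict(path, updated_a, updated_b, content_a, content_b):
    """Apply the three conflict conditions to one path.

    updated_a / updated_b: sets of paths updated in replica A / B.
    content_a / content_b: the contents of the path in each replica
    (any comparable value, for example bytes or a metadata tuple).
    """
    def touched(updated):
        # Condition 2: the path itself or one of its descendants was updated.
        return any(is_descendant_or_self(path, p) for p in updated)

    # Condition 1 may hold for either replica, so both directions are checked.
    one_way = path in updated_a and touched(updated_b)
    other_way = path in updated_b and touched(updated_a)
    # Condition 3: the contents are not identical.
    return (one_way or other_way) and content_a != content_b
```

Note that a path updated identically on both sides is not a conflict in this
sketch, because the third condition fails.
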
File Synchronization
--------------------

The main goal of a file synchronizer is correctness. It changes the whole or
separate parts of a user's file system, and the user is not able to monitor
the complete synchronization process. The synchronizer is therefore in a
position where it can damage the file system. It is important that the
implementation behaves correctly under all conditions, even if there is an
unexpected error (for example, a full disk).

One problem concerning correctness is the handling of conflicts. Each file
synchronizer tries to propagate conflicting changes to the other replica. At
the end both replicas should be identical. There are different strategies to
fulfill these goals.

csync is a 3-phase file synchronizer. This design was chosen so that user
interaction is possible and the process is easy to understand. The 3 phases
are update detection, reconciliation and propagation. These are described in
the following sections.

Update detection
~~~~~~~~~~~~~~~~
There are different strategies for update detection. csync uses a state-based
modtime-inode update detector. This means it uses the modification time to
detect updates, which does not require many resources. A record of each file is
stored in a database (called statedb) and compared with the current
modification time during update detection. If the file has changed since the
last synchronization, an instruction is set to evaluate it during the
reconciliation phase. If there is no record for a file under investigation, it
is marked as new.

Detecting renamed files is a problem. It is also solved with the record
stored in the statedb. If we don't find the file by name in the database,
we search for its inode number. If the inode number is found, then the file has
been renamed.

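The decision described above can be sketched in a few lines of Python. This is
an illustration under assumptions, not csync's implementation: the statedb is
modelled as a plain dictionary, and the names `Record`, `detect` and the
instruction strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """What the (hypothetical) statedb remembers about one path."""
    inode: int
    mtime: float


def detect(path, inode, mtime, statedb):
    """Classify one file found on disk against the statedb.

    statedb maps path -> Record from the last successful synchronization.
    Returns one of "none", "eval", "rename" or "new".
    """
    record = statedb.get(path)
    if record is not None:
        # Known path: a changed modification time means an update.
        return "eval" if mtime != record.mtime else "none"
    # Unknown name: search for the inode number to detect a rename.
    if any(r.inode == inode for r in statedb.values()):
        return "rename"
    # Neither the name nor the inode is known: the file is new.
    return "new"
```

In a real scan the `inode` and `mtime` values could come from `os.stat()`
(`st_ino` and `st_mtime`).
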
Reconciliation
~~~~~~~~~~~~~~
The most important component is the update detector, because the reconciler
depends on it. The correctness of the reconciler is mandatory, because it can
damage a filesystem. It decides which file:

* stays untouched
* has a conflict
* gets synchronized
* or gets *deleted*

A wrong decision by the reconciler leads in most cases to a loss of data, so
there are several conditions the file synchronizer has to follow.

Algorithms
^^^^^^^^^^

For conflict resolution several different algorithms could be implemented. The
most common algorithms are the merge algorithm and the conflict algorithm. The
first is a batch algorithm and the second is one which needs user interaction.

Merge algorithm
+++++++++++++++

The merge algorithm doesn't need any user interaction. It is simple and is
used, for example, by Microsoft for Roaming Profiles. If it detects a conflict
(the same file changed on both replicas), it uses the most recent file and
overwrites the other. This means you can lose some data, but normally you want
the latest file.

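A minimal sketch of this "most recent file wins" rule in Python, assuming both
copies are reachable as local paths. The function name `resolve_by_merge` and
the handling of equal timestamps are choices made for this illustration, not
taken from csync.

```python
import os
import shutil


def resolve_by_merge(path_a, path_b):
    """Overwrite the older of two conflicting files with the newer one.

    Returns the path of the file that was kept. On equal modification
    times nothing is copied and path_a is returned (an arbitrary choice
    made for this sketch).
    """
    mtime_a = os.path.getmtime(path_a)
    mtime_b = os.path.getmtime(path_b)
    if mtime_a == mtime_b:
        return path_a
    newer, older = (path_a, path_b) if mtime_a > mtime_b else (path_b, path_a)
    # copy2 also copies the modification time, so both copies agree afterwards.
    shutil.copy2(newer, older)
    return newer
```

The older content is gone after the call, which is exactly the data loss the
paragraph above warns about.
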
Conflict algorithm
++++++++++++++++++

This is not implemented yet.

If a file has a conflict, the user has to decide which file should be used.

Propagation
~~~~~~~~~~~

The next instance of the file synchronizer is the propagator. It takes the
calculated records and applies them to the current replica.

The propagator uses a 2-phase-commit mechanism to simulate an atomic filesystem
operation.

In the first phase we copy the file to a temporary file on the opposite
replica. This has the advantage that we can check whether the file has been
transferred successfully. If the connection gets interrupted during the
transfer, we still have the original state of the file. This means no data
will be lost.

In the second phase the file on the opposite replica is overwritten by
the temporary file.

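The two phases can be sketched as follows in Python. This is a simplified
illustration that uses local files in place of a remote replica; the function
name `propagate` and the `.tmp` suffix are assumptions, not csync's actual
naming.

```python
import os
import shutil


def propagate(source, destination):
    """Copy source over destination in two phases.

    Phase 1 writes a temporary file next to the destination and checks its
    size. Phase 2 replaces the destination with the temporary file.
    """
    temporary = destination + ".tmp"  # hypothetical naming scheme
    try:
        # Phase 1: transfer to a temporary file; the destination is untouched.
        shutil.copyfile(source, temporary)
        if os.path.getsize(temporary) != os.path.getsize(source):
            raise OSError("size mismatch after transfer: " + temporary)
        # Phase 2: overwrite the destination with the temporary file.
        os.replace(temporary, destination)
    finally:
        # After a failed phase 1 the leftover temporary file is removed.
        if os.path.exists(temporary):
            os.remove(temporary)
```

If anything fails before the call to `os.replace()`, the destination still
holds its original content.
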
After a successful propagation we have to merge the trees to reflect the
current state of the filesystem tree. This updated tree is written as a
journal into a database, called the state database. It is used during the
update detection of the next synchronization (see above).

Robustness
~~~~~~~~~~

This is a really important topic. The file synchronizer should not crash, and
if it does crash, there should be no loss of data. Several mechanisms help to
achieve this goal. They are discussed in the following sections.

Crash resistance
^^^^^^^^^^^^^^^^

The synchronization process can be interrupted by different events:

* the system could be halted due to errors.
* the disk could be full or the quota exceeded.
* the network or power cable could be pulled out.
* the user could force a stop of the synchronization process.
* different communication errors could occur.

To ensure that no data is lost when such an event occurs, we enforce the
following invariant:

IMPORTANT: At every moment of the synchronization each file has either its
original content or its correct final content.

So each interrupted synchronization process is a partial sync and can be
continued and completed by simply running csync again. The only remaining
problem could be an error of the filesystem, so we reach this invariant only
approximately.

Transfer errors
^^^^^^^^^^^^^^^

With the Two-Phase-Commit we check the file size after the file has been
transferred, so we can detect transfer errors. A transfer protocol with
checksums would be better. This could possibly be done in the future.

Future filesystems like btrfs will help to compare checksums instead of the
file size. This will make the synchronization itself safer.

Database loss
^^^^^^^^^^^^^

It is possible that the state database gets corrupted. If this happens,
all files get evaluated. In this case the file synchronizer won't delete any
file, but deleted files may be restored from the other replica.

To prevent corruption or loss of the database if an error occurs or the user
forces an abort, the synchronizer works on a copy of the database and uses a
2-Phase-Commit to save it at the end.

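The "work on a copy, then commit" idea might look like the following Python
sketch. It assumes an SQLite file as the database, and the function name
`update_statedb`, the `.work` suffix and the table layout are all invented for
this example; none of them are taken from csync.

```python
import os
import shutil
import sqlite3


def update_statedb(statedb_path, rows):
    """Write rows into a working copy, then commit it over the original.

    rows is a list of (path, inode, mtime) tuples. The table layout is
    invented for this sketch.
    """
    working = statedb_path + ".work"  # hypothetical name of the working copy
    if os.path.exists(working):
        os.remove(working)  # leftover from an earlier aborted run
    if os.path.exists(statedb_path):
        shutil.copyfile(statedb_path, working)
    connection = sqlite3.connect(working)
    try:
        connection.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(path TEXT PRIMARY KEY, inode INTEGER, mtime REAL)"
        )
        connection.executemany(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?)", rows
        )
        connection.commit()
    except Exception:
        connection.close()
        os.remove(working)  # abort: the original database is untouched
        raise
    connection.close()
    # Commit: the finished copy replaces the original in one step.
    os.replace(working, statedb_path)
```

An error or abort before the final `os.replace()` leaves the previous database
file exactly as it was.
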
Getting started
---------------

Installing csync
~~~~~~~~~~~~~~~~

See the `README` and `INSTALL` files for install prerequisites and
procedures. Packagers should take a look at <<X90, Appendix A: Packager Notes>>.

Using the command-line client
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The synopsis of the command-line client is

    csync [OPTION...] SOURCE DESTINATION

It synchronizes the content of SOURCE with DESTINATION and vice versa. The
DESTINATION can be a local directory or a remote file server.

    csync /home/csync scheme://user:password@server:port/full/path

The remote destination is supported by plugins. By default csync ships with smb
and sftp support. For more information, see the manpage of csync(1).

The PAM module
~~~~~~~~~~~~~~

pam_csync is a PAM module that provides roaming home directories for a user
session. This module is aimed at environments with central file servers where
a user wishes to store their home directory. The authentication module verifies
the identity of a user and triggers a synchronization with the server on the
first login and the last logout. More information can be found in the manpage
of the module, pam_csync(8).

[[X90]]
Appendix A: Packager Notes
--------------------------

Read the `README`, `INSTALL` and `FAQ` files (in the distribution root
directory).