Add more test to the userguide.

2024-11-24 14:05:58 +03:00 · 2008-12-17 18:23:32 +01:00 · 2008-12-17 18:23:32 +01:00 · 0f6a55bb23
commit 0f6a55bb23
parent 2a9ac9a91a
2 changed files with 194 additions and 32 deletions
--- a/doc/csync.txt
+++ b/doc/csync.txt
@@ -13,13 +13,13 @@ Introduction

 It is often the case that we have multiple copies (called replicas) of a
 filesystem or part of a filesystem (for example on a notebook and on a desktop
-    computer). Changes to each replica are often made independently and as a
+computer). Changes to each replica are often made independently and as a
 result they do not contain the same information. In that case a file
 synchronizer is used to make them consistent again, without losing any
 information.

 The goal is to detect conflicting <<X13, updates>> (files which have been
-    modified) and propagate non-conflicting updates to each replica. If there
+modified) and propagate non-conflicting updates to each replica. If there
 are no conflicts left we are done and the replicas are identical.

 Basics
@@ -115,10 +115,28 @@ filesystem. It decides which file:
 A wrong decision of the reconciler leads in most cases to a loss of data. So there
 are several conditions the file synchronizer has to follow.

-Specification
-^^^^^^^^^^^^^
+Algorithms
+^^^^^^^^^^

-TODO
+For conflict resolution several different algorithms could be implemented. The
+most common algorithms are the merge and the conflict algorithm. The first
+is a batch algorithm and the second is one which needs user interaction.
+
+Merge algorithm
+++++++++++++++
+
+The merge algorithm is an algorithm which doesn't need any user interaction. It
+is simple and used for example by Microsoft for Roaming Profiles. If it detects
+a conflict (the same file changed on both replicas) then it will use the most
+recent file and overwrite the other. This means you can lose some data, but
+normally you want the latest file.
+
+Conflict algorithm
++++++++++++++++++
+
+This is not implemented yet.
+
+If a file has a conflict, the user has to decide which file should be used.

 Propagation
 ~~~~~~~~~~~
@@ -126,28 +144,73 @@ Propagation
 The next instance of the file synchronizer is the propagator. It uses the
 calculated records to apply them on the current replica.

-* 2-phase-copy
-* merge trees and write journal
+
+The propagator uses a 2-phase-commit mechanism to simulate an atomic filesystem
+operation.
+
+In the first phase we copy the file to a temporary file on the opposite
+replica. This has the advantage that we can check whether the file copied
+to the opposite replica has been transferred successfully. If the connection
+gets interrupted during the transfer we still have the original state of the
+file. This means no data will be lost.
+In the second phase the file on the opposite replica will be overwritten by
+the temporary file.
+
+After a successful propagation we have to merge the trees to reflect the
+current state of the filesystem tree. This updated tree will be written as a
+journal into a database. The database is called the state database. It will be
+used during the update detection of the next synchronization. See above.

 Robustness
 ~~~~~~~~~~

-TODO
+This is a really important topic. The file synchronizer should not crash, and if
+it does crash, there should be no loss of data. To achieve this goal there are
+several mechanisms in place. These mechanisms will be discussed in the
+following sections.

 Crash resistance
 ^^^^^^^^^^^^^^^^

-TODO
+The synchronization process can be interrupted by different events, for
+example:
+
+* the system could be halted due to errors.
+* the disk could be full or the quota exceeded.
+* the network or power cable could be pulled out.
+* the user could force a stop of the synchronization process.
+* different communication errors could occur.
+
+So that no data is lost when such an event occurs, we enforce the following
+invariant:
+
+IMPORTANT: At every moment of the synchronization each file has either its
+original content or its correct final content.
+
+So each interrupted synchronization process is a partial sync and can be
+continued and completed by simply running csync again. The only problem could
+be an error of the filesystem, so we reach this invariant only approximately.

 Transfer errors
 ^^^^^^^^^^^^^^^

-TODO
+With the Two-Phase-Commit we check the file size after the file has been
+transferred, so we can detect transfer errors. A transfer protocol with
+checksums would be better. This could possibly be done in the future.
+
+Future filesystems like btrfs will help to compare checksums instead of the
+filesize. This will make the synchronization itself safer.

 Database loss
 ^^^^^^^^^^^^^

-TODO
+It is possible that the state database gets corrupted. If this happens,
+all files get evaluated. In this case the file synchronizer won't delete any
+file, but it could occur that deleted files will be restored from the other
+replica.
+To prevent corruption or loss of the database if an error occurs or the user
+forces an abort, the synchronizer works on a copy of the database and
+uses a 2-Phase-Commit to save it at the end.

 Getting started
 ---------------
@@ -160,17 +223,33 @@ procedures. Packagers take a look at <<X90, Appendix B: Packager Notes>>.

 Using the commandline client
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-TODO
-csync /home/csync sftp://TODO:secret@server:port/profile/TODO
+
+The synopsis of the commandline client is
+
+  csync [OPTION...] SOURCE DESTINATION
+
+It synchronizes the content of SOURCE with DESTINATION and vice versa. The
+DESTINATION can be a local directory or a remote file server.
+
+  csync /home/csync scheme://user:password@server:port/full/path
+
+The remote destination is supported by plugins. By default csync ships with smb
+and sftp support. For more information, see the manpage of csync(1).

 The PAM module
 ~~~~~~~~~~~~~~
-TODO
+
+pam_csync is a PAM module to provide roaming home directories for a user
+session. This module is aimed at environments with central file servers on which
+a user wishes to store his home directory. The Authentication Module verifies the
+identity of a user and triggers a synchronization with the server on the first
+login and the last logout. More information can be found in the manpage of the
+module pam_csync(8).


 [[X90]]
 Appendix A: Packager Notes
 --------------------------

-Read the `README` and `INSTALL` files (in the distribution root
+Read the `README`, `INSTALL` and `FAQ` files (in the distribution root
 directory).
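The merge algorithm described in this change (on a conflict, the most recent file wins) can be sketched in a few lines of Python. This is only an illustration and not csync's implementation: `FileState`, `resolve` and the returned action names are hypothetical, and letting the local replica win when both timestamps are equal is an assumption the guide does not make.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """Hypothetical record for one file on one replica."""
    path: str
    mtime: float   # modification time, seconds since the epoch
    changed: bool  # True if the file changed since the last synchronization


def resolve(local, remote):
    """Return 'push', 'pull' or 'none' for one file, without user interaction."""
    if local.changed and remote.changed:
        # Conflict: the most recent file overwrites the other one.
        # Assumption: on equal timestamps the local replica wins.
        return "push" if local.mtime >= remote.mtime else "pull"
    if local.changed:
        return "push"   # only the local replica changed
    if remote.changed:
        return "pull"   # only the remote replica changed
    return "none"       # nothing to propagate
```

Because the older version is simply overwritten, this sketch shows the same trade-off the guide mentions: no interaction is needed, but changes in the older file are lost.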
--- a/doc/userguide/csync.html
+++ b/doc/userguide/csync.html
@@ -437,12 +437,12 @@ is a user-level program which means you don't need to be a superuser.</p></div>
 <div class="sectionbody">
 <div class="para"><p>It is often the case that we have multiple copies (called replicas) of a
 filesystem or part of a filesystem (for example on a notebook and on a desktop
-    computer). Changes to each replica are often made independently and as a
+computer). Changes to each replica are often made independently and as a
 result they do not contain the same information. In that case a file
 synchronizer is used to make them consistent again, without losing any
 information.</p></div>
 <div class="para"><p>The goal is to detect conflicting <a href="#X13">updates</a> (files which have been
-    modified) and propagate non-conflicting updates to each replica. If there
+modified) and propagate non-conflicting updates to each replica. If there
 are no conflicts left we are done and the replicas are identical.</p></div>
 </div>
 <h2 id="_basics">2. Basics</h2>
@@ -566,31 +566,98 @@ or gets <strong>deleted</strong>
 </ul></div>
 <div class="para"><p>A wrong decision of the reconciler leads in most cases to a loss of data. So there
 are several conditions the file synchronizer has to follow.</p></div>
-<h4 id="_specification">3.2.1. Specification</h4>
-<div class="para"><p>TODO</p></div>
+<h4 id="_algorithms">3.2.1. Algorithms</h4>
+<div class="para"><p>For conflict resolution several different algorithms could be implemented. The
+most common algorithms are the merge and the conflict algorithm. The first
+is a batch algorithm and the second is one which needs user interaction.</p></div>
+<h5 id="_merge_algorithm">Merge algorithm</h5>
+<div class="para"><p>The merge algorithm is an algorithm which doesn't need any user interaction. It
+is simple and used for example by Microsoft for Roaming Profiles. If it detects
+a conflict (the same file changed on both replicas) then it will use the most
+recent file and overwrite the other. This means you can lose some data, but
+normally you want the latest file.</p></div>
+<h5 id="_conflict_algorithm">Conflict algorithm</h5>
+<div class="para"><p>This is not implemented yet.</p></div>
+<div class="para"><p>If a file has a conflict, the user has to decide which file should be used.</p></div>
 <h3 id="_propagation">3.3. Propagation</h3><div style="clear:left"></div>
 <div class="para"><p>The next instance of the file synchronizer is the propagator. It uses the
 calculated records to apply them on the current replica.</p></div>
+<div class="para"><p>The propagator uses a 2-phase-commit mechanism to simulate an atomic filesystem
+operation.</p></div>
+<div class="para"><p>In the first phase we copy the file to a temporary file on the opposite
+replica. This has the advantage that we can check whether the file copied
+to the opposite replica has been transferred successfully. If the connection
+gets interrupted during the transfer we still have the original state of the
+file. This means no data will be lost.
+In the second phase the file on the opposite replica will be overwritten by
+the temporary file.</p></div>
+<div class="para"><p>After a successful propagation we have to merge the trees to reflect the
+current state of the filesystem tree. This updated tree will be written as a
+journal into a database. The database is called the state database. It will be
+used during the update detection of the next synchronization. See above.</p></div>
+<h3 id="_robustness">3.4. Robustness</h3><div style="clear:left"></div>
+<div class="para"><p>This is a really important topic. The file synchronizer should not crash, and if
+it does crash, there should be no loss of data. To achieve this goal there are
+several mechanisms in place. These mechanisms will be discussed in the
+following sections.</p></div>
+<h4 id="_crash_resistance">3.4.1. Crash resistance</h4>
+<div class="para"><p>The synchronization process can be interrupted by different events, for
+example:</p></div>
 <div class="ilist"><ul>
 <li>
 <p>
-2-phase-copy
+the system could be halted due to errors.
 </p>
 </li>
 <li>
 <p>
-merge trees and write journal
+the disk could be full or the quota exceeded.
+</p>
+</li>
+<li>
+<p>
+the network or power cable could be pulled out.
+</p>
+</li>
+<li>
+<p>
+the user could force a stop of the synchronization process.
+</p>
+</li>
+<li>
+<p>
+different communication errors could occur.
 </p>
 </li>
 </ul></div>
-<h3 id="_robustness">3.4. Robustness</h3><div style="clear:left"></div>
-<div class="para"><p>TODO</p></div>
-<h4 id="_crash_resistance">3.4.1. Crash resistance</h4>
-<div class="para"><p>TODO</p></div>
+<div class="para"><p>So that no data is lost when such an event occurs, we enforce the following
+invariant:</p></div>
+<div class="admonitionblock">
+<table><tr>
+<td class="icon">
+<img src="./images/icons/important.png" alt="Important" />
+</td>
+<td class="content">At every moment of the synchronization each file has either its
+original content or its correct final content.</td>
+</tr></table>
+</div>
+<div class="para"><p>So each interrupted synchronization process is a partial sync and can be
+continued and completed by simply running csync again. The only problem could
+be an error of the filesystem, so we reach this invariant only approximately.</p></div>
 <h4 id="_transfer_errors">3.4.2. Transfer errors</h4>
-<div class="para"><p>TODO</p></div>
+<div class="para"><p>With the Two-Phase-Commit we check the file size after the file has been
+transferred, so we can detect transfer errors. A transfer protocol with
+checksums would be better. This could possibly be done in the future.</p></div>
+<div class="para"><p>Future filesystems like btrfs will help to compare checksums instead of the
+filesize. This will make the synchronization itself safer.</p></div>
 <h4 id="_database_loss">3.4.3. Database loss</h4>
-<div class="para"><p>TODO</p></div>
+<div class="para"><p>It is possible that the state database gets corrupted. If this happens,
+all files get evaluated. In this case the file synchronizer won't delete any
+file, but it could occur that deleted files will be restored from the other
+replica.
+To prevent corruption or loss of the database if an error occurs or the user
+forces an abort, the synchronizer works on a copy of the database and
+uses a 2-Phase-Commit to save it at the end.</p></div>
 </div>
 <h2 id="_getting_started">4. Getting started</h2>
 <div class="sectionbody">
@@ -598,19 +665,35 @@ merge trees and write journal
 <div class="para"><p>See the <tt>README</tt> and <tt>INSTALL</tt> files for install prerequisites and
 procedures. Packagers take a look at <a href="#X90">Appendix B: Packager Notes</a>.</p></div>
 <h3 id="_using_the_commandline_client">4.2. Using the commandline client</h3><div style="clear:left"></div>
-<div class="para"><p>TODO
-csync /home/csync sftp://TODO:secret@server:port/profile/TODO</p></div>
+<div class="para"><p>The synopsis of the commandline client is</p></div>
+<div class="literalblock">
+<div class="content">
+<pre><tt>csync [OPTION...] SOURCE DESTINATION</tt></pre>
+</div></div>
+<div class="para"><p>It synchronizes the content of SOURCE with DESTINATION and vice versa. The
+DESTINATION can be a local directory or a remote file server.</p></div>
+<div class="literalblock">
+<div class="content">
+<pre><tt>csync /home/csync scheme://user:password@server:port/full/path</tt></pre>
+</div></div>
+<div class="para"><p>The remote destination is supported by plugins. By default csync ships with smb
+and sftp support. For more information, see the manpage of <tt>csync(1)</tt>.</p></div>
 <h3 id="_the_pam_module">4.3. The PAM module</h3><div style="clear:left"></div>
-<div class="para"><p>TODO</p></div>
+<div class="para"><p>pam_csync is a PAM module to provide roaming home directories for a user
+session. This module is aimed at environments with central file servers on which
+a user wishes to store his home directory. The Authentication Module verifies the
+identity of a user and triggers a synchronization with the server on the first
+login and the last logout. More information can be found in the manpage of the
+module pam_csync(8).</p></div>
 </div>
 <h2 id="X90">5. Appendix A: Packager Notes</h2>
 <div class="sectionbody">
-<div class="para"><p>Read the <tt>README</tt> and <tt>INSTALL</tt> files (in the distribution root
+<div class="para"><p>Read the <tt>README</tt>, <tt>INSTALL</tt> and <tt>FAQ</tt> files (in the distribution root
 directory).</p></div>
 </div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2008-11-20 12:16:02 CEST
+Last updated 2008-12-17 15:38:27 CEST
 </div>
 </div>
 </body>
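The two-phase propagation that the Propagation section of this change describes (copy to a temporary file, check it, then overwrite the target) might look roughly like the following Python sketch for two local paths. It is not csync's code: `propagate` and the `.sync-tmp-` prefix are hypothetical names, and only the file-size check mentioned in the guide is performed. The temporary file is created next to the destination on the assumption that both then live on the same filesystem, so the final rename can replace the target in one step.

```python
import os
import shutil
import tempfile


def propagate(src_path, dst_path):
    """Copy src_path over dst_path in two phases (illustrative only)."""
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    # Phase 1: copy into a temporary file next to the destination.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".sync-tmp-")
    os.close(fd)
    try:
        shutil.copyfile(src_path, tmp_path)
        # Verify the transfer by comparing file sizes, as the guide describes.
        if os.path.getsize(tmp_path) != os.path.getsize(src_path):
            raise IOError("size mismatch after transfer")
        # Phase 2: replace the destination with the verified temporary file.
        os.replace(tmp_path, dst_path)
    except BaseException:
        # On any failure the destination still holds its original content;
        # only the temporary file is cleaned up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the copy is interrupted, the destination is never touched, which is the "original content or correct final content" invariant from the Crash resistance section in miniature.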