Overview of the synchronization process
This document describes some key concepts needed to understand the synchronization process in OpenSync.
Initial concepts
Members and Groups
Synchronization is done in groups. A group is a set of members.
Each member stores a set of objects. A member may correspond to a device, such as a mobile phone or palmtop; a place where data is stored, such as a filesystem directory; or an application, such as a PIM suite.
Changes
Changes are first-class citizens in the synchronization engine. The engine is responsible for getting changes from the members, deciding what must be done to synchronize them, and sending the resulting changes to each member, so that the data on all members stays synchronized.
Changes may be of three types:
- Deleting
- Modifying
- Adding
A deleting change means that an object was deleted on the member (when the change is reported by the member), or that an object should be deleted on a member (when the change is being written to the member).
A modifying change means that an object was modified on a member (when reported), or that an object should be modified in the database of a member (when being written).
An adding change means that a new object was added to the database of a member (when reported), or that an object should be added to the database of a member (when being written).
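The three change types and their two readings (reported by a member, or written to a member) can be sketched as a small data structure. This is only an illustration: the names `ChangeType`, `Change` and `describe` are hypothetical and are not taken from the OpenSync API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ChangeType(Enum):
    # The three kinds of change listed above.
    ADDED = "added"
    MODIFIED = "modified"
    DELETED = "deleted"


@dataclass
class Change:
    uid: str                    # UID of the object on the member
    change_type: ChangeType
    data: Optional[str] = None  # object data; not needed for a deletion


def describe(change: Change, reported: bool) -> str:
    """Spell out what a change means, depending on its direction."""
    verb = change.change_type.value
    if reported:
        # The member tells the engine what already happened.
        return f"object {change.uid} was {verb} on the member"
    # The engine tells the member what it should do.
    return f"object {change.uid} should be {verb} on the member"


print(describe(Change("42", ChangeType.DELETED), reported=True))
print(describe(Change("7", ChangeType.ADDED, "Mary"), reported=False))
```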
Slow-sync or fast-sync
The synchronization process may run in two modes: slow-sync and fast-sync.
Fast-sync is the normal case of synchronization. In a fast-sync, each member reports only the changes made since the last synchronization, and only those changes are written to the other members.
Slow-sync is used when the members were never synchronized before, or when the information about changes since the last synchronization is not available on one of the members. In this case, the members must report all the data in their databases to the sync engine.
The first synchronization between two members is always a slow-sync.
Anchors
To tell whether a fast-sync is possible, the members use anchors to check that the synchronization information on the engine and on the device or application is consistent. SyncML, for example, uses the Last and Next elements in the Sync commands inside SyncML messages, so the sync engine can tell whether a fast-sync is possible or a slow-sync is necessary.
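The anchor check can be sketched as a plain comparison. This is a simplified illustration, assuming the engine keeps the anchor it stored at the end of the previous sync and the member reports the anchor it remembers. The function name `needs_slow_sync` is hypothetical, and the real SyncML exchange of Last and Next involves more than this.

```python
from typing import Optional


def needs_slow_sync(engine_anchor: Optional[str],
                    member_anchor: Optional[str]) -> bool:
    """Return True when a slow-sync is required.

    engine_anchor: anchor the engine stored after the previous sync,
                   or None if there was no previous sync.
    member_anchor: anchor the member reports for the previous sync,
                   or None if it has none.
    """
    if engine_anchor is None or member_anchor is None:
        # Never synchronized before, or one side lost its information.
        return True
    # Both sides must agree on what the last synchronization was.
    return engine_anchor != member_anchor


print(needs_slow_sync(None, None))
print(needs_slow_sync("2024-01-01T10:00", "2024-01-01T10:00"))
```

When the anchors match, both sides agree on the last synchronization, so reporting only the changes made since then is sufficient.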
UIDs
A UID is a unique identifier for an object in the database of a member. UIDs are transparent to the user, but the synchronization process needs them so that the engine, the devices, and the applications can identify individual objects.
For a directory on the filesystem, the UID is the name of the file containing the data. Other devices and applications have their own UID schemes. The only requirement is that the same object always has the same UID on a member, and that distinct objects always have different UIDs.
Note: "same object" is not a synonym for "same data". For example, two files containing the same data are still different files. Likewise, a file may be changed to contain different data, but it is still considered the "same file".
Mappings
The mappings are a key part of the information the sync engine needs to synchronize properly. They record which objects on one member correspond to which objects on the other members. Generally, the mapping is simply a table stored in the sync engine's database, containing the UID that the same object has on each member.
For example, suppose that member A contains these objects:
| UID | Name | Phone number |
| 1 | Mary | 123456 |
| 2 | John | 987654 |
| 3 | Peter | 918278 |
Member B was synchronized with member A and contains these objects:
| UID | Name | Phone number |
| 101 | John | 987654 |
| 102 | Peter | 918278 |
| 103 | Mary | 123456 |
The mapping table on the sync engine would look like this:
| UID on A | UID on B |
| 1 | 103 |
| 2 | 101 |
| 3 | 102 |
Each row of the table above is called a mapping.
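The example table can be written down as a list of mappings, each recording the UID of one object on every member. This is a minimal sketch: the layout and the helper `find_counterpart` are assumptions made for illustration, not the way OpenSync stores mappings.

```python
from typing import Dict, List, Optional

# One dictionary per mapping: member name -> UID of the object on that member.
mappings: List[Dict[str, str]] = [
    {"A": "1", "B": "103"},  # Mary
    {"A": "2", "B": "101"},  # John
    {"A": "3", "B": "102"},  # Peter
]


def find_counterpart(mappings: List[Dict[str, str]], member: str,
                     uid: str, other_member: str) -> Optional[str]:
    """Return the UID on other_member of the object that has
    the given uid on member, or None if the object is not mapped."""
    for mapping in mappings:
        if mapping.get(member) == uid:
            return mapping.get(other_member)
    return None


# Mary is UID 1 on member A and UID 103 on member B.
print(find_counterpart(mappings, "A", "1", "B"))
# The lookup works in both directions.
print(find_counterpart(mappings, "B", "101", "A"))
```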
Mappings and conflicts
The mappings are essential to the most important feature of a synchronization engine: conflict detection.
A conflict happens when the same object is changed differently on two members. In this case, the user needs to tell the sync engine what to do with the conflicting changes.
The user may choose to use only the changes made on one of the members, dropping the changes made on the other member.
The user may choose to keep the data from both members by duplicating the entries. In this case, the objects are no longer considered "the same" (i.e. mapped) but different objects, and both are stored on both members. In short, the mapping between the objects is "broken".
Another option is to let the user manually merge the information from both sides into a new object, which is stored on both members and replaces the original objects. This needs support in the user interface to build a new object containing data from both sides.
OpenSync also offers the possibility of ignoring the conflict. The conflict is then not solved in the current synchronization and will be reported again on the next synchronizations.
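The resolution options described above can be summarized in one function that returns what each member holds afterwards. This is a sketch under assumptions: the function name, the option strings and the return shape are all hypothetical and do not reflect the OpenSync API.

```python
from typing import List, Optional, Tuple


def resolve_conflict(left: str, right: str, choice: str,
                     merged: Optional[str] = None
                     ) -> Tuple[List[str], List[str], bool]:
    """Return (objects on left, objects on right, still_mapped).

    still_mapped is False when the mapping between the two
    original objects is broken by the resolution.
    """
    if choice == "left":
        # Keep the left version, drop the changes made on the right.
        return [left], [left], True
    if choice == "right":
        return [right], [right], True
    if choice == "duplicate":
        # Both objects end up on both members, as separate objects.
        return [left, right], [left, right], False
    if choice == "merge":
        if merged is None:
            raise ValueError("merge needs an object built by the user")
        # The merged object replaces the originals on both members.
        return [merged], [merged], True
    if choice == "ignore":
        # Nothing is written; the conflict shows up again next time.
        return [left], [right], True
    raise ValueError(f"unknown choice: {choice}")


print(resolve_conflict("A1", "A2", "duplicate"))
```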
Mappings and slow-sync
Understanding mappings is essential to understanding the differences between slow-sync and fast-sync.
When a fast-sync is possible, the mapping table on the sync engine is complete and consistent with the real state of the databases on the members.
On a slow-sync, the sync engine drops all information about the mappings between the objects, because that information may not exist yet or, if it exists, may be inconsistent. The sync engine then needs to detect the mappings between the objects on the members, based on the data of each object.
To detect these mappings, the engine uses heuristics to check whether the objects on both sides are exactly the same, or similar.
If the objects are exactly the same, a mapping between them is created, and no conflict is raised because they contain the same data.
If the objects are "similar" (see note below), they probably refer to the same contact, the same appointment, or the same object from the user's point of view. In this case, a mapping between them is created, and a conflict is raised. If the objects really refer to the same thing, it is a valid conflict and the user chooses how to solve it. If they are not the same object, the user may choose "duplicate" to break the mapping between them.
Note: The rules for deciding whether objects are "similar" are defined in the comparison functions for vcards and vcalendars in the OpenSync code. See formats/vformats-xml/xml-vcard.c and xml-vcal.c in the OpenSync source code.
Putting it together
Slow-sync (first sync)
This table shows the behaviour of the sync engine when there are no mappings between the objects at the time of the sync. In this case, the data in the objects is used to detect which objects should be mapped together.
The Comp. Result column shows the result of comparing the data on both sides. The comparison uses some fields as "key fields", as described in the previous section. If the data is exactly the same, the result is Same; if the key fields are equal but the data differs, the result is Similar; if the key fields are different, the result is Different.
| Left (before sync) | Right (before sync) | Comp. Result | Conflict | Left (after sync) | Right (after sync) | Mapping (after sync) | Notes |
| A | A | Same | - | A | A | A=A | |
| A | - | - | - | A | A | A=A | |
| A | B | Different | - | A,B | A,B | A=A, B=B | |
| A1 | A2 | Similar | Duplicate | A1,A2 | A1,A2 | A1=A1, A2=A2 | |
| A1 | A2 | Similar | Left | A1 | A1 | A1=A1 | |
| A1 | A2 | Similar | Right | A2 | A2 | A2=A2 | |
| A1(t1) | A2(t2) | Similar | Most Rec. | A2 | A2 | A2=A2 | |
| A1 | A2 | Similar | Ignore | A1 | A2 | A1=A2 | The conflict will be reported again on next sync |
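The comparison and the mapping detection behind this table can be sketched as follows. The sketch assumes objects are dictionaries and that "name" is the only key field. Both are assumptions for illustration; the real rules live in the OpenSync comparison functions. Objects left without a match would be copied to the other member, which is not modelled here.

```python
from typing import Dict, List, Tuple

Obj = Dict[str, str]


def compare(left: Obj, right: Obj, key_fields=("name",)) -> str:
    """Return 'same', 'similar' or 'different' for two objects."""
    if left == right:
        return "same"
    if all(left.get(k) == right.get(k) for k in key_fields):
        # Key fields match but some other data differs.
        return "similar"
    return "different"


def detect_mappings(left_objects: Dict[str, Obj],
                    right_objects: Dict[str, Obj],
                    key_fields=("name",)
                    ) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Pair objects by their data; return (mappings, conflicts) as UID pairs."""
    mappings, conflicts = [], []
    unmatched_right = dict(right_objects)
    for left_uid, left_data in left_objects.items():
        match = None
        for right_uid, right_data in unmatched_right.items():
            result = compare(left_data, right_data, key_fields)
            if result != "different":
                match = (right_uid, result)
                break
        if match is None:
            continue  # no counterpart found on the right member
        right_uid, result = match
        del unmatched_right[right_uid]
        mappings.append((left_uid, right_uid))
        if result == "similar":
            # Mapped, but the user must decide which data wins.
            conflicts.append((left_uid, right_uid))
    return mappings, conflicts


left = {"1": {"name": "Mary", "phone": "123456"},
        "2": {"name": "John", "phone": "987654"}}
right = {"101": {"name": "John", "phone": "000000"},
         "103": {"name": "Mary", "phone": "123456"}}
print(detect_mappings(left, right))
```

In this usage example, Mary is identical on both sides and is mapped silently, while John differs only in the phone number, so the pair is mapped and also reported as a conflict.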
Fast-sync
This table shows the results of the synchronization when A1 on the left member is mapped to A2 on the right member by a previous synchronization. The exception is the rows where A2 does not appear, which means that A1 is not mapped to any object on the right member.
- chg(A) means that the object was changed on the member since the last sync
- del(A) means that the object was deleted on the member since the last sync
- add(A) means that the object was added on the member since the last sync
- nochg(A) means that the object was not changed on the member since the last sync
- t1 and t2 are the times of the changes, with t2 more recent than t1
| Left (before sync) | Right (before sync) | Map (before sync) | Comp. Result | Conflict | Left (after sync) | Right (after sync) | Mapping (after sync) | Notes |
| chg(A1) | nochg(A2) | A1=A2 | - | - | A1 | A1 | A1=A1 | |
| chg(A) | chg(A) | A=A | Same | - | A | A | A=A | |
| chg(A1) | chg(A2) | A1=A2 | Sim. or Dif. | Left | A1 | A1 | A1=A1 | |
| chg(A1) | chg(A2) | A1=A2 | Sim. or Dif. | Right | A2 | A2 | A2=A2 | |
| chg(A1) | chg(A2) | A1=A2 | Sim. or Dif. | Duplicate | A1,A2 | A1,A2 | A1=A1, A2=A2 | |
| chg(A1) | chg(A2) | A1=A2 | Sim. or Dif. | Ignore | A1 | A2 | A1=A2 | The conflict will be reported again on next sync |
| chg(A1,t1) | chg(A2,t2) | A1=A2 | Sim. or Dif. | Most Rec. | A2 | A2 | A2=A2 | |
| chg(A1,t2) | chg(A2,t1) | A1=A2 | Sim. or Dif. | Most Rec. | A1 | A1 | A1=A1 | |
| chg(A1) | del(A2) | A1=A2 | - | Left | A1 | A1 | A1=A1 | |
| chg(A1) | del(A2) | A1=A2 | - | Right | - | - | - | |
| chg(A1) | del(A2) | A1=A2 | - | Duplicate | A1 | A1 | A1=A1 | Duplicate doesn't make sense on this case |
| chg(A1) | del(A2) | A1=A2 | - | Ignore | A1 | - | A1=A2 | The conflict will be reported again on next sync |
| chg(A1,t1) | del(A2,t2) | A1=A2 | - | Most Rec. | - | - | - | |
| chg(A1,t2) | del(A2,t1) | A1=A2 | - | Most Rec. | A1 | A1 | A1=A1 | |
| add(A1) | - | - | - | - | A1 | A1 | A1=A1 | |
| add(A) | add(A) | - | Same | - | A | A | A=A | |
| add(A) | add(B) | - | Different | - | A,B | A,B | A=A, B=B | |
| add(A1) | add(A2) | - | Similar | Left | A1 | A1 | A1=A1 | |
| add(A1) | add(A2) | - | Similar | Right | A2 | A2 | A2=A2 | |
| add(A1) | add(A2) | - | Similar | Duplicate | A1,A2 | A1,A2 | A1=A1, A2=A2 | |
| add(A1) | add(A2) | - | Similar | Ignore | A1 | A2 | A1=A2 | The conflict will be reported again on next sync |
| add(A1,t1) | add(A2,t2) | - | Similar | Most Rec. | A2 | A2 | A2=A2 | |
| add(A1,t2) | add(A2,t1) | - | Similar | Most Rec. | A1 | A1 | A1=A1 | |
| del(A1) | nochg(A2) | A1=A2 | - | - | - | - | - | |
| del(A1) | del(A2) | A1=A2 | - | - | - | - | - | |
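For a pair of objects that is already mapped, the rows of this table reduce to a short decision on the two reported changes. The sketch below covers only mapped pairs (the chg, del and nochg rows); the function name and the returned strings are hypothetical.

```python
VALID = {"chg", "del", "nochg"}


def fast_sync_action(left: str, right: str, same_data: bool = False) -> str:
    """Decide what to do for a mapped pair of objects.

    left, right: change reported by each member ('chg', 'del', 'nochg').
    same_data:   True if both sides were changed to identical data.
    """
    if left not in VALID or right not in VALID:
        raise ValueError("changes must be 'chg', 'del' or 'nochg'")
    if left == "nochg" and right == "nochg":
        return "nothing"
    if right == "nochg":
        # Only the left side changed: propagate its change or deletion.
        return "write left to right"
    if left == "nochg":
        return "write right to left"
    if left == "del" and right == "del":
        return "nothing"  # both sides already agree
    if left == "chg" and right == "chg" and same_data:
        return "nothing"  # changed on both sides, but to the same data
    # chg/chg with different data, or chg on one side and del on the other.
    return "conflict"


print(fast_sync_action("chg", "nochg"))
print(fast_sync_action("chg", "del"))
```

A "conflict" result is where the user's choice (Left, Right, Duplicate, Ignore or Most Rec.) comes in, as listed in the Conflict column.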
Impossible cases
For completeness, we list some impossible cases, explained below:
| Left (before sync) | Right (before sync) | Map (before sync) | Comp. Result | Conflict | Left (after sync) | Right (after sync) | Mapping (after sync) | Notes |
| chg(A1) | add(A2) | A1=A2 | - | - | - | - | - | Impossible case. After a previous sync, A1 is always mapped to an existing object on the right member. See the next row for a possible case. |
| chg(A1) | nochg(A2),add(A3) | A1=A2 | - | - | A1,A3 | A2,A3 | A1=A2, A3=A3 | Note: A1 has the same contents as A2 |
| add(A1) | nochg(A2) | A1=A2 | - | - | - | - | - | Impossible case, too. An add is never reported for an already-mapped object. |
