Discussion about the handling of UIDs and filenames by file-sync (started by ticket #209).
People:
- abauer - Armin Bauer
- boto - Eduardo Habkost
- grcobb - Graham Cobb
I don't have the log of the beginning of the discussion; if anybody has it, feel free to paste it into this page. -- EduardoHabkost
[Thu Jun 22 2006] [09:11:35] <grcobb> abauer: how does opensync decide two elements are the same object (in the slow-sync case)?
[Thu Jun 22 2006] [09:11:47] <grcobb> abauer: do the UIDs play a role in that?
[Thu Jun 22 2006] [09:12:58] <grcobb> abauer: thinking about your suggestion... how about we only assign a new uid if the uid contains /
[Thu Jun 22 2006] [09:13:27] <grcobb> that would allow file-to-file to continue to work (no uid generated from a filename could contain a /)
[Thu Jun 22 2006] [09:13:46] <abauer> grcobb: in a slow-sync case (where no mappings between uid exist) opensync only uses the data (not the uids) for comparison
[Thu Jun 22 2006] [09:14:22] <abauer> grcobb: but then we have the same problem again. did we cover all characters with special meaning?
[Thu Jun 22 2006] [09:14:33] <grcobb> abauer: ah, I see, so normally the uid is a "shorthand" for the file so that mappings can be remembered easily
[Thu Jun 22 2006] [09:16:13] <grcobb> abauer: yes, but I think the problems with breaking file-to-file would be enormous -- I will see if I can find a definite statement in a POSIX document about filename characters
[Thu Jun 22 2006] [09:17:45] <abauer> grcobb: if there is a switch in the config file we wouldnt break the behaviour
[Thu Jun 22 2006] [09:18:40] <grcobb> abauer: let me make sure I understand what you are suggesting...
[Thu Jun 22 2006] [09:19:22] <grcobb> abauer: with the switch in one setting the behaviour is exactly as today -- that will always work with file-to-file
[Thu Jun 22 2006] [09:19:57] <grcobb> abauer: with the switch in the other position, we keep the fact that the name of the file is used as the uid but....
[Thu Jun 22 2006] [09:20:25] <grcobb> abauer: we assign a new UID (and hence name), of our own invention, for anything that we receive as an add
[Thu Jun 22 2006] [09:25:24] <abauer> grcobb: the only constraints about UID for most plugins is that the uids have to be unique for each side. and the plugin must have the possibility to alter a object if its uid is given
[Thu Jun 22 2006] [09:26:35] <abauer> grcobb: but the file-sync plugin introduces another constraint, since it enforces that the uids have to be the same for different plugins
[Thu Jun 22 2006] [09:27:42] <grcobb> abauer: I suppose it doesn't really enforce that they MUST be the same -- the sync process would still work -- but user expectation is that file A on one side will appear as file A on the other side
[Thu Jun 22 2006] [09:28:05] <abauer> grcobb: ok. true. its not a technical reason
[Thu Jun 22 2006] [09:28:32] <grcobb> abauer: but as the user expectation is reasonable, a config-file switch sounds like a good idea
[Thu Jun 22 2006] [09:29:50] <grcobb> abauer: maybe the switch should be called something like "preserve-filenames" and should default to off, as file-to-file is probably not a common real synchronisation case (but very common for testing and for new users learning)
[Thu Jun 22 2006] [09:34:40] <boto> hi
[Thu Jun 22 2006] [09:35:19] <boto> grcobb: I am not sure we really need a config switch for this. I mean, having to preserve filenames may not be a common case. But, is there an advantage for the user on disabling this option?
[Thu Jun 22 2006] [09:36:05] <boto> if preserving filenames wouldn't break the cases where the user doesn't care about the UIDs in the file-sync side, this feature could be always enabled
[Thu Jun 22 2006] [09:37:04] <grcobb> boto: enabling the preserve-filenames option would use the current behaviour: where the filename used for an ADD is set from the sending plug-in's uid
[Thu Jun 22 2006] [09:37:30] <grcobb> boto: which is what causes exactly the problem reported in 209 when the sending plugin's uid contains /
[Thu Jun 22 2006] [09:37:42] <boto> grcobb: I just don't see on which cases that disabling it would be useful for anything
[Thu Jun 22 2006] [09:37:44] <boto> ah, ticket 209
[Thu Jun 22 2006] [09:37:48] <boto> I will check it :)
[Thu Jun 22 2006] [09:38:56] <grcobb> boto: disabling preserve-filenames would cause new behaviour where the uid (and filename) for an ADD is created from scratch, instead of using the sending side's uid
[Thu Jun 22 2006] [09:39:43] <boto> grcobb: why not just trying to preserve the UID always, but generating a new one only if needed?
[Thu Jun 22 2006] [09:40:00] <boto> this way, the file-to-file case will still keep the UIDs, and the cases where we can't preserve UIDs would work, also
[Thu Jun 22 2006] [09:40:47] <abauer> boto: you sent me a patch about the duplication issue, right?
[Thu Jun 22 2006] [09:41:06] <boto> abauer: I've sent you an URL where the changes are available
[Thu Jun 22 2006] [09:41:14] <abauer> boto: if we allow generated uids, this would not be an issue any more
[Thu Jun 22 2006] [09:41:18] <boto> I didn't committed them because you didn't commented on it
[Thu Jun 22 2006] [09:41:52] Quit: zecke has left this server (Read error: 113 (No route to host)).
[Thu Jun 22 2006] [09:41:55] <boto> abauer: what is "allow generated uids", exactly? we already allow it, don't we?
[Thu Jun 22 2006] [09:42:29] <abauer> boto: it would mean that if we add a file to the file-sync plugin, the uid is not used as the filename. instead a new one is generated
[Thu Jun 22 2006] [09:42:56] <boto> my patch solved the problem where a new UID need to be generated by the engine because the engine already knows it is not unique. but it won't do anything in the case a unique UID is sent to file-sync but file-sync isn't able to use it as a filename
[Thu Jun 22 2006] [09:43:22] <boto> abauer: why not just try to use the UID as filename if possible (then it will work the same way in the file-to-file cases), and generate an UID if needed
[Thu Jun 22 2006] [09:43:42] <boto> we don't need to always generate a new UID, neither we need to always use exactly the UID that was sent
[Thu Jun 22 2006] [09:43:55] <abauer> boto: how do we make sure that the uid is usable?
[Thu Jun 22 2006] [09:44:03] <boto> abauer: just check if it can be a valid filename
[Thu Jun 22 2006] [09:44:34] <boto> I see the UID sent by opensync to the plugin as a hint. the plugin may try to use it but it can also generate a new one if needed
[Thu Jun 22 2006] [09:47:54] <grcobb> boto: I suggested that but abauer was concerned that we might not have the full list of what was and was not allowed in a filename
[Thu Jun 22 2006] [09:48:33] CTCP: received CTCP PING request from grcobb to channel #opensync, sending reply.
[Thu Jun 22 2006] [09:48:37] <boto> I think that every plugin needs to know what is a valid filename for it. file-sync is responsible of creating files, it should know what is a valid filename or not
[Thu Jun 22 2006] [09:48:57] Part: grcobb has left this channel.
[Thu Jun 22 2006] [09:49:01] Error: valid: unknown command.
[Thu Jun 22 2006] [09:49:04] <boto> s/valid filename for it/valid UID for it/
[Thu Jun 22 2006] [09:49:17] <Whoopie> dgollub: any feedback from MrM regarding the gnokii number types?
[Thu Jun 22 2006] [09:49:44] <abauer> boto: true. but at the moment the file-sync plugin does not know
[Thu Jun 22 2006] [09:50:58] Join: grcobb has joined this channel (n=irc@adsl-f2s.home.cobb.me.uk).
[Thu Jun 22 2006] [09:51:36] <boto> abauer: it needs to know. its his task to do this, IMO
[Thu Jun 22 2006] [09:52:04] <abauer> boto: agreed. so we have to look what valid filenames are
[Thu Jun 22 2006] [09:52:07] <boto> s/its his/it's its/
[Thu Jun 22 2006] [09:52:20] <grcobb> sorry, network problem, I dropped at 13:39 -- anything I missed?
[Thu Jun 22 2006] [09:52:20] <boto> abauer: correct. I think we can start with a conservative set of valid filename characters
[Thu Jun 22 2006] [09:52:41] <abauer> boto: but this also poses a portability issue. one valid file-name might be invalid somewhere else
[Thu Jun 22 2006] [09:52:48] <abauer> (like "aux" on windows)
[Thu Jun 22 2006] [09:53:17] <abauer> boto: ok. so what should we allow?
[Thu Jun 22 2006] [09:53:22] <boto> abauer: this is a platform-specific problem that needs to be solved somewhere in the system, anyway
[Thu Jun 22 2006] [09:54:25] <boto> we don't need to allow any and every valid filename in the system. if we don't allow a filename that would be valid, this isn't a big problem. the problem is allowing invalid filenames. so I think we can start with a conservative set of valid characters for filenames. and a blacklist of reserved filenames for the operating system we are running on
[Thu Jun 22 2006] [09:57:11] <boto> something like valid_chars = "a...zA...Z0..9-_+". I think we can avoid the other symbols and not support them
[Thu Jun 22 2006] [09:58:15] <abauer> boto: how do we escape invalid characters?
[Thu Jun 22 2006] [09:58:46] <grcobb> boto: I think you missed out "."
[Thu Jun 22 2006] [09:58:47] <boto> abauer: we don't need to escape them, necessarily. we just generate a new UID because the UID sent by opensync is invalid
[Thu Jun 22 2006] [09:58:55] <boto> grcobb: yes, I've missed it :)
[Thu Jun 22 2006] [09:59:51] <boto> the new UID could be very similar to the original UID, but it doesn't necessarily means escaping it
[Thu Jun 22 2006] [09:59:57] <abauer> boto: i dont think that would work. think of the file-file case. a user would expect that a file on one side is the same file on the other side
[Thu Jun 22 2006] [10:00:35] <boto> abauer: in the file-file case, the UIDs sent to the plugin are expected to be valid
[Thu Jun 22 2006] [10:00:40] <boto> abauer: in the file-file case, the UIDs sent to the plugin are expected to be valid
[Thu Jun 22 2006] [10:00:44] <boto> ouch, sorry
[Thu Jun 22 2006] [10:00:54] <boto> of course, the user can have filenames with weird characters
[Thu Jun 22 2006] [10:01:02] <boto> in this case, we can either:
[Thu Jun 22 2006] [10:01:09] <boto> - Really support every valid filename
[Thu Jun 22 2006] [10:01:21] <abauer> boto: how do we know if we sync file-file or file-kde?
[Thu Jun 22 2006] [10:01:29] <boto> - Just trust the UID sent by opensync to the plugin, if we have a "preserve_filename" flag set in the file-sync struct
[Thu Jun 22 2006] [10:01:55] <abauer> boto: wait. now thats a good idea
[Thu Jun 22 2006] [10:02:05] <abauer> a preserve filename flag.
[Thu Jun 22 2006] [10:02:08] <dgollub> Whoopie: not yet ...
[Thu Jun 22 2006] [10:02:13] <boto> i.e. a fileFormat struct sent by file-sync would have something like "you_can_trust_the_uid" set. the fileFormat struct generated by encapsulation wouldn't have this flag set
[Thu Jun 22 2006] [10:02:23] <abauer> the file-sync plugin always sets it to true
[Thu Jun 22 2006] [10:02:30] <Whoopie> dgollub: ok
[Thu Jun 22 2006] [10:02:34] <abauer> the file encapsulator always to false
[Thu Jun 22 2006] [10:02:53] <boto> abauer: correct
[Thu Jun 22 2006] [10:03:33] <boto> but we can have a more difficult problem in the future, if we have other plugins: what if we have a "ftp" plugin, running on windows, and the server has a file named "aux"?
[Thu Jun 22 2006] [10:03:55] <boto> independent of the implementation or approach used, this case is difficult to be done right. what would be expected by the user?
[Thu Jun 22 2006] [10:04:26] <abauer> boto: i would do the same as done by all systems at the moment: show an error :)
[Thu Jun 22 2006] [10:04:34] <boto> abauer: heh. ok :)
[Thu Jun 22 2006] [10:04:47] <boto> we have another option: "ftp" not necessarily should set "preserve_filename"
[Thu Jun 22 2006] [10:05:08] <boto> the file-sync plugin would try to use the UID as a hint if preserve_filename isn't set (so most files will have the same name)
[Thu Jun 22 2006] [10:05:17] <boto> but it can generate a new UID if it is not set
[Thu Jun 22 2006] [10:06:08] <boto> but not always, as it will be more intuitive if we try to use similar UIDs on both sides, even on cases that aren't file-file
[Thu Jun 22 2006] [10:06:13] <grcobb> boto: if a new UID is going to be created I don't think it should be at all similar to the existing uid -- otherwise name clashes are very likely (many files have similar names)
[Thu Jun 22 2006] [10:06:46] <boto> grcobb: we generate a filename and check if it exists. it is just a feature to make things more intuitive
[Thu Jun 22 2006] [10:06:55] <abauer> boto: i dont think we should look at "valid" filenames. we either trust the uid, or we dont
[Thu Jun 22 2006] [10:06:56] <boto> i.e. if we will generate a filename, we can just try to use the UID
[Thu Jun 22 2006] [10:07:04] <boto> abauer: I think we can use the uid as a hint
[Thu Jun 22 2006] [10:07:06] <abauer> if we dont trust it, we always generate a new uid
[Thu Jun 22 2006] [10:07:08] <boto> using the UID as a hint is useful
[Thu Jun 22 2006] [10:07:19] <abauer> boto: why?
[Thu Jun 22 2006] [10:07:35] <boto> for example: if KDE sent us a UID 12345, simply using this as an UID is more intuitive
[Thu Jun 22 2006] [10:07:50] <boto> abauer: because it is easier for debugging and seeing what is happening after synchronization
[Thu Jun 22 2006] [10:08:25] <abauer> boto: but the approach might be more error prone
[Thu Jun 22 2006] [10:08:26] <boto> I would ask: "why not": if the UID sent to file-sync is obviously valid (e.g. having only letters and numbers), and a file with this name doesn't exist, why not just using it as UID?
[Thu Jun 22 2006] [10:08:31] <abauer> and i dont see a advantage
[Thu Jun 22 2006] [10:08:34] <boto> abauer: why error prone?
[Thu Jun 22 2006] [10:08:52] <abauer> because you have to make sure that the check is correct
[Thu Jun 22 2006] [10:09:15] <boto> abauer: in this case we don't need to support all and every valid filename. just the "obviously right" would be used as a hint
[Thu Jun 22 2006] [10:09:30] <abauer> boto: i still dont see the advantage
[Thu Jun 22 2006] [10:10:11] <abauer> boto: if its only for debugging: the developer can use the utilities provided with opensync to dump the mapping table and get the uid of kde
[Thu Jun 22 2006] [10:10:27] <boto> abauer: one advantage is that we can make file <-> some-remote-file-plugin synchronization work even if filenames in the remote side aren't all supported by the local filesystem
[Thu Jun 22 2006] [10:11:26] <boto> abauer: I think the advantages are small. but the code for doing this (using the UID hint if it is obviously a valid filename), that I don't see why not doing it
[Thu Jun 22 2006] [10:11:54] <boto> "the code for doing this is simple", I mean
[Thu Jun 22 2006] [10:12:51] <boto> I think the knotes example could be a good one where using the UID as a hint could be nice for the user: the user created many notes, with many different titles
[Thu Jun 22 2006] [10:13:04] <boto> having filenames similar to the note titles in the other side will be a nice feature. and a cheap one
[Thu Jun 22 2006] [10:13:21] <abauer> boto: well. ok. convinced :)
[Thu Jun 22 2006] [10:13:40] <boto> :)
[Thu Jun 22 2006] [10:13:56] <abauer> grcobb: do you also agree?
[Thu Jun 22 2006] [10:14:21] <boto> abauer: anyway, feel free not to implement this use-uid-as-hint feature in the first version. it can be added later to the code
[Thu Jun 22 2006] [10:14:35] <grcobb> abauer: yes, I think the solution you guys have worked out is good
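
For readers skimming the log, here is a rough sketch of the approach the three settled on: trust the UID when the sender marks it as trustworthy, otherwise treat it as a hint and fall back to a generated name. It is written in Python purely for illustration and is not the file-sync plugin's code. The function names, the `preserve_filename` parameter (standing in for the flag discussed for the fileFormat struct), the use of `uuid` for generated names, and the one-entry reserved-name list are all assumptions; the character whitelist follows boto's `valid_chars` suggestion plus the "." grcobb pointed out.

```python
import os
import string
import uuid

# Conservative whitelist from the discussion ("a...zA...Z0..9-_+"),
# plus "." which grcobb noted was missing.
VALID_CHARS = frozenset(string.ascii_letters + string.digits + "-_+.")

# Hypothetical per-platform blacklist; "aux" is the only name the log mentions.
RESERVED_NAMES = frozenset({"aux"})


def is_obviously_valid(uid):
    """Return True only for UIDs that are clearly safe to use as filenames."""
    if not uid or uid in (".", ".."):
        return False
    if uid.lower() in RESERVED_NAMES:
        return False
    return all(ch in VALID_CHARS for ch in uid)


def choose_filename(uid, directory, preserve_filename=False):
    """Pick the filename (and hence UID) for an object received as an ADD.

    preserve_filename=True models the file-to-file case: the UID is trusted
    as-is, and any failure is left to surface as an error when writing.
    Otherwise the UID is only a hint: it is used if it is obviously valid
    and unused, and a fresh name is generated if not.
    """
    if preserve_filename:
        return uid
    if is_obviously_valid(uid) and not os.path.exists(os.path.join(directory, uid)):
        return uid
    # Generate an unrelated name and re-check for clashes, as suggested in the log.
    while True:
        candidate = uuid.uuid4().hex
        if not os.path.exists(os.path.join(directory, candidate)):
            return candidate
```

With this sketch, a KDE-style UID such as `12345` would be kept as the filename, while a UID containing `/` (the case from ticket #209) would be replaced by a generated name unless the sender set the trust flag.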
