Discussion about the handling of UIDs and filenames by file-sync (started by ticket #209).

People:

  • abauer - Armin Bauer
  • boto - Eduardo Habkost
  • grcobb - Graham Cobb

I don't have the log of the beginning of the discussion. If anybody has it, feel free to paste it into this page. -- EduardoHabkost

[Qui Jun 22 2006] [09:11:35] <grcobb>	abauer: how does opensync decide two elements are the same object (in the slow-sync case)?
[Qui Jun 22 2006] [09:11:47] <grcobb>	abauer: do the UIDs play a role in that?
[Qui Jun 22 2006] [09:12:58] <grcobb>	abauer: thinking about your suggestion... how about we only assign a new uid if the uid contains /
[Qui Jun 22 2006] [09:13:27] <grcobb>	that would allow file-to-file to continue to work (no uid generated from a filename could contain a /)
[Qui Jun 22 2006] [09:13:46] <abauer>	grcobb: in a slow-sync case (where no mappings between uid exist) opensync only uses the data (not the uids) for comparison
[Qui Jun 22 2006] [09:14:22] <abauer>	grcobb: but then we have the same problem again. did we cover all characters with special meaning?
[Qui Jun 22 2006] [09:14:33] <grcobb>	abauer: ah, I see, so normally the uid is a "shorthand" for the file so that mappings can be remembered easily
[Qui Jun 22 2006] [09:16:13] <grcobb>	abauer: yes, but I think the problems with breaking file-to-file would be enormous -- I will see if I can find a definite statement in a POSIX document about filename characters
[Qui Jun 22 2006] [09:17:45] <abauer>	grcobb: if there is a switch in the config file we wouldnt break the behaviour
[Qui Jun 22 2006] [09:18:40] <grcobb>	abauer: let me make sure I understand what you are suggesting...
[Qui Jun 22 2006] [09:19:22] <grcobb>	abauer: with the switch in one setting the behaviour is exactly as today -- that will always work with file-to-file
[Qui Jun 22 2006] [09:19:57] <grcobb>	abauer: with the switch in the other position, we keep the fact that the name of the file is used as the uid but....
[Qui Jun 22 2006] [09:20:25] <grcobb>	abauer: we assign a new UID (and hence name), of our own invention, for anything that we receive as an add
[Qui Jun 22 2006] [09:25:24] <abauer>	grcobb: the only constraints about UID for most plugins is that the uids have to be unique for each side. and the plugin must have the possibility to alter a object if its uid is given
[Qui Jun 22 2006] [09:26:35] <abauer>	grcobb: but the file-sync plugin introduces another constraint, since it enforces that the uids have to be the same for different plugins
[Qui Jun 22 2006] [09:27:42] <grcobb>	abauer: I suppose it doesn't really enforce that they MUST be the same -- the sync process would still work -- but user expectation is that file A on one side will appear as file a on the other side
[Qui Jun 22 2006] [09:28:05] <abauer>	grcobb: ok. true. its not a technical reason
[Qui Jun 22 2006] [09:28:32] <grcobb>	abauer: but as the user expectation is reasonable, a config-file switch sounds like a good idea
[Qui Jun 22 2006] [09:29:50] <grcobb>	abauer: maybe the switch should be called something like "preserve-filenames" and should default to off, as file-to-file is probably not a common real synchronisation case (but very common for testing and for new users learning)
[Qui Jun 22 2006] [09:34:40] <boto>	hi
[Qui Jun 22 2006] [09:35:19] <boto>	grcobb: I am not sure we really need a config switch for this. I mean, having to preserve filenames may not be a common case. But, is there an advantage for the user on disabling this option?
[Qui Jun 22 2006] [09:36:05] <boto>	if preserving filenames wouldn't break the cases where the user doesn't care about the UIDs in the file-sync side, this feature could be always enabled
[Qui Jun 22 2006] [09:37:04] <grcobb>	boto: enabling the preserve-filenames option would use the current behaviour: where the filename used for an ADD is set from the sending plug-in's uid
[Qui Jun 22 2006] [09:37:30] <grcobb>	boto: which is what causes exactly the problem reported in 209 when the sending plugin's uid contains /
[Qui Jun 22 2006] [09:37:42] <boto>	grcobb: I just don't see on which cases that disabling it would be useful for anything
[Qui Jun 22 2006] [09:37:44] <boto>	ah, ticket 209
[Qui Jun 22 2006] [09:37:48] <boto>	I will check it  :)
[Qui Jun 22 2006] [09:38:56] <grcobb>	boto: disabling preserve-filenames would cause new behaviour where the uid (and filename) for an ADD is created from scratch, instead of using the sending side's uid
[Qui Jun 22 2006] [09:39:43] <boto>	grcobb: why not just trying to preserve the UID always, but generating a new one only if needed?
[Qui Jun 22 2006] [09:40:00] <boto>	this way, the file-to-file case will still keep the UIDs, and the cases where we can't preserve UIDs would work, also
[Qui Jun 22 2006] [09:40:47] <abauer>	boto: you sent me a patch about the duplication issue, right?
[Qui Jun 22 2006] [09:41:06] <boto>	abauer: I've sent you an URL where the changes are available
[Qui Jun 22 2006] [09:41:14] <abauer>	boto: if we allow generated uids, this would not be an issue any more
[Qui Jun 22 2006] [09:41:18] <boto>	I didn't commit them because you didn't comment on it
[Qui Jun 22 2006] [09:41:55] <boto>	abauer: what is "allow generated uids", exactly? we already allow it, don't we?
[Qui Jun 22 2006] [09:42:29] <abauer>	boto: it would mean that if we add a file to the file-sync plugin, the uid is not used as the filename. instead a new one is generated
[Qui Jun 22 2006] [09:42:56] <boto>	my patch solved the problem where a new UID needs to be generated by the engine because the engine already knows it is not unique. but it won't do anything in the case where a unique UID is sent to file-sync but file-sync isn't able to use it as a filename
[Qui Jun 22 2006] [09:43:22] <boto>	abauer: why not just try to use the UID as filename if possible (then it will work the same way in the file-to-file cases), and generate an UID if needed
[Qui Jun 22 2006] [09:43:42] <boto>	we don't need to always generate a new UID, neither we need to always use exactly the UID that was sent
[Qui Jun 22 2006] [09:43:55] <abauer>	boto: how do we make sure that the uid is usable?
[Qui Jun 22 2006] [09:44:03] <boto>	abauer: just check if it can be a valid filename
[Qui Jun 22 2006] [09:44:34] <boto>	I see the UID sent by opensync to the plugin as a hint. the plugin may try to use it but it can also generate a new one if needed
[Qui Jun 22 2006] [09:47:54] <grcobb>	boto: I suggested that but abauer was concerned that we might not have the full list of what was and was not allowed in a filename
[Qui Jun 22 2006] [09:48:37] <boto>	I think that every plugin needs to know what is a valid filename for it. file-sync is responsible of creating files, it should know what is a valid filename or not
[Qui Jun 22 2006] [09:48:57] Part - 	grcobb left this channel.
[Qui Jun 22 2006] [09:49:04] <boto>	s/valid filename for it/valid UID for it/
[Qui Jun 22 2006] [09:49:17] <Whoopie>	dgollub: any feedback from MrM regarding the gnokii number types?
[Qui Jun 22 2006] [09:49:44] <abauer>	boto: true. but at the moment the file-sync plugin does not know
[Qui Jun 22 2006] [09:50:58] Join - 	grcobb joined this channel (n=irc@adsl-f2s.home.cobb.me.uk).
[Qui Jun 22 2006] [09:51:36] <boto>	abauer: it needs to know. its his task to do this, IMO
[Qui Jun 22 2006] [09:52:04] <abauer>	boto: agreed. so we have to look what valid filenames are
[Qui Jun 22 2006] [09:52:07] <boto>	s/its his/it's its/
[Qui Jun 22 2006] [09:52:20] <grcobb>	sorry, network problem, I dropped at 13:39 -- anything I missed?
[Qui Jun 22 2006] [09:52:20] <boto>	abauer: correct. I think we can start with a conservative set of valid filename characters
[Qui Jun 22 2006] [09:52:41] <abauer>	boto: but this also poses a portability issue. one valid file-name might be invalid somewhere else
[Qui Jun 22 2006] [09:52:48] <abauer>	(like "aux" on windows)
[Qui Jun 22 2006] [09:53:17] <abauer>	boto: ok. so what should we allow?
[Qui Jun 22 2006] [09:53:22] <boto>	abauer: this is a platform-specific problem that needs to be solved somewhere in the system, anyway
[Qui Jun 22 2006] [09:54:25] <boto>	we don't need to allow any and every valid filename in the system. if we don't allow a filename that would be valid, this isn't a big problem. the problem is allowing invalid filenames. so I think we can start with a conservative set of valid characters for filenames. and a blacklist of reserved filenames for the operating system we are running on
[Qui Jun 22 2006] [09:57:11] <boto>	something like valid_chars = "a...zA...Z0..9-_+". I think we can avoid the other symbols and not support them
[Qui Jun 22 2006] [09:58:15] <abauer>	boto: how do we escape invalid characters?
[Qui Jun 22 2006] [09:58:46] <grcobb>	boto: I think you missed out "."
[Qui Jun 22 2006] [09:58:47] <boto>	abauer: we don't need to escape them, necessarily. we just generate a new UID because the UID sent by opensync is invalid
[Qui Jun 22 2006] [09:58:55] <boto>	grcobb: yes, I've missed it  :)
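
Editor's note: the conservative check boto describes above (a whitelist of characters, including the "." grcobb pointed out, plus a blacklist of reserved names) could look roughly like the sketch below. It is Python for illustration only and is not code from file-sync. The function name `is_obviously_valid_filename` and the constant names are hypothetical; the reserved list is the set of Windows device names and is an assumption about which platform blacklist would be used.

```python
import string

# Conservative whitelist from the chat: a-z, A-Z, 0-9, "-", "_", "+", and ".".
VALID_CHARS = frozenset(string.ascii_letters + string.digits + "-_+.")

# Assumed blacklist: Windows device names (the chat only mentions "aux").
RESERVED_NAMES = frozenset(
    ["con", "prn", "aux", "nul"]
    + ["com%d" % i for i in range(1, 10)]
    + ["lpt%d" % i for i in range(1, 10)]
)


def is_obviously_valid_filename(uid: str) -> bool:
    """Return True only for UIDs that are safe to use as a filename.

    Rejecting a name that would actually be valid is acceptable here;
    accepting an invalid one is not.
    """
    if not uid or uid in (".", ".."):
        return False
    if any(ch not in VALID_CHARS for ch in uid):
        return False
    # Reserved device names match case-insensitively, with or without
    # an extension ("AUX" and "aux.txt" are both rejected).
    stem = uid.split(".", 1)[0].lower()
    return stem not in RESERVED_NAMES
```

With this check, a UID such as "12345" passes, while "a/b" (the ticket #209 case) and "aux" are rejected.
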
[Qui Jun 22 2006] [09:59:51] <boto>	the new UID could be very similar to the original UID, but it doesn't necessarily mean escaping it
[Qui Jun 22 2006] [09:59:57] <abauer>	boto: i dont think that would work. think of the file-file case. a user would expect that a file on one side is the same file on the other side
[Qui Jun 22 2006] [10:00:35] <boto>	abauer: in the file-file case, the UIDs sent to the plugin are expected to be valid
[Qui Jun 22 2006] [10:00:54] <boto>	of course, the user can have filenames with weird characters
[Qui Jun 22 2006] [10:01:02] <boto>	in this case, we can either:
[Qui Jun 22 2006] [10:01:09] <boto>	- Really support every valid filename
[Qui Jun 22 2006] [10:01:21] <abauer>	boto: how do we know if we sync file-file or file-kde?
[Qui Jun 22 2006] [10:01:29] <boto>	- Just trust the UID sent by opensync to the plugin, if we have a "preserve_filename" flag set in the file-sync struct
[Qui Jun 22 2006] [10:01:55] <abauer>	boto: wait. now thats a good idea
[Qui Jun 22 2006] [10:02:05] <abauer>	a preserve filename flag.
[Qui Jun 22 2006] [10:02:08] <dgollub>	Whoopie: not yet ...
[Qui Jun 22 2006] [10:02:13] <boto>	i.e. a fileFormat struct sent by file-sync would have something like "you_can_trust_the_uid" set. the fileFormat struct generated by encapsulation wouldn't have this flag set
[Qui Jun 22 2006] [10:02:23] <abauer>	the file-sync plugin always sets it to true
[Qui Jun 22 2006] [10:02:30] <Whoopie>	dgollub: ok
[Qui Jun 22 2006] [10:02:34] <abauer>	the file encapsulator always to false
[Qui Jun 22 2006] [10:02:53] <boto>	abauer: correct
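
Editor's note: a minimal sketch of the flag just agreed on, assuming the file format object simply carries a boolean. It is Python for illustration, not the actual OpenSync struct. The class name `FileFormat` and the field `preserve_filename` follow the wording in the chat; the two constructor helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileFormat:
    """Illustrative stand-in for the fileFormat struct mentioned in the chat."""
    data: bytes
    # True means "you can trust the UID": use it as the filename unchanged.
    preserve_filename: bool = False


def from_file_sync(data: bytes) -> FileFormat:
    # Hypothetical helper: the file-sync plugin always sets the flag to true.
    return FileFormat(data=data, preserve_filename=True)


def from_encapsulator(data: bytes) -> FileFormat:
    # Hypothetical helper: the file encapsulator always sets it to false.
    return FileFormat(data=data, preserve_filename=False)
```
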
[Qui Jun 22 2006] [10:03:33] <boto>	but we can have a more difficult problem in the future, if we have other plugins: what if we have a "ftp" plugin, running on windows, and the server has a file named "aux"?
[Qui Jun 22 2006] [10:03:55] <boto>	independent of the implementation or approach used, this case is difficult to be done right. what would be expected by the user?
[Qui Jun 22 2006] [10:04:26] <abauer>	boto: i would do the same as done by all systems at the moment: show an error :)
[Qui Jun 22 2006] [10:04:34] <boto>	abauer: heh. ok  :)
[Qui Jun 22 2006] [10:04:47] <boto>	we have another option: "ftp" not necessarily should set "preserve_filename"
[Qui Jun 22 2006] [10:05:08] <boto>	the file-sync plugin would try to use the UID as a hint if preserve_filename isn't set (so most files will have the same name)
[Qui Jun 22 2006] [10:05:17] <boto>	but it can generate a new UID if it is not set
[Qui Jun 22 2006] [10:06:08] <boto>	but not always, as it will be more intuitive if we try to use similar UIDs on both sides, even on cases that aren't file-file
[Qui Jun 22 2006] [10:06:13] <grcobb>	boto: if a new UID is going to be created I don't think it should be at all similar to the existing uid -- otherwise name clashes are very likely (many files have similar names)
[Qui Jun 22 2006] [10:06:46] <boto>	grcobb: we generate a filename and check if it exists. it is just a feature to make things more intuitive
[Qui Jun 22 2006] [10:06:55] <abauer>	boto: i dont think we should look at "valid" filenames. we either trust the uid, or we dont
[Qui Jun 22 2006] [10:06:56] <boto>	i.e. if we will generate a filename, we can just try to use the UID
[Qui Jun 22 2006] [10:07:04] <boto>	abauer: I think we can use the uid as a hint
[Qui Jun 22 2006] [10:07:06] <abauer>	if we dont trust it, we always generate a new uid
[Qui Jun 22 2006] [10:07:08] <boto>	using the UID as a hint is useful
[Qui Jun 22 2006] [10:07:19] <abauer>	boto: why?
[Qui Jun 22 2006] [10:07:35] <boto>	for example: if KDE sent us a UID 12345, simply using this as an UID is more intuitive
[Qui Jun 22 2006] [10:07:50] <boto>	abauer: because it is easier for debugging and seeing what is happening after synchronization
[Qui Jun 22 2006] [10:08:25] <abauer>	boto: but the approach might be more error prone
[Qui Jun 22 2006] [10:08:26] <boto>	I would ask: "why not": if the UID sent to file-sync is obviously valid (e.g. having only letters and numbers), and a file with this name doesn't exist, why not just using it as UID?
[Qui Jun 22 2006] [10:08:31] <abauer>	and i dont see a advantage
[Qui Jun 22 2006] [10:08:34] <boto>	abauer: why error prone?
[Qui Jun 22 2006] [10:08:52] <abauer>	because you have to make sure that the check is correct
[Qui Jun 22 2006] [10:09:15] <boto>	abauer: in this case we don't need to support all and every valid filename. just the "obviously right" would be used as a hint
[Qui Jun 22 2006] [10:09:30] <abauer>	boto: i still dont see the advantage
[Qui Jun 22 2006] [10:10:11] <abauer>	boto: if its only for debugging: the developer can use the utilities provided with opensync to dump the mapping table and get the uid of kde
[Qui Jun 22 2006] [10:10:27] <boto>	abauer: one advantage is that we can make file <-> some-remote-file-plugin synchronization work even if filenames in the remote side aren't all supported by the local filesystem
[Qui Jun 22 2006] [10:11:26] <boto>	abauer: I think the advantages are small. but the code for doing this (using the UID hint if it is obviously a valid filename), that I don't see why not doing it
[Qui Jun 22 2006] [10:11:54] <boto>	"the code for doing this is simple", I mean
[Qui Jun 22 2006] [10:12:51] <boto>	I think the knotes example could be a good one where using the UID as a hint could be nice for the user: the user created many notes, with many different titles
[Qui Jun 22 2006] [10:13:04] <boto>	having filenames similar to the note titles in the other side will be a nice feature. and a cheap one
[Qui Jun 22 2006] [10:13:21] <abauer>	boto: well. ok. convinced :)
[Qui Jun 22 2006] [10:13:40] <boto>	:)
[Qui Jun 22 2006] [10:13:56] <abauer>	grcobb: do you also agree?
[Qui Jun 22 2006] [10:14:21] <boto>	abauer: anyway, feel free not to implement this use-uid-as-hint feature in the first version. it can be added later to the code
[Qui Jun 22 2006] [10:14:35] <grcobb>	abauer: yes, I think the solution you guys have worked out is good
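
Editor's note: the solution agreed on in this log can be sketched as a single decision function. This is a hedged Python illustration, not the file-sync implementation. The name `choose_filename`, the use of a random hex string for generated names, and the exact character set are assumptions; a reserved-name blacklist for the host platform, as discussed in the log, is left out for brevity.

```python
import os
import re
import uuid

# "Obviously valid" UIDs only: letters, digits, "_", "+", "." and "-".
_SAFE_UID = re.compile(r"[A-Za-z0-9_+.-]+")


def choose_filename(directory: str, uid: str, preserve_filename: bool) -> str:
    """Pick the filename to use for an ADD received by the file plugin."""
    # 1. Flag set (file-to-file case): trust the UID as it is.
    if preserve_filename:
        return uid

    # 2. Flag not set: treat the UID as a hint. Use it only if it is
    #    obviously a valid filename and no such file exists yet.
    if (
        _SAFE_UID.fullmatch(uid)
        and uid not in (".", "..")
        and not os.path.exists(os.path.join(directory, uid))
    ):
        return uid

    # 3. Otherwise generate a new name and make sure it is unused.
    while True:
        candidate = uuid.uuid4().hex
        if not os.path.exists(os.path.join(directory, candidate)):
            return candidate
```

Under these assumptions a UID like "12345" from another plugin is kept as the filename, while a UID containing "/" (ticket #209) gets a generated name instead.
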