wiki:trunk/features/queue

This page is based on work done on rewriting the timeout handling in opensync_queue.c in early 2009. The more time has passed since then, the less likely it is that this page is still accurate.

OSyncClient

Before we dive into the OSyncQueue implementation, it is worth reminding ourselves how OSyncClient uses the queues.

The client uses two completely separate IPC queues, which together involve four OSyncQueue structures.

  1. The command queue. This is an outgoing OSyncQueue on the engine side, connected to an incoming OSyncQueue on the plugin side and used to send commands to the plugin. In the description below these are sometimes referred to as the "outgoing command" queue and the "incoming command" queue, respectively. However, it is important to note that although OSyncQueue has the concept of an outgoing or an incoming queue, there is no indication of whether a queue is a command queue or a reply queue except whether it is currently handling a command or a reply.
  2. The reply queue. This is an outgoing OSyncQueue on the plugin side, connected to an incoming OSyncQueue on the engine side and used to send replies back to the engine. In the description below these are sometimes referred to as the "outgoing reply" queue and the "incoming reply" queue, respectively.
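
The wiring described above can be sketched as follows. This is only an illustration of how the four structures pair up: every name in it (SketchQueue, SketchChannels, sketch_connect and so on) is hypothetical and none of it is taken from opensync_queue.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout: this models the wiring, not the real API. */
typedef enum { SKETCH_SENDER, SKETCH_RECEIVER } SketchQueueType;

typedef struct SketchQueue {
    SketchQueueType type;        /* outgoing (sender) or incoming (receiver) */
    struct SketchQueue *partner; /* the queue at the other end of the fifo */
} SketchQueue;

typedef struct {
    SketchQueue engine_cmd_out;   /* "outgoing command" queue, engine side */
    SketchQueue plugin_cmd_in;    /* "incoming command" queue, plugin side */
    SketchQueue plugin_reply_out; /* "outgoing reply" queue, plugin side */
    SketchQueue engine_reply_in;  /* "incoming reply" queue, engine side */
} SketchChannels;

/* Pair one sender with one receiver, as one fifo would. */
static void sketch_connect(SketchQueue *sender, SketchQueue *receiver)
{
    assert(sender != NULL && receiver != NULL);
    sender->type = SKETCH_SENDER;
    receiver->type = SKETCH_RECEIVER;
    sender->partner = receiver;
    receiver->partner = sender;
}

/* Two independent IPC queues, four queue structures in total. */
static void sketch_channels_init(SketchChannels *ch)
{
    sketch_connect(&ch->engine_cmd_out, &ch->plugin_cmd_in);
    sketch_connect(&ch->plugin_reply_out, &ch->engine_reply_in);
}
```

Note that nothing in SketchQueue records "command" or "reply": as in the real structure, that role only shows in how the client uses each queue.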

OSyncQueue

Every OSyncQueue has the same structure (whether incoming or outgoing, command or reply):

There are four GSources and two GAsyncQueues (called "asyncs" below, to avoid confusion with the OSyncQueue). The communication channel to the partner OSyncQueue consists of a single fifo, either inbound or outbound depending on the direction of the queue. There is also an OSyncList of pending commands waiting for replies.

Incoming async

This GAsyncQueue holds messages (OSyncMessage) waiting to be delivered to the client of the queue.

Outgoing async

This GAsyncQueue holds messages waiting to be sent to the partner.

Read source

This corresponds to the functions in opensync_queue.c called _source_something.

This GSource represents the data received from the partner over the inbound fifo. The job of this source is to receive data, package it up into OSyncMessage objects and put them on the incoming async.

If an error or hangup occurs on the fifo, the read source will generate an error message and put it on the incoming async.

Incoming source

This corresponds to the functions in opensync_queue.c called _incoming_something.

This GSource represents the incoming async. It takes messages from the async and then passes them on:

  • For commands, call the message handler callback provided by the client.
  • For replies, find the corresponding command on the pending list and call the callback provided by the sender.
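
A minimal sketch of that dispatch decision, assuming a fixed-size pending table and integer message ids. All names are hypothetical, and retiring the pending entry once its reply arrives is an assumption of the sketch rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PENDING 8

typedef struct { int id; int is_reply; } SketchMessage;
typedef void (*SketchCallback)(const SketchMessage *msg, void *user_data);

typedef struct {
    int id;                  /* id of the command awaiting a reply */
    SketchCallback callback; /* callback supplied by the sender */
    void *user_data;
    int in_use;
} SketchPending;

typedef struct {
    SketchCallback message_handler; /* handler supplied by the client */
    void *handler_data;
    SketchPending pending[SKETCH_MAX_PENDING];
} SketchQueue;

/* Example callback: counts how often it was invoked. */
static void sketch_count_call(const SketchMessage *msg, void *user_data)
{
    (void)msg;
    ++*(int *)user_data;
}

/* Returns 1 if the message was delivered, 0 if nobody could take it. */
static int sketch_incoming_dispatch(SketchQueue *q, const SketchMessage *msg)
{
    assert(q != NULL && msg != NULL);
    if (!msg->is_reply) {
        /* Commands go to the client's message handler. */
        if (q->message_handler == NULL)
            return 0;
        q->message_handler(msg, q->handler_data);
        return 1;
    }
    /* Replies go to the callback of the matching pending command. */
    for (size_t i = 0; i < SKETCH_MAX_PENDING; i++) {
        SketchPending *p = &q->pending[i];
        if (p->in_use && p->id == msg->id) {
            p->in_use = 0; /* assumed: the entry is retired once answered */
            p->callback(msg, p->user_data);
            return 1;
        }
    }
    return 0;
}
```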

Write source

This corresponds to the functions in opensync_queue.c called _queue_something.

When the client sends a message, it is put on the outgoing async. The write source represents the outgoing async. It takes messages from the async and sends them over the fifo to the partner.

Timeout source

This corresponds to the functions in opensync_queue.c called _timeout_something.

The main purpose of this source is to represent entries on the pending list whose timeout has expired. If a pending entry times out, this source calls the message sender's callback routine with an error response. Note that not all entries on the pending list have timeouts associated with them (in fact, timeout entries are only present on the pending list associated with incoming command queues, as explained below).

In addition to the timeouts on the pending list, a timeout is run to protect the fifo. This is a queue-wide timeout which checks that if there are outstanding commands (with timeouts or not), replies are still being received.
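
The per-entry part of this can be sketched as a scan over the pending list. The sketch assumes an array of entries and absolute deadlines in seconds; the names are hypothetical and the real code works on an OSyncList driven by a GSource.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*SketchTimeoutCallback)(int command_id, void *user_data);

typedef struct {
    int command_id;
    int has_timeout; /* not every pending entry is timed */
    long deadline;   /* absolute time in seconds, valid if has_timeout */
    int in_use;
} SketchPending;

/* Example callback: remembers the id of the last command that timed out. */
static void sketch_record_last(int command_id, void *user_data)
{
    *(int *)user_data = command_id;
}

/* Expire every timed entry whose deadline has passed.
 * Returns the number of entries that timed out. */
static size_t sketch_expire_pending(SketchPending *pending, size_t count,
                                    long now, SketchTimeoutCallback on_timeout,
                                    void *user_data)
{
    size_t expired = 0;
    assert(pending != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        SketchPending *p = &pending[i];
        if (!p->in_use || !p->has_timeout || now < p->deadline)
            continue;
        p->in_use = 0; /* drop the entry, then report the error */
        if (on_timeout != NULL)
            on_timeout(p->command_id, user_data);
        expired++;
    }
    return expired;
}
```

Untimed entries are skipped by the scan, so they can only be rescued by the queue-wide timeout.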

Types of queues

All the operations described above are common to all types of queue. However, not every type of queue actually uses all of them, because the queue types are used in different ways.

In principle, a queue could handle any mixture of commands and responses but, in practice, OSyncClient doesn't use them in that way: each queue either handles commands or responses.

Most of the code can also handle a mixture of incoming and outgoing messages but, in fact, each queue can only handle one fifo so is either an incoming or an outgoing queue. However, even an outgoing queue sometimes handles incoming messages: messages generated internally to notify the client of errors with the fifo.

The sections below describe the operation of the different types of queues.


Sending queue

This is a queue of type OSYNC_QUEUE_SENDER, also known as an outgoing queue. It is used to send messages, which may be commands or responses, to the other end.

Incoming

The incoming side is easy to deal with: the only messages received are internally generated errors reporting problems with the fifo. So, the read source does nothing, except detect errors. The incoming source then processes these error messages, which will always be sent to the client's handler callback as there is no pending list (see below).

Timeout

The timeout source also does nothing as there is no pending queue and no queue timer running (as nothing is ever received).

Outgoing

The outgoing side does, obviously, do some work. The client calls the send function which puts the message on the outgoing async, as described above.

Outgoing command

If the message being sent is a command, the caller specifies the queue on which a reply is later expected (this is the queue called the "incoming reply" queue in the description of the client, above). The pending message is then put on the pending list of the reply queue (not the pending list of this queue, which is never actually used). However, no timeout is associated with this pending entry.

The reason for there being no timeout is that the timeout refers to the length of time this command may take to execute. But there may be a large number of commands queued up ahead of this one so only the receiver can time the operation.

In order to protect against the fifo itself stalling (e.g. the receiver stops running), the queue timer of the reply queue is started. That timer makes sure that at least one response is received within a timeout period.
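
The bookkeeping for an outgoing command might look roughly like this. The sketch uses hypothetical types and integer command ids, with plain arrays standing in for the outgoing async and the pending list.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ENTRIES 16

typedef struct {
    int pending_ids[SKETCH_MAX_ENTRIES]; /* commands awaiting a reply, untimed */
    size_t pending_count;
    int queue_timer_running;             /* the queue-wide fifo watchdog */
} SketchReplyQueue;

typedef struct {
    int outgoing_ids[SKETCH_MAX_ENTRIES]; /* stands in for the outgoing async */
    size_t outgoing_count;
} SketchCommandQueue;

/* Returns 0 on success, -1 if either list is full. */
static int sketch_send_command(SketchCommandQueue *cmd, SketchReplyQueue *reply,
                               int command_id)
{
    assert(cmd != NULL && reply != NULL);
    if (cmd->outgoing_count == SKETCH_MAX_ENTRIES ||
        reply->pending_count == SKETCH_MAX_ENTRIES)
        return -1;
    /* The pending entry lives on the reply queue and carries no timeout. */
    reply->pending_ids[reply->pending_count++] = command_id;
    /* Start the reply queue's timer to guard against a stalled fifo. */
    reply->queue_timer_running = 1;
    /* Hand the message to the write side of the command queue. */
    cmd->outgoing_ids[cmd->outgoing_count++] = command_id;
    return 0;
}
```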

Outgoing reply

If the message being sent is a reply then this is an "outgoing reply" queue. In addition to sending the reply, the send routine searches the associated "incoming command" queue pending list to remove the corresponding command from the pending list (and stop the timer).

The association between the "incoming command" queue and the "outgoing reply" queue is set up by the client using the osync_queue_cross_link function. It can only be done after both queues are set up.
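
A minimal sketch of the cross-link and of what sending a reply does to the linked pending list. The struct layout and function names are hypothetical and do not reproduce the signature of the real cross-link function.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PENDING 16

typedef struct SketchQueue {
    int pending_ids[SKETCH_MAX_PENDING]; /* timed commands being executed */
    size_t pending_count;
    struct SketchQueue *cross_link;      /* hypothetical link to the paired queue */
    int last_sent_id;                    /* stands in for writing to the fifo */
} SketchQueue;

/* Link an "incoming command" queue with its "outgoing reply" queue. */
static void sketch_cross_link(SketchQueue *cmd_in, SketchQueue *reply_out)
{
    assert(cmd_in != NULL && reply_out != NULL);
    cmd_in->cross_link = reply_out;
    reply_out->cross_link = cmd_in;
}

/* Send a reply and retire the matching command on the linked queue.
 * Returns 1 if a pending command was removed, 0 otherwise. */
static int sketch_send_reply(SketchQueue *reply_out, int command_id)
{
    SketchQueue *cmd_in;
    assert(reply_out != NULL);
    cmd_in = reply_out->cross_link;
    reply_out->last_sent_id = command_id;
    if (cmd_in == NULL)
        return 0;
    for (size_t i = 0; i < cmd_in->pending_count; i++) {
        if (cmd_in->pending_ids[i] != command_id)
            continue;
        /* Close the gap so the remaining entries keep their order. */
        for (size_t j = i + 1; j < cmd_in->pending_count; j++)
            cmd_in->pending_ids[j - 1] = cmd_in->pending_ids[j];
        cmd_in->pending_count--;
        return 1; /* removing the entry also ends its timeout */
    }
    return 0;
}
```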


Receiving queue

This is a queue of type OSYNC_QUEUE_RECEIVER, also known as an incoming queue. It is used to receive messages, which may be commands or responses, from the other end.

Outgoing

The outgoing side is not used.

Incoming

The read source works as described above. However, the incoming source processing is a little more complex.

First, the incoming source is not even processed if there are more than some number of commands on the pending queue. This is so that timeout processing is dealing with a bounded queue length. While the pending queue is full, messages just sit in the incoming async (and are not being timed). All messages on the pending queue have timeouts, so if the queue is full there must be timeouts protecting us.
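
The gating can be illustrated with two counters. The sketch is hypothetical: it assumes a limit of 5 (the value quoted for OSYNC_QUEUE_PENDING_LIMIT later on this page) and treats every incoming message as a timed command.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PENDING_LIMIT 5 /* assumed to mirror OSYNC_QUEUE_PENDING_LIMIT */

typedef struct {
    size_t waiting; /* messages sitting in the incoming async, untimed */
    size_t pending; /* commands on the pending list, each one timed */
} SketchReceiver;

/* Move messages from the async to the pending list until the limit is hit.
 * Returns the number of messages dispatched in this pass. */
static size_t sketch_dispatch_ready(SketchReceiver *r)
{
    size_t dispatched = 0;
    assert(r != NULL);
    while (r->waiting > 0 && r->pending < SKETCH_PENDING_LIMIT) {
        r->waiting--;
        r->pending++;
        dispatched++;
    }
    return dispatched;
}
```

With eight messages waiting, the first pass dispatches five and leaves three in the async until replies (or timeouts) free up pending entries.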

Incoming command

If the incoming message is a command and it has a timeout, it is put on the pending list for this queue. It would be nice to put it on the pending list for the outgoing reply queue but that queue may not even exist yet (if this is an initialise command). So, we use the pending list for this incoming command queue.

Then the client handler is called to handle the message.

Incoming reply

If the incoming message is a reply then this is an incoming reply queue. The corresponding message is found on the pending list and the message callback is called to process the reply.

Timeout

The timeout code handles two sorts of timeouts (which are used on different sorts of queues).

Command timeouts

These occur on "incoming command" queues. The timeout code searches the pending list looking for commands which have timed out. If it finds any, it generates error responses for them and sends them on the corresponding "outgoing reply" queue (set up by cross-linking). If the cross-link has not happened yet, an error is reported to the client so it will disconnect the queue and the other end should notice (when its queue timeout expires even if it doesn't notice the actual disconnect).
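
The choice between the two outcomes can be sketched as below. Names and counters are hypothetical; the counters only stand in for "send an error response" and "report an error to the client".

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    SKETCH_ERROR_REPLY_SENT, /* error response sent on the linked reply queue */
    SKETCH_ERROR_TO_CLIENT   /* no cross-link yet: the client is told instead */
} SketchTimeoutAction;

typedef struct SketchQueue {
    struct SketchQueue *cross_link; /* "outgoing reply" queue, once linked */
    int error_replies_sent;
    int client_errors;
} SketchQueue;

/* Decide what to do with one command that has timed out. */
static SketchTimeoutAction sketch_command_timed_out(SketchQueue *cmd_in)
{
    assert(cmd_in != NULL);
    if (cmd_in->cross_link != NULL) {
        cmd_in->cross_link->error_replies_sent++;
        return SKETCH_ERROR_REPLY_SENT;
    }
    /* The client is expected to disconnect the queue in response. */
    cmd_in->client_errors++;
    return SKETCH_ERROR_TO_CLIENT;
}
```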

Queue timeout

This occurs on "incoming reply" queues. As long as there are entries on the pending queue (which are not individually timed, because timing only happens on incoming command queues), there is a timeout set to check that we see some response from the other end within a certain time. It is reset in the incoming source whenever a message is received. If it times out, this is treated as though the fifo had seen an error.
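
A minimal sketch of that watchdog, assuming times in whole seconds and hypothetical names. It only models when the timeout would fire, not the GSource mechanics.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t pending;     /* outstanding commands, none individually timed */
    long timeout;       /* longest silence tolerated, in seconds */
    long last_activity; /* time the last message arrived or the timer started */
} SketchReplyQueue;

/* Any received message resets the queue timeout. */
static void sketch_message_received(SketchReplyQueue *q, long now)
{
    assert(q != NULL);
    q->last_activity = now;
}

/* Returns 1 if the queue timeout has fired, which is then handled
 * in the same way as an error on the fifo. */
static int sketch_queue_timed_out(const SketchReplyQueue *q, long now)
{
    assert(q != NULL);
    if (q->pending == 0)
        return 0; /* the timer only matters while commands are outstanding */
    return now - q->last_activity >= q->timeout;
}
```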


Timeout considerations

Setting timeout values

  • The timeout starts when the plugin routine is called and stops when the response is provided back.
  • If plugin routines are synchronous (possibly including blocking on network connections), the timeout just needs to take into account how long the operation could take.
  • Note that even if the plugin is synchronous, a timeout will occur, and an error response will be sent, if the plugin blocks for too long (e.g. if the timeout is set at 10 seconds but the device takes 20 seconds to respond).
  • If the plugin routines are asynchronous and return instead of blocking, the timeout needs to take into account that several transactions may be started. For example, maybe 4 updates are started: if the critical path is how long the remote device takes to process an update (say 1 second), then the last update will not get a response until all the previous ones have been handled and will see a delay of 4 seconds (in this example). The maximum number of transactions that can be started at the same time is currently 5 (OSYNC_QUEUE_PENDING_LIMIT in opensync_queue_internals.h).
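
The arithmetic in the asynchronous case can be written down as a small helper. It assumes, as the example above does, that the remote device handles the started transactions one after another; the function name is hypothetical and the limit of 5 is the value quoted above.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_PENDING_LIMIT 5u /* value quoted above for OSYNC_QUEUE_PENDING_LIMIT */

/* Worst-case wait, in seconds, for the last of `started` transactions
 * when the device processes them serially. */
static unsigned sketch_worst_case_delay(unsigned started, unsigned seconds_per_op)
{
    /* Guard against overflow in the multiplication below. */
    assert(seconds_per_op <= UINT_MAX / SKETCH_PENDING_LIMIT);
    /* No more than the pending limit can be in flight at once. */
    if (started > SKETCH_PENDING_LIMIT)
        started = SKETCH_PENDING_LIMIT;
    return started * seconds_per_op;
}
```

For the example above, sketch_worst_case_delay(4, 1) gives 4 seconds, so a timeout chosen for a single 1 second update would be too short for the last of the four.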
