Clustering Routing
Within the Cluster, instances of the Clustering_Component must exchange a good deal of meta-information in order to keep a consistent view of the Cluster. Irrespective of which node receives a given message, the Cluster must pass that message to the appropriate node for delivery to the relevant session manager, or to the appropriate s2s instance.
To make this work, a reliable routing protocol needs to be established. A defined protocol will also encourage other Jabber server implementors to create compatible cluster implementations, so that a cluster can be made up of diverse code bases.
Desired features of a Routing Protocol
* Fast convergence in announcements
When a JID connects to a given node, this information must be quickly propagated to every other node in the cluster, so that messages for that JID are delivered appropriately, rather than being saved to offline storage. The same applies to connections to external domains: the dreaded /. (Slashdot) effect may result in multiple machines within the cluster attempting connections to the same remote domain, which may take offense at being hit from so many points.
* Fast convergence in withdrawals
When a JID disconnects from a given node, this information needs to travel reasonably quickly to the other nodes, but it is not a problem if it is delayed; offline storage will take care of it. However, when a connection to a given domain is lost, valuable time is then lost in re-establishing the connection, which may result in the original sender retrying.
* Fast detection of 'dead' peers
This is an extension of the convergence topic. When remote peers are confirmed dead, this information should be very quickly distributed around the individual machines within the mesh, so that messages aren't sent to nodes which aren't connected.
* Loose structure of mesh layout and density
There should be no limitations on the logical layout of the cluster. Some clusters may be small enough to have a fully-interconnected mesh. Others may have a dedicated set of central routing machines handling the inter-cluster grunt work in a semi-star configuration. Still others may use a token-ring-like configuration. The routing protocol should not restrict the design of the cluster mesh, as we simply do not know the intention of each site installation.
* Low overhead
The routing protocol itself should not cause excessive load on the network infrastructure or the local machine.
* Elimination of Black Holes
Although, strictly speaking, not part of the routing protocol itself, the clustering component should avoid sending packets to known black holes, or other nodes or destinations which aren't there.
* Separation of Routers and Routing
The job of deciding where packets should be sent should be separable from the job of handing packets on to where they should go.
* Multiple Routes; choose 'best'
When more than one route to a destination exists, the protocol should select the best path, and JID priorities should be observed.
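As a rough illustration of the last point, here is a minimal Python sketch of best-route selection. Nothing in it comes from a specification: the Route record, its fields, and the tie-breaking order (highest JID priority first, then lowest link metric, then node name) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Route:
    """Hypothetical routing-table entry; field names are illustrative only."""
    jid: str       # full JID reachable via this route
    node: str      # cluster node that holds the session
    priority: int  # JID priority of the session (higher is preferred)
    metric: int    # accumulated link cost to reach `node` (lower is preferred)


def best_route(routes: Iterable[Route]) -> Optional[Route]:
    """Pick the preferred route, or None when no route is known.

    Assumed ordering: highest priority wins, ties go to the lowest metric,
    and the node name is a final deterministic tie-breaker.
    """
    candidates = list(routes)
    if not candidates:
        return None
    return min(candidates, key=lambda r: (-r.priority, r.metric, r.node))
```

Whether priority should outrank the link metric, or the other way round, is a design decision the eventual protocol would have to make; the sketch simply picks one ordering.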
Basis for the Cluster Routing Protocol
Certain design features of both BGP and OSPF will be used, in particular the hop-by-hop propagation and loop detection, and the link metric methods.
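To make the borrowed ideas concrete, the following Python sketch shows one way hop-by-hop propagation with loop detection (in the style of a BGP path attribute) and an additive link metric (in the style of OSPF link costs) could fit together. The Announcement record and the receive function are hypothetical names invented for this example, not part of any existing implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Announcement:
    """Hypothetical route announcement passed from node to node."""
    destination: str       # JID or remote domain being announced
    path: Tuple[str, ...]  # node names already traversed, origin first
    metric: int            # sum of the link costs along `path`


def receive(local_node: str, ann: Announcement, link_cost: int) -> Optional[Announcement]:
    """Handle an announcement arriving at `local_node` over a link of `link_cost`.

    Returns the announcement to pass on to further peers, or None when the
    local node is already in the path, which means the announcement has looped.
    """
    if local_node in ann.path:
        return None  # loop detected: drop rather than re-propagate
    return Announcement(
        destination=ann.destination,
        path=ann.path + (local_node,),
        metric=ann.metric + link_cost,
    )
```

For example, an announcement originated by n1 and received by n2 over a link of cost 5 would carry the path (n1, n2) and metric 5 when passed on; if it later arrived back at n1, it would be dropped.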
Overkill?
This may be overkill for many installations, where a simple full mesh would do the job. However, an additional intent is to make a Jabber installation extremely scalable, and that will require the more advanced features.
Formal JEP
No formal JEP has been started as yet. (20050425)