Introduction
The OASIS WS-RX Technical Committee recently released the Web
Services Reliable Messaging 1.1 specification for public review. As one
of the two co-chairs of the committee, I thought this was a good
time to provide an introduction to WSRM and an overview of the
specification. This article introduces the specification and discusses
how it might be used in real systems. It
is based on the WSRM 1.1 Committee Draft 4, which is available for
public review.
Web Services Reliable Messaging (WSRM) is a specification that allows
two systems to send messages between each other reliably. The aim of
this is to ensure that messages are transferred properly from the sender
to the receiver. Reliable Messaging is a complex thing to define, but
you can think about WSRM as providing a similar level of guarantee for
XML messaging that a JMS system provides in the Java world. There is one
key difference though - JMS is a standard API or programming model,
with lots of different implementations and wire-protocols underneath it.
WSRM is the opposite - a standard wire-protocol with no API or
programming model of its own. Instead it composes with existing
SOAP-based systems. Later in the article I will address the exact
meaning of reliability and what sort of guarantees the specification
offers.
Agents
Before I explain the wire protocol, I'd like to explain the way it
fits into an existing SOAP interaction. Unlike a queue-based system,
WSRM is almost transparent to the existing applications. In a
queue-based system, there is an explicit third party (the queue) where
the sender must put messages and from which the receiver must get
them. In RM, there are handlers or agents that sit inside the
client's and server's SOAP processing engines and transfer messages,
handle retries and do delivery. These agents aren't necessarily visible
at the application level; they simply ensure that the messages get
re-transmitted if lost or undelivered. So if, for example, you have set
up a SOAP/JMS system to do reliable SOAP messaging, you will have had to
define queues, and change the URLs and endpoints of the service to use
those queues. In WSRM that isn't necessary, because it fits into the
existing HTTP (or other) naming scheme and URLs.
In WSRM there are logically two of these agents - the RM Source (RMS) and the
RM Destination (RMD). They may be implemented by one or more handlers in any given SOAP stack.
The RM Source:
- Requests creation and termination of the reliability contract
- Adds reliability headers into messages
- Resends messages if necessary
The RM Destination:
- Responds to requests to create and terminate a reliability contract
- Accepts and acknowledges messages
- (Optionally) drops duplicate messages
- (Optionally) holds back out-of-order messages until missing messages arrive
It is important not to confuse the Source and Destination with the
"service client/requester" and "service server/provider". In a two-way
reliable scenario (where both requests and responses are delivered
reliably) there will be an RMS
and an RMD in the client, and the same in the server.
Wire Protocol
The main concept in WSRM is that of a Sequence. A
sequence can be thought of as the "reliability contract" under which the
RMS and RMD agree to reliably transfer messages from the sender to the
receiver. Each sequence has a lifetime, which could range from being
very short (I create a sequence, deliver a few messages, and terminate)
to very long. In fact the default maximum number of messages in a
sequence is 2^63, which is equivalent to sending 1000 messages a second
for the next 292 million years!
A Sequence is created using a CreateSequence interaction, and terminated when finished with a
TerminateSequence interaction.
Example of a CreateSequence message:
<soap:Body>
  <wsrm:CreateSequence>
    <wsrm:AcksTo>
      <wsa:Address>http://Business456.com/serviceA/789</wsa:Address>
    </wsrm:AcksTo>
  </wsrm:CreateSequence>
</soap:Body>
Each message in a sequence has a message number, which starts at one and increments by one for each message.
Example of a Sequence Header and message number:
<soap:Header>
  <wsrm:Sequence>
    <wsrm:Identifier>http://Business456.com/RM/ABC</wsrm:Identifier>
    <wsrm:MessageNumber>1</wsrm:MessageNumber>
  </wsrm:Sequence>
</soap:Header>
The message number is used to acknowledge the message in a
SequenceAcknowledgement header.
Example of a SequenceAcknowledgement Header:
<soap:Header>
  <wsrm:SequenceAcknowledgement>
    <wsrm:Identifier>http://Business456.com/RM/ABC</wsrm:Identifier>
    <wsrm:AcknowledgementRange Lower="1" Upper="1" />
    <wsrm:AcknowledgementRange Lower="3" Upper="3" />
  </wsrm:SequenceAcknowledgement>
</soap:Header>
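To see how the ranges in such a header relate to the messages actually received, here is a minimal sketch in Python. It is not taken from any WSRM implementation: the function names `ack_ranges` and `missing` are hypothetical, and the ranges are modelled as plain (lower, upper) tuples rather than XML.

```python
from typing import Iterable, List, Tuple


def ack_ranges(received: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse received message numbers into sorted (lower, upper) ranges."""
    ranges: List[Tuple[int, int]] = []
    for n in sorted(set(received)):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)   # extend the current range
        else:
            ranges.append((n, n))             # a gap: start a new range
    return ranges


def missing(ranges: List[Tuple[int, int]], highest_sent: int) -> List[int]:
    """Message numbers in 1..highest_sent that no range covers."""
    covered = {n for lower, upper in ranges for n in range(lower, upper + 1)}
    return [n for n in range(1, highest_sent + 1) if n not in covered]


print(ack_ranges([1, 3]))              # [(1, 1), (3, 3)]
print(missing(ack_ranges([1, 3]), 3))  # [2]
```

With messages 1 and 3 received, the two ranges match the header in the example, and the sender can work out that message 2 still needs to be resent.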
Example One-Way Scenario
Let's walk through a simple example. For simplicity we will add
reliability to a one-way interaction so in this case there is just an
RMS in the client and just an RMD in the server. After this I'll talk
through some of the options.
- The client wants to send an application message, so the RMS
first sends a CreateSequence message to the same URL that the application
messages go to.
- The RMD intercepts the message and responds with a CreateSequenceResponse. This includes the all-important SequenceID, which is the identifier by which this sequence will be known.
- The RMS now adds a Sequence header into the original application message. This has the SequenceID and the message number (in this case it will be 1).
- The RMS continues to add incrementing Sequence headers into application messages.
- The RMD delivers these messages to the server application,
maintaining any guarantees that it offers, such as exactly-once and
in-order.
- According to its timing policy, at some point the RMD will send SequenceAcknowledgements back to the RMS. When an RMS creates a sequence, it passes an address for acknowledgements (the AcksTo
address) to the RMD. In this particular scenario, we will assume that
the AcksTo address is the WS-A anonymous URI - which implies you use the
transport backchannel. In this case the RMD will send the
acknowledgement on the HTTP response channel. Because this is a one-way
interaction, there is no SOAP envelope flowing back to the client, so
the RMD will create an empty SOAP envelope, add the header, and return
it on the HTTP response. The RMS will pick this up before it gets to the
client application.
Note that the acknowledgement isn't just for one message, it acknowledges all the messages successfully received by the RMD.
- If there are any missing messages, the RMS will resend those.
- Once the RMS has had all the messages that it has sent acknowledged, it can terminate the sequence. To do this it sends a TerminateSequence message to the RMD.
- The RMD responds to the RMS with a TerminateSequenceResponse.
- That's all folks!
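The core of that walkthrough (number the messages, send, read the acknowledgement, resend what is missing) can be sketched as a small simulation. This is an illustration only, not how any particular WSRM stack works: the `Destination` class, the `send_reliably` function and the random-loss "network" are all assumptions, and the CreateSequence and TerminateSequence exchanges are left out.

```python
import random
from typing import Dict, List, Set


class Destination:
    """Toy RMD: records message numbers and reports what it has received."""

    def __init__(self) -> None:
        self.received: Set[int] = set()
        self.delivered: List[str] = []

    def accept(self, number: int, body: str) -> None:
        if number not in self.received:       # drop duplicates
            self.received.add(number)
            self.delivered.append(body)

    def acknowledged(self) -> Set[int]:
        """Stand-in for a SequenceAcknowledgement: all numbers seen so far."""
        return set(self.received)


def send_reliably(bodies: List[str], rmd: Destination,
                  loss_rate: float, rng: random.Random) -> int:
    """Toy RMS: number messages from 1 and resend until all are acknowledged.

    Returns the total number of transmissions, including resends.
    """
    pending: Dict[int, str] = {i + 1: b for i, b in enumerate(bodies)}
    transmissions = 0
    while pending:
        for number, body in list(pending.items()):
            transmissions += 1
            if rng.random() >= loss_rate:     # message survived the network
                rmd.accept(number, body)
        for number in rmd.acknowledged():     # process the acknowledgement
            pending.pop(number, None)
    return transmissions
```

Calling `send_reliably(["a", "b", "c"], Destination(), 0.5, random.Random(7))` keeps resending until the destination holds all three messages; the return value shows how many extra transmissions the losses cost.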
Actually, spelt out in that level of detail it seems like quite a
lot, but if we recap, there were two extra service calls (Create and
Terminate), and then a few extra headers floating around. I don't think
that is unnecessary overhead. At one point an early draft of the spec
had an inline or implicit CreateSequence. Unfortunately, that left the
first message in doubt. The current design means that once you have
successfully created a sequence, you have a "contract" with the other
end to deliver messages. In most implementations, if no
TerminateSequence is sent the sequence will be timed out automatically.
And of course, you do get extra message flows if messages are lost, as
in that case they will have to be resent.
So what could have gone differently? In other words, what options are there?
Well firstly, the acknowledgements don't have to use the backchannel.
The RMS can open up its own HTTP port (or other endpoint) to receive
acknowledgements on. This is specified in the AcksTo address. If the
AcksTo address is the same as the WS-Addressing ReplyTo address, the RMD
may piggyback acknowledgements in response messages flowing back to the client in some circumstances.
Secondly, the RMD doesn't have to acknowledge the messages it has
received. Instead, if it is missing just one message in a million, it
can Nack just the missing message. This is like a
prompt to the RMS saying, "I'd really like this missing message." Thirdly,
the RMS could have requested an acknowledgement. Suppose the RMD is set
to acknowledge only rarely (minimizing extra bandwidth), but the RMS
wants to clean up its store of messages. It can then ask for an
acknowledgement by adding an
AckRequested header. The RMD will respond immediately with a SequenceAcknowledgement.
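From the RMS side, these two kinds of feedback lead to different bookkeeping. The sketch below assumes the RMS keeps unacknowledged messages in a dictionary keyed by message number; the function names `prune_on_ack` and `resend_on_nack` are hypothetical and not part of the specification.

```python
from typing import Dict, List, Tuple


def prune_on_ack(store: Dict[int, str], ranges: List[Tuple[int, int]]) -> None:
    """Drop every stored message whose number falls inside an acknowledged range."""
    for lower, upper in ranges:
        for number in range(lower, upper + 1):
            store.pop(number, None)


def resend_on_nack(store: Dict[int, str], nacked: List[int]) -> List[Tuple[int, str]]:
    """Return the (number, body) pairs the RMD explicitly asked for."""
    return [(n, store[n]) for n in nacked if n in store]
```

An acknowledgement lets the RMS free storage, which is why it may ask for one with AckRequested; a Nack only tells it what to send again.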
Closing a sequence
The other thing that could have been different is that maybe for some
reason the RMS might decide to shut down the sequence before all
messages are delivered. Why? Maybe my server is being closed down and I
want to clean up in an orderly manner, or maybe there is one message
that cannot be delivered. In this case, it's tricky. Once I terminate the sequence, I can't
ask for an acknowledgement, because the RMD will have cleared its state.
If I ask for an acknowledgement first and then terminate, I might not
get a true picture - maybe some extra messages might end up being
delivered after I receive the SequenceAcknowledgement but before the
Terminate happens. Arggg.
Well, we thought of this. So, we added the ability to Close a
sequence. This basically is an extra interaction that allows the RMS to
say that it won't be delivering any more messages. The RMD then responds
with a Final sequence acknowledgement showing the ultimate state of
delivered messages. After that it's OK to terminate the sequence.
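The reason Close removes the race can be shown with a toy model of the RMD. This is a sketch under assumptions, not the behaviour of a real implementation: `ClosableDestination` and its methods are invented names, and the final acknowledgement is modelled as a sorted list of message numbers.

```python
from typing import List, Set


class ClosableDestination:
    """Toy RMD that refuses new messages once the sequence is closed."""

    def __init__(self) -> None:
        self.received: Set[int] = set()
        self.closed = False

    def accept(self, number: int) -> bool:
        """Record a message; returns False if the sequence is already closed."""
        if self.closed:
            return False
        self.received.add(number)
        return True

    def close(self) -> List[int]:
        """Close the sequence and return the final list of received numbers."""
        self.closed = True
        return sorted(self.received)
```

Because nothing is accepted after `close()`, the list it returns cannot go stale before the RMS terminates the sequence.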
Request/Response
In the case of request/response, there is very little difference,
except that there is a sequence in each direction. The sequences are
independent - so there is no linkage between transmission of the
messages on one sequence with transmission of the messages on the other
sequence. The only "linkage" is that you can optimize the creation of
the two sequences by sending an Offer of a return sequence in the
outgoing CreateSequence.
Imagine you are a client and it is clear that there will be a two-way
reliable connection. In that case the client can create a sequence and
Offer it to the server for responses. Effectively this lets you create
two sequences in one message exchange. However, after that the sequences
are independent: for example you can terminate one and still use the
other.
Firewall crossing
Most internet users can't just start up an HTTP server on their
machines and have other systems connect in. The problem doesn't come
with running an HTTP server - that's simple. The real problem comes with
getting the packets to your machine. For example, many home users have a
broadband router/firewall that performs Network Address Translation.
Without complex configuration these will drop all inbound packets.
Similarly if I walk into a coffee shop and use the wireless LAN, I have
the same problem - my IP address isn't globally accessible. Why do we
care? Well, if I just want to do one-way reliable, then this doesn't
matter. In fact, in the example above we showed how it works. By
piggybacking the acks on the HTTP response flow, everything works just
fine. But if I have a request-response flow, things change.
Suppose a response goes missing. The server wants to resend that
message to the client. But the client isn't addressable. There is no
open connection to resend the message on, and no way of the server
opening one. Help!!!
MakeConnection to the rescue
MakeConnection is a simple one-way message that logically flows from
the client to the server. By opening up an HTTP connection, this allows
the server to respond with any "queued" or waiting messages that need to
be transmitted to the client. Effectively the client "polls" the server
every once in a while for any waiting messages. If you think about this
carefully, you will see that this message flows from the RMD to the
RMS, because it is designed for the return (response) path. Effectively
the client's RMD is asking the server's RMS if there are any messages
waiting. Of course, the client has to identify itself to make this
happen. There are two options in MakeConnection. One is to modify the
WS-Addressing headers to use a special URI that includes a unique ID.
This is really there for complex scenarios. For simpler scenarios, the
following approach works well:
- Client creates a sequence and offers a sequence at the same time
- Client sends requests, ideally receives response on the backchannel
- For some reason, some responses are timed out or connections lost
- Client initiates MakeConnection, passing the Sequence identifier of the offered sequence
- Server responds with missing message, plus a flag to indicate if more are waiting
- Once no more messages are waiting the client can terminate the sequences
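The polling loop in those steps can be sketched as follows. Everything here is illustrative: the `Server` class, its in-memory queues and the `poll_until_empty` helper are assumptions, and each call to `make_connection` stands in for one MakeConnection request and its HTTP response.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class Server:
    """Holds responses that could not be delivered on the backchannel."""

    def __init__(self) -> None:
        self.waiting: Dict[str, Deque[str]] = {}

    def queue_response(self, sequence_id: str, body: str) -> None:
        self.waiting.setdefault(sequence_id, deque()).append(body)

    def make_connection(self, sequence_id: str) -> Tuple[Optional[str], bool]:
        """Return (one waiting message or None, whether more are pending)."""
        queue = self.waiting.get(sequence_id)
        if not queue:
            return None, False
        body = queue.popleft()
        return body, bool(queue)


def poll_until_empty(server: Server, sequence_id: str) -> List[str]:
    """Client side: keep polling until the server reports nothing pending."""
    collected: List[str] = []
    while True:
        body, more = server.make_connection(sequence_id)
        if body is not None:
            collected.append(body)
        if not more:
            return collected
```

The client never has to accept an inbound connection; every message travels back on a response to a request the client opened itself.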
Security
In many ways RM just plugs in with whatever other security model is
already in place. However, there are some issues that need watching out
for. In particular, there is the possibility of a "sequence attack". In
this model, imagine there are two valid "clients" each with a sequence.
Both are authorized at the service level, but one of the clients is
actually a maverick, and he wants to attack the other sequence. If he
can guess (or sniff) the sequence identifier, then he can start a Denial
of Service attack, by for example, terminating the sequence. So the RM
spec addresses how to associate the sequence with a particular
credential or security session. This means that the RM agent can protect
against this kind of attack. This is particularly important with
MakeConnection, because otherwise an unauthorized user could retrieve
messages destined for another system.
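The idea of tying a sequence to a credential can be illustrated with a toy lookup table. This is only a sketch of the principle: `SequenceRegistry` is a hypothetical name, and credentials are modelled as opaque strings rather than real security tokens or sessions.

```python
from typing import Dict


class SequenceRegistry:
    """Toy RMD-side table binding each sequence to the credential that created it."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def create(self, sequence_id: str, credential: str) -> None:
        self._owner[sequence_id] = credential

    def is_active(self, sequence_id: str) -> bool:
        return sequence_id in self._owner

    def terminate(self, sequence_id: str, credential: str) -> bool:
        """Terminate only if the caller presents the creating credential."""
        if sequence_id not in self._owner or self._owner[sequence_id] != credential:
            return False
        del self._owner[sequence_id]
        return True
```

A client that merely guesses another client's sequence identifier gets `False` back, and the sequence stays alive.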
WSRM Policy
As well as the core spec, the TC has published a Policy Assertion Language for WSRM that can be used with the
WS-Policy Framework model. In the previous spec (1.0) the policy model was fairly complex.
There were a number of timing parameters that were published in WSRMP.
Firstly the TC decided a number of these were "unhelpful" as they tied
the parties to using static timing models instead of dynamically
adjusting them. Secondly, it was felt that it would be better to have
any remaining timing agreed during the CreateSequence. This means that
WSRM can be used very successfully without needing to use WS-Policy. Now
WS-Policy is simply used to signal whether WSRM is optional or required
on a given endpoint.
So what does Reliability mean anyway?
Are you still reading? Congratulations on making it this far! Well
we've covered the protocol in a reasonable amount of depth. Now let's
step back and see what it actually gives us! The first question that
challenges people about WSRM is: "What level of reliability do I get?".
And the answer isn't that simple, unfortunately. WSRM was designed as a
wire protocol not as an end-to-end application level protocol. There are
two reasons for this. One is that the Web Services standards (WS-*) are
generally designed to cover the externally visible view of a service
and not the implementation, to promote the concept of loose-coupling.
The second reason is composability: to provide end-to-end
reliability you need to have some kind of transaction manager associated
with the application. Because there are other WS-* specifications that
cover transactions, and different ways of implementing transactions, it
doesn't make sense for this specification to cover that aspect. This is a
thorny issue that comes up every time I discuss WSRM with customers or
potential users, who are looking for much more of a plug-in replacement
for existing messaging systems that tightly integrate with transactional
applications.
The guarantee that WSRM - by itself - offers, is simply that the
message was successfully transferred from the RMS to the RMD and that
the RMD acknowledged it. Different implementations can have different
guarantees behind this. For example, Apache Sandesha2,
an open source implementation of WSRM, has a pluggable storage manager.
This means that you can have a persistent store behind the RMD, so the
acknowledgement is only sent when the message has been written to disk.
This means that Sandesha can support server failure and restart. The
WSO2 Tungsten server supports this model of operation.
The previous specification (WSRM 1.0) specifically talked about
delivery assurances such as AtLeastOnce, AtMostOnce, ExactlyOnce and
InOrder. However, these assurances are really guarantees between the RMD
and the application, not across the wire. So as a committee, we removed
these from the specification. We still expect implementations to offer
these levels of assurance, but they are part of the implementation not
the wire protocol.
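To make the point concrete, here is a minimal sketch of how an RMD might offer exactly-once, in-order delivery to its application using nothing but the message numbers. The class name `InOrderExactlyOnce` and the callback-based interface are assumptions for illustration; real implementations will differ.

```python
from typing import Callable, Dict


class InOrderExactlyOnce:
    """Toy RMD delivery layer: drops duplicates and holds back out-of-order messages."""

    def __init__(self, deliver: Callable[[str], None]) -> None:
        self._deliver = deliver           # hands a message to the application
        self._next = 1                    # next message number to deliver
        self._held: Dict[int, str] = {}   # messages waiting for earlier ones

    def accept(self, number: int, body: str) -> None:
        if number < self._next or number in self._held:
            return                        # duplicate: already delivered or held
        self._held[number] = body
        while self._next in self._held:   # release everything now in order
            self._deliver(self._held.pop(self._next))
            self._next += 1
```

Nothing in this logic appears on the wire, which is why these assurances fit naturally as an implementation matter between the RMD and the application.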
Programming model and implications
If you are a JMS or messaging developer, you will be used to learning
a programming model (PM) for reliable messaging, such as JMS. So WSRM
might come as a shock to you, because it can be used without any new programming model.
Of course it's hard to generalize, because each implementation can have
its own approach, but the core spec doesn't imply any particular PM. For
example, Sandesha allows you to turn on RM. If there is no sequence in
place, it automatically creates one, and then when no more messages are
being sent, it times out and terminates the sequence. The fact that the
RMS and RMD are just "handlers" in the chain of processing also means
that there are no new "visible entities" such as queues that need to be
configured or that show up in the client code - the RM infrastructure
can share the same URIs that the existing Web Service uses. So WSRM can
be added into an existing Web Services interaction with no extra
application code. (By the way, Sandesha also has a full programming API
that gives access to sequences if users wish to hand-code the RM
behaviour).
Despite this transparency, it is worth thinking about the
implications on coding. Many recent Web Service stacks and APIs,
including Microsoft WCF (Indigo), JAX-WS, and Apache Axis2, offer the
ability to call a Web service asynchronously (non-blocking). In this
model, instead of the client blocking until the response comes back, the
client passes a callback object in when it makes the outbound call.
Processing then continues on the client thread, and when the response
comes back a separate thread passes the response to the callback
handler.
This style of programming is very important for WSRM, because it
means that even if the server goes down, RM can resend the request and
response messages until the response is received. With a blocking call,
at some point the client would timeout, leaving the reliable response
"orphaned" - properly delivered back to the client but without any code
available to process it. So in general, if you think you might use RM,
it makes sense to write clients using this non-blocking approach. (It's
actually good practice anyway: imagine a web application server that is
making calls out to a third-party using Web services; if too many
requests are blocking waiting for responses the server's thread pool
would end up exhausted and the server couldn't handle incoming
requests).
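The callback style looks roughly like this in Python, using the standard `concurrent.futures` module. It is a generic sketch rather than the API of WCF, JAX-WS or Axis2: `call_service` is a stand-in for a real Web service call, and `invoke_async` is a hypothetical helper.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


def call_service(request: str) -> str:
    """Stand-in for a slow Web service call (assumption: takes 50 ms)."""
    time.sleep(0.05)
    return f"response to {request}"


def invoke_async(executor: ThreadPoolExecutor, request: str,
                 on_response: Callable[[str], None]) -> Future:
    """Start the call and return at once; on_response runs when the reply arrives."""
    future = executor.submit(call_service, request)
    future.add_done_callback(lambda f: on_response(f.result()))
    return future
```

The caller's thread is free as soon as `invoke_async` returns, so a late response still has a handler waiting for it however long the reliable delivery takes.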
History and differences from the existing 1.0 specification
WS ReliableMessaging dates all the way back to March 2003, when it
was originally published. In June 2005 the 1.0 specification was
submitted to OASIS for standardization. The current draft reflects a
number of changes from the 1.0 spec. Without listing all of them I can
summarize the main changes:
- Namespace changes: Since the specifications have
significant changes they are not compatible at the wire level. The 1.1
spec has a different set of namespaces reflecting the ownership by OASIS.
- Cleanup: The TC really worked through the
specification with a fine-toothed comb, and found many small issues
ranging from potential errors to potential problems interoperating.
- Addition of CloseSequence: As discussed above,
there are cases where it is necessary to close an incomplete sequence,
and CloseSequence allows that to happen cleanly.
- Removal of LastMessage: The 1.0 spec had a marker on a message to indicate it was the last message, which was largely superfluous.
- Improved security composition: The original spec
had very specific composition with WS-Security/WS-SecureConversation.
The 1.1 spec has a much more flexible approach that also supports
composition with SSL/TLS based security sessions.
- Updated to use the W3C WS-Addressing Recommendation: The 1.1 spec uses the recommended version of WS-Addressing from the W3C.
- Simplification of WSRM-Policy: The published
policy assertion is much simpler - basically, is RM required or optional? The
previous spec had a number of timing parameters which would not allow
for dynamic adjustment of the protocol, so they were removed, or moved
into the CreateSequence.
- Support for two-way reliability with firewall crossing: The MakeConnection support was added in the 1.1 spec.
Implementations
There are a number of implementations of the existing WSRM 1.0
specification, including Microsoft WCF (formerly known as Indigo), and
Apache Sandesha2. The OASIS WS-RX TC hosted an interop based on the last
Committee Draft earlier in 2006, and five companies turned up with
implementations. Although the interop didn't produce 100% coverage,
three companies managed to interop fully between their implementations
in all scenarios. The TC is hosting a second interop during the public
review period, to fully test the implementations on the latest
specification. We are also expecting more companies to take part this
time.
Summary
In this article we've covered a lot of ground, from the overall model
down to the main elements of the wire protocol. There are more
complicated scenarios I haven't covered, and I encourage you to read the
spec itself to understand the nuances, but I hope it's been useful. I'd
like to finish off by looking at some of the potential uses I see for
WSRM, and some of the ideas that customers have talked to me about.
- B2B messaging: A number of people see WSRM
playing a key part in business to business links. Many companies are
looking for a low-cost simple way of ensuring that orders, invoices,
etc. are reliably and securely transmitted over the Internet to
partners. WSRM is an ideal technology to provide the reliability for
those links.
- Internal department-to-department or server-to-server links:
WSRM is also a very useful protocol inside the enterprise. More and
more companies are developing and using Web services and XML
communications internally, and as those links become "line-of-business"
WSRM will become a key technology to ensure reliability.
- JMS replacement: Some companies are looking at
WSRM as a long-term replacement for existing proprietary JMS systems.
The next release of Windows, Vista, will include WSRM support built-in.
That makes it tempting if companies have currently got to install
proprietary JMS clients on many workstations.
- JMS bridge: You could use WSRM as a standard protocol to bridge between two different JMS implementations. The Apache Synapse open source project is designed to help you do this, amongst other things.
- Browser-based scenarios and notifications: As
AJAX applications get more interesting, the idea of doing reliable
messaging directly from a browser becomes pretty exciting, especially if
you were building, for example, an AJAX trading application. At least one effort
is creating a plug-in for the Firefox browser that supports a
SOAP-based AJAX model. RM support is coming and will make it very simple
to create reliable AJAX applications. Since AJAX already uses a
non-blocking asynchronous approach it is ideally suited to being
composed with WSRM. The ability to cross firewalls using the
MakeConnection facility also means that RM can be used without the
client needing to open ports. This approach can also be used to support
subscriptions, where the browser makes a single request (subscribe) and
receives multiple responses (notifications) back using MakeConnection.
All in all, I see a bright future for WSRM. It's taken a while to pull
together all the companies and the technology into a single approach,
but we are making good progress, and the public review of the
specification is a major milestone on that path.