Friday, September 9, 2011

Clustering WSO2 WSAS


This document describes the WSAS clustering functionality and demonstrates it using examples.

Why Web Services Clustering?

Any production deployment of a server has to fulfil two basic needs: high availability and scalability. High availability refers to the ability to keep serving client requests while tolerating failures. Scalability is the ability to serve a large number of clients sending a large number of requests without degradation of performance. Almost all successful server products support these two features to some extent.
Another requirement that pops up naturally with the above two features is the ability to manage clustered deployments from a single point. Web services servers are no exception. Many large scale enterprises are adopting Web services as the de facto middleware standard. These enterprises have to process millions of transactions per day, or even more. A large number of clients, both humans and computers, connect simultaneously to these systems and initiate transactions. Therefore, the servers hosting the Web services for these enterprises have to support that level of performance and concurrency. In addition, almost all the transactions happening in such enterprise deployments are critical to the business of the organization. This imposes another requirement on production ready Web services servers: maintaining very low downtime.
Another booming trend in the software industry is AJAX based user interfaces. Such Web interfaces usually depend on a Web services back-end. Therefore, Web services servers that provide the business logic for AJAX interfaces should exhibit the scalability of plain web servers serving ordinary HTTP requests.
It is impossible to support that level of scalability and high availability with a single server, no matter how powerful the server hardware or how efficient the server software. Web services clustering solves this problem: it allows us to deploy and manage several instances of identical Web services across multiple Web services servers running on different server machines. We can then distribute client requests among these machines using a suitable load balancing system to achieve the required level of availability and scalability. WSAS supports clustering out of the box, providing features such as configuring clustered instances from a single point and replicating configurations as well as client sessions. We will dive into the WSAS clustering features in the next sections.

WSAS Clustering

WSAS clustering is based on the Tribes group management system, which provides group membership handling and group communication for clustered WSAS instances. Although WSAS ships with this built-in Tribes based implementation, other clustering implementations based on different group management systems can be plugged in easily. The only thing you have to do is implement a set of clustering interfaces provided by WSAS. The WSAS clustering functionality can be divided into two categories, as described below.

Node Management

Node management provides functionality to configure and maintain multiple WSAS instances running on different servers identically. There can be a large number of WSAS instances in a clustered deployment. We have to make sure that all of those instances have the same runtime parameters and the same set of services deployed. If we want to make a change in the configuration or deploy a new service, we should apply that change to all the WSAS instances at the same time. Failure to do so would result in unpredictable behaviour. For example, assume that we want to deploy a new version of a service, but we cannot deploy the new version in all clustered instances at the same time. Then a client using that service may be directed to the new version and the old version of the service in successive requests. WSAS handles this by using a URL based central repository and a command line admin tool for issuing cluster wide commands. Figure 1 depicts the basic node management behaviour of WSAS.
Figure 1: WSAS node management
As shown in figure 1, all the clustered WSAS instances access configuration files and service archives from a central repository. Each WSAS instance has to be configured to point to this repository at deployment time. In a production environment, this repository should be hosted in a web server (e.g., Apache HTTPD) and WSAS instances can access it via HTTP by using the URL of the repository. It is also possible to use a local file system repository for testing purposes. In that case all the WSAS instances should run on the same computer and they can access the repository by using the file system path of the repository. In either case, we can maintain a single repository for the entire cluster. Therefore, configuration and services in the repository are immediately applied to all WSAS instances of the cluster upon start up.
We should also be able to make changes to the cluster at runtime. For example, we may want to add a new service, replace an old version of a service with a new version, or change the configuration parameters of the WSAS instances. WSAS provides an admin tool for performing such operations. It is a command line tool, where clustering commands can be issued as command line parameters. Administrators can connect to one node in the cluster and issue the required commands using this tool. The connected node then replicates the command to all other nodes in the cluster. The WSAS clustering layer takes care of applying such commands consistently and transactionally. The two phase commit protocol is used to achieve these properties.
Let's consider the deployment shown in figure 1 and assume the administrator wants to deploy a new service named service A in the cluster. First, the administrator copies the service archive containing service A to the shared repository. The administrator then connects to the node management Web service (Axis2NodeManager) of Node 1 and issues the service deployment command. Node 1 then becomes the coordinator for this transaction and executes the two phase commit protocol as listed below:

Commit-request phase

  • Node 1 sends a command to load service A from the shared repository.
  • All nodes try to load service A. Each node sends a success message to Node 1 upon successful loading of the service. If a node has failed to load the service, it sends a failure message to Node 1.
  • If Node 1 receives a failure message from any node, it sends a message to all the nodes to abort the action.
  • If Node 1 receives success messages from all the nodes, it sends the prepare message to all the nodes in the cluster.
  • In response to that, all the nodes try to block client requests temporarily. All client requests are answered with a notification message describing the server reloading state. This is required to allow clients to adjust existing transactions to the server's state change.
  • As in the previous step, each node sends a success message upon successful blocking of client requests. Any failure is notified to Node 1.
  • If Node 1 receives a failure message from any node, it sends a message to all the nodes to abort the action.
  • If Node 1 receives success messages from all the nodes, Node 1 sends a message indicating successful completion of phase 1 to the admin tool.
  • The admin tool reports this to the user.

Commit phase

  • The user should issue the commit command to start the second phase. Node 1 sends this commit message to all the nodes in the cluster.
  • All nodes try to deploy service A, which was loaded in the first phase.
  • Each node sends a success message to Node 1 upon successful deployment of service A. Any failure is also notified to Node 1.
  • If Node 1 receives a failure message from any node, it sends a message to all the nodes to rollback the action.
  • If Node 1 receives success messages from all the nodes, Node 1 sends a message to all nodes to start serving clients.
The commands supported by the admin tool are listed below:
Reload configuration
Linux: admin.sh --username admin --password admin --epr https://<ip>:9443/services/Axis2NodeManager --operation reloadconfig
Windows: admin.bat --username admin --password admin --epr https://<ip>:9443/services/Axis2NodeManager --operation reloadconfig
Reloads the configuration from the axis2.xml in the shared repository. This includes the changes in global parameters and global module engagement details.
Load new service groups
Linux: admin.sh --username admin --password admin --epr https://<ip>:9443/services/Axis2NodeManager 
                                     --operation loadsgs --service-groups <service-group1>,<service-group2>,...
Windows: admin.bat --username admin --password admin --epr https://<ip>:9443/services/Axis2NodeManager 
                                     --operation loadsgs --service-groups <service-group1>,<service-group2>,...
Loads and deploys the specified service groups in all the clustered nodes. Service archives containing required service groups have to be available in the shared repository prior to issuing this command.
Unload service groups
Linux: admin.sh --username admin --password admin --epr https://<ip>:9443/services/Axis2NodeManager
                                     --operation unloadsgs --service-groups <service-group1>,<service-group2>,...
Windows: admin.bat --username admin --password admin --epr https://<ip>:9443/services/Axis2NodeManager 
                                     --operation unloadsgs --service-groups <service-group1>,<service-group2>,...
Unloads previously deployed service groups from all the nodes in the cluster.
Apply service policy
Linux: admin.sh --username admin --password admin --epr https://<ip>:9443/services/Axis2NodeManager 
                                     --operation applypolicy --service <service> --policy-file <policy.xml>
Windows: admin.bat --username admin --password admin --epr https://<ip>:9443/services/Axis2NodeManager 
                                     --operation applypolicy --service <service> --policy-file <policy.xml>
Applies the policy defined in the policy.xml file to the specified service.
All the above commands only execute the first phase of the two phase commit protocol. Once that phase is complete, the administrator should issue the commit command using the admin tool to start the second phase. The syntax of the commit command is given below:
Linux: admin.sh --username admin --password admin --epr https://<ip>:9443/services/Axis2NodeManager --operation commit
Windows: admin.bat --username admin --password admin --epr https://<ip>:9443/services/Axis2NodeManager --operation commit
The commit phase is executed separately as another command so that it is possible to write command line scripts combining multiple admin commands as well as operating system specific commands. Such a script can issue the commit command after all the other commands have completed. All those admin commands then belong to a single transaction, which can be rolled back if any of them fails. For example, we can write a script to load service groups A and B, then apply policy X to service C and policy Y to service D, as sketched below.
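The following is a minimal sketch of such a script, following the command syntax shown above. The service group names, service names, policy files and the endpoint address are placeholders for illustration, and error handling between the commands is omitted.
#!/bin/sh
# Hypothetical script combining several admin commands into one transaction.
# Run from the bin directory of one of the WSAS nodes. The EPR, service group
# names, service names and policy files below are placeholders.
EPR=https://127.0.0.1:9443/services/Axis2NodeManager

./admin.sh --username admin --password admin --epr $EPR --operation loadsgs --service-groups serviceGroupA,serviceGroupB
./admin.sh --username admin --password admin --epr $EPR --operation applypolicy --service serviceC --policy-file policyX.xml
./admin.sh --username admin --password admin --epr $EPR --operation applypolicy --service serviceD --policy-file policyY.xml

# Start the second phase only after all the commands above have completed
./admin.sh --username admin --password admin --epr $EPR --operation commit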
As we can deploy and maintain a cluster of identical WSAS instances using the above features, it is possible to support high availability and scalability for stateless services using only the node management functionality of WSAS clustering.

Session State Replication

The ability to store session specific data in Web services is an important feature, especially when the business logic is written in the Web services themselves. This might not be very useful when Web services are used just as wrappers for existing legacy systems. WSAS has built-in support for Web service sessions, as the former case is becoming increasingly popular. This introduces another requirement for clustering: replicating session data among all the nodes in the cluster. WSAS stores session data at three levels. Data that should only be available to a single service is stored in the ServiceContext. Data common to all the services in a service group is stored in the ServiceGroupContext. Data common to all the service groups is stored in the ConfigurationContext. In clustered deployments, WSAS replicates session data found at all these levels among all the nodes in the cluster. This ensures that all the nodes are identical in terms of session data as well.
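To make the three levels concrete, the following sketch stores a value at each level using the standard Axis2 context APIs. This class is not part of the sample service built later in this guide; it only illustrates where data at each level is kept, and the property names are purely hypothetical.
import org.apache.axis2.context.MessageContext;

// Illustrative sketch only. It must run inside a service invocation, where a
// current MessageContext is available.
public class SessionLevelsExample {

    public void storeAtAllLevels(String value) {
        MessageContext msgCtx = MessageContext.getCurrentMessageContext();

        // Visible only to this service
        msgCtx.getServiceContext().setProperty("service.level.value", value);

        // Shared by all services in the same service group
        msgCtx.getServiceGroupContext().setProperty("group.level.value", value);

        // Shared by all service groups on this WSAS instance
        msgCtx.getConfigurationContext().setProperty("global.level.value", value);
    }
}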

High availability

Figure 2: WSAS high availability cluster
Session replication in WSAS can be used to achieve high availability for stateful services, as shown in figure 2. All client requests are directed to Node 1, which is elected as the primary server by the load balancing system. Node 2 and Node 3 are kept as backup servers and no requests are directed to them under normal operation. All three nodes are configured as a WSAS cluster, so that session state changes in Node 1 are replicated to Node 2 and Node 3.
Now if Node 1 fails, the load balancing system elects Node 2 as the primary server and treats the remaining nodes as backups. From that point, client requests are directed to Node 2, but the client does not notice any change or data loss, as all the session data is available in Node 2 as well.

Scalability

Figure 3: WSAS scalability cluster
By combining WSAS clustering with a smart load balancing system, it is possible to achieve high availability as well as scalability. Such a deployment is shown in figure 3. Two WSAS clusters are created in this scenario: Node 1 and Node 2 belong to the first cluster, and Node 3 and Node 4 belong to the second cluster. Once a client sends a request through the load balancing system, it binds that client to a particular cluster. From that point onwards, all requests from that client are directed to the same cluster. If the primary node of a cluster fails, the load balancing system elects a backup node of that cluster as the primary node and directs requests to it. Thus, scalability is achieved because a growing number of clients can be handled simply by introducing new clusters, and high availability is achieved because each cluster can have backup nodes with an identical configuration and session state.
WSAS clustering currently does not support distributed locking for session data replication. Therefore, we have to deploy primary-backup clusters for stateful services as mentioned above. This restriction does not apply to stateless services; for those, client requests can be directed to any node in the cluster.

WSAS Clustering Configuration

WSAS clustering is configured using the axis2.xml file. As all instances of a WSAS cluster can be configured to load this file from the shared repository, the initial clustering configuration can be done by editing a single file. The default clustering configuration shipped with WSAS is listed below:
<cluster class="org.apache.axis2.clustering.tribes.TribesClusterManager">
    <parameter name="AvoidInitiation">true</parameter>
    <parameter name="domain">wso2wsas.domain</parameter>
    <configurationManager
            class="org.wso2.wsas.clustering.configuration.WSASConfigurationManager">
        <parameter name="CommitTimeout">20000</parameter>
        <parameter name="NotificationWaitTime">2000</parameter>
        <listener class="org.wso2.wsas.clustering.configuration.WSASConfigurationManagerListener"/>
    </configurationManager>
    <contextManager class="org.apache.axis2.clustering.context.DefaultContextManager">
        <listener class="org.apache.axis2.clustering.context.DefaultContextManagerListener"/>
        <replication>
            <defaults>
                <exclude name="local_*"/>
                <exclude name="LOCAL_*"/>
                <exclude name="wso2tracer.msg.seq.buff"/>
                <exclude name="wso2tracer.trace.persister.impl"/>
                <exclude name="wso2tracer.trace.filter.impl"/>
            </defaults>
            <context class="org.apache.axis2.context.ConfigurationContext">
                <exclude name="SequencePropertyBeanMap"/>
                <exclude name="WORK_DIR"/>
                <exclude name="NextMsgBeanMap"/>
                <exclude name="RetransmitterBeanMap"/>
                <exclude name="StorageMapBeanMap"/>
                <exclude name="CreateSequenceBeanMap"/>
                <exclude name="WSO2 WSAS"/>
                <exclude name="wso2wsas.generated.pages"/>
                <exclude name="ConfigContextTimeoutInterval"/>
                <exclude name="ContainerManaged"/>
            </context>
            <context class="org.apache.axis2.context.ServiceGroupContext">
                <exclude name="my.sandesha.*"/>
            </context>
            <context class="org.apache.axis2.context.ServiceContext">
                <exclude name="my.sandesha.*"/>
            </context>
        </replication>
    </contextManager>
</cluster>
The class attribute of the cluster element specifies the main class of the clustering implementation. This class should implement the org.apache.axis2.clustering.ClusterManager interface. As mentioned earlier, the WSAS built-in clustering implementation is based on Tribes. Therefore, the Tribes based ClusterManager implementation is specified by default. There are two top level parameters in the configuration. The AvoidInitiation parameter specifies whether clustering should be initialized automatically on start up. By default this is set to true, which means clustering is not initialized on start up. WSAS calls the initialization mechanism at the appropriate time, and users are not supposed to change the value of this parameter. The domain parameter defines the domain of the cluster. All the nodes with the same domain name belong to the same cluster. This allows us to create multiple clusters in the same network by specifying different domain names. Apart from these, there are two major sections in the configuration: the configurationManager and the contextManager.
The configurationManager section configures the node management activities of the cluster. The configurationManager element's class attribute specifies the class implementing the org.apache.axis2.clustering.configuration.ConfigurationManager interface. It should support all the node management activities described earlier. There is an associated listener implementation for the configurationManager, which implements the org.apache.axis2.clustering.configuration.ConfigurationManagerListener interface. It should listen for node management events and take appropriate actions. The default implementations of these classes are based on Tribes. Configuration commands are applied using the two phase commit protocol, as mentioned in the node management section. According to that, all nodes block client requests in the "prepare" step and wait for the "commit" command to apply the new configuration. But if, for some reason, the "commit" command is never issued, all nodes would block client requests forever, making the entire cluster useless.
The CommitTimeout parameter is introduced to handle this scenario. Nodes wait for the "commit" command only for the time specified in the CommitTimeout parameter. If the "commit" command is not issued during that time, all nodes roll back to the old configuration. The NotificationWaitTime parameter specifies the time for the coordinator node to wait for success/failure messages from other nodes. If a node fails to send a success or a failure message for a particular command within this time, the coordinator node assumes that the node has failed to perform the command, and sends an error message to the admin tool describing the failure to execute the command.
Session data replication is configured in the contextManager section. The class attribute of the contextManager element specifies the implementing class of the org.apache.axis2.clustering.context.ContextManager interface. There is an associated listener class implementing the org.apache.axis2.clustering.context.ContextManagerListener interface, specified in the class attribute of the listener element. As with the other implementations, these two classes are also based on Tribes by default. Data to exclude from the replication process can be specified in the replication element. As mentioned in the session replication section, session data is replicated for three context types. We can specify which data to exclude for each of these context types by listing them under the appropriate context element. Each context element can have one or more exclude elements. The name attribute of the exclude element specifies the name of the property to exclude. The defaults element of the replication section contains the data to exclude from all context types. It is possible to specify complete property names or a prefix or suffix of property names. Prefixes and suffixes are specified using the asterisk (*) character. For example, according to the above configuration, session data whose names begin with my.sandesha. is not replicated for service contexts and service group contexts, and session data whose names begin with local_ or LOCAL_ is not replicated for any of the three context types.
Now we have explored enough background information about WSAS clustering. It is time to see it in action.

WSAS Clustering in Action

Configuring the Cluster

Download the WSAS binary distribution by following the installation guide. As we are going to set up a WSAS cluster for demonstration purposes, we will host all the instances on the same computer. Extract the WSAS distribution to two directories, so that we can start two instances. We refer to these two instances as Node 1 and Node 2. The next step is to configure the URL repository for both nodes. Open the server.xml of Node 1 and change the repository location to http://localhost/repository/. Please make sure that the WSAS repository is accessible from the given URL. Then change the configuration file location to point to the axis2.xml of Node 1. The section of the server.xml file containing the changed locations is shown below:
<Axis2Config>
        <!--
             Location of the Axis2 Services & Modules repository

             This can be a directory in the local file system, or a URL.

             e.g.
             1. /home/wso2wsas/repository/ - An absolute path
             2. repository - In this case, the path is relative to WSO2WSAS_HOME
             3. file:///home/wso2wsas/repository/
             4. http://wso2wsas/repository/
        -->
        <RepositoryLocation>http://localhost/repository/</RepositoryLocation>

        <!--
            Location of the main Axis2 configuration descriptor file, a.k.a. axis2.xml file

            This can be a file on the local file system, or a URL

            e.g.
            1. /home/wso2wsas/conf/axis2.xml - An absolute path
            2. conf/axis2.xml - In this case, the path is relative to WSO2WSAS_HOME
            3. file:///home/wso2wsas/conf/axis2.xml
            4. http://wso2wsas/conf/axis2.xml
        -->
        <ConfigurationFile>/home/wsas/node1/conf/axis2.xml</ConfigurationFile>

        <!--
          ServiceGroupContextIdleTime, which will be set in ConfigurationContext
          for multiple clients which are going to access the same ServiceGroupContext
          Default Value is 30 Sec.
        -->
        <ServiceGroupContextIdleTime>30000</ServiceGroupContextIdleTime>
</Axis2Config>
Now make the same changes in the server.xml file of Node 2. Make sure to set the configuration file location to the axis2.xml file of Node 2. Please note that we are setting different configuration file locations for the two nodes to avoid the port conflicts mentioned later. In production deployments, all nodes can point to the same axis2.xml file located in the HTTP repository.
Now we are done with configuring the repository. Next we have to enable clustering for both nodes. By default, clustering is turned off to avoid additional overhead for individual deployments. Open the axis2.xml of both nodes and uncomment the clustering section. You may also change clustering properties to suit your deployment. However, the default configuration is sufficient for the demonstration.
There is one more step left as we are trying to run both nodes in the same machine. That is, we have to change the various ports opened by WSAS to avoid conflicts. As some of these ports are configured in the axis2.xml file, we have to use two axis2.xml files for the two WSAS instances, instead of sharing it from the central repository. However, we can share service archives from the central repository between both nodes.
Open the axis2.xml of Node 2 and change the HTTP transport receiver port to 9763. Then change the HTTPS transport receiver port to 9444. A portion of the axis2.xml file after changing the ports is shown below:
<transportReceiver name="http"
                       class="org.wso2.wsas.transport.http.HttpTransportListener">
        <parameter name="port">9763</parameter>
</transportReceiver>

<transportReceiver name="https"
                       class="org.wso2.wsas.transport.http.HttpsTransportListener">
        <parameter name="port">9444</parameter>
        <parameter name="sslProtocol">TLS</parameter>
Now open the server.xml file of Node 2 and change the command listener port to 6667. The portion of the server.xml file containing the command listener port is shown below:
<CommandListener>
       <Port>6667</Port>
</CommandListener>
We have completed configuring the WSAS nodes for the clustered deployment. Now start both WSAS nodes using the wso2wsas.sh script in the bin directories of Node 1 and Node 2, as shown below.
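For example, assuming the two nodes were extracted to the directories used earlier in this guide (the paths are illustrative), the instances can be started from two terminals:
# Paths are assumptions based on the node directories used earlier in this guide
cd /home/wsas/node1/bin
./wso2wsas.sh

# In a second terminal
cd /home/wsas/node2/bin
./wso2wsas.sh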

Building a Stateful Service

Let's write a small service for the demonstration. This service has two methods, setValue() and getValue(). The setValue() method stores the value passed as the parameter in the service group context. The getValue() method retrieves that value from the service group context and returns it. If this service is deployed in any scope other than the request scope, this value is available between method invocations, making it a stateful service. The code of the service class is listed below:
import org.apache.axiom.om.OMElement;
import org.apache.axis2.context.MessageContext;

public class Service1 {

    public OMElement setValue(OMElement param) {

        param.build();
        param.detach();

        String value = param.getText();
        MessageContext.
                getCurrentMessageContext().getServiceGroupContext().setProperty("myValue", value);

        System.out.println("==============================================================");
        System.out.println("Value: " + value + " is set.");
        System.out.println("==============================================================");

        param.setText("Value: " + value + " is set.");       

        return param;
    }

    public OMElement getValue(OMElement param) {

        param.build();
        param.detach();

        String value = (String) MessageContext.
                getCurrentMessageContext().getServiceGroupContext().getProperty("myValue");

        System.out.println("==============================================================");
        System.out.println("Value: " + value + " is retrieved.");
        System.out.println("==============================================================");

        param.setText(value);

        return param;
    }
}
We are going to deploy this in the SOAP session scope. Therefore, set the scope attribute to soapsession in the services.xml file.
<service name="service1" scope="soapsession">
   ...    
</service>
Now compile the service class and make a service archive named service1.aar from it, as sketched below.
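An Axis2 service archive is simply a jar file containing the compiled service classes and the services.xml descriptor under META-INF. A minimal packaging sketch follows; the WSAS_HOME variable and the classpath are assumptions and should point to the Axis2/AXIOM jars shipped with your WSAS installation.
# Sketch of building service1.aar. WSAS_HOME and the classpath are assumptions.
mkdir -p service1/META-INF
cp services.xml service1/META-INF/
javac -cp "$WSAS_HOME/lib/*" -d service1 Service1.java
jar cvf service1.aar -C service1 .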

Node Management Example

Open a browser and log in to the web console of Node 1. Go to the services section. You can see the currently deployed services. Now copy the service1.aar to the services directory of the shared repository. Then go to the bin directory of Node 1 using the command line and issue the following command:
./admin.sh --username admin --password admin --epr https://127.0.0.1:9443/services/Axis2NodeManager 
                                                               --operation loadsgs --service-groups service1
Once that command is completed, issue the commit command shown below:
./admin.sh --username admin --password admin --epr https://127.0.0.1:9443/services/Axis2NodeManager --operation commit
These two commands connect to Node 1 and order it to deploy the service group "service1" in the cluster. Now view the services section of the web console again. You can see that service1 is listed among the deployed services. Then log in to the web console of Node 2 (note that the web console of Node 2 runs on port 9444) and go to the services section. You can see that service1 is deployed in Node 2 as well.
Similarly, issue the following unload command, followed by the commit command.
./admin.sh --username admin --password admin --epr https://127.0.0.1:9443/services/Axis2NodeManager
                                                               --operation unloadsgs --service-groups service1
Then verify that service1 is undeployed from both nodes using the web consoles.

Session State Replication Example

Deploy service1 in the cluster as mentioned in the previous section. Now we have a stateful service deployed in the cluster. We can verify session state replication by setting some session data in Node 1 and accessing it from Node 2. This is possible because the WSAS clustering layer replicates session data added on any node to all other nodes. We have to write a Web services client to set the value in Node 1 and access it from Node 2. The client code is listed below:
import org.apache.axiom.om.OMAbstractFactory;
import org.apache.axiom.om.OMElement;
import org.apache.axiom.om.OMFactory;
import org.apache.axiom.om.OMNamespace;
import org.apache.axis2.AxisFault;
import org.apache.axis2.addressing.EndpointReference;
import org.apache.axis2.client.Options;
import org.apache.axis2.client.ServiceClient;
import org.apache.axis2.context.ConfigurationContext;
import org.apache.axis2.context.ConfigurationContextFactory;

public class SessionClient {

    public static void main(String[] args) {
        new SessionClient().invoke();
    }

    private void invoke() {

        String repository = "/home/wso2/products/axis2/repository";
        String axis2xml_path = "/home/wso2/products/axis2/conf/axis2-client.xml";

        OMFactory fac = OMAbstractFactory.getOMFactory();
        OMNamespace ns = fac.createOMNamespace("http://services", "ns");
        OMElement value = fac.createOMElement("Value", ns);
        value.setText("Sample session data");

        try {
            ConfigurationContext configContext = ConfigurationContextFactory.
                    createConfigurationContextFromFileSystem(repository, axis2xml_path);            
            ServiceClient client = new ServiceClient(configContext, null);            

            Options options = new Options();
            options.setTimeOutInMilliSeconds(10000000);
            options.setManageSession(true);
            client.setOptions(options);

            // Set some session data in Node 1
            options.setTo(new EndpointReference("http://192.168.1.3:9762/services/service1"));
            options.setAction("setValue");
            OMElement response1 = client.sendReceive(value);
            System.out.println(
                    "Server response for setting the sample session data in Node 1: " + response1.getText());

            // Access the session data from Node 2
            options.setTo(new EndpointReference("http://192.168.1.3:9763/services/service1"));
            options.setAction("getValue");            
            OMElement response2 = client.sendReceive(value);
            System.out.println("Retrieved sample session data from Node 2: " + response2.getText());

        } catch (AxisFault axisFault) {
            axisFault.printStackTrace();
        }
    }
}
The above code first invokes the setValue() method of our sample service in Node 1 with the string "Sample session data". Then it invokes the getValue() method on Node 2 and displays the retrieved value. Compile and execute this code with the required Axis2 jars in the classpath. Please make sure that the "repository" string contains the path of an Axis2 client repository and "axis2xml_path" contains the path of an axis2.xml file. You should see the following output in the console:
Server response for setting the sample session data in Node 1: Value: Sample session data is set.
Retrieved sample session data from Node 2: Sample session data
The session data "Sample session data" was set in Node 1 and retrieved from Node 2. You may also disable the clustering in the nodes and perform the above test again. You will see that this time session data cannot be accessed from Node 2.
We have completed the WSAS clustering guide. If you have more questions about the WSAS clustering functionality, please feel free to ask them on the WSAS user list: wsas-java-user@wso2.org.
