Friday, September 9, 2011

Data Federation With WSO2 Data Services Server

What's Data?

If you were asked "What's data?", what would your answer be? Though it's a somewhat confusing question to answer, I would describe data as a form of representing knowledge, experience, observations, statistics, facts, concepts, etc. that can be formatted or presented in an orderly manner for use in decision making, pattern identification, and so on. By that definition, data is one of the most obvious things we deal with in our day-to-day lives, and it also implies that every moment we spend can be converted into some form of data.

Though the term "data" has such a generic meaning, we often use it to refer to a small subset of data constrained by some context. For example, an organization engaged in some form of business would mostly be interested in "data" related to the context of that business. Not only organizations but also individuals, applications and various other entities across the world amass data from their own contexts to analyse it and find patterns, keep track of past activities, or carry out whatever tasks they are interested in. Hence the need for a mechanism that stores such data in a meaningful manner and makes it easy to access. The myriad data storage mechanisms available today, ranging from Relational Database Management Systems to NoSQL databases, data warehouses, legacy systems, document repositories, Google and Excel spreadsheets, CSV files, etc., originated to cater to that requirement. Almost all enterprises use one or a combination of these data storage mechanisms to fulfill their various data manipulation needs.

What's Data Federation?

As the term itself implies, Data Federation usually refers to the integration of data scattered across numerous types of data sources into a form that is easy to access. Before the concept of Data Federation was introduced, the most common practice was to first copy the relevant data into some additional storage space and then carry out the integration on that copy. But the bottlenecks encountered while doing so, such as copyright infringements when copying data and the need for additional storage space, led to a search for better alternatives that avoid them. Among such alternatives, Data Federation could be considered the most advanced and efficient solution, as it makes it possible for organizations to collect and process data scattered across their various data sources efficiently.

How does the WSO2 DSS fit in?

Among the enterprise data solutions currently available in the market that offer Data Federation functionality, WSO2 Data Services Server comes in handy because it supports a wide range of data source types to be federated, varying from Relational Database Management Systems (RDBMS) such as MySQL, Oracle, MS SQL Server, PostgreSQL, H2 and Derby to tabular data sources such as Google Spreadsheets, Excel spreadsheets, CSV files, etc. WSO2 Data Services Server lets users manipulate data stored in multiple types of data sources and presents it to them in a unified format.

In WSO2 DSS, this is implemented by using three main functionalities, namely,
1. Multiple data source support.
2. Nested query support.
3. Export parameter support.

1. Multiple data source support

Multiple data source support is an enticing feature of WSO2 Data Services Server which enables users to define multiple database configurations within the same data service descriptor. The following diagram depicts how it's done using a sample descriptor. There, each database configuration is given an id that uniquely identifies the data source, and this id is later used when integrating the data extracted from the various types of data sources.
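To make the idea of id-keyed data sources concrete, here is a minimal Python sketch. It is not how WSO2 DSS is implemented and it does not use the descriptor format; it only illustrates the concept of registering several kinds of sources under unique ids and reading them through one interface. The ids `officeDS` and `employeeCsvDS`, the helper names and the sample rows are all assumptions made for illustration, and an in-memory SQLite database stands in for a real RDBMS.

```python
import csv
import io
import sqlite3

# Stand-in for a relational source: an in-memory SQLite database (assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE Office (id INTEGER, city TEXT)")
conn.execute("INSERT INTO Office VALUES (1, 'Colombo')")

# Stand-in for a tabular source: CSV text held in memory (assumption).
CSV_TEXT = "id,name\n1,Alice\n2,Bob\n"


def read_rdbms(sql):
    """Run SQL against the relational source and return rows as dicts."""
    return [dict(row) for row in conn.execute(sql)]


def read_csv(_query=None):
    """Return every CSV row as a dict; the query argument is ignored."""
    return list(csv.DictReader(io.StringIO(CSV_TEXT)))


# Each source is registered under a unique id, much like the per-config ids
# described above. The ids themselves are hypothetical.
data_sources = {"officeDS": read_rdbms, "employeeCsvDS": read_csv}


def fetch(config_id, query=None):
    """Look up a data source by id and return its rows in one common shape."""
    return data_sources[config_id](query)
```

Calling `fetch("officeDS", "SELECT id, city FROM Office")` and `fetch("employeeCsvDS")` both return lists of dictionaries, so the caller does not need to know which kind of source sits behind an id.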

2. Nested Queries

Nested queries are another vital feature used in data federation; they carry out the actual integration of the data queried from different types of data sources. In other words, a data service query can feed the result of its execution as an input to some other query and eventually integrate both results into a unified format before presenting it to the user. The following diagram depicts the configuration of such sample data service queries and how they are integrated together to form a nested query that can be used in data federation.
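The mechanism can be sketched in a few lines of Python. This is a conceptual illustration only, not the DSS implementation: `run_nested`, the customer and order rows, and the field names are all hypothetical. The outer query's rows each supply one value as the input parameter of the inner query, and the inner result is attached to the outer row.

```python
def run_nested(outer_rows, inner_query, param_name, nested_name):
    """Feed one column of each outer row into inner_query and nest the result."""
    merged = []
    for row in outer_rows:
        combined = dict(row)  # copy so the outer rows are left untouched
        combined[nested_name] = inner_query(row[param_name])
        merged.append(combined)
    return merged


# Hypothetical rows standing in for the results of two separate data sources.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"customer_id": 1, "item": "bolts"},
    {"customer_id": 1, "item": "nuts"},
]


def orders_for(customer_id):
    """Inner query: takes the customer id as its input parameter."""
    return [o["item"] for o in orders if o["customer_id"] == customer_id]


report = run_nested(customers, orders_for, "customer_id", "orders")
```

Each entry of `report` keeps the outer row's own fields and gains an `orders` list, which is the "unified format" idea in miniature. An outer row with no matching inner rows simply gets an empty list.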


3. Export parameter support

With this particular feature, the user is given the ability to export values of the output parameters of a particular query to be used in another query.
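As a rough illustration of the idea, assuming a simple shared store for exported values, the sketch below lets one query publish an output value that a later query picks up as its input. The `exported` dictionary, the table layout and the function names are hypothetical and are not part of the WSO2 DSS API; an in-memory SQLite database is used so the sketch runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Office (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE Employee (name TEXT, office_id INTEGER)")

# Hypothetical store holding values exported by earlier queries.
exported = {}


def add_office(city):
    """First query: insert an office and export its generated id."""
    cursor = conn.execute("INSERT INTO Office (city) VALUES (?)", (city,))
    exported["office_id"] = cursor.lastrowid


def add_employee(name):
    """Second query: use the exported office id as an input parameter."""
    conn.execute(
        "INSERT INTO Employee VALUES (?, ?)", (name, exported["office_id"])
    )


add_office("Colombo")
add_employee("Alice")
```

The caller of `add_employee` never supplies the office id; it arrives through the exported value, which is the essence of the feature described above.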

Having discussed the bits and pieces of the Data Federation implementation of WSO2 DSS, let's delve into some practical use cases where you can actually make use of this feature in a real production environment.

Sample use cases:

Use case 1: Let's consider a hypothetical use case where a particular organization keeps the data related to its employees and offices in two RDBMSs, MySQL and Oracle. Further imagine that the MySQL database contains a table named "Employee" and the Oracle database contains a table named "Office" to store the relevant data. Here, the user needs to present the data sets queried from both databases merged as a list of offices, with the employees of each office nested under it. This link would redirect you to the sample data service and the database configurations used to implement the aforementioned use case.
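The shape of the expected result can be sketched outside DSS as well. The Python below is only an approximation of the use case: two in-memory SQLite databases stand in for the MySQL and Oracle servers, and the column names (`id`, `city`, `name`, `office_id`), the sample rows and the XML element names are assumptions, since the original tables are not shown here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for the MySQL database holding the "Employee" table.
employee_db = sqlite3.connect(":memory:")
employee_db.execute("CREATE TABLE Employee (name TEXT, office_id INTEGER)")
employee_db.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Alice", 1), ("Bob", 1), ("Carol", 2)],
)

# Stand-in for the Oracle database holding the "Office" table.
office_db = sqlite3.connect(":memory:")
office_db.execute("CREATE TABLE Office (id INTEGER, city TEXT)")
office_db.executemany(
    "INSERT INTO Office VALUES (?, ?)", [(1, "Colombo"), (2, "London")]
)


def offices_with_employees():
    """List offices from one database and nest employees from the other."""
    root = ET.Element("Offices")
    offices = office_db.execute("SELECT id, city FROM Office ORDER BY id")
    for office_id, city in offices:
        office = ET.SubElement(root, "Office", city=city)
        employees = employee_db.execute(
            "SELECT name FROM Employee WHERE office_id = ? ORDER BY name",
            (office_id,),
        )
        for (name,) in employees:
            ET.SubElement(office, "Employee").text = name
    return root


xml_text = ET.tostring(offices_with_employees(), encoding="unicode")
```

Printing `xml_text` shows each `Office` element with its `Employee` children beneath it, even though the two sets of rows come from separate databases.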

Use case 2: Assume a particular user has some data stored in the form of CSV files and needs to get that data exported into a MySQL database. Click here to download the sample data service descriptor and database configuration files used in this particular example.
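For readers who want to see the data flow of this use case in isolation, here is a small Python sketch. It is not the data service from the download: an in-memory SQLite database stands in for MySQL, and the CSV columns, the `Employee` table layout and the `import_csv` helper are assumptions chosen for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; a real run would open a file instead.
CSV_TEXT = "name,office_id\nAlice,1\nBob,2\n"


def import_csv(csv_file, conn):
    """Read every CSV row and insert it into the Employee table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Employee (name TEXT, office_id INTEGER)"
    )
    rows = [(r["name"], int(r["office_id"])) for r in csv.DictReader(csv_file)]
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
imported = import_csv(io.StringIO(CSV_TEXT), conn)
```

Here the CSV source is read row by row and each row becomes the input of an insert against the relational source, which is the same read-from-one, write-to-another pattern the use case describes.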
