- type of Search needed by the application(Different Columns, colored Blue)
- amount of Scale needed by the application (colored green)
- amount of consistency required by the application (colored brown)
Here we are presenting 3D data using a 2D table. Repeated columns under
each type of scale column sets takes care of that. For example, "DB" in
the forth row forth column says "if you need where clause like search,
with small scale and transactions, use a DB". Other cells use the same
idea.
The notation I use is KV: Key-Value Systems, CF: Column Families, Doc:
document based Systems. Questions marks in the table means, it might
work, but you should verify. The table only put the recommendations and
does not exactly say how I come up with the recommendations. More
details are in the slides, and I will get out a writeup soon. Some of
the key ideas are
- Transactions and Joins does not scale great
- KV scale most, then CF and Doc models, then DB. So if KV is good enough, go for that.
- Offline case have time to do MapReduce and walk through the data.
- If you need transactions or Joins with scale, you have to try partitioned DBs. But you have to try and see, and it might not work either. If it does not, you are out of luck.
Small (1-3 nodes)
|
Scalable (10 nodes)
|
Highly Scalable (1000s nodes)
| |||||||
Loose
Consistency
|
Operation Consistency
|
ACID
Transactions
|
Loose
Consistency
|
Operation Consistency
|
ACID
Transactions
|
Loose
Consistency
|
Operation Consistency
|
ACID
Transactions
| |
Primary Key
|
DB/
KV/ CF
|
DB/
KV/ CF
|
DB
|
KV/CF
|
KV/CF
|
DB?
|
KV/CF
|
KV/CF
|
No
|
Where
|
DB/ CF/Doc
|
DB/ CF/Doc
|
DB
|
CF/Doc(?)
|
CF/Doc
(?)
|
DB?
|
CF/Doc
|
CF/Doc
|
No
|
JOIN
|
DB
|
DB
|
DB
|
??
|
??
|
??
|
No
|
No
|
No
|
Offline
|
DB/CF/Doc
|
DB/CF/Doc
|
DB/CF/Doc
|
CF/Doc
|
CF/Doc
|
No
|
CF/Doc
|
CF/Doc
|
No
|
Survey of System Management Frameworks
Following is couple of tables I took out of Survey I did on system management frameworks. A detailed writeup of this survey is in my thesis under related works. I want to write up a survey paper on this, but have not got to that yet.
Lets start with brief outline on rational for the choice of matrices to compare different implementations. System management frameworks monitor a system, take decisions on how to fix things, and execute those decisions. They often consist of three parts: Sensors, a brain that make decisions, and set of actuators to execute decisions. From 10,000-foot view, most architectures follow this model, but differ on how they collect data, make decisions, and execute them. First table compares them on this aspect.
System management frameworks themselves have to scale and provide High availability. To do those, they have to be composed of multiple servers (managers). Taking decisions with multiple managers (coordination) is one of the key challenges in system management framework design. The second table compares and contrast different coordination models using different architectural styles.
The 3rd table discusses decision models—in other words, implementation of the brain. More details can find from http://people.apache.org/~hemapani/research/system-management-survey.html.
Systems based on Functionality/Design
Sensors->Brain-> (Actuators)? | InfoSpect(Prolog),WRMFDS (AI), Rule-basedDignosis (Hierachy) (Monitoring & Dignosis) Ref Archi, Autopilot, Policy2Actions (Centralized) JADE, JINI-Fed, ReactiveSys, (Managers) Tivoli (Managers + Manual Coordinator) |
Sensors->Gauges->Brain-> (Actuators)? | Paradyn MRNet (Monitoring Only) Rainbow (Archi Model*), ACME-CMU, CoordOfSysMngServices, Code InfraMonMangGrid(Centralized) eXtreme(KX),Galaxy* (Managers) |
Sensors->DataModel->Brain->Actuators | DealingWithScale(CIM), Marvel( Centralized) Scalable Management (PLDB/Managers), Nester (Distributed directory service), SelfOrgSWArchi |
Sensors->ECA->Actuators | DREAM, SmartSub, IrisLog, HiFi, Sophia (Prolog) - (Managers) |
Management Hierarchy | NaradaBrokerMng, WildCat, BISE,ReflexEngine |
Queries on demand | ACME , Decentralizing Network Management, Mng4P2P (Managers) |
Workflow systems/ Resource management | Unity, RecipeBased (Centralized), SelfMngWFEngine, Automate Accord, ScWAreaRM, P2PDesktopGrid (Distributed) |
Fault tolerant systems | WSRF-Container, GEMS (Distributed) |
Decentralized | DMonA,Guerrilla, K-Componets |
Deployment frameworks | SmartFrog , Plush, CoordApat |
Monitoring | Inca (Centralized), PIPER (multicast over DHT), MDS (LDAP, Events), NWS(LDAP), Scalable Info management, Astorable (Gossip), Monitoring with accuracy, RGMA (DB like access) |
Systems based on Coordination and Communication model
Monitoring Only | One Manager without coordination | One Manager with coordination | Managers without coordination | Managers with coordination | |
Pull / events | InfoSpect Sophia |
Rainbow, Unity, Ref Archi, Autopilot, InfraMonMangGrid | CoordOfSysMngServices | JADE | DMonA, Tivoli (Manual) |
Pub/Sub hierarchy | DealingWithScale, ACME-CMU | Scalable Management, DREAM, SmartSub | NaradaBrokerMng, | ||
P2P | PIPER, | Automate Accord, Mng4P2P, WSRF-Container, | |||
Hierarchy | Gangila, Globus MDS, Paradyn MRNet, WRMFDS | IrisLog, HiFi, JINI-Fed, eXtreme(KX), ScWAreaRM | WildCat | ||
Gossip | Astorable, GEMS | Galaxy | |||
Spanning Tree (Network/P2P) | Scalable Info management, ACME | ||||
Group Communication | ReactiveSys, Galaxy | SelfOrgSWArchi | |||
Distributed Queue | SelfMngWFEngine |
Decision Models in management Systems
Rules | Conflict resolution | Verifier | Batch Mode used? | Meta-model used? | Planning | |
DIOS++ | If/then | yes, use priority | No | results of rules applied in next iteration | Yes | No |
Rainbow | If/then | No | Yes | No | ||
InfoSpect | Prolog Like | Only Monitoring | No | No | Yes | No |
Marvel(1995) | PreCnd->action->PostCond | Yes | No | Yes | No | |
CoordOfSysMngServices | Yes (Detect ->Human Help) | Yes | Yes | No | Yes | |
Sophia | Prolog like | No | No | No | ||
RecipeBased | Java Code | Yes | No | No | No | |
HiFi, DREAM | pub/sub Filter->action | No | No | No | No | No |
IrisLog | DB triggers | No | No | No | No | No |
ACME | (timer/sensor/completion) conditions->action |
No | No | possible with timer conditions | Yes | No |
ReactiveSys (1993) | if/then | No | No | No | Yes | No |
Policy2Actions (2007) | Policy- name, conditions (name of method to run + parameters), actions, target component | Yes, based on runtime state + history | There are tests associated with each actions that decide should action need to run | - | No | Yes |
Friday, August 19, 2011
WSO2 Stratos in contrast to Other Java PasS Offerings
In the article "Java PaaS shootout"
Michael J. Yuan provides a pretty nice comparison of Google AppEngine,
Amazon Elastic Beanstalk, and CloudBees RUN@Cloud. Following table
provides a summary of that while adding WSO2 Stratos to the comparison.
Also let me listout couple of key differentiators of WSO2 Stratos.
App Engine
|
Amazon beanstalk
|
CloudBee's Run@Cloud
|
WSO2 Stratos
|
|
What is it?
|
Users can upload servlets. AppEngine hosts them and manage them.
|
Managed Tomcat
Expensive
|
Tomcat, load balancer. Integrated with SVN. Can change
source code and update all deployment aspects.
|
SOA platform as a service.
Fully multi-tenant.
|
Java support
|
Yes, does not support some I/O and network operations
|
Full java
|
Full Java |
Yes, but File access is limited
|
Outbound connections
|
Time out in 10 seconds
|
OK
|
OK
|
OK
|
Support for standard java Libs
|
Have problems when they use unsupported APIs
|
Yes
|
Yes
|
Yes (java security manager limits file accesses)
|
Performance and scalability
|
Auto scale, High scalability, but have bit high latency.
Swapping the app out might slow down first request
|
Auto scale by creating EC2 instances
|
Can swap unused
processes out of JVM. Can load balance multiple tomcats in the same EC2
instance
|
Can lazy load services and other artifacts.
Auto scale (up down) by monitoring the load and creating new nodes. LB route the requests. |
Storage
|
Support Big Table and Hosted MySQL.
However, search support in BigTable case is limited.
e.g. Each query
can only have 100 results.
|
Support RDS (relational) , SimpleDB (NoSQL) or can run with your
own DB
|
Has managed MySQL databases and provide a console to
manage them
|
Support Cassandra as a Service, managed MySQL, and HDFS.
Cassandra and HDFS
support native multi-tenancy
|
Import/ export data
|
No (hard due to 30 sec limit)
|
Can write code to automate
|
Can write code to automate
|
Can write code to automate
|
Integration with others
|
Integrate well with other Google services
|
SQS, SES (email service), payment APIs
|
S3, SQS, SES etc.
|
With Google auth model and other WSO2 services. Also S3, SQS, SES etc.
|
Session handling
|
Store sessions to storage and handles them seamlessly
|
Only sticky sessions
|
Transparent session management
|
Only sticky sessions
|
Multi-tenancy
|
Yes
|
No
|
No
|
Yes
|
Also let me listout couple of key differentiators of WSO2 Stratos.
- All these offerings support Web App Hosting as a Service. WSO2 Stratos supports that, but provides much more. In addition to Web App Hosting, it supports hosting Axis2 based services, Mediation, and Workflow hosting as a Service. It is real SOA platform as a Service, the only one to does that.
- It let you move your Axis2 based Web Services (.aar files) and workflows to the Cloud (to WSO2 Statos Live) without any change to them. If you have some Axis2 based services, chances are that you can upload them to WSO2 Stratos and it will just work.
- WSO2 Stratos provides real multi-tenancy support. That is different tenants will think that he has his own server, while actually all are served from one Java Server. In other words, Isolation is done at Java level, not at Virtualization level. That means it can provide greater sharing and provide "Pay as you go" and "Pay for what you use" better than VM based model. Only AppEngine does that out of other three PaaS offering. More details are in following papers.
- A. Azeez, S. Perera, D. Gamage et al. (2010) Multi-tenant SOA Middleware for Cloud Computing, 458–465. In 2010 IEEE 3rd International Conference on Cloud Computing.
- A. Azeez and S. Perera et al., WSO2 Stratos: An Industrial Stack to Support Cloud Computing, IT: Methods and Applications of Informatics and Information Technology Journal, the special Issue on Cloud Computing, 2011.
- Milinda Pathirage, Srinath Perera, Sanjiva Weerawarana, Indika Kumara, A Multi-tenant Architecture for Business Process Execution, 9th International Conference on Web Services (ICWS), 2011
What can you do with WSO2 Platform?
We have explained WSO2 platform in many ways. Here I am
trying to take a more scenario driven approach.
1. Implementing Business Logic
If you are looking to build a SOA based architecture, first step is to implement your business logic as a set of services. Users can use WSO2 Application Server (AS) to do this, and AS enables users to implement a business logic using any of the following methods.- Using Java Code (WSO2 AS, any Axis2 .aar file works)
- Exposing data in a data source (e.g. Relational Database, CSV file etc., see WSO2 Data services Server)
- Using Business Rules (Supports Drools, see WSO2 BRS)
- Using a script language (e.g. Javascript, Phython, see WSO2 AS)
Once the service is created, users can secure them using
HTTPs or WS-Security. Furthermore, he can enable most of the WS-* support also
using WSO2 Application Server.
2. Mediating Messages
Often SOA deployment has third party services, legacy
applications and other types of frication that force the architecture to mediate
messages that flow between services. Users
can use WSO2 Enterprise Service Bus (ESB) to do this. Mediation can be of
several forms. (* there are others, and following are only some of them.)
- Message transformations
- Message editing
- Message filtering
- Route (e.g. load balancing, Pub/Sub)
- Support Enterprise Integration Patterns (EIP)
3. Creating the User Interface Layer
Often these services have user facing components. WSO2
platform provides two choices to do this.
- Java Web Applications (.war file) in WSO2 AS
- Google Gadget support in WSO2 Gadget Server
4. Composing Services
If the target application is complicated, specially when
service execution flows are dynamic and themselves contains business logic,
users may opt to compose those services to create business processes. WSO2
platform provides three choices for doing this.
- Create the process as a BPEL document, deploy, and run the business process using WSO2 Business Process Server
- Create the process as a mediation sequence using WSO2 ESB
- Create the process using java scripts with WSO2 Mashup Server
5. Scaling
If some of the services receive higher loads that exceed single node capacity, users would want to scale the system by clustering those bottleneck service. Exact details are usecase specific, but WSO2 platform support clustering through Axis2 clustering
Cross Cutting Concerns
In addition, WSO2 platform supports two cross cutting
aspects: namely governance and business actively monitoring.
1. Govern SOA
If a SOA deployment has many services and they goes through frequent
business logic and configurations changes, ad-hoc management of the
deployment could be very complex. WSO2 Governance registry let users
store all service WSDL in a single repository and manage their
configurations and lifecycles from a single point.
2. Monitoring the Business
Most businesses needs a detailed close to real time view of their
activities. WSO2 Business Activity Monitoring (BAM) let users define
trace points in activities (service executions). At the runtime, BAM
and monitors data collected at those trace points, and present them to
end users through Gadgets that aggregate and visualize them.
Wednesday, August 17, 2011
Tuesday, August 16, 2011
Useful data sets
Often you can find data in the CSV format, and then parsing and using it is pretty easy.
- DLPB catalog - http://kdl.cs.umass.edu/data/dblp/dblp-info.html - this is data about publications. About 900MB raw size.
- Google Fusion tables, http://www.google.com/fusiontables/Home
- this has several useful datasets as CSV. - Federal reserve economic data - http://research.stlouisfed.org/fred2/
- Amazon public datasets - aws.amazon.com/publicdatasets/
There are lot more. Following are some of them. If anyone knows list
giving a sizes of datasets and nature of datasets, please let me know.
- http://dvn.iq.harvard.edu/dvn/dv/cid
- http://www.thejanuarist.com/9-fascinating-datasets-available-online-for-free/
- http://bios.dfg.ca.gov/dataset_index.asp
- http://www.uic.edu/orgs/rin/dataset.html
- http://www.nas.nasa.gov/Resources/datasets.html
- http://www.datawrangling.com/some-datasets-available-on-the-web
- http://news.ycombinator.com/item?id=2165497
- https://bitly.com/bundles/hmason/1
- http://www.gutenberg.org/wiki/Gutenberg:Feeds
- http://www.quora.com/Data/Where-can-I-get-large-datasets-open-to-the-public
- http://getthedata.org/
- http://news.ycombinator.com/item?id=2165497
No comments:
Post a Comment