Wednesday, June 29, 2011

What is Java Content Repository

by Sunil Patil

JSR-170 defines itself as "a standard, implementation independent way to access content bi-directionally on a granular level within a content repository," and goes on to define a content repository as "a high-level information management system that is a superset of traditional data repositories, [which] implements 'content services' such as: author based versioning, full textual searching, fine grained access control, content categorization and content event monitoring."
The Java Content Repository API (JSR-170) is an attempt to standardize an API for accessing a content repository. If you're not familiar with content management systems (CMS) such as Documentum, Vignette, or FileNet, you may be wondering what a content repository is. Think of a content repository as a generic application "data store" that can be used for storing both text and binary data (images, word processor documents, PDFs, etc.). One key feature of a content repository is that you don't have to worry about how the data is actually stored: it could be kept in an RDBMS, in a filesystem, or as an XML document. In addition to storing and retrieving your data, most content repositories provide advanced services such as uniform access control, searching, versioning, observation, locking, and more.
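To make the idea of storage independence concrete, here is a minimal Java sketch under stated assumptions. The `ContentStore` interface and both backends are hypothetical illustrations and are not part of the javax.jcr API: an in-memory map stands in for a database-backed store, and a second backend writes the same content to the filesystem. The application method `roundTrip` stores text and binary data without knowing which backend it is talking to.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class StorageIndependenceSketch {

    // Hypothetical, storage-agnostic API for illustration only; this is NOT javax.jcr.
    interface ContentStore {
        void put(String path, byte[] data);
        byte[] get(String path); // returns null if nothing is stored at path
    }

    // Backend 1: an in-memory map, standing in for a database-backed store.
    static class InMemoryStore implements ContentStore {
        private final Map<String, byte[]> entries = new HashMap<>();

        public void put(String path, byte[] data) {
            entries.put(path, data.clone()); // defensive copy
        }

        public byte[] get(String path) {
            byte[] data = entries.get(path);
            return data == null ? null : data.clone();
        }
    }

    // Backend 2: maps each content path to a file below a root directory.
    static class FileSystemStore implements ContentStore {
        private final Path root;

        FileSystemStore(Path root) {
            this.root = root;
        }

        private Path resolve(String path) {
            return root.resolve(path.replaceFirst("^/", "")); // strip leading slash
        }

        public void put(String path, byte[] data) {
            try {
                Path file = resolve(path);
                Files.createDirectories(file.getParent());
                Files.write(file, data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public byte[] get(String path) {
            try {
                Path file = resolve(path);
                return Files.exists(file) ? Files.readAllBytes(file) : null;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Application code: depends only on ContentStore, so it runs unchanged on either backend.
    static String roundTrip(ContentStore store) {
        store.put("/news/item1.txt", "Hello, repository".getBytes(StandardCharsets.UTF_8));
        store.put("/images/logo.png", new byte[] {(byte) 0x89, 'P', 'N', 'G'}); // binary content
        String text = new String(store.get("/news/item1.txt"), StandardCharsets.UTF_8);
        return text + " / " + store.get("/images/logo.png").length + " bytes";
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(new InMemoryStore()));
        Path tmp = Files.createTempDirectory("content-store");
        System.out.println(roundTrip(new FileSystemStore(tmp)));
    }
}
```

Both calls in `main` print `Hello, repository / 4 bytes`; only the object passed to `roundTrip` decides where the bytes end up.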
CMSs from various vendors have been on the market for quite some time, and each of them ships its own content repository. The problem is that each CMS vendor provides its own API for interacting with the content repository shipped with that vendor's CMS. This is a problem for application developers, since they have to learn a particular vendor's API and potentially tie their code to one particular CMS implementation.
JSR-170 tries to solve this problem by standardizing the API used for connecting to any content repository. With JSR-170, you develop code using only the javax.jcr.* classes and interfaces. Such code should work with any JSR-170 compliant content repository.
This article is a step-by-step tutorial for newcomers to JSR-170. I've decided to use Apache Jackrabbit, the reference implementation of JSR-170, as the content repository. I'll start the discussion by talking a little more about what a content repository is and why its API needs to be standardized. After that I'll introduce JSR-170 by discussing the repository model it defines. Next I will talk about what Apache Jackrabbit is and how to build and configure it. Once Apache Jackrabbit is set up, I will develop a sample application that demonstrates the basic features of the JSR-170 API.

The Need for a Java Content Repository API

As the number of vendors offering proprietary content repositories has increased, the need for a common programmatic interface to these repositories has become apparent, and that's where JSR-170 comes into play. JSR-170 defines a programmatic interface for connecting to a content repository. You can think of JSR-170 as a JDBC-like API for content repositories, allowing you to develop your program independently of any particular content repository implementation. At runtime, you can configure this program to work either with a natively JSR-170 compliant content repository (e.g., Communique or Apache Jackrabbit) or, if your repository is not natively JSR-170 compliant (e.g., Documentum or Vignette), with a repository-specific JSR-170 driver that converts your JSR-170 method calls into repository-specific method calls.
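The driver idea works much like a JDBC driver, and a small adapter sketch shows the shape of it. Everything here is hypothetical and simplified for illustration: `Repository` plays the role of the standard API (it is not the real javax.jcr interface), `VendorClient` stands in for some vendor's proprietary API with its own method names and numeric document IDs, and `VendorDriver` translates standard calls into vendor calls.

```java
import java.util.HashMap;
import java.util.Map;

public class RepositoryDriverSketch {

    // Hypothetical vendor-neutral API the application is written against (NOT javax.jcr).
    interface Repository {
        void save(String path, String content);
        String read(String path); // returns null if nothing is stored at path
    }

    // A repository that implements the standard API natively.
    static class NativeRepository implements Repository {
        private final Map<String, String> nodes = new HashMap<>();

        public void save(String path, String content) { nodes.put(path, content); }

        public String read(String path) { return nodes.get(path); }
    }

    // Hypothetical proprietary API: different method names and a numeric ID scheme.
    static class VendorClient {
        private final Map<Integer, String> docs = new HashMap<>();
        private final Map<String, Integer> ids = new HashMap<>();
        private int nextId = 1;

        int checkIn(String name, String body) {
            int id = ids.computeIfAbsent(name, n -> nextId++); // reuse ID for known names
            docs.put(id, body);
            return id;
        }

        Integer lookupId(String name) { return ids.get(name); }

        String fetchDocument(int id) { return docs.get(id); }
    }

    // The "driver": converts standard calls into vendor-specific calls.
    static class VendorDriver implements Repository {
        private final VendorClient client;

        VendorDriver(VendorClient client) { this.client = client; }

        public void save(String path, String content) { client.checkIn(path, content); }

        public String read(String path) {
            Integer id = client.lookupId(path);
            return id == null ? null : client.fetchDocument(id);
        }
    }

    // Application logic: depends only on Repository, never on a vendor class.
    static String publish(Repository repo) {
        repo.save("/press/2011-06-29", "JSR-170 tutorial released");
        return repo.read("/press/2011-06-29");
    }

    public static void main(String[] args) {
        System.out.println(publish(new NativeRepository()));
        System.out.println(publish(new VendorDriver(new VendorClient())));
    }
}
```

The design choice mirrors the text: the application method `publish` is identical in both cases, and supporting a non-compliant repository is the driver's job, not the application's.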
CMSs are a fairly old concept. Common applications include a web content management system, used to manage the content (static HTML files and images) of a company's website, and a document management system, where a company stores scanned copies of all its sales orders. Many CMS vendors in the market provide this type of application. CMS vendors need a content repository as a backend, one that handles both structured and unstructured content efficiently. By "structured content," we mean content such as a news item or press release that is posted in the system and retrieved by queries (e.g., your application's front page might display the 3 latest press releases or the 10 latest news items). An example of unstructured content is a scanned copy of a sales order or an image that should be displayed on your corporate website.
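The difference between the two kinds of content shows up in what a query can do with them. The following plain-Java sketch is illustrative only (the class names and sample data are hypothetical, and a real repository would run such a query itself rather than in application code): a press release has typed fields, so a "front page" query can sort by publication date and keep the newest items, while a scanned order is an opaque byte array that can only be stored and fetched as a whole.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class StructuredContentSketch {

    // Structured content: named, typed fields a query can filter and sort on.
    static final class PressRelease {
        final String title;
        final LocalDate published;

        PressRelease(String title, LocalDate published) {
            this.title = title;
            this.published = published;
        }
    }

    // Unstructured content: an opaque blob the repository stores but cannot query inside.
    static final class ScannedOrder {
        final String fileName;
        final byte[] scan;

        ScannedOrder(String fileName, byte[] scan) {
            this.fileName = fileName;
            this.scan = scan;
        }
    }

    // "Front page" query: titles of the n most recently published press releases.
    static List<String> latestTitles(List<PressRelease> releases, int n) {
        return releases.stream()
                .sorted(Comparator.comparing((PressRelease p) -> p.published).reversed())
                .limit(n)
                .map(p -> p.title)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<PressRelease> releases = new ArrayList<>();
        releases.add(new PressRelease("Q1 results", LocalDate.of(2011, 4, 15)));
        releases.add(new PressRelease("New CMS launch", LocalDate.of(2011, 6, 1)));
        releases.add(new PressRelease("Partner program", LocalDate.of(2011, 2, 10)));
        releases.add(new PressRelease("JSR-170 support", LocalDate.of(2011, 6, 20)));
        System.out.println(latestTitles(releases, 3)); // [JSR-170 support, New CMS launch, Q1 results]

        // The scan can only be stored and fetched as a whole; its contents are not queryable fields.
        ScannedOrder order = new ScannedOrder("order-1042.tif", new byte[] {0x49, 0x49, 0x2A, 0x00});
        System.out.println(order.fileName + ": " + order.scan.length + " bytes");
    }
}
```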
To support their CMSs, vendors have developed their own content repositories that ship with those products, along with proprietary APIs for accessing them. As the number of CMS vendors grows, the need to standardize this API becomes apparent, and that is the gap JSR-170 fills.
Figure 1 shows the structure of an application developed using the JSR-170 API. At runtime, this application can work with any of content repositories 1, 2, or 3. Of these, only content repository 2 is natively JSR-170 compliant; the other two need JSR-170 drivers to interact with a JSR-170 application. Note also that your application does not have to worry about how the content is actually stored: content repository 1 may use an RDBMS as its underlying data store, whereas content repository 2 may use the filesystem, and some other repository could use a mix of both.
Figure 1. Structure of JSR-170 compliant application
The JSR-170 API offers different advantages to different stakeholders in the content repository space.
  • Developers do not have to spend time learning each vendor's repository-specific API. Instead, once comfortable with JSR-170, a developer should be able to work with any JSR-170 compliant content repository. In the past, developers had to choose between a CMS with great features but poor development tools, and one with great development tools but poor features. Now that the interface between the content repository and CMS applications is standardized, you can get the best of both worlds.
  • Corporations won't have to face the problem of vendor lock-in. In addition, many corporations have more than one CMS, either because different departments chose different CMSs in the past, or because an acquired company used a different CMS. In the past, corporations spent a lot of money getting these different systems to interact with each other. With JSR-170, they can be assured that the same application will work with all JSR-170 compliant CMSs.
  • In the past, CMS vendors were forced to develop and maintain their own content repository implementations, which meant a lot of infrastructure code. Now they can leave development of the content repository to another vendor and concentrate on their core competency: developing CMS applications.
