It is not a secret for us what the term OLAP (OnLine Analytical Processing) really means. The Internet contains a lot of information about this technology, about its implementation and values. The idea of simplifying the reporting and analytical part of the business with the help of OLAP sounds really brilliant. But is there any clear answer: why exactly should we use OLAP? Let’s try to figure it out.
The Business Case
Imagine the following case: you are a manager in the middle-size company; you have different tools for dealing with papers, plans, invoices and orders. It is really a lot of papers, a bunch of different information. And does anybody have an idea of how to get benefits of this in the future? Could this information be used in the future for gaining additional knowledge? How can we use historical data for the planning and forecasting?
If you just have a look at daily sales in your imaginary company, you soon realize that the sales are interrelated with many business dimensions, like: products, distribution channels, stores and even few more. A multidimensional view is an inherently representative of any business model. Thus, that is a task for OLAP, isn’t it? And the answer is “Yes”.
This task can be easily solved by using OLAP, the technology which is designed for the real time analytical information processing systems. These kinds of systems are built on the specially designed data warehouses. Traditionally, OLAP applications are founded on multidimensional modelling that represents data under the metaphor of a cube whose cells correspond to events that occurred in the business domain. Each event is quantified by a set of measures; each edge of the cube corresponds to a relevant dimension for analysis. With the usage of different systems based on this model the process of building reports/queries becomes extremely easy even for not experienced PC user. And this is the main benefit of OLAP.
Types of OLAP
As the data sources are operational systems that are everyday in use, their content is updated frequently. As a consequence, the content of multidimensional database has to be refreshed. According to this fact OLAP systems can be divided in several main categories by the way the data is stored there:
- Multidimensional (MOLAP)
- Multidimensional (MOLAP) systems transfer data from transaction sources in separated and optimized multidimensional array storage. MOLAP technology requires such operations like pre-computation and storage information in the cube (processing). It provides fast query performance connected with quite lengthy processing step.
- Relational (ROLAP)
- Relational (ROLAP) systems’ dimensions are stored in a relational database. ROLAP technology does not require processing step and data is read directly from relational database, but it requires the database that is designed for ROLAP use. It provides more scalable solution in handling large data volumes.
- Hybrid (HOLAP)
- Hybrid (HOLAP) systems are mixed type ones in which some large amount of data can be stored in relational tables.
Which kind of OLAP is suitable for solving a concrete analytical problem? It depends on many factors. MOLAP is more preferable when the speed of queries is the most crucial parameter. But it can hardly be used when the data warehouse, which is used for OLAP, is refreshed frequently. ROLAP is an ideal method if not permanent data warehouses are used, but it is slower than the usual MOLAP.
Thus the choice of OLAP server type should be connected with certain task.
OLAP in Open Source Domain?
OLAP is often used by large and middle size companies and maybe that is the main reason why there aren’t so much free open source OLAP servers available. That is the main concern when you will try to use OLAP for free.
Nowadays on the modern market there are several well-known free OLAP servers available:
- Mondrian (Pentaho Analysis Services)
- A Mondrian is a ROLAP type OLAP system consists of the presentation layer, the dimensional layer, the star layer, and the storage layer.
- Palo OLAP
- Palo OLAP Server is an in-memory MOLAP type server. Palo OLAP Server supports real-time modelling. This means users can modify hierarchies and even create new cubes on the fly in the familiar Excel environment. The database returns new aggregated values immediately.
- The icCube Server is an in-memory MOLAP type server. icCube solution has a web based interface for running the server, building the cube and querying the system.
All of the listed reporting servers are written in Java programming language. That is why it will be easy to develop applications for OLAP if you are familiar with Java, because almost all listed servers have Java connectors and APIs.
But are there any limitations if we try these products in practice? Unfortunately, yes. Both Palo and icCube have a lot of limitations in their “community” versions. And deep analysis of the available functionality of each certain free version shows that only Mondrian seems to be ready for productive usage.
The OLAP System Implementation
Let’s imagine that we don’t have any inconveniences with the limitations on the open source OLAP software or you use professional commercial business suites like SAP BI/BOJB, Microsoft Analysis services or something else. So what will be the sequence of the development cycle in this case? (The following steps are relevant for Mondrian OLAP server implementation, but in case of another server the sequence should be almost the same).
- On the early stage the whole structure of the data stored in data warehouse should be determined; models, facts and measures should be described.
- Then it will be necessary to extract the preliminary scenarios from the analysed data models (it will help to clarify the models, facts and dimensions).
- On the next stage the definition of the multidimensional representation takes place. The OLAP cube structures can be described in server native XML structures (i.e. Mondrian cube XML schema, which contains the full description of multidimensional database). Depends from the OLAP system you are going to use, there could be different techniques how these files can be edited, for instance, by editing plain text files, by using integrated cube builders, by using external software for designing OLAP cubes and etc.
- Now it is a good time to deploy/start your web server, install additional querying/reporting tools and to start writing MDX queries according scenarios which were created earlier.
Seems quite easy isn’t it? If we talk about open source OLAP engines the main problems at this point can be caused by application settings, installation of the different Java libraries and settings implementation for Java Web servers.
Now you can see that term OLAP doesn’t sound like a part of Oracle, SAP or Microsoft. OLAP is a technology that is available for improving your business right now! By historical data analysis it is possible to plan and forecast, define gaps and new strategy that can help to increase sales or production. The “community” versions of the OLAP servers can help to build low-cost analytical systems based on open standards, protocols and products.