Pentaho Business Intelligence offers a variety of ways to make reports. A large variety. Or should I say a really very large variety. In this post I'm trying to list out some of the options as well as how to manage that diversity. Given the title of this post, I couldn't help but include the appropriate background music. So please, hit play, and read away.
8 tools to get the job done
As a starter, I've tried to make a little drawing of the different tools included (or includable) in the Pentaho BI server. As you can see, to make a simple report, you can already choose between 6 different reporting tools, namely:
- Pentaho Reports (made with Pentaho Report Designer)
- WAQR, the Web Ad-hoc Query Reporting tool, an online wizard to generate PRD reports.
- BIRT reports (made with the Eclipse-based reporting system BIRT)
- Pentaho Analyzer (LucidEra's ClearView product, acquired by Pentaho)
- JFree Report (the Ad-hoc reporting engine Pentaho offered before Analyzer)
- Saiku aka PAT.
And if that wasn't enough, I'm leaving out of the picture (literally) 2 tools for dashboarding, which could also easily be (ab)used to create a simple report.
- Pentaho Dashboards and
- the Community Dashboard Framework and the Community Dashboard Editor by WebDetails
OK. So if we also count the dashboarding tools, we have 8 different tools to make a "report". That is, by any Business Intelligence standard, a large choice. But there are more choices to make.
Endless ways to get to the data
If you decide to go for Pentaho Reporting, you will have to decide how to fetch your data. Again, there seems hardly any reason to lament about the number of choices at your disposal. Here is what you get.
- JDBC: This allows you to define your own JDBC connection (or use an existing one) and manually write a SQL query that will be executed against that connection.
- Metadata: This method will use the Pentaho Metadata Layer to access the data. You don't get to see the database, but you'll have to use the Metadata Query Builder to generate an MQL (Metadata Query Language) query. (Quick start guide here)
- PDI: You can use a Kettle transformation as a "data source" for your report. This of course opens up yet another endless series of options, as PDI can use even your grandmother as a data source provided there is a JDBC driver available. That opens up data sources such as: MS Access, MS Excel, flat files (fixed width and "something" separated), directory structures with file names, LDAP, Mondrian & OLAP, Salesforce data, SAP R/3 data, any SQL database with a JDBC driver, ... I guess we've made the point.
- OLAP: This option allows you to use an MDX query as a basis for your report.
- XML: How about using an XML file as a basis for your report and defining your query against it here?
- Advanced: Seemingly the people at Pentaho don't consider any of the above options 'advanced' enough, because under the advanced menu you'll find some more options to toy around with.
- custom JDBC connection
- scriptable data access: use beanshell, groovy, netrexx, javascript, xslt, jacl, jython
- (named) java method invocation
- external
Personally, I haven't gotten round to using all of these methods, and though they intrigue me, I also hope I'll never have to use all of them. That is just too much to get my head around.
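To make the idea a bit more tangible: whichever access method you pick, the report in the end just receives a table of named columns and rows. Below is a minimal sketch of that idea in Python. It is not Pentaho code. An in-memory SQLite database stands in for a JDBC connection, and the table `sales` and the functions `sql_data_source`, `scripted_data_source` and `render_report` are hypothetical names invented for this illustration.

```python
import sqlite3


def sql_data_source(connection, query):
    """Run a hand-written SQL query and return (column_names, rows)."""
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


def scripted_data_source():
    """Build the same kind of table directly in code (hypothetical values)."""
    columns = ["region", "revenue"]
    rows = [("North", 150), ("South", 80)]
    return columns, rows


def render_report(columns, rows):
    """Format any (columns, rows) table as plain text, one line per row."""
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)


# Hypothetical sample data; SQLite is only a stand-in for a JDBC connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100), ("South", 80), ("North", 50)],
)

sql_columns, sql_rows = sql_data_source(
    conn,
    "SELECT region, SUM(amount) AS revenue FROM sales "
    "GROUP BY region ORDER BY region",
)
script_columns, script_rows = scripted_data_source()

# The rendering code does not care where the table came from.
report_text = render_report(sql_columns, sql_rows)
print(report_text)
```

The point of the sketch is only that the layout side can stay the same while the data access method changes, which is why standardizing on a few access methods does not limit what the reports can look like.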
If you decide to go for Analyzer, JFreeReport or Saiku, your options are much more limited. Basically they all live on top of Pentaho Analysis Services, aka Mondrian, so your only choice here is to create an MDX query. The difference between creating an MDX query with the 3 aforementioned tools and with Pentaho Report Designer is that these tools have a nice GUI to create the MDX for you (drag and drop or point and click).
When using BIRT reporting, you get a series of options that are closer to PRD again. I haven't listed all the features out, but they are described here. The BIRT online demo also shows clearly how BIRT works.
Intermezzo
Depending on your reading speed, I believe your song must be finished by now, so maybe give this version a try.
The very best of
So why am I writing all of this out? Well, first of all, many customers don't understand the flexibility they have at their disposal when working with Pentaho. Often they have seen a demo or read some documentation, and they believe that what they have seen is THE way "it works" with Pentaho. Consequently they ignore the other 49 ways to make a report. So when the consultant comes in and shows the options, they usually say "ah, I didn't know that was possible" or "why didn't anyone tell me this could be done".
Once customers understand that 'THE way' doesn't exist, but that there are "50 ways to make your report", they automatically get to the next question: "what is the best way to make my report?" (Let's face it, people want to simplify things.) And here the consulting work gets tricky, as it is impossible to make the answer fully customer independent. One thing, however, is sure: using all the "50 ways" in the same environment is not recommended. Using all the different possibilities will require a large set of skills from your IT personnel and will hamper the maintenance work on those reports.
So, imho, a key element of implementing Pentaho Reporting at a customer is a clear study of which reporting tools and data access methods fit best in the customer's IT architecture, followed by a clear selection of which methods they should adopt as standards and which ones they should only use if the standard options don't work. Obviously this "customer strategy" should be aligned with the official Pentaho road map.
Good, bad or ugly?
Now the 64.000$ question is whether all this richness actually makes a good "Reporting strategy" from a customer point of view. You could say that the picture I made looks pretty ugly, or at the least very confusing. And in my experience, that is often how customers perceive it. Once they understand how many possibilities there are, they usually are profoundly confused.
What they ignore when making this assessment is that Pentaho is an open source initiative, which means that anyone can extend the capabilities of the BI server with new reporting possibilities. This happens and will continue to happen, because it is an immediate consequence of an open source environment. So customers must first of all understand that Pentaho solutions will allow them to do the same thing in more than one way.
Now again, is that good or bad? I believe I have given the answer to that question already. If a customer doesn't overcomplicate his usage of Pentaho technology, and adopts clear standards, then Pentaho offers reporting strategies that are simple to learn and implement, as well as easy to maintain. It is up to the customer to make the right choices.
And where does Pentaho stand in all this? As far as I can see, Pentaho should somehow make sure that all this richness of possibilities is explained to their customers and that they are guided in using the right set of possibilities. Over my career as a BI consultant I have seen many BI implementations. Other BI suites that allow for a high "diversity" of possible solutions, such as SAS BI or Microsoft BI, often resulted in Business Intelligence environments that became technologically hard to understand and impossible to maintain, solely because 50 different programmers with different opinions had come by. Are the vendors to blame for that? Not really. But still, some guidelines from the vendor would have helped those poor customers. My experience is that Pentaho delivers this kind of service to its customers. Pentaho's Support, which is extremely well appreciated by customers, typically includes advice that is crucial in a start-up phase, and that is a service that few BI vendors offer.
While writing this post, I realized I left out some reporting options. I'll quickly throw in what I remember now, but there might be some more stuff. If anyone reading this post wants to add something, please add it to the comments section. I would love to see this grow into a completely complete overview :-)
- You can create Excel based reports, using only Pentaho Data Integration and Excel Writer, see also my previous blog post.
- Similarly you can create PRD reports using Pentaho Data Integration and the PRD step, as demonstrated here.
- I didn't mention anything on embedding Pentaho Reports into other applications, such as the Confluence Pentaho reports.
Outro
To end this post, I wanted to include a little tribute to Mr. Steve Gadd, the man who wrote the incredible drum riff that kicks off Paul Simon's "50 ways to leave your lover". An extremely unusual drum riff, but sometimes the unusual methods deliver the best result. I guess Paul Simon was just lucky to have the right musician available who could deliver the best groove to fit his song, even if that was a very unconventional one. Which shows in the end that diversity is good.

4 comments:
Nice writeup :)
I actually have one customer for which excel reports were the best choice. People may go, "yuck...", but I dare them any time of the day to create reports as complex as those either with JFree or an OLAP tool (generating reports allowed) - I think it can't be done in any convincing way.
So what we did was, have kettle generate the datasets in a very plain and simple XML format (multiple datasets involved), then use XSLT to generate MS Excel files.
And oh, we let Excel calculate the aggregates based on the cell values. This allows them to continue manipulating the data once they have the Excel sheet (handy for what-if questions).
One correction: There are only 5 options, not 6.
JFreeReport is the old name of what is now known as "Pentaho Reporting". So up to version 0.8.11, Pentaho Reporting was called "JFreeReport". After that (with 3.5 being the first version number), the same codeline transformed into Pentaho Reporting.
WAQR sits on top of either JFreeReport (BI-Server 3.0 and earlier) or Pentaho Reporting (3.5 and later).
Jan,
Thanks for another great article from kJube! Believe it or not, you may have missed a couple. Along with the component for displaying BIRT reports, we have a component for displaying JasperReports, although I'm not sure how well maintained it is. The BIRT and Jasper components don't get much development resources from Pentaho but are maintained by the community. These components currently rely on Action Sequences (xactions) which is yet another building block that can be used for creating content and to script report building. In the OLAP world, there is JPivot, the slicer-dicer which will be replaced when Saiku is ready for prime time (soon I hope).
Just in case that's not enough, in the coming year, you can watch out for a new RESTful visualization framework that will make it easier to build web mash-ups and deploy new types of visualization components.
So, how does Pentaho plan to simplify all of this? We are continuing development of our Agile BI initiative which moves all data and metadata manipulations to a single development environment, spoon. We will continue adding the functionality currently found in Metadata Editor and Schema Workbench into one modeling environment. A single model that creates the schemas that Mondrian uses and the metadata that adhoc reporting requires. The idea is to define the data relationships one time, coupled to the ETL and be able to use that same definition everywhere. Schema Workbench and Metadata Editor will then be retired.
You mentioned using PDI to generate reports. PDI jobs have a tremendous overlap with Action Sequences and there is currently very little that you can do with xactions that can't be done with PDI. We will be adding capabilities to PDI (like the recent reporting component) in order to make it the batch processing engine of the platform and allow xactions to fade away.
Another area we are working on is to simplify access to all this data on the clients, both fat and thin. Part of Agile BI is a move toward a standard data definition, storage of that definition and a UI that is shared by all clients. Pentaho has been working with the Community Data Access team and will continue to move in the direction of a unified data source definition.
By making data access easier and similar across the Pentaho suite, people can concentrate on data presentation. This may actually lead to more ways to do things as developing and deploying new visualizations becomes easier. One thing is very clear from our customers and partners, there is no one size fits all when it comes to presentation. The options are:
- Traditional highly formatted reporting via Pentaho Report Designer. Also, the ability to deploy BIRT and Jasper Reports if companies already have an investment there. A crystal plugin would be awesome too hint, hint… PDI is used for batch reporting and report bursting.
- Interactive and end user reporting using metadata models and cubes that have been pre defined. Analyzer (EE only), Saiku and Web based Adhoc.
- Dash-boarding using CDF and Dashboard Designer (EE only). One thing we don't do well is explaining that the EE dashboards are actually built upon the CDF framework. They are not two different directions but are compatible technologies. Dashboard Designer is intended to make building dashboards easy for end users by hiding the complexities and capabilities of CDF. For highly customized dashboards, CDF is always available.
- Web 2.0 style Mash-ups. This is possible currently but is very difficult and not well documented. This year we will make it easier for web developers to include Pentaho content in their own web applications and make it easier to integrate more third party visualizations.
Bottom line for 2011: fewer tools, more integration and even more choice. In the meantime, we really need to help people choose which option is best for them, and articles like this one help a lot. We are building a great story but we also need to tell that story.
Doug Moran
Pentaho
BTW - I hit the 4096 char limit...
I guess I was too long winded. I'd like to hear from more of our community experts about which tools they use for solving which problems and come up with some helpful guidelines for evaluators.
Thanks for starting the conversation.
Doug