Pentaho

Pentaho® is a prominent open source reporting project with a great vision of a single, comprehensive BI suite that covers the gamut from reporting to data mining. Note, though, that reporting is only a small part of this soup-to-nuts, open source Business Intelligence Suite. The Pentaho BI Suite Community Edition (CE) encompasses several open source projects: the Pentaho BI Platform & Server, Pentaho Reporting, Pentaho Analysis Services (Mondrian), Pentaho Data Integration (Kettle), and Pentaho Data Mining (Weka).

This article focuses on the reporting components, primarily Pentaho Reporting, along with the reporting functionality included in the Pentaho BI Platform & Server and Pentaho Analysis Services.

Changes Since Our Last Review

Since our last look in 2010, Pentaho’s open source reporting offerings have improved significantly, more so than the open source reporting products of either Jaspersoft or BIRT.

Between version 3.5 of the Pentaho BI Suite (2010) and version 4.1 (2012), we noticed the following changes:

  • The stability of the Report Designer has greatly improved. In 2010, much of our frustration with the Pentaho Report Designer stemmed from the tool’s instability and poor performance. This time around, however, we experienced no issues: no crashes, and report generation performance was on par with the report designers of BIRT and JasperReports.
  • The report creation wizard in the Pentaho Report Designer has been greatly improved and is now quite useful for both novice and experienced report developers. You can take a report design created by the wizard, refine it with the Pentaho Report Designer’s regular editing functionality, and then go back into the wizard to make additional changes, such as adding columns. This gives even experienced report authors a leg up: you can get started quickly with the wizard, skipping some of the tedious up-front work, and then add to and refine the design later. While the wizard cannot create very complex reports and is limited to grouped listing and detail reports, it is far better than Jaspersoft’s wizard (which does not help with creating aggregates for group sections) and BIRT, which has no report wizard at all.
  • The Pentaho Report Designer now makes it very easy to set up connectivity to a wide variety of data sources, putting it on par with Jaspersoft and ahead of BIRT. It provides guidance on connecting to nearly 40 databases via JDBC (see Figure 1).

    Figure 1 - Pentaho Connectivity

    Of note, however, open source Pentaho cannot connect to Hadoop Hive directly, unlike BIRT and Jaspersoft. Instead, you need to access Hive through Pentaho Data Integration, either via the Pentaho BI Server or by separately installing and configuring the Pentaho Data Integration (Kettle) engine. Direct access to Hive, without Pentaho Data Integration, is available only in Pentaho’s commercial products.

  • Pentaho has changed the license of some portions of the BI Suite from the GNU Lesser General Public License (LGPL) to the Apache License. This makes it easier to incorporate Pentaho into commercial products. As of this writing, the change mainly affects Pentaho Data Integration and some analysis components, not the reporting components. If Pentaho extends this trend to its reporting components, we would consider that a favorable development.
  • Although we do not cover the BI Servers in much detail in this write-up, in our 2010 review we mentioned how much we liked the Ad-Hoc Query and Reporting functionality built into the Pentaho BI Server, as Pentaho was the only open source suite to provide it. That changed in 2012: both the OLAP Viewer (JPivot) and the “Web Ad-Hoc Query and Reporting” tool in the Pentaho BI Server have reached the end of the line. You can still use them, but nag messages built into the Pentaho User Console put users and developers on notice that these components will not be enhanced further and may be removed from the server entirely in future releases.

Also notable is what hasn’t changed with Pentaho:

  • Charting. Creating charts with Pentaho is harder to learn and less flexible than with either BIRT or Jaspersoft.
  • Cross-tabs. Two years later, Pentaho’s crosstab builder is still considered experimental, and its functionality still lags significantly behind both BIRT and Jaspersoft. This effectively rules out Pentaho for the large share of reports that are crosstab-based.

Components

As of this writing, the Pentaho BI Suite Community Edition is at version 4.1, but most of the Pentaho Reporting components within the suite are only at version 3.8.3. This version discrepancy can be a bit confusing.

The open-source Pentaho BI Suite Community Edition (CE) includes the following components for reporting: 

  1. Pentaho Report Designer – a WYSIWYG tool that lets you create reports through a graphical user interface, as opposed to creating reports programmatically or by directly creating and manipulating XML. These reports can then be run by the Pentaho Reporting Engine Classic or the Pentaho BI Server. The Pentaho Report Designer is a stand-alone, desktop-installed client tool; it is not available as an Eclipse or NetBeans plug-in.
  2. Pentaho Reporting Engine Classic – formerly known as “JFree Report”, this is a collection of Java classes and APIs that execute Pentaho’s XML-based reports. The engine runs report designs against data sources and renders report output in HTML, PDF, Excel, and other formats. You can embed the Pentaho Reporting Engine inside your own Java applications. You don’t need to embed it yourself if you use the BI Server, which already includes it.
  3. Pentaho Reporting SDK – the Pentaho Reporting Engine Classic, plus the documentation and supporting libraries that developers need to embed the engine in their applications.
  4. Pentaho BI Server – the BI Server is a J2EE application that provides the infrastructure for multiple users to run reports and OLAP cubes through a web-based user interface. At its core are the Classic Engine and the Mondrian ROLAP Engine (which run reports and OLAP cubes, respectively), plus a host of server capabilities including authentication, user management, logging, email notification, server APIs, and report scheduling. The BI Server also lets reports and analytic cubes access data and metadata through the ETL functionality of Pentaho Data Integration (Kettle). The BI Server includes two web-based user interfaces:
    • Pentaho User Console – end users can log in, browse reports, run them, view report results in HTML or PDF, and download report results in other formats. For the time being, users can also create basic ad-hoc reports and conduct some OLAP analysis, although this functionality is likely to be removed in future releases.
    • Pentaho Administrator Console – administrators and developers can deploy reports, manage users, set up security access privileges, and deploy workflows.

    The open source Community Edition of the BI Server does not include the Pentaho Dashboard Designer, Pentaho Analyzer, or Pentaho Enterprise Console. That’s a shame, because we are big fans of Pentaho Analyzer as a modern, web-based OLAP viewer.

  5. Pentaho Design Studio – as of version 4.0.0, the Pentaho Design Studio is an Eclipse plug-in that lets you create XML-based Action Sequence documents (XACTION files). Think of Action Sequence documents as lists of instructions that you deploy to the BI Server to control its behavior. They can run data queries, prompt BI Server users for input parameters, run one or more reports in succession, and execute JavaScript. For example, an Action Sequence document can instruct the server to prompt the user for parameter values, run a report with those values, and then send an email notification to specified users. Note that the Pentaho Design Studio is not used to create the reports themselves; use the Pentaho Report Designer for that. The Design Studio is also not required if you just want to deploy a report to the BI Server and let your users run it, schedule it, and immediately view the results. Action Sequences are only needed for more complicated workflows.

    In general, it seems that Pentaho is working hard to eliminate the need for XACTION files in most reporting use cases, such as using report parameters to change the report’s query. You will very likely not need the Pentaho Design Studio at all, but it is still available if your application logic goes beyond what can be handled within the report design itself.

General Impressions

We found Pentaho to be the easiest of the three tools to learn for creating basic listing reports and grouped listing reports with aggregations. Its user interface is attractive, sensibly laid out, and not cluttered with sophisticated, less commonly used functionality, and the tool performed well.

Pentaho’s stated goal is to provide an all-encompassing, integrated BI suite, covering the gamut from simple reporting to OLAP analysis to data transformation to data mining. This broad vision might explain why its reporting functionality is not as deep as that of BIRT or Jaspersoft. Functionality that report developers take for granted in BIRT and JasperReports, such as side-by-side report components, cross-tabs, and robust charting, is not as fully developed in Pentaho. Unfortunately, this means that with Pentaho Reporting alone, creating more complex reports can be frustrating and sometimes impossible.

Pentaho Report Designer

The Pentaho Report Designer is a WYSIWYG tool that lets you create reports through a graphical user interface, as opposed to directly creating and manipulating XML. These reports can then be run by the Classic Engine or the Pentaho BI Server.

In general, our impression is that the usability and stability of Pentaho’s Report Designer have improved significantly over the past few years. For grouped listing reports of simple to moderate complexity, the Pentaho Report Designer’s usability now surpasses that of both Jaspersoft iReport and BIRT.

Figure 2 - Pentaho Report Designer

Pentaho Report Designer has much of the same functionality as the other report designers, as demonstrated by our Feature Comparison Grid. However, Pentaho Report Designer is different from the other report designers in a few major ways:

  1. Like JasperReports iReport, and unlike BIRT, Pentaho uses “pixel positioning” as its central approach, and at its core it emphasizes paginated reports rather than dynamic web layouts.
  2. Pentaho relies on sub-reports to support reports with multiple queries, multiple group sections, and multiple data sources. This is the same approach as JasperReports and differs from BIRT.
  3. Charting is more difficult and less full-featured with Pentaho than with either JasperReports or BIRT.
  4. Pentaho includes a nice, re-entrant report wizard for creating grouped listing reports. We found ourselves using it often to eliminate some of the tedious work involved in the initial layout of a report.

Pixel Positioning

The Pentaho Report Designer belongs to the “pixel positioning” school of report design. As with Jasper (and unlike BIRT), users specify precisely where each report element is displayed. This gives users fine-grained control over the look of a report, but it also limits the report’s ability to adapt to different-sized displays. For example, if you design a report to look good when printed on an 8.5”x11” sheet of paper, the report will only be as wide as that sheet of paper, even when displayed on a widescreen monitor with much more horizontal screen real estate.

Sub-Reports

Like Jasper, Pentaho depends heavily on sub-reports. If you need multiple grouping sections, multiple data sources, side-by-side report components, or the results of a query reused in a different section of a report, you need sub-reports.

While sub-reports are great for re-using report pieces across many different reports, requiring sub-reports for the above use cases adds unnecessary difficulty and complexity to the report design process:

  1. You need to gracefully pass parameters, and sometimes query data, between the master report and the sub-report (and sub-sub-report, etc.).
  2. Report developers need to manually manage the dependencies between the master report and sub-report files.
  3. Too many sub-reports can result in poor performance, because each sub-report opens its own thread and runs its own queries. For example, if you place a sub-report within a group section that expands into 70 different groups, the sub-report will initialize and run 70 times.
  4. Sub-reports need to be precisely designed so that their size fits exactly into the space provided by the master report.
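
To make the cost in item 3 concrete, here is a toy Java model of a master report whose group header contains a sub-report. It is a sketch under stated assumptions, not the Pentaho Reporting API: the class and method names (SubReportCost, render, runSubReportQuery, sampleRows) are hypothetical, rows are assumed to arrive already sorted by group key, and it ignores threading, caching, or any optimization the real Classic Engine may apply. It simply counts one master query plus one sub-report query per group instance.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical cost model (not Pentaho's API): a master report whose group
 * header contains a sub-report. Rows are assumed sorted by group key.
 */
public class SubReportCost {

    private static int subReportQueries = 0;

    // Hypothetical stand-in for the sub-report's own data query.
    private static void runSubReportQuery(String groupKey) {
        subReportQueries++; // a real engine would execute the sub-report's query here
    }

    /** Returns the total query count: one master query plus one per sub-report run. */
    static int render(List<String> groupKeys) {
        subReportQueries = 0;
        String previous = null;
        for (String key : groupKeys) {
            if (!key.equals(previous)) { // a new group instance starts
                runSubReportQuery(key);  // the sub-report in its header runs again
                previous = key;
            }
        }
        return 1 + subReportQueries;
    }

    /** Builds sorted sample rows: `groups` groups of `rowsPerGroup` rows each. */
    static List<String> sampleRows(int groups, int rowsPerGroup) {
        List<String> rows = new ArrayList<>();
        for (int g = 0; g < groups; g++) {
            for (int r = 0; r < rowsPerGroup; r++) {
                rows.add("group-" + g);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // The scenario from the text: a group section that expands into 70 groups.
        System.out.println(render(sampleRows(70, 3))); // 71: 1 master query + 70 sub-report runs
    }
}
```

In this model the count grows linearly with the number of group instances, so the same layout over 1,000 groups would issue 1,001 queries.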

 

Report Wizard

The Pentaho Report Designer has the best built-in report design “wizard” of the three tools. It is great for getting beginning users started, letting them quickly create regular listing-style reports with up to four levels of grouping while skipping the tedious work of manually placing and formatting each individual report control (see Figure 3). Importantly, the wizard is re-entrant: you can make further changes by reopening it.

Figure 3 - Pentaho Report Wizard

Charting

In Pentaho Report Designer, creating a chart means filling in property values in a very large dialog box (see Figure 4). Unlike JasperReports or BIRT, there is no wizard to take you through the process. We find it interesting that Pentaho provides such an excellent report wizard but no chart wizard, while BIRT provides a great chart wizard but no report wizard. JasperReports iReport has middle-of-the-road wizards for both reports and charts.

This is sufficient to create many types of charts, but it does not offer as many levers as the other tools for customizing the chart’s contents, look, and behavior. Setting up the chart’s category, series, and values is difficult: most of the time you cannot pick the fields from a list, and instead have to type them into a pop-up window that gives no guidance on what to type. And although the chart dialog seems to imply that you can change the chart type simply by clicking the appropriate button along the top, doing so often discards the values you already entered for category, series, and so on. This can make the dialog very frustrating to use.

Adding to the usability issues, there is no chart preview: you cannot see the result until you leave the Chart Dialog and actually run the report.

Figure 4 - Pentaho Chart Designer

Strengths and Weaknesses of the Pentaho Report Designer

Below are some of the strengths and weaknesses of the Pentaho Report Designer, compared to the report designers of BIRT and JasperReports. Some of these are really due to the behavior of the underlying Classic Engine rather than the Pentaho Report Designer itself; we include them here because the designer is where report developers are most likely to encounter them.

Strengths

  • Pentaho’s Report Designer is easily the most visually attractive of the three design tools. Pentaho has clearly put a greater emphasis on user experience, usability, and look and feel than the other projects. When you use the Pentaho tools, they just feel nice.
  • The Pentaho Report Designer has the best built-in report design wizard.
  • With Pentaho, you can create “row-banded” reports, with alternating colors for each report row, by simply checking a box. Much easier than either BIRT or Jasper!
  • Pentaho does not require that reports be compiled prior to running (unlike Jasper but like BIRT).
  • Pentaho reports are in XML format, and thus can be effectively put under revision control.
  • Pentaho’s Excel-like expression language is easier for non-coders to understand, whereas BIRT expressions require knowledge of JavaScript and JasperReports expressions are typically written in Java or Groovy.

Weaknesses

  • Cross-tabs are still “experimental”, as they have been for two years now. As an experimental feature, Pentaho cross-tabs are not nearly as full-featured, customizable, or stable as cross-tabs in BIRT or JasperReports; formatting them can be especially frustrating. To use cross-tabs in the Pentaho Report Designer, you must first enable experimental features (Edit > Preferences > General).
  • Creating charts is difficult (see the Charting section above). The process has usability issues, and the Pentaho Report Designer does not offer as many levers as the other tools for customizing a chart’s contents, look, and behavior.
  • Pentaho requires that the report query do the “heavy lifting” for grouping, filtering, sorting and aggregates. If the data does not arrive in the report in the proper way, Pentaho has less ability to further manipulate the data than BIRT. The report developer is responsible for ensuring that groups in the report design are in the same order as the data groups returned by the query.
  • Pentaho does not support “newspaper layouts” with multiple columns (neither does BIRT, but Jasper does), and does not yet support vertical text.
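
The ordering requirement in the third weakness follows from how a banded engine that does not sort its input detects groups: it starts a new group whenever the group field’s value differs from the previous row’s. The Java sketch below models that assumption to show what happens when the query returns rows out of order. GroupBreaks and groupHeaders are hypothetical names for illustration, not Pentaho classes, and the model is a simplification of whatever the Classic Engine actually does internally.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of break-on-change grouping: the engine does not sort;
 * it opens a new group whenever the key differs from the previous row's key.
 */
public class GroupBreaks {

    /** Returns the group headers such an engine would print, in order. */
    static List<String> groupHeaders(List<String> rowKeys) {
        List<String> headers = new ArrayList<>();
        String previous = null;
        for (String key : rowKeys) {
            if (!key.equals(previous)) { // key changed: close old group, open a new one
                headers.add(key);
                previous = key;
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        // Query without ORDER BY: rows for each region are interleaved.
        List<String> unsorted = List.of("East", "West", "East", "West");
        // Query with ORDER BY on the region field: rows for each region are contiguous.
        List<String> sorted = List.of("East", "East", "West", "West");

        System.out.println(groupHeaders(unsorted)); // [East, West, East, West]
        System.out.println(groupHeaders(sorted));   // [East, West]
    }
}
```

With the unsorted input, “East” shows up as two separate groups, so under this model any group-level subtotal would be split across them; having the query sort on the group field avoids that.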

 

Conclusion

Over the past two years, Pentaho has shown the most improvement of the open source reporting tools we evaluated. We believe that Pentaho is an excellent choice for reports of simple to moderate complexity that don’t require crosstabs or charts. We still believe that BIRT and Jaspersoft are better for more complex reports. Pentaho might also be a good choice if printed reports are an important requirement.

One of the most compelling reasons to choose Pentaho, however, is how well Pentaho Reporting integrates with the rest of the Pentaho BI Suite. While we do not cover these other Pentaho products in this write-up, they are worth a look if you have requirements around data mining, data integration, or OLAP analysis, or simply need a report server to deploy and run reports through a web-based user interface.

Pentaho is a registered trademark of Pentaho, Inc.