Pentaho Reports Review
Pentaho® is a prominent open source reporting project with a vision of a single, comprehensive suite that covers the full business intelligence (BI) life cycle, including ETL, reporting, analytics, visualization, and data mining. Reporting is only one portion of Pentaho’s overall vision of providing a soup-to-nuts, open source Business Intelligence Suite. The Pentaho BI Suite Community Edition (CE) encompasses the following open source projects: the Pentaho BI Platform & Server, Pentaho Reporting, Pentaho Analysis Services (Mondrian), Pentaho Data Integration (Kettle), and Pentaho Data Mining (Weka).
Hitachi Data Systems (HDS) announced an agreement to acquire Pentaho in February 2015. At this time, very little information is available about HDS's plans for Pentaho, and we do not anticipate any until after the deal has closed. Given HDS's historical commitment to open source projects, we expect little change to the open source nature of Pentaho.
Since our last review in 2012, Pentaho has added two distinct open source projects. Community Tools (CTools) delivers a wide range of extensions for the server platform, including dashboarding. Sparkl is a project that allows developers to build new plugins for the Pentaho BI Suite.
Pentaho supports the Pentaho Marketplace which provides a place for developers to share their Pentaho plugins with Pentaho system administrators. This has allowed companies such as Meteorite BI to deploy their Saiku Analytics plugin (available in community and enterprise editions) to the Pentaho community.
This article focuses on the reporting components, primarily Pentaho Reporting, along with the reporting functionality included in the Pentaho BI Platform & Server and Pentaho Analysis Services.
- Changes Since Our Last Review
- General Impressions
- Pentaho Report Designer
- Strengths & Weaknesses
Changes Since Our Last Review
Since our last look in 2012, the Pentaho Report Designer (PRD) has seen steady progress, although the pace has slowed compared with the period from 2010 to 2012. Jaspersoft Studio has made more significant advancements during this period, but PRD has changed more than BIRT.
Between version 4.1 in 2012 and version 5.2 of the Pentaho BI Suite in 2015, here are some of the changes we noticed:
- Performance improvements for report generation and report export
- Added external CSS style sheets, including default styles
- Moved version control to Git
- Added query re-use in sub-reports
- Added support for opening multiple files in PRD
- Added a thermometer chart and upgraded the JFreeChart version
- Improved some formatting options
- Added support for bidirectional (BiDi) text
- Pentaho Reporting components synchronized the version system to match the Pentaho BI Server version numbers
- Improvements to JNDI connections
- Improved integration with the CTools project
- The Pentaho Report Designer now makes it very easy to set up connectivity to a wide variety of data sources, putting it ahead of Jaspersoft and BIRT. It provides guidance on connecting to 47 databases via JDBC (see screenshot) including added support for MongoDB and Impala.
Figure 1 - Pentaho Connectivity
The open-source Pentaho BI Suite Community Edition (CE) includes the following components for reporting:
- Pentaho Report Designer – a WYSIWYG tool that lets you create reports using a graphical user interface, as opposed to creating reports programmatically or by directly creating and manipulating XML. These reports can then be run by the Pentaho Reporting Engine Classic or the Pentaho BI Server. The Pentaho Report Designer is a stand-alone, desktop-installed client tool, and is not available as an Eclipse or NetBeans plug-in.
- Pentaho Reporting Engine Classic – formerly known as “JFree Report”, this is a collection of Java classes and APIs that execute Pentaho’s XML-based reports. The Pentaho Reporting Engine runs report designs against data sources, and renders report output in HTML, PDF, Excel, and other output formats. You can embed the Pentaho Reporting Engine inside your Java applications. You don’t need the Pentaho Reporting Engine if you use the BI Server.
- Pentaho Data Integration (Kettle) - Kettle is a graphical data integration tool that allows developers to build Jobs and Transformations that Extract, Transform, and Load (ETL) data from a wide variety of sources.
- Pentaho Reporting SDK – The SDK is the Pentaho Reporting Engine Classic, plus the documentation and supporting libraries that developers need to embed the Pentaho Reporting Engine Classic in their applications.
- Pentaho BI Server – the BI Server is a J2EE application that provides an infrastructure for multiple users to run reports and OLAP cubes through a web-based user interface. At the core of the BI Server are the Classic Engine and the Mondrian ROLAP Engine (which run the reports and OLAP cubes respectively), plus a host of server capabilities including authentication, user management, logging, email notification, server APIs, and report scheduling. The BI Server also provides the infrastructure for reports and analytic cubes to access data and metadata via the Pentaho Data Integration’s ETL functionality (Kettle).
- Pentaho User Console – end-users can log in, browse reports, run them, view report results in HTML or PDF, and download report results in other formats. For the time being, users can also create basic ad-hoc reports and conduct some OLAP analysis; however, this functionality is likely to be removed in future releases.
- Pentaho Administrator Console – administrators and developers can deploy reports, manage users, set up security access privileges, and deploy workflows.
We found Pentaho to be the easiest tool to learn for creating basic listing reports and grouped listing reports with aggregations. Its UI is not cluttered with sophisticated, less commonly used functionality, and the tool performed well. In short, the user interface is attractive and its functionality is sensibly laid out.
Pentaho’s stated goal is to provide a comprehensive solution for Data Integration and Business Analytics. This includes solutions for extract, transform, and load (ETL), basic reporting, data analytics, data exploration, data visualization, and data mining. This broad company vision might explain why the reporting functionality is not as deep as that of BIRT or Jaspersoft. Functionality that report developers take for granted in BIRT and JasperReports – such as side-by-side report components, cross-tabs, and robust charting – is not as fully developed in Pentaho. Unfortunately, this means that when using only Pentaho Reporting, it can be more difficult to create complex reports.
Pentaho Report Designer
The Pentaho Report Designer (PRD) is a WYSIWYG tool that lets you create reports using a graphical user interface, as opposed to creating reports by directly creating and manipulating XML. These reports can then be run by the Classic Engine or the Pentaho BI Server.
In general, our impression is that the usability and stability of PRD continue to improve. We feel that for simple to moderate reports, PRD's usability is on par with Jaspersoft Studio and BIRT for most tasks.
Figure 3 - Pentaho Report Designer
PRD has much of the same functionality as the other report designers, as demonstrated by our Feature Comparison Grid. However, Pentaho Report Designer is different from the other report designers in a few major ways:
- Like Jaspersoft Studio, but unlike BIRT, Pentaho uses “pixel positioning” as its central approach, and at its core it emphasizes paginated reports, as opposed to dynamic web layouts.
- Pentaho relies on sub-reports to support reports with multiple queries, multiple group sections, and multiple data sources. This is different from BIRT, but the same as JasperReports.
Things We Like
- Pentaho includes a nice, re-entrant report wizard for creating grouped listing reports. We found ourselves using it often to eliminate some of the tedious work involved in the initial layout of a report.
- PRD can use Pentaho Data Integration (Kettle) as a source of data. Kettle is a Swiss Army Knife ETL tool that allows developers to acquire and process data using a combination of pre-built graphical components and code as required. PRD has the native ability to retrieve data from a particular step within any Kettle transformation. Using Kettle with PRD allows developers to specialize in either data acquisition or presentation, which we feel is a significant advantage over both BIRT and Jasper.
Areas for Improvement
- Charting is more difficult and less full-featured with Pentaho than with either JasperReports or BIRT.
- Crosstabs continue to be marked as experimental. Our review of the Pentaho JIRA system indicated that a substantial amount of time was spent on crosstabs; unfortunately, they have not reached a fully functional state in the three years since our last review. This may be partially explained by the presence of different tools within the Pentaho stack that can duplicate crosstab functionality, meaning that it is not strictly required within reporting.
- The PRD property editor lags behind the property editors from both BIRT and Jaspersoft Studio. Both of those products provide dialog screens that give guidance and structure to the property settings. PRD exposes properties through a name/value pair list that allows for sorting and hierarchy, but our experience indicates that novice users find these lists more difficult to use than dialogs.
Figure 2 - Pentaho Property Editor
The Pentaho Report Designer is in the “pixel positioning” school of report design. Like Jasper (and unlike BIRT), users specify precisely where each report element is to be displayed. This gives users fine-grained control over the look of a report, but also limits the report’s ability to adapt to different-sized displays. For example, if you want a report to look good when printed on an 8.5”x11” sheet of paper, then the report will only be as wide as a sheet of paper, even when displayed on a widescreen monitor with much more horizontal screen real estate.
Like Jasper, Pentaho is very dependent on sub-reports. If you want to use multiple grouping sections, multiple data sources, have side-by-side report components, or re-use the results of a query within a different section of a report, you need to use sub-reports.
While sub-reports are great for re-using report pieces across many different reports, requiring sub-reports for the above use cases adds unnecessary difficulty and complexity to the report design process:
- You need to gracefully pass parameters, and sometimes query data, between the master report and sub-report (and sub-sub-report, etc.).
- Report developers need to manually manage the dependency between the master report and sub-report files.
- Too many sub-reports can result in poor performance because each sub-report opens its own thread and runs its own queries. So, for example, if you have a sub-report within a group section that expands into 70 different groups, then the sub-report will initialize and run 70 times.
- Sub-reports need to be precisely designed so that they fit exactly into the space provided by the master report.
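To make the performance point concrete, here is a back-of-the-envelope sketch of how query executions multiply when sub-reports are nested inside group sections. It is only a counting model under our own assumptions (one query per report instance, and a hypothetical `SubReportCost` class of our invention); it does not call or describe any Pentaho API.

```java
/** Rough counting model: one query per report instance (an assumption, not Pentaho code). */
final class SubReportCost {

    /**
     * Estimates total query executions for a chain of nested sub-reports.
     * Each argument is the number of times the sub-report at that depth is
     * instantiated per run of its parent (for example, once per group).
     */
    static long totalQueries(int... instancesPerLevel) {
        long total = 1;        // the master report's own query
        long runsAtLevel = 1;  // how many times the current parent level runs
        for (int instances : instancesPerLevel) {
            runsAtLevel *= instances; // every parent run spawns this many child runs
            total += runsAtLevel;
        }
        return total;
    }

    public static void main(String[] args) {
        // One sub-report inside a group section that expands into 70 groups.
        System.out.println(totalQueries(70));    // prints 71
        // The same sub-report containing a sub-sub-report that runs 5 times each.
        System.out.println(totalQueries(70, 5)); // prints 421
    }
}
```

Even in this simplified model, adding a second level of nesting takes the report from 71 queries to 421, which is why we suggest keeping an eye on how deep sub-reports go.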
The Pentaho Report Designer has the best built-in report design “wizard.” It’s great for getting beginning users started, allowing them to quickly create regular listing-style reports with up to four levels of grouping while skipping the tedious work of manually placing and formatting each individual report control (see screenshots). An important feature of this wizard is that it is re-entrant: you can make changes by pulling up the wizard again.
Figure 3 - Pentaho Report Wizard
In Pentaho Report Designer, the process of creating a chart involves providing property values in a very large dialog box (see screenshot below). There is no wizard to take you through the process, unlike JasperReports or BIRT. We find it interesting that Pentaho provides such an excellent report wizard but no charting wizard, and BIRT provides a great chart wizard but no report wizard. Jaspersoft Studio has middle-of-the-road wizards for both reports and charts.
The chart dialog is sufficient to create many types of charts, but does not offer as many levers to customize the chart’s contents, look, and behavior as the other tools. Setting up the chart’s category, series, and values is difficult, as most of the time you cannot use pick-lists to choose the fields to use, and instead you have to type them into a pop-up window that doesn’t provide any guidance as to what you are supposed to type. Although the chart dialog seems to imply that you can change the chart type just by clicking on the appropriate button along the top, when you do so you often lose the values you already typed in for category, series, etc. This can make it very frustrating to use.
Adding to the usability issues, there is no chart preview – you can’t see the results of what you created until you leave the Chart Dialog and actually run the report.
Figure 4 - Pentaho Chart Designer
Strengths and Weaknesses
Below are some of the strengths and weaknesses of the Pentaho Report Designer, as compared to BIRT and Jaspersoft Studio report designers. Note that some of these strengths and weaknesses are really due to the behavior of the underlying Classic Engine, not the Pentaho Report Designer itself. However, we include them here because the report developer is most likely to encounter them.
- Pentaho’s Report Designer is easily the most visually attractive of the three design tools. Pentaho has clearly put a greater emphasis on user experience, usability, and look and feel than the other projects. When you use the Pentaho tools, they just feel nice.
- The Pentaho Report Designer has the best built-in report design wizard.
- With Pentaho, you can create “row-banded” reports, with alternating colors for each report row, by simply checking a box. Much easier than either BIRT or Jasper!
- Pentaho does not require that reports be compiled prior to running (unlike Jasper but like BIRT).
- Pentaho reports are in XML format, and thus can be effectively put under revision control.
- Crosstabs are still “experimental” and have been for five years now. Pentaho appears to have dedicated a considerable amount of time to the crosstab component, but it is still not fully functional, which could prove frustrating if you use a lot of crosstab evaluations. You need to turn on experimental features (Edit-->Preferences-->General) to use crosstabs in Pentaho Report Designer.
- It is difficult to create charts (see previous section). The process has usability issues, and the Pentaho Report Designer does not offer as many levers to customize the chart’s contents, look, and behavior as the other tools.
- Pentaho requires that the report query do the “heavy lifting” for grouping, filtering, sorting and aggregates. If the data does not arrive in the report in the proper way, Pentaho has less ability to further manipulate the data than BIRT. The report developer is responsible for ensuring that groups in the report design are in the same order as the data groups returned by the query.
- Pentaho does not support “newspaper layouts” with multiple columns (BIRT doesn’t either; Jasper does), and does not yet support vertical text.
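The point above about the query doing the “heavy lifting” for grouping can be illustrated with a small sketch. It assumes a simplified model of banded grouping in which a new group starts whenever the group key changes from one row to the next; the `GroupBreaks` class and its sample data are hypothetical and are not taken from the Pentaho code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified model of banded grouping: a group header is emitted on every key change. */
final class GroupBreaks {

    /** Returns the group headers a report would print for rows arriving in this order. */
    static List<String> groupHeaders(List<String> keysInRowOrder) {
        List<String> headers = new ArrayList<>();
        String previous = null;
        for (String key : keysInRowOrder) {
            if (!key.equals(previous)) { // key changed, so a new group starts here
                headers.add(key);
                previous = key;
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        // Rows as an unordered query might return them.
        List<String> unsorted = Arrays.asList("East", "West", "East", "West");
        // The same rows as a query with an ORDER BY on the group column would return them.
        List<String> sorted = new ArrayList<>(unsorted);
        Collections.sort(sorted);

        System.out.println(groupHeaders(unsorted)); // prints [East, West, East, West]
        System.out.println(groupHeaders(sorted));   // prints [East, West]
    }
}
```

In this model the unsorted rows produce four fragmented groups instead of two, which is the kind of result a report developer can expect if the query's sort order does not match the group order in the report design.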
Over the past five years, Pentaho has shown consistent improvement in its open source reporting tool. We believe that Pentaho is an excellent choice for reports that are of simple to moderate complexity and don’t require crosstabs or charts. We still believe that BIRT and Jaspersoft are better for more complex reports. Pentaho might also be a good choice if printed reports are an important requirement.
One of the most compelling reasons to go with Pentaho, however, is how Pentaho Reporting integrates with the rest of the Pentaho BI Suite. While we do not cover these other Pentaho products in this write-up, they are worth a look if you have requirements around data mining, data integration, OLAP analysis, or just need a report server to deploy and run reports via a web-based user interface.
Pentaho is a registered trademark of Pentaho, Inc.