I wanted to do this review much sooner but I've been too busy using the book.
Jos and Roland have taken the proven formula they used in Pentaho Solutions and focused it on ETL and Kettle, AKA Pentaho Data Integration. Their magic formula is to seamlessly mix a product user's guide with equal parts real-world examples and best-practices training. With the addition of Matt Casters, Mr. Kettle himself, the depth of knowledge in the book is now equal to its breadth. The result is a book that you can read cover to cover to learn about all aspects of building and deploying ETL solutions, and that is equally useful as a day-to-day reference.
The book is divided into five parts, starting with an obligatory Getting Started. Getting Started, however, goes beyond the traditional "here's how to install it" guide and presents a nice tutorial on the sometimes confusing terminology and practices used in the data world. It explains how Kettle fits into this world and talks about the key concepts in Kettle. The first part ends with an excellent example ETL solution that populates a nontrivial yet easily understood star schema. The example covers fact and dimension tables, change data capture, generating date dimensions, and the ETL jobs and transforms required to populate the data.
The organization of the second part of the book is based on the 34 subsystems of ETL as defined by Ralph Kimball in "The Data Warehouse Lifecycle Toolkit", considered by many (including me) to be the bible of data warehousing. For each subsystem, Kettle Solutions refers to the original chapters that describe the topic and provides examples of how to solve those issues using Kettle. It is a must-have for anyone struggling with the concepts presented in the Kimball book. For the rare cases where Kettle does not have a straightforward solution, the book points you to other open source software that can get the job done. The authors stay true to the task of helping the ETL developer solve real problems, regardless of whether Kettle is the complete solution or not.
The first two parts take up about half the book, and if the authors had stopped there, it would be worthy of at least 4 stars. But as with most software development, the base code (in this case, the jobs and transforms) is the easy part and usually the most fun. The really hard stuff comes when you have to deploy your solution into the real world, keep it running, add new capability, explain it to others, and be confident that it is actually working. Part three walks you through the ETL lifecycle with best practices and pitfalls from three people who do (or have done) this for a living. Everything is covered, from development and testing through documentation, monitoring, migrating, and auditing. Part four finishes what part three started by covering performance tuning and scaling, with topics like clustering, partitioning, cloud, and real-time ETL.
The last part is my favorite since it covers the advanced stuff like writing Kettle plugins, complex data formats, integrating data from web services, dynamic ETL, embedding Kettle, etc. There are many ways to extend Kettle via defined APIs and Kettle Solutions covers them all.
As you can probably tell, I like the book and I use it often. I have the luxury of being able to ask Matt questions when I run into trouble. Since writing the book, he now answers "it's in the book" and, needless to say, it is. I can honestly say that having this book sitting on your desk is better than having Matt sitting on your desk. Kettle Solutions is also available for Kindle, which, much to my surprise, has proven very useful. I use it from my iPhone and Mac via the Kindle app, and despite some of the app's limitations, such as the lack of cut and paste and of a good search, it is always available as a reference. The links are live, which is a bonus.
I'm a fifteen-year veteran of building BI software, one of the original Pentaho developers, and currently the Pentaho community guy. I work with Matt Casters, but I'm not professionally affiliated with Jos, Roland, or Wiley, and I receive no benefit from this book beyond the satisfaction of having Pentaho software be so well represented. I do consider all three of them good personal friends, and I provide this review at the risk that it may greatly inflate their heads.
Doug Moran
Pentaho